ggml.ai joins Hugging Face to ensure the long-term progress of Local AI

Announcement

We are happy to announce that ggml.ai (the founding team of llama.cpp) is joining Hugging Face in order to keep future AI truly open.

Georgi and team are joining HF with the goal of scaling and supporting the ggml/llama.cpp community as Local AI continues to make exponential progress in the coming years.

Summary / Key-points

  • The ggml-org projects remain open and community-driven, as always
  • The ggml team continues to lead, maintain, and support the ggml and llama.cpp libraries and related open-source projects full-time
  • The new partnership ensures long-term sustainability of the projects and will help foster new opportunities for users and contributors
  • Additional focus will be dedicated to improving the user experience and the integration with the Hugging Face transformers library for broader model support

Why this change?

Since its founding in 2023, the core mission of ggml.ai has been to support the development and adoption of the ggml machine learning library. Over the past three years, the small team behind the company has worked to grow the open-source developer community and to help establish ggml as the definitive standard for efficient local AI inference. This was achieved through close collaboration with individual contributors, as well as through partnerships with model providers and independent hardware vendors. As a result, llama.cpp is today a fundamental building block in countless projects and products, enabling private, easily accessible AI on consumer hardware.

Throughout this development, Hugging Face has stood out as the strongest and most supportive partner of this initiative. Over the last couple of years, HF engineers (notably @ngxson and @allozaur) have:

  • Contributed several core functionalities to ggml and llama.cpp
  • Built a solid inference server with a polished user interface
  • Introduced multi-modal support to llama.cpp
  • Integrated llama.cpp into the Hugging Face Inference Endpoints
  • Improved compatibility of the GGUF file format with the Hugging Face platform
  • Added support for multiple model architectures to llama.cpp
  • Helped ggml projects with general maintenance, PR reviews and more

Collaboration between our teams has always been smooth and efficient. Both sides, as well as the community, have benefited from these joint efforts. It only makes sense to formalize this collaboration and strengthen it going forward.

What will change for ggml/llama.cpp, the open source project and the community?

Not much: Georgi and team will continue to dedicate 100% of their time to maintaining ggml/llama.cpp. The community will continue to operate fully autonomously and to make technical and architectural decisions as usual. Hugging Face is providing the project with long-term, sustainable resources, improving its chances to grow and thrive. The project will remain 100% open-source and community-driven, as it is now. Expect your favorite quants to be supported even faster once a model is released.

Technical focus

Going forward, our joint efforts will be geared towards the following objectives:

  • Towards seamless “single-click” integration with the transformers library
    The transformers framework has established itself as the ‘source of truth’ for AI model definitions. Improving compatibility between the transformers and ggml ecosystems is essential for wider model support and quality control (a sketch of this conversion path follows the list).

  • Better packaging and user experience of ggml-based software
    As local inference becomes a meaningful and competitive alternative to cloud inference, it is crucial to improve and simplify how casual users deploy and access local models. We will work towards making llama.cpp ubiquitous and readily available everywhere, and we will continue partnering with great downstream projects (see the server client sketch below).
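
The first objective above centers on moving models from the transformers ecosystem into ggml’s GGUF format. Below is a minimal sketch of that path, assuming a local checkout of llama.cpp and the huggingface_hub package; the repo id and output names are hypothetical placeholders, and the exact flags of the convert_hf_to_gguf.py script may differ across llama.cpp versions.

```python
# Sketch: fetch a transformers-format model from the Hub and convert it
# to GGUF with llama.cpp's conversion script. The repo id and file names
# are placeholders; script flags may vary across llama.cpp versions.
import subprocess

from huggingface_hub import snapshot_download

# 1. Download the original (transformers-format) model weights.
model_dir = snapshot_download(repo_id="example-org/example-model")  # placeholder

# 2. Convert to a quantized GGUF file using the script shipped in llama.cpp.
subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py", model_dir,
        "--outfile", "example-model-q8_0.gguf",
        "--outtype", "q8_0",
    ],
    check=True,  # raise if the conversion fails
)
```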
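
For the second objective, one concrete piece of today’s user experience is that llama.cpp’s llama-server exposes an OpenAI-compatible HTTP API, so existing client code can point at a local model. A minimal sketch, assuming llama-server is already running on its default port (8080) and the openai client package is installed; the model name is a placeholder, since the server serves whichever model it was started with.

```python
# Sketch: query a locally running llama-server through its
# OpenAI-compatible API. Start the server first, e.g.:
#   llama-server -m ./example-model-q8_0.gguf
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local llama.cpp server
    api_key="not-needed",                 # a local server does not check the key
)

resp = client.chat.completions.create(
    model="local",  # placeholder; the server uses the model it was started with
    messages=[{"role": "user", "content": "Why run models locally?"}],
)
print(resp.choices[0].message.content)
```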

Long term vision

Our shared goal is to provide the building blocks to make open-source superintelligence accessible to the world over the coming years. We will achieve this together with the growing Local AI community, as we continue to build the ultimate inference stack that runs as efficiently as possible on our devices.