Global network
Cloudflare, the leading content delivery network and cloud security platform, wants to make AI accessible to developers. It has added GPU-based infrastructure and model-serving capabilities to its edge network, making cutting-edge foundation models available to the general public. Any developer can access Cloudflare’s AI platform with a simple REST API call.
Cloudflare introduced Workers, a serverless edge computing platform, in 2017. Developers can use this serverless platform to create JavaScript Service Workers that run directly in Cloudflare edge locations around the world. With a Worker, a developer can modify a site’s HTTP requests and responses, make parallel requests, and even respond directly from the edge. Cloudflare Workers uses an API similar to the W3C Service Workers standard.
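A minimal Worker that responds directly from the edge might look like the sketch below (the custom header name is illustrative, not part of any Cloudflare API):

```javascript
// Minimal Cloudflare Worker (module syntax): respond directly from the edge
// without contacting an origin server. The "x-served-by" header is illustrative.
const worker = {
  async fetch(request) {
    const url = new URL(request.url);
    return new Response(`Hello from the edge for ${url.pathname}`, {
      status: 200,
      headers: {
        "content-type": "text/plain",
        "x-served-by": "cloudflare-worker", // illustrative header
      },
    });
  },
};

export default worker;
```

The same `fetch` handler can also rewrite the incoming request and forward it upstream, or issue several parallel `fetch` calls and merge the results.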
The rise of generative AI has prompted Cloudflare to expand Workers with AI capabilities. The platform has three new components to support AI inference:
- Workers AI runs on NVIDIA GPUs within Cloudflare’s global network, bringing the serverless model to AI inference. Users pay only for what they use, letting them spend less time managing infrastructure and more time on their applications.
- Vectorize, a vector database, enables simple, fast, and cost-effective vector indexing and storage, supporting use cases that need access not only to models but also to custom data.
- AI Gateway allows organizations to cache, throttle, and monitor their AI deployments regardless of hosting environment.
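Conceptually, a vector index like Vectorize ranks stored embeddings by similarity to a query vector. The toy in-memory sketch below illustrates that idea; real Vectorize queries go through a Workers binding, not code like this:

```javascript
// Toy sketch of what a vector index does: rank stored embeddings by
// cosine similarity to a query vector. Not the Vectorize API itself.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored entries most similar to the query vector.
function topK(index, query, k) {
  return index
    .map(({ id, vector }) => ({ id, score: cosineSimilarity(vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A managed index does the same ranking at scale, with persistence and approximate-nearest-neighbor search instead of a linear scan.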
Cloudflare has partnered with NVIDIA, Microsoft, Hugging Face, Databricks, and Meta to bring GPU infrastructure and foundation models to its edge. The platform also hosts embedding models for converting text to vectors. The Vectorize database can store, index, and query those vectors to add context to LLMs, reducing hallucinations in responses. AI Gateway provides observability, rate limiting, and caching of frequent queries, reducing costs while improving application performance.
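The retrieval step described above amounts to prepending relevant stored text to the model prompt so the LLM answers from real data. A minimal sketch of that assembly (the function name and prompt format are illustrative):

```javascript
// Build an LLM prompt grounded in retrieved context to reduce hallucinations.
// The function name and prompt template are illustrative, not a Cloudflare API.
function buildGroundedPrompt(question, retrievedDocs) {
  const context = retrievedDocs
    .map((doc, i) => `[${i + 1}] ${doc}`)
    .join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

The assembled string is then sent to the model as its prompt, with the retrieved passages numbered so the answer can cite them.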
The model catalog for Workers AI offers some of the latest and best foundation models. From Meta’s Llama 2 to Stable Diffusion XL to Mistral 7B, it has everything developers need to build modern applications powered by generative AI.
Model catalog
Behind the scenes, Cloudflare uses ONNX Runtime, an open source runtime for the Open Neural Network Exchange format led by Microsoft, to optimize model execution in resource-constrained environments. This is the same technology Microsoft relies on to run foundation models on Windows.
Although developers can write AI inference code in JavaScript and deploy it to Cloudflare’s edge network, the models can also be invoked through a simple REST API from any language. This makes it easy to integrate generative AI into web, desktop, and mobile applications running in a variety of environments.
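As a sketch, a REST call to a Workers AI model could be assembled as below. The endpoint follows the documented `accounts/{account}/ai/run/{model}` pattern, but the account ID and token here are placeholders:

```javascript
// Sketch: invoke a Workers AI model over REST from any environment with fetch.
// ACCOUNT_ID and API_TOKEN are placeholders you must replace with your own.
const ACCOUNT_ID = "your-account-id"; // placeholder
const API_TOKEN = "your-api-token";   // placeholder
const MODEL = "@cf/meta/llama-2-7b-chat-int8";

// Build the request without sending it, so the shape is easy to inspect.
function buildInferenceRequest(prompt) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ prompt }),
    },
  };
}

// Usage (not executed here):
// const { url, options } = buildInferenceRequest("Tell me a joke");
// const data = await (await fetch(url, options)).json();
```

Because the call is plain HTTPS plus JSON, the same request can be issued from Python, Go, or a shell script just as easily as from JavaScript.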
Workers AI launched in September 2023 with inference capabilities in seven cities. Cloudflare’s ambitious goal was to support Workers AI inference in 100 cities by the end of that year, with near-ubiquitous coverage by the end of 2024.
Cloudflare’s footprint
Cloudflare is one of the first CDN and edge networking providers to enhance its network with AI capabilities, adding GPU-backed Workers AI, the Vectorize vector database, and AI Gateway for managing AI deployments. Partnering with tech giants like Meta and Microsoft, it offers a large model catalog and ONNX Runtime-optimized inference.