About the Author

Lee Calcote

Lee Calcote is an innovative product and technology leader, passionate about empowering engineers and enabling organizations. As Founder of Layer5, he is at the forefront of the cloud native movement. Open source, advanced, and emerging technologies have been a consistent focus throughout Calcote's time at SolarWinds, Seagate, Cisco, and Schneider Electric. An advisor, author, and speaker, Calcote is active in the community as a Docker Captain, Cloud Native Ambassador, and GSoC, GSoD, and LFX mentor.

Meshery

Meshery is the world's only collaborative cloud manager.

In our last post in this series, we explored Docker Model Runner's OCI-based model management and its performance-centric execution model. In this post, we turn our attention to another critical area for engineers: its API architecture and connectivity options. How do your applications actually talk to the models running locally via Model Runner? The answer lies in a thoughtfully designed API layer, with OpenAI compatibility at its core, and flexible connection methods to suit diverse development scenarios.

For engineers, a well-defined and accessible API is paramount. It dictates the ease of integration, the reusability of existing code, and the overall developer experience when building AI-powered applications.

The Heart of the Engine: llama.cpp and a Pluggable Future

In its initial Beta release, Docker Model Runner's inference capabilities are powered by an integrated engine built on llama.cpp. This open-source project is renowned for its efficient execution of LLMs across various hardware, making it a solid foundation for local inference.

When you interact with Model Runner, you're essentially communicating with this llama.cpp-based server, which runs as a native host process. The API paths often reflect this underlying engine, for example, with endpoints structured under /engines/llama.cpp/v1/... or a more generalized /engines/v1/....

While llama.cpp provides a robust initial backbone, the API path structure (e.g., /engines/...) hints at a potentially pluggable architecture. This is a common design pattern that could allow Docker to integrate other inference engines or model serving technologies in the future. This foresight means Model Runner could evolve to support a wider array of model types, quantization methods, or hardware acceleration frameworks without requiring a fundamental redesign of its API interaction model.
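
To make the path structure concrete, here is a minimal sketch of a raw HTTP call against the engine-scoped endpoint. It assumes Model Runner's host TCP access is enabled on port 12434 (covered in the connectivity options later in this post) and that a model such as ai/smollm2 has already been pulled; both the port and the model name are illustrative assumptions, not requirements.

```python
import requests

# Assumptions: host TCP access is enabled on port 12434 and a model
# (here "ai/smollm2") has already been pulled locally. Both are illustrative.
BASE_URL = "http://localhost:12434"

response = requests.post(
    f"{BASE_URL}/engines/llama.cpp/v1/chat/completions",  # engine-scoped path
    json={
        "model": "ai/smollm2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

The same request shape should also work against the more general /engines/v1/... form mentioned above, since both expose the familiar OpenAI-style chat completions route.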

The "Superpower": OpenAI-Compatible API

Perhaps the most strategically significant aspect of Model Runner's API is its OpenAI compatibility. This is a game-changer for several reasons:

  1. Leverage Existing SDKs and Tools: Engineers can use their existing OpenAI SDKs (Python, Node.js, etc.) and a vast ecosystem of compatible tools like LangChain or LlamaIndex with minimal, if any, code changes. This dramatically lowers the barrier to adoption.
  2. Simplified Migration: If you've been developing against OpenAI's cloud APIs, transitioning to local models with Model Runner can often be as simple as changing the baseURL in your client configuration. This seamless switch accelerates local development and testing.
  3. Reduced Learning Curve: There's no need to learn a new, proprietary API. The familiar OpenAI request/response structures for tasks like chat completions (/chat/completions) or embeddings (/embeddings) remain consistent.

This adherence to a de facto industry standard API is a deliberate choice by Docker to maximize interoperability and ease of integration, allowing developers to focus on application logic rather than wrestling with new API paradigms.
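
As a concrete illustration of point 2 above, the sketch below points the standard OpenAI Python SDK at a local Model Runner endpoint by changing only the client's base URL. The host port (12434) and model name (ai/smollm2) are assumptions for illustration; the API key is required by the SDK but is not validated locally.

```python
from openai import OpenAI

# Swap the OpenAI cloud endpoint for the local Model Runner endpoint.
# Assumptions: host TCP access on port 12434 and a pulled "ai/smollm2" model.
client = OpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="not-needed-locally",  # required by the SDK, ignored locally
)

completion = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Summarize what an OCI artifact is."}],
)
print(completion.choices[0].message.content)
```

Everything after the client construction, including the request and response structures, is unchanged from code written against OpenAI's cloud API.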

Connecting Your Applications: A Multi-Pronged Approach

Docker Model Runner offers several ways for your applications and tools to connect to the local inference engine, providing flexibility for different development setups:

  1. Internal DNS for Containerized Applications (model-runner.docker.internal):
    • How it works: For applications running as Docker containers themselves (e.g., a backend API service), Model Runner provides a stable internal DNS name: http://model-runner.docker.internal.
    • Benefit for Engineers: This is incredibly convenient. Your containerized service can simply target this DNS name to reach the Model Runner API, without needing to know the host's IP address or worry about dynamic port mappings. It simplifies network configuration within your Docker environment.
    • Endpoint Example: http://model-runner.docker.internal/engines/v1/chat/completions
  2. Host TCP Port for Direct Access:
    • How it works: You can configure Model Runner to listen on a specific TCP port on your host machine. This is typically done via a Docker Desktop setting or a command like docker desktop enable model-runner --tcp <port> (e.g., port 12434).
    • Benefit for Engineers: This allows applications running directly on your host (outside of Docker containers)—such as IDEs, local scripts, or standalone Java applications using Spring AI—to connect to the Model Runner.
    • Endpoint Example: http://localhost:12434/engines/v1/chat/completions
  3. Docker Socket (Advanced/CLI Use):
    • How it works: For direct interactions via the Docker API or for certain CLI scripting scenarios, the Docker socket (/var/run/docker.sock on Linux/macOS) can be used. API calls through the socket might have a specific path prefix (e.g., /exp/vDD4.40/... as seen in early versions).
    • Benefit for Engineers: This offers a lower-level interface, useful for automation scripts or tools that integrate deeply with the Docker daemon.

This multi-faceted approach to connectivity ensures that whether your application is containerized, running natively on the host, or interacting via CLI tools, there's a clear and supported path to communicate with the local AI models managed by Docker Model Runner.
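
One pattern that falls out of these options is selecting the base URL at startup depending on where your code is running. The sketch below reuses the assumptions from the earlier examples (host TCP access on port 12434); the MODEL_RUNNER_BASE_URL environment variable is a hypothetical convention for overrides, not a Model Runner feature, and the /.dockerenv check is a common but not guaranteed way to detect a container.

```python
import os
from openai import OpenAI

def model_runner_base_url() -> str:
    """Pick a Model Runner base URL based on where this process is running.

    MODEL_RUNNER_BASE_URL is a hypothetical override for custom setups,
    not a Model Runner feature.
    """
    explicit = os.getenv("MODEL_RUNNER_BASE_URL")
    if explicit:
        return explicit
    # Inside a container, the stable internal DNS name is reachable.
    # The /.dockerenv check is a common heuristic, not a guarantee.
    if os.path.exists("/.dockerenv"):
        return "http://model-runner.docker.internal/engines/v1"
    # On the host, fall back to the TCP port enabled via Docker Desktop
    # (assumed to be 12434 here).
    return "http://localhost:12434/engines/v1"

client = OpenAI(base_url=model_runner_base_url(), api_key="not-needed-locally")
```

With a small helper like this, the same application code can run unchanged inside a Compose stack or directly on your workstation.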

Understanding these API mechanics and connection options is crucial for effectively integrating Docker Model Runner into your development workflows. It allows you to choose the most appropriate method for your specific application architecture and leverage the power of local AI models with ease.

In our next post, we'll explore how Docker Model Runner integrates with Docker Compose, enabling the orchestration of complex, multi-service AI applications locally.

This blog post is based on information about Docker Model Runner, a Beta feature. Features, commands, and APIs are subject to change.
