# Getting Started with Toolio
Toolio is an OpenAI-like HTTP server API implementation that supports structured LLM response generation and reliable tool calling. It's built on the MLX framework for Apple Silicon, making it exclusive to Mac platforms with M1/M2/M3/M4 chips.
## Prerequisites
- Apple Silicon Mac (M1, M2, M3, or M4)
  - Note: You can install Toolio on other OSes, for example to use the client library or `toolio_request` to access a Toolio server. In that case, you will not be able to use any features that involve loading LLMs.
- Python 3.10 or more recent (tested only on 3.11 or more recent)
To verify you're on an Apple Silicon Mac, you can run:

```sh
python -c "import platform; assert 'arm64' in platform.platform()"
```
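Similarly, to confirm your Python version meets the minimum:

```sh
python -c "import sys; assert sys.version_info >= (3, 10)"
```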
## Installation
Install Toolio using pip:

```sh
pip install toolio
```
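A quick way to check the installation is to make sure the package imports cleanly:

```sh
python -c "import toolio"
```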
For some built-in tools, you'll need additional dependencies:

```sh
pip install -Ur requirements-extra.txt
```
## Quick Start
### 1. Host a Toolio Server
Launch a Toolio server using an MLX-format LLM:

```sh
toolio_server --model=mlx-community/Hermes-2-Theta-Llama-3-8B-4bit
```

This command downloads the specified model (if it's not already cached locally) and serves it over HTTP; the examples below assume the default address of `http://localhost:8000`.
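Once it's running, you can sanity-check the server with a plain HTTP request. This is a minimal sketch using only the Python standard library; the `/v1/chat/completions` route and request shape are assumed from Toolio's OpenAI-like API, so adjust if your server version differs:

```py
# Minimal server check using only the standard library.
# NOTE: the /v1/chat/completions route and OpenAI-style payload are
# assumptions based on Toolio's OpenAI-like API description.
import json
import urllib.request

req = urllib.request.Request(
    'http://localhost:8000/v1/chat/completions',
    data=json.dumps({
        'messages': [{'role': 'user', 'content': 'Hello!'}]
    }).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```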
### 2. Make a Basic Request
Use the `toolio_request` command-line tool:

```sh
toolio_request --apibase="http://localhost:8000" --prompt="I am thinking of a number between 1 and 10. Guess what it is."
```
### 3. Use Structured Output
Constrain the LLM's output using a JSON schema:

```sh
export LMPROMPT='Which countries are mentioned in the sentence "Adamma went home to Nigeria for the hols"? Your answer should be only JSON, according to this schema: {json_schema}'
export LMSCHEMA='{"type": "array", "items": {"type": "object", "properties": {"name": {"type": "string"}, "continent": {"type": "string"}}}}'
toolio_request --apibase="http://localhost:8000" --prompt="$LMPROMPT" --schema="$LMSCHEMA"
```

(Note the double quotes around `$LMPROMPT` and `$LMSCHEMA`; without them, the shell would word-split the values and break the command.)
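Because the schema constrains the model to emit valid JSON, you can hand the response straight to a JSON parser. A minimal sketch, using a hypothetical response string of the shape the schema guarantees:

```py
import json

# Hypothetical response text matching the schema above
response_text = '[{"name": "Nigeria", "continent": "Africa"}]'
for country in json.loads(response_text):
    print(f"{country['name']} ({country['continent']})")
```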
### 4. Tool Calling
Use built-in or custom tools:

```sh
toolio_request --apibase="http://localhost:8000" --tool=toolio.tool.math.calculator --loglevel=DEBUG \
  --prompt='Usain Bolt ran the 100m race in 9.58s. What was his average velocity?'
```
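For reference, the answer the calculator tool should arrive at is just distance over time:

```py
# Average velocity = distance / time
distance_m = 100
time_s = 9.58
print(f'{distance_m / time_s:.2f} m/s')  # ~10.44 m/s
```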
### 5. Python API Usage
Use Toolio directly in Python:

```py
import asyncio
from toolio.llm_helper import model_manager, extract_content

# Load the MLX-format model (downloading it first, if needed)
toolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit')

async def say_hello(tmm):
    msgs = [{"role": "user", "content": "Hello! How are you?"}]
    # complete() streams the response; extract_content pulls out the text chunks
    async for chunk in extract_content(tmm.complete(msgs)):
        print(chunk, end='')

asyncio.run(say_hello(toolio_mm))
```
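The same helpers work if you'd rather collect the whole reply as a single string instead of streaming it to stdout; a small variation on the example above, using only the APIs already shown:

```py
# Accumulate the streamed chunks into one string
async def get_reply(tmm, prompt):
    msgs = [{"role": "user", "content": prompt}]
    chunks = []
    async for chunk in extract_content(tmm.complete(msgs)):
        chunks.append(chunk)
    return ''.join(chunks)

reply = asyncio.run(get_reply(toolio_mm, 'Hello! How are you?'))
print(reply)
```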
## Next Steps
- Check out the `demo` directory for more examples
- Explore creating custom tools
- Learn about LLM-specific flows and flags