---
title: "Ultimate Guide: Run GLM-OCR Locally on MacBook Fast"
slug: "run-glm-ocr-macbook-ollama"
published: "2026-02-10"
updated: "2026-04-06"
validated: "2026-02-15"
categories:
  - "AI"
tags:
  - "GLM-OCR"
  - "Ollama"
  - "local OCR mac"
  - "run glm-ocr on macbook"
  - "num_ctx 16384"
  - "OpenAI-compatible API"
  - "Homebrew"
  - "M1 Pro OCR"
  - "document OCR"
  - "python standard library"
  - "vision OCR model"
llm-intent: "reference"
audience-level: "beginner"
framework-versions:
  - "ollama"
  - "glm-ocr"
  - "homebrew"
  - "python (standard library)"
  - "openai python sdk"
status: "stable"
llm-purpose: "GLM-OCR: Install Ollama on macOS, pull the 0.9B model, set num_ctx=16384 to prevent crashes, and run a local OpenAI-compatible OCR API — follow the quick…"
llm-prereqs:
  - "Access to Ollama"
  - "Access to GLM-OCR"
  - "Access to Homebrew"
  - "Access to Python (standard library)"
  - "Access to OpenAI Python SDK"
llm-outputs:
  - "Completed outcome: GLM-OCR: Install Ollama on macOS, pull the 0.9B model, set num_ctx=16384 to prevent crashes, and run a local OpenAI-compatible OCR API — follow the quick…"
---

**Summary Triples**
- (Ultimate Guide: Run GLM-OCR Locally on MacBook Fast, focuses-on, GLM-OCR: Install Ollama on macOS, pull the 0.9B model, set num_ctx=16384 to prevent crashes, and run a local OpenAI-compatible OCR API — follow the quick…)
- (Ultimate Guide: Run GLM-OCR Locally on MacBook Fast, category, general)

### {GOAL}
GLM-OCR: Install Ollama on macOS, pull the 0.9B model, set num_ctx=16384 to prevent crashes, and run a local OpenAI-compatible OCR API — follow the quick…

### {PREREQS}
- Access to Ollama
- Access to GLM-OCR
- Access to Homebrew
- Access to Python (standard library)
- Access to OpenAI Python SDK

### {STEPS}
1. Install Ollama via Homebrew
2. Start and verify Ollama service
3. Pull the GLM-OCR model
4. Set num_ctx to avoid crashes
5. Quick CLI test with an image
6. Call the Ollama API from Python
7. Use OpenAI-compatible endpoint if desired
8. Prompt formats and structured extraction

<!-- llm:goal="GLM-OCR: Install Ollama on macOS, pull the 0.9B model, set num_ctx=16384 to prevent crashes, and run a local OpenAI-compatible OCR API — follow the quick…" -->
<!-- llm:prereq="Access to Ollama" -->
<!-- llm:prereq="Access to GLM-OCR" -->
<!-- llm:prereq="Access to Homebrew" -->
<!-- llm:prereq="Access to Python (standard library)" -->
<!-- llm:prereq="Access to OpenAI Python SDK" -->
<!-- llm:output="Completed outcome: GLM-OCR: Install Ollama on macOS, pull the 0.9B model, set num_ctx=16384 to prevent crashes, and run a local OpenAI-compatible OCR API — follow the quick…" -->

# Ultimate Guide: Run GLM-OCR Locally on MacBook Fast
> GLM-OCR: Install Ollama on macOS, pull the 0.9B model, set num_ctx=16384 to prevent crashes, and run a local OpenAI-compatible OCR API — follow the quick…
Matija Žiberna · 2026-02-10

I spent an afternoon setting up GLM-OCR on RunPod Serverless with vLLM: custom Dockerfiles, CUDA version mismatches, and RunPod handler scripts. Then I realized the model is only 0.9B parameters and uses about 2.5GB of memory. It runs on a MacBook.

If you just need document OCR for development, testing, or even light production use, you do not need cloud GPUs. This guide shows you how to go from zero to a working OCR API on your Mac in about five minutes. The same steps work for most models in the Ollama library.

## Install Ollama

Ollama is a tool for running language models locally. It handles model downloads and quantization, and serves an OpenAI-compatible API. On macOS, install it with Homebrew.

```bash
brew install ollama
```

Start the Ollama service in the background so it runs automatically on login.

```bash
brew services start ollama
```

Ollama is now listening on `http://localhost:11434`. You can verify it is running with a quick health check.

```bash
curl http://localhost:11434/
```

You should see `Ollama is running` in the response.
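If you are scripting the setup, the same health check works from Python. This is a small sketch (the URL and port are Ollama's defaults; the function name is my own):

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434"):
    """Return True if an Ollama server responds at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("up" if ollama_is_running() else "down")
```

Calling it before the first OCR request gives you a clear error message instead of a connection traceback mid-run.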

## Pull the GLM-OCR model

GLM-OCR is a 0.9B parameter vision model from Team GLM, designed specifically for document OCR. It handles text recognition, table extraction, formula parsing, and structured information extraction. The quantized version that Ollama downloads is about 2.2GB.

```bash
ollama pull glm-ocr
```

Once downloaded, confirm the model is available.

```bash
ollama list
```

You should see `glm-ocr:latest` in the output with a size of approximately 2.2GB.

## The context size gotcha

This is the one thing that will trip you up. Ollama defaults to a context size of 4096 tokens, which is not enough for processing images. When GLM-OCR tries to encode an image with the default context, you get a cryptic crash.

```
GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed
```

The fix is to set `num_ctx` to at least 16384 when making requests. I will show this in every example below so you do not have to debug it yourself.
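If you build request payloads in more than one place, it is easy to forget the option. One defensive pattern (a sketch; the helper name and constant are mine) is to centralize payload construction so `num_ctx` is always set:

```python
MIN_CTX = 16384  # below this, GLM-OCR crashes while encoding images


def chat_payload(prompt, images_b64, model="glm-ocr", num_ctx=MIN_CTX):
    """Build an Ollama /api/chat payload with num_ctx always included."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": list(images_b64)}
        ],
        "stream": False,
        # Clamp upward so a caller cannot accidentally pass a context
        # size that is too small for image encoding
        "options": {"num_ctx": max(num_ctx, MIN_CTX)},
    }
```

Every example below inlines the option instead, but the same idea applies: the context size belongs next to the request, not in your memory.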

## Test from the command line

The simplest way to test is with Ollama's built-in CLI. Put the image path directly in the prompt, or drag the file into your terminal to have its path inserted for you.

```bash
ollama run glm-ocr "Text Recognition: ./path/to/your/document.png"
```

For a quick test, download a sample image first.

```bash
curl -sL -o /tmp/receipt.jpg "https://upload.wikimedia.org/wikipedia/commons/0/0b/ReceiptSwiss.jpg"
ollama run glm-ocr "Text Recognition: /tmp/receipt.jpg"
```

The model should return the text content of the receipt, including items, prices, and totals.

## Use the API with Python

For integration into your own applications, Ollama serves an API on port 11434. Here is a complete working example that sends an image and gets back the recognized text.

```python
# File: test_ocr.py
import base64
import json
import urllib.request


def ocr_image(image_path, prompt="Text Recognition:"):
    # Encode the image as base64, which is how the Ollama chat API accepts images
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()

    data = json.dumps({
        "model": "glm-ocr",
        "messages": [
            {
                "role": "user",
                "content": prompt,
                "images": [img_b64]
            }
        ],
        "stream": False,
        # Below 16384 the model crashes while encoding images
        "options": {"num_ctx": 16384}
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=data,
        headers={"Content-Type": "application/json"}
    )
    # Image encoding is slow on a Mac, so allow a generous timeout
    with urllib.request.urlopen(req, timeout=120) as resp:
        result = json.loads(resp.read().decode())
    return result["message"]["content"]


if __name__ == "__main__":
    text = ocr_image("/tmp/receipt.jpg")
    print(text)
```

Run it with `python3 test_ocr.py`. On an M1 Pro, expect about 40-50 seconds for image processing and a few seconds for text generation. The `num_ctx: 16384` option in the request is critical. Without it, the model crashes on any non-trivial image.

The script uses only standard library modules so there is nothing extra to install. If you prefer the `requests` library or the official OpenAI Python SDK, those work too since Ollama serves an OpenAI-compatible API.
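Since image encoding dominates the runtime, it is worth measuring on your own machine. A tiny timing wrapper (my own helper, not part of Ollama) makes the split visible:

```python
import time


def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.1f}s")
    return result


# Usage with the ocr_image function from test_ocr.py:
# text = timed("ocr", ocr_image, "/tmp/receipt.jpg")
```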

## Use the OpenAI-compatible API

Ollama also serves an OpenAI-compatible endpoint at `http://localhost:11434/v1`. This means you can use the OpenAI Python SDK or any tool that supports custom API base URLs.

```python
# File: test_ocr_openai.py
import base64
from openai import OpenAI

client = OpenAI(
    api_key="ollama",  # Ollama ignores the key, but the SDK requires a non-empty value
    base_url="http://localhost:11434/v1",
)

with open("/tmp/receipt.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="glm-ocr",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{img_b64}"
                    }
                },
                {
                    "type": "text",
                    "text": "Text Recognition:"
                }
            ]
        }
    ],
)

print(response.choices[0].message.content)
```

This requires `pip install openai` but gives you the standard OpenAI interface. If you later move to a cloud-hosted model, you only change the `base_url` and `api_key`.
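The example hardcodes `image/jpeg` in the data URL. If you feed it mixed file types, the MIME type should match the file; a small helper (the name and fallback are mine) can derive it from the extension using the standard library:

```python
import base64
import mimetypes


def to_data_url(path):
    """Read an image file and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        # Unknown extension: fall back to a generic binary type
        mime = "application/octet-stream"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{b64}"
```

Pass the result as the `url` value in the `image_url` content part and PNGs, JPEGs, and WebP files all get labeled correctly.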

## Supported prompts

GLM-OCR is not a general-purpose vision model. It responds to specific prompt formats.

For document parsing, use one of these exact strings as the text content:

- `Text Recognition:` extracts raw text from the image
- `Formula Recognition:` extracts mathematical formulas as LaTeX
- `Table Recognition:` extracts table structures

For structured information extraction, provide a JSON schema. The model fills in the values from the document.

```python
prompt = """Please output the information in the image in the following JSON format:
{"name": "", "date": "", "total": "", "items": []}"""

result = ocr_image("/tmp/receipt.jpg", prompt=prompt)
print(result)
```

The model returns a JSON object matching your schema with values extracted from the image. This is particularly useful for invoices, ID cards, and forms where you know the structure upfront.
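Like most instruction-tuned models, it may wrap the JSON in a markdown code fence or add a sentence around it. A defensive parser (a sketch, assuming only that the reply contains a single JSON object) keeps your pipeline from breaking on that:

```python
import json
import re


def parse_model_json(reply):
    """Extract and parse the first JSON object in a model reply."""
    # Grab everything between the first "{" and the last "}",
    # which skips code fences and surrounding prose
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))
```

Validate the parsed dict against your expected keys before trusting it; extraction models occasionally omit fields they cannot find.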

## Performance on Apple Silicon

On my M1 Pro MacBook, GLM-OCR processes a typical document image in about 40-50 seconds. Most of that time is spent encoding the image. Text generation is fast at around 60 tokens per second.

The model uses about 2.5GB of memory during inference. Any Mac with 8GB or more of unified memory will run it comfortably.

If speed is critical for production workloads, a cloud GPU will process images in 2-3 seconds instead of 40-50. But for development, testing, and low-volume use, running locally saves you from managing infrastructure entirely.

## Running other models

Everything in this guide applies to any model in the Ollama library. To try a different OCR or vision model, just swap the model name.

```bash
ollama pull llama3.2-vision
ollama run llama3.2-vision "Describe this image: ./photo.jpg"
```

The API calls are identical. Change the `model` field in your requests and everything else stays the same.

## Wrapping up

GLM-OCR runs locally on a MacBook with Ollama in about five minutes of setup. Install Ollama, pull the model, set `num_ctx` to 16384 so it does not crash on images, and you have a working OCR API on localhost. No cloud accounts, no Docker, no GPU drivers.

The model handles text, tables, formulas, and structured extraction well for English and Chinese documents. For other languages you will want a different model, since GLM-OCR supports only those two.

Let me know in the comments if you have questions, and subscribe for more practical development guides.

Thanks, Matija

## LLM Response Snippet
```json
{
  "goal": "GLM-OCR: Install Ollama on macOS, pull the 0.9B model, set num_ctx=16384 to prevent crashes, and run a local OpenAI-compatible OCR API — follow the quick…",
  "responses": [
    {
      "question": "What does the article \"Ultimate Guide: Run GLM-OCR Locally on MacBook Fast\" cover?",
      "answer": "GLM-OCR: Install Ollama on macOS, pull the 0.9B model, set num_ctx=16384 to prevent crashes, and run a local OpenAI-compatible OCR API — follow the quick…"
    }
  ]
}
```