
Needle: A 26M Parameter Model That Handles Tool Calling

Huma Shazia · 13 May 2026 at 5:38 am · 4 min read

Key Takeaways

Source: Hacker News: Best
  • Needle is a 26 million parameter model distilled from Gemini 3.1, designed for single-shot function calling
  • The model runs at 6,000 tokens per second for prefill and 1,200 tokens per second for decode on Cactus infrastructure
  • Weights and training data are fully open-source, with local finetuning supported on Mac and PC

What Needle Does

Cactus Compute has released Needle, a 26 million parameter model built specifically for function calling on resource-constrained devices. The company distilled Google's Gemini 3.1 into what they call a "Simple Attention Network" that can run on phones, watches, and smart glasses.

The model is designed for a narrow task: converting natural language queries into structured tool calls. Ask it "What's the weather in San Francisco?" with a weather tool definition, and it returns the correct JSON function call. That's it. No conversation, no reasoning, no general knowledge.

6,000 tokens/sec
Needle's prefill speed on Cactus infrastructure, with 1,200 tokens/sec decode speed

Architecture and Training

Needle uses an encoder-decoder architecture with 12 encoder layers and 8 decoder layers. The model has a dimension of 512, uses 8 attention heads with 4 key-value heads (grouped query attention), and a BPE vocabulary of 8,192 tokens.
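For concreteness, those reported hyperparameters can be collected into a small config object. This is an illustrative sketch only; the field names are assumptions, not Needle's actual API:

```python
from dataclasses import dataclass

# Illustrative config holding the hyperparameters reported above.
# Field names are hypothetical, not taken from the Needle codebase.
@dataclass
class NeedleConfig:
    encoder_layers: int = 12
    decoder_layers: int = 8
    d_model: int = 512
    n_heads: int = 8        # attention (query) heads
    n_kv_heads: int = 4     # grouped-query attention
    vocab_size: int = 8192  # BPE vocabulary

cfg = NeedleConfig()
head_dim = cfg.d_model // cfg.n_heads      # 64 dimensions per head
gqa_group = cfg.n_heads // cfg.n_kv_heads  # 2 query heads share each KV head
print(head_dim, gqa_group)                 # 64 2
```

One implication of the grouped-query setup: the KV cache is half the size it would be with full multi-head attention, which matters on memory-constrained devices.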

The encoder processes the text query. The decoder handles tool definitions and generates the function call output. Cross-attention connects the two. The design skips feed-forward networks entirely, relying only on attention and gated residual connections.
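An attention-only layer of this kind can be sketched in a few lines. The sketch below is a single-head simplification with a sigmoid gate on the residual path; the actual gating mechanism and head layout in Needle are not specified in the release, so treat this as a rough illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def gated_residual_block(x, Wq, Wk, Wv, Wg):
    # One attention-only layer: no feed-forward network.
    # A sigmoid gate (hypothetical form) controls how much of the
    # attention output is mixed back into the residual stream.
    attn_out = attention(x @ Wq, x @ Wk, x @ Wv)
    gate = 1.0 / (1.0 + np.exp(-(x @ Wg)))
    return x + gate * attn_out

rng = np.random.default_rng(0)
d = 512
x = rng.normal(size=(4, d)) * 0.02                       # 4 token positions
W = [rng.normal(size=(d, d)) * 0.02 for _ in range(4)]   # Wq, Wk, Wv, Wg
y = gated_residual_block(x, *W)
print(y.shape)  # (4, 512)
```

Dropping the feed-forward networks removes the bulk of a transformer layer's parameters, which is a plausible route to fitting 20 layers into a 26M-parameter budget.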

Pretraining took 27 hours on 16 TPU v6e chips, covering 200 billion tokens. Post-training on the function calling dataset used 2 billion tokens and finished in 45 minutes.
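A quick back-of-envelope check of the pretraining throughput implied by those figures:

```python
tokens = 200e9  # pretraining tokens
hours = 27      # wall-clock time
chips = 16      # TPU v6e chips

per_chip = tokens / (hours * 3600) / chips
print(f"{per_chip:,.0f} tokens/sec per chip")  # ~128,600 tokens/sec per chip
```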

How It Compares

Cactus claims Needle beats several larger models on single-shot function calling: FunctionGemma at 270M parameters, Qwen at 600M, Granite at 350M, and LFM2.5 at 350M. All of these are 10x to 23x larger than Needle.

The company is upfront about limitations. Those larger models "have more scope/capacity and excel in conversational settings." Needle does one thing. If you need multi-turn conversation, general Q&A, or anything beyond structured tool calls, look elsewhere.

Small models can also be inconsistent. The team recommends testing with your specific tools and finetuning as needed.

Getting Started

The quickstart is straightforward. Clone the repository, run setup, and launch the playground UI. The web interface at localhost:7860 lets you test custom tools and finetune with your own data.

```bash
git clone https://github.com/cactus-compute/needle.git
cd needle && source ./setup
needle playground
```

For Python integration, the API is minimal. Load the checkpoint, initialize the model and tokenizer, then call generate with your query and tool definitions.

```python
from needle import SimpleAttentionNetwork, load_checkpoint, generate, get_tokenizer

params, config = load_checkpoint("checkpoints/needle.pkl")
model = SimpleAttentionNetwork(config)
tokenizer = get_tokenizer()

result = generate(
    model, params, tokenizer,
    query="What's the weather in San Francisco?",
    tools='[{"name":"get_weather","parameters":{"location":"string"}}]',
    stream=False
)
print(result)
# [{"name":"get_weather","arguments":{"location":"San Francisco"}}]
```

Finetuning on Your Own Data

The playground UI handles the full finetuning workflow: data generation via Gemini, training, evaluation, and bundling the result. For command-line users, pass a JSONL file to the finetune command.
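The exact JSONL schema isn't documented in the announcement, but a plausible record shape, mirroring the query/tools pair from the Python API, might look like the following. Field names here are assumptions; check the repo's finetuning docs for the real format:

```python
import json

# Hypothetical training record; verify the exact schema against the repo.
record = {
    "query": "Turn off the living room lights",
    "tools": [{"name": "set_light",
               "parameters": {"room": "string", "state": "string"}}],
    "output": [{"name": "set_light",
                "arguments": {"room": "living room", "state": "off"}}],
}

with open("train.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")  # one JSON object per line
```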

Weights download automatically. The CLI supports single inference, full training runs, pretraining on synthetic data, and checkpoint evaluation.

Why This Matters for Edge AI

Most AI assistants route function calls through cloud APIs. Every request hits a server. That adds latency, requires connectivity, and raises privacy concerns for sensitive queries.

A 26M parameter model can run entirely on-device. Consumer phones have more than enough compute. Even wearables could handle inference at this scale.
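The memory math backs this up. A rough estimate of the weight footprint at common precisions, ignoring activations and runtime overhead:

```python
params = 26e6  # 26 million parameters

# Approximate storage for the weights alone at different precisions.
for fmt, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    mb = params * bytes_per_param / 1e6
    print(f"{fmt}: ~{mb:.0f} MB")
# fp32: ~104 MB, fp16: ~52 MB, int8: ~26 MB
```

Even at full fp32 precision the weights fit comfortably in the RAM of any modern phone, and quantized variants would fit on wearables.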

The trade-off is capability. Needle won't hold a conversation or answer general questions. It's a specialist. For personal AI assistants that need to control smart home devices, query calendars, or trigger app actions, that specialization might be enough.

Also Read
Windows 11's 'Low Latency Profile' Promises Faster Apps

More on optimizing software performance on consumer devices


Frequently Asked Questions

How big is the Needle model?

Needle has 26 million parameters, making it roughly 10x to 23x smaller than comparable function-calling models like FunctionGemma (270M) or Qwen (600M).

Can I run Needle on my local machine?

Yes. The model can be finetuned locally on Mac or PC. Weights download automatically when you run the setup.

What is Needle designed to do?

Needle handles single-shot function calling. It converts natural language queries into structured JSON tool calls. It does not support multi-turn conversation or general Q&A.

Is Needle open source?

Yes. The weights are available on Hugging Face under Cactus-Compute/needle, and the dataset generation code is also open.

How fast does Needle run?

On Cactus infrastructure, Needle achieves 6,000 tokens per second for prefill and 1,200 tokens per second for decode.


Huma Shazia

Senior AI & Tech Writer

Related Articles

Tesla's Remote Parking Feature: The Investigation That Didn't Quite Park Itself
Trending Tech·8 min


US auto safety regulators have closed their investigation into Tesla's remote parking feature, but what does that mean for the future of autonomous driving? We dive into the details of the investigation and what it reveals about the technology. The National Highway Traffic Safety Administration found that crashes were rare and minor, but the investigation's closure doesn't necessarily mean the feature is completely safe.