Asking tiny questions: local LLMs for PowerShell devs

Recently, I was trying to figure out if the content of two headlines overlapped. I needed a script that would answer whether Pacific Palisades is part of Los Angeles, but I couldn't use regex because there's no pattern that matches "Pacific Palisades is part of Los Angeles." There's also no simple string comparison that connects these two locations—it requires actual world knowledge.

I'd heard that you can run local AI models and wondered if I could just ask "Is Pacific Palisades part of Los Angeles?" and get back a clean, structured true/false response. To my surprise, it was actually straightforward using REST and one of my favorite AI features, structured output.

Here's how I did it:

# Ask a simple true/false question
$question = "Is Pacific Palisades part of Los Angeles?"

# Define schema for true/false response
$schema = @{
    type = "object"
    properties = @{
        answer = @{
            type = "boolean"
        }
    }
    required = @("answer")
}

# Create the payload
$body = @{
    model = "llama3.1"
    messages = @(
        @{
            role    = "user"
            content = $question
        }
    )
    stream = $false
    format = $schema
} | ConvertTo-Json -Depth 3

# Get the response
$response = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -Body $body

# See raw output
$response.message.content # { "answer" : true }

# Output the result as a PowerShell object
$response.message.content | ConvertFrom-Json

That's it! A true/false JSON response. This isn't just using PowerShell to have a convo with a chatbot - I'm now working with PowerShell objects that I can pipe, filter, and change like any other object.
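For example, once ConvertFrom-Json has parsed the reply, the answer is a real [bool] that drops straight into normal control flow (continuing from the script above):

# The parsed answer is a genuine boolean, not the string "true"
$result = $response.message.content | ConvertFrom-Json

if ($result.answer) {
    "Confirmed: Pacific Palisades is part of Los Angeles."
}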

Oh, and if you're wondering how I knew to create the JSON schema, I just asked ChatGPT to look it up and gave it my requirements.

What's a local LLM?

Local LLMs are AI models that run on your own computer or network. They're generally just smaller AI models than the ones that power ChatGPT and Claude. They need fewer resources, so they can run on consumer hardware without melting people's CPUs.

Local LLMs that most of us are capable of running aren't as smart as, say, Claude, but they're surprisingly good at answering specific, focused questions. And they're getting better and faster practically every day.

Beyond regular expressions

People wonder why I don't think AI is just hype, and it's because while regex and string manipulation handle basic pattern matching, LLMs can:

  • Understand context and semantic meaning
  • Infer missing information
  • Handle exceptions and variations naturally
  • Work without tedious rule maintenance

Imagine how useful this can be for cleaning up messy filenames, extracting structured data from logs, or answering factual queries that would otherwise need impossible logic or database lookups.

This is exactly what I want in my workflow as an automation engineer. Sure, right now, they might not be as accurate as we need, but I know that soon they will be.

Structured output is awesome

I know a lot of us already use PowerShell for chatting with OpenAI's API, but I think we'll soon be using these models primarily for their ability to return this type of structured data. By using schemas (similar to creating tables in SQL), we can transform messy, unstructured information into clean PowerShell objects.

When you ask for structured output, you're telling the model exactly what format you want the answer in. This might be:

  • A simple boolean (true/false)
  • An object with multiple properties
  • An array of structured items
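To make that concrete, here's a sketch of what the last two shapes might look like as PowerShell hashtables (the property names here are just illustrative, not from any particular API):

# An object with multiple properties
$objectSchema = @{
    type = "object"
    properties = @{
        artist = @{ type = "string" }
        year   = @{ type = "integer" }
    }
    required = @("artist", "year")
}

# An array of structured items, reusing the object schema above
$arraySchema = @{
    type = "object"
    properties = @{
        songs = @{
            type  = "array"
            items = $objectSchema
        }
    }
    required = @("songs")
}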

I first used structured output when OpenAI added support to their models, and was pumped when I found out that Ollama supported them as well. This meant that I could use structured output without a subscription or in offline scenarios.

To get started with Ollama, just download and install it, then run these two commands to pull Meta's llama3.1 model:

# Pull the model
ollama pull llama3.1

# Start chatting
ollama run llama3.1

This model isn't as small or fast as a model like tinyllama (seen later), but it's way smarter.
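Before scripting against the API, you can confirm the server is listening by hitting Ollama's model-listing endpoint (GET /api/tags, which returns your installed models):

# Ollama listens on port 11434 by default
(Invoke-RestMethod -Uri "http://localhost:11434/api/tags").models |
    Select-Object name, size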

A real-world example: MP3 filename cleaning

Let's see structured output in action with a practical example - cleaning up messy MP3 filenames. I chose this example because it's relatable, and you can extrapolate to renaming log files or any other type of file based on whatever metadata is baked into the name.

# Sample list of messy MP3 filenames
$messyMp3Files = @(
    "01_bohrhap_queen.mp3",
    "material_girl-madonna85.mp3",
    "hotel_cali_eagles1976.mp3",
    "IMAGINE-J-LENNON-track2.mp3",
    "hey_jude_(beetles)_1968_.mp3",
    "billiejean_MJ_thriller.mp3",
    "sweet_child_of_mine_gnr87.mp3",
    "shake_it_off-taylorswift.mp3",
    "purple-haze-jimmy_hendrix_1967.mp3",
    "bohemian(queen)rhaps.mp3",
    "smells_like_teen_spirit_nirvana91.mp3",
    "halo_beyonce_2008.mp3"
)

# Define the schema for the JSON response
$schema = @{
    type = "object"
    properties = @{
        artist = @{ type = "string" }
        song = @{ type = "string" }
    }
    required = @("artist", "song")
}

$prompt = @"
You are an AI that extracts artist and song names from messy MP3 filenames.

Examples:
1. "hotel_cali_eagles1976.mp3" → {"artist": "Eagles", "song": "Hotel California"}
2. "rolling_in_the_deep-adele_2011.mp3" → {"artist": "Adele", "song": "Rolling in the Deep"}
3. "californication-RHCP.mp3" → {"artist": "Red Hot Chili Peppers", "song": "Californication"}

Now, extract from this filename:
"@

# Process each MP3 file
foreach ($file in $messyMp3Files) {
    # Create message for LLM
    $msg = "$prompt $file"

    # Create the payload with the schema object
    $payload = @{
        model = "llama3.1"
        messages = @(@{role="user"; content=$msg})
        stream = $false
        format = $schema
    } | ConvertTo-Json -Depth 6 -Compress

    # Call the local LLM API
    $response = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -Body $payload
    $info = $response.message.content | ConvertFrom-Json

    # Output the result
    [pscustomobject]@{
        Filename    = $file
        NewFilename = "$($info.artist) - $($info.song).mp3"
    }
}

With any luck, your output should look like mine:

Filename                              NewFilename
--------                              -----------
01_bohrhap_queen.mp3                  Queen - Bohemian Rhapsody.mp3
material_girl-madonna85.mp3           Madonna - Material Girl.mp3
hotel_cali_eagles1976.mp3             Eagles - Hotel California.mp3
IMAGINE-J-LENNON-track2.mp3           John Lennon - Imagine.mp3
hey_jude_(beetles)_1968_.mp3          The Beatles - Hey Jude.mp3
billiejean_MJ_thriller.mp3            Michael Jackson - Billie Jean.mp3
sweet_child_of_mine_gnr87.mp3         Guns N' Roses - Sweet Child O' Mine.mp3
shake_it_off-taylorswift.mp3          Taylor Swift - Shake It Off.mp3
purple-haze-jimmy_hendrix_1967.mp3    Jimi Hendrix - Purple Haze.mp3
bohemian(queen)rhaps.mp3              Queen - Bohemian Rhapsody.mp3
smells_like_teen_spirit_nirvana91.mp3 Nirvana - Smells Like Teen Spirit.mp3
halo_beyonce_2008.mp3                 Beyoncé - Halo.mp3

So intelligent, it even fixes spelling mistakes like "beetles" → "Beatles"! And it outputs clean PowerShell objects we can pipe anywhere.
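And because they're objects, actually renaming files on disk is one pipe away. A sketch, assuming you captured the loop output in a variable (e.g., $results = foreach ($file in $messyMp3Files) { ... }) and the MP3s live in the current directory:

# -WhatIf previews the renames; drop it when you trust the output
$results | ForEach-Object {
    Rename-Item -Path $_.Filename -NewName $_.NewFilename -WhatIf
}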

Ask tiny questions, not big ones

You probably noticed that the code above uses a simple foreach loop, and that's exactly the point—I'm intentionally making 12 separate API calls, one for each filename. When I say "tiny questions," I mean it literally: each request asks about just one filename.

You might wonder, "Why not batch all filenames into a single API call?" This is where local models differ from cloud-based ones like Claude or GPT-4o and even GPT-4o-mini. With cloud models, you could send all filenames at once and expect a structured array back. But local models running on consumer hardware simply don't have the context window (memory) or processing capacity to handle that much input and maintain accuracy.

# This works with cloud models, but will overwhelm most local LLMs
$badIdea = @{
    model = "llama3.1"
    messages = @(@{
        role = "user"
        content = "Process all these filenames at once: $($messyMp3Files -join ', ')"
    })
    stream = $false
    format = @{ ... array schema ... }
}

Send too much data and ask too big a question, and your local model will time out or, at best, hallucinate answers. By keeping questions tiny—focused on one piece of data at a time—you'll get much more reliable results.
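Even tiny questions occasionally come back malformed, so it's worth a guard. A minimal retry sketch, reusing the $payload from the MP3 example (the three-attempt limit is arbitrary):

# Retry a few times if the model returns JSON that won't parse
$info = $null
foreach ($attempt in 1..3) {
    $response = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -Body $payload
    try {
        $info = $response.message.content | ConvertFrom-Json
        break
    }
    catch {
        Write-Warning "Attempt ${attempt}: unparseable response, retrying..."
    }
}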

Regarding the prompt I used: the first one I tried produced terrible results, full of hallucinations.

$msg = "Convert this MP3 filename into artist and song information: $file"

Then I asked ChatGPT for a better one and it gave me the one you see in the sample code:

$prompt = @"
You are an AI that extracts artist and song names from messy MP3 filenames.

Examples:
1. "hotel_cali_eagles1976.mp3" → {"artist": "Eagles", "song": "Hotel California"}
2. "rolling_in_the_deep-adele_2011.mp3" → {"artist": "Adele", "song": "Rolling in the Deep"}
3. "californication-RHCP.mp3" → {"artist": "Red Hot Chili Peppers", "song": "Californication"}

Now, extract from this filename:
"@

That immediately produced better results, most likely because of the examples. MODELS LOVE EXAMPLES, so when you're crafting your prompt, be sure to include examples of what you're looking for. Or just ask ChatGPT, which is what I did to perfect my results.

Performance considerations

When using llama3.1, processing the 12 filenames took about 2 minutes on my SQL Server-optimized Azure VM. That's the trade-off for running these models on standard hardware. They're getting faster, however, and if you can leave your script running overnight, you can process thousands of items in batch.
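If you want numbers for your own hardware, wrap the loop in Measure-Command (this reuses $messyMp3Files, $prompt, and $schema from the example above):

# Time the whole batch, then compute a per-file average
$elapsed = Measure-Command {
    foreach ($file in $messyMp3Files) {
        $payload = @{
            model    = "llama3.1"
            messages = @(@{ role = "user"; content = "$prompt $file" })
            stream   = $false
            format   = $schema
        } | ConvertTo-Json -Depth 6 -Compress
        $null = Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post -Body $payload
    }
}
"{0:N1} seconds per file" -f ($elapsed.TotalSeconds / $messyMp3Files.Count)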

Why this matters

For PowerShell developers, local LLMs open up interesting possibilities:

  • No internet connection needed
  • No API costs
  • Complete privacy for sensitive data
  • Perfect for repetitive tasks with clear rules

What makes local models useful is their focus on specific tasks with obvious patterns. They're not replacing ChatGPT for open-ended creative work, but for structured data extraction and transformation, they're pretty damn good.

As these models improve and hardware gets faster, local AI will keep catching up to the cloud giants. By learning to work with local LLMs now, you're building skills that'll matter more and more - at least until the planet burns up or whatever.

Comparing major local LLM and SLM options

When choosing a local LLM for PowerShell automation, I recommend just looking around ollama.com and trying out a ton of models. Here are some popular options that ChatGPT recommends:

  • LLaMA (Meta), 7B/13B/70B parameters: My favorite. The 70B variant achieves excellent factual accuracy but needs serious hardware. The 7B/13B versions are more accessible but may hallucinate more. Solid instruction-following capabilities.
  • Mistral 7B, 7B parameters: Highly efficient, outperforming many larger models. Notable for lower hallucination rates, making it more reliable for factual queries.
  • Gemma (Google), 2B/7B parameters: The 2B variant is designed for CPU/mobile use. The instruction-tuned 7B model competes well with other 7B models. Known for a friendly style and structured output using bullet points and Markdown.

"Parameters" in this table refer to the number of adjustable values (weights) in a machine learning model, determining its complexity and capacity to learn. Bigger models, like LLaMA 70B, generally produce more accurate results but demand serious hardware. Smaller ones, like Mistral 7B, are faster and more efficient but may have limits in reasoning. Think of parameters like neurons in a brain—more neurons, more intelligence (hopefully).

Small Language Models (SLMs)

SLMs, another local option, are just AI models with even fewer parameters than the local LLMs above. They need less computing power, so they can run on regular hardware without melting your CPU.

When choosing a local SLM for PowerShell automation, several models stand out:

  • Phi-4 (Microsoft), 14B parameters: The latest in Microsoft's Phi series, Phi-4 is a 14-billion-parameter SLM that excels at complex reasoning tasks, including mathematics and language processing.
  • Mixtral 8×7B (Mistral AI), 56B parameters: An efficient model that balances performance with resource requirements. It uses a mixture-of-experts (MoE) architecture, combining eight 7-billion-parameter experts (56B total) and activating only a subset per token to keep inference efficient.
  • Gemma 2 (Google), 2B/9B/27B parameters: An evolution of Google's Gemma series, Gemma 2 offers competitive performance in the SLM space. It formats responses with bullet points and Markdown, which works well for structured output.
  • TinyLlama, 1.1B parameters: A lightweight model designed for ultra-low resource environments. The TinyLlama project aims to pretrain a 1.1-billion-parameter Llama model on 3 trillion tokens.

I've found it's often hard to get good results from ultra-tiny SLMs like tinyllama (~638MB on disk). To see the difference in quality compared to the Llama 3.1 results we saw earlier, try running the same MP3 filename cleaning task with TinyLlama and watch how it bombs. Just pull tinyllama, then replace the model name in the foreach loop.

# Pull tinyllama
ollama pull tinyllama
...
foreach ($file in $messyMp3Files) {
    $msg = "$prompt $file"

    $payload = @{
        model = "tinyllama"
        messages = @(@{role="user"; content=$msg})
        stream = $false
        format = $schema
    } | ConvertTo-Json -Depth 6 -Compress
}
...

My own test output made me laugh:

Filename                              NewFilename
--------                              -----------
01_bohrhap_queen.mp3                  Bohrhap - Queen.mp3
material_girl-madonna85.mp3           Material Girl - Madonna.mp3
hotel_cali_eagles1976.mp3             Eaglez - Hotel California.mp3
IMAGINE-J-LENNON-track2.mp3           John Lennon - Imagine.mp3
hey_jude_(beetles)_1968_.mp3          Beetlejuice - Heysudan.mp3
billiejean_MJ_thriller.mp3            Billie Eilish - Thriller.mp3
sweet_child_of_mine_gnr87.mp3         Guns N' Roses - Sweet Child of Mine (Guns N' Roses).mp3
shake_it_off-taylorswift.mp3          Taylor Swift - Shake It Off.mp3
purple-haze-jimmy_hendrix_1967.mp3    Jimmie Henдриux - Purple Haze (Jimi Hendrix).mp3
bohemian(queen)rhaps.mp3              Queen - Bohemian Rhapsody.mp3
smells_like_teen_spirit_nirvana91.mp3 Nirvana - Smells Like Teen Spirit.mp3
halo_beyonce_2008.mp3                 Beyoncé - Halo.mp3

Not even close and no cigar! Microsoft's Phi-3 SLM (~2.2GB) got a LOT closer, and it's less than half the size of Llama 3.1, which is nice. It did get one wrong, though.

Filename                        NewFilename
--------                        -----------
sweet_child_of_mine_gnr87.mp3   Deep Purple - Sweet Child of Mine.mp3

The rest were good, so that's promising. It was faster by at least 20 seconds, too. I do find that the smaller the model, the harder it is to prompt well enough to get accurate results, so keep that in mind as you're trying to squeeze out performance.

Next steps

The example code here runs on pretty much any computer with a decent amount of RAM, and you can adapt it to whatever weird data problems you're facing.
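To make that adapting easier, you could wrap the whole pattern in a small helper. Here's a sketch of my own (Invoke-TinyQuestion is a made-up name, not a real cmdlet):

function Invoke-TinyQuestion {
    param(
        [Parameter(Mandatory)][string]$Question,
        [Parameter(Mandatory)][hashtable]$Schema,
        [string]$Model = "llama3.1",
        [string]$Uri = "http://localhost:11434/api/chat"
    )

    # Build the chat payload with structured output enforced by the schema
    $body = @{
        model    = $Model
        messages = @(@{ role = "user"; content = $Question })
        stream   = $false
        format   = $Schema
    } | ConvertTo-Json -Depth 6 -Compress

    $response = Invoke-RestMethod -Uri $Uri -Method Post -Body $body
    $response.message.content | ConvertFrom-Json
}

# Example: the question from the top of the post
$schema = @{
    type       = "object"
    properties = @{ answer = @{ type = "boolean" } }
    required   = @("answer")
}
Invoke-TinyQuestion -Question "Is Pacific Palisades part of Los Angeles?" -Schema $schema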

Yes, they're slower than cloud models, but they're free, private, and getting faster with each release. For batch processing jobs or working with sensitive data, that trade-off makes a lot of sense.

If you're already comfortable with PowerShell and APIs, you'll pick this up in minutes.

Also, if you liked the content in this blog post, most of it came from my research for my upcoming AI book, published by Manning. You can use the code gaipbl45 to get 45% off.