Asking Bigger Questions: Remote LLMs for Automation Engineers

If you read my post on asking tiny questions, you're already familiar with how structured output turns LLMs into pretty reliable data processors. We saw how requesting boolean responses or specific JSON formats made local LLMs actually useful for automation tasks.

In this article, we're taking that same structured output concept and applying it to a much bigger challenge: parsing larger amounts of data. Instead of tiny true/false questions or simple metadata extraction from filenames, we're now letting AI handle entire command outputs, transforming unstructured text dumps into clean PowerShell objects.

Let's look at a concrete example. When we run netstat, we get output like this:

$netstat = netstat -n
$netstat

This outputs:

Active Connections

  Proto  Local Address            Foreign Address        State
  TCP    192.168.0.4:3389         24.0.175.222:64549     ESTABLISHED
  TCP    192.168.0.4:49913        178.63.129.16:80       ESTABLISHED
  TCP    192.168.0.4:50127        14.207.247.137:443     ESTABLISHED
  TCP    192.168.0.4:50984        69.33.185.197:443      ESTABLISHED
  TCP    192.168.0.4:51415        53.112.122.39:443      ESTABLISHED
  TCP    192.168.0.4:51635        53.112.122.47:443      ESTABLISHED
  TCP    192.168.0.4:52351        23.19.176.90:443       CLOSE_WAIT

Traditionally, if we wanted to filter these connections or work with them as objects, we'd need to write complex regular expressions to parse each line. But what if we could just tell AI to do this for us?

By using structured outputs with a more capable LLM, this raw text gets transformed into structured PowerShell objects:

$netstat | ConvertTo-Object | Select-Object -ExpandProperty connections

And the output:

protocol local_ip    local_port foreign_ip     foreign_port state
-------- --------    ---------- ----------     ------------ -----
TCP      192.168.0.4       3389 24.0.175.222          64549 ESTABLISHED
TCP      192.168.0.4      49913 178.63.129.16            80 ESTABLISHED
TCP      192.168.0.4      50127 14.207.247.137          443 ESTABLISHED
TCP      192.168.0.4      50984 69.33.185.197           443 ESTABLISHED
TCP      192.168.0.4      51415 53.112.122.39           443 ESTABLISHED
TCP      192.168.0.4      51635 53.112.122.47           443 ESTABLISHED
TCP      192.168.0.4      52351 23.19.176.90            443 CLOSE_WAIT

Now we can work with these connections just like any other PowerShell object. Want to find all connections to a specific IP? Easy:

$netstat | ConvertTo-Object |
    Select-Object -ExpandProperty connections |
    Where-Object foreign_ip -eq "53.112.122.39"

Or maybe you want to count connections by state:

$netstat | ConvertTo-Object |
    Select-Object -ExpandProperty connections |
    Group-Object state |
    Select-Object Name, Count

No regex, no string manipulation, no custom parsing logic. Just clean PowerShell objects you can immediately work with.

And I know that PowerShell already has Get-NetTCPConnection for this specific example, but I'm using netstat because it's universally familiar and helps make the concept tangible.

The parsing problem

How many times have you run a command like netstat, ipconfig, or some legacy tool and then spent hours writing regex patterns to extract the data you need? It's a tedious, error-prone process that breaks whenever the output format changes slightly.

Traditional approaches require:

  • Complex regex patterns
  • String splitting and manipulation
  • Hardcoded indices and positions
  • Custom parsers for each command
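
For contrast, here's roughly what the regex route looks like for the netstat output above. This is a sketch of my own - the pattern and capture group names aren't from any module, and IPv6 addresses, UDP rows without a State column, or localized headers will all quietly break it:

# Hand-rolled regex parsing of netstat output (illustrative sketch)
$pattern = '^\s*(?<proto>TCP|UDP)\s+(?<lip>[\d.]+):(?<lport>\d+)\s+(?<fip>[\d.]+):(?<fport>\d+)\s+(?<state>\S+)'
netstat -n | ForEach-Object {
    if ($_ -match $pattern) {
        [PSCustomObject]@{
            protocol     = $Matches.proto
            local_ip     = $Matches.lip
            local_port   = [int]$Matches.lport
            foreign_ip   = $Matches.fip
            foreign_port = [int]$Matches.fport
            state        = $Matches.state
        }
    }
}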

Wouldn't it be nice if we could just skip all that and go straight from command output to PowerShell objects?

The ConvertTo-Object function

I've created a PowerShell function that uses AI to turn any command output into PowerShell objects. Almost like PowerShell Crescendo, but without having to define any patterns manually:

function ConvertTo-Object {
    [CmdletBinding()]
    param (
        [Parameter(ValueFromPipeline)]
        [AllowEmptyString()]
        [string[]]$InputObject = @(),
        [string]$ApiKey = $env:OPENAI_API_KEY,
        [string]$Model = "gpt-4o-mini",
        [string]$ApiBase = $env:OPENAI_API_BASE,
        [string]$Schema # Optional pre-defined schema
    )
    begin {
        $msg = @()
    }
    process {
        if ("$InputObject") {
            Write-Verbose "$InputObject"
            $msg += "$InputObject"
        }
    }
    end {
        try {
            # If no schema was provided, we'll need to generate one
            if (-not $Schema) {
                Write-Verbose "No schema provided, generating one using Generate-OutputSchema..."
                $Schema = Generate-OutputSchema -InputObject $msg -ApiKey $ApiKey -Model $Model -ApiBase $ApiBase
            } else {
                Write-Verbose "Using provided schema"
            }

            # Use the schema to parse the command output
            Write-Verbose "Using schema to parse command output..."

            $parseParams = @{
                Model      = $Model
                Message    = "Parse this output into structured data based on the attached schema: $msg"
                Format     = "json_schema"
                JsonSchema = $Schema
                ApiKey     = $ApiKey
            }

            if ($ApiBase) {
                $parseParams.ApiBase = $ApiBase
            }

            # Convert the JSON string to PowerShell objects
            (Request-ChatCompletion @parseParams).Answer | ConvertFrom-Json
        }
        catch {
            Write-Error "Failed to convert command output to objects: $_"
        }
    }
}

What makes this approach powerful is the two-step process:

  1. First, we ask the AI to generate a JSON schema based on the command output structure
  2. Then, we use that schema to constrain the AI when parsing the actual data

This means the function can adapt to virtually any command output format without pre-configuration.
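
In practice you can also run the two steps yourself: generate the schema from a sample, eyeball it, then pass it back in. A quick sketch using the two functions in this post:

# Step 1: generate (and inspect) a schema from a sample of the output
$schema = netstat -n | Generate-OutputSchema
$schema

# Step 2: reuse that schema for the real parse - only one API call this time
netstat -n | ConvertTo-Object -Schema $schema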

The Generate-OutputSchema function

Behind the scenes, the heavy lifting happens in this function that analyzes command output and creates a proper schema:

function Generate-OutputSchema {
    [CmdletBinding()]
    param (
        [Parameter(ValueFromPipeline)]
        [AllowEmptyString()]
        [string[]]$InputObject = @(),
        [string]$ApiKey = $env:OPENAI_API_KEY,
        [string]$Model = "gpt-4o-mini",
        [string]$ApiBase = $env:OPENAI_API_BASE
    )
    begin {
        # Schema for generating a JSON schema (meta, I know)
        $schemaGeneratorSchema = @{
            name   = "schema_generator"
            strict = $true
            schema = @{
                type       = "object"
                properties = @{
                    jsonoutput = @{
                        type        = "string"
                        description = "The generated JSON schema as a string"
                    }
                }
                required = @("jsonoutput")
                additionalProperties = $false
            }
        }

        $msg = @()
    }
    process {
        if ("$InputObject") {
            Write-Verbose "$InputObject"
            $msg += "$InputObject"
        }
    }
    end {
        try {
            # Ask the AI to generate a schema for this command output
            Write-Verbose "Generating schema for the provided command output..."

            # Create the schema prompt. AI *loves* examples, so give it a thorough one.
            # It doesn't have to be an example of whatever you're pasting in, I was just being lazy.
            # Note the double-quoted here-string: a single-quoted string would not expand $msg.
            $schemaPrompt = @"
You are an AI designed to generate structured JSON schemas for unstructured text. Analyze the following command output and generate a well-formed JSON Schema that represents its structure:

$msg

The schema should:
1. Accurately model the structure into an appropriate JSON format
2. Use appropriate data types (string, number, boolean, array, object)
3. Include meaningful property names that describe each field
4. Include descriptions for each property
5. Use enum values where applicable
6. Specify required fields based on essential elements
7. Set additionalProperties to false

REQUIRED FIELDS: name, additionalProperties, required

For example:

{
  "name": "file_list",
  "strict": false,
  "schema": {
    "type": "object",
    "properties": {
      "filenames": {
        "type": "array",
        "description": "A list of filenames related to the generated JSON schema",
        "items": {
          "type": "string"
        }
      }
    },
    "required": ["filenames"],
    "additionalProperties": false
  }
}

Such as:

Active Connections

  Proto  Local Address         Foreign Address        State
  TCP    192.168.0.4:3389      24.0.175.222:61389     ESTABLISHED
  TCP    192.168.0.4:49913     168.63.129.16:80       ESTABLISHED
  TCP    192.168.0.4:50127     4.207.247.137:443      ESTABLISHED
  TCP    192.168.0.4:50573     40.99.201.162:443      ESTABLISHED
  TCP    192.168.0.4:55447     76.223.92.165:443      ESTABLISHED

Should return:

{
  "name": "netstat_active_connections",
  "strict": false,
  "schema": {
    "type": "object",
    "properties": {
      "connections": {
        "type": "array",
        "description": "List of active network connections",
        "items": {
          "type": "object",
          "properties": {
            "protocol": {
              "type": "string",
              "description": "The protocol used (e.g., TCP, UDP)",
              "enum": ["TCP", "UDP"]
            },
            "local_ip": {
              "type": "string",
              "description": "The local IP address"
            },
            "local_port": {
              "type": "integer",
              "description": "The local port number"
            },
            "foreign_ip": {
              "type": "string",
              "description": "The foreign (remote) IP address"
            },
            "foreign_port": {
              "type": "integer",
              "description": "The foreign (remote) port number"
            },
            "state": {
              "type": "string",
              "description": "The state of the TCP connection",
              "enum": [
                "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
                "TIME_WAIT", "CLOSED", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING"
              ]
            }
          },
          "required": [
            "protocol",
            "local_ip",
            "local_port",
            "foreign_ip",
            "foreign_port",
            "state"
          ],
          "additionalProperties": false
        }
      }
    },
    "required": ["connections"],
    "additionalProperties": false
  }
}
"@

            # API call to get the schema
            $schemaParams = @{
                Model = $Model
                Message = $schemaPrompt
                Format = "json_schema"
                Temperature = 0.2  # Lower temperature for more deterministic results
                JsonSchema = ($schemaGeneratorSchema | ConvertTo-Json -Depth 99 -Compress)
                ApiKey = $ApiKey
            }

            if ($ApiBase) {
                $schemaParams.ApiBase = $ApiBase
            }

            $schemaResult = Request-ChatCompletion @schemaParams
            $generatedSchema = ($schemaResult.Answer | ConvertFrom-Json).jsonoutput

            Write-Verbose "Schema generated successfully!

$generatedSchema"

            return $generatedSchema
        }
        catch {
            Write-Error "Failed to generate schema: $_"
        }
    }
}

And as usual, I use two examples in the prompt. Examples are important to AI - they make nearly every response better by giving the model a clear pattern to follow. I've found that including even one example dramatically improves the consistency and accuracy of the results.

Having a separate schema generation function gives you several advantages:

  1. Reuse schemas for familiar commands: Generate a schema once, save it, and reuse it to avoid redundant API calls
  2. Share schemas with your team: Create a repository of schemas for commands commonly used in your environment
  3. Version control your schemas: Track changes as command outputs evolve over time
  4. Fine-tune schemas manually: Generate a base schema and then customize it for special cases

For complex command outputs that you parse frequently, this modular approach can significantly reduce API costs and processing time while maintaining the flexibility to handle new output formats whenever they appear.

Were I to use this in production, I'd probably use Generate-OutputSchema to build my schema once, save it, and attach it every time after. If you just want to play around and understand AI, though, running it each time is fine and costs a fraction of a penny.
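
That workflow looks something like this - the schema file path is just an example:

# Generate the schema once and persist it
netstat -n | Generate-OutputSchema | Set-Content -Path ./netstat-schema.json

# Every run after that skips schema generation - one API call instead of two
$schema = Get-Content -Path ./netstat-schema.json -Raw
netstat -n | ConvertTo-Object -Schema $schema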

Seeing it in action

Let's look at a practical example using netstat:

# Run netstat and pipe the output to our function
netstat -an | ConvertTo-Object |
    Select-Object -ExpandProperty connections |
    Where-Object state -eq ESTABLISHED

The result? Clean PowerShell objects with properties like protocol, local_ip, local_port, foreign_ip, foreign_port, and state - all without writing a single regex pattern.

But the real power comes when you use it with commands that have complex, multi-line outputs that would normally be a nightmare to parse:

# Parse ipconfig output
ipconfig /all | ConvertTo-Object

Why local LLMs need tiny questions (but cloud models don't)

The key difference between local LLMs and cloud models is context window size - essentially how much text they can process at once:

  • Local models (llama3.1, tinyllama): typically run with 4K-8K token windows on consumer hardware. Great for single queries but overwhelmed by complex inputs.

  • Cloud models (gpt-4o-mini): 128K token windows (16-30x larger). Can handle entire log files or hundreds of filenames at once.

This isn't just about model size but architectural design. With local LLMs, you skip the API costs but accept smaller context windows and less intelligence. That's why local models need "tiny questions" while cloud models can handle "big questions" like parsing complex command outputs.

ConvertTo-Object uses cloud models because we need that larger context window for unpredictable command outputs. You pay for the API, but gain capabilities no local model can currently match.
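
If you're not sure whether a given dump fits a small local window, a rough rule of thumb is that one token is about four characters of English text. That's plenty accurate for a sanity check - the 4K threshold below is just an example:

# Back-of-envelope token estimate before sending output to a small local model
$raw = (netstat -an) -join "`n"
$estimatedTokens = [math]::Ceiling($raw.Length / 4)

if ($estimatedTokens -gt 4000) {
    Write-Warning "Roughly $estimatedTokens tokens - probably too big for a 4K local context window"
}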

Performance and cost considerations

This approach does have trade-offs:

  1. Performance: Each conversion requires 1-2 API calls, which adds latency compared to regex
  2. Cost: Using OpenAI's API incurs charges (currently around $0.15/1M tokens for gpt-4o-mini input)
  3. Internet connectivity: Unlike regex, this requires an internet connection
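
To put the cost in perspective: a hefty netstat dump might run around 2,000 input tokens, so at $0.15 per million tokens a single parse works out to roughly 2,000 × $0.15 / 1,000,000, or about $0.0003 - three hundredths of a cent. That's ballpark math that ignores the schema-generation call and output tokens, but it's the right order of magnitude.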

But for many automation tasks, these trade-offs are worth it, especially when:

  • You're working with complex command outputs
  • The format changes regularly, breaking your regex
  • You need to quickly parse one-off command results
  • You're prototyping scripts and want to skip the regex phase

Using with PSOpenAI

The function above uses the excellent PSOpenAI module, which wraps the OpenAI API in PowerShell-friendly cmdlets. Make sure you have it installed:

Install-Module -Name PSOpenAI

And set your API key:

$env:OPENAI_API_KEY = "your-api-key-here"

This API key can be from GitHub Models (tutorial below), Azure OpenAI Service, or the OpenAI-compatible REST APIs for Amazon Bedrock or Google Gemini. Just make sure whatever model you choose supports OpenAI-style json_schema formats.

Using with GitHub Models for testing

One huge advantage of GitHub Models is that it lets you experiment with AI APIs for free. Every GitHub user already has access to these models, making it perfect for testing before deploying to production with paid APIs.

Here's how to set it up:

  1. Install PSOpenAI and get a GitHub PAT:

Install-Module -Name PSOpenAI -Scope CurrentUser
# Then generate a GitHub PAT at: https://github.com/settings/tokens
$env:GITHUB_PAT = 'your_github_pat_here'

  2. Update our function to use GitHub Models:

# Basic usage with GitHub Models
netstat -an | ConvertTo-Object -ApiBase https://models.inference.ai.azure.com -ApiKey $env:GITHUB_PAT -Model gpt-4o-mini

GitHub Models have rate limits that vary by subscription plan. On the free and pro plans, you get:

  • 15 requests per minute / 150 requests per day for low-tier models
  • 10 requests per minute / 50 requests per day for high-tier models
  • Specific limits for models like DeepSeek-R1 (1 request per minute / 8 per day)

While these limits make GitHub Models unsuitable for production use, they're perfect for prototyping and testing your parsing logic before committing to a paid API.
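
If you're iterating against those limits, a crude pause between calls keeps you under the per-minute cap. A sketch - the ./captures folder is hypothetical, and the 10-second sleep accounts for ConvertTo-Object making up to two API calls (schema generation plus parsing) per run:

# Parse a folder of saved command outputs without tripping the rate limit
foreach ($file in Get-ChildItem ./captures/*.txt) {
    Get-Content $file |
        ConvertTo-Object -ApiBase https://models.inference.ai.azure.com -ApiKey $env:GITHUB_PAT -Model gpt-4o-mini
    Start-Sleep -Seconds 10  # ~6 runs/minute = ~12 requests/minute, under the 15/minute cap
}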

For more details on rate limits and capabilities, check out the GitHub Models documentation.

GPT-4o-mini vs GPT-4o: Performance Comparison

When implementing this solution, I tested whether the cheaper gpt-4o-mini model would be sufficient compared to the more expensive gpt-4o:

# GPT-4o processing time: 90s
$netstat | ConvertTo-Object -Model gpt-4o | Select-Object -ExpandProperty connections | Measure-Object
# Count : 57

# GPT-4o-mini processing time: 41s
$netstat | ConvertTo-Object -Model gpt-4o-mini | Select-Object -ExpandProperty connections | Measure-Object
# Count : 57

The results are clear: gpt-4o-mini processed the same netstat output in less than half the time (41s vs 90s), with identical results - both correctly identified all 57 network connections.

This makes gpt-4o-mini the obvious choice for this use case:

  • More than twice as fast
  • Approximately 10x cheaper per token
  • Identical parsing results

For structured data parsing tasks like this, the mini model delivers the same quality at a fraction of the cost and time.

And this is just the beginning - in the future, we can expect this to be wildly faster. What takes 40+ seconds today will probably take a few seconds in a year or two. As these models get more optimized, this technique becomes even MORE valuable because the speed/cost tradeoff just keeps getting better.

Moving from GitHub Models to Azure OpenAI Service

After finding success with GitHub Models, you might want to switch to Azure OpenAI Service for production use. The transition is straightforward:

# Set up Azure OpenAI
$params = @{
    ApiType = 'Azure'
    Model = 'your-deployment-name'
    Message = 'Hello Azure OpenAI'
    ApiKey = $env:AZURE_OPENAI_API_KEY
    ApiBase = 'https://your-resource-name.openai.azure.com/'
    AuthType = 'azure'
}

Request-ChatCompletion @params

Azure OpenAI even supports Entra ID authentication, allowing you to use MFA and avoid storing API keys in your scripts. For more details on setting up Azure OpenAI with PSOpenAI, check out the complete guide.
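
Here's a sketch of what that can look like with the Az.Accounts module - I'm assuming you've already run Connect-AzAccount, and that your PSOpenAI version supports the 'azure_ad' AuthType (check the module docs if yours differs):

# Acquire an Entra ID token instead of storing a static API key
# (newer Az.Accounts versions may return the token as a SecureString)
$token = (Get-AzAccessToken -ResourceUrl 'https://cognitiveservices.azure.com').Token

$params = @{
    ApiType  = 'Azure'
    Model    = 'your-deployment-name'
    Message  = 'Hello Azure OpenAI'
    ApiKey   = $token         # the bearer token stands in for the API key
    ApiBase  = 'https://your-resource-name.openai.azure.com/'
    AuthType = 'azure_ad'     # tells PSOpenAI to authenticate with Entra ID
}

Request-ChatCompletion @params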

Processing in Bulk: Remote vs Local LLMs

Remember in my post on asking tiny questions how we had to loop through each filename individually? That's one of the key differences between local and cloud models.

With cloud models like gpt-4o-mini, we can take the exact same MP3 filename cleanup task and process all files at once:

# Sample list of messy MP3 filenames - same as before
$messyMp3Files = @(
    "01_bohrhap_queen.mp3",
    "material_girl-madonna85.mp3",
    "hotel_cali_eagles1976.mp3",
    "IMAGINE-J-LENNON-track2.mp3",
    "hey_jude_(beetles)_1968_.mp3",
    "billiejean_MJ_thriller.mp3",
    "sweet_child_of_mine_gnr87.mp3",
    "shake_it_off-taylorswift.mp3",
    "purple-haze-jimmy_hendrix_1967.mp3",
    "bohemian(queen)rhaps.mp3",
    "smells_like_teen_spirit_nirvana91.mp3",
    "halo_beyonce_2008.mp3"
)

# Create a prompt for processing MP3 filenames
$prompt = "Extract the artist and song title from these MP3 filenames: $messyMp3Files"

# Define a custom schema for our MP3 collection
$mp3Schema = @{
    name = "mp3_collection"
    strict = $true
    schema = @{
        type = "object"
        properties = @{
            files = @{
                type = "array"
                description = "Collection of processed MP3 files"
                items = @{
                    type = "object"
                    properties = @{
                        original_filename = @{
                            type = "string"
                            description = "The original messy filename"
                        }
                        artist = @{
                            type = "string"
                            description = "The artist name extracted from the filename"
                        }
                        song = @{
                            type = "string"
                            description = "The song title extracted from the filename"
                        }
                    }
                    required = @("original_filename", "artist", "song")
                    additionalProperties = $false
                }
            }
        }
        required = @("files")
        additionalProperties = $false
    }
} | ConvertTo-Json -Depth 10

# Process all filenames at once using ConvertTo-Object
$results = $prompt | ConvertTo-Object -Schema $mp3Schema -Model gpt-4o-mini

# Display the results
$results.files | ForEach-Object {
    [PSCustomObject]@{
        Filename = $_.original_filename
        NewFilename = "$($_.artist) - $($_.song).mp3"
    }
}

The difference is striking. With TinyLlama in the previous post, we got hallucinations like "Beetlejuice - Heysudan.mp3" and we had to process each file individually. With gpt-4o-mini, we can send all 12 filenames at once and get back a structured array containing each file's information. No loops, no repetitive API calls—just one request and one response.

Also note the prompt: unlike the local model, which required a few examples to be accurate, the cloud model figured out what we needed with a simple instruction. This seems to contradict my earlier statement that "MODELS LOVE EXAMPLES," but there's an important distinction here. While examples almost always improve results (even with cloud models), more powerful models like gpt-4o-mini can often infer what you want from context alone, especially for common tasks like filename parsing. For critical or complex tasks, I'd still include examples even with cloud models - but it's nice to have the flexibility when you're feeling lazy.

This approach is:

  • Significantly faster (one API call instead of twelve)
  • More cost-effective (fewer tokens used in total)
  • Cleaner code (no foreach loops)
  • More consistent (the model maintains context across all filenames)

When testing this with our messy MP3 files, gpt-4o-mini processed all 12 filenames in about 5 seconds, compared to 2+ minutes for processing them one at a time with a local LLM.

This is why cloud models excel at batch processing and complex tasks, despite their API costs. That massive context window lets them "see" all the data at once and maintain consistency across the entire dataset. Though I'd still batch cloud-based processes. I'd trust gpt-4o-mini with perhaps 100 filenames but not 1000.
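
A simple chunking loop makes that batching explicit. Here $allFilenames and the folder are placeholders, and $mp3Schema is the schema we built above:

# Process a large list in chunks of 100 filenames per request
$allFilenames = Get-ChildItem ./music/*.mp3 -Name   # hypothetical folder
$batchSize = 100

for ($i = 0; $i -lt $allFilenames.Count; $i += $batchSize) {
    $batch = $allFilenames[$i..([math]::Min($i + $batchSize, $allFilenames.Count) - 1)]
    "Extract the artist and song title from these MP3 filenames: $batch" |
        ConvertTo-Object -Schema $mp3Schema -Model gpt-4o-mini
}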

In the end, it's about choosing the right tool for the job:

  • Local LLMs for small, privacy-sensitive tasks where latency isn't critical
  • Cloud models for complex, high-volume processing where speed and consistency matter

And as cloud models get smarter, faster and cheaper (remember, gpt-4o-mini is already 10x cheaper than gpt-4o), using AI will become increasingly practical for all kinds of automation tasks.

Real-world application: Intelligent document processing

The principles I've outlined for handling command outputs can be extended to much larger text processing tasks. In my post on Building an AI-Powered Document Processor with Azure Functions, I show how this same approach unlocks practical applications for document intelligence:

  • Extract text from PDFs and Office documents
  • Use the same structured output technique to categorize documents
  • Extract important metadata like parties, dates, and document types
  • Integrate with SharePoint for automatic document classification

The same techniques we're using to parse netstat and ipconfig outputs can be scaled up to handle legal contracts, technical documentation, and virtually any text-based content.

This is where ultra intelligent cloud-based models shine - their expanded context windows and enhanced intelligence allow them to process entire documents just as easily as they process command outputs.

Conclusion

AI isn't going to replace your automation skills or eliminate the need for PowerShell. But it can make certain tasks much easier (and it's a lot of fun).

The ConvertTo-Object function shows a practical application of AI for everyday automation tasks. It's not replacing your expertise; it's letting you focus on higher-value work by eliminating tedious parsing tasks. Ultimately, 100% of the software development skillset I've gained from writing advanced PowerShell modules has helped me apply AI at work.

Next time you find yourself writing complex regex to extract data from command output, consider whether this AI-powered approach might save you time and headaches.

This post is part of my AI Integration for Automation Engineers series. If you found it helpful, check out these other posts:

  1. AI Automation with AI Toolkit for VS Code and GitHub Models: A visual guide
  2. Getting Started with AI for PowerShell Developers: PSOpenAI and GitHub Models
  3. Asking Tiny Questions: Local LLMs for PowerShell Devs
  4. Local Models Mildly Demystified
  5. Asking Bigger Questions: Remote LLMs for Automation Engineers
  6. PDF Text to SQL Data: Using OpenAI's Structured Output with PSOpenAI
  7. Automating Document Classification in SharePoint with Power Platform and AI
  8. Document Intelligence with Azure Functions: A Practical Implementation

P.S. If you want to learn more about practical AI applications for IT pros, check out my upcoming book from Manning, where I dive deeper into these techniques. Use code gaipbl45 for 45% off.