What AI Agents Actually Do When They Use Your CLI (And What to Build For Them)
We built and tested an agent-help feature for our CLI. AI agents ignored it. Here's what actually helps agents use CLI tools effectively.
Claude Sonnet 4.5 (AI), edited by Harry Mower
AI coding agents are using CLI tools more than ever. GitHub Copilot, Claude Code, and Kiro now invoke command-line interfaces directly to complete developer tasks. But agents interact with CLIs differently than humans do — and most CLI tools aren't designed for that.
We built and tested an "agent-help" feature for the MemNexus CLI to see what actually helps AI agents succeed. The results surprised us.
The Problem: Agents Use CLIs Differently
MemNexus is an AI memory system. Our mx CLI has 14+ command groups: memories, conversations, facts, topics, graphrag, and more. When AI agents use it, they face two challenges:
- Command disambiguation — understanding the difference between similar commands (`recap` vs `digest`, `search` vs `list`)
- Non-interactive automation — avoiding interactive prompts that block agent execution
We assumed agents needed comprehensive documentation optimized for LLM consumption. So we built mx agent-help: a dedicated subcommand outputting curated workflows, disambiguation guides, environment variables, and automation tips.
Then we tested whether agents actually used it.
What We Built
mx agent-help outputs LLM-optimized documentation separate from standard --help. It includes:
- Commonly Confused Commands — disambiguation guide for similar commands
- Common Workflows — task-oriented recipes (not just command reference)
- Environment Variables — `MX_API_KEY`, `MX_BASE_URL`, etc.
- Tips for Automation — how to avoid interactive prompts
We also added a hint to mx --help output pointing agents to this feature:
Copilot, Claude, ChatGPT: run `mx agent-help` for workflows,
environment variables, and disambiguation guide.
The key architectural decision: make it a real subcommand, not a flag. Agents scan command lists in --help output. A flag like --agent-help is invisible; a subcommand appears in the command list.
How We Tested
We ran two rounds of testing with three AI agents:
- GitHub Copilot CLI (v0.0.406) — via `gh copilot -p` non-interactive mode
- Claude Code — via VS Code extension
- Kiro CLI — via `kiro -p` non-interactive mode
We tested 6+ task types, including:
- Command discovery ("list my recent memories")
- Disambiguation ("what's the difference between recap and digest?")
- Automation ("create a memory with this content")
- Exploratory ("what can the mx CLI do?")
All testing used non-interactive mode (-p or --no-interactive flags) to simulate real agent behavior.
Round 1: What We Learned
1. Agents prefer --help over agent-help
Both Copilot and Claude Code went straight to subcommand help for straightforward tasks:
# Copilot's actual command sequence for "list my recent memories"
mx memories --help
mx memories list --help
mx memories list --limit 10
They drilled down through the help hierarchy. They did NOT invoke mx agent-help for discovery.
2. Agent-help is for disambiguation, not discovery
Copilot only used mx agent-help when it needed to understand the difference between similar commands:
# Copilot's actual sequence for "what's the difference between recap and digest?"
mx agent-help | grep -E "recap|digest"
For clear tasks ("list memories"), agents used --help. For ambiguous tasks ("recap vs digest"), they used agent-help.
3. The "Commonly Confused Commands" section was most valuable
When agents DID use agent-help, they specifically grepped the disambiguation section. The workflow examples and environment variable docs? Ignored.
This told us where to focus our efforts.
4. Naming specific AI tools in the hint matters
Initial hint text:
AI agents: run `mx agent-help` for workflows and disambiguation guide.
Updated hint text:
Copilot, Claude, ChatGPT: run `mx agent-help` for workflows,
environment variables, and disambiguation guide.
After the change, Copilot was more likely to notice and act on the hint. Generic "AI agents" was too abstract; specific tool names triggered recognition.
5. One agent read source code instead of running the CLI
For exploratory tasks ("what can the mx CLI do?"), Copilot sometimes explored cli/src/commands/ TypeScript files rather than running mx --help.
This suggests agents use whatever information source is most convenient. If they're already in a codebase, they'll read source. If they're in a shell, they'll run commands.
What We Changed
Based on Round 1 testing, we made two key changes:
1. Added cross-references to --help descriptions
Instead of forcing agents to discover agent-help, we put disambiguation WHERE agents already look:
// Before
.description('Get a recap of recent work grouped by conversation')
// After
.description('Recap of recent work grouped by conversation (see also: digest)')
Now when agents run mx memories recap --help, they immediately see there's a related digest command and can investigate further.
Example cross-references we added:
// recap command
.description('Recap of recent work grouped by conversation (see also: digest)')
// digest command
.description('AI-powered digest of memories matching a query (see also: recap)')
// memories search command
.description('Search memories (keyword, semantic, hybrid; see also: graphrag query)')
// graphrag query command
.description('Execute GraphRAG query (see also: memories search)')
// topics search command
.description('Search topics by query string (see also: discover-related)')
// topics discover-related command
.description('Discover related topics via graph traversal (see also: search)')
2. Trimmed the agent-help output
Removed the auto-introspected Command Reference section (agents get that from --help anyway). Cut from ~200 lines to 129 lines — consumable in full rather than requiring grep.
We kept:
- Commonly Confused Commands (the most valuable part)
- Common Workflows (task-oriented recipes)
- Environment Variables
- Tips for Automation
Round 2: Results
After making these changes, we re-tested the same tasks.
Cross-references eliminated the need for agent-help in disambiguation
Both Copilot and Kiro found the cross-references in mx memories --help and understood the difference between commands without needing agent-help at all:
# Kiro's actual sequence for "difference between recap and digest?"
mx memories recap --help
mx memories digest --help
# Found cross-references, understood the difference, explained to user
The disambiguation information now lives where agents naturally look.
Agent-help still serves a purpose
While agents didn't need it for disambiguation anymore, agent-help remained valuable for:
- Environment variable discovery — agents don't know to `grep` for `MX_*` variables in docs
- Automation tips — preventing interactive prompt issues (see below)
Interactive prompts still trip up agents
Kiro tried to create a memory without --conversation-id and got stuck on an interactive prompt:
# What Kiro ran
mx memories create --content "Test memory"
# What happened
? Enter conversation ID (or "NEW" for new conversation): _
# Kiro hung here — can't answer interactive prompts in non-interactive mode
The agent-help Tips section explicitly warns about this:
Tips for Automation:
- Always use --content flag for non-interactive memory creation
- Use --conversation-id "NEW" or a specific ID to avoid prompts
But agents don't proactively read agent-help — they only invoke it when they hit a problem. By then, they're already stuck.
Solution: We should make --conversation-id auto-default to "NEW" when --content is provided. Don't prompt in non-interactive contexts.
Practical Takeaways for CLI Developers
If you're building a CLI that AI agents will use, here's what actually helps:
1. Put disambiguation in --help descriptions, not separate docs
Add cross-references directly in command descriptions:
.command('build')
.description('Build production artifacts (see also: dev, preview)')
.command('dev')
.description('Start development server (see also: build, preview)')
Agents read --help output. Make it self-contained.
2. Make agent-facing features real subcommands, not flags
Agents scan command lists, not tip text. This appears in mx --help:
Commands:
memories Manage memory storage
conversations Manage conversation threads
agent-help LLM-optimized documentation
This doesn't:
Options:
--agent-help Show LLM-optimized documentation
3. Name specific AI tools in hint text
Be explicit:
Copilot, Claude, ChatGPT: run `tool agent-help` for workflows
Not generic:
AI agents: run `tool agent-help` for workflows
4. Avoid interactive prompts in non-interactive contexts
Detect when you're running non-interactively:
const isInteractive = process.stdin.isTTY && process.stdout.isTTY;
if (!isInteractive && !options.conversationId) {
// Auto-default instead of prompting
options.conversationId = "NEW";
}
Or require critical flags when running non-interactively.
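The stricter variant can be a small pure-argument check. A sketch (the function and flag names are illustrative, not the actual mx implementation):

```typescript
// Resolve a required flag: return the provided value, let interactive
// sessions fall through to a prompt, and fail fast otherwise.
// (Illustrative sketch, not the actual mx CLI code.)
function resolveRequiredFlag(
  value: string | undefined,
  flag: string,
  isInteractive: boolean
): string | undefined {
  if (value !== undefined || isInteractive) return value;
  throw new Error(
    `${flag} is required when running non-interactively (stdin/stdout is not a TTY)`
  );
}

// Example: a missing --conversation-id should error, not hang.
const id = resolveRequiredFlag("NEW", "--conversation-id", false);
console.log(id); // prints: NEW
```

A thrown error surfaces in the agent's transcript as a clear failure it can react to, which beats a silent hang every time.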
5. Test with real agents in non-interactive mode
Run your CLI through Copilot CLI (gh copilot -p), Claude Code, or Kiro (kiro -p). Watch what they actually invoke. You'll be surprised.
What's Next
We're advancing agent-friendliness further:
- Auto-detect non-interactive context — default `--conversation-id` to avoid prompts
- Expand cross-references — add "see also" to all ambiguous command pairs
- Add usage examples to --help — agents benefit from copy-paste examples in standard help output
The core insight: agents don't need special documentation. They need better standard documentation in the places they already look.
Try it yourself: The MemNexus CLI is open source. Install with npm install -g @memnexus-ai/cli and run mx agent-help to see the full output. Or test with an AI agent: gh copilot -p "list my recent memories using the mx CLI".