AIMAN began as a proof of concept for an AI-assisted command line interface and evolved into a comprehensive HCI testing framework. It provides researchers with tools to evaluate how AI assistance affects CLI usability through structured experiments.
This framework allows researchers and developers to:
- Conduct HCI experiments on command-line interfaces
- Evaluate the impact of AI assistance on command-line usage
- Collect structured data for human-computer interaction research
- Customize experimental parameters through configuration files
Prerequisites:
- Node.js
- Yarn
- Docker (for running user tests)
- OpenAI API Key
Before running the application, you must set your OpenAI API key as an environment variable:
export OPENAI_API_KEY=your_openai_api_key_here
Available commands:
- AI-driven CLI (proof of concept):
yarn dev
- User testing framework (should only be run in Docker):
yarn user:test:docker
All experiment configuration is stored in configuration files that can be easily modified:
- src/evaluation/config/pre-questionnaire.config.ts: Define demographics and background questions
- src/evaluation/config/post-questionnaire.config.ts: Configure experience and satisfaction questions
- src/evaluation/config/tests.config.ts: Define CLI commands to test, correct solutions, and categorization
- src/evaluation/config/wording.config.ts: Centralized configuration for all text, labels, and UI appearance
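To illustrate what a test definition might contain, here is a minimal sketch of an entry in the spirit of tests.config.ts. The field names (`testName`, `description`, `category`, `solutions`), the sample tests, and the `isCorrect` helper are all hypothetical; the real file may be structured differently, and this sketch assumes a command counts as correct when it exactly matches an accepted solution after whitespace normalisation.

```typescript
// Hypothetical shape of a test definition; the real tests.config.ts may differ.
// testName, description and category mirror fields in the stored results.
interface TestDefinition {
  testName: string;
  description: string;
  category: string;
  // Commands accepted as correct (assumption: exact match after normalising whitespace).
  solutions: string[];
}

// Illustrative sample tests, not taken from the project.
const sampleTests: TestDefinition[] = [
  {
    testName: "list-hidden-files",
    description: "List all files in the current directory, including hidden ones",
    category: "file-management",
    solutions: ["ls -a", "ls -la", "ls -al"],
  },
  {
    testName: "count-lines",
    description: "Count the lines in notes.txt",
    category: "text-processing",
    solutions: ["wc -l notes.txt", "wc -l < notes.txt"],
  },
];

// Hypothetical check: trim, collapse runs of whitespace, then compare
// against the list of accepted solutions.
function isCorrect(test: TestDefinition, command: string): boolean {
  const normalised = command.trim().replace(/\s+/g, " ");
  return test.solutions.includes(normalised);
}
```

Exact matching is deliberately simple; a real framework might instead run the command and compare its output, which would accept equivalent commands that are not listed.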
The framework includes a comprehensive data tracking system:
The Store class in src/evaluation/store.ts manages all experimental data:
- Session Tracking: Each user session receives a unique ID and timestamps
- Test Results: Records every command attempt, success/failure status, and timing
- Questionnaire Data: Stores pre- and post-questionnaire responses
- Error Types: Categorizes and tracks different types of errors
Data is stored in JSON format with the following structure:
[
  {
    "runId": "unique-session-id",
    "userName": "participant-name",
    "startTime": "ISO timestamp",
    "tests": [
      {
        "testName": "command-test-name",
        "description": "test description",
        "attempts": [
          {
            "attemptNumber": 1,
            "command": "user-entered-command",
            "timestamp": "ISO timestamp",
            "stdout": "command output",
            "stderr": "error output if any",
            "errorType": "categorized error",
            "timeMs": 1234,
            "success": true/false
          }
        ],
        "totalTimeMs": 5678,
        "totalAttempts": 3,
        "errorTypes": ["syntax", "parameter"],
        "startTime": "ISO timestamp",
        "endTime": "ISO timestamp",
        "isLlmAssisted": true/false,
        "category": "test category"
      }
    ],
    "preQuestionnaire": { /* questionnaire responses */ },
    "postQuestionnaire": { /* questionnaire responses */ },
    "conditionOrder": "traditional-first" // or "ai-first"
  }
]
All collected data is written to the output directory (/app/output inside the container). When running in Docker, this directory is mounted as a volume from ./output on the host, so the results remain accessible after the experiment concludes.
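For analysis scripts, it can help to give the stored JSON explicit types. The sketch below mirrors the documented field names; the type names (`Attempt`, `TestRecord`, `Session`) and the `parseSessions` helper are hypothetical and not part of the framework. It assumes the output file contains a top-level array of sessions, as in the structure above.

```typescript
// Types mirroring the documented JSON structure. Field names come from the
// structure above; the type names themselves are hypothetical.
type ConditionOrder = "traditional-first" | "ai-first";

interface Attempt {
  attemptNumber: number;
  command: string;
  timestamp: string; // ISO 8601
  stdout: string;
  stderr: string;
  errorType: string;
  timeMs: number;
  success: boolean;
}

interface TestRecord {
  testName: string;
  description: string;
  attempts: Attempt[];
  totalTimeMs: number;
  totalAttempts: number;
  errorTypes: string[];
  startTime: string;
  endTime: string;
  isLlmAssisted: boolean;
  category: string;
}

interface Session {
  runId: string;
  userName: string;
  startTime: string;
  tests: TestRecord[];
  preQuestionnaire: Record<string, unknown>;
  postQuestionnaire: Record<string, unknown>;
  conditionOrder: ConditionOrder;
}

// Hypothetical loader: parses the file contents and performs only a shallow
// sanity check (top-level array, runId and tests present on each session).
// A thorough loader would validate every field.
function parseSessions(json: string): Session[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("expected a top-level array of sessions");
  }
  for (const s of data) {
    if (typeof s?.runId !== "string" || !Array.isArray(s?.tests)) {
      throw new Error("session is missing runId or tests");
    }
  }
  return data as Session[];
}
```

The cast at the end trusts the file once the shallow check passes; for data from untrusted sources, a schema validator would be the safer choice.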
The framework collects key metrics for analyzing CLI usability:
- Performance Metrics:
  - Task completion rates
  - Time spent per command
  - Number of attempts before success
  - Types of errors encountered
- Questionnaire Metrics:
  - User satisfaction ratings
  - Perceived ease of use
  - Confidence levels
  - Frustration scores
  - Qualitative feedback
- Comparative Analysis:
  - AI-assisted vs. traditional command-line performance
  - User experience differences between conditions
  - Learning patterns across test categories
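As an example of the comparative analysis, the sketch below derives three of the performance metrics per condition from stored test records, using the `isLlmAssisted`, `totalTimeMs` and `attempts` fields. The helper names are hypothetical, and so is the definition of "completed": a test counts as completed if any attempt succeeded, and attempts before success are counted up to and including the first successful one.

```typescript
// Only the fields used here are typed; they mirror the stored JSON.
interface AttemptLike {
  success: boolean;
  timeMs: number;
}

interface TestLike {
  isLlmAssisted: boolean;
  totalTimeMs: number;
  attempts: AttemptLike[];
}

interface ConditionSummary {
  tests: number;
  completionRate: number; // fraction of tests with at least one successful attempt
  meanTimeMs: number; // mean totalTimeMs per test
  meanAttemptsToSuccess: number; // over completed tests only
}

// Arithmetic mean; returns 0 for an empty list (no data for that condition).
function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Hypothetical helper summarising one condition's tests.
function summarise(tests: TestLike[]): ConditionSummary {
  const completed = tests.filter((t) => t.attempts.some((a) => a.success));
  // Position of the first successful attempt, counted from 1.
  const attemptsToSuccess = completed.map(
    (t) => t.attempts.findIndex((a) => a.success) + 1,
  );
  return {
    tests: tests.length,
    completionRate: tests.length === 0 ? 0 : completed.length / tests.length,
    meanTimeMs: mean(tests.map((t) => t.totalTimeMs)),
    meanAttemptsToSuccess: mean(attemptsToSuccess),
  };
}

// Splits records by condition and summarises each side.
function compareConditions(tests: TestLike[]): {
  ai: ConditionSummary;
  traditional: ConditionSummary;
} {
  return {
    ai: summarise(tests.filter((t) => t.isLlmAssisted)),
    traditional: summarise(tests.filter((t) => !t.isLlmAssisted)),
  };
}
```

These are descriptive summaries only; testing whether differences between conditions are significant would require a paired statistical test suited to the within-subjects design.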
AIMAN implements a within-subjects experiment design where each participant experiences both conditions:
- Condition A: Traditional CLI (no AI assistance)
- Condition B: AI-assisted CLI
The conditionOrder parameter controls the sequence of conditions to counterbalance learning effects:
- "traditional-first": Participant first uses the traditional CLI, then the AI-assisted CLI
- "ai-first": Participant first uses the AI-assisted CLI, then the traditional CLI
This within-subjects design allows for:
- Direct comparison of both interfaces by the same user
- Reduced variance by controlling for individual differences
- More statistical power with fewer participants
- Analysis of learning and transfer effects between conditions
The framework automatically balances condition orders across participants and tracks which order was assigned to each session in the stored data.
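The exact balancing rule is not described here. A minimal sketch of one common approach, assuming the order is chosen from the sessions stored so far, is to assign whichever order has been used less often. The function name and the tie-breaking in favour of "traditional-first" are assumptions made for illustration.

```typescript
type ConditionOrder = "traditional-first" | "ai-first";

// Hypothetical balancing rule: pick the order that previous sessions have
// used less often; ties (including no previous sessions) go to "traditional-first".
function nextConditionOrder(
  previous: { conditionOrder: ConditionOrder }[],
): ConditionOrder {
  let traditionalFirst = 0;
  let aiFirst = 0;
  for (const s of previous) {
    if (s.conditionOrder === "traditional-first") {
      traditionalFirst++;
    } else {
      aiFirst++;
    }
  }
  return aiFirst < traditionalFirst ? "ai-first" : "traditional-first";
}
```

Choosing by count rather than by simple alternation keeps the two groups equal in size even if some sessions are discarded or several runs are appended to the same output.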
Additional experimental controls:
- Test counts can be configured via command line arguments
- Questionnaires can be skipped for testing with the --skip-questionnaires flag
- Test categories are balanced across conditions
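One way to balance test categories across conditions is to deal the tests of each category alternately to the two conditions. The sketch below does this and, after any category with an odd number of tests, switches which condition receives the next category's first test, so the leftovers do not all land on one side. The function name and the strategy are hypothetical; the framework's own method is not documented here.

```typescript
interface CategorisedTest {
  testName: string;
  category: string;
}

// Hypothetical sketch: split tests so every category appears in both conditions
// as evenly as possible.
function splitByCategory<T extends CategorisedTest>(
  tests: T[],
): { traditional: T[]; ai: T[] } {
  // Group tests by category, preserving their original order.
  const byCategory: Record<string, T[]> = {};
  for (const t of tests) {
    if (!byCategory[t.category]) {
      byCategory[t.category] = [];
    }
    byCategory[t.category].push(t);
  }

  const traditional: T[] = [];
  const ai: T[] = [];
  // When true, the next category starts with the AI condition instead.
  let flip = false;
  for (const category of Object.keys(byCategory)) {
    const group = byCategory[category];
    group.forEach((t, i) => {
      const toTraditional = (i % 2 === 0) !== flip;
      if (toTraditional) {
        traditional.push(t);
      } else {
        ai.push(t);
      }
    });
    // An odd-sized group gave one condition an extra test; favour the other next time.
    if (group.length % 2 === 1) {
      flip = !flip;
    }
  }
  return { traditional, ai };
}
```

For example, three "files" tests and three "text" tests yield three tests per condition, with both categories represented in each.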
The framework guides users through:
- Pre-study questionnaire to gather demographics and experience level
- CLI test scenarios in both conditions (with and without AI assistance)
- Post-study questionnaire to evaluate user experience
Data is collected for analysis on command success rates, completion times, and user satisfaction.
The Docker container ensures a consistent testing environment. It mounts an output directory to preserve results:
# Build and run with Docker
docker build --no-cache -t aiman .
docker run -it --rm -e OPENAI_API_KEY -v "$(pwd)/output:/app/output" aiman /bin/sh
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.