Merged
171 changes: 171 additions & 0 deletions agents.md
@@ -0,0 +1,171 @@
# DocTo — Agent Guide

## Project Overview

DocTo is a Windows command-line utility written in **Delphi (Object Pascal)** that
converts Microsoft Office documents (Word `.doc`/`.docx`, Excel `.xls`/`.xlsx`,
PowerPoint `.ppt`/`.pptx`) to other formats (PDF, CSV, TXT, RTF, etc.) via COM
Automation. Microsoft Word, Excel, or PowerPoint must be installed on the host machine.

- Repository: https://github.com/tobya/DocTo
- Website: https://tobya.github.io/DocTo/

## Tech Stack

| Layer | Technology |
|-------|-----------|
| Language | Delphi (tested with 10.3; compatible with XE4+) |
| Office integration | Windows COM / Office Interop (Word, Excel, PowerPoint, Visio) |
| Build system | Delphi IDE / `.dproj` project file |
| Tests | Batch scripts (`.bat`) in `/test/`; PHP Pest tests in `/companion/` |
| Docs / companion / Test site | Markdown + PHP (`/pages/`, `/companion/`) |

## Repository Layout

```
docTo/
├── src/ # All Delphi source files (.pas, .dpr, .dproj)
│ ├── docto.dpr # Project file (entry point)
│ ├── MainUtils.pas # Core TDocumentConverter class and shared utilities
│ ├── baseConfig.pas # Abstract TParamLoader base class for CLI params
│ ├── configInput.pas # -F / --inputfile parameter handler
│ ├── configOutput.pas # -OX / --outputextension parameter handler
│ ├── WordUtils.pas # Word COM interop helpers
│ ├── ExcelUtils.pas # Excel COM interop helpers
│ ├── PowerPointUtils.pas # PowerPoint COM interop helpers
│ ├── PathUtils.pas # Path/directory utilities
│ ├── ResourceUtils.pas # String resource helpers
│ ├── datamodSSL.* # Data module for SSL/webhook support
│ ├── shared/ # Shared/common units
│ ├── Exceptions/ # Custom exception types
│ └── res/ # Resource files
├── test/ # Manual test scripts and fixture files
│ ├── testDocTo.bat # Main test runner batch script
│ ├── InputFiles/ # Sample Word/RTF/CSV/XLS input files
│ ├── inputfilesxl/ # Excel-specific input fixtures
│ ├── inputfilespp/ # PowerPoint-specific input fixtures
│ ├── GeneratedFiles/ # Output directory for test conversions
│ └── GeneratedTestputFiles/
├── .github/
│ ├── workflows/ # GitHub Actions (greetings bot)
│ └── ISSUE_TEMPLATE/
├── pages/ # GitHub Pages / documentation content
├── companion/ # Companion tooling
├── exe/ # Pre-built binaries
├── readme.md
└── changes.md
```

## Architecture

### Core Pattern: TParamLoader

CLI parameters follow a **registration/dispatch** pattern:

- `TParamLoader` (`baseConfig.pas`) is the abstract base class, with three responsibilities:
  - `RegisterParams(List)`: adds the parameter key(s) it handles (e.g. `-F`, `--INPUTFILE`) to a lookup list
  - `Load(Converter, Param, Value)`: applies the parsed value to the `TDocumentConverter` instance
  - `ShouldDec`: whether parsing should decrement the argument index after processing
- Each parameter has a dedicated subclass (e.g. `TParamInput`, `TParamOutputExtension`)
- `TDocumentConverter` (`MainUtils.pas`) is the central domain object passed through all param loaders

### Converters
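The registration/dispatch flow can be sketched as a self-contained console program. This is an illustration under stated assumptions, not DocTo's code: `TDocumentConverter` here is a hypothetical stand-in with a single `InputFile` field, `ApplyParam` is a made-up name for the dispatch step, and the virtual constructor on `TParamLoader` is an assumption so that a class reference creates the right subclass. The registration method is named `RegisterParameters` and stores the class reference via `List.AddPair(..., TObject(TParam...))`, as in the companion `configObject` template.

```pascal
program ParamDispatchSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical stand-in for the real TDocumentConverter in MainUtils.pas.
  TDocumentConverter = class
  public
    InputFile: string;
  end;

  // Minimal version of the abstract base class in baseConfig.pas.
  TParamLoader = class
  public
    constructor Create; virtual; // assumption: lets a class reference build the subclass
    procedure Load(Converter: TDocumentConverter; Param, Value: String); virtual; abstract;
    function ShouldDec: Boolean; virtual; abstract;
  end;

  TParamLoaderClass = class of TParamLoader;

  // One subclass per parameter, shaped like the generated configObject unit.
  TParamInput = class(TParamLoader)
  public
    procedure Load(Converter: TDocumentConverter; Param, Value: String); override;
    function ShouldDec: Boolean; override;
    class procedure RegisterParameters(List: TStrings);
  end;

constructor TParamLoader.Create;
begin
  inherited Create;
end;

procedure TParamInput.Load(Converter: TDocumentConverter; Param, Value: String);
begin
  Converter.InputFile := Value;
end;

function TParamInput.ShouldDec: Boolean;
begin
  Result := False;
end;

class procedure TParamInput.RegisterParameters(List: TStrings);
begin
  // Each switch maps to the class that handles it; the class reference is stored as the item's object.
  List.AddPair('-F', TParamInput.ClassName, TObject(TParamInput));
  List.AddPair('--INPUTFILE', TParamInput.ClassName, TObject(TParamInput));
end;

// Hypothetical dispatch step: find the loader class for a switch and let it update the converter.
procedure ApplyParam(Registry: TStrings; Converter: TDocumentConverter; const Param, Value: string);
var
  Idx: Integer;
  Loader: TParamLoader;
begin
  Idx := Registry.IndexOfName(Param); // TStringList compares case-insensitively by default
  if Idx < 0 then
  begin
    raise Exception.CreateFmt('Unknown switch: %s', [Param]);
  end;
  Loader := TParamLoaderClass(Registry.Objects[Idx]).Create;
  try
    Loader.Load(Converter, Param, Value);
  finally
    Loader.Free;
  end;
end;

var
  Registry: TStringList;
  Converter: TDocumentConverter;
begin
  Registry := TStringList.Create;
  Converter := TDocumentConverter.Create;
  try
    TParamInput.RegisterParameters(Registry);
    ApplyParam(Registry, Converter, '-f', 'C:\docs\report.docx');
    Writeln(Converter.InputFile); // C:\docs\report.docx
  finally
    Converter.Free;
    Registry.Free;
  end;
end.
```

`ShouldDec` is declared but never consulted here, because the sketch leaves out the loop over the command-line arguments where the index adjustment would happen.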

Four COM-based converter paths, selected by an application flag:
- **Word** (default): use `-WD` flag or omit; format constants from `wdSaveFormat`
- **Excel**: use `-XL` flag; format constants from `xlFileFormat`
- **PowerPoint**: use `-PP` flag
- **Visio**: use `-VS` flag

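For orientation, the Word path amounts to driving Word through late-bound COM. The sketch below is a minimal, hypothetical illustration rather than DocTo's implementation: it assumes Word is installed and supports PDF export, hard-codes the `WdSaveFormat` value for `wdFormatPDF` (17) instead of reading `Word_TLB_Constants.pas`, and skips DocTo's parameter handling. The exit codes 201 and 220 are borrowed from the Error Codes table for illustration.

```pascal
program WordToPdfSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Variants, System.Win.ComObj, Winapi.ActiveX;

const
  wdFormatPDF = 17;       // WdSaveFormat value for PDF
  wdDoNotSaveChanges = 0; // WdSaveOptions value

procedure ConvertToPdf(const InputFile, OutputFile: string);
var
  WordApp, Doc: OleVariant;
begin
  WordApp := CreateOleObject('Word.Application'); // raises EOleSysError if Word is not registered
  try
    WordApp.Visible := False;
    Doc := WordApp.Documents.Open(InputFile);
    try
      Doc.SaveAs(OutputFile, wdFormatPDF);
    finally
      Doc.Close(wdDoNotSaveChanges);
    end;
  finally
    WordApp.Quit;
  end;
end;

begin
  if ParamCount < 2 then
  begin
    Writeln('Usage: WordToPdfSketch <input.docx> <output.pdf>');
    Halt(201);
  end;
  CoInitialize(nil); // console apps must initialise COM on the calling thread
  try
    try
      ConvertToPdf(ExpandFileName(ParamStr(1)), ExpandFileName(ParamStr(2)));
    except
      on E: Exception do
      begin
        Writeln(E.ClassName, ': ', E.Message);
        ExitCode := 220;
      end;
    end;
  finally
    CoUninitialize;
  end;
end.
```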
### Logging

Log levels are integers: `1` ERRORS, `2` STANDARD (default), `5` CHATTY, `9` DEBUG, `10` VERBOSE.
Use `Converter.logdebug(msg, LEVEL)` for diagnostic output.

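A minimal sketch of level-gated logging, assuming a message is written when its level is at or below the level chosen on the command line (the companion tests pass `-L 10`, i.e. VERBOSE). The constant names and the free-standing `LogDebug` procedure are hypothetical; in DocTo the call is the `Converter.logdebug(msg, LEVEL)` method.

```pascal
program LogLevelSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Level values from the guide; the constant names are illustrative.
  ERRORS = 1;
  STANDARD = 2;
  CHATTY = 5;
  DEBUG = 9;
  VERBOSE = 10;

var
  CurrentLogLevel: Integer = STANDARD; // default level

// True when a message at MsgLevel should be written at the current level.
function ShouldLog(MsgLevel: Integer): Boolean;
begin
  Result := MsgLevel <= CurrentLogLevel;
end;

procedure LogDebug(const Msg: string; MsgLevel: Integer);
begin
  if ShouldLog(MsgLevel) then
  begin
    Writeln(Msg);
  end;
end;

begin
  LogDebug('Converting file', STANDARD); // written at the default level
  LogDebug('COM object created', DEBUG); // suppressed at the default level
  CurrentLogLevel := VERBOSE;            // as if -L 10 had been passed
  LogDebug('COM object created', DEBUG); // now written
end.
```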
## Building

- **Compiler**: Embarcadero Delphi (tested with 10.3+; XE4 and XE7 also supported)
- **Platform**: Windows only — relies on COM, Word/Excel/PowerPoint interop
- Open `src/docto.dproj` in the Delphi IDE and build, or use the Delphi command-line compiler (`dcc32`)
- Output is a single `docto.exe` binary

No external package manager or build script is present. The project has no Linux/macOS build path.

## Code Structure

- Give every branch of an `if` statement its own `begin ... end` block, even when the branch holds a single statement and the block is not strictly necessary.

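In practice the convention looks like this; `DescribeLogLevel` is a made-up function whose thresholds loosely follow the log levels listed under Logging.

```pascal
program BeginEndStyle;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Avoid: if Level >= 9 then Result := 'debug output' else Result := 'errors only';
// Every branch below gets its own begin..end, even with a single statement.
function DescribeLogLevel(Level: Integer): string;
begin
  if Level >= 9 then
  begin
    Result := 'debug output';
  end
  else if Level >= 2 then
  begin
    Result := 'standard output';
  end
  else
  begin
    Result := 'errors only';
  end;
end;

begin
  Writeln(DescribeLogLevel(10)); // debug output
  Writeln(DescribeLogLevel(1));  // errors only
end.
```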
## Testing

Tests are manual `.bat` scripts in `test/`. They run the compiled `docto.exe` against the fixture files and write the converted output to disk. Run them from a Windows command prompt on a machine with Microsoft Office installed:

```bat
REM Run the main test suite (requires Word/Excel/PowerPoint installed)
.\test\testDocTo.bat
```

- Input fixtures live in `test/InputFiles/`, `test/inputfilesxl/`, `test/inputfilespp/`
- Outputs are written to `test/GeneratedFiles/` and `test/GeneratedTestputFiles/`
- The batch scripts have no assertion framework or automated runner; correctness is verified by inspecting the generated files

Additional tests are written as Pest tests in the companion Laravel (PHP) site in the `/companion/` directory. These also invoke `docto.exe`, so they likewise need Office installed.

## Key Concepts for Agents

- **Application flags**: `-WD` (Word), `-XL` (Excel), `-PP` (PowerPoint), `-VS` (Visio). Word is the default.
- **Three required parameters**: `-F` (input file/dir), `-O` (output file/dir), `-T` (format type, e.g. `wdFormatPDF`).
- **Format types**: Passed as named constants (e.g. `wdFormatPDF`, `xlCSV`) or integers matching the Office Interop enums.
- **COM errors**: Office automation can raise `EOleException`. The `-X` flag controls whether DocTo halts or continues on COM errors.
- **TLB constants**: `Word_TLB_Constants.pas`, `Excel_TLB_Constants.pas`, and `PowerPoint_TLB_Constants.pas` define the Office format enum values.

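Accepting either a named constant or an integer can be handled by one lookup function. This sketch is hypothetical: `ResolveFormat` is a made-up name, and a tiny hand-written table holds a few real `WdSaveFormat` values, whereas DocTo takes the full set from `Word_TLB_Constants.pas`.

```pascal
program FormatResolveSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TFormatEntry = record
    Name: string;
    Value: Integer;
  end;

const
  // A few WdSaveFormat values; the real list lives in Word_TLB_Constants.pas.
  WordFormats: array[0..3] of TFormatEntry = (
    (Name: 'wdFormatDocument'; Value: 0),
    (Name: 'wdFormatText'; Value: 2),
    (Name: 'wdFormatRTF'; Value: 6),
    (Name: 'wdFormatPDF'; Value: 17)
  );

// Accepts a raw integer or a named constant (matched case-insensitively).
function ResolveFormat(const S: string; out Value: Integer): Boolean;
var
  Entry: TFormatEntry;
begin
  if TryStrToInt(S, Value) then
  begin
    Exit(True);
  end;
  for Entry in WordFormats do
  begin
    if SameText(Entry.Name, S) then
    begin
      Value := Entry.Value;
      Exit(True);
    end;
  end;
  Result := False;
end;

var
  Fmt: Integer;
begin
  if ResolveFormat('wdFormatPDF', Fmt) then
  begin
    Writeln(Fmt); // 17
  end;
  if not ResolveFormat('wdFormatBogus', Fmt) then
  begin
    Writeln('Unknown format (error code 200 in the table below)');
  end;
end.
```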
## Contribution Guidelines

- Open an issue before large PRs to avoid wasted effort.
- The main development branch is `DocTo` (note: not `main`).
- Looking for help with: Delphi/VBA features, PHP/Laravel/Pest tests, and documentation.
- PRs are welcome.

When adding a new conversion feature, add a corresponding test case to `testDocTo.bat` and provide a sample input file in the matching fixture directory (`test/InputFiles/`, `test/inputfilesxl/` or `test/inputfilespp/`).

## Error Codes

| Code | Meaning |
|------|---------|
| 200 | Invalid file format specified |
| 201 | Insufficient inputs (need -F, -O, -T at minimum) |
| 202 | Switch requires a value |
| 203 | Unknown switch |
| 204 | Input file does not exist |
| 205 | Invalid parameter value |
| 220 | Word/Excel/PowerPoint COM error |
| 221 | Word/Excel/PowerPoint not installed |
| 400 | Unknown error |

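A sketch of how some of these codes map onto input checks, combining them with the path rules under Key Conventions. `CheckInputs` and the constant names are hypothetical; DocTo itself reports failures through `Converter.HaltWithError(code, message)`, while this function just returns the code so the checks can be exercised without halting.

```pascal
program InputCheckSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Values from the Error Codes table; names are illustrative.
  ERR_OK = 0;
  ERR_INSUFFICIENT_INPUTS = 201;
  ERR_INPUT_NOT_FOUND = 204;

// Returns 0 when -F, -O and -T look usable, otherwise the matching error code.
function CheckInputs(const InputPath, OutputPath, FormatName: string;
  out ResolvedInput: string): Integer;
begin
  ResolvedInput := '';
  if (InputPath = '') or (OutputPath = '') or (FormatName = '') then
  begin
    Exit(ERR_INSUFFICIENT_INPUTS);
  end;

  // Resolve relative paths against the current directory before testing them.
  ResolvedInput := ExpandFileName(InputPath);
  if DirectoryExists(ResolvedInput) then
  begin
    // Directory input: add the trailing backslash so file names can be appended.
    ResolvedInput := IncludeTrailingBackslash(ResolvedInput);
  end
  else if not FileExists(ResolvedInput) then
  begin
    Exit(ERR_INPUT_NOT_FOUND);
  end;
  Result := ERR_OK;
end;

var
  Resolved: string;
begin
  Writeln(CheckInputs('', 'out.pdf', 'wdFormatPDF', Resolved));                   // 201
  Writeln(CheckInputs('no-such-file.docx', 'out.pdf', 'wdFormatPDF', Resolved)); // 204 if the file is absent
  Writeln(CheckInputs('.', 'out', 'wdFormatPDF', Resolved));                      // 0
  Writeln(Resolved); // current directory with a trailing backslash
end.
```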
## Adding a New CLI Parameter

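1. Create a new unit in `src/` (e.g. `configMyParam.pas`). The companion Artisan command can scaffold one from its Blade template: running `php artisan make:configObject MyParam --paramlist="-M,--MYPARAM"` in `/companion/` writes `src/ParamObjects/configMyParam.pas` (the `-M`/`--MYPARAM` switches are hypothetical)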
2. Declare a class that extends `TParamLoader`
3. Implement `RegisterParams` — add the short and long flag names
4. Implement `Load` — read `Value` and set the appropriate field on `TDocumentConverter`
5. Implement `ShouldDec` — return `false` unless the param consumes an extra token
6. Register the new class in the parameter dispatch table in `MainUtils.pas`
7. Add documentation for the new flag to `readme.md` under "Command Line Help"

## Key Conventions

- Path handling: always call `ExpandFileName` on user-supplied paths to resolve relative references; use `IncludeTrailingBackslash` for directory paths
- Input validation: use `Converter.HaltWithError(code, message)` to exit with a defined error code
- COM errors: wrap COM calls in try/except and honour the `-X` (halt-on-error) flag
- String constants and user-visible messages should be placed in `ResourceUtils.pas` (resource strings), not inlined
- The main branch is named `DocTo` (not `main`)

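A minimal sketch of the try/except convention for COM calls. It assumes the `-X` setting is available as a Boolean (here the hypothetical `HaltOnComError`) and that halting means exiting with code 220 from the Error Codes table; `RunComStep` is a made-up name, and the COM failure is simulated by raising `EOleException` directly so the sketch runs without Office.

```pascal
program ComErrorSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Win.ComObj;

const
  ERR_COM = 220; // "Word/Excel/PowerPoint COM error" in the Error Codes table

type
  TComStep = reference to procedure;

// Runs one COM step; returns ERR_COM only when the caller should halt.
function RunComStep(const Step: TComStep; HaltOnComError: Boolean): Integer;
begin
  Result := 0;
  try
    Step();
  except
    on E: EOleException do
    begin
      Writeln(ErrOutput, 'COM error: ', E.Message);
      if HaltOnComError then
      begin
        Result := ERR_COM;
      end;
    end;
  end;
end;

var
  FailingStep: TComStep;
begin
  // Simulated failure; in DocTo this would be a call such as Doc.SaveAs(...).
  FailingStep :=
    procedure
    begin
      raise EOleException.Create('Simulated Word error', HRESULT($80004005), 'Word', '', 0);
    end;

  Writeln(RunComStep(FailingStep, False)); // 0: error logged, processing continues
  ExitCode := RunComStep(FailingStep, True);
  Writeln(ExitCode);                        // 220: caller should halt
end.
```

Only `EOleException` is caught; any other exception propagates, so non-COM bugs are not silently swallowed by the halt-on-error setting.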
## External References

- Word SaveAs format constants: https://docs.microsoft.com/en-us/dotnet/api/microsoft.office.interop.word.wdsaveformat
- Excel file format constants: https://docs.microsoft.com/en-us/dotnet/api/microsoft.office.interop.excel.xlfileformat
- Word compatibility mode values: https://msdn.microsoft.com/en-us/library/office/ff192388.aspx
- Releases: https://github.com/tobya/DocTo/releases
- Wiki & examples: https://github.com/tobya/DocTo/wiki
40 changes: 40 additions & 0 deletions companion/app/Console/Commands/docto/CreateConfigObject.php
@@ -0,0 +1,40 @@
<?php

namespace App\Console\Commands\docto;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\Storage;

class CreateConfigObject extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'make:configObject {name} {--paramlist= : List of parameters accepted (comma separated) eg \'-F,--FileType\' }';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Create a parameter config object unit in the DocTo src/ParamObjects directory.';

    /**
     * Execute the console command.
     */
    public function handle()
    {
        $name = $this->argument('name');

        // Split the comma-separated switch list, trimming spaces and dropping empty entries.
        $paramlist = str($this->option('paramlist') ?? '')
            ->explode(',')
            ->map(fn ($param) => trim($param))
            ->filter()
            ->values();

        // Render the Blade view file; Blade::render() expects a template string, not a view name.
        $pasfile = view('docto.pasfile.configObject', ['paramlist' => $paramlist, 'name' => $name])->render();

        $storageDisk = Storage::build([
            'driver' => 'local',
            'root' => base_path('../src/ParamObjects'),
        ]);
        $storageDisk->put('config' . $name . '.pas', $pasfile);
        $this->info('Created new parameter config file: config' . $name . '.pas');
    }
}
41 changes: 41 additions & 0 deletions companion/resources/views/docto/pasfile/configObject.blade.php
@@ -0,0 +1,41 @@
unit config{{$name}};

interface

uses classes, MainUtils, System.Contnrs, SysUtils,
baseConfig;

type
TParam{{$name}} = class(TParamLoader)
public

procedure Load(Converter : TDocumentConverter; Param, Value : String); override;
function ShouldDec : Boolean; override;
class procedure RegisterParameters(List : TStrings);
end;

implementation

{ TParam{{$name}} }

procedure TParam{{$name}}.Load(Converter: TDocumentConverter; Param, Value: String);
begin
//
end;

class procedure TParam{{$name}}.RegisterParameters(List: TStrings);
begin

@foreach($paramlist as $param )
List.AddPair('{{$param}}', TParam{{$name}}.ClassName, TObject(TParam{{$name}}));
@endforeach

end;


function TParam{{$name}}.ShouldDec: Boolean;
begin
Result := false;
end;

end.
2 changes: 1 addition & 1 deletion companion/tests/Feature/Input/InputFilterPestTest.php
@@ -57,7 +57,7 @@
->build();

$output = \Illuminate\Support\Facades\Process::run($doctocmd);
// print_r($output->output());

$outputDirFiles = collect(\Illuminate\Support\Facades\Storage::allFiles($outputfiledir));

expect($outputDirFiles->count())->toBeGreaterThan(0);
80 changes: 80 additions & 0 deletions companion/tests/Feature/Input/InputSubDirPestTest.php
@@ -0,0 +1,80 @@
<?php


it('can get all files in subdir', function (){
$inputfiledir = 'inputfiles_sub_'. uniqid();
$outputfiledir = 'outputfiles_subdir' . uniqid();
// setup
$testinputfilesdir_temp = Storage::path($inputfiledir);
$testoutputdir_temp = Storage::path($outputfiledir);

Storage::createDirectory($outputfiledir);

$dirfiles = \App\Services\FileGatherService::GatherFiles('plain', $inputfiledir);
$subdirfiles = \App\Services\FileGatherService::GatherFiles('plain', $inputfiledir . '\\subdir');

$allfilestotal = $dirfiles->count() + $subdirfiles->count();

$doctocmd = \App\Services\DocToCommandBuilder::docto()
->add('-WD')
->add('-f', $testinputfilesdir_temp )
->add('-o', $testoutputdir_temp )
->add('-t', 'wdFormatPDF')
->add('-L',10)
->build();

$output = \Illuminate\Support\Facades\Process::run($doctocmd);
// print_r($output->output());
$outputDirFiles = collect(\Illuminate\Support\Facades\Storage::allFiles($outputfiledir));


expect($outputDirFiles->count())->toBe($allfilestotal);

// ensure the output extension follows the -t format (wdFormatPDF -> .pdf).
$file1 = $outputDirFiles->first();
expect(str($file1)->endsWith('.pdf'))->toBeTrue();

});

it('can get all files in base dir but not subdir', function ($command){
$inputfiledir = 'inputfiles'. uniqid();
$outputfiledir = 'outputfiles_docz' . uniqid();
// setup
$testinputfilesdir_temp = Storage::path($inputfiledir);
$testoutputdir_temp = Storage::path($outputfiledir);

Storage::createDirectory($outputfiledir);

$dirfiles = \App\Services\FileGatherService::GatherFiles('plain', $inputfiledir);
$subdirfiles = \App\Services\FileGatherService::GatherFiles('plain', $inputfiledir . '\\subdir');

$allfilestotal = $dirfiles->count() + $subdirfiles->count();

$doctocmd = \App\Services\DocToCommandBuilder::docto()
->add('-WD')
->add('-f', $testinputfilesdir_temp )
->add('-o', $testoutputdir_temp )
->add('-t', 'wdFormatText')
->add($command) // should not load files from /subdir
->add('-L',10)
->build();

$output = \Illuminate\Support\Facades\Process::run($doctocmd);
// print_r($output->output());
$outputDirFiles = collect(\Illuminate\Support\Facades\Storage::allFiles($outputfiledir));


expect($dirfiles->count())->toBe(5);
expect($outputDirFiles->count())->toBe($dirfiles->count());
expect($outputDirFiles->count())->toBeLessThan($allfilestotal);

// ensure the output extension follows the -t format (wdFormatText -> .txt).
$file1 = $outputDirFiles->first();
expect(str($file1)->endsWith('.txt'))->toBeTrue();

})->with([
['--NO-RECURSE'],
['--NO-SUBDIR'],
['--NO-SUBDIRS'],

]);
2 changes: 1 addition & 1 deletion companion/tests/Feature/VersionPestTest.php
@@ -23,7 +23,7 @@
$outputString = $result->output();

// find at beginning of output
-expect(str($outputString)->take(100)->toString())->toContain('DocTo Version: 1.16.46');
+expect(str($outputString)->take(100)->toString())->toContain('DocTo Version: 1.16');


});
Binary file modified src/ExtraFiles.res
Binary file not shown.