Content & I/O Layer¶
Technical reference for WireFlow's content handling: project structure, input processing, context aggregation, and output management.
Project Structure¶
A WireFlow project is identified by the .workflow/ directory:
my-project/
├── .workflow/ # Project root
│ ├── config # Project configuration
│ ├── project.txt # Project description (optional)
│ ├── output/ # Hardlinks to workflow outputs
│ │ └── <name>.<format> # → run/<name>/output.<format>
│ ├── cache/ # Shared conversion cache
│ │ └── conversions/
│ │ ├── office/ # Office→PDF (.docx, .pptx)
│ │ └── images/ # Image resize cache
│ └── run/ # Individual workflows
│ └── <name>/
│ ├── task.txt # Task prompt
│ ├── config # Workflow configuration
│ ├── output.<format> # Current output
│ └── output-TIMESTAMP.<format> # Backups
└── (project files...)
Key Directories¶
| Directory | Purpose | Created By |
|---|---|---|
.workflow/ |
Project root marker | wfw init |
.workflow/output/ |
Hardlinks to outputs | First workflow run |
.workflow/cache/ |
Shared conversion cache | wfw init |
.workflow/run/<name>/ |
Workflow directory | wfw new |
Implementation¶
Project discovery walks up the directory tree looking for .workflow/:
# lib/utils.sh
find_project_root() {
local dir="$PWD"
while [[ "$dir" != "/" ]]; do
[[ -d "$dir/.workflow" ]] && echo "$dir" && return 0
dir="$(dirname "$dir")"
done
return 1
}
Input Processing¶
File Type Detection¶
Implemented in lib/utils.sh:detect_file_type():
Supported Types¶
| Type | Extensions | Processing | API Format |
|---|---|---|---|
| Text | .md, .txt, .json, .py, etc. |
Read UTF-8 | Document block |
.pdf |
Base64 encode | PDF document block | |
| Image | .png, .jpg, .gif, .webp |
Base64 encode | Image block |
| Image (convert) | .svg, .heic, .tiff |
Convert + base64 | Image block |
| Office | .docx, .pptx |
Convert to PDF | PDF document block |
| Binary | (other) | Skipped | N/A |
Image Processing¶
Native formats pass through directly. Conversion formats require ImageMagick:
| Extension | Converts To | Notes |
|---|---|---|
.svg |
PNG | Rasterized |
.heic, .heif |
JPEG | Apple format |
.tiff, .tif |
PNG | Print/scan |
Implementation:
- lib/utils.sh:get_image_media_type() - MIME type lookup
- lib/utils.sh:needs_format_conversion() - Check if conversion needed
- lib/utils.sh:convert_image_format() - ImageMagick conversion
- lib/utils.sh:build_image_content_block() - API block construction
Office Document Processing¶
Office files (.docx, .pptx) are converted to PDF via LibreOffice:
# lib/utils.sh:convert_office_to_pdf()
soffice --convert-to pdf --outdir "$temp_dir" --headless "$source"
Note: .xlsx not supported - export to CSV for tabular data.
Conversion Caching¶
Conversions are cached at project level with hash-based IDs:
.workflow/cache/conversions/
├── office/
│ ├── <hash>.pdf # Converted PDF
│ └── <hash>.pdf.meta # Metadata sidecar
└── images/
├── <hash>.<ext> # Resized image
└── <hash>.<ext>.meta # Metadata sidecar
Cache validation: 1. Check mtime (fast path) 2. Check content hash for files ≤10MB (slow path)
Implementation: lib/utils.sh:generate_cache_id()
Context Aggregation¶
Semantic Separation¶
WireFlow distinguishes two content types:
| Type | Config Variables | Purpose |
|---|---|---|
| Input | INPUT |
Primary documents to analyze |
| Context | CONTEXT, DEPENDS_ON |
Supporting references |
Aggregation Methods¶
Two methods can be combined:
- Config arrays -
INPUT=(data/*.csv report.pdf)(globs expand at source time) - Dependencies -
DEPENDS_ON=(preprocessing)(context only)
Aggregation Order¶
Content is aggregated stable → volatile:
Context Materials:
1. Config CONTEXT (project-relative, already glob-expanded)
2. CLI -cx/--context (PWD-relative)
Dependencies:
1. DEPENDS_ON outputs from .workflow/output/
Input Documents:
1. Config INPUT (project-relative, already glob-expanded)
2. CLI -in/--input or -- <files> (PWD-relative)
Document Ordering¶
Within aggregated content, documents are ordered for optimal API processing:
- PDFs - Context PDFs, then input PDFs (processed first)
- Text - Context text, dependencies, input text
- Images - All sources (Vision API)
- Task - Always last
Rationale: Per Anthropic guidelines, PDFs before text improves processing.
Content Block Arrays¶
All content is organized into typed arrays:
| Array | Purpose | Citable |
|---|---|---|
CONTEXT_PDF_BLOCKS |
PDF files from context | Yes |
INPUT_PDF_BLOCKS |
PDF files from input | Yes |
CONTEXT_BLOCKS |
Text files from context | Yes |
DEPENDENCY_BLOCKS |
Workflow outputs | Yes |
INPUT_BLOCKS |
Text files from input | Yes |
IMAGE_BLOCKS |
Images from all sources | No |
TASK_BLOCK |
Task prompt | N/A |
Path Resolution¶
| Source | Resolution |
|---|---|
| Config variables | Project-relative |
| CLI flags | PWD-relative |
Implementation¶
Content aggregation is handled by a unified function in lib/execute.sh:
This function:
- Processes all context sources (patterns, files, CLI)
- Resolves dependencies from .workflow/output/
- Processes all input sources (patterns, files, CLI)
- Populates the typed block arrays as side effects
- Handles file type detection and conversion
Note: CLI context is processed after CLI input to ensure input files take precedence when files appear in both sources.
Output Handling¶
Output Formats¶
Any text-based format is supported:
| Format | Extension | Post-Processing |
|---|---|---|
| Markdown | .md |
mdformat if available |
| JSON | .json |
jq pretty-print |
| Plain text | .txt |
None |
| HTML, XML, CSV, YAML | .html, etc. |
None |
| Code | .py, .js, etc. |
None |
Format Hints¶
For non-markdown formats, WireFlow appends a hint to the task:
Output Locations¶
| Location | Purpose |
|---|---|
.workflow/run/<name>/output.<format> |
Primary output |
.workflow/output/<name>.<format> |
Hardlink for quick access |
.workflow/run/<name>/output-TIMESTAMP.<format> |
Backup versions |
Hardlink Strategy¶
# Primary location
output_file=".workflow/run/$name/output.$format"
# Create hardlink
ln "$output_file" ".workflow/output/$name.$format"
Benefits: - Visible in file browsers (unlike symlinks) - Single data storage (no duplication) - Atomic updates
Backup Strategy¶
Before overwriting, create timestamped backup:
Post-Processing¶
Markdown: If mdformat available, format output
JSON: If jq available, pretty-print
Implementation¶
Core functions in lib/execute.sh:
write_output() # Write to file, create hardlink
backup_output() # Create timestamped backup
postprocess_output() # Format-specific processing
Prompt Caching¶
Cache Breakpoint Strategy¶
WireFlow places cache breakpoints (cache_control: {type: "ephemeral"}) strategically:
System blocks: 1. After aggregated system prompts (most stable) 2. After project descriptions
User content blocks: 1. After last PDF block (if PDFs exist) 2. After last text/image block (before task)
Benefits: - 90% cost reduction on cache reads - 5-minute default TTL - Minimum 1024 tokens per cached block
Adaptive Logic¶
PDFs present: [PDFs ✓] → [text docs] → [images ✓] → task
No PDFs: [text docs ✓] → [images ✓] → task
Only task: task (no breakpoints)
See Also¶
- Architecture - System design overview
- Execution - Run/task/batch mode processing
- API Layer - Request building and response handling
- Configuration - Config cascade