Command Line Interface
The package installs a ddigraph command with subcommands for schema bootstrap, format detection,
and ingestion. The CLI auto-detects DDI format by default, supporting DDI Codebook,
DDI-L FragmentInstance, and DDI-CDI files.
Commands Overview
| Command | Description |
|---|---|
| bootstrap | Create constraints and indexes (Codebook + DDI-L by default; pass --no-include-fragments for codebook-only) |
| load | Stream a DDI XML file into Neo4j (auto-detects format) |
| detect | Detect the DDI format of a file without loading it |
| version | Print the installed ddigraph version |
Schema Bootstrap
bootstrap creates the indexes and constraints Neo4j needs before
your first load. It is safe to run more than once.
# Codebook + DDI-L FragmentInstance (default)
ddigraph bootstrap --neo4j-uri bolt://db:7687 --neo4j-user neo4j --neo4j-password password
# Codebook only
ddigraph bootstrap --no-include-fragments
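One way a bootstrap step can be safe to repeat is to write every schema statement with Cypher's IF NOT EXISTS clause, which Neo4j accepts for both constraints and indexes. The sketch below is illustrative only and does not show ddigraph's real schema: the labels (Dataset, Variable, Fragment), the property names, and the include_fragments parameter are assumptions chosen to mirror the --no-include-fragments switch.

```python
from __future__ import annotations


def schema_statements(include_fragments: bool = True) -> list[str]:
    """Return idempotent Cypher DDL. Labels and properties are hypothetical."""
    statements = [
        # Assumed Codebook schema: one uniqueness constraint and one index.
        "CREATE CONSTRAINT dataset_id IF NOT EXISTS "
        "FOR (d:Dataset) REQUIRE d.id IS UNIQUE",
        "CREATE INDEX variable_name IF NOT EXISTS "
        "FOR (v:Variable) ON (v.name)",
    ]
    if include_fragments:
        # Assumed DDI-L FragmentInstance schema, skipped for codebook-only setups.
        statements.append(
            "CREATE CONSTRAINT fragment_urn IF NOT EXISTS "
            "FOR (f:Fragment) REQUIRE f.urn IS UNIQUE"
        )
    return statements


for statement in schema_statements(include_fragments=False):
    print(statement)
```

Because each statement is a no-op when the named constraint or index already exists, running the same list twice leaves the database unchanged.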
Loading Data
The load command auto-detects the DDI format and uses the appropriate loader:
# Auto-detect format (default behavior)
ddigraph load /path/to/survey.xml --dataset-id demo
# Explicitly specify format
ddigraph load /path/to/codebook.xml --format codebook --dataset-id demo
ddigraph load /path/to/questionnaire.xml --format lifecycle
# For DDI-L FragmentInstance, --dataset-id is optional
ddigraph load /path/to/fragments.xml
Load Options
ddigraph load FILE [OPTIONS]
Options:
  --format {auto,codebook,lifecycle,cdi}  DDI format (default: auto)
  --dataset-id ID                         Dataset identifier (required for Codebook)
  --dataset-name NAME                     Human-readable dataset name
  --chunk-size N                          Records per batch (default: 200)
  --writer-concurrency N                  Concurrent writer tasks
  --dry-run / --validate-only             Parse without writing to Neo4j
  --replace                               Clear existing data before loading
  --json                                  Output results as JSON
  --tune KEY=VALUE                        Set any Settings field (repeatable)
  --config FILE                           TOML file of Settings fields
Every dedicated flag above has a long form, but you do not need to learn a
flag for each setting. --tune KEY=VALUE sets any Settings field by name and
can be repeated; --config FILE reads a flat TOML table of the same fields.
When the same field is set in more than one place, precedence from highest
to lowest is: dedicated flag, --tune, --config, environment variable.
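The precedence chain can be pictured as a lookup through four layers, highest priority first. The following is a minimal sketch under stated assumptions, not ddigraph's implementation: parse_tune and resolve_settings are hypothetical helpers, values are kept as raw strings, and letting a later --tune pair override an earlier one for the same key is an assumption.

```python
from __future__ import annotations

from collections import ChainMap


def parse_tune(pairs: list[str]) -> dict[str, str]:
    """Turn repeated KEY=VALUE arguments into a dict (later pairs win: assumed)."""
    parsed: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        parsed[key.strip()] = value.strip()
    return parsed


def resolve_settings(flags, tune, config, env) -> dict[str, str]:
    """Merge the layers; ChainMap consults the leftmost mapping first."""
    return dict(ChainMap(flags, tune, config, env))


settings = resolve_settings(
    flags={"chunk_size": "500"},                      # dedicated flag
    tune=parse_tune(["chunk_size=300", "strict_parsing=true"]),
    config={"strict_parsing": "false", "queue_maxsize": "4"},
    env={"queue_maxsize": "8", "log_level": "INFO"},  # DDIGRAPH_* values
)
print(settings["chunk_size"])      # 500  (flag beats --tune)
print(settings["strict_parsing"])  # true (--tune beats --config)
print(settings["queue_maxsize"])   # 4    (--config beats environment)
```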
Examples
# Stream a DDI Codebook with ingestion tuning
ddigraph load /path/to/codebook.xml --dataset-id demo --dataset-name "Demo Survey" \
--chunk-size 500 --writer-concurrency 2 --batch-metrics --log-level DEBUG
# Validate a load without writing (parsing and Cypher plans only)
ddigraph load /path/to/codebook.xml --dataset-id demo --dry-run
# Purge an existing dataset before reloading
ddigraph load /path/to/codebook.xml --dataset-id demo --replace
# Load DDI-L FragmentInstance with JSON output
ddigraph load /path/to/questionnaire.xml --json
# Set any setting without a dedicated flag
ddigraph load /path/to/codebook.xml --dataset-id demo \
--tune chunk_size=500 --tune strict_parsing=true
# Or keep the same settings in a TOML file
ddigraph load /path/to/codebook.xml --dataset-id demo --config tuning.toml
Format Detection
Detect the DDI format of a file without loading it:
ddigraph detect /path/to/survey.xml
# Output:
# Format: lifecycle
# File: /path/to/survey.xml
ddigraph detect /path/to/survey.xml --json
# Output: {"path": "/path/to/survey.xml", "format": "lifecycle"}
ddigraph detect /path/to/cdi-metadata.xml
# Output:
# Format: cdi
# File: /path/to/cdi-metadata.xml
The detect_ddi_format() function returns one of three values: "codebook", "lifecycle", or
"cdi". The is_cdi_format() utility function is also available for CDI-specific detection.
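Root-element detection can be done without reading the whole file. The sketch below is a standalone illustration and not the body of detect_ddi_format(): the function name sniff_ddi_format is hypothetical, and the element names (codeBook, FragmentInstance, DDIInstance) and the "DDI-CDI" namespace check are assumptions about typical DDI documents rather than the checks ddigraph performs.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET


def sniff_ddi_format(source) -> str | None:
    """Classify a DDI document from its root element only."""
    # The first "start" event belongs to the root, so we return immediately
    # and never parse the remainder of the document.
    for _event, root in ET.iterparse(source, events=("start",)):
        namespace, _, local = root.tag.rpartition("}")
        namespace = namespace.lstrip("{")
        if "DDI-CDI" in namespace:  # assumed marker for CDI namespaces
            return "cdi"
        if local == "codeBook":
            return "codebook"
        if local in {"FragmentInstance", "DDIInstance"}:
            return "lifecycle"
        return None
    return None


sample = io.BytesIO(b'<codeBook xmlns="ddi:codebook:2_5"><stdyDscr/></codeBook>')
print(sniff_ddi_format(sample))  # codebook
```

iterparse accepts either a path or a binary file object, so the same helper works on files on disk and on in-memory test fixtures.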
Environment Variables
Export Neo4j connection details from your shell or a .env file:
export DDIGRAPH_NEO4J_URI=bolt://localhost:7687
export DDIGRAPH_NEO4J_USER=neo4j
export DDIGRAPH_NEO4J_PASSWORD=secret
export DDIGRAPH_NEO4J_DATABASE=neo4j # optional, defaults to "neo4j"
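If you script around the CLI, the same four variables can be read into a small settings object. This is a hedged sketch: Neo4jConnection and connection_from_env are hypothetical names, not part of ddigraph, and treating the URI, user, and password as required is an assumption. Only the database default of "neo4j" comes from the text above.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Neo4jConnection:  # hypothetical container, not a ddigraph class
    uri: str
    user: str
    password: str
    database: str = "neo4j"


def connection_from_env(env: Mapping[str, str] = os.environ) -> Neo4jConnection:
    """Build connection settings from DDIGRAPH_NEO4J_* variables."""
    required = ("URI", "USER", "PASSWORD")  # assumed to be mandatory here
    missing = [f"DDIGRAPH_NEO4J_{n}" for n in required if f"DDIGRAPH_NEO4J_{n}" not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return Neo4jConnection(
        uri=env["DDIGRAPH_NEO4J_URI"],
        user=env["DDIGRAPH_NEO4J_USER"],
        password=env["DDIGRAPH_NEO4J_PASSWORD"],
        # Optional variable; falls back to the documented default.
        database=env.get("DDIGRAPH_NEO4J_DATABASE", "neo4j"),
    )


example_env = {
    "DDIGRAPH_NEO4J_URI": "bolt://localhost:7687",
    "DDIGRAPH_NEO4J_USER": "neo4j",
    "DDIGRAPH_NEO4J_PASSWORD": "secret",
}
print(connection_from_env(example_env).database)  # neo4j
```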
Complete Flag and Environment Variable Mapping
Every CLI flag maps 1:1 to a DDIGRAPH_ environment variable. Booleans accept truthy/falsy
strings (true/false, 1/0).
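The naming rule behind the tables that follow can be sketched in a few lines. Both helpers are hypothetical illustrations, not ddigraph functions. The sketch accepts only the boolean spellings listed above, and matching them case-insensitively is an assumption.

```python
def flag_to_env(flag: str) -> str:
    """Map a long flag to its variable, e.g. --chunk-size -> DDIGRAPH_CHUNK_SIZE."""
    # Aliases are not handled: --validate-only shares DDIGRAPH_DRY_RUN with --dry-run.
    return "DDIGRAPH_" + flag.lstrip("-").replace("-", "_").upper()


def parse_bool(raw: str) -> bool:
    """Interpret the documented truthy/falsy strings: true/false and 1/0."""
    value = raw.strip().lower()  # case-insensitive matching is assumed
    if value in {"true", "1"}:
        return True
    if value in {"false", "0"}:
        return False
    raise ValueError(f"not a recognised boolean: {raw!r}")


print(flag_to_env("--write-retry-attempts"))  # DDIGRAPH_WRITE_RETRY_ATTEMPTS
print(parse_bool("1"))                        # True
```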
Connection Options
| CLI Flag | Environment Variable | Description |
|---|---|---|
| --neo4j-uri | DDIGRAPH_NEO4J_URI | Neo4j bolt/s URI |
| --neo4j-user | DDIGRAPH_NEO4J_USER | Neo4j username |
| --neo4j-password | DDIGRAPH_NEO4J_PASSWORD | Neo4j password |
| --neo4j-database | DDIGRAPH_NEO4J_DATABASE | Target database (default: neo4j) |
Driver Pooling
| CLI Flag | Environment Variable | Description |
|---|---|---|
| --max-connection-pool-size | DDIGRAPH_MAX_CONNECTION_POOL_SIZE | Max pooled connections |
| --connection-timeout | DDIGRAPH_CONNECTION_TIMEOUT | Connection open timeout (seconds) |
| --max-connection-lifetime | DDIGRAPH_MAX_CONNECTION_LIFETIME | Pool lifetime (seconds) |
| --session-timeout | DDIGRAPH_SESSION_TIMEOUT | Session lifetime (seconds) |
| --transaction-timeout | DDIGRAPH_TRANSACTION_TIMEOUT | Server-side transaction timeout |
TLS Options
| CLI Flag | Environment Variable | Description |
|---|---|---|
| --encrypted | DDIGRAPH_ENCRYPTED | Require TLS connections |
| --verify-hostname | DDIGRAPH_VERIFY_HOSTNAME | Verify TLS hostname |
| --trusted-certificates | DDIGRAPH_TRUSTED_CERTIFICATES | Trust policy (e.g., TRUST_ALL_CERTIFICATES) |
| --trusted-certificates-file | DDIGRAPH_TRUSTED_CERTIFICATES_FILE | PEM bundle path |
Ingestion Tuning
| CLI Flag | Environment Variable | Description |
|---|---|---|
| --queue-maxsize | DDIGRAPH_QUEUE_MAXSIZE | Back-pressure threshold (batches) |
| --chunk-size | DDIGRAPH_CHUNK_SIZE | Records per batch |
| --writer-concurrency | DDIGRAPH_WRITER_CONCURRENCY | Concurrent writer tasks |
| --batch-metrics | DDIGRAPH_BATCH_METRICS | Emit per-batch metrics |
| --strict-parsing | DDIGRAPH_STRICT_PARSING | Fail on XML syntax errors |
| --dry-run / --validate-only | DDIGRAPH_DRY_RUN | Parse without writing |
| --replace | DDIGRAPH_REPLACE | Purge dataset before loading |
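To see how chunk size, queue size, and writer concurrency interact, consider the following asyncio sketch. It is a generic producer/consumer pattern under stated assumptions, not ddigraph's loader: run_pipeline and chunked are hypothetical names, the write is simulated, and the defaults for queue_maxsize and writer_concurrency are placeholders (only the chunk size default of 200 is documented).

```python
from __future__ import annotations

import asyncio
from itertools import islice


def chunked(records, chunk_size):
    """Yield lists of at most chunk_size records."""
    iterator = iter(records)
    while batch := list(islice(iterator, chunk_size)):
        yield batch


async def run_pipeline(records, chunk_size=200, queue_maxsize=4, writer_concurrency=2):
    queue: asyncio.Queue = asyncio.Queue(maxsize=queue_maxsize)
    written: list[int] = []  # size of each batch a writer handled

    async def writer():
        while True:
            batch = await queue.get()
            if batch is None:  # sentinel: no more work for this writer
                return
            await asyncio.sleep(0)  # stand-in for a database write
            written.append(len(batch))

    writers = [asyncio.create_task(writer()) for _ in range(writer_concurrency)]
    for batch in chunked(records, chunk_size):
        # put() waits while the queue is full, which is the back-pressure:
        # parsing cannot run more than queue_maxsize batches ahead of writing.
        await queue.put(batch)
    for _ in writers:
        await queue.put(None)
    await asyncio.gather(*writers)
    return written


sizes = asyncio.run(run_pipeline(range(1050), chunk_size=500))
print(sorted(sizes))  # [50, 500, 500]
```

A larger chunk size means fewer, heavier transactions; a smaller queue bounds memory at the cost of stalling the parser sooner when writers fall behind.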
Retry Settings
| CLI Flag | Environment Variable | Description |
|---|---|---|
| --write-retry-attempts | DDIGRAPH_WRITE_RETRY_ATTEMPTS | Total retry attempts |
| --write-retry-base-delay | DDIGRAPH_WRITE_RETRY_BASE_DELAY | Base backoff delay (seconds) |
| --write-retry-jitter | DDIGRAPH_WRITE_RETRY_JITTER | Max jitter (seconds) |
Logging
| CLI Flag | Environment Variable | Description |
|---|---|---|
| --log-level | DDIGRAPH_LOG_LEVEL | Logging verbosity (DEBUG, INFO, etc.) |
| --metrics-namespace | DDIGRAPH_METRICS_NAMESPACE | Metrics prefix |
TLS Configuration Examples
# AuraDB (encryption on; rely on platform/system CAs)
DDIGRAPH_NEO4J_URI=neo4j+s://<your-aura-host>:7687 \
ddigraph bootstrap --encrypted
# Self-signed certificate from a private Neo4j deployment
DDIGRAPH_ENCRYPTED=true \
DDIGRAPH_TRUSTED_CERTIFICATES_FILE=/etc/ssl/certs/private-ca.pem \
ddigraph load /path/to/codebook.xml --dataset-id demo
Retry Configuration
Tune retry behavior for different network conditions:
# Tighten retries for fast failure when the cluster is healthy
ddigraph load /path/to/codebook.xml --dataset-id demo \
--write-retry-attempts 2 --write-retry-base-delay 0.1 --write-retry-jitter 0
# Loosen retries to survive intermittent packet loss
DDIGRAPH_WRITE_RETRY_ATTEMPTS=5 \
DDIGRAPH_WRITE_RETRY_BASE_DELAY=1.0 \
DDIGRAPH_WRITE_RETRY_JITTER=0.5 \
ddigraph load /path/to/codebook.xml --dataset-id demo
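To get a feel for what these three numbers do to wait times, the sketch below computes a delay schedule. It assumes exponential doubling of the base delay plus uniform random jitter, which is a common scheme. The documentation names only a base delay and a maximum jitter, so the doubling, the one-delay-per-retry interpretation, and the function name backoff_delays are all assumptions.

```python
from __future__ import annotations

import random


def backoff_delays(retries: int, base_delay: float, jitter: float, rng=None) -> list[float]:
    """Return one wait (seconds) per retry: base * 2**n plus up to `jitter` extra."""
    rng = rng or random.Random()
    return [base_delay * (2 ** n) + rng.uniform(0.0, jitter) for n in range(retries)]


# With jitter disabled the schedule is deterministic.
print(backoff_delays(retries=3, base_delay=0.5, jitter=0.0))  # [0.5, 1.0, 2.0]

# With jitter, each wait lands somewhere in [base * 2**n, base * 2**n + jitter].
print(backoff_delays(retries=5, base_delay=1.0, jitter=0.5, rng=random.Random(0)))
```

Jitter spreads retries from concurrent writers so they do not all hit the server again at the same instant.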
Combined Examples
Copy/pasteable snippets for common operational setups:
# Hard cap transaction duration and retry with jitter
ddigraph load /path/to/codebook.xml --dataset-id demo \
--transaction-timeout 15 --write-retry-attempts 5 \
--write-retry-base-delay 0.5 --write-retry-jitter 0.25
# Batch-level observability with strict parsing
DDIGRAPH_BATCH_METRICS=true \
ddigraph load /path/to/codebook.xml --dataset-id demo \
--strict-parsing --chunk-size 500 --queue-maxsize 4
# Load DDI-L FragmentInstance with full schema bootstrap
ddigraph bootstrap
ddigraph load /path/to/questionnaire.xml --chunk-size 300
# Validate DDI-L file without writing
ddigraph load /path/to/questionnaire.xml --dry-run --json
Behavior Notes
- Format auto-detection: With --format auto (the default), the CLI inspects the XML root element to determine Codebook, FragmentInstance, or CDI format.
- Dataset ID validation: For Codebook format, --dataset-id is required. For FragmentInstance, it is optional (fragments are self-identifying).
- Dry-run and replace: When --dry-run is enabled, --replace is ignored (no data is modified).
- Strict vs. forgiving parsing: The default forgiving mode enables XML recovery to stream past malformed markup. Enable --strict-parsing to fail fast on syntax errors.
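The dataset ID and dry-run rules above can be expressed as a small validation step. This is a sketch under stated assumptions, not ddigraph code: LoadOptions and normalize are hypothetical names, and because the notes do not say whether CDI files need a dataset ID, the sketch enforces the requirement for Codebook only.

```python
from __future__ import annotations

from dataclasses import dataclass, replace as with_changes
from typing import Optional


@dataclass(frozen=True)
class LoadOptions:  # hypothetical container for the relevant flags
    fmt: str  # "codebook", "lifecycle", or "cdi" after detection
    dataset_id: Optional[str] = None
    dry_run: bool = False
    replace: bool = False


def normalize(options: LoadOptions) -> LoadOptions:
    """Apply the documented rules and return the effective options."""
    if options.fmt == "codebook" and not options.dataset_id:
        raise ValueError("--dataset-id is required for Codebook files")
    # A dry run never modifies data, so --replace has no effect.
    return with_changes(options, replace=options.replace and not options.dry_run)


effective = normalize(LoadOptions("codebook", "demo", dry_run=True, replace=True))
print(effective.replace)  # False
```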
See Architecture and DDI-L FragmentInstance for design context.