# DataForge CLI

A powerful CLI tool for converting and validating data formats (JSON, YAML, TOML) with schema validation using JSON Schema.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

## Features

- **Format Conversion**: Convert between JSON, YAML, and TOML formats seamlessly
- **Schema Validation**: Validate data against JSON Schema with detailed error messages
- **Type Checking**: Infer and validate data types with schema inference
- **Batch Processing**: Process multiple files at once with pattern matching
- **Quiet Mode**: Minimal output for scripting and automation

## Installation

### From PyPI (Recommended)

```bash
pip install dataforge-cli
```

### From Source

```bash
git clone https://7000pct.gitea.bloupla.net/7000pctAUTO/dataforge-cli.git
cd dataforge-cli
pip install .
```

### Development Installation

```bash
pip install -e ".[dev]"
```

## Quick Start

### Convert a file from JSON to YAML

```bash
dataforge convert config.json config.yaml --to yaml
```

### Validate a file against a JSON Schema

```bash
dataforge validate config.json --schema schema.json
```

### Infer a type schema from data

```bash
dataforge typecheck config.json --infer
```

### Batch convert all JSON files to YAML

```bash
dataforge batch-convert --to yaml --pattern "*.json"
```

### Validate all config files against a schema

```bash
dataforge batch-validate --schema schema.json --pattern "*.{json,yaml,yml,toml}"
```

## Commands

### convert

Convert a file from one format to another.

```bash
dataforge convert INPUT_FILE OUTPUT_FILE --to FORMAT [--from FORMAT] [--indent N] [--quiet]
```

**Options:**

- `INPUT_FILE`: Input file path (or `-` for stdin)
- `OUTPUT_FILE`: Output file path (or `-` for stdout)
- `--to, -t`: Output format (required: `json`, `yaml`, `toml`)
- `--from, -f`: Input format (auto-detected from the file extension if not specified)
- `--indent, -i`: Indentation spaces (default: 2; use 0 for compact output)
- `--quiet, -q`: Minimal output

**Examples:**

```bash
# Convert JSON to YAML
dataforge convert config.json config.yaml --to yaml

# Convert YAML to TOML with compact output
dataforge convert config.yaml config.toml --to toml --indent 0

# Convert from stdin to stdout
cat config.json | dataforge convert - - --to yaml
```

### batch-convert

Convert multiple files from one format to another.

```bash
dataforge batch-convert [--from FORMAT] --to FORMAT [--output-dir DIR] [--pattern GLOB] [--recursive] [--quiet]
```

**Options:**

- `--to, -t`: Output format (required)
- `--from, -f`: Input format
- `--output-dir, -o`: Output directory (default: `.`)
- `--pattern, -p`: File pattern for batch processing (default: `*.{json,yaml,yml,toml}`)
- `--recursive, -r`: Search recursively
- `--quiet, -q`: Minimal output

**Examples:**

```bash
# Convert all JSON files to YAML
dataforge batch-convert --to yaml --pattern "*.json"

# Recursively convert all files under the configs/ directory
dataforge batch-convert --to yaml --pattern "configs/*" --recursive

# Convert and save to an output directory
dataforge batch-convert --to toml --output-dir converted/
```

### validate

Validate a file against a JSON Schema.

```bash
dataforge validate INPUT_FILE [--schema SCHEMA_FILE] [--strict] [--quiet]
```

**Options:**

- `INPUT_FILE`: Input file path (or `-` for stdin)
- `--schema, -s`: Path to JSON Schema file
- `--strict`: Strict validation mode
- `--quiet, -q`: Minimal output

**Examples:**

```bash
# Validate with a schema
dataforge validate config.json --schema schema.json

# Validate without a schema (check format only)
dataforge validate config.json

# Validate from stdin
cat config.json | dataforge validate - --schema schema.json
```

### batch-validate

Validate multiple files against a JSON Schema.

```bash
dataforge batch-validate --schema SCHEMA_FILE [INPUT_FILES...] [--pattern GLOB] [--recursive] [--quiet]
```

**Options:**

- `--schema, -s`: Path to JSON Schema file (required)
- `--pattern, -p`: File pattern for batch processing (default: `*.{json,yaml,yml,toml}`)
- `--recursive, -r`: Search recursively
- `--quiet, -q`: Minimal output

**Examples:**

```bash
# Validate all config files
dataforge batch-validate --schema schema.json --pattern "*.{json,yaml,yml,toml}"

# Validate specific files
dataforge batch-validate --schema schema.json config1.json config2.yaml config3.toml

# Recursively validate all files
dataforge batch-validate --schema schema.json --recursive
```

### typecheck

Check types in a data file.

```bash
dataforge typecheck INPUT_FILE [--infer] [--quiet]
```

**Options:**

- `INPUT_FILE`: Input file path (or `-` for stdin)
- `--infer`: Infer a schema from the data and print it
- `--quiet, -q`: Minimal output

**Examples:**

```bash
# Check types in a file
dataforge typecheck config.json

# Infer and print a schema
dataforge typecheck config.json --infer

# Infer from stdin
cat config.json | dataforge typecheck - --infer
```

### Global Options

- `--help, -h`: Show help message
- `--version`: Show version

## Exit Codes

DataForge uses exit codes to indicate success or failure:

- `0`: Success
- `1`: Validation failed or an error occurred

This makes it suitable for use in CI/CD pipelines and scripts, as shown below.
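Because failures map to a non-zero exit status, shell scripts can branch on a `dataforge` command directly. A minimal sketch of a pre-deployment check (the file names `config.json` and `schema.json` are placeholders):

```bash
#!/usr/bin/env bash
set -euo pipefail

# `dataforge validate` exits 0 on success and 1 on failure,
# so it can be used directly as an `if` condition.
if dataforge validate config.json --schema schema.json --quiet; then
    echo "config.json is valid, proceeding with deployment"
else
    echo "config.json failed validation, aborting" >&2
    exit 1
fi
```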
## Configuration

### Environment Variables

No environment variables are required. All configuration is done via command-line options.

### Project Configuration

DataForge can be configured via `pyproject.toml`:

```toml
[tool.dataforge]
# Custom configuration options (future)
```

## JSON Schema Validation

DataForge supports JSON Schema validation with the following features:

- Draft-07 and Draft 2019-09 schemas
- Detailed error messages with path information
- Support for all JSON Schema keywords

Example schema:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "version": { "type": "string", "pattern": "^\\d+\\.\\d+\\.\\d+$" }
  },
  "required": ["name", "version"]
}
```
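To try the schema above, save it as `schema.json` and validate a matching document against it. A minimal sketch using only the documented `validate` command (file names are illustrative):

```bash
# Create a document that satisfies the example schema above
cat > app.json <<'EOF'
{
  "name": "my-app",
  "version": "1.2.3"
}
EOF

# Passes: both required keys are present and "version" matches the pattern
dataforge validate app.json --schema schema.json

# A missing "version" key, or a value like "1.2", would fail with exit code 1
```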
## Use Cases

### Configuration File Validation

Validate all configuration files before deployment:

```bash
dataforge batch-validate --schema config-schema.json --pattern "*.{json,yaml,yml,toml}" --recursive
```

### CI/CD Pipeline Integration

```bash
# Validate configs in CI
dataforge batch-validate --schema schema.json || exit 1
```

### Data Format Conversion

Convert legacy configuration files:

```bash
for file in legacy/*.json; do
    dataforge convert "$file" "${file%.json}.yaml" --to yaml
done
```

### API Response Validation

```bash
# Validate an API response against a schema
curl -s https://api.example.com/data | dataforge validate - --schema api-schema.json
```

## Development

### Setting Up a Development Environment

```bash
# Clone the repository
git clone https://7000pct.gitea.bloupla.net/7000pctAUTO/dataforge-cli.git
cd dataforge-cli

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"
```

### Running Tests

```bash
# Run all tests
pytest -v

# Run a specific test file
pytest tests/test_dataforge_parsers.py -v

# Run with coverage
pytest --cov=dataforge --cov-report=html
```

### Code Style

```bash
# Check code style (if configured)
ruff check .
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## Roadmap

- [ ] Support for XML and CSV formats
- [ ] Plugin system for custom validators
- [ ] Configuration file support
- [ ] Improved error reporting with colored output
- [ ] Integration with popular frameworks

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- [Click](https://click.palletsprojects.com/) for the CLI framework
- [PyYAML](https://pyyaml.org/) for YAML support
- [jsonschema](https://python-jsonschema.readthedocs.io/) for JSON Schema validation