# DataForge CLI
A powerful CLI tool for converting and validating data formats (JSON, YAML, TOML) with schema validation using JSON Schema.
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
## Features
- **Format Conversion**: Convert between JSON, YAML, and TOML formats seamlessly
- **Schema Validation**: Validate data against JSON Schema with detailed error messages
- **Type Checking**: Infer and validate data types with schema inference
- **Batch Processing**: Process multiple files at once with pattern matching
- **Quiet Mode**: Minimal output for scripting and automation
## Installation
### From PyPI (Recommended)
```bash
pip install dataforge-cli
```
### From Source
```bash
git clone https://7000pct.gitea.bloupla.net/7000pctAUTO/dataforge-cli.git
cd dataforge-cli
pip install -e .
```
### Development Installation
Install with development dependencies (tests, linting):
```bash
pip install -e ".[dev]"
```
## Quick Start
### Convert a file from JSON to YAML
```bash
dataforge convert config.json config.yaml --to yaml
```
### Validate a file against a JSON Schema
```bash
dataforge validate config.json --schema schema.json
```
### Infer type schema from data
```bash
dataforge typecheck config.json --infer
```
### Batch convert all JSON files to YAML
```bash
dataforge batch-convert --to yaml --pattern "*.json"
```
### Validate all config files against a schema
```bash
dataforge batch-validate --schema schema.json --pattern "*.{json,yaml,yml,toml}"
```
## Commands
### convert
Convert a file from one format to another.
```bash
dataforge convert INPUT_FILE OUTPUT_FILE --to FORMAT [--from FORMAT] [--indent N] [--quiet]
```
**Options:**
- `INPUT_FILE`: Input file path (or `-` for stdin)
- `OUTPUT_FILE`: Output file path (or `-` for stdout)
- `--to, -t`: Output format (required: `json`, `yaml`, `toml`)
- `--from, -f`: Input format (auto-detected from extension if not specified)
- `--indent, -i`: Indentation spaces (default: 2, use 0 for compact)
- `--quiet, -q`: Minimal output
**Examples:**
```bash
# Convert JSON to YAML
dataforge convert config.json config.yaml --to yaml
# Convert YAML to TOML with compact output
dataforge convert config.yaml config.toml --to toml --indent 0
# Convert from stdin to stdout
cat config.json | dataforge convert - config.yaml --to yaml
```
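When `--from` is omitted, the input format is auto-detected from the file extension. A minimal sketch of how such detection might work (hypothetical; the CLI's actual logic may differ):

```python
from pathlib import Path

# Map file extensions to format names; unknown extensions need an explicit --from.
EXTENSION_MAP = {".json": "json", ".yaml": "yaml", ".yml": "yaml", ".toml": "toml"}

def detect_format(path: str) -> str:
    """Guess the data format from a file path's extension."""
    ext = Path(path).suffix.lower()
    if ext not in EXTENSION_MAP:
        raise ValueError(f"cannot detect format for {path!r}; pass --from explicitly")
    return EXTENSION_MAP[ext]
```

Note that stdin (`-`) has no extension, which is why `--from` (or a format implied by `--to` conversion context) is needed when piping.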
### batch-convert
Convert multiple files from one format to another.
```bash
dataforge batch-convert [--from FORMAT] --to FORMAT [--output-dir DIR] [--pattern GLOB] [--recursive] [--quiet]
```
**Options:**
- `--to, -t`: Output format (required)
- `--from, -f`: Input format
- `--output-dir, -o`: Output directory (default: `.`)
- `--pattern, -p`: File pattern for batch processing (default: `*.{json,yaml,yml,toml}`)
- `--recursive, -r`: Search recursively
- `--quiet, -q`: Minimal output
**Examples:**
```bash
# Convert all JSON files to YAML
dataforge batch-convert --to yaml --pattern "*.json"
# Convert all files under configs/ recursively
dataforge batch-convert --to yaml --pattern "configs/*" --recursive
# Convert and save to output directory
dataforge batch-convert --to toml --output-dir converted/
```
### validate
Validate a file against a JSON Schema.
```bash
dataforge validate INPUT_FILE [--schema SCHEMA_FILE] [--strict] [--quiet]
```
**Options:**
- `INPUT_FILE`: Input file path (or `-` for stdin)
- `--schema, -s`: Path to JSON Schema file
- `--strict`: Strict validation mode
- `--quiet, -q`: Minimal output
**Examples:**
```bash
# Validate with schema
dataforge validate config.json --schema schema.json
# Validate without schema (check format only)
dataforge validate config.json
# Validate from stdin
cat config.json | dataforge validate - --schema schema.json
```
### batch-validate
Validate multiple files against a JSON Schema.
```bash
dataforge batch-validate --schema SCHEMA_FILE [INPUT_FILES...] [--pattern GLOB] [--recursive] [--quiet]
```
**Options:**
- `--schema, -s`: Path to JSON Schema file (required)
- `--pattern, -p`: File pattern for batch processing (default: `*.{json,yaml,yml,toml}`)
- `--recursive, -r`: Search recursively
- `--quiet, -q`: Minimal output
**Examples:**
```bash
# Validate all config files
dataforge batch-validate --schema schema.json --pattern "*.{json,yaml,yml,toml}"
# Validate specific files
dataforge batch-validate --schema schema.json config1.json config2.yaml config3.toml
# Recursively validate all files
dataforge batch-validate --schema schema.json --recursive
```
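The default pattern `*.{json,yaml,yml,toml}` uses brace alternation, which plain glob syntax does not support. A hypothetical sketch of how such a pattern could be expanded and matched (the real implementation may differ):

```python
import fnmatch

def expand_braces(pattern):
    """Expand one {a,b,c} group into a list of plain glob patterns."""
    start = pattern.find("{")
    if start == -1:
        return [pattern]
    end = pattern.find("}", start)
    head, alts, tail = pattern[:start], pattern[start + 1:end], pattern[end + 1:]
    return [head + alt + tail for alt in alts.split(",")]

def matches(filename, pattern):
    """True if the filename matches any expansion of the brace pattern."""
    return any(fnmatch.fnmatch(filename, p) for p in expand_braces(pattern))
```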
### typecheck
Check types in a data file.
```bash
dataforge typecheck INPUT_FILE [--infer] [--quiet]
```
**Options:**
- `INPUT_FILE`: Input file path (or `-` for stdin)
- `--infer`: Infer schema from data and print it
- `--quiet, -q`: Minimal output
**Examples:**
```bash
# Check types in file
dataforge typecheck config.json
# Infer and print schema
dataforge typecheck config.json --infer
# Infer from stdin
cat config.json | dataforge typecheck - --infer
```
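Schema inference walks the parsed data and maps each value to a JSON Schema type. A rough sketch of the idea (hypothetical; the actual output of `--infer` may differ in shape and detail):

```python
def infer_schema(value):
    """Infer a minimal JSON Schema fragment from a parsed data value."""
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Simplification: infer item type from the first element only.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"unsupported type: {type(value).__name__}")
```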
### Global Options
- `--help, -h`: Show help message
- `--version`: Show version
## Exit Codes
DataForge uses exit codes to indicate success or failure:
- `0`: Success
- `1`: Validation failed or error occurred
This makes it suitable for use in CI/CD pipelines and scripts.
## Configuration
### Environment Variables
No environment variables are required. All configuration is done via command-line options.
### Project Configuration
DataForge can be configured via `pyproject.toml`:
```toml
[tool.dataforge]
# Custom configuration options (future)
```
## JSON Schema Validation
DataForge supports JSON Schema validation with the following features:
- Draft-07 and Draft 2019-09 schemas
- Detailed error messages with path information
- Support for all JSON Schema keywords
Example schema:
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "version": {
      "type": "string",
      "pattern": "^\\d+\\.\\d+\\.\\d+$"
    }
  },
  "required": ["name", "version"]
}
```
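To make the example concrete, here is a hand-rolled check mirroring what that schema enforces, using only the standard library (DataForge itself relies on the `jsonschema` package for full Draft-07 support):

```python
import re

def check(data):
    """Return a list of error messages for the example schema above (empty = valid)."""
    errors = []
    for key in ("name", "version"):
        if key not in data:
            errors.append(f"missing required property: {key}")
    if "name" in data and not isinstance(data["name"], str):
        errors.append("name: expected string")
    if "version" in data:
        if not isinstance(data["version"], str):
            errors.append("version: expected string")
        elif not re.match(r"^\d+\.\d+\.\d+$", data["version"]):
            errors.append("version: does not match ^\\d+\\.\\d+\\.\\d+$")
    return errors
```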
## Use Cases
### Configuration File Validation
Validate all configuration files before deployment:
```bash
dataforge batch-validate --schema config-schema.json --pattern "*.{json,yaml,yml,toml}" --recursive
```
### CI/CD Pipeline Integration
```bash
# Validate configs in CI
dataforge batch-validate --schema schema.json || exit 1
```
### Data Format Conversion
Convert legacy configuration files:
```bash
for file in legacy/*.json; do
dataforge convert "$file" "${file%.json}.yaml" --to yaml
done
```
### API Response Validation
```bash
# Validate API response schema
curl -s https://api.example.com/data | dataforge validate - --schema api-schema.json
```
## Development
### Setting Up Development Environment
```bash
# Clone the repository
git clone https://7000pct.gitea.bloupla.net/7000pctAUTO/dataforge-cli.git
cd dataforge-cli
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest -v
# Run tests with coverage
pytest --cov=dataforge
```
### Running Tests
```bash
# Run all tests
pytest -v
# Run specific test file
pytest tests/test_dataforge_parsers.py -v
# Run with coverage
pytest --cov=dataforge --cov-report=html
```
### Code Style
```bash
# Check code style (if configured)
ruff check .
```
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Roadmap
- [ ] Support for XML and CSV formats
- [ ] Plugin system for custom validators
- [ ] Configuration file support
- [ ] Improved error reporting with colored output
- [ ] Integration with popular frameworks
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- [Click](https://click.palletsprojects.com/) for the CLI framework
- [PyYAML](https://pyyaml.org/) for YAML support
- [jsonschema](https://python-jsonschema.readthedocs.io/) for JSON Schema validation