diff --git a/README.md b/README.md
index 1dc1631..737d553 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,307 @@
-# code-privacy-shield
+# Code Privacy Shield
 
-A CLI tool that analyzes code before sending to AI coding assistants and automatically redacts sensitive data like API keys, PII, database connection strings, and environment variables.
\ No newline at end of file
+A CLI tool that analyzes code before it is sent to AI coding assistants and automatically redacts sensitive data such as API keys, PII, database connection strings, and environment variables. It preserves code structure while protecting secrets, using regex patterns and heuristics to identify and mask sensitive information.
+
+## Features
+
+- **API Key Detection**: Automatically detect and redact API keys, tokens, and secrets for common services (OpenAI, GitHub, AWS, Stripe, etc.)
+- **PII Detection**: Mask personally identifiable information including emails, phone numbers, SSNs, and credit cards
+- **Environment Variable Redaction**: Detect and redact environment variable patterns (`os.environ`, `os.getenv`, `.env` files)
+- **Database Connection String Redaction**: Identify and mask database connection strings for PostgreSQL, MySQL, MongoDB, Redis, and more
+- **Custom Redaction Rules**: Support user-defined redaction rules via configuration files
+- **Preview Mode**: Show what will be redacted without modifying files
+- **Code Structure Preservation**: Maintain code syntax and line numbers using position-based replacement
+- **Multiple File Support**: Process multiple files and directories with glob patterns
+- **Integration Hooks**: Pre-commit hooks and wrapper scripts for AI coding tools
+
+## Installation
+
+### From Source
+
+```bash
+# Clone the repository
+git clone https://github.com/yourusername/code-privacy-shield.git
+cd code-privacy-shield
+
+# Install in development mode
+pip install -e .
+
+# Or install with dev dependencies
+pip install -e ".[dev]"
+```
+
+### From PyPI (coming soon)
+
+```bash
+pip install code-privacy-shield
+```
+
+## Usage
+
+### Basic Redaction
+
+```bash
+# Redact a single file
+cps redact myfile.py
+
+# Redact multiple files
+cps redact file1.py file2.py file3.py
+
+# Redact a directory recursively
+cps redact my_project/
+```
+
+### Preview Mode
+
+```bash
+# Preview what will be redacted without modifying files
+cps redact --preview myfile.py
+
+# Or use the preview command directly
+cps preview myfile.py
+```
+
+### In-Place Editing
+
+```bash
+# Edit files in place (use with caution!)
+cps redact --in-place myfile.py
+```
+
+### Stdin/Stdout
+
+```bash
+# Pipe code through stdin
+echo "api_key = 'sk-abc123'" | cps redact
+
+# Redirect output to a file
+cps redact myfile.py > redacted.py
+```
+
+### Check for Sensitive Data
+
+```bash
+# Check if a file contains sensitive data (exits 0 if clean, 1 if secrets found)
+cps check myfile.py
+```
+
+### Configuration
+
+```bash
+# Initialize an example config file
+cps init-config ~/.config/cps/config.toml
+
+# Show configuration locations
+cps config-locations
+
+# View current configuration
+cps config show
+
+# Use a custom config file
+cps --config /path/to/config.toml redact myfile.py
+```
+
+## Configuration File
+
+Code Privacy Shield looks for configuration files in the following order:
+
+1. `.cps.toml` (project directory)
+2. `~/.config/cps/config.toml` (user config)
+3. Command line `--config` option
+
+### Example Configuration
+
+```toml
+# Exclude patterns (a top-level key, so it must come before the first [table] header)
+exclude_patterns = [
+    "*.pyc",
+    "__pycache__",
+    ".git",
+    ".svn",
+    "node_modules",
+    ".env",
+    "dist",
+    "build",
+]
+
+[general]
+preview_mode = false
+quiet_mode = false
+preserve_structure = true
+recursive = true
+
+[redaction]
+default_replacement = "████████"
+preserve_length = false
+
+[redaction.categories]
+api_keys = true
+pii = true
+database = true
+env_var = true
+ip = true
+authorization = true
+
+# Custom patterns
+[[custom_patterns]]
+name = "Internal API Key"
+pattern = '''(?i)(internal[_-]?api[_-]?key['"]?\s*[:=]\s*['"]?)([a-zA-Z0-9_-]{16,})'''
+category = "internal"
+
+[output]
+format = "text"
+show_line_numbers = false
+color_output = true
+```
+
+## Output Formats
+
+### Text (default)
+
+```bash
+cps redact --preview myfile.py
+# [api_keys] OpenAI API Key: sk-abc123def456 -> ████████f456
+# [pii] Email Address: user@example.com -> ████████.com
+```
+
+### JSON
+
+```bash
+cps redact --preview --format json myfile.py
+# {
+#   "matches": [
+#     {
+#       "category": "api_keys",
+#       "name": "OpenAI API Key",
+#       "original": "sk-abc123def456",
+#       "replacement": "████████f456"
+#     }
+#   ],
+#   "total_matches": 1,
+#   "categories": ["api_keys"]
+# }
+```
+
+## Integration with AI Coding Tools
+
+### Pre-commit Hook
+
+Add the provided pre-commit hook to your repository:
+
+```bash
+cp examples/pre-commit-hook.sh .git/hooks/pre-commit
+chmod +x .git/hooks/pre-commit
+```
+
+### Using with Claude Code
+
+```bash
+cat > /usr/local/bin/claude-safe << 'EOF'  # wrapper: redact the file in $1, then pipe it to Claude Code
+#!/bin/bash
+f="$1"; shift; cps redact "$f" | claude "$@"
+EOF
+chmod +x /usr/local/bin/claude-safe
+```
+
+### Using with Continue.dev
+
+Configure in your `~/.continue/config.py`:
+
+```python
+# Add CPS as a preprocessing step
+import subprocess
+
+def preprocess_code(code: str) -> str:
+    result = subprocess.run(
+        ["cps", "redact"],  # no file argument: cps reads the code from stdin
+        input=code,
+        capture_output=True,
+        text=True,
+    )
+    return result.stdout
+```
+
+## Supported Patterns
+
+### API Keys & Tokens
+- OpenAI API Keys (`sk-...`)
+- GitHub Tokens (`ghp_...`, `gho_...`)
+- AWS Access Keys
+- Stripe Keys
+- Slack Tokens
+- SendGrid Keys
+- Twilio Keys
+- And more...
+
+### Personally Identifiable Information
+- Email addresses
+- Phone numbers
+- Social Security Numbers
+- Credit card numbers
+- Full names
+- Addresses
+- Passwords
+- Usernames
+
+### Database Connections
+- PostgreSQL (`postgresql://...`)
+- MySQL (`mysql://...`)
+- MongoDB (`mongodb://...`)
+- Redis (`redis://...`)
+- SQLite
+- SQL Server
+- And more...
+
+### Environment Variables
+- `os.environ` access
+- `os.getenv` calls
+- Shell export statements
+- `.env` file contents
+
+### Authorization Headers
+- Bearer tokens
+- Basic auth
+- API key headers
+- Custom authorization headers
+
+## Development
+
+### Running Tests
+
+```bash
+# Run all tests
+pytest tests/ -v
+
+# Run with coverage
+pytest tests/ --cov=src
+
+# Run specific test file
+pytest tests/test_patterns.py -v
+```
+
+### Adding Custom Patterns
+
+To add custom patterns, add entries to a configuration file (use a TOML literal string such as `'''...'''` so regex escapes like `\s` are not parsed as TOML escapes):
+
+```toml
+[[custom_patterns]]
+name = "My Custom Secret"
+pattern = '''(?i)(mysecret['"]?\s*[:=]\s*['"]?)([a-zA-Z0-9]{16,})'''
+category = "custom"
+```
+
+## License
+
+MIT License - see LICENSE file for details.
+
+## Contributing
+
+1. Fork the repository
+2. Create a feature branch
+3. Add tests for your changes
+4. Ensure all tests pass
+5. Submit a pull request
+
+## Security
+
+If you discover a security vulnerability, please contact the maintainers directly instead of opening a public issue; do not disclose security issues publicly.
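
The README's **Code Structure Preservation** feature says redaction works by position-based replacement. The Python sketch below illustrates that idea under stated assumptions; it is not the tool's actual implementation. The two regexes, the `Finding` type, and the `mask`, `find_matches`, and `redact` helpers are all hypothetical, and the mask rule (eight `█` characters followed by the last four characters of the secret) is an assumption inferred from the sample preview line `user@example.com -> ████████.com`.

```python
import re
from dataclasses import dataclass

# Hypothetical rules for illustration only; the real tool's patterns will differ.
PATTERNS = {
    "OpenAI API Key": ("api_keys", re.compile(r"sk-[A-Za-z0-9]{12,}")),
    "Email Address": ("pii", re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")),
}


@dataclass
class Finding:
    category: str
    name: str
    start: int  # offset of the first matched character
    end: int  # offset one past the last matched character
    original: str
    replacement: str


def mask(secret: str, keep: int = 4, fill: str = "█" * 8) -> str:
    """Assumed mask rule: a fixed fill plus the last `keep` characters."""
    return fill + secret[-keep:] if len(secret) > keep else fill


def find_matches(code: str) -> list[Finding]:
    found = []
    for name, (category, regex) in PATTERNS.items():
        for m in regex.finditer(code):
            found.append(Finding(category, name, m.start(), m.end(),
                                 m.group(0), mask(m.group(0))))
    # Order by position (longest first on ties) and drop overlapping matches,
    # so each character is redacted at most once.
    found.sort(key=lambda f: (f.start, -(f.end - f.start)))
    kept, last_end = [], 0
    for f in found:
        if f.start >= last_end:
            kept.append(f)
            last_end = f.end
    return kept


def redact(code: str) -> str:
    # Splice from the end of the string so earlier offsets stay valid.
    out = code
    for f in reversed(find_matches(code)):
        out = out[:f.start] + f.replacement + out[f.end:]
    return out


if __name__ == "__main__":
    sample = "api_key = 'sk-abc123def456'\nemail = 'user@example.com'\n"
    print(redact(sample))
```

Applying replacements from the end means no earlier match offset shifts after a splice. Because neither hypothetical pattern can match a newline and each replacement is a single-line string, the redacted output keeps the same line count as the input.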