feat: initial commit for regex-humanizer-cli

- Add regex parser, translator, and test generator
- Add CLI with explain, test, interactive commands
- Add multi-flavor support (PCRE, JavaScript, Python)
- Add Gitea Actions CI workflow
- Add comprehensive README documentation
CI Bot
2026-02-06 03:02:57 +00:00
commit 52e792305b
12 changed files with 2413 additions and 0 deletions


@@ -0,0 +1,38 @@
name: CI
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install -e .
          python -m pip install pytest pytest-cov ruff
      - name: Run tests
        run: python -m pytest tests/ -v --tb=short
      - name: Run linting
        run: python -m ruff check regex_humanizer/
      - name: Run type checking
        run: python -m pip install mypy && python -m mypy regex_humanizer/ --ignore-missing-imports

README.md

@@ -0,0 +1,171 @@
# Regex Humanizer CLI
A CLI tool that converts complex regex patterns to human-readable English descriptions and generates comprehensive test cases.
## Features
- **Regex to English Translation**: Convert any regex pattern to plain English
- **Test Case Generation**: Auto-generate matching and non-matching test inputs
- **Multi-Flavor Support**: Supports PCRE, JavaScript, and Python regex flavors
- **Interactive Mode**: REPL-style interface for exploring regex patterns
- **Pattern Validation**: Validate regex patterns for different flavors
- **Flavor Conversion**: Convert patterns between different regex flavors
## Installation
```bash
pip install regex-humanizer-cli
```
Or from source:
```bash
pip install -e .
```
## Quick Start
### Explain a regex pattern
```bash
regex-humanizer explain "^\d{3}-\d{4}$"
```
Output:
```
Pattern: ^\d{3}-\d{4}$
Flavor: pcre
English Explanation:
--------------------------------------------------
at the start of line or string, any digit (0-9), any digit (0-9), any digit (0-9), hyphen, any digit (0-9), any digit (0-9), any digit (0-9), any digit (0-9), at the end of line or string
```
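Independent of the tool, the explained behavior of this pattern can be verified with Python's built-in `re` module:

```python
import re

# The phone-extension pattern from the example above.
pattern = re.compile(r"^\d{3}-\d{4}$")

# Three digits, a hyphen, four digits: matches.
print(bool(pattern.match("555-0123")))   # True
# Wrong digit counts fail the anchored pattern.
print(bool(pattern.match("55-0123")))    # False
print(bool(pattern.match("555-01234")))  # False
```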
### Generate test cases
```bash
regex-humanizer test "^[a-z]+$"
```
Output:
```
Pattern: ^[a-z]+$
Flavor: pcre
Matching strings (should match the pattern):
--------------------------------------------------
1. abc
2. hello
3. world
Non-matching strings (should NOT match the pattern):
--------------------------------------------------
1. 123
2. Hello
3. test123
```
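The generated cases are directly checkable: each "matching" string should satisfy the pattern and each "non-matching" string should fail it, which a few lines of stdlib `re` confirm:

```python
import re

pattern = re.compile(r"^[a-z]+$")

matching = ["abc", "hello", "world"]
non_matching = ["123", "Hello", "test123"]

# Every generated "matching" string satisfies the pattern...
assert all(pattern.match(s) for s in matching)
# ...and every "non-matching" string fails it.
assert not any(pattern.match(s) for s in non_matching)
print("all generated cases behave as labelled")
```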
### Interactive mode
```bash
regex-humanizer interactive
```
## Commands
### explain
Explain a regex pattern in human-readable English:
```bash
regex-humanizer explain "PATTERN" [OPTIONS]
```
Options:
- `--output, -o`: Output format (text/json, default: text)
- `--verbose, -v`: Show detailed breakdown
- `--flavor, -f`: Regex flavor (pcre/javascript/python)
### test
Generate test cases for a regex pattern:
```bash
regex-humanizer test "PATTERN" [OPTIONS]
```
Options:
- `--output, -o`: Output format (text/json, default: text)
- `--count, -n`: Number of test cases (default: 5)
### interactive
Start an interactive REPL for exploring regex patterns:
```bash
regex-humanizer interactive [OPTIONS]
```
Options:
- `--flavor, -f`: Default regex flavor
### flavors
List available regex flavors:
```bash
regex-humanizer flavors
```
### validate
Validate a regex pattern:
```bash
regex-humanizer validate "PATTERN" [OPTIONS]
```
Options:
- `--flavor, -f`: Specific flavor to validate against
### convert
Convert a regex pattern between flavors:
```bash
regex-humanizer convert "PATTERN" --from-flavor pcre --to-flavor javascript
```
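At its core, this kind of conversion is a syntax rewrite. A minimal sketch of the named-group rewrite (an illustration, not the tool's actual implementation) looks like this:

```python
import re

def pcre_to_js_named_groups(pattern: str) -> str:
    """Rewrite Python/PCRE (?P<name>...) groups to JavaScript's (?<name>...)."""
    return pattern.replace("(?P<", "(?<")

def js_to_python_named_groups(pattern: str) -> str:
    """Rewrite (?<name>...) to (?P<name>...), leaving lookbehinds (?<= and (?<! alone."""
    return re.sub(r"\(\?<([A-Za-z_][A-Za-z0-9_]*)>", r"(?P<\1>", pattern)

print(pcre_to_js_named_groups(r"(?P<area>\d{3})-(?P<line>\d{4})"))
# (?<area>\d{3})-(?<line>\d{4})
print(js_to_python_named_groups(r"(?<=USD)(?<amount>\d+)"))
# (?<=USD)(?P<amount>\d+)
```

Note the character after `(?<` must be a name character, which keeps lookbehind assertions untouched.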
## Flavor Support
| Feature | PCRE | JavaScript | Python |
|---------|------|------------|--------|
| Lookahead | ✅ | ✅ | ✅ |
| Lookbehind | ✅ | ⚠️ Limited | ✅ |
| Named Groups | ✅ | ✅ | ✅ |
| Possessive Quantifiers | ✅ | ❌ | ❌ |
| Atomic Groups | ✅ | ❌ | ❌ |
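The rows marked ✅ for Python can be exercised directly with the standard library's `re` module (note that Python 3.11 added possessive quantifiers and atomic groups to `re`, so those ❌ entries apply to older interpreter versions):

```python
import re

# Lookbehind, named group, and lookahead combined in one pattern.
m = re.search(r"(?<=\$)(?P<dollars>\d+)(?=\.\d{2})", "total: $42.99")
assert m is not None
print(m.group("dollars"))  # 42
```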
## Configuration
No configuration file required. All options can be passed via command line.
## Development
```bash
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run linting
ruff check regex_humanizer/
# Run type checking
mypy regex_humanizer/ --ignore-missing-imports
```
## License
MIT License

pyproject.toml

@@ -0,0 +1,60 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "regex-humanizer-cli"
version = "1.0.0"
description = "A CLI tool that converts complex regex patterns to human-readable English descriptions and generates comprehensive test cases"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.9"
authors = [
{name = "Regex Humanizer Contributors"}
]
keywords = ["regex", "regular-expression", "cli", "humanizer", "testing"]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
dependencies = [
"click>=8.0",
"regex>=2023.0",
"parsimonious>=0.10.0",
"pygments>=2.15",
]
[project.optional-dependencies]
dev = [
"pytest>=7.0",
"pytest-cov>=4.0",
"black>=23.0",
"ruff>=0.1.0",
"mypy>=1.0",
]
[project.scripts]
regex-humanizer = "regex_humanizer.cli:main"
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
addopts = "-v --tb=short"
[tool.black]
line-length = 100
target-version = ['py39']
[tool.ruff]
line-length = 100
target-version = "py39"
[tool.setuptools.packages.find]
where = ["."]
include = ["regex_humanizer*"]


@@ -0,0 +1,3 @@
"""Regex Humanizer CLI - Convert regex patterns to human-readable English."""
__version__ = "1.0.0"

regex_humanizer/cli.py

@@ -0,0 +1,280 @@
"""Command-line interface for Regex Humanizer."""
import json
import sys
import click
from .parser import parse_regex
from .translator import translate_regex
from .test_generator import generate_test_cases
from .flavors import get_flavor_manager
from .interactive import start_interactive_mode
@click.group()
@click.option(
"--flavor",
type=click.Choice(["pcre", "javascript", "python"]),
default="pcre",
help="Regex flavor to use",
)
@click.pass_context
def main(ctx: click.Context, flavor: str):
"""Regex Humanizer CLI - Convert regex patterns to human-readable English and generate test cases."""
ctx.ensure_object(dict)
ctx.obj["flavor"] = flavor
@main.command("explain")
@click.argument("pattern", type=str)
@click.option(
"--output",
"-o",
type=click.Choice(["text", "json"]),
default="text",
help="Output format",
)
@click.option(
"--verbose",
"-v",
is_flag=True,
help="Show detailed breakdown",
)
@click.option(
"--flavor",
"-f",
type=click.Choice(["pcre", "javascript", "python"]),
default=None,
help="Regex flavor to use",
)
@click.pass_context
def explain(ctx: click.Context, pattern: str, output: str, verbose: bool, flavor: str):
"""Explain a regex pattern in human-readable English."""
if ctx.obj is None:
ctx.obj = {}
flavor = flavor or ctx.obj.get("flavor", "pcre")
try:
ast = parse_regex(pattern, flavor)
translation = translate_regex(pattern, flavor)
if output == "json":
result = {
"pattern": pattern,
"flavor": flavor,
"explanation": translation,
"verbose": {
"node_count": len(get_all_nodes(ast)),
"features": identify_features(ast),
} if verbose else None,
}
click.echo(json.dumps(result, indent=2))
else:
click.echo(f"\nPattern: {pattern}")
click.echo(f"Flavor: {flavor}")
click.echo("\nEnglish Explanation:")
click.echo("-" * 50)
click.echo(translation)
click.echo()
if verbose:
features = identify_features(ast)
click.echo("\nFeatures detected:")
for feature in features:
click.echo(f" - {feature}")
except Exception as e:
click.echo(f"Error: {e}", err=True)
sys.exit(1)
@main.command("test")
@click.argument("pattern", type=str)
@click.option(
"--output",
"-o",
type=click.Choice(["text", "json"]),
default="text",
help="Output format",
)
@click.option(
"--count",
"-n",
type=int,
default=5,
help="Number of test cases to generate",
)
@click.pass_context
def test(ctx: click.Context, pattern: str, output: str, count: int):
"""Generate test cases (matching and non-matching) for a regex pattern."""
if ctx.obj is None:
ctx.obj = {}
flavor = ctx.obj.get("flavor", "pcre")
try:
result = generate_test_cases(
pattern,
flavor,
matching_count=count,
non_matching_count=count
)
if output == "json":
click.echo(json.dumps(result, indent=2))
else:
click.echo(f"\nPattern: {pattern}")
click.echo(f"Flavor: {flavor}")
click.echo("\nMatching strings (should match the pattern):")
click.echo("-" * 50)
for i, s in enumerate(result["matching"], 1):
click.echo(f" {i}. {s}")
click.echo("\nNon-matching strings (should NOT match the pattern):")
click.echo("-" * 50)
for i, s in enumerate(result["non_matching"], 1):
click.echo(f" {i}. {s}")
click.echo()
except Exception as e:
click.echo(f"Error: {e}", err=True)
sys.exit(1)
@main.command("interactive")
@click.option(
"--flavor",
"-f",
type=click.Choice(["pcre", "javascript", "python"]),
default="pcre",
help="Regex flavor to use",
)
@click.pass_context
def interactive(ctx: click.Context, flavor: str):
"""Start an interactive REPL for exploring regex patterns."""
start_interactive_mode(flavor=flavor)
@main.command("flavors")
@click.pass_context
def flavors(ctx: click.Context):
"""List available regex flavors."""
manager = get_flavor_manager()
flavor_list = manager.list_flavors()
click.echo("\nAvailable Regex Flavors:")
click.echo("-" * 50)
for name, desc in flavor_list:
click.echo(f"\n {name}:")
click.echo(f" {desc}")
click.echo()
@main.command("validate")
@click.argument("pattern", type=str)
@click.option(
"--flavor",
"-f",
type=click.Choice(["pcre", "javascript", "python"]),
default=None,
help="Specific flavor to validate against",
)
@click.pass_context
def validate(ctx: click.Context, pattern: str, flavor: str):
"""Validate a regex pattern."""
if ctx.obj is None:
ctx.obj = {}
check_flavor = flavor or ctx.obj.get("flavor", "pcre")
try:
ast = parse_regex(pattern, check_flavor)
click.echo(f"\nPattern: {pattern}")
click.echo(f"Flavor: {check_flavor}")
click.echo("\nValidation: PASSED")
click.echo(f"AST node count: {len(get_all_nodes(ast))}")
except Exception as e:
click.echo(f"\nPattern: {pattern}")
click.echo("Validation: FAILED")
click.echo(f"Error: {e}")
sys.exit(1)
@main.command("convert")
@click.argument("pattern", type=str)
@click.option(
"--from-flavor",
"-s",
type=click.Choice(["pcre", "javascript", "python"]),
default="pcre",
help="Source flavor",
)
@click.option(
"--to-flavor",
"-t",
type=click.Choice(["pcre", "javascript", "python"]),
default="javascript",
help="Target flavor",
)
@click.pass_context
def convert(ctx: click.Context, pattern: str, from_flavor: str, to_flavor: str):
"""Convert a regex pattern between flavors."""
manager = get_flavor_manager()
converted, warnings = manager.convert(pattern, from_flavor, to_flavor)
click.echo(f"\nOriginal ({from_flavor}): {pattern}")
click.echo(f"Converted ({to_flavor}): {converted}")
if warnings:
click.echo("\nWarnings:")
for warning in warnings:
click.echo(f" - {warning}")
def get_all_nodes(ast) -> list:
"""Get all nodes from AST."""
nodes = [ast]
for child in getattr(ast, 'children', []):
nodes.extend(get_all_nodes(child))
return nodes
def identify_features(ast) -> list[str]:
"""Identify features in a regex pattern."""
features = []
nodes = get_all_nodes(ast)
node_types = set(n.node_type.name for n in nodes)
if "LOOKAHEAD" in node_types or "NEGATIVE_LOOKAHEAD" in node_types:
features.append("Lookahead assertions")
if "LOOKBEHIND" in node_types or "NEGATIVE_LOOKBEHIND" in node_types:
features.append("Lookbehind assertions")
if "NAMED_GROUP" in node_types:
features.append("Named groups")
if "CAPTURING_GROUP" in node_types:
features.append("Capturing groups")
if "NON_CAPTURING_GROUP" in node_types:
features.append("Non-capturing groups")
if "QUANTIFIER" in node_types:
features.append("Quantifiers")
if any(n.node_type.name == "QUANTIFIER" and n.is_lazy for n in nodes):
features.append("Lazy quantifiers")
if any(n.node_type.name == "QUANTIFIER" and n.is_possessive for n in nodes):
features.append("Possessive quantifiers")
if "POSITIVE_SET" in node_types or "NEGATIVE_SET" in node_types:
features.append("Character classes")
if "ANCHOR_START" in node_types or "ANCHOR_END" in node_types:
features.append("Anchors")
if "DIGIT" in node_types or "WORD_CHAR" in node_types or "WHITESPACE" in node_types:
features.append("Shorthand character classes")
if "BACKREFERENCE" in node_types:
features.append("Backreferences")
return features
if __name__ == "__main__":
main()
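`get_all_nodes` above is a plain pre-order traversal. Its behavior can be illustrated with a minimal stand-in node type (the `Node` class here is hypothetical, not the package's real `RegexNode`):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal stand-in for the parser's RegexNode (illustration only)."""
    name: str
    children: list = field(default_factory=list)

def get_all_nodes(ast) -> list:
    """Pre-order traversal: the node itself, then all descendants."""
    nodes = [ast]
    for child in getattr(ast, "children", []):
        nodes.extend(get_all_nodes(child))
    return nodes

tree = Node("sequence", [Node("group", [Node("literal")]), Node("quantifier")])
print([n.name for n in get_all_nodes(tree)])
# ['sequence', 'group', 'literal', 'quantifier']
```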

regex_humanizer/flavors.py

@@ -0,0 +1,207 @@
"""Flavor support system for different regex flavors."""
from abc import ABC, abstractmethod
from typing import Optional
import re
class RegexFlavor(ABC):
"""Base class for regex flavors."""
@property
@abstractmethod
def name(self) -> str:
"""Return the flavor name."""
pass
@property
@abstractmethod
def description(self) -> str:
"""Return a description of the flavor."""
pass
@abstractmethod
def normalize(self, pattern: str) -> tuple[str, list[str]]:
"""Normalize a pattern to this flavor, returning warnings."""
pass
@abstractmethod
def get_flags(self) -> int:
"""Return regex flags for this flavor."""
pass
@abstractmethod
def supports_feature(self, feature: str) -> bool:
"""Check if a feature is supported."""
pass
class PCREFlavor(RegexFlavor):
"""PCRE (Perl Compatible Regular Expressions) flavor."""
@property
def name(self) -> str:
return "pcre"
@property
def description(self) -> str:
return "PCRE - Full feature set with possessive quantifiers, lookbehinds, and all Perl extensions"
def normalize(self, pattern: str) -> tuple[str, list[str]]:
warnings = []
normalized = pattern
return normalized, warnings
def get_flags(self) -> int:
return re.MULTILINE
def supports_feature(self, feature: str) -> bool:
supported = {
"lookahead": True,
"lookbehind": True,
"named_groups": True,
"non_capturing_groups": True,
"possessive_quantifiers": True,
"atomic_groups": True,
"comment_syntax": True,
"inline_flags": True,
"recursion": True,
"subroutine_references": True,
}
return supported.get(feature, False)
class JavaScriptFlavor(RegexFlavor):
"""JavaScript regex flavor."""
@property
def name(self) -> str:
return "javascript"
@property
def description(self) -> str:
return "JavaScript/ECMAScript - Limited lookbehind support, dotAll flag needed for . matching newlines"
def normalize(self, pattern: str) -> tuple[str, list[str]]:
warnings = []
normalized = pattern.replace("(?P<", "(?<")
if normalized != pattern:
warnings.append("Converted Python-style (?P<name>...) groups to JavaScript (?<name>...) syntax")
warnings.append("Note: Some PCRE features may not work in JavaScript")
return normalized, warnings
def get_flags(self) -> int:
return 0
def supports_feature(self, feature: str) -> bool:
supported = {
"lookahead": True,
"lookbehind": True,
"named_groups": True,
"non_capturing_groups": True,
"possessive_quantifiers": False,
"atomic_groups": False,
"comment_syntax": False,
"inline_flags": False,
"recursion": False,
"subroutine_references": False,
}
return supported.get(feature, False)
class PythonFlavor(RegexFlavor):
"""Python re module regex flavor."""
@property
def name(self) -> str:
return "python"
@property
def description(self) -> str:
return "Python re module - Full Unicode support, named groups, and most PCRE features"
def normalize(self, pattern: str) -> tuple[str, list[str]]:
warnings = []
# Convert (?<name>...) groups to Python's (?P<name>...) form; requiring a
# name character after (?< keeps lookbehinds (?<= and (?<! intact.
normalized = re.sub(r"\(\?<([A-Za-z_][A-Za-z0-9_]*)>", r"(?P<\1>", pattern)
return normalized, warnings
def get_flags(self) -> int:
return re.MULTILINE | re.UNICODE
def supports_feature(self, feature: str) -> bool:
supported = {
"lookahead": True,
"lookbehind": True,
"named_groups": True,
"non_capturing_groups": True,
"possessive_quantifiers": False,
"atomic_groups": False,
"comment_syntax": True,
"inline_flags": True,
"recursion": False,
"subroutine_references": False,
}
return supported.get(feature, False)
class FlavorManager:
"""Manages regex flavors and their adapters."""
def __init__(self):
self._flavors: dict[str, RegexFlavor] = {}
self._register_default_flavors()
def _register_default_flavors(self):
"""Register the default flavors."""
self.register_flavor(PCREFlavor())
self.register_flavor(JavaScriptFlavor())
self.register_flavor(PythonFlavor())
def register_flavor(self, flavor: RegexFlavor):
"""Register a new flavor."""
self._flavors[flavor.name] = flavor
def get_flavor(self, name: str) -> Optional[RegexFlavor]:
"""Get a flavor by name."""
return self._flavors.get(name)
def list_flavors(self) -> list[tuple[str, str]]:
"""List all available flavors."""
return [(name, flavor.description) for name, flavor in self._flavors.items()]
def convert(
self,
pattern: str,
from_flavor: str,
to_flavor: str
) -> tuple[str, list[str]]:
"""Convert a pattern from one flavor to another."""
source = self.get_flavor(from_flavor)
target = self.get_flavor(to_flavor)
if not source:
return pattern, [f"Unknown source flavor: {from_flavor}"]
if not target:
return pattern, [f"Unknown target flavor: {to_flavor}"]
normalized, warnings = source.normalize(pattern)
result, convert_warnings = target.normalize(normalized)
return result, warnings + convert_warnings
_flavor_manager: Optional[FlavorManager] = None
def get_flavor_manager() -> FlavorManager:
"""Get the global flavor manager instance."""
global _flavor_manager
if _flavor_manager is None:
_flavor_manager = FlavorManager()
return _flavor_manager
def get_available_flavors() -> list[str]:
"""Get a list of available flavor names."""
return ["pcre", "javascript", "python"]


@@ -0,0 +1,289 @@
"""Interactive REPL mode for exploring regex patterns."""
import sys
import os
from .translator import translate_regex
from .test_generator import generate_test_cases
from .flavors import get_flavor_manager
def format_output(text: str, use_color: bool = True) -> str:
"""Format output with optional color."""
if not use_color or not sys.stdout.isatty():
return text
try:
from pygments import highlight
from pygments.lexers import RegexLexer
from pygments.formatters import TerminalFormatter
lexer = RegexLexer()
formatter = TerminalFormatter()
return highlight(text, lexer, formatter)
except ImportError:
return text
class InteractiveSession:
"""Interactive session for regex exploration."""
def __init__(self, flavor: str = "pcre", use_color: bool = True):
self.flavor = flavor
self.use_color = use_color
self.history: list[str] = []
self.history_file = os.path.expanduser("~/.regex_humanizer_history")
self._load_history()
def _load_history(self):
"""Load command history from file."""
if os.path.exists(self.history_file):
try:
with open(self.history_file, 'r') as f:
self.history = [line.strip() for line in f if line.strip()]
except Exception:
self.history = []
def _save_history(self):
"""Save command history to file."""
try:
os.makedirs(os.path.dirname(self.history_file), exist_ok=True)
with open(self.history_file, 'w') as f:
for cmd in self.history[-1000:]:
f.write(cmd + '\n')
except Exception:
pass
def run(self):
"""Run the interactive session."""
print("\nRegex Humanizer - Interactive Mode")
print("Type 'help' for available commands, 'quit' to exit.\n")
while True:
try:
import click
user_input = click.prompt(
"regex>",
type=str,
default="",
show_default=False,
prompt_suffix=" "
)
if not user_input.strip():
continue
self.history.append(user_input)
self._save_history()
self._process_command(user_input.strip())
except (KeyboardInterrupt, EOFError):
print("\nGoodbye!")
break
def _process_command(self, command: str):
"""Process a user command."""
parts = command.split(None, 1)
cmd = parts[0].lower()
args = parts[1] if len(parts) > 1 else ""
commands = {
"help": self._cmd_help,
"quit": self._cmd_quit,
"exit": self._cmd_quit,
"explain": self._cmd_explain,
"test": self._cmd_test,
"flavor": self._cmd_flavor,
"set": self._cmd_flavor,
"load": self._cmd_load,
"save": self._cmd_save,
"history": self._cmd_history,
"clear": self._cmd_clear,
"example": self._cmd_example,
}
handler = commands.get(cmd)
if handler:
handler(args)
else:
print(f"Unknown command: {cmd}")
print("Type 'help' for available commands.")
def _cmd_help(self, args: str):
"""Show help message."""
help_text = """
Available Commands:
explain <pattern> - Explain a regex pattern in English
test <pattern> - Generate test cases for a pattern
flavor <name> - Set the regex flavor (pcre, javascript, python)
set <name> - Same as 'flavor'
load <filename> - Load a pattern from a file
save <filename> - Save the last pattern to a file
history - Show command history
example - Show an example pattern
clear - Clear the screen
quit / exit - Exit the interactive mode
Examples:
explain ^\\d{3}-\\d{4}$
test [a-z]+
flavor javascript
"""
print(help_text)
def _cmd_quit(self, args: str):
"""Exit the session."""
print("Goodbye!")
sys.exit(0)
def _cmd_explain(self, args: str):
"""Explain a regex pattern."""
if not args:
print("Usage: explain <pattern>")
return
try:
pattern = self._expand_pattern(args)
self._last_pattern = pattern
result = translate_regex(pattern, self.flavor)
header = f"Pattern: {pattern}"
print("\n" + "=" * (len(header)))
print(header)
print("=" * (len(header)))
print("\nEnglish Explanation:")
print("-" * (len(header)))
print(result)
print()
except Exception as e:
print(f"Error parsing pattern: {e}")
def _cmd_test(self, args: str):
"""Generate test cases for a pattern."""
if not args:
print("Usage: test <pattern>")
return
try:
pattern = self._expand_pattern(args)
self._last_pattern = pattern
result = generate_test_cases(pattern, self.flavor, 3, 3)
header = f"Pattern: {pattern}"
print("\n" + "=" * (len(header)))
print(header)
print("=" * (len(header)))
print(f"\nFlavor: {self.flavor}")
print("\nMatching strings:")
print("-" * (len(header)))
for i, s in enumerate(result["matching"], 1):
print(f" {i}. {s}")
print("\nNon-matching strings:")
print("-" * (len(header)))
for i, s in enumerate(result["non_matching"], 1):
print(f" {i}. {s}")
print()
except Exception as e:
print(f"Error generating tests: {e}")
def _cmd_flavor(self, args: str):
"""Set the current flavor."""
if not args:
manager = get_flavor_manager()
flavors = manager.list_flavors()
print("Available flavors:")
for name, desc in flavors:
marker = " (current)" if name == self.flavor else ""
print(f" {name}{marker}: {desc}")
return
flavor_name = args.strip().lower()
manager = get_flavor_manager()
if manager.get_flavor(flavor_name):
self.flavor = flavor_name
print(f"Flavor set to: {flavor_name}")
else:
print(f"Unknown flavor: {flavor_name}")
print("Available flavors: pcre, javascript, python")
def _cmd_load(self, args: str):
"""Load a pattern from a file."""
if not args:
print("Usage: load <filename>")
return
filename = args.strip()
if not os.path.exists(filename):
print(f"File not found: {filename}")
return
try:
with open(filename, 'r') as f:
pattern = f.read().strip()
print(f"Loaded pattern: {pattern}")
self._last_pattern = pattern
except Exception as e:
print(f"Error reading file: {e}")
def _cmd_save(self, args: str):
"""Save a pattern to a file."""
if not args:
print("Usage: save <filename>")
return
pattern = getattr(self, '_last_pattern', None)
if not pattern:
print("No pattern to save. Use 'explain' or 'test' first.")
return
try:
with open(args.strip(), 'w') as f:
f.write(pattern)
print(f"Saved pattern to: {args.strip()}")
except Exception as e:
print(f"Error writing file: {e}")
def _cmd_history(self, args: str):
"""Show command history."""
print("Command history:")
for i, cmd in enumerate(self.history[-50:], 1):
print(f" {i:3}. {cmd}")
def _cmd_clear(self, args: str):
"""Clear the screen."""
os.system('cls' if os.name == 'nt' else 'clear')
def _cmd_example(self, args: str):
"""Show an example pattern."""
examples = [
r"^\d{3}-\d{4}$",
r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
r"^(?:http|https)://[^\s]+$",
r"\b\d{4}-\d{2}-\d{2}\b",
r"(?i)(hello|hi|greetings)\s+world!?",
]
import random
example = random.choice(examples)
print(f"\nExample pattern: {example}")
print("\nType: explain " + example)
print("Type: test " + example)
print()
def _expand_pattern(self, pattern: str) -> str:
"""Expand a pattern from history or args."""
return pattern
def start_interactive_mode(flavor: str = "pcre"):
"""Start the interactive mode."""
session = InteractiveSession(flavor=flavor)
session.run()
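The bounded-history pattern used by `_load_history`/`_save_history` above (persist only the last 1000 entries, treat a missing file as empty history) can be sketched standalone:

```python
import os
import tempfile

MAX_HISTORY = 1000

def save_history(path: str, history: list) -> None:
    # Persist only the most recent MAX_HISTORY entries, one per line.
    with open(path, "w") as f:
        for cmd in history[-MAX_HISTORY:]:
            f.write(cmd + "\n")

def load_history(path: str) -> list:
    # A missing file simply means an empty history.
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "history")
    save_history(path, [f"cmd{i}" for i in range(1500)])
    loaded = load_history(path)
    print(len(loaded), loaded[0], loaded[-1])
    # 1000 cmd500 cmd1499
```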

regex_humanizer/parser.py

@@ -0,0 +1,664 @@
"""Regex parser for converting regex patterns to AST nodes."""
from typing import Optional, Any
from dataclasses import dataclass, field
from enum import Enum
class NodeType(Enum):
LITERAL = "literal"
CHARACTER_CLASS = "character_class"
POSITIVE_SET = "positive_set"
NEGATIVE_SET = "negative_set"
DOT = "dot"
GROUP = "group"
CAPTURING_GROUP = "capturing_group"
NON_CAPTURING_GROUP = "non_capturing_group"
NAMED_GROUP = "named_group"
LOOKAHEAD = "lookahead"
LOOKBEHIND = "lookbehind"
NEGATIVE_LOOKAHEAD = "negative_lookahead"
NEGATIVE_LOOKBEHIND = "negative_lookbehind"
QUANTIFIER = "quantifier"
ANCHOR_START = "anchor_start"
ANCHOR_END = "anchor_end"
WORD_BOUNDARY = "word_boundary"
NON_WORD_BOUNDARY = "non_word_boundary"
START_OF_STRING = "start_of_string"
END_OF_STRING = "end_of_string"
END_OF_STRING_Z = "end_of_string_z"
ANY_NEWLINE = "any_newline"
CONTROL_CHAR = "control_char"
ESCAPED_CHAR = "escaped_char"
HEX_ESCAPE = "hex_escape"
OCTAL_ESCAPE = "octal_escape"
UNICODE_PROPERTY = "unicode_property"
BACKREFERENCE = "backreference"
BRANCH = "branch"
SEQUENCE = "sequence"
DIGIT = "digit"
NON_DIGIT = "non_digit"
WORD_CHAR = "word_char"
NON_WORD_CHAR = "non_word_char"
WHITESPACE = "whitespace"
NON_WHITESPACE = "non_whitespace"
@dataclass
class RegexNode:
"""Base class for regex AST nodes."""
node_type: NodeType
children: list["RegexNode"] = field(default_factory=list)
raw: str = ""
position: int = 0
@dataclass
class LiteralNode(RegexNode):
"""Represents a literal character or string."""
value: str = ""
@dataclass
class CharacterClassNode(RegexNode):
"""Represents a character class like [a-z]."""
negated: bool = False
ranges: list[tuple[str, str]] = field(default_factory=list)
characters: str = ""
@dataclass
class QuantifierNode(RegexNode):
"""Represents a quantifier like *, +, ?, {n,m}."""
min_count: Optional[int] = None
max_count: Any = None
is_lazy: bool = False
is_possessive: bool = False
@dataclass
class GroupNode(RegexNode):
"""Represents a group."""
name: Optional[str] = None
group_index: Optional[int] = None
is_non_capturing: bool = False
class RegexParser:
"""Parser for regex patterns that builds an AST."""
def __init__(self, pattern: str, flavor: str = "pcre"):
self.pattern = pattern
self.flavor = flavor
self.pos = 0
self.length = len(pattern)
self._errors: list[str] = []
def parse(self) -> RegexNode:
"""Parse the entire pattern into an AST."""
self.pos = 0
self._errors = []
result = self._parse_sequence()
if self.pos < self.length:
remaining = self.pattern[self.pos:]
self._errors.append(f"Unexpected content at position {self.pos}: {remaining[:20]}")
return result
def _parse_sequence(self) -> RegexNode:
"""Parse a sequence of regex elements."""
children = []
start_pos = self.pos
while self.pos < self.length:
char = self.pattern[self.pos]
if char == ')':
break
elif char == '\\':
node = self._parse_escape()
if node:
children.append(node)
elif char == '[':
node = self._parse_character_class()
if node:
children.append(node)
elif char == '.':
children.append(RegexNode(
node_type=NodeType.DOT,
raw=char,
position=self.pos
))
self.pos += 1
elif char == '(':
node = self._parse_group()
if node:
children.append(node)
elif char == '|':
self.pos += 1
first_alt_children = []
if children and children[-1].node_type == NodeType.BRANCH:
first_alt_children = children[-1].children
else:
first_alt_children = children[:]
children.clear()
alt_children = first_alt_children
while self.pos < self.length and self.pattern[self.pos] != ')' and self.pattern[self.pos] != '|':
char = self.pattern[self.pos]
if char == '\\':
node = self._parse_escape()
if node:
alt_children.append(node)
elif char == '[':
node = self._parse_character_class()
if node:
alt_children.append(node)
elif char == '.':
alt_children.append(RegexNode(
node_type=NodeType.DOT,
raw=char,
position=self.pos
))
self.pos += 1
elif char == '(':
node = self._parse_group()
if node:
alt_children.append(node)
elif char in '*+?{':
if alt_children:
prev = alt_children.pop()
node = self._parse_quantifier(char, prev)
if node:
alt_children.append(node)
else:
alt_children.append(prev)
self.pos += 1
elif char == ')':
break
else:
literal = char
self.pos += 1
while self.pos < self.length and self.pattern[self.pos] not in r')|*+?[\.^{$':
literal += self.pattern[self.pos]
self.pos += 1
alt_children.append(LiteralNode(
node_type=NodeType.LITERAL,
value=literal,
raw=literal,
position=self.pos - len(literal)
))
if children and children[-1].node_type == NodeType.BRANCH:
pass
else:
branch = RegexNode(
node_type=NodeType.BRANCH,
children=first_alt_children,
raw='|',
position=self.pos - 1
)
children.append(branch)
elif char in '^$':
if char == '^':
children.append(RegexNode(
node_type=NodeType.ANCHOR_START,
raw=char,
position=self.pos
))
else:
children.append(RegexNode(
node_type=NodeType.ANCHOR_END,
raw=char,
position=self.pos
))
self.pos += 1
elif char in '*+?':
node = self._parse_quantifier(char, children.pop() if children else None)
if node:
children.append(node)
else:
self._errors.append(f"Quantifier '{char}' without preceding element at position {self.pos}")
self.pos += 1
elif char == '{':
if children:
node = self._parse_quantifier(char, children.pop())
if node:
children.append(node)
else:
self._errors.append(f"Invalid quantifier at position {self.pos}")
self.pos += 1
else:
self._errors.append(f"Quantifier '{{' without preceding element at position {self.pos}")
self.pos += 1
else:
literal = char
self.pos += 1
while self.pos < self.length and self.pattern[self.pos] not in r')|*+?[\.^{$':
char = self.pattern[self.pos]
if char == '\\':
if self.pos + 1 < self.length:
literal += char + self.pattern[self.pos + 1]
self.pos += 2
else:
literal += char
self.pos += 1
else:
literal += char
self.pos += 1
children.append(LiteralNode(
node_type=NodeType.LITERAL,
value=literal,
raw=literal,
position=self.pos - len(literal)
))
end_pos = self.pos
return RegexNode(
node_type=NodeType.SEQUENCE,
children=children,
raw=self.pattern[start_pos:end_pos],
position=start_pos
)
def _parse_escape(self) -> Optional[RegexNode]:
"""Parse an escape sequence."""
if self.pos + 1 >= self.length:
return None
self.pos += 1
char = self.pattern[self.pos]
self.pos += 1
escaped_chars = {
'd': ('digit', '\\d'),
'D': ('non_digit', '\\D'),
'w': ('word_char', '\\w'),
'W': ('non_word_char', '\\W'),
's': ('whitespace', '\\s'),
'S': ('non_whitespace', '\\S'),
'b': ('word_boundary', '\\b'),
'B': ('non_word_boundary', '\\B'),
}
if char in escaped_chars:
node_type_name, raw = escaped_chars[char]
return RegexNode(
node_type=NodeType(node_type_name),
raw=f'\\{char}',
position=self.pos - 2
)
special_escaped = {
'.': '.',
'*': '*',
'+': '+',
'?': '?',
'^': '^',
'$': '$',
'|': '|',
'(': '(',
')': ')',
'[': '[',
']': ']',
'{': '{',
'}': '}',
'\\': '\\',
'-': '-',
'n': '\n',
'r': '\r',
't': '\t',
}
if char in special_escaped:
return LiteralNode(
node_type=NodeType.ESCAPED_CHAR,
value=special_escaped[char],
raw=f'\\{char}',
position=self.pos - 2
)
if char == '0':
return RegexNode(
node_type=NodeType.OCTAL_ESCAPE,
raw=f'\\{char}',
position=self.pos - 2
)
if char == 'x':
if self.pos + 2 <= self.length:
hex_part = self.pattern[self.pos:self.pos + 2]
if all(c in '0123456789abcdefABCDEF' for c in hex_part):
self.pos += 2
return RegexNode(
node_type=NodeType.HEX_ESCAPE,
raw=f'\\x{hex_part}',
position=self.pos - 4
)
if char == 'u':
if self.pos + 4 <= self.length:
hex_part = self.pattern[self.pos:self.pos + 4]
if all(c in '0123456789abcdefABCDEF' for c in hex_part):
self.pos += 4
return RegexNode(
node_type=NodeType.UNICODE_PROPERTY,
raw=f'\\u{hex_part}',
position=self.pos - 6
)
if char == 'p':
if self.pos < self.length and self.pattern[self.pos] == '{':
end = self.pattern.find('}', self.pos + 1)
if end != -1:
prop = self.pattern[self.pos + 1:end]
self.pos = end + 1
return RegexNode(
node_type=NodeType.UNICODE_PROPERTY,
raw=f'\\p{{{prop}}}',
position=self.pos - len(f'\\p{{{prop}}}')
)
if char == 'c':
if self.pos < self.length:
ctrl_char = self.pattern[self.pos]
self.pos += 1
return RegexNode(
node_type=NodeType.CONTROL_CHAR,
raw=f'\\c{ctrl_char}',
position=self.pos - 3
)
if char.isdigit():
backref = char
while self.pos < self.length and self.pattern[self.pos].isdigit():
backref += self.pattern[self.pos]
self.pos += 1
return RegexNode(
node_type=NodeType.BACKREFERENCE,
raw=f'\\{backref}',
position=self.pos - len(backref) - 1
)
return LiteralNode(
node_type=NodeType.ESCAPED_CHAR,
value=char,
raw=f'\\{char}',
position=self.pos - 2
)
def _parse_character_class(self) -> Optional[RegexNode]:
"""Parse a character class like [a-z] or [^a-z]."""
if self.pos >= self.length or self.pattern[self.pos] != '[':
return None
start_pos = self.pos
self.pos += 1
negated = False
if self.pos < self.length and self.pattern[self.pos] == '^':
negated = True
self.pos += 1
ranges = []
characters = ""
if self.pos < self.length and self.pattern[self.pos] == ']':
# A ']' immediately after '[' or '[^' is a literal ']' in most flavors.
characters += ']'
self.pos += 1
while self.pos < self.length:
char = self.pattern[self.pos]
if char == ']':
self.pos += 1
break
elif char == '\\':
if self.pos + 1 < self.length:
next_char = self.pattern[self.pos + 1]
if next_char in 'dDwWsS':
# Shorthand classes inside a set are consumed but not
# represented in the character-class node.
self.pos += 2
else:
self.pos += 2
characters += next_char
else:
self.pos += 1
elif char == '-' and characters and self.pos + 1 < self.length and self.pattern[self.pos + 1] != ']':
self.pos += 1
end_char = self.pattern[self.pos]
self.pos += 1
ranges.append((characters[-1], end_char))
characters = characters[:-1]
else:
characters += char
self.pos += 1
node = CharacterClassNode(
node_type=NodeType.NEGATIVE_SET if negated else NodeType.POSITIVE_SET,
negated=negated,
ranges=ranges,
characters=characters,
raw=self.pattern[start_pos:self.pos],
position=start_pos
)
return node
def _parse_group(self) -> Optional[RegexNode]:
"""Parse a group like (?:...) or (?<name>...) or (?=...)."""
if self.pos >= self.length or self.pattern[self.pos] != '(':
return None
start_pos = self.pos
self.pos += 1
if self.pos < self.length and self.pattern[self.pos] == '?':
self.pos += 1
if self.pos < self.length:
next_char = self.pattern[self.pos]
if next_char == '=':
self.pos += 1
children = self._parse_sequence()
return GroupNode(
node_type=NodeType.LOOKAHEAD,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos,
is_non_capturing=True
)
elif next_char == '!':
self.pos += 1
children = self._parse_sequence()
return GroupNode(
node_type=NodeType.NEGATIVE_LOOKAHEAD,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos,
is_non_capturing=True
)
elif next_char == '<':
self.pos += 1
if self.pos < self.length:
if self.pattern[self.pos] == '=':
self.pos += 1
children = self._parse_sequence()
return GroupNode(
node_type=NodeType.LOOKBEHIND,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos,
is_non_capturing=True
)
elif self.pattern[self.pos] == '!':
self.pos += 1
children = self._parse_sequence()
return GroupNode(
node_type=NodeType.NEGATIVE_LOOKBEHIND,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos,
is_non_capturing=True
)
else:
name_start = self.pos
while self.pos < self.length and self.pattern[self.pos] != '>':
self.pos += 1
name = self.pattern[name_start:self.pos]
self.pos += 1
children = self._parse_sequence()
return GroupNode(
node_type=NodeType.NAMED_GROUP,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos,
name=name,
is_non_capturing=False
)
elif next_char == ':':
self.pos += 1
children = self._parse_sequence()
return GroupNode(
node_type=NodeType.NON_CAPTURING_GROUP,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos,
is_non_capturing=True
)
elif next_char == '#':
comment_end = self.pattern.find(')', self.pos)
if comment_end != -1:
self.pos = comment_end + 1
children = self._parse_sequence()
return RegexNode(
node_type=NodeType.NON_CAPTURING_GROUP,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos
)
elif next_char == 'P':
self.pos += 1
if self.pos < self.length and self.pattern[self.pos] == '<':
name_start = self.pos + 1
name_end = self.pattern.find('>', name_start)
if name_end != -1:
name = self.pattern[name_start:name_end]
self.pos = name_end + 1
children = self._parse_sequence()
return GroupNode(
node_type=NodeType.NAMED_GROUP,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos,
name=name,
is_non_capturing=False
)
elif next_char in 'imsx':  # inline flag group such as (?i) or (?im:...)
self.pos += 1
children = self._parse_sequence()
return RegexNode(
node_type=NodeType.NON_CAPTURING_GROUP,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos
)
children = self._parse_sequence()
if self.pos < self.length and self.pattern[self.pos] == ')':
self.pos += 1
return GroupNode(
node_type=NodeType.CAPTURING_GROUP,
children=[children],
raw=self.pattern[start_pos:self.pos],
position=start_pos,
is_non_capturing=False
)
def _parse_quantifier(self, char: str, node: Optional[RegexNode]) -> Optional[RegexNode]:
"""Parse a quantifier like *, +, ?, {n,m}."""
if node is None:
return None
start_pos = self.pos
is_lazy = False
is_possessive = False
if char in '*+?':
self.pos += 1
if char == '*':
min_count = 0
max_count = float('inf')
elif char == '+':
min_count = 1
max_count = float('inf')
else:
min_count = 0
max_count = 1
if self.pos < self.length and self.pattern[self.pos] in '?+':
modifier = self.pattern[self.pos]
if modifier == '?':
is_lazy = True
elif modifier == '+':
is_possessive = True
self.pos += 1
elif char == '{':
end = self.pattern.find('}', self.pos + 1)
if end == -1:
return None
quant_content = self.pattern[self.pos + 1:end]
self.pos = end + 1
parts = quant_content.split(',')
try:
min_count = int(parts[0]) if parts[0].strip() else 0
if len(parts) > 1 and parts[1].strip():
max_count = int(parts[1])
else:
max_count = min_count
except ValueError:
# Not a real quantifier (e.g. a literal "{abc}"): rewind and bail out.
self.pos = start_pos
return None
if self.pos < self.length and self.pattern[self.pos] in '?+':
modifier = self.pattern[self.pos]
if modifier == '?':
is_lazy = True
elif modifier == '+':
is_possessive = True
self.pos += 1
else:
return None
return QuantifierNode(
node_type=NodeType.QUANTIFIER,
children=[node],
raw=self.pattern[start_pos:self.pos],
position=start_pos,
min_count=min_count,
max_count=max_count,
is_lazy=is_lazy,
is_possessive=is_possessive
)
def get_errors(self) -> list[str]:
"""Return any parsing errors."""
return self._errors
def parse_regex(pattern: str, flavor: str = "pcre") -> RegexNode:
"""Parse a regex pattern into an AST."""
parser = RegexParser(pattern, flavor)
ast = parser.parse()
return ast
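The `{n,m}` branch of `_parse_quantifier` boils down to splitting on the comma and defaulting the missing bound. A standalone sketch of that logic (`parse_brace_quantifier` is a hypothetical helper name, for illustration only):

```python
def parse_brace_quantifier(body: str) -> tuple:
    """Parse the inside of {n}, {n,} or {n,m} into (min, max) bounds."""
    parts = body.split(',')
    min_count = int(parts[0]) if parts[0].strip() else 0
    if len(parts) == 1:
        return min_count, min_count        # {n}   -> exactly n
    if parts[1].strip():
        return min_count, int(parts[1])    # {n,m} -> n through m
    return min_count, float('inf')         # {n,}  -> n or more

print(parse_brace_quantifier("3"))
print(parse_brace_quantifier("2,5"))
```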


@@ -0,0 +1,382 @@
"""Test case generator for regex patterns."""
import random
import string
from typing import Optional
from .parser import parse_regex, RegexNode, NodeType
class TestCaseGenerator:
"""Generates matching and non-matching test cases for regex patterns."""
def __init__(self, flavor: str = "pcre"):
self.flavor = flavor
def generate_matching(
self,
pattern: str,
count: int = 5,
max_length: int = 50
) -> list[str]:
"""Generate strings that match the pattern."""
try:
ast = parse_regex(pattern, self.flavor)
return self._generate_matching_from_ast(ast, count, max_length)
except Exception:
return self._generate_fallback_matching(pattern, count)
def _generate_matching_from_ast(
self,
node: RegexNode,
count: int,
max_length: int
) -> list[str]:
"""Generate matching strings from AST."""
if node.node_type == NodeType.SEQUENCE:
return self._generate_sequence(node.children, count, max_length)
return [pattern_to_string(node, max_length) for _ in range(count)]
def _generate_sequence(
self,
children: list[RegexNode],
count: int,
max_length: int
) -> list[str]:
"""Generate strings for a sequence of nodes."""
results = []
for _ in range(count):
parts = []
total = 0
for child in children:
if total >= max_length:
break
part = generate_from_node(child, max_length - total)
if part is None:
part = ""
parts.append(part)
total += len(part)
results.append("".join(parts))
return results
def _generate_fallback_matching(
self,
pattern: str,
count: int
) -> list[str]:
"""Fallback matching generation using simple heuristics."""
results = []
for _ in range(count):
result = ""
in_class = False
class_chars = []
for char in pattern:
if char == '\\' and len(pattern) > 1:
next_char = pattern[pattern.index(char) + 1]
if next_char in 'dDsSwWbB':
if next_char == 'd':
result += random.choice(string.digits)
elif next_char == 'D':
result += random.choice(string.ascii_letters)
elif next_char == 'w':
result += random.choice(string.ascii_letters)
elif next_char == 'W':
result += random.choice(' !@#$%^&*()')
elif next_char == 's':
result += " "
elif next_char == 'b':
result += random.choice(string.ascii_letters)
else:
result += next_char
elif char == '.':
result += random.choice(string.ascii_letters)
elif char in '*+?':
continue
elif char == '[':
in_class = True
class_chars = []
elif char == ']':
in_class = False
if class_chars:
result += random.choice(class_chars)
elif in_class:
if char == '-' and class_chars:
pass
else:
class_chars.append(char)
elif char not in '()|^$\\{}':
result += char
if not result:
result = "test"
results.append(result[:20])
return results[:count]
def generate_non_matching(
self,
pattern: str,
count: int = 5,
max_length: int = 50
) -> list[str]:
"""Generate strings that do NOT match the pattern."""
try:
ast = parse_regex(pattern, self.flavor)
return self._generate_non_matching_from_ast(pattern, ast, count, max_length)
except Exception:
return self._generate_fallback_non_matching(pattern, count)
def _generate_non_matching_from_ast(
self,
pattern: str,
node: RegexNode,
count: int,
max_length: int
) -> list[str]:
"""Generate non-matching strings from AST."""
results = set()
# The root is usually a SEQUENCE; a bare anchor at the root gets a
# canned counter-example.
if node.node_type in (NodeType.ANCHOR_START, NodeType.START_OF_STRING):
return ["prefix_test"]
if node.node_type in (NodeType.ANCHOR_END, NodeType.END_OF_STRING):
return ["test_suffix"]
base_matching = self._generate_matching_from_ast(node, 10, max_length)
for matching in base_matching:
if len(results) >= count:
break
if len(matching) > 0:
pos = random.randint(0, len(matching) - 1)
original = matching[pos]
replacement = get_replacement_char(original)
if replacement != original:
non_match = matching[:pos] + replacement + matching[pos + 1:]
if not matches_pattern(pattern, non_match, self.flavor):
results.add(non_match)
if len(results) < count and matching:
pos = random.randint(0, len(matching))
char_to_add = get_opposite_char_class(matching[pos - 1] if pos > 0 else 'a')
non_match = matching[:pos] + char_to_add + matching[pos:]
if not matches_pattern(pattern, non_match, self.flavor):
results.add(non_match)
if len(results) < count:
fallbacks = self._generate_fallback_non_matching(pattern, count - len(results)) or ["does_not_match_123"]
for base in fallbacks:
results.add(base + str(random.randint(100, 999)))
return list(results)[:count]
def _generate_fallback_non_matching(
self,
pattern: str,
count: int
) -> list[str]:
"""Fallback non-matching generation."""
results = ["does_not_match", "completely_different", "!@#$%^&*()", "", "xyz123"]
if pattern.startswith('^'):
results.append("prefix_" + results[0])
if pattern.endswith('$'):
results.append(results[0] + "_suffix")
if '\\d' in pattern or '[0-9]' in pattern:
results.append("abc_def")
if '\\w' in pattern:
results.append("!@#$%^&*")
if '\\s' in pattern:
results.append("nospacehere")
dot_count = pattern.count('.')
if dot_count > 0:
results.append("x" * (dot_count + 1))
import re
try:
compiled = re.compile(pattern)
filtered_results = []
for r in results:
if compiled.search(r) is None:
filtered_results.append(r)
if filtered_results:
return filtered_results[:count]
except re.error:
pass
return results[:count]
def generate_from_node(node: RegexNode, max_length: int) -> Optional[str]:
"""Generate a string from a single node."""
if node.node_type == NodeType.LITERAL:
return node.value[:max_length] if node.value else None
if node.node_type == NodeType.ESCAPED_CHAR:
return node.value if node.value else None
if node.node_type == NodeType.DOT:
return random.choice(string.ascii_letters)
if node.node_type in (NodeType.POSITIVE_SET, NodeType.NEGATIVE_SET):
if node.node_type == NodeType.NEGATIVE_SET:
all_chars = []
for start, end in node.ranges:
all_chars.extend([chr(i) for i in range(ord(start), ord(end) + 1)])
all_chars.extend(node.characters)
available = [c for c in string.ascii_letters if c not in all_chars]
if available:
return random.choice(available)
return "!"
if node.ranges:
start, end = node.ranges[0]
return chr(random.randint(ord(start), ord(end)))
if node.characters:
return random.choice(node.characters)
return "a"
if node.node_type == NodeType.DIGIT:
return random.choice(string.digits)
if node.node_type == NodeType.NON_DIGIT:
return random.choice(string.ascii_letters)
if node.node_type == NodeType.WORD_CHAR:
return random.choice(string.ascii_letters)
if node.node_type == NodeType.NON_WORD_CHAR:
return random.choice(' !@#$%^&*()')
if node.node_type == NodeType.WHITESPACE:
return " "
if node.node_type == NodeType.NON_WHITESPACE:
return random.choice(string.ascii_letters)
if node.node_type == NodeType.QUANTIFIER:
if node.children:
child_str = generate_from_node(node.children[0], max_length)
if child_str is None:
child_str = "x"
min_count = node.min_count if node.min_count else 0
max_count = min(node.max_count, 3) if node.max_count and node.max_count != float('inf') else 3
max_count = max(min_count, max_count)
if min_count == 0 and max_count == 0:
repeat = 0
elif min_count == 0:
repeat = random.randint(1, max_count)
else:
repeat = random.randint(min_count, max_count)
return (child_str * repeat)[:max_length]
return None
if node.node_type == NodeType.CAPTURING_GROUP:
if node.children:
return generate_from_node(node.children[0], max_length)
return None
if node.node_type == NodeType.NON_CAPTURING_GROUP:
if node.children:
return generate_from_node(node.children[0], max_length)
return None
if node.node_type == NodeType.NAMED_GROUP:
if node.children:
return generate_from_node(node.children[0], max_length)
return None
if node.node_type in (NodeType.LOOKAHEAD, NodeType.NEGATIVE_LOOKAHEAD):
return ""
if node.node_type in (NodeType.LOOKBEHIND, NodeType.NEGATIVE_LOOKBEHIND):
return ""
if node.node_type == NodeType.SEQUENCE:
result = ""
for child in node.children:
if len(result) >= max_length:
break
part = generate_from_node(child, max_length - len(result))
if part:
result += part
return result if result else None
if node.node_type == NodeType.BRANCH:
if node.children:
choices = []
for child in node.children:
part = generate_from_node(child, max_length)
if part:
choices.append(part)
if choices:
return random.choice(choices)
return None
return None
def pattern_to_string(node: RegexNode, max_length: int) -> str:
"""Convert a node to a representative string."""
result = generate_from_node(node, max_length)
return result if result else "test"
def get_replacement_char(original: str) -> str:
"""Get a replacement character different from the original."""
if original.isdigit():
return random.choice([c for c in string.digits if c != original])
if original.isalpha():
return random.choice([c for c in string.ascii_letters if c.lower() != original.lower()])
if original == ' ':
return random.choice(['\t', '\n'])
return 'x'
def get_opposite_char_class(char: str) -> str:
"""Get a character from a different class."""
if char.isdigit():
return random.choice(string.ascii_letters)
if char.isalpha():
return random.choice(string.digits)
if char == ' ':
return 'x'
return '1'
def matches_pattern(pattern: str, text: str, flavor: str) -> bool:
"""Check if text matches pattern."""
import re
try:
# Python's re approximates every flavor here; the JS and PCRE flavors
# are checked with MULTILINE to mirror how this tool exercises them.
flags = re.MULTILINE if flavor in ("javascript", "pcre") else 0
compiled = re.compile(pattern, flags)
return compiled.search(text) is not None
except re.error:
return False
def generate_test_cases(
pattern: str,
flavor: str = "pcre",
matching_count: int = 5,
non_matching_count: int = 5
) -> dict:
"""Generate all test cases for a pattern."""
generator = TestCaseGenerator(flavor)
return {
"pattern": pattern,
"flavor": flavor,
"matching": generator.generate_matching(pattern, matching_count),
"non_matching": generator.generate_non_matching(pattern, non_matching_count)
}
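The non-matching strategy above (take a string that matches, flip one character to a different character class, and verify the pattern now rejects it) can be sketched standalone with the stdlib `re` module. `mutate_to_non_match` is a hypothetical name used only for this example:

```python
import random
import re
import string
from typing import Optional

def mutate_to_non_match(pattern: str, matching: str, attempts: int = 100) -> Optional[str]:
    """Flip single characters of a matching string until the pattern rejects it."""
    compiled = re.compile(pattern)
    for _ in range(attempts):
        pos = random.randrange(len(matching))
        # Swap the character for one from a different character class.
        if matching[pos].isdigit():
            replacement = random.choice(string.ascii_letters)
        else:
            replacement = random.choice(string.digits)
        candidate = matching[:pos] + replacement + matching[pos + 1:]
        if compiled.search(candidate) is None:
            return candidate
    return None

print(mutate_to_non_match(r"^\d{3}-\d{4}$", "555-0199"))
```

For a phone-number-style pattern, any single flip (digit to letter, or the hyphen to a digit) breaks the match, so a counter-example is found on the first attempt.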


@@ -0,0 +1,291 @@
"""Translator for converting regex AST to human-readable English."""
from .parser import (
RegexNode, NodeType, LiteralNode, CharacterClassNode,
QuantifierNode, GroupNode, RegexParser
)
class RegexTranslator:
"""Translates regex AST nodes to human-readable English."""
def __init__(self, flavor: str = "pcre"):
self.flavor = flavor
def translate(self, pattern: str) -> str:
"""Translate a regex pattern to human-readable English."""
parser = RegexParser(pattern, self.flavor)
ast = parser.parse()
return self._translate_node(ast)
def _translate_node(self, node: RegexNode) -> str:
"""Translate a single node."""
if node is None:
return ""
handlers = {
NodeType.SEQUENCE: self._translate_sequence,
NodeType.LITERAL: self._translate_literal,
NodeType.ESCAPED_CHAR: self._translate_escaped_char,
NodeType.DOT: self._translate_dot,
NodeType.POSITIVE_SET: self._translate_positive_set,
NodeType.NEGATIVE_SET: self._translate_negative_set,
NodeType.CAPTURING_GROUP: self._translate_capturing_group,
NodeType.NON_CAPTURING_GROUP: self._translate_non_capturing_group,
NodeType.NAMED_GROUP: self._translate_named_group,
NodeType.LOOKAHEAD: self._translate_lookahead,
NodeType.NEGATIVE_LOOKAHEAD: self._translate_negative_lookahead,
NodeType.LOOKBEHIND: self._translate_lookbehind,
NodeType.NEGATIVE_LOOKBEHIND: self._translate_negative_lookbehind,
NodeType.QUANTIFIER: self._translate_quantifier,
NodeType.ANCHOR_START: self._translate_anchor_start,
NodeType.ANCHOR_END: self._translate_anchor_end,
NodeType.WORD_BOUNDARY: self._translate_word_boundary,
NodeType.NON_WORD_BOUNDARY: self._translate_non_word_boundary,
NodeType.BRANCH: self._translate_branch,
NodeType.START_OF_STRING: self._translate_start_of_string,
NodeType.END_OF_STRING: self._translate_end_of_string,
NodeType.DIGIT: self._translate_digit,
NodeType.NON_DIGIT: self._translate_non_digit,
NodeType.WORD_CHAR: self._translate_word_char,
NodeType.NON_WORD_CHAR: self._translate_non_word_char,
NodeType.WHITESPACE: self._translate_whitespace,
NodeType.NON_WHITESPACE: self._translate_non_whitespace,
NodeType.BACKREFERENCE: self._translate_backreference,
}
handler = handlers.get(node.node_type)
if handler:
return handler(node)
return f"[{node.node_type.value}]"
def _translate_sequence(self, node: RegexNode) -> str:
"""Translate a sequence of nodes."""
if not node.children:
return "empty string"
parts = []
for child in node.children:
if child.node_type == NodeType.BRANCH:
branch_parts = [self._translate_node(c) for c in child.children]
if len(branch_parts) == 1:
parts.append(branch_parts[0])
else:
parts.append("(" + " OR ".join(branch_parts) + ")")
else:
parts.append(self._translate_node(child))
return "".join(parts)
def _translate_branch(self, node: RegexNode) -> str:
"""Translate a branch (alternation)."""
if not node.children:
return ""
parts = [self._translate_node(child) for child in node.children]
return " OR ".join(parts)
def _translate_literal(self, node: LiteralNode) -> str:
"""Translate a literal node, spelling out special characters."""
special = {
"\\": "backslash", ".": "period", "*": "asterisk", "+": "plus",
"?": "question mark", "$": "dollar sign", "^": "caret", "|": "pipe",
"(": "left parenthesis", ")": "right parenthesis",
"[": "left bracket", "]": "right bracket",
"{": "left brace", "}": "right brace",
"\t": "tab", "\n": "newline", "\r": "carriage return", " ": "space",
}
# Map character by character so the spaces inside spelled-out names
# are never themselves re-substituted.
return "".join(f"{special[ch]} " if ch in special else ch for ch in node.value)
def _translate_escaped_char(self, node: LiteralNode) -> str:
"""Translate an escaped character."""
value = node.value
if value == " ":
return "space"
elif value == "\t":
return "tab character (escape sequence \\t)"
elif value == "\n":
return "newline character (escape sequence \\n)"
elif value == "\r":
return "carriage return (escape sequence \\r)"
return f"'{value}'"
def _translate_dot(self, node: RegexNode) -> str:
"""Translate a dot (any character)."""
return "any single character"
def _translate_positive_set(self, node: CharacterClassNode) -> str:
"""Translate a positive character set like [a-z]."""
parts = []
for start, end in node.ranges:
parts.append(f"any character from {start} through {end}")
for char in node.characters:
if char == '-':
parts.append("hyphen")
else:
parts.append(f"'{char}'")
if not parts:
return "any character in empty set"
if len(parts) == 1:
return parts[0]
return "any of: " + ", ".join(parts)
def _translate_negative_set(self, node: CharacterClassNode) -> str:
"""Translate a negative character set like [^a-z]."""
parts = []
for start, end in node.ranges:
parts.append(f"{start} through {end}")
for char in node.characters:
parts.append("hyphen" if char == '-' else f"'{char}'")
if not parts:
return "any character"
return "any character EXCEPT " + ", ".join(parts)
def _translate_capturing_group(self, node: GroupNode) -> str:
"""Translate a capturing group."""
if node.children:
content = self._translate_node(node.children[0])
return f"capturing group: ({content})"
return "capturing group: ()"
def _translate_non_capturing_group(self, node: GroupNode) -> str:
"""Translate a non-capturing group."""
if node.children:
content = self._translate_node(node.children[0])
return f"non-capturing group: ({content})"
return "non-capturing group: ()"
def _translate_named_group(self, node: GroupNode) -> str:
"""Translate a named group."""
name = node.name or "unnamed"
if node.children:
content = self._translate_node(node.children[0])
return f"named group '{name}': ({content})"
return f"named group '{name}': ()"
def _translate_lookahead(self, node: GroupNode) -> str:
"""Translate a positive lookahead."""
if node.children:
content = self._translate_node(node.children[0])
return f"followed by ({content})"
return "followed by ()"
def _translate_negative_lookahead(self, node: GroupNode) -> str:
"""Translate a negative lookahead."""
if node.children:
content = self._translate_node(node.children[0])
return f"NOT followed by ({content})"
return "NOT followed by ()"
def _translate_lookbehind(self, node: GroupNode) -> str:
"""Translate a lookbehind."""
if node.children:
content = self._translate_node(node.children[0])
return f"preceded by ({content})"
return "preceded by ()"
def _translate_negative_lookbehind(self, node: GroupNode) -> str:
"""Translate a negative lookbehind."""
if node.children:
content = self._translate_node(node.children[0])
return f"NOT preceded by ({content})"
return "NOT preceded by ()"
def _translate_quantifier(self, node: QuantifierNode) -> str:
"""Translate a quantifier."""
if not node.children:
return "[empty quantifier]"
child = node.children[0]
base = self._translate_node(child)
lazy_str = " (lazy)" if node.is_lazy else ""
possessive_str = " (possessive)" if node.is_possessive else ""
if node.min_count == 0 and node.max_count == 1:
return f"optional: {base}{lazy_str}{possessive_str}"
elif node.min_count == 0 and node.max_count == float('inf'):
return f"zero or more of: {base}{lazy_str}{possessive_str}"
elif node.min_count == 1 and node.max_count == float('inf'):
return f"one or more of: {base}{lazy_str}{possessive_str}"
elif node.min_count == node.max_count:
count = node.min_count
if count == 1:
return base
else:
return f"exactly {count} of: {base}{lazy_str}{possessive_str}"
elif node.max_count == float('inf'):
return f"at least {node.min_count} of: {base}{lazy_str}{possessive_str}"
else:
return f"between {node.min_count} and {node.max_count} of: {base}{lazy_str}{possessive_str}"
def _translate_anchor_start(self, node: RegexNode) -> str:
"""Translate start anchor."""
return "at the start of line or string"
def _translate_anchor_end(self, node: RegexNode) -> str:
"""Translate end anchor."""
return "at the end of line or string"
def _translate_word_boundary(self, node: RegexNode) -> str:
"""Translate word boundary."""
return "at a word boundary"
def _translate_non_word_boundary(self, node: RegexNode) -> str:
"""Translate non-word boundary."""
return "not at a word boundary"
def _translate_start_of_string(self, node: RegexNode) -> str:
"""Translate start of string anchor."""
return "at the start of the string"
def _translate_end_of_string(self, node: RegexNode) -> str:
"""Translate end of string anchor."""
return "at the end of the string"
def _translate_digit(self, node: RegexNode) -> str:
"""Translate digit character class."""
return "any digit (0-9)"
def _translate_non_digit(self, node: RegexNode) -> str:
"""Translate non-digit character class."""
return "any non-digit character"
def _translate_word_char(self, node: RegexNode) -> str:
"""Translate word character class."""
return "any word character (a-z, A-Z, 0-9, underscore)"
def _translate_non_word_char(self, node: RegexNode) -> str:
"""Translate non-word character class."""
return "any non-word character"
def _translate_whitespace(self, node: RegexNode) -> str:
"""Translate whitespace character class."""
return "any whitespace character (space, tab, newline, etc.)"
def _translate_non_whitespace(self, node: RegexNode) -> str:
"""Translate non-whitespace character class."""
return "any non-whitespace character"
def _translate_backreference(self, node: RegexNode) -> str:
"""Translate a backreference."""
# node.raw already contains the backslash (e.g. "\\1").
return f"same as capture group {node.raw[1:]}"
def translate_regex(pattern: str, flavor: str = "pcre") -> str:
"""Translate a regex pattern to human-readable English."""
translator = RegexTranslator(flavor)
return translator.translate(pattern)
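The branch ladder in `_translate_quantifier` is a mapping from (min, max) bounds onto English phrases. That core mapping, extracted as a standalone sketch (`quantifier_phrase` is an illustrative name, not part of the package):

```python
def quantifier_phrase(min_count: int, max_count: float, base: str) -> str:
    """Render a quantifier's bounds as an English phrase."""
    inf = float("inf")
    if (min_count, max_count) == (0, 1):
        return f"optional: {base}"
    if (min_count, max_count) == (0, inf):
        return f"zero or more of: {base}"
    if (min_count, max_count) == (1, inf):
        return f"one or more of: {base}"
    if min_count == max_count:
        # {1} adds nothing; {n} reads as an exact count.
        return base if min_count == 1 else f"exactly {min_count} of: {base}"
    if max_count == inf:
        return f"at least {min_count} of: {base}"
    return f"between {min_count} and {max_count} of: {base}"

print(quantifier_phrase(3, 3, "any digit (0-9)"))
```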

requirements.txt Normal file

@@ -0,0 +1,5 @@
click>=8.0
regex>=2023.0
parsimonious>=0.10.0
pytest>=7.0
pygments>=2.15

setup.py Normal file

@@ -0,0 +1,23 @@
from setuptools import setup, find_packages
setup(
name="regex-humanizer-cli",
version="1.0.0",
packages=find_packages(where="."),
package_dir={"": "."},
install_requires=[
"click>=8.0",
"regex>=2023.0",
"parsimonious>=0.10.0",
"pygments>=2.15",
],
extras_require={
"dev": ["pytest>=7.0", "pytest-cov>=4.0", "black>=23.0", "ruff>=0.1.0"],
},
entry_points={
"console_scripts": [
"regex-humanizer=regex_humanizer.cli:main",
],
},
python_requires=">=3.9",
)