Architecture Overview
git-autosquash is designed as a modular system with clear separation of concerns. This document provides a comprehensive overview of the system architecture, component interactions, and design decisions.
System Overview
graph TB
subgraph "CLI Layer"
A[main.py] --> B[Argument Parsing]
B --> C[Repository Validation]
end
subgraph "Validation Layer"
C --> D[SourceNormalizer]
D --> E[ProcessingValidator]
end
subgraph "Core Processing"
E --> F[GitOps]
F --> G[HunkParser]
G --> H[BlameAnalyzer]
end
subgraph "User Interface"
H --> I[AutoSquashApp]
I --> J[ApprovalScreen]
J --> K[Widgets]
end
subgraph "Execution Layer"
K --> L[RebaseManager]
L --> M[Interactive Rebase]
M --> N[Conflict Resolution]
N --> E
end
style A fill:#e1f5fe
style D fill:#fff3e0
style E fill:#fff3e0
style I fill:#f3e5f5
style L fill:#e8f5e8
Core Components
1. GitOps (git_ops.py)
Purpose: Central interface for all Git operations with proper error handling and subprocess management.
Key Responsibilities:
- Repository validation and branch detection
- Working tree status analysis
- Git command execution with timeout and error handling
- Merge base calculation and commit validation
Design Patterns:
- Facade Pattern: Simplifies complex Git interactions
- Error Handling: Comprehensive subprocess error management
- Caching: Intelligent caching of expensive Git operations
class GitOps:
    def __init__(self, repo_path: str = ".") -> None: ...
    def is_git_repo(self) -> bool: ...
    def get_current_branch(self) -> str | None: ...
    def get_merge_base_with_main(self, current_branch: str) -> str | None: ...
    def get_working_tree_status(self) -> dict[str, bool]: ...
    def run_git_command(self, args: list[str], env: dict[str, str] | None = None) -> subprocess.CompletedProcess[str]: ...
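As an illustration of the timeout and error handling described above, here is a minimal sketch of a subprocess wrapper. It is not the project's implementation: the function name `run_git`, the `executable` and `timeout` parameters, and the return codes used for timeouts and missing binaries are all assumptions made for this example.

```python
from __future__ import annotations

import subprocess


def run_git(
    args: list[str],
    executable: str = "git",  # hypothetical parameter, handy for testing
    cwd: str = ".",
    timeout: float = 30.0,  # hypothetical default
) -> subprocess.CompletedProcess[str]:
    """Run `executable` with `args` and always return a CompletedProcess.

    Arguments are passed as a list with no shell involved, so they are
    never interpreted by a shell. Failures surface through the return
    code, which lets the caller decide how to react.
    """
    cmd = [executable, *args]
    try:
        return subprocess.run(
            cmd,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # Assumed convention for this sketch: 124 means "timed out".
        return subprocess.CompletedProcess(cmd, 124, stdout="", stderr="timed out")
    except FileNotFoundError:
        # Assumed convention for this sketch: 127 means "executable not found".
        return subprocess.CompletedProcess(cmd, 127, stdout="", stderr="not found")
```

A caller could then write `run_git(["status", "--porcelain"])` and inspect `returncode`, `stdout`, and `stderr` without wrapping every call in its own `try` block.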
2. SourceNormalizer (source_normalizer.py)
Purpose: Normalizes all input sources (working-tree, index, HEAD, commits) to a single commit for consistent processing.
Key Responsibilities:
- Convert working-tree changes to temporary commits with --no-verify
- Convert index changes to temporary commits with --no-verify
- Pass through HEAD and commit references unchanged
- Track parent SHA for safe cleanup
- Handle edge cases: empty diffs, detached HEAD, concurrent modifications
Design Decisions:
- Single Code Path: Normalizing all inputs before hunk parsing eliminates branching logic
- Temporary Commits: Working-tree and index changes committed with --no-verify to skip hooks
- Explicit Cleanup: Parent SHA tracked for safe removal of temporary commits
- Safety First: Validation ensures no data loss during normalization
class SourceNormalizer:
    def normalize_source(self, source: str) -> NormalizedSource: ...
    def cleanup_temp_commit(self, normalized: NormalizedSource) -> None: ...
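The normalization rule can be sketched as a small pure function. The fields of `NormalizedSource` shown here, the source names `"working-tree"`, `"index"`, and `"HEAD"` as literal strings, and the `make_temp_commit` callback (which stands in for the actual Git work) are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class NormalizedSource:
    """Hypothetical shape: the commit to process plus cleanup information."""

    commit: str
    parent: str | None  # set only when a temporary commit was created
    is_temporary: bool


def normalize_source(
    source: str,
    head_sha: str,
    make_temp_commit: Callable[[str], str],
) -> NormalizedSource:
    """Map any input source onto a single commit.

    `make_temp_commit` stands in for committing with --no-verify and
    returns the SHA of the new temporary commit.
    """
    if source in ("working-tree", "index"):
        temp_sha = make_temp_commit(source)
        # Remember the previous HEAD so cleanup can reset back to it.
        return NormalizedSource(commit=temp_sha, parent=head_sha, is_temporary=True)
    if source == "HEAD":
        return NormalizedSource(commit=head_sha, parent=None, is_temporary=False)
    # Anything else is treated as a commit reference and passed through.
    return NormalizedSource(commit=source, parent=None, is_temporary=False)
```

Because every branch returns the same type, the code that parses hunks afterwards only ever deals with a commit, which is the point of the single code path.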
3. ProcessingValidator (validation.py)
Purpose: Provides pre-flight and post-flight validation to guarantee data integrity throughout hunk processing.
Key Responsibilities:
- Pre-flight validation: Verify hunk count matches expectations
- Post-flight validation: Use git diff to detect any data corruption
- Provide detailed error messages with recovery instructions
- Work correctly in all repository states (detached HEAD, etc.)
Validation Strategy:
- Pre-flight: validate_hunk_count() checks that parsed hunks match expected count
- Post-flight: validate_processing() runs git diff <start> <end> to verify that no changes were lost or added
- Error Detection: Non-empty diff indicates corruption with detailed error reporting
- Recovery Guidance: Clear instructions for manual recovery if validation fails
class ProcessingValidator:
    def validate_hunk_count(self, hunks: list[DiffHunk], from_commit: str) -> None: ...
    def validate_processing(self, start_commit: str, end_commit: str) -> None: ...
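The two checks reduce to simple comparisons once the Git output is in hand. The sketch below works on plain values (hunk counts and the text of a diff) instead of the real method signatures; the `ValidationError` type and the error messages are hypothetical.

```python
class ValidationError(Exception):
    """Hypothetical error type raised when a validation check fails."""


def validate_hunk_count(parsed: int, expected: int) -> None:
    """Pre-flight: the parser must have produced the expected number of hunks."""
    if parsed != expected:
        raise ValidationError(f"expected {expected} hunks, parsed {parsed}")


def validate_processing(diff_output: str) -> None:
    """Post-flight: the diff between start and end commits must be empty.

    `diff_output` is assumed to be the stdout of `git diff <start> <end>`.
    """
    if diff_output.strip():
        # Each changed file starts with a "diff --git" line in Git's output.
        files = [
            line for line in diff_output.splitlines()
            if line.startswith("diff --git ")
        ]
        raise ValidationError(
            f"{len(files)} file(s) differ between start and end commits"
        )
```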
4. HunkCommitSplitter (hunk_commit_splitter.py)
Purpose: Splits source commits into separate temporary commits, one per hunk, enabling reliable 3-way merge during cherry-pick.
Key Responsibilities:
- Create temporary branch for split commits
- Generate individual commits for each hunk from source commit
- Preserve original commit metadata (author, date, message)
- Provide cleanup mechanism for temporary commits and branch
Design Decisions:
- Real Git Commits: Creates actual commits instead of text patches for 3-way merge compatibility
- One Hunk Per Commit: Each split commit contains exactly one change for granular application
- Temporary Branch: Uses git-autosquash-split-<hash> naming pattern for isolation
- Automatic Cleanup: Removes all temporary commits and branches after processing
Why This Enables Reliable Squashing:
- Git's 3-way merge can handle complex cases (removing lines from commit that added them)
- Cherry-pick with --no-commit applies changes without creating new commits
- Full git history context available during merge (not just text diffs)
class HunkCommitSplitter:
    def split_commit_into_hunks(self, source_commit: str) -> tuple[List[str], List[DiffHunk]]: ...
    def cleanup(self) -> None: ...
5. HunkParser (hunk_parser.py)
Purpose: Parses Git diff output into structured hunk objects for analysis and processing.
Key Responsibilities:
- Parse git diff output into structured DiffHunk objects
- Support both default and line-by-line hunk splitting modes
- Extract file context and line range information
- Handle various diff formats and edge cases
Design Decisions:
- Immutable Data Structures: DiffHunk objects are immutable for safety
- Flexible Parsing: Supports multiple diff modes and contexts
- Line Preservation: Maintains exact line content including whitespace
from dataclasses import dataclass

@dataclass(frozen=True)
class DiffHunk:
    file_path: str
    old_start: int
    old_count: int
    new_start: int
    new_count: int
    lines: list[str]
    context_before: list[str]
    context_after: list[str]
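The four range fields come from the `@@ -old_start,old_count +new_start,new_count @@` header that opens each hunk in unified diff output. A minimal sketch of extracting them follows; the function name `parse_hunk_header` is an assumption, not the parser's actual API. In unified diff format a count that is omitted defaults to 1.

```python
import re

# Matches headers such as "@@ -10,3 +12,5 @@ optional section text".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start = int(match.group(1))
    # An omitted count means the range covers exactly one line.
    old_count = int(match.group(2)) if match.group(2) is not None else 1
    new_start = int(match.group(3))
    new_count = int(match.group(4)) if match.group(4) is not None else 1
    return old_start, old_count, new_start, new_count
```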
6. BlameAnalyzer (blame_analyzer.py)
Purpose: Analyzes Git blame information to determine target commits for each hunk with confidence scoring.
Key Responsibilities:
- Run git blame analysis on hunk line ranges
- Determine most frequent commit for each hunk (frequency-first algorithm)
- Filter commits to branch scope (merge-base to HEAD)
- Calculate confidence levels based on blame consistency
- Cache commit metadata for performance
Algorithm Design:
- Frequency-First Scoring: Prioritizes commits that modified the most lines
- Recency Tiebreaking: Uses commit timestamps to break frequency ties
- Branch Scoping: Only considers commits on current branch since merge-base
class BlameAnalyzer:
    def analyze_hunks(self, hunks: List[DiffHunk]) -> List[HunkTargetMapping]: ...
    def _analyze_single_hunk(self, hunk: DiffHunk) -> HunkTargetMapping: ...
    def _get_branch_commits(self) -> Set[str]: ...  # Cached
    def _get_commit_timestamp(self, commit_hash: str) -> int: ...  # Cached
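The selection rule can be sketched as a pure function over the blame result for one hunk. Two details here are assumptions not confirmed by this document: that the more recent commit wins a frequency tie, and that confidence is the share of blamed lines attributed to the chosen commit. The name `pick_target_commit` is also hypothetical.

```python
from __future__ import annotations

from collections import Counter


def pick_target_commit(
    blamed: list[str],
    branch_commits: set[str],
    timestamps: dict[str, int],
) -> tuple[str | None, float]:
    """Choose a target commit for one hunk.

    `blamed` holds one commit hash per blamed line. Commits outside
    `branch_commits` (merge-base to HEAD) are ignored.
    """
    in_scope = [sha for sha in blamed if sha in branch_commits]
    if not in_scope:
        return None, 0.0
    counts = Counter(in_scope)
    # Frequency first; the timestamp only matters when counts are equal.
    best = max(counts, key=lambda sha: (counts[sha], timestamps.get(sha, 0)))
    # Assumed confidence measure: fraction of all blamed lines owned by `best`.
    confidence = counts[best] / len(blamed)
    return best, confidence
```

For example, with blamed lines `["a", "a", "b", "c"]` and only `a` and `b` on the branch, the function picks `a` because it owns two of the four lines.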
7. TUI System (tui/)
Purpose: Rich terminal interface using Textual framework for user interaction and approval workflow.
Component Structure:
graph TD
A[AutoSquashApp] --> B[ApprovalScreen]
B --> C[HunkMappingWidget]
B --> D[DiffViewer]
B --> E[ProgressIndicator]
C --> F[Checkbox]
C --> G[Static Text]
D --> H[Syntax Highlighting]
E --> I[Progress Display]
Key Design Principles:
- Reactive UI: Real-time updates based on user interactions
- Safety Defaults: All hunks start unapproved requiring explicit consent
- Keyboard Navigation: Full keyboard control for efficient workflows
- Graceful Fallback: Text-based fallback when TUI unavailable
8. RebaseManager (rebase_manager.py)
Purpose: Orchestrates interactive rebase operations to apply approved hunks to historical commits using split-commit + cherry-pick approach.
Key Responsibilities:
- Group hunks by target commit for batch processing
- Apply hunks via cherry-pick of split commits (primary method)
- Fall back to a patch-based approach if cherry-pick is unavailable
- Execute interactive rebase with chronological ordering
- Handle stash/unstash operations for working tree management
- Detect and report conflicts with resolution guidance
- Provide automatic rollback on errors or interruption
Execution Strategy:
- Primary: Cherry-pick split commits with --no-commit flag (uses git's 3-way merge)
- Fallback: Patch-based application if split commits unavailable
- Why 3-way works: Git has the full history context, so it can remove lines from the commit that added them
Execution Flow:
1. Preparation: Stash uncommitted changes, validate branch state
2. Grouping: Organize hunks by target commit hash
3. Ordering: Sort commits chronologically (oldest first) for history integrity
4. Processing: For each commit:
   - Start interactive rebase to edit the commit
   - Cherry-pick split commits using git cherry-pick --no-commit (if available)
   - Fall back to patch application using git apply --3way (if split commits unavailable)
   - Amend the commit with new changes
   - Continue rebase to next commit
5. Cleanup: Restore the stash, remove split commits and temporary branches, and clear any other temporary state
Error Handling Strategy:
- Conflict Detection: Identify merge conflicts and pause with guidance
- Automatic Rollback: Restore repository state on errors or cancellation
- Resource Cleanup: Ensure temporary files and stashes are properly cleaned up
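The grouping and ordering steps of the execution flow can be sketched without touching Git. The function below assumes each approved mapping is reduced to a `(hunk_id, target_commit)` pair and that commit timestamps are already known; both the name `plan_rebase` and these input shapes are illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict


def plan_rebase(
    mappings: list[tuple[str, str]],
    timestamps: dict[str, int],
) -> list[tuple[str, list[str]]]:
    """Group hunks by target commit, then order the commits oldest first."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for hunk_id, target in mappings:
        grouped[target].append(hunk_id)
    # Oldest commit first, so earlier history is rewritten before later history.
    ordered = sorted(grouped, key=lambda sha: timestamps[sha])
    return [(sha, grouped[sha]) for sha in ordered]
```

Each entry of the result corresponds to one pass of the processing step: edit that commit, apply its hunks, amend, and continue.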
Data Flow
1. Input Processing with Validation
sequenceDiagram
participant CLI as main.py
participant Normalizer as SourceNormalizer
participant Validator as ProcessingValidator
participant Git as GitOps
participant Parser as HunkParser
CLI->>Git: Validate repository
Git->>CLI: Repository status
CLI->>Normalizer: normalize_source(source)
Normalizer->>Git: Create temp commit if needed
Git->>Normalizer: Commit SHA
Normalizer->>CLI: NormalizedSource (commit + parent)
CLI->>Parser: get_diff_hunks(from_commit)
Parser->>Git: git diff <commit>
Git->>Parser: Raw diff output
Parser->>CLI: Structured DiffHunk objects
CLI->>Validator: validate_hunk_count(hunks, commit)
Validator->>CLI: Validation pass/fail
2. Analysis Phase
sequenceDiagram
participant CLI as main.py
participant Blame as BlameAnalyzer
participant Git as GitOps
CLI->>Blame: analyze_hunks(hunks)
loop For each hunk
Blame->>Git: git blame <lines>
Git->>Blame: Blame output
Blame->>Blame: Find target commit
Blame->>Blame: Calculate confidence
end
Blame->>CLI: HunkTargetMapping list
3. User Approval
sequenceDiagram
participant CLI as main.py
participant App as AutoSquashApp
participant Screen as ApprovalScreen
participant User as User
CLI->>App: Launch TUI with mappings
App->>Screen: Create approval screen
Screen->>User: Show hunk mappings
User->>Screen: Review and approve hunks
Screen->>App: Approved mappings
App->>CLI: User decisions
4. Execution Phase with Post-Flight Validation
sequenceDiagram
participant CLI as main.py
participant Validator as ProcessingValidator
participant Rebase as RebaseManager
participant Git as GitOps
participant Normalizer as SourceNormalizer
CLI->>Rebase: execute_squash(mappings)
Rebase->>Rebase: Group hunks by commit
Rebase->>Rebase: Order commits chronologically
loop For each target commit
Rebase->>Git: Start interactive rebase
Rebase->>Git: Apply hunk patches
Rebase->>Git: Amend commit
Rebase->>Git: Continue rebase
end
Rebase->>CLI: Success/failure result
CLI->>Validator: validate_processing(start, end)
Validator->>Git: git diff <start> <end>
Git->>Validator: Diff output (should be empty)
Validator->>CLI: Validation pass/fail
CLI->>Normalizer: cleanup_temp_commit(normalized)
Normalizer->>Git: Reset to parent if temp commit
Normalizer->>CLI: Cleanup complete
Design Patterns and Principles
1. Separation of Concerns
Each component has a single, well-defined responsibility:
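- GitOps: Git command interface
- SourceNormalizer: Input normalization to a single commit
- ProcessingValidator: Pre-flight and post-flight validation
- HunkCommitSplitter: Per-hunk split commits for cherry-pick
- HunkParser: Diff parsing and structure
- BlameAnalyzer: Blame analysis and targeting
- TUI Components: User interface and interaction
- RebaseManager: Rebase orchestration and execution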
2. Error Handling Strategy
Defensive Programming:
- Validate all inputs at component boundaries
- Handle subprocess failures gracefully
- Provide meaningful error messages to users
- Implement automatic rollback mechanisms
Error Categories:
- User Errors: Invalid repository state, detached HEAD
- Git Errors: Command failures, conflicts, repository issues
- System Errors: File I/O, permissions, resource constraints
- Interruption: User cancellation, keyboard interrupt
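One way to cover all four categories with a single rollback path is a context manager that restores state on any exception, including user interruption. This is a sketch of the idea only; `rollback_on_failure` and its `rollback` callback are hypothetical names, and the real rollback steps are not shown.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def rollback_on_failure(rollback: Callable[[], None]) -> Iterator[None]:
    """Run the wrapped block and call `rollback` if it does not finish cleanly."""
    try:
        yield
    except BaseException:
        # BaseException also covers KeyboardInterrupt (user cancellation),
        # which a plain `except Exception` would miss.
        rollback()
        raise
```

The exception is re-raised after the rollback, so the caller still sees the original error and can report it.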
3. Performance Optimizations
Caching Strategy:
- Commit metadata: Timestamps and summaries cached to avoid repeated Git calls
- Branch commits: Expensive commit list operations cached per session
- Blame results: Reuse blame data across multiple hunk analyses
Resource Management:
- Subprocess timeouts: Prevent hanging on Git operations
- Temporary file cleanup: Automatic cleanup of patches and todo files
- Memory efficiency: Stream processing of large diffs when possible
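The commit metadata cache can be as simple as a dictionary keyed by commit hash. The sketch below is an assumption about the general shape, not the project's code: `CommitMetadataCache` and the `fetch_timestamp` callback (standing in for a Git call) are hypothetical, and the `git_calls` counter exists only to make the effect visible.

```python
from __future__ import annotations

from typing import Callable


class CommitMetadataCache:
    """Remember commit timestamps for the lifetime of one session."""

    def __init__(self, fetch_timestamp: Callable[[str], int]) -> None:
        self._fetch = fetch_timestamp  # stands in for a Git subprocess call
        self._timestamps: dict[str, int] = {}
        self.git_calls = 0  # instrumentation for this sketch only

    def timestamp(self, sha: str) -> int:
        """Return the timestamp for `sha`, fetching it at most once."""
        if sha not in self._timestamps:
            self.git_calls += 1
            self._timestamps[sha] = self._fetch(sha)
        return self._timestamps[sha]
```

Since a commit's timestamp never changes, entries need no invalidation within a session.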
4. Testing Architecture
Test Categories:
- Unit Tests: Individual component functionality with mocking
- Integration Tests: Component interaction with real Git repositories
- TUI Tests: User interface behavior without DOM dependencies
- End-to-End Tests: Complete workflow simulation
Test Infrastructure:
- Mocking Strategy: Mock Git operations for reliable, fast tests
- Test Data: Structured test repositories and diff scenarios
- Edge Case Coverage: Boundary conditions and error scenarios
Configuration and Extensibility
Future Extension Points
- Configuration System:
  - User preferences for approval defaults
  - Custom confidence thresholds
  - Blame analysis parameters
- Plugin Architecture:
  - Custom hunk filtering rules
  - Alternative conflict resolution strategies
  - Integration with external tools
- Output Formats:
  - JSON output for tooling integration
  - Structured logging for automation
  - Custom report generation
Security Considerations
Git Command Safety
- Command Injection Prevention: All Git arguments properly escaped
- Repository Validation: Verify repository integrity before operations
- Branch Protection: Only operate on feature branches with clear merge-base
Data Integrity
- Atomic Operations: Rebase operations are atomic where possible
- Backup Strategy: Automatic stashing preserves user work
- Rollback Capability: Complete restoration on failure or cancellation
User Safety
- Default Deny: All operations require explicit user approval
- Clear Feedback: Detailed progress and error reporting
- Escape Mechanisms: Multiple ways to safely abort operations
This architecture provides a robust, maintainable foundation for git-autosquash while supporting future enhancements and ensuring user safety throughout the workflow.