
Architecture Overview

git-autosquash is designed as a modular system with clear separation of concerns. This document provides a comprehensive overview of the system architecture, component interactions, and design decisions.

System Overview

graph TB
    subgraph "CLI Layer"
        A[main.py] --> B[Argument Parsing]
        B --> C[Repository Validation]
    end

    subgraph "Validation Layer"
        C --> D[SourceNormalizer]
        D --> E[ProcessingValidator]
    end

    subgraph "Core Processing"
        E --> F[GitOps]
        F --> G[HunkParser]
        G --> H[BlameAnalyzer]
    end

    subgraph "User Interface"
        H --> I[AutoSquashApp]
        I --> J[ApprovalScreen]
        J --> K[Widgets]
    end

    subgraph "Execution Layer"
        K --> L[RebaseManager]
        L --> M[Interactive Rebase]
        M --> N[Conflict Resolution]
        N --> E
    end

    style A fill:#e1f5fe
    style D fill:#fff3e0
    style E fill:#fff3e0
    style I fill:#f3e5f5
    style L fill:#e8f5e8

Core Components

1. GitOps (git_ops.py)

Purpose: Central interface for all Git operations with proper error handling and subprocess management.

Key Responsibilities:

- Repository validation and branch detection
- Working tree status analysis
- Git command execution with timeout and error handling
- Merge base calculation and commit validation

Design Patterns:

- Facade Pattern: Simplifies complex Git interactions
- Error Handling: Comprehensive subprocess error management
- Caching: Intelligent caching of expensive Git operations

import subprocess

# Interface summary; method bodies are omitted.
class GitOps:
    def __init__(self, repo_path: str = ".") -> None: ...
    def is_git_repo(self) -> bool: ...
    def get_current_branch(self) -> str | None: ...
    def get_merge_base_with_main(self, current_branch: str) -> str | None: ...
    def get_working_tree_status(self) -> dict[str, bool]: ...
    def run_git_command(self, args: list[str], env: dict[str, str] | None = None) -> subprocess.CompletedProcess[str]: ...
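
As a rough illustration of "command execution with timeout and error handling", the sketch below wraps `subprocess.run` so that failures come back as a result object instead of an exception. It is not the project's implementation: the `timeout` default, the `executable` parameter (added here only so the function can be exercised without Git), and the 124/127 return codes are all assumptions.

```python
import os
import subprocess
import sys
from typing import Optional


def run_git_command(
    args: list[str],
    repo_path: str = ".",
    env: Optional[dict[str, str]] = None,
    timeout: float = 30.0,        # assumed default, not taken from the project
    executable: str = "git",      # hypothetical parameter, useful for testing
) -> subprocess.CompletedProcess:
    """Run a command as an argument list (no shell) and report failures as results."""
    command = [executable, *args]
    merged_env = {**os.environ, **(env or {})}
    try:
        return subprocess.run(
            command,
            cwd=repo_path,
            env=merged_env,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # callers inspect returncode themselves
        )
    except subprocess.TimeoutExpired:
        # 124 mirrors the exit code of the coreutils `timeout` command (assumption).
        return subprocess.CompletedProcess(command, 124, "", f"timed out after {timeout}s")
    except OSError as exc:
        # Raised when the executable is missing or cannot be started.
        return subprocess.CompletedProcess(command, 127, "", str(exc))
```

Passing arguments as a list and leaving `shell` unset means no shell ever parses them, which is what lets a facade like this remain the single place where subprocess errors are handled.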

2. SourceNormalizer (source_normalizer.py)

Purpose: Normalizes all input sources (working-tree, index, HEAD, commits) to a single commit for consistent processing.

Key Responsibilities:

- Convert working-tree changes to temporary commits with --no-verify
- Convert index changes to temporary commits with --no-verify
- Pass through HEAD and commit references unchanged
- Track parent SHA for safe cleanup
- Handle edge cases: empty diffs, detached HEAD, concurrent modifications

Design Decisions:

- Single Code Path: All inputs normalized before hunk parsing eliminates branching logic
- Temporary Commits: Working-tree and index changes committed with --no-verify to skip hooks
- Explicit Cleanup: Parent SHA tracked for safe removal of temporary commits
- Safety First: Validation ensures no data loss during normalization

# Interface summary; method bodies are omitted.
class SourceNormalizer:
    def normalize_source(self, source: str) -> NormalizedSource: ...
    def cleanup_temp_commit(self, normalized: NormalizedSource) -> None: ...
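
The normalization rules above can be sketched as two small planning functions that decide which Git arguments are needed, if any. Everything specific here is an assumption made for illustration: the `NormalizedSource` field names, the `"working-tree"` and `"index"` source strings, the temporary commit message, and the use of `git reset --soft` for cleanup.

```python
from dataclasses import dataclass
from typing import Optional

TEMP_MESSAGE = "TEMP: git-autosquash normalized source"  # hypothetical message


@dataclass(frozen=True)
class NormalizedSource:
    commit: str                  # commit-ish that later stages read from
    parent_sha: Optional[str]    # recorded only when a temporary commit exists
    is_temp_commit: bool


def commit_args_for(source: str) -> Optional[list[str]]:
    """Return `git commit` arguments for sources that need a temporary commit."""
    if source == "working-tree":
        # --all stages modified and deleted tracked files before committing.
        return ["commit", "--all", "--no-verify", "-m", TEMP_MESSAGE]
    if source == "index":
        return ["commit", "--no-verify", "-m", TEMP_MESSAGE]
    return None  # HEAD and explicit commits pass through unchanged


def cleanup_args_for(normalized: NormalizedSource) -> Optional[list[str]]:
    """Return `git reset` arguments that undo a temporary commit, if there is one."""
    if not normalized.is_temp_commit or normalized.parent_sha is None:
        return None
    # --soft moves HEAD back and leaves the changes staged (assumed mode).
    return ["reset", "--soft", normalized.parent_sha]
```

Keeping the parent SHA in the result object means cleanup does not have to rediscover where the branch pointed before normalization.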

3. ProcessingValidator (validation.py)

Purpose: Provides pre-flight and post-flight validation to guarantee data integrity throughout hunk processing.

Key Responsibilities:

- Pre-flight validation: Verify hunk count matches expectations
- Post-flight validation: Use git diff to detect any data corruption
- Provide detailed error messages with recovery instructions
- Work correctly in all repository states (detached HEAD, etc.)

Validation Strategy:

- Pre-flight: validate_hunk_count() checks that parsed hunks match expected count
- Post-flight: validate_processing() runs git diff <start> <end> to verify no changes lost or added
- Error Detection: Non-empty diff indicates corruption with detailed error reporting
- Recovery Guidance: Clear instructions for manual recovery if validation fails

# Interface summary; method bodies are omitted.
class ProcessingValidator:
    def validate_hunk_count(self, hunks: list[DiffHunk], from_commit: str) -> None: ...
    def validate_processing(self, start_commit: str, end_commit: str) -> None: ...
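
The two checks can be illustrated on plain diff text. This is a minimal sketch, assuming the expected hunk count is derived by counting `@@` headers in the raw diff (the source does not say how the expected count is obtained); `ValidationError` and all function names are hypothetical.

```python
class ValidationError(Exception):
    """Raised when a pre-flight or post-flight check fails (hypothetical name)."""


def count_hunk_headers(diff_text: str) -> int:
    # In a unified diff, content lines carry a ' ', '+' or '-' prefix,
    # so only real hunk headers start with "@@ ".
    return sum(1 for line in diff_text.splitlines() if line.startswith("@@ "))


def check_hunk_count(parsed_count: int, diff_text: str) -> None:
    """Pre-flight: the parser must have produced one object per hunk header."""
    expected = count_hunk_headers(diff_text)
    if parsed_count != expected:
        raise ValidationError(
            f"parsed {parsed_count} hunks but the diff contains {expected}"
        )


def check_no_drift(diff_output: str, start_commit: str, end_commit: str) -> None:
    """Post-flight: `git diff <start> <end>` must be empty after processing."""
    if diff_output.strip():
        raise ValidationError(
            f"content differs between {start_commit} and {end_commit}; "
            f"inspect it with: git diff {start_commit} {end_commit}"
        )
```

The post-flight check relies on the fact that squashing only moves changes between commits, so the final tree should be identical to the starting tree.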

4. HunkCommitSplitter (hunk_commit_splitter.py)

Purpose: Splits source commits into separate temporary commits, one per hunk, enabling reliable 3-way merge during cherry-pick.

Key Responsibilities:

- Create temporary branch for split commits
- Generate individual commits for each hunk from source commit
- Preserve original commit metadata (author, date, message)
- Provide cleanup mechanism for temporary commits and branch

Design Decisions:

- Real Git Commits: Creates actual commits instead of text patches for 3-way merge compatibility
- One Hunk Per Commit: Each split commit contains exactly one change for granular application
- Temporary Branch: Uses git-autosquash-split-<hash> naming pattern for isolation
- Automatic Cleanup: Removes all temporary commits and branches after processing

Why This Enables Reliable Squashing:

- Git's 3-way merge can handle complex cases, such as removing lines from the same commit that added them
- Cherry-pick with --no-commit applies changes without creating new commits
- Full Git history context is available during the merge (not just text diffs)

# Interface summary; method bodies are omitted.
class HunkCommitSplitter:
    def split_commit_into_hunks(self, source_commit: str) -> tuple[list[str], list[DiffHunk]]: ...
    def cleanup(self) -> None: ...
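
One way the split could be driven is sketched below as a list of Git argument vectors, one commit per hunk patch. This is an assumption-heavy illustration, not the project's method: it assumes each hunk has already been written to its own patch file, that `<hash>` in the branch name is an abbreviated source SHA, and that `git commit -C` is used to reuse the source commit's message and authorship.

```python
def split_branch_name(source_commit: str) -> str:
    # Assumes <hash> is the first 8 characters of the source commit SHA.
    return f"git-autosquash-split-{source_commit[:8]}"


def split_commit_plan(source_commit: str, patch_files: list[str]) -> list[list[str]]:
    """Return git argument lists that create one commit per hunk patch."""
    branch = split_branch_name(source_commit)
    # Start the temporary branch at the parent of the source commit.
    plan = [["checkout", "-b", branch, f"{source_commit}~1"]]
    for patch in patch_files:
        # --index applies the patch to both the index and the working tree.
        plan.append(["apply", "--index", patch])
        # -C reuses the log message and authorship (including the timestamp).
        plan.append(["commit", "--no-verify", "-C", source_commit])
    return plan


def cleanup_plan(source_commit: str) -> list[list[str]]:
    # The caller must have switched back to the original branch first.
    return [["branch", "-D", split_branch_name(source_commit)]]
```

Each argument list would be passed to something like `GitOps.run_git_command`; the commit SHAs created along the way are what the cherry-pick step later consumes.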

5. HunkParser (hunk_parser.py)

Purpose: Parses Git diff output into structured hunk objects for analysis and processing.

Key Responsibilities:

- Parse git diff output into structured DiffHunk objects
- Support both default and line-by-line hunk splitting modes
- Extract file context and line range information
- Handle various diff formats and edge cases

Design Decisions:

- Immutable Data Structures: DiffHunk objects are immutable for safety
- Flexible Parsing: Supports multiple diff modes and contexts
- Line Preservation: Maintains exact line content including whitespace

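from dataclasses import dataclass

# frozen=True blocks attribute reassignment; the list fields themselves remain mutable.
@dataclass(frozen=True)
class DiffHunk:
    file_path: str
    old_start: int
    old_count: int
    new_start: int
    new_count: int
    lines: list[str]
    context_before: list[str]
    context_after: list[str]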
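
The four range fields come from the unified-diff hunk header, `@@ -old_start,old_count +new_start,new_count @@`. As a minimal sketch of that one parsing step (the function name and regular expression are illustrative, not taken from hunk_parser.py):

```python
import re

# Either count may be omitted in a unified diff, in which case it is 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count) for a hunk header line."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )
```

For example, `parse_hunk_header("@@ -10,3 +12,4 @@ def foo():")` returns `(10, 3, 12, 4)`, and a header such as `@@ -5 +5 @@` yields counts of 1 on both sides.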

6. BlameAnalyzer (blame_analyzer.py)

Purpose: Analyzes Git blame information to determine target commits for each hunk with confidence scoring.

Key Responsibilities:

- Run git blame analysis on hunk line ranges
- Determine most frequent commit for each hunk (frequency-first algorithm)
- Filter commits to branch scope (merge-base to HEAD)
- Calculate confidence levels based on blame consistency
- Cache commit metadata for performance

Algorithm Design:

- Frequency-First Scoring: Prioritizes commits that modified the most lines
- Recency Tiebreaking: Uses commit timestamps to break frequency ties
- Branch Scoping: Only considers commits on current branch since merge-base

# Interface summary; method bodies are omitted.
class BlameAnalyzer:
    def analyze_hunks(self, hunks: list[DiffHunk]) -> list[HunkTargetMapping]: ...
    def _analyze_single_hunk(self, hunk: DiffHunk) -> HunkTargetMapping: ...
    def _get_branch_commits(self) -> set[str]: ...  # Cached
    def _get_commit_timestamp(self, commit_hash: str) -> int: ...  # Cached
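
The selection rule described above (frequency first, recency as the tiebreaker, branch commits only) can be sketched as a pure function over blame results. The function name, the input shapes, and the definition of confidence as the winning commit's share of all blamed lines are assumptions; the source only says confidence is based on blame consistency.

```python
from collections import Counter
from typing import Optional


def pick_target(
    blamed_commits: list[str],      # one commit hash per blamed line
    branch_commits: set[str],       # commits between merge-base and HEAD
    timestamps: dict[str, int],     # commit hash -> commit time (epoch seconds)
) -> tuple[Optional[str], float]:
    """Return (target commit, confidence) for one hunk."""
    in_scope = [c for c in blamed_commits if c in branch_commits]
    if not in_scope:
        return None, 0.0  # no candidate on this branch
    counts = Counter(in_scope)
    # Highest line count wins; among equal counts, the newer commit wins.
    best = max(counts, key=lambda c: (counts[c], timestamps.get(c, 0)))
    # Confidence: share of all blamed lines attributed to the winner (assumed).
    return best, counts[best] / len(blamed_commits)
```

With blamed lines `["a", "a", "b", "c"]` and only `a` and `b` on the branch, `a` wins with confidence 0.5, because `c` is out of scope but still counts against consistency.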

7. TUI System (tui/)

Purpose: Rich terminal interface built on the Textual framework for user interaction and the approval workflow.

Component Structure:

graph TD
    A[AutoSquashApp] --> B[ApprovalScreen]
    B --> C[HunkMappingWidget]
    B --> D[DiffViewer] 
    B --> E[ProgressIndicator]

    C --> F[Checkbox]
    C --> G[Static Text]
    D --> H[Syntax Highlighting]
    E --> I[Progress Display]

Key Design Principles:

- Reactive UI: Real-time updates based on user interactions
- Safety Defaults: All hunks start unapproved requiring explicit consent
- Keyboard Navigation: Full keyboard control for efficient workflows
- Graceful Fallback: Text-based fallback when TUI unavailable
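
The "Safety Defaults" and "Graceful Fallback" principles can be combined in a small text-mode sketch that does not depend on Textual. The prompt wording and the function name are hypothetical; the point illustrated is only that anything other than an explicit yes leaves a hunk unapproved.

```python
from typing import Callable


def text_fallback_approve(
    descriptions: list[str],
    ask: Callable[[str], str] = input,
) -> list[bool]:
    """Ask about each hunk in turn; the default answer is always 'no'."""
    decisions = []
    for description in descriptions:
        answer = ask(f"Squash {description}? [y/N] ").strip().lower()
        # Empty input, typos and anything else count as a refusal.
        decisions.append(answer in {"y", "yes"})
    return decisions
```

Injecting `ask` keeps the function testable without a terminal, for example `text_fallback_approve(["hunk 1"], ask=lambda prompt: "y")`.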

8. RebaseManager (rebase_manager.py)

Purpose: Orchestrates interactive rebase operations to apply approved hunks to historical commits using a split-commit + cherry-pick approach.

Key Responsibilities:

- Group hunks by target commit for batch processing
- Apply hunks via cherry-pick of split commits (primary method)
- Fall back to a patch-based approach if cherry-pick is unavailable
- Execute interactive rebase with chronological ordering
- Handle stash/unstash operations for working tree management
- Detect and report conflicts with resolution guidance
- Provide automatic rollback on errors or interruption

Execution Strategy:

- Primary: Cherry-pick split commits with the --no-commit flag (uses Git's 3-way merge)
- Fallback: Patch-based application if split commits are unavailable
- Why 3-way works: Git understands history context, so it can remove lines from the same commit that added them

Execution Flow:

1. Preparation: Stash uncommitted changes, validate branch state
2. Grouping: Organize hunks by target commit hash
3. Ordering: Sort commits chronologically (oldest first) for history integrity
4. Processing: For each commit:
   - Start interactive rebase to edit the commit
   - Cherry-pick split commits using git cherry-pick --no-commit (if available)
   - Fall back to patch application using git apply --3way (if split commits are unavailable)
   - Amend the commit with new changes
   - Continue rebase to next commit
5. Cleanup: Restore stash, clean up split commits/branches, handle remaining cleanup
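
The grouping and ordering steps of this flow are simple enough to sketch in isolation. The input shape (pairs of hunk identifier and target commit) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def group_and_order(
    mappings: list[tuple[str, str]],   # (hunk id, target commit hash)
    timestamps: dict[str, int],        # commit hash -> commit time
) -> list[tuple[str, list[str]]]:
    """Group hunks by target commit and return the groups oldest commit first."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for hunk_id, target in mappings:
        grouped[target].append(hunk_id)
    ordered = sorted(grouped, key=lambda commit: timestamps[commit])
    return [(commit, grouped[commit]) for commit in ordered]
```

Grouping first means each target commit is edited once during the rebase, regardless of how many hunks it receives.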

Error Handling Strategy:

- Conflict Detection: Identify merge conflicts and pause with guidance
- Automatic Rollback: Restore repository state on errors or cancellation
- Resource Cleanup: Ensure temporary files and stashes are properly cleaned up
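
Rollback and cleanup of this kind are often expressed as a context manager. The sketch below shows the control flow only; `restore` and `cleanup` are hypothetical callbacks standing in for whatever the real implementation does (for example aborting the rebase, or popping a stash).

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def rollback_on_failure(
    restore: Callable[[], None],
    cleanup: Callable[[], None],
) -> Iterator[None]:
    """Run `restore` if the body fails or is interrupted; always run `cleanup`."""
    try:
        yield
    except BaseException:
        # BaseException also covers KeyboardInterrupt (Ctrl+C).
        restore()
        raise
    finally:
        cleanup()
```

A caller would wrap the whole rebase in `with rollback_on_failure(restore, cleanup):` so that an error or Ctrl+C restores the repository before the exception propagates.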

Data Flow

1. Input Processing with Validation

sequenceDiagram
    participant CLI as main.py
    participant Normalizer as SourceNormalizer
    participant Validator as ProcessingValidator
    participant Git as GitOps
    participant Parser as HunkParser

    CLI->>Git: Validate repository
    Git->>CLI: Repository status
    CLI->>Normalizer: normalize_source(source)
    Normalizer->>Git: Create temp commit if needed
    Git->>Normalizer: Commit SHA
    Normalizer->>CLI: NormalizedSource (commit + parent)
    CLI->>Parser: get_diff_hunks(from_commit)
    Parser->>Git: git diff <commit>
    Git->>Parser: Raw diff output
    Parser->>CLI: Structured DiffHunk objects
    CLI->>Validator: validate_hunk_count(hunks, commit)
    Validator->>CLI: Validation pass/fail

2. Analysis Phase

sequenceDiagram
    participant CLI as main.py
    participant Blame as BlameAnalyzer
    participant Git as GitOps

    CLI->>Blame: analyze_hunks(hunks)
    loop For each hunk
        Blame->>Git: git blame <lines>
        Git->>Blame: Blame output
        Blame->>Blame: Find target commit
        Blame->>Blame: Calculate confidence
    end
    Blame->>CLI: HunkTargetMapping list

3. User Approval

sequenceDiagram
    participant CLI as main.py
    participant App as AutoSquashApp
    participant Screen as ApprovalScreen
    participant User as User

    CLI->>App: Launch TUI with mappings
    App->>Screen: Create approval screen
    Screen->>User: Show hunk mappings
    User->>Screen: Review and approve hunks
    Screen->>App: Approved mappings
    App->>CLI: User decisions

4. Execution Phase with Post-Flight Validation

sequenceDiagram
    participant CLI as main.py
    participant Validator as ProcessingValidator
    participant Rebase as RebaseManager
    participant Git as GitOps
    participant Normalizer as SourceNormalizer

    CLI->>Rebase: execute_squash(mappings)
    Rebase->>Rebase: Group hunks by commit
    Rebase->>Rebase: Order commits chronologically
    loop For each target commit
        Rebase->>Git: Start interactive rebase
        Rebase->>Git: Apply hunk patches
        Rebase->>Git: Amend commit
        Rebase->>Git: Continue rebase
    end
    Rebase->>CLI: Success/failure result
    CLI->>Validator: validate_processing(start, end)
    Validator->>Git: git diff <start> <end>
    Git->>Validator: Diff output (should be empty)
    Validator->>CLI: Validation pass/fail
    CLI->>Normalizer: cleanup_temp_commit(normalized)
    Normalizer->>Git: Reset to parent if temp commit
    Normalizer->>CLI: Cleanup complete

Design Patterns and Principles

1. Separation of Concerns

Each component has a single, well-defined responsibility:

- GitOps: Git command interface
- HunkParser: Diff parsing and structure
- BlameAnalyzer: Blame analysis and targeting
- TUI Components: User interface and interaction
- RebaseManager: Rebase orchestration and execution

2. Error Handling Strategy

Defensive Programming:

- Validate all inputs at component boundaries
- Handle subprocess failures gracefully
- Provide meaningful error messages to users
- Implement automatic rollback mechanisms

Error Categories:

- User Errors: Invalid repository state, detached HEAD
- Git Errors: Command failures, conflicts, repository issues
- System Errors: File I/O, permissions, resource constraints
- Interruption: User cancellation, keyboard interrupt

3. Performance Optimizations

Caching Strategy:

- Commit metadata: Timestamps and summaries cached to avoid repeated Git calls
- Branch commits: Expensive commit list operations cached per session
- Blame results: Reuse blame data across multiple hunk analyses

Resource Management:

- Subprocess timeouts: Prevent hanging on Git operations
- Temporary file cleanup: Automatic cleanup of patches and todo files
- Memory efficiency: Stream processing of large diffs when possible
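
Per-session caching of commit metadata can be as simple as a dictionary in front of the Git call. This is a minimal sketch; the class name and the injected `fetch_timestamp` callable are hypothetical and stand in for a real lookup through GitOps.

```python
from typing import Callable


class CommitMetadataCache:
    """Remember commit timestamps so each commit is looked up at most once."""

    def __init__(self, fetch_timestamp: Callable[[str], int]) -> None:
        self._fetch = fetch_timestamp
        self._timestamps: dict[str, int] = {}

    def timestamp(self, commit_hash: str) -> int:
        if commit_hash not in self._timestamps:
            # Only the first request for a commit reaches Git.
            self._timestamps[commit_hash] = self._fetch(commit_hash)
        return self._timestamps[commit_hash]
```

Because commit objects are immutable, a cache like this never needs invalidation within a session, apart from commits that the rebase itself rewrites.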

4. Testing Architecture

Test Categories:

- Unit Tests: Individual component functionality with mocking
- Integration Tests: Component interaction with real Git repositories
- TUI Tests: User interface behavior without DOM dependencies
- End-to-End Tests: Complete workflow simulation

Test Infrastructure:

- Mocking Strategy: Mock Git operations for reliable, fast tests
- Test Data: Structured test repositories and diff scenarios
- Edge Case Coverage: Boundary conditions and error scenarios
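
A unit test in this style replaces the Git layer with a mock that returns canned results. The example below is illustrative only: `current_branch` is a hypothetical helper written for the test, and the only interface assumed from the project is `run_git_command` returning a `subprocess.CompletedProcess`.

```python
import subprocess
from typing import Optional
from unittest.mock import Mock


def current_branch(git) -> Optional[str]:
    """Hypothetical helper: return the branch name, or None on a detached HEAD."""
    result = git.run_git_command(["symbolic-ref", "--short", "HEAD"])
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def test_current_branch_on_feature_branch() -> None:
    git = Mock()
    git.run_git_command.return_value = subprocess.CompletedProcess(
        args=[], returncode=0, stdout="feature/login\n", stderr=""
    )
    assert current_branch(git) == "feature/login"
    git.run_git_command.assert_called_once_with(["symbolic-ref", "--short", "HEAD"])


def test_current_branch_on_detached_head() -> None:
    git = Mock()
    git.run_git_command.return_value = subprocess.CompletedProcess(
        args=[], returncode=128, stdout="", stderr="fatal: ref HEAD is not a symbolic ref"
    )
    assert current_branch(git) is None
```

Tests like these run without a repository on disk, which is what keeps the unit layer fast; behavior against real Git is left to the integration tests.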

Configuration and Extensibility

Future Extension Points

  1. Configuration System:
       • User preferences for approval defaults
       • Custom confidence thresholds
       • Blame analysis parameters

  2. Plugin Architecture:
       • Custom hunk filtering rules
       • Alternative conflict resolution strategies
       • Integration with external tools

  3. Output Formats:
       • JSON output for tooling integration
       • Structured logging for automation
       • Custom report generation

Security Considerations

Git Command Safety

  • Command Injection Prevention: All Git arguments properly escaped
  • Repository Validation: Verify repository integrity before operations
  • Branch Protection: Only operate on feature branches with clear merge-base
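
One common way to keep user-supplied values from being interpreted as options is to pass arguments as a list, reject revisions that start with a dash, and separate paths with `--`. The sketch below applies that to a blame invocation; the function name and the specific checks are assumptions, not a description of the project's validation.

```python
def build_blame_args(file_path: str, start: int, end: int, rev: str) -> list[str]:
    """Build `git blame` arguments for a line range without going through a shell."""
    if rev.startswith("-"):
        # A revision that looks like an option could change Git's behavior.
        raise ValueError(f"suspicious revision: {rev!r}")
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {start},{end}")
    # Everything after "--" is treated as a path, never as an option.
    return ["blame", "--porcelain", "-L", f"{start},{end}", rev, "--", file_path]
```

Because the result is an argument list rather than a command string, file names containing spaces or shell metacharacters need no quoting.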

Data Integrity

  • Atomic Operations: Rebase operations are atomic where possible
  • Backup Strategy: Automatic stashing preserves user work
  • Rollback Capability: Complete restoration on failure or cancellation

User Safety

  • Default Deny: All operations require explicit user approval
  • Clear Feedback: Detailed progress and error reporting
  • Escape Mechanisms: Multiple ways to safely abort operations

This architecture provides a robust, maintainable foundation for git-autosquash while supporting future enhancements and ensuring user safety throughout the workflow.