Dynamic Test Grouping System#

This document describes the dynamic test grouping system that automatically optimizes test execution in CI by balancing test groups based on execution time and test count.

Overview#

The dynamic test grouping system replaces hardcoded test groups with an intelligent algorithm that:

  • Automatically discovers all test files in the repository
  • Uses a greedy algorithm to balance test execution times across groups
  • Caches results to avoid unnecessary regeneration
  • Learns from execution data to improve future groupings
  • Targets 10-13 minutes of execution time per group

Architecture#

Core Components#

  1. Dynamic Test Grouper (scripts/dynamic_test_grouper.py)

    • Main script implementing the grouping algorithm
    • Handles caching, timing collection, and group generation
  2. Makefile Integration

    • Dynamic targets that replace hardcoded test groups
    • Automatic cache invalidation and regeneration
  3. CI Workflow Integration

    • GitHub Actions steps for cache management
    • Automatic timing data collection
    • Performance monitoring and optimization
  4. Management Scripts

    • Update script for CI environments
    • Performance analysis and reporting

Algorithm Details#

The system uses a greedy bin-packing algorithm:

1. Discover all test files automatically
2. Load historical timing data (if available)
3. Sort tests by execution time (descending)
4. For each test:
   - Find group with minimum total time
   - Assign test to that group
   - Update group's total time
5. Cache results for future use
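
The sketch below illustrates the greedy step in Python. It is illustrative only: function names such as build_groups and the shape of the timings dictionary are assumptions, not the actual API of dynamic_test_grouper.py.

# Illustrative greedy bin-packing sketch (not the script's exact code)
import heapq

DEFAULT_TEST_TIME = 30  # seconds assumed for tests with no timing data

def build_groups(test_files, timings, num_groups):
    """Assign test files to num_groups so total times stay balanced."""
    # Sort tests by (estimated) execution time, longest first
    ordered = sorted(test_files,
                     key=lambda t: timings.get(t, DEFAULT_TEST_TIME),
                     reverse=True)
    # Min-heap of (total_time, group_index): the emptiest group is popped first
    heap = [(0.0, i) for i in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    for test in ordered:
        total, idx = heapq.heappop(heap)          # group with minimum total time
        groups[idx].append(test)                  # assign test to that group
        total += timings.get(test, DEFAULT_TEST_TIME)
        heapq.heappush(heap, (total, idx))        # update the group's total time
    return groups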

File Structure#

.test_groups_cache/           # Cache directory (git-ignored)
├── unit_groups.json          # Cached unit test groups
├── unit_groups.mk            # Makefile format for unit groups
├── unit_timings.json         # Historical timing data for unit tests
├── integration_groups.json   # Cached integration test groups
├── integration_groups.mk     # Makefile format for integration groups
└── integration_timings.json  # Historical timing data for integration tests

scripts/
└── dynamic_test_grouper.py  # Main grouping algorithm

.github/
├── scripts/
│   └── update_test_groups.sh # CI update script
└── workflows/
    ├── pull_request.yml      # Updated CI workflow
    └── test-groups-management.yml # Management workflow
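
The exact schema of the cached JSON files is internal to the grouper script. As a purely hypothetical illustration, a unit_groups.json file could hold something like the following (shown as a Python literal; the real schema may differ):

# Hypothetical shape of a cached group file -- the real schema may differ
example_unit_groups = {
    "generated_at": "2024-01-01T00:00:00Z",   # when the cache was written
    "num_groups": 6,
    "groups": {
        "1": ["tests/unit/test_foo.py", "tests/unit/test_bar.py"],
        "2": ["tests/unit/test_baz.py"],
    },
}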

Usage#

Manual Group Generation#

# Generate unit test groups (6 groups)
python3 scripts/dynamic_test_grouper.py --type=unit --groups=6

# Generate integration test groups (9 groups)  
python3 scripts/dynamic_test_grouper.py --type=integration --groups=9

# Force regeneration (ignore cache)
python3 scripts/dynamic_test_grouper.py --type=unit --groups=6 --force

# Update timing data from JUnit results
python3 scripts/dynamic_test_grouper.py --type=unit --update-timings --junit-dir=tests/unit/outputs
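
Timing collection reads the time attribute that JUnit XML reports attach to each testcase element. A rough sketch of how per-file durations could be aggregated from such reports (not the script's actual implementation; the file attribute is not emitted by every test runner, so classname is used as a fallback):

# Rough sketch of aggregating per-file durations from JUnit XML reports
import glob
import os
import xml.etree.ElementTree as ET
from collections import defaultdict

def collect_timings(junit_dir):
    """Sum <testcase time="..."> values per source file."""
    timings = defaultdict(float)
    for report in glob.glob(os.path.join(junit_dir, "*.xml")):
        root = ET.parse(report).getroot()
        for case in root.iter("testcase"):
            # "file" is not always present; fall back to the classname attribute
            key = case.get("file") or case.get("classname", "unknown")
            timings[key] += float(case.get("time", 0.0))
    return dict(timings)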

Makefile Targets#

# Run a specific test group (uses dynamic groups)
make unit_test_group TEST_GROUP=1
make integration_test_group TEST_GROUP=3

# Force regenerate all groups
make regenerate_test_groups

# Clean cache
make clean_test_groups

CI Integration#

The system automatically:

  1. Checks for new tests before each CI run
  2. Updates groups if tests have been added/removed
  3. Collects timing data after test execution
  4. Improves groupings over time

Configuration#

Target Execution Time#

The default target is 10-13 minutes per group. To modify:

# In scripts/dynamic_test_grouper.py
MAX_GROUP_TIME = 13 * 60  # 13 minutes in seconds

Default Test Time#

For new tests without timing data:

# In scripts/dynamic_test_grouper.py
DEFAULT_TEST_TIME = 30    # 30 seconds
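
Together these two constants give a rough lower bound on how many groups are needed: the sum of known (or default) test times divided by MAX_GROUP_TIME. An illustrative calculation (the helper name is an assumption, not part of the script):

# Illustrative: estimate how many groups a set of tests needs
import math

MAX_GROUP_TIME = 13 * 60   # 13 minutes in seconds
DEFAULT_TEST_TIME = 30     # seconds assumed for tests without timing data

def estimate_group_count(test_files, timings):
    total = sum(timings.get(t, DEFAULT_TEST_TIME) for t in test_files)
    return max(1, math.ceil(total / MAX_GROUP_TIME))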

Number of Groups#

Configured in CI workflow and Makefile:

  • Unit tests: 6 groups
  • Integration tests: 9 groups

To change, update:

  • .github/workflows/pull_request.yml (matrix strategy)
  • Makefile calls to the grouper script

Monitoring & Analysis#

Performance Reports#

The system generates statistics showing:

# Group Statistics:
# Group 1: 12 tests, 8m 45s
# Group 2: 10 tests, 9m 12s
# Group 3: 11 tests, 8m 56s
# ...
# Max group time: 9.2m, Min: 8.7m, Avg: 9.0m
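
A small sketch of how such a report can be derived from a group assignment plus timing data (illustrative only; the real script may format its output differently):

# Illustrative: print per-group statistics from groups and timing data
def print_group_stats(groups, timings, default_time=30):
    totals = []
    for i, tests in enumerate(groups, start=1):
        total = sum(timings.get(t, default_time) for t in tests)
        totals.append(total)
        print(f"# Group {i}: {len(tests)} tests, {int(total // 60)}m {int(total % 60)}s")
    print(f"# Max group time: {max(totals) / 60:.1f}m, "
          f"Min: {min(totals) / 60:.1f}m, Avg: {sum(totals) / len(totals) / 60:.1f}m")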

Automated Monitoring#

  • Weekly optimization runs via GitHub Actions
  • Performance reports generated as artifacts
  • Automatic issue creation for balance problems

Manual Analysis#

# View current group statistics
./.github/scripts/update_test_groups.sh --stats

# Force regeneration with analysis
make regenerate_test_groups

Troubleshooting#

Cache Issues#

If groups seem incorrect:

# Clear cache and regenerate
make clean_test_groups
make regenerate_test_groups

Missing Timing Data#

For new repositories or after major changes:

# The system will use default times initially
# After the first CI run, timing data will be collected automatically

Group Imbalance#

If groups exceed target time:

  1. Increase number of groups in CI configuration
  2. Identify slow tests for optimization
  3. Use the management workflow for analysis

CI Integration Issues#

Check that:

  1. Cache keys in CI workflow match current setup
  2. Python 3 is available in the CI environment
  3. File permissions allow script execution

Cache Invalidation#

The cache is invalidated when:

  1. New tests are added or existing tests removed
  2. Script is modified (dynamic_test_grouper.py)
  3. Force regeneration is requested
  4. Cache files are corrupted or missing
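
One plausible way to implement the first two checks is to fingerprint the sorted list of discovered test files together with the grouper script itself and compare that digest against the one stored in the cache. This is an assumption about the mechanism, not a description of the actual code:

# Illustrative cache-invalidation check based on a content fingerprint
import hashlib
import json
from pathlib import Path

def cache_is_stale(test_files, cache_file, script_path="scripts/dynamic_test_grouper.py"):
    """Return True when the cached groups should be regenerated."""
    fingerprint = hashlib.sha256()
    for path in sorted(test_files):                     # new/removed tests change the digest
        fingerprint.update(path.encode())
    fingerprint.update(Path(script_path).read_bytes())  # script changes also invalidate
    try:
        cached = json.loads(Path(cache_file).read_text())
    except (OSError, json.JSONDecodeError):
        return True                                     # missing or corrupted cache
    return cached.get("fingerprint") != fingerprint.hexdigest()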

Performance Benefits#

Before (Hardcoded Groups)#

  • Manual group assignment
  • Unbalanced execution times
  • New tests forgotten in CI
  • No optimization over time

After (Dynamic Groups)#

  • Automatic test discovery
  • Optimal load balancing
  • Self-improving over time
  • Target execution time control
  • Zero maintenance required

Development#

Adding New Features#

To extend the system:

  1. Modify the Python script for algorithm changes
  2. Update CI workflows for new caching strategies
  3. Test with various scenarios before deployment

Algorithm Improvements#

Possible enhancements:

  • Machine learning for better time prediction
  • Dependency-aware grouping for related tests
  • Dynamic group sizing based on available runners
  • Test prioritization for faster feedback

Limitations#

  1. Initial runs use estimated times (30s default)
  2. Python 3 dependency for the grouping script
  3. The group cache consumes some CI storage
  4. JUnit XML format required for timing collection
