Date Dimension Implementation
Table of Contents
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
Introduction
This document provides comprehensive documentation for the date dimension table (anls_date) implementation in the StarRocks database. The anls_date table serves as a foundational time dimension for BI analytics, providing a complete date range coverage from January 1, 2020 to December 31, 2050 with automated data generation using recursive sequences. This implementation enables sophisticated time-series analysis, calendar-aware calculations, and seamless integration with the broader BI ecosystem.
The date dimension table is designed to support various analytical patterns including daily, weekly, and monthly reporting, while maintaining optimal performance through strategic indexing and partitioning strategies. The implementation leverages StarRocks' OLAP capabilities to deliver fast query performance on time-based aggregations and trend analysis.
Project Structure
The date dimension implementation spans multiple components within the BI analysis system:
Diagram sources
The implementation follows a modular architecture where the database schema defines the physical structure, the SQL builder handles dynamic query generation, and the frontend components provide user interface integration for date-based analytics.
Section sources
Core Components
Date Dimension Table Structure
The anls_date table implements a comprehensive date dimension with the following key attributes:
| Column Name | Data Type | Description | Constraints |
|---|---|---|---|
| t_date | DATE | Primary key date value | NOT NULL, PRIMARY KEY |
| t_week | VARCHAR(16) | ISO week identifier | NOT NULL, Format: YYYY-Www |
| t_month | VARCHAR(8) | Year-month identifier | NOT NULL, Format: YYYY-MM |
| t_quarter | VARCHAR(8) | Quarter identifier | NOT NULL, Format: YYYY-Qn |
| t_year | INT | Calendar year | NOT NULL |
| i_day_of_week | INT | Day of week (1-7) | NOT NULL |
| i_day_of_month | INT | Day of month | NOT NULL |
| i_week_of_year | INT | Week number of year | NOT NULL |
| t_week_start | DATE | Week start date (Monday) | NULL |
| t_week_end | DATE | Week end date (Sunday) | NULL |
| c_week_display | VARCHAR(32) | Formatted week display | NULL |
Automated Data Generation
The implementation uses a sophisticated recursive sequence generation approach to populate the date dimension table automatically:
Diagram sources
The sequence generation utilizes a cross join of single-digit sequences (0-9) across five positions, creating a range from 0 to 20500 days from the base date. This approach ensures complete coverage of the target date range while maintaining efficient execution.
Section sources
Architecture Overview
The date dimension implementation integrates seamlessly with the broader BI analysis architecture:
Diagram sources
The architecture supports both batch initialization and real-time query scenarios, enabling flexible deployment strategies for different operational requirements.
Detailed Component Analysis
Date Attribute Calculations
The date dimension table calculates multiple derived attributes to support various analytical needs:
ISO Week Formatting
The ISO week calculation follows international standards with Monday as the first day of the week:
- Format:
YYYY-Www(e.g., 2026-W02) - Implementation uses StarRocks
WEEKOFYEAR()function - Ensures consistent week numbering across calendar boundaries
Month and Quarter Calculations
- Month:
YYYY-MMformat usingDATE_FORMAT()function - Quarter:
YYYY-Qnformat combining year and quarter number - Both calculations use the base date arithmetic as foundation
Day Calculations
- Day of Week: Uses
DAYOFWEEK()with 1=Sunday convention - Day of Month: Direct extraction using
DAYOFMONTH() - Week of Year: Standard
WEEKOFYEAR()calculation
Week Boundary Determination
The implementation calculates both start and end dates for each week:
- Week Start: Monday (calculated using
(DAYOFWEEK(date) + 5) % 7offset) - Week End: Sunday (6 days after week start)
- Display Format:
YY年W周MMDD~MMDD(e.g., 24年1周1230~0105)
Weekly Calculation Logic
The weekly calculation logic implements a Monday-to-Sunday week cycle with specific formatting requirements:
Diagram sources
The calculation ensures that:
- Week numbering aligns with ISO standards (Monday start)
- Display format provides human-readable week identification
- Boundary dates accurately represent the complete week period
Section sources
SQL Builder Integration
The SQL builder component dynamically generates queries that utilize the date dimension table:
Diagram sources
The integration enables automatic time series generation based on the requested dimension type, ensuring optimal query performance and accurate data representation.
Section sources
Frontend Integration
The frontend components provide user interface support for date-based analytics:
Diagram sources
The frontend implementation supports both direct date selection and week-based filtering, with automatic conversion between different date formats.
Section sources
Dependency Analysis
The date dimension table has strategic dependencies within the BI ecosystem:
Diagram sources
The dependency structure ensures that the date dimension serves as a central reference point for all time-based analytics, while maintaining loose coupling with other system components.
Section sources
Performance Considerations
Storage and Distribution Strategy
The anls_date table employs optimized storage characteristics for analytical workloads:
- Primary Key: t_date ensures efficient point lookups and range queries
- Hash Distribution: BUCKET 4 distribution across t_date column
- Compression: LZ4 compression reduces storage footprint
- Persistent Index: Enabled for improved query performance
- Replication: Single replica configuration optimized for read-heavy workloads
Indexing Strategy
The table design incorporates minimal but effective indexing:
- Primary Key Index: Automatic index on t_date
- No Additional Indexes: Optimized for read performance with minimal write overhead
- Compression Benefits: LZ4 compression reduces I/O requirements
Query Optimization Patterns
The SQL builder implements several optimization strategies:
Diagram sources
Time-Series Analysis Patterns
The implementation supports common time-series analysis patterns:
- Daily Trends: Direct t_date grouping for day-level analysis
- Weekly Patterns: c_week_display grouping for week-over-week comparisons
- Monthly Analysis: t_month grouping for month-over-month trends
- Quarterly Reporting: t_quarter grouping for seasonal analysis
Section sources
Troubleshooting Guide
Common Issues and Solutions
Date Range Validation
- Issue: Queries return incomplete date ranges
- Solution: Verify initialization script executed successfully
- Verification: Check row count equals expected date range length
Performance Degradation
- Issue: Slow query performance on date-based filters
- Solution: Ensure proper use of t_date primary key in WHERE clauses
- Monitoring: Check query execution plans for index utilization
Data Integrity Issues
- Issue: Missing or duplicate date entries
- Solution: Recreate table using initialization script
- Prevention: Schedule regular integrity checks
Memory and Storage Concerns
- Issue: High memory usage during date dimension queries
- Solution: Optimize query patterns to use appropriate time granularity
- Monitoring: Track table size and query performance metrics
Diagnostic Procedures
- Schema Verification: Confirm table structure matches expected definition
- Data Validation: Verify date coverage from 2020-01-01 to 2050-12-31
- Index Check: Ensure primary key index is properly configured
- Query Analysis: Review execution plans for optimal performance
- Storage Monitoring: Track compression ratios and I/O patterns
Section sources
Conclusion
The anls_date table implementation provides a robust foundation for time-based analytics within the StarRocks BI ecosystem. The comprehensive date range coverage, automated data generation, and optimized performance characteristics enable sophisticated analytical capabilities while maintaining operational simplicity.
Key strengths of the implementation include:
- Complete Coverage: Full 31-year date range supporting long-term trend analysis
- Automated Maintenance: Self-populating design reduces manual maintenance overhead
- Optimized Performance: Strategic storage and indexing configurations for analytical workloads
- Flexible Integration: Seamless integration with SQL builder and frontend components
- Standards Compliance: ISO week calculations and standardized date formatting
The implementation successfully balances comprehensive functionality with operational efficiency, providing a solid foundation for enterprise-grade business intelligence applications. Future enhancements could include dynamic date range expansion and advanced calendar-aware calculations, building upon the solid foundation established by this implementation.