Skip to content

Date Dimension Implementation

**Referenced Files in This Document** - [[schema.sql]](file/bi-analysis/docs/database/schema.sql) - [[sql_builder.go]](file/bi-analysis/internal/biz/sql-builder.go) - [[xcbi.sql]](file/bi-chat/bi-chat/src/db/xcbi.sql) - [[Infrastructure.md]](file/ui-web-docs/pages/zh/xcbi-dev/.md) - [[Database Design.md]](file/ui-web-docs/pages/zh/xcbi-dev/.md) - [[page.tsx]](file/ui-web/src/app/settings/diy-table/page.tsx)

Table of Contents

  1. Introduction
  2. Project Structure
  3. Core Components
  4. Architecture Overview
  5. Detailed Component Analysis
  6. Dependency Analysis
  7. Performance Considerations
  8. Troubleshooting Guide
  9. Conclusion

Introduction

This document provides comprehensive documentation for the date dimension table (anls_date) implementation in the StarRocks database. The anls_date table serves as a foundational time dimension for BI analytics, providing a complete date range coverage from January 1, 2020 to December 31, 2050 with automated data generation using recursive sequences. This implementation enables sophisticated time-series analysis, calendar-aware calculations, and seamless integration with the broader BI ecosystem.

The date dimension table is designed to support various analytical patterns including daily, weekly, and monthly reporting, while maintaining optimal performance through strategic indexing and partitioning strategies. The implementation leverages StarRocks' OLAP capabilities to deliver fast query performance on time-based aggregations and trend analysis.

Project Structure

The date dimension implementation spans multiple components within the BI analysis system:

Diagram sources

The implementation follows a modular architecture where the database schema defines the physical structure, the SQL builder handles dynamic query generation, and the frontend components provide user interface integration for date-based analytics.

Section sources

Core Components

Date Dimension Table Structure

The anls_date table implements a comprehensive date dimension with the following key attributes:

Column NameData TypeDescriptionConstraints
t_dateDATEPrimary key date valueNOT NULL, PRIMARY KEY
t_weekVARCHAR(16)ISO week identifierNOT NULL, Format: YYYY-Www
t_monthVARCHAR(8)Year-month identifierNOT NULL, Format: YYYY-MM
t_quarterVARCHAR(8)Quarter identifierNOT NULL, Format: YYYY-Qn
t_yearINTCalendar yearNOT NULL
i_day_of_weekINTDay of week (1-7)NOT NULL
i_day_of_monthINTDay of monthNOT NULL
i_week_of_yearINTWeek number of yearNOT NULL
t_week_startDATEWeek start date (Monday)NULL
t_week_endDATEWeek end date (Sunday)NULL
c_week_displayVARCHAR(32)Formatted week displayNULL

Automated Data Generation

The implementation uses a sophisticated recursive sequence generation approach to populate the date dimension table automatically:

Diagram sources

The sequence generation utilizes a cross join of single-digit sequences (0-9) across five positions, creating a range from 0 to 20500 days from the base date. This approach ensures complete coverage of the target date range while maintaining efficient execution.

Section sources

Architecture Overview

The date dimension implementation integrates seamlessly with the broader BI analysis architecture:

Diagram sources

The architecture supports both batch initialization and real-time query scenarios, enabling flexible deployment strategies for different operational requirements.

Detailed Component Analysis

Date Attribute Calculations

The date dimension table calculates multiple derived attributes to support various analytical needs:

ISO Week Formatting

The ISO week calculation follows international standards with Monday as the first day of the week:

  • Format: YYYY-Www (e.g., 2026-W02)
  • Implementation uses StarRocks WEEKOFYEAR() function
  • Ensures consistent week numbering across calendar boundaries

Month and Quarter Calculations

  • Month: YYYY-MM format using DATE_FORMAT() function
  • Quarter: YYYY-Qn format combining year and quarter number
  • Both calculations use the base date arithmetic as foundation

Day Calculations

  • Day of Week: Uses DAYOFWEEK() with 1=Sunday convention
  • Day of Month: Direct extraction using DAYOFMONTH()
  • Week of Year: Standard WEEKOFYEAR() calculation

Week Boundary Determination

The implementation calculates both start and end dates for each week:

  • Week Start: Monday (calculated using (DAYOFWEEK(date) + 5) % 7 offset)
  • Week End: Sunday (6 days after week start)
  • Display Format: YY年W周MMDD~MMDD (e.g., 24年1周1230~0105)

Weekly Calculation Logic

The weekly calculation logic implements a Monday-to-Sunday week cycle with specific formatting requirements:

Diagram sources

The calculation ensures that:

  • Week numbering aligns with ISO standards (Monday start)
  • Display format provides human-readable week identification
  • Boundary dates accurately represent the complete week period

Section sources

SQL Builder Integration

The SQL builder component dynamically generates queries that utilize the date dimension table:

Diagram sources

The integration enables automatic time series generation based on the requested dimension type, ensuring optimal query performance and accurate data representation.

Section sources

Frontend Integration

The frontend components provide user interface support for date-based analytics:

Diagram sources

The frontend implementation supports both direct date selection and week-based filtering, with automatic conversion between different date formats.

Section sources

Dependency Analysis

The date dimension table has strategic dependencies within the BI ecosystem:

Diagram sources

The dependency structure ensures that the date dimension serves as a central reference point for all time-based analytics, while maintaining loose coupling with other system components.

Section sources

Performance Considerations

Storage and Distribution Strategy

The anls_date table employs optimized storage characteristics for analytical workloads:

  • Primary Key: t_date ensures efficient point lookups and range queries
  • Hash Distribution: BUCKET 4 distribution across t_date column
  • Compression: LZ4 compression reduces storage footprint
  • Persistent Index: Enabled for improved query performance
  • Replication: Single replica configuration optimized for read-heavy workloads

Indexing Strategy

The table design incorporates minimal but effective indexing:

  • Primary Key Index: Automatic index on t_date
  • No Additional Indexes: Optimized for read performance with minimal write overhead
  • Compression Benefits: LZ4 compression reduces I/O requirements

Query Optimization Patterns

The SQL builder implements several optimization strategies:

Diagram sources

Time-Series Analysis Patterns

The implementation supports common time-series analysis patterns:

  • Daily Trends: Direct t_date grouping for day-level analysis
  • Weekly Patterns: c_week_display grouping for week-over-week comparisons
  • Monthly Analysis: t_month grouping for month-over-month trends
  • Quarterly Reporting: t_quarter grouping for seasonal analysis

Section sources

Troubleshooting Guide

Common Issues and Solutions

Date Range Validation

  • Issue: Queries return incomplete date ranges
  • Solution: Verify initialization script executed successfully
  • Verification: Check row count equals expected date range length

Performance Degradation

  • Issue: Slow query performance on date-based filters
  • Solution: Ensure proper use of t_date primary key in WHERE clauses
  • Monitoring: Check query execution plans for index utilization

Data Integrity Issues

  • Issue: Missing or duplicate date entries
  • Solution: Recreate table using initialization script
  • Prevention: Schedule regular integrity checks

Memory and Storage Concerns

  • Issue: High memory usage during date dimension queries
  • Solution: Optimize query patterns to use appropriate time granularity
  • Monitoring: Track table size and query performance metrics

Diagnostic Procedures

  1. Schema Verification: Confirm table structure matches expected definition
  2. Data Validation: Verify date coverage from 2020-01-01 to 2050-12-31
  3. Index Check: Ensure primary key index is properly configured
  4. Query Analysis: Review execution plans for optimal performance
  5. Storage Monitoring: Track compression ratios and I/O patterns

Section sources

Conclusion

The anls_date table implementation provides a robust foundation for time-based analytics within the StarRocks BI ecosystem. The comprehensive date range coverage, automated data generation, and optimized performance characteristics enable sophisticated analytical capabilities while maintaining operational simplicity.

Key strengths of the implementation include:

  • Complete Coverage: Full 31-year date range supporting long-term trend analysis
  • Automated Maintenance: Self-populating design reduces manual maintenance overhead
  • Optimized Performance: Strategic storage and indexing configurations for analytical workloads
  • Flexible Integration: Seamless integration with SQL builder and frontend components
  • Standards Compliance: ISO week calculations and standardized date formatting

The implementation successfully balances comprehensive functionality with operational efficiency, providing a solid foundation for enterprise-grade business intelligence applications. Future enhancements could include dynamic date range expansion and advanced calendar-aware calculations, building upon the solid foundation established by this implementation.