Analytical Processing and Data Transformation
Table of Contents
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
Introduction
This document explains the analytical processing capabilities of bi-analysis, focusing on the dynamic field processing system, SQL query building mechanisms, and data transformation pipelines. It covers metric calculation algorithms, aggregation functions, formula processing, query optimization strategies, caching mechanisms, performance tuning, and integration with external data sources. Real-time data processing workflows are addressed through the SQL builder’s support for order-related views and computed fields.
Project Structure
The bi-analysis module is a Kratos-based microservice that integrates with bi-common and exposes analytical APIs. The internal/biz package implements the business logic for query orchestration, SQL generation, formula parsing, and validation. The internal/data package provides repositories and persistence wiring via GORM and Redis.
Diagram sources
Section sources
Core Components
- QueryUsecase orchestrates analytical queries: parses requests, resolves field dependencies, builds SQL, executes, and formats results.
- SQLBuilder constructs complex CTE-based SQL tailored to field categories (base, formula, order/finance, DIY).
- FormulaParser converts JSON-formatted formula expressions into safe SQL with division-by-zero protection and IFNULL wrapping.
- Validators enforce safe WHERE conditions and date formats.
- OrderViewBuilder and ProductViewBuilder generate canonical CTEs for order/finance and product analytics aligned with “Yingfeng” semantics.
- Repositories and Transaction abstraction provide persistence and optional Redis caching.
Section sources
- [query.go]
- [sql_builder.go]
- [sql_formula.go]
- [sql_validator.go]
- [sql_order_view.go]
- [sql_product_view.go]
- [data.go]
Architecture Overview
The analytical pipeline follows a layered architecture:
- API layer (Protobuf) receives structured requests.
- Business layer (usecases) validates inputs, resolves dependencies, and delegates SQL construction.
- SQL builder composes CTEs for product info, sales/refunds, and derived views.
- Formula parser transforms expressions into SQL with safety checks.
- Data layer executes SQL and returns typed results, formatted for clients.
Diagram sources
Detailed Component Analysis
Dynamic Field Processing System
- Field categorization: Fields are classified into base, formula, period (derived), DIY, and order/finance groups. Order/finance fields trigger the use of a unified sale day view.
- Runtime configuration: Per-field aggregation modes (SUM, AVG, MAX, MIN, AVG_CLEAN) are resolved from runtime config or defaults.
- Formula dependencies: Formula fields are parsed and dependencies are recursively loaded to ensure correct evaluation order.
Diagram sources
Section sources
SQL Query Building Mechanisms
- CTE composition: The builder generates multiple CTEs (product_info, data sources, product_date, merge_base, product_diy, dimension_group, merge_formula, result, result_limit, result_count).
- Simplified paths: Dedicated SQL builders for pure DIY metrics and pure order/finance metrics enable efficient execution plans.
- Ranking and pagination: Uses dense ranking and an i_order_by column derived from the first dimension to ensure stable ordering and pagination.
- Performance hint: Emits a SQL hint to guide query planner nodes.
Diagram sources
Section sources
Data Transformation Pipelines
- Aggregation functions: getAggFunc/getAggExpr/getDimensionAggExpr select appropriate aggregate functions and wrap expressions for null handling and zero-div prevention.
- Formula evaluation: FormulaParser wraps numeric operands and division operators with IFNULL/NULLIF to avoid nulls and division-by-zero.
- Result formatting: Values are converted to strings with date/time formatting and nil normalization.
Diagram sources
Section sources
Metric Calculation Algorithms and Aggregation Functions
- Finance metrics mapping: A strict mapping ensures fields like payment amount, refunds, and derived purity metrics resolve to canonical columns in the unified sale-day view.
- Product metrics mapping: Product indicators (sales, refunds,试用, purity) map to fields in the product merge view.
- Group detection: isOrderRelatedTable, IsFinanceGroup, IsProductGroup route fields to appropriate CTEs.
Diagram sources
Section sources
Formula Processing Capabilities
- Token-based parsing: JSON arrays of tokens (field, symbol, parentheses, number) are validated and transformed into SQL.
- Safety checks: Division operators receive NULLIF wrapping; unknown operators and invalid numbers are rejected.
- Dependency sorting: Topological sort ensures composite formulas evaluate inner dependencies first.
Diagram sources
Section sources
Query Optimization Strategies
- Early routing: Pure-DIY and pure-order/finance paths bypass unnecessary joins and reduce plan complexity.
- Date filtering: getDateRange enforces bounds and supports flexible date inputs.
- Planner hint: A SQL hint is embedded to influence node placement.
- Ranking window functions: Dense rank and partitioned ordering ensure stable pagination without OFFSET scans.
Section sources
Caching Mechanisms and Performance Tuning
- Optional Redis: The data layer initializes Redis when configured; while not explicitly used in analytical queries here, it is available for caching metadata or intermediate results.
- Transaction isolation: InTx ensures consistent reads/writes across operations.
- Pagination: Uses window functions and dense ranks to minimize overhead compared to OFFSET-based pagination.
Section sources
Examples of Complex Analytical Operations
- Multi-dimensional aggregations: Rows and columns are split and trimmed, then used to construct GROUP BY clauses and ORDER BY expressions.
- Time-series analysis: When time dimensions are present, time series CTEs are generated and LEFT JOINed to ensure zero-fill behavior for missing dates.
- Derived metrics: Purity metrics (net items/orders/buyers, net amount) and cost/fee adjustments are computed in dedicated CTEs and reused across queries.
Section sources
Data Validation, Error Handling, and Logging Patterns
- WHERE condition parser: Validates operators, escapes values, and rejects invalid field names or empty IN lists.
- Date validator: Accepts standard dates or allowed SQL date functions; falls back to current date when invalid.
- Logging: Extensive debug logs during SQL generation and shop resolution; errors are wrapped with context for traceability.
Section sources
Integration with External Data Sources and Real-Time Workflows
- Tenant-scoped tables: Order/refund/sd tables are dynamically constructed per tenant and platform/shop, enabling multi-tenant isolation.
- Unified sale-day view: Orders, refunds, and试用 are merged into a single view to compute derived metrics consistently.
- Shop-aware queries: When order/finance fields are present, shop lists are resolved to build per-shop CTEs and union them.
Section sources
Dependency Analysis
The module depends on bi-common for configuration, database, cache, and API response types. Wire is used for dependency injection.
Diagram sources
Section sources
Performance Considerations
- Prefer pure-DIY or pure-order/finance routes when feasible to minimize joins.
- Use dense rank and partitioned ordering for stable pagination.
- Wrap division operands with NULLIF to prevent expensive divide-by-zero exceptions.
- Limit result sets and leverage time-series zero-fill only when needed.
Troubleshooting Guide
- Unsupported base metrics: Queries requiring base tenant metrics return an error; switch to DIY or order/finance fields.
- Empty shop list: If no shops are found for a tenant, the builder returns an empty CTE; verify tenant/shop configuration.
- Invalid formula or WHERE: Parsing errors are returned with context; validate JSON tokens and operators.
- Date range issues: Invalid dates fall back to sensible defaults; ensure correct date formats or supported SQL functions.
Section sources
Conclusion
Bi-analysis provides a robust, extensible framework for analytical queries. Its SQL builder composes canonical CTEs for order/finance and product domains, safely evaluates formulas, and enforces validations. With optional Redis caching and transactional guarantees, it supports scalable, accurate analytics across tenants and time-series workloads.