Data Transformation and Mapping
Table of Contents
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
Introduction
This document describes the data transformation and mapping system that converts external Taobao API responses into internal business objects (BO) and persistent data models. It covers field mapping strategies for orders, products, SKUs, and refunds across API versions, validation rules, type conversions, and format standardization. It also documents the repository pattern for persistence and retrieval, error handling, missing field handling, and data quality checks. Finally, it outlines performance optimization techniques for bulk processing and caching strategies for frequently accessed transformed data.
Project Structure
The transformation pipeline spans three layers:
- External API ingestion and orchestration (Taobao via Leke SDK)
- Business transformation and mapping (BO and DO models)
- Persistence and retrieval (repository pattern with GORM)
Core Components
- OrderUsecase orchestrates fetching Taobao orders, validates DataKey, computes pagination, and emits Kafka messages for full/incremental sync.
- BO models define the internal representation of orders and filters.
- DO models define the persisted entity shape and dynamic table naming strategy.
- OrderRepoImpl persists and retrieves orders using the GORMX client.
- Product/Shop BOs support product and shop metadata mapping for SKU and goods details.
Key responsibilities:
- External API to BO mapping for orders
- Type conversions and validation
- Dynamic table routing for orders
- Repository CRUD operations
Architecture Overview
The transformation follows a publish-subscribe model:
- Orchestration builds Kafka messages containing pagination and credentials.
- Consumers decode messages and call Taobao APIs via Leke SDK.
- Responses are mapped to BO models, validated, normalized, and persisted via repositories.
Detailed Component Analysis
Order Transformation and Mapping
- External API fields are parsed from JSON-like structures and mapped to OrderBO fields.
- Date/time fields are converted to pointers to time.Time for nullable semantics.
- Monetary and numeric fields are converted to float64/int32 for calculations.
- Dynamic table naming supports multi-tenant, platform, and shop segmentation.
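A minimal mapping sketch for the conversions listed above, assuming illustrative field names rather than the actual Taobao response schema: string dates become *time.Time (nil when absent or malformed), and string amounts and counts become float64/int32:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// rawOrder mimics the string-typed fields of the external API response;
// the names are illustrative, not the real Taobao schema.
type rawOrder struct {
	TID     string
	Created string // e.g. "2024-01-02 15:04:05"
	Payment string // monetary amount as a string
	Num     string // item count as a string
}

// OrderBO holds the typed internal representation.
type OrderBO struct {
	TID     string
	Created *time.Time // pointer gives nullable semantics for absent dates
	Payment float64
	Num     int32
}

const taobaoTimeLayout = "2006-01-02 15:04:05"

// toOrderBO converts string fields, leaving Created nil when the source
// value is empty or malformed rather than failing the whole record.
func toOrderBO(r rawOrder) OrderBO {
	bo := OrderBO{TID: r.TID}
	if t, err := time.Parse(taobaoTimeLayout, r.Created); err == nil {
		bo.Created = &t
	}
	bo.Payment, _ = strconv.ParseFloat(r.Payment, 64)
	if n, err := strconv.ParseInt(r.Num, 10, 32); err == nil {
		bo.Num = int32(n)
	}
	return bo
}

func main() {
	bo := toOrderBO(rawOrder{TID: "1001", Created: "2024-01-02 15:04:05", Payment: "19.90", Num: "2"})
	fmt.Println(bo.TID, bo.Payment, bo.Num, bo.Created != nil) // prints: 1001 19.9 2 true
}
```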
Taobao API Integration and Message Building
- Validates DataKey via SDK before querying orders.
- Computes total pages from TotalResults and emits per-page Kafka messages.
- Supports both full and incremental order sync with configurable page sizes.
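The page computation above reduces to ceiling division over TotalResults; a sketch (the function name is ours, not the codebase's):

```go
package main

import "fmt"

// totalPages computes how many per-page messages to emit for a given
// TotalResults, using ceiling division. Zero or negative results mean
// zero pages, so no messages are emitted.
func totalPages(totalResults, pageSize int64) int64 {
	if totalResults <= 0 || pageSize <= 0 {
		return 0
	}
	return (totalResults + pageSize - 1) / pageSize
}

func main() {
	fmt.Println(totalPages(0, 100), totalPages(100, 100), totalPages(101, 100)) // prints: 0 1 2
}
```

Computing the page count once up front lets the orchestrator emit an exact set of messages instead of probing pages until an empty response.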
Repository Pattern Implementation
- OrderRepoImpl encapsulates CRUD operations using the GORMX client.
- Dynamic table selection via GetOrderTableName enables sharding by tenant, platform, and shop.
- Pagination uses raw SQL with explicit limits and offsets for performance.
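The routing and pagination ideas above can be sketched as follows. The table-name prefix, separator, and ORDER BY column are assumptions for illustration, not the production naming rule:

```go
package main

import "fmt"

// getOrderTableName sketches the sharding scheme: one order table per
// tenant/platform/shop combination.
func getOrderTableName(tenantID, platform, shopID string) string {
	return fmt.Sprintf("orders_%s_%s_%s", tenantID, platform, shopID)
}

// pageQuery builds the raw paginated SQL used instead of the ORM's
// query builder; explicit LIMIT/OFFSET keeps the plan predictable.
func pageQuery(table string, page, pageSize int) string {
	offset := (page - 1) * pageSize
	return fmt.Sprintf("SELECT * FROM %s ORDER BY created DESC LIMIT %d OFFSET %d",
		table, pageSize, offset)
}

func main() {
	t := getOrderTableName("t1", "taobao", "s9")
	fmt.Println(pageQuery(t, 2, 50))
	// prints: SELECT * FROM orders_t1_taobao_s9 ORDER BY created DESC LIMIT 50 OFFSET 50
}
```

In production the table name comes from trusted internal identifiers; never interpolate user input into a table name.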
Data Model and Field Mapping Strategy
- OrderBO and Order DO fields align with the order display contract and protobuf mapping.
- Protobuf fields include standardized aliases for mapping (e.g., cost to d_cost).
- Dynamic table naming ensures isolation and scalability.
Products, SKUs, and Shop Metadata Mapping
- TaobaoItemDetail and TaobaoGoodsDetail capture product and SKU attributes.
- TaobaoSku includes price, quantity, and outer ID for inventory alignment.
- TaobaoUserInfo provides shop-level metadata for authorization and identity.
Refunds Mapping Strategy
- Refund-related fields in BO/DO include flags and amounts for refund status and monetary adjustments.
- Protobuf mapping defines standardized aliases for refund indicators and descriptions.
Dependency Analysis
- Orchestration depends on the Leke SDK for Taobao API calls and Kafka producer factory for emitting tasks.
- Mapping depends on BO/DO models and protobuf definitions for field alignment.
- Persistence depends on the GORMX client and dynamic table naming.
Performance Considerations
- Concurrency control: Use a fixed-size semaphore to cap concurrent shop processing during full/incremental sync.
- Batch Kafka writes: Emit per-page messages and write in batches to reduce overhead.
- Pagination: Compute total pages from TotalResults to avoid unnecessary retries.
- GORMX client: Use raw SQL for pagination to leverage StarRocks's performance characteristics.
- Dynamic table routing: Partition orders by tenant, platform, and shop to improve query locality and reduce contention.
Troubleshooting Guide
Common issues and resolutions:
- DataKey validation failures: Verify seller nick and DataKey before API calls; log detailed error codes and messages.
- Empty or zero TotalResults: Treat as no data and skip emitting messages.
- Type conversion errors: Validate numeric fields before parsing; fallback to defaults or log anomalies.
- Missing fields: Initialize optional fields to neutral values; mark records for manual review if critical fields are absent.
- Kafka write failures: Retry failed batches; monitor producer errors and adjust batch sizes.
- Repository errors: Log SQL errors and inspect dynamic table names; ensure table creation precedes writes.
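The "fall back to defaults and log anomalies" guidance for type conversion errors can be captured in a small helper (the function name is ours, not from the codebase):

```go
package main

import (
	"fmt"
	"log"
	"strconv"
)

// parseFloatOrDefault validates a numeric field before use: an empty
// optional field silently maps to the default, while a malformed value
// is logged as an anomaly and also maps to the default.
func parseFloatOrDefault(raw string, def float64) float64 {
	if raw == "" {
		return def // missing optional field: neutral value, no log
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		log.Printf("numeric field anomaly: %q: %v", raw, err)
		return def
	}
	return v
}

func main() {
	fmt.Println(parseFloatOrDefault("12.5", 0), parseFloatOrDefault("", 0), parseFloatOrDefault("abc", -1))
	// prints: 12.5 0 -1
}
```

Using a sentinel default (e.g. -1) for malformed values, rather than zero, keeps anomalous records distinguishable for later review.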
Conclusion
The system transforms Taobao API responses into standardized BO/DO models, persists them efficiently via a repository pattern, and scales through dynamic table routing and controlled concurrency. Clear field mapping aligned with protobuf definitions, robust validation, and batch processing enable reliable, high-throughput order ingestion. Extending the mapping to products, SKUs, and refunds follows the same patterns, ensuring consistent data quality and performance.