What is a foundation model for time series forecasting?
A foundation model for time series forecasting is a large-scale transformer
model pre-trained on diverse time series data that captures universal temporal
patterns and relationships. Similar to how BERT and GPT models learn language
structure, time series foundation models learn to understand trends,
seasonality, and variable interactions across different domains and scales.
This pre-training enables the model to adapt to new forecasting tasks with
little or no fine-tuning, transferring knowledge from its training data to your
specific use case while handling complex patterns and multivariate
relationships that traditionally required extensive feature engineering.
Architecture
Mimosa leverages a transformer-based encoder-decoder architecture adapted for
time series forecasting. The model processes numerical time series data through
a tokenization approach that converts continuous values into discrete tokens,
enabling the use of traditional transformer mechanisms for sequential data.
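A simplified sketch of such a tokenization step is shown below: per-series
scaling followed by uniform binning into a fixed vocabulary. The scaling
scheme, clipping range, and bin count are assumptions for illustration, not
Mimosa's actual quantization.

```python
import numpy as np

def tokenize_series(values, num_bins=1024):
    """Map a continuous series to discrete token ids by scaling and binning.

    Illustrative only: the scale factor, clipping range, and vocabulary size
    are assumptions, not the quantization used by Mimosa.
    """
    values = np.asarray(values, dtype=np.float64)
    # Scale by the mean absolute value so series of different magnitudes
    # share one token vocabulary.
    scale = np.mean(np.abs(values)) or 1.0
    scaled = np.clip(values / scale, -10.0, 10.0)
    # Assign each scaled value to one of `num_bins` uniform bins.
    edges = np.linspace(-10.0, 10.0, num_bins + 1)
    tokens = np.digitize(scaled, edges[1:-1])  # ids in [0, num_bins - 1]
    return tokens, scale

tokens, scale = tokenize_series([112.0, 118.0, 132.0, 129.0, 121.0])
```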
The architecture consists of multiple transformer blocks incorporating
self-attention layers, feed-forward networks, and residual connections.
What distinguishes Mimosa is its minimalist design philosophy: rather than
introducing complex domain-specific components, it relies on the transformer's
inherent capabilities to learn temporal patterns and relationships.
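The sketch below shows one such block in PyTorch, with self-attention and a
feed-forward network each wrapped in a residual connection. The pre-norm
layout, dimensions, and activation are illustrative choices, not Mimosa's
published configuration.

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm transformer block: self-attention plus a feed-forward
    network, each behind a residual connection. Sizes are illustrative."""

    def __init__(self, d_model=256, n_heads=4, d_ff=1024, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        # Residual connection around self-attention.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out
        # Residual connection around the feed-forward network.
        return x + self.ff(self.norm2(x))
```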
Training dataset
Mimosa was trained on a comprehensive collection of time series data spanning
multiple domains and sampling frequencies. The training corpus combines both
real-world datasets and synthetic data to ensure robust generalization.
The training data includes:
- Financial data: Cash flows, revenue streams, operational costs, and
profitability indicators from various businesses.
- Energy domain: Electricity consumption patterns, power generation data, and
grid utilization metrics.
- Supply chain: Inventory levels, order volumes, fulfillment rates, and demand
forecasts across different product categories.
- Healthcare analytics: Patient admission rates, treatment utilization patterns,
and facility capacity metrics.
- Consumer products: User engagement data from mobile apps and games, including
daily active users, session lengths, and in-app activities.
The training data was enhanced through two key strategies, sketched in the
example after this list:
- Time series mixing that creates new patterns by combining existing sequences
in varying proportions.
- Synthetic data generation using Gaussian processes with compositional kernels
to introduce additional variability.
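Both strategies can be sketched in a few lines. The convex-combination mixer
and the squared-exponential-times-periodic kernel below are illustrative
assumptions, not the exact augmentation recipe used for Mimosa.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_series(series_a, series_b, weight=None):
    """Combine two sequences in a (random) proportion to create a new pattern."""
    weight = rng.uniform(0.2, 0.8) if weight is None else weight
    return weight * np.asarray(series_a) + (1.0 - weight) * np.asarray(series_b)

def sample_gp_series(length=200, lengthscale=20.0, period=24.0, noise=0.05):
    """Draw a synthetic series from a Gaussian process whose kernel composes a
    squared-exponential term, a periodic term, and white noise."""
    t = np.arange(length, dtype=np.float64)
    diff = t[:, None] - t[None, :]
    rbf = np.exp(-0.5 * (diff / lengthscale) ** 2)
    periodic = np.exp(-2.0 * np.sin(np.pi * diff / period) ** 2)
    cov = rbf * periodic + noise * np.eye(length)
    return rng.multivariate_normal(np.zeros(length), cov)

# New training sequences: a fresh synthetic draw mixed with another draw.
synthetic = sample_gp_series()
mixed = mix_series(synthetic, sample_gp_series(lengthscale=5.0))
```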