What is a foundation model for time series forecasting?

A foundation model for time series forecasting is a large-scale transformer model pre-trained on diverse time series data so that it captures universal temporal patterns and relationships. Much as BERT and GPT models learn the structure of language, time series foundation models learn trends, seasonality, and variable interactions across different domains and scales. This pre-training lets the model adapt to new forecasting tasks with minimal or no fine-tuning, transferring knowledge from its training data to your specific use case while handling complex patterns and multivariate relationships that traditionally required extensive feature engineering.

Architecture

Mimosa uses a transformer-based encoder-decoder architecture adapted for time series forecasting. The model processes numerical time series through a tokenization step that converts continuous values into discrete tokens, allowing standard transformer mechanisms for sequential data to be applied. The architecture consists of multiple transformer blocks, each combining self-attention layers, feed-forward networks, and residual connections. What distinguishes Mimosa is its minimalist design philosophy: rather than introducing complex domain-specific components, it relies on the transformer's inherent capacity to learn temporal patterns and relationships.
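To make the tokenization idea concrete, the sketch below maps a continuous series to discrete token IDs by normalizing and uniformly binning the values. This is a hypothetical illustration: the bin count, clipping range, and normalization scheme are assumptions for this example, not Mimosa's actual configuration.

```python
import numpy as np

def tokenize_series(values, num_bins=1024, low=-5.0, high=5.0):
    """Map continuous values to discrete token IDs via uniform binning.

    Illustrative only: num_bins, the clipping range, and per-series
    standardization are assumed choices, not Mimosa's published setup.
    """
    # Standardize the series so the bins cover a comparable range
    # regardless of the original scale of the data.
    scaled = (values - values.mean()) / (values.std() + 1e-8)
    # Clip outliers to the bin range, then assign each value to a bucket.
    clipped = np.clip(scaled, low, high)
    tokens = ((clipped - low) / (high - low) * (num_bins - 1)).round().astype(int)
    return tokens

series = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 13.0])
tokens = tokenize_series(series)
print(tokens)  # six integer token IDs in [0, 1023]
```

Once values are tokens, the model can treat forecasting like next-token prediction over a fixed vocabulary, which is what lets unmodified transformer blocks handle the sequence.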

Training dataset

Mimosa was trained on a comprehensive collection of time series data spanning multiple domains and sampling frequencies. The training corpus combines both real-world datasets and synthetic data to ensure robust generalization.

The training data includes:

  • Financial data: Cash flow data, revenue streams, operational costs, and profitability indicators from various businesses.
  • Energy domain: Electricity consumption patterns, power generation data, and grid utilization metrics.
  • Supply chain: Inventory levels, order volumes, fulfillment rates, and demand forecasts across different product categories.
  • Healthcare analytics: Patient admission rates, treatment utilization patterns, and facility capacity metrics.
  • Consumer products: User engagement data from mobile apps and games, including daily active users, session lengths, and in-app activities.

The training data was enhanced through two key strategies:

  • Time series mixing that creates new patterns by combining existing sequences in varying proportions.
  • Synthetic data generation using Gaussian processes with compositional kernels to introduce additional variability.
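The two augmentation strategies above can be sketched in a few lines. In this example, mixing is implemented as a random convex combination of existing sequences, and the synthetic generator samples from a Gaussian process whose compositional kernel adds a periodic component (seasonality) to an RBF component (smooth trend). The specific mixing weights, kernel choices, and hyperparameters are assumptions for illustration, not the documented training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strategy 1: time series mixing.
# Blend existing sequences in random proportions to create a new pattern.
# The convex-combination form is an assumed, simplified version of mixing.
def mix_series(series_list, rng):
    weights = rng.dirichlet(np.ones(len(series_list)))  # random proportions summing to 1
    return sum(w * s for w, s in zip(weights, series_list))

# Strategy 2: synthetic data from a Gaussian process with a compositional kernel.
def rbf(t1, t2, length=10.0):
    # Smooth, trend-like correlations between time points.
    return np.exp(-0.5 * ((t1[:, None] - t2[None, :]) / length) ** 2)

def periodic(t1, t2, period=24.0, length=1.0):
    # Repeating, seasonality-like correlations with the given period.
    d = np.pi * np.abs(t1[:, None] - t2[None, :]) / period
    return np.exp(-2.0 * (np.sin(d) / length) ** 2)

def sample_gp(n=200, rng=rng):
    t = np.arange(n, dtype=float)
    K = rbf(t, t) + 0.5 * periodic(t, t)  # compositional kernel: RBF + periodic
    K += 1e-6 * np.eye(n)                 # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(n), K)

a, b = sample_gp(), sample_gp()
mixed = mix_series([a, b], rng)
print(mixed.shape)  # (200,)
```

Summing or scaling kernels in this way keeps each sample a valid Gaussian process while letting the generator cover combinations of trend and seasonal behavior that any single kernel would miss.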