Datasets

Datasets are a core feature of Sulie. They provide a centralized way to manage time series data for forecasting and model fine-tuning at scale.

You can make forecasts by directly passing a pd.DataFrame to Sulie’s SDK; however, the Datasets feature enables more advanced management, including:

  • Continuous data updates: Avoid manual handling of historical data.
  • Team collaboration: Share and manage datasets across teams.
  • Efficient forecasting: Streamline ad-hoc forecasting without reloading data.

Quick Tip: For simple forecasts or testing, you can directly pass a pd.DataFrame to the forecast function.

Uploading a Dataset

Uploading data to Sulie is straightforward. The Dataset class offers an interface similar to a Pandas DataFrame, so you can handle data with the Pandas operations you already know.

import os

import pandas as pd
from sulie import Sulie

# Initialize Sulie client
client = Sulie(api_key=os.environ.get("SULIE_API_KEY"))

# Load the dataset
df = pd.DataFrame({
    'timestamp': pd.date_range(start='2023-01-01', periods=1000, freq='h'),  # hourly; 'H' is deprecated in recent pandas
    'solar_demand': [...],  # hourly solar energy demand
    'location': ['Plant A', 'Plant B', ...]
})

# Make sure the timestamp column has a datetime dtype
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Upload a new dataset
dataset = client.upload_dataset(
    name="solar-power-demand",      # Unique name for the dataset
    df=df,                          # Data source (Pandas DataFrame or Sulie Dataset)
    mode="append"                   # 'append' to add data, 'overwrite' to replace
)
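
For recurring updates, the same call can be wrapped in a small helper. The sketch below is only an illustration: `push_update`, `VALID_MODES`, and `StubClient` are hypothetical names that are not part of the Sulie SDK, and the helper's default of "append" is a choice made here, not a documented SDK default. The only SDK call it relies on is `client.upload_dataset` with the `name`, `df`, and `mode` parameters shown above. A stub client stands in for the real one so the snippet runs without an API key.

```python
VALID_MODES = ("append", "overwrite")


def push_update(client, name, df, mode="append"):
    """Hypothetical wrapper: check `mode`, then forward to client.upload_dataset."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return client.upload_dataset(name=name, df=df, mode=mode)


class StubClient:
    """Stand-in for the Sulie client; records calls instead of contacting the service."""

    def __init__(self):
        self.calls = []

    def upload_dataset(self, name, df, mode):
        call = {"name": name, "rows": len(df), "mode": mode}
        self.calls.append(call)
        return call


stub = StubClient()

# Placeholder row; in practice this would be a DataFrame holding the newest observations.
new_rows = [{"timestamp": "2023-02-11 16:00", "solar_demand": 0.0, "location": "Plant A"}]

receipt = push_update(stub, "solar-power-demand", new_rows)
print(receipt["mode"], receipt["rows"])
```

With a real client, `stub` would be replaced by the `client` object created earlier and `new_rows` by a DataFrame with the same columns as the original upload.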

Parameters

The upload_dataset method accepts the following parameters:

Name  Description                                                              Type
name  A unique identifier for your dataset.                                    required
df    The source data as a Pandas DataFrame or another Sulie Dataset.          required
mode  How new data is handled: append adds rows, overwrite replaces all data.  optional
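
To make the difference between the two modes concrete, here is a toy model on plain Python lists. It is a sketch of the behavior described in the table, not Sulie's implementation: `apply_mode` is a hypothetical function, and details such as how the service treats duplicate timestamps on append are not covered by this page and are not modeled here.

```python
def apply_mode(existing, incoming, mode="append"):
    """Toy model of the two upload modes on lists of rows."""
    if mode == "append":
        # Keep the rows already stored and add the new ones after them.
        return list(existing) + list(incoming)
    if mode == "overwrite":
        # Discard the stored rows and keep only the new ones.
        return list(incoming)
    raise ValueError(f"unknown mode: {mode!r}")


stored = [("2023-01-01 00:00", "Plant A"), ("2023-01-01 01:00", "Plant A")]
update = [("2023-01-01 02:00", "Plant A")]

appended = apply_mode(stored, update, mode="append")
replaced = apply_mode(stored, update, mode="overwrite")

print(len(appended), len(replaced))  # 3 1
```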