Dimension Generators

Dimensions are non-numeric or static categorical columns that provide context, labeling, and grouping for your time series metrics (e.g., store_id, region, device_id, client_version).

In ts-data-generator, dimensions are designed to be infinite Python iterators (generators). Because they yield values on-demand, they can easily populate a dataset of any arbitrary duration or granularity without running out of memory.


๐Ÿ› ๏ธ Built-in Dimension Helpers

The package includes a comprehensive set of pre-built dimension helpers inside ts_data_generator.utils.functions. These functions can be used in your Python scripts or invoked directly in your terminal using the CLI shorthand.

Helper Type Description Example CLI Shorthand
constant(value) Deterministic Yields the same value indefinitely. env:constant:production
ordered_choice(vals) Deterministic Cycles through values in a round-robin order. server:ordered_choice:srv1,srv2,srv3
random_choice(vals) Stochastic Selects a random element uniformly at each step. region:random_choice:US,EU,AP
random_int(min, max) Stochastic Yields random integers in [min, max] (inclusive). user_id:random_int:1000,9999
random_float(min, max) Stochastic Yields random floats in [min, max) (exclusive). weight:random_float:0.0,1.0
auto_generate_name(pre) Deterministic Yields incrementing string keys with a prefix. id:auto_generate_name:sensor

๐Ÿ’ป CLI Shorthand Syntax

The CLI provides a convenient way to define dimensions using the --dims (or -d) flag.

The format follows:

--dims "column_name:helper_name:arg1,arg2,..."

[!TIP] If you omit the helper name entirely, the CLI will automatically default to random_choice: --dims "region:US,EU,AP" is identical to --dims "region:random_choice:US,EU,AP"

Concrete CLI Examples:

# 1. Uniformly assign a random region to each row
tsdata generate --dims "region:US,EU,AP" --start 2024-01-01 --end 2024-01-02 --granularity h --output data.csv

# 2. Cycle deterministically through nodes (ordered_choice)
tsdata generate --dims "node:ordered_choice:nodeA,nodeB,nodeC" --start 2024-01-01 --end 2024-01-02 --granularity h --output data.csv

# 3. Generate incrementing device IDs with a prefix
tsdata generate --dims "device:auto_generate_name:device_" --start 2024-01-01 --end 2024-01-02 --granularity h --output data.csv

# 4. Generate random continuous float weights and random discrete integer IDs
tsdata generate \
  --dims "sensor_weight:random_float:0.5,1.5" \
  --dims "cust_id:random_int:100000,999999" \
  --start 2024-01-01 --end 2024-01-02 --granularity h --output data.csv

๐Ÿ Python API Usage

When using the Python API, you pass an infinite generator or any standard Python Iterable to the .add_dimension() method.

Here is a fully runnable script showcasing all built-in helpers and custom generators:

from ts_data_generator import DataGen
from ts_data_generator.utils.functions import (
    constant,
    ordered_choice,
    random_choice,
    random_float,
    random_int,
    auto_generate_name
)

# Initialize DataGen
dg = DataGen(seed=42)
dg.start_datetime = "2024-01-01"
dg.end_datetime = "2024-01-03"
dg.to_granularity("h")

# 1. Using a static constant
dg.add_dimension("environment", constant("production"))

# 2. Cycling through list in round-robin order
dg.add_dimension("node_id", ordered_choice(["node_01", "node_02", "node_03"]))

# 3. Uniformly picking random values
dg.add_dimension("region", random_choice(["North", "South", "East", "West"]))

# 4. Generating random integers
dg.add_dimension("user_segment", random_int(1, 5))

# 5. Generating random floats
dg.add_dimension("coefficient", random_float(0.0, 1.0))

# 6. Auto-generating prefixed names (e.g. dev_1, dev_2, etc.)
dg.add_dimension("device_group", auto_generate_name("dev"))

# 7. Creating and attaching a completely custom infinite generator
def custom_infinite_seq():
    index = 0
    while True:
        yield f"batch_val_{index}"
        index += 3

dg.add_dimension("custom_batch", custom_infinite_seq())

# Verify column outputs
df = dg.data
print(df.head())

Output:

                          epoch environment  node_id region  user_segment  coefficient device_group  custom_batch
2024-01-01 00:00:00  1704067200  production  node_01   East             3     0.719306          d_1   batch_val_0
2024-01-01 01:00:00  1704070800  production  node_02   West             3     0.323770          d_1   batch_val_3
2024-01-01 02:00:00  1704074400  production  node_03   West             3     0.092849          d_1   batch_val_6
2024-01-01 03:00:00  1704078000  production  node_01   East             2     0.189741          d_1   batch_val_9
2024-01-01 04:00:00  1704081600  production  node_02  North             4     0.138835          d_1  batch_val_12

๐Ÿ”— Advanced: Linked Dimensions (Multi-Items)

In real-world data, columns are often closely linked. For example, if you have a city column and a country column, you canโ€™t assign them independently (e.g., New York must map to US, not UK).

To generate multiple correlated columns simultaneously, use the add_multi_items API.

[!WARNING] Do not use .add_dimension() for linked columns, as they will be generated independently and lose correlation. Instead, use .add_multi_items().

Here is a complete, runnable example showing how to configure linked city-country columns:

import random
from ts_data_generator import DataGen

# 1. Define an infinite generator yielding tuples of values
def city_country_generator():
    options = [
        ("New York", "US", "North America"),
        ("London", "UK", "Europe"),
        ("Tokyo", "JP", "Asia"),
        ("Sydney", "AU", "Oceania")
    ]
    while True:
        # Uniformly pick one tuple
        yield random.choice(options)

# 2. Setup DataGen
dg = DataGen(seed=123)
dg.start_datetime = "2024-01-01"
dg.end_datetime = "2024-01-02"
dg.to_granularity("h")

# 3. Add linked dimensions by passing the columns names list and the generator function
dg.add_multi_items(
    names=["city", "country", "continent"],
    function=city_country_generator()
)

# Render and verify
df = dg.data
print(df[["city", "country", "continent"]].head(10))

Output:

                         city country      continent
2024-01-01 00:00:00    London      UK         Europe
2024-01-01 01:00:00    Sydney      AU        Oceania
2024-01-01 02:00:00  New York      US  North America
2024-01-01 03:00:00    London      UK         Europe
2024-01-01 04:00:00  New York      US  North America
2024-01-01 05:00:00  New York      US  North America
2024-01-01 06:00:00    Sydney      AU        Oceania
2024-01-01 07:00:00     Tokyo      JP           Asia
2024-01-01 08:00:00    Sydney      AU        Oceania
2024-01-01 09:00:00     Tokyo      JP           Asia

This site uses Just the Docs, a documentation theme for Jekyll.