Light-curve embeddings

light_curve.embed provides pretrained neural-network models that map raw photometric time series to dense fixed-length vectors for downstream ML tasks: classification, anomaly detection, and similarity search.

Requirements

Install light-curve together with its ML embedding dependencies so all versions resolve jointly:

pip install light-curve onnxruntime huggingface_hub   # CPU
pip install light-curve onnxruntime-gpu huggingface_hub  # NVIDIA GPU

onnxruntime is intentionally not bundled in the light-curve[full] extra — pick the variant that matches your hardware. See the onnxruntime install guide for other providers.

If you already have the ONNX model file locally, huggingface_hub is not required.

Available models

Model	Bands	Input	Embedding dim	Pretrained on
`AstraCLR`	3 (gri jointly)	time, mag, mag_err, band index	512	ZTF (Zubercal DR16)
`Astromer1`	single (or per-band)	time, mag	256	MACHO R-band
`Astromer1ZTF`	single (or per-band)	time, mag	256	ZTF DR20 g-band
`Astromer2`	single (or per-band)	time, mag	256	MACHO (1.5 M light curves)
`ATAT`	6 (ugrizY jointly)	time, flux, band index	192	ELAsTiCC
`ATCAT`	6 (ugrizY jointly)	time, flux, flux_err, band index	384	ELAsTiCC
`Chronos2`	single (magnitude only)	mag	768	time-series corpus
`ChronosBolt`	single (magnitude only)	mag	256–768	time-series corpus
`Moment1`	single (magnitude only)	mag	512–1024	Time-series Pile

Single-band: Astromer2

Astromer2 accepts irregularly-sampled (time, mag) pairs and returns 256-dimensional embeddings:

import numpy as np
from light_curve.embed import Astromer2

model = Astromer2.from_hf(output="mean")

rng = np.random.default_rng(0)
time = np.sort(rng.uniform(0, 500, 120)).astype(np.float64)
mag  = rng.normal(15, 0.5, 120).astype(np.float64)

embedding = model(time, mag)
print(embedding.shape)  # (1, 1, 1, 256)
# squeeze to (256,) for a single object
vec = embedding.squeeze()

For multi-band data, pass bands=["g", "r"] to get one embedding per band:

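# One band label per observation; random labels here, reusing rng, time and mag from above
band = np.array(["g", "r"])[rng.integers(0, 2, 120)]

model = Astromer2.from_hf(output="mean", bands=["g", "r"])
embedding = model(time, mag, band=band)
print(embedding.shape)  # (2, 1, 1, 256)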
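
Once squeezed to flat vectors, embeddings can feed a similarity search. The following is a minimal sketch using plain NumPy and cosine similarity; the random `bank` array stands in for real model outputs, and `cosine_top_k` is a hypothetical helper, not part of light_curve.embed:

```python
import numpy as np

def cosine_top_k(query, bank, k=5):
    """Return indices and scores of the k rows of `bank` most similar to `query`."""
    query = np.asarray(query, dtype=np.float64).ravel()
    bank = np.asarray(bank, dtype=np.float64)
    # Normalise so that the dot product equals the cosine similarity
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    scores = b @ q
    order = np.argsort(scores)[::-1][:k]  # highest similarity first
    return order, scores[order]

# Stand-in data: random vectors in place of real 256-dimensional embeddings
rng = np.random.default_rng(0)
bank = rng.normal(size=(1000, 256))
query = bank[42] + 0.01 * rng.normal(size=256)  # slightly perturbed copy of row 42

idx, scores = cosine_top_k(query, bank, k=3)
print(idx[0])  # 42
```

For per-band outputs such as the (2, 1, 1, 256) array above, one option is to flatten the per-band vectors into a single 512-element vector before building the bank; whether that suits a given task is a modelling choice.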

Multi-band: AstraCLR

AstraCLR processes ZTF g, r, i bands jointly using a contrastive-learning transformer and returns 512-dimensional embeddings. Inputs, in call order, are MJD observation times, magnitudes, magnitude errors, and band labels.

Important

Times must be in Modified Julian Date (MJD) — the model subtracts a fixed offset of 58 000 internally, so arbitrary time units will produce incorrect embeddings. The model was pretrained on ZTF DR16 (Zubercal DR16 × Gaia DR3), which covers MJD 58 194 – 59 951 (roughly 2018 Feb – 2023 Jan).
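
If your timestamps are in another convention, convert them before calling the model. The sketch below uses the standard relations MJD = JD - 2400000.5 and MJD 40587 = 1970-01-01 UTC; the helper names are hypothetical, and the conversion ignores time-scale subtleties (e.g. UTC versus barycentric times):

```python
import numpy as np

JD_MINUS_MJD = 2_400_000.5   # MJD = JD - 2400000.5
UNIX_EPOCH_MJD = 40_587.0    # MJD of 1970-01-01T00:00:00 UTC
SECONDS_PER_DAY = 86_400.0

def jd_to_mjd(jd):
    """Convert Julian Date to Modified Julian Date."""
    return np.asarray(jd, dtype=np.float64) - JD_MINUS_MJD

def unix_to_mjd(unix_seconds):
    """Convert Unix timestamps (seconds) to MJD."""
    return np.asarray(unix_seconds, dtype=np.float64) / SECONDS_PER_DAY + UNIX_EPOCH_MJD

def fraction_in_range(mjd, lo=58_194.0, hi=59_951.0):
    """Fraction of observations inside the pretraining window quoted above."""
    mjd = np.asarray(mjd, dtype=np.float64)
    return float(np.mean((mjd >= lo) & (mjd <= hi)))

jd = np.array([2458300.5, 2459000.5])
mjd = jd_to_mjd(jd)
print(mjd)                     # [58300. 59000.]
print(fraction_in_range(mjd))  # 1.0
```

A low `fraction_in_range` value does not make the call fail; it only signals that the light curve lies outside the epoch range the model saw during pretraining.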

import numpy as np
from light_curve.embed import AstraCLR

model = AstraCLR.from_hf(band_groups={"g": 0, "r": 1, "i": 2})

rng = np.random.default_rng(3)
n = 300
# Times must be MJD — ZTF DR16 covers MJD 58 194 to 59 951
mjd    = np.sort(rng.uniform(58_194, 59_951, n)).astype(np.float64)
mag    = rng.normal(17, 0.5, n).astype(np.float32)
magerr = np.full(n, 0.02, dtype=np.float32)
band   = np.array(["g", "r", "i"])[rng.integers(0, 3, n)]

embedding = model(mjd, mag, magerr, band)
print(embedding.shape)  # (1, 1, 1, 512)
vec = embedding.squeeze()  # (512,)

Integer band indices (0=g, 1=r, 2=i) can be used directly without band_groups:

import numpy as np
from light_curve.embed import AstraCLR

model = AstraCLR.from_hf()

rng = np.random.default_rng(4)
n = 300
mjd    = np.sort(rng.uniform(58_194, 59_951, n)).astype(np.float64)
mag    = rng.normal(17, 0.5, n).astype(np.float32)
magerr = np.full(n, 0.02, dtype=np.float32)
band   = rng.integers(0, 3, n)  # 0=g, 1=r, 2=i

embedding = model(mjd, mag, magerr, band)
print(embedding.shape)  # (1, 1, 1, 512)

Multiple subsampling strategies (e.g. for test-time augmentation) are supported via MultipleReductions. The new Middle reduction selects observations centred on the light curve's temporal midpoint:

import numpy as np
from light_curve.embed import AstraCLR

rng = np.random.default_rng(5)
n = 400
mjd    = np.sort(rng.uniform(58_194, 59_951, n)).astype(np.float64)
mag    = rng.normal(17, 0.5, n).astype(np.float32)
magerr = np.full(n, 0.02, dtype=np.float32)
band   = rng.integers(0, 3, n)

model = AstraCLR.from_hf(reduction=["beginning", "end", "middle"])
embedding = model(mjd, mag, magerr, band)
print(embedding.shape)  # (1, 3, 1, 512) — one embedding per reduction
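
To make the three reduction names concrete, here is an illustrative NumPy sketch of how a fixed-size window could be taken from the beginning, the end, or around the temporal midpoint of a sorted light curve. It is not the library's implementation; `window_indices` and its tie-breaking are assumptions made for illustration:

```python
import numpy as np

def window_indices(time, size, where):
    """Indices of a contiguous window of at most `size` observations.

    `time` must be sorted in ascending order.
    """
    time = np.asarray(time)
    n = len(time)
    if n <= size:
        return np.arange(n)  # nothing to cut
    if where == "beginning":
        start = 0
    elif where == "end":
        start = n - size
    elif where == "middle":
        # Centre the window on the observation nearest the temporal midpoint
        midpoint = 0.5 * (time[0] + time[-1])
        centre = int(np.searchsorted(time, midpoint))
        start = int(np.clip(centre - size // 2, 0, n - size))
    else:
        raise ValueError(f"unknown reduction: {where!r}")
    return np.arange(start, start + size)

t = np.arange(10.0)
print(window_indices(t, 4, "beginning"))  # [0 1 2 3]
print(window_indices(t, 4, "end"))        # [6 7 8 9]
print(window_indices(t, 4, "middle"))     # [3 4 5 6]
```

With an output of shape (1, 3, 1, 512), averaging over axis 1 is one simple way to combine the per-reduction embeddings into a single vector for test-time augmentation.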

Multi-band: ATAT

ATAT processes all six LSST ugrizY bands jointly and returns 192-dimensional embeddings. Each band is embedded with a learned sinusoidal time modulation, the bands are merged and sorted by time, and a learnable CLS token is read out (the "token" output used in the paper). Inputs, in call order, are time, flux (AB, zero-point 31.4 by default, no normalisation), and integer band index (u=0, g=1, r=2, i=3, z=4, Y=5):

import numpy as np
from light_curve.embed import ATAT

model = ATAT.from_hf(output="token")

rng = np.random.default_rng(4)
n = 120
time = np.sort(rng.uniform(0, 500, n)).astype(np.float32)
flux = rng.normal(100, 10, n).astype(np.float32)
band = np.array([i % 6 for i in range(n)])  # ugrizY → 0–5

embedding = model(time, flux, band)
print(embedding.shape)  # (1, 1, 1, 192)
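
Catalogues often store bands as strings rather than integers. A small sketch of the mapping stated above (u=0 … Y=5); `bands_to_index` is a hypothetical helper, and some catalogues write the last band as lowercase "y", which you would need to normalise first:

```python
import numpy as np

# Band order as documented for ATAT and ATCAT
LSST_BAND_INDEX = {"u": 0, "g": 1, "r": 2, "i": 3, "z": 4, "Y": 5}

def bands_to_index(labels):
    """Map an iterable of band labels to an integer index array."""
    try:
        return np.array([LSST_BAND_INDEX[b] for b in labels], dtype=np.int64)
    except KeyError as exc:
        raise ValueError(f"unknown band label: {exc.args[0]!r}") from None

labels = ["u", "Y", "r", "g"]
print(bands_to_index(labels))  # [0 5 2 1]
```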

Set mag_zp=27.5 for ELAsTiCC/SNANA FITS data, or mag_zp=8.9 for Jy.
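
The three zero points are related by the usual AB relation m = zp - 2.5 log10(flux), assuming mag_zp has that conventional meaning. The sketch below shows the arithmetic for rescaling fluxes between zero points yourself; the function names are illustrative and not part of the library:

```python
import numpy as np

def rescale_flux(flux, zp_from, zp_to):
    """Rescale flux so that the implied magnitude is unchanged under a new zero point."""
    factor = 10.0 ** (0.4 * (zp_to - zp_from))
    return np.asarray(flux, dtype=np.float64) * factor

def flux_to_mag(flux, zp):
    """AB magnitude for a flux expressed relative to zero point `zp`."""
    return zp - 2.5 * np.log10(flux)

# 1 Jy (zp 8.9) expressed at zp 31.4 is 1e9, i.e. the flux in nJy
flux_njy = rescale_flux(1.0, zp_from=8.9, zp_to=31.4)

# SNANA-style flux (zp 27.5) rescaled to zp 31.4 keeps the same magnitude
flux_snana = np.array([100.0, 250.0])
flux_314 = rescale_flux(flux_snana, zp_from=27.5, zp_to=31.4)
same_mag = np.allclose(flux_to_mag(flux_snana, 27.5), flux_to_mag(flux_314, 31.4))
print(same_mag)  # True
```

Setting mag_zp is normally simpler than rescaling by hand; the sketch is meant only to show what the zero point encodes.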

Multi-band: ATCAT

ATCAT processes all six LSST ugrizY bands jointly and returns 384-dimensional embeddings. Inputs, in call order, are time, flux (AB, zero-point 31.4 by default), flux error, and integer band index (u=0, g=1, r=2, i=3, z=4, Y=5):

import numpy as np
from light_curve.embed import ATCAT

model = ATCAT.from_hf(output="last")

rng = np.random.default_rng(2)
n = 120
time     = np.sort(rng.uniform(0, 500, n)).astype(np.float32)
flux     = rng.normal(100, 10, n).astype(np.float32)
flux_err = np.full(n, 5.0, dtype=np.float32)
band     = np.array([i % 6 for i in range(n)])  # ugrizY → 0–5

embedding = model(time, flux, flux_err, band)
print(embedding.shape)  # (1, 1, 1, 384)

Set mag_zp=27.5 for ELAsTiCC/SNANA FITS data, or mag_zp=8.9 for Jy.

Single-band: Chronos 2 and Chronos-Bolt

Chronos models are univariate time-series foundation models. They embed a magnitude sequence only — timestamps are discarded and observations are treated as sequentially ordered (the StarEmbed approach). Chronos2 returns 768-dim embeddings; ChronosBolt comes in four sizes (tiny/mini/small/base → 256/384/512/768-dim).

import numpy as np
from light_curve.embed import Chronos2, ChronosBolt

rng = np.random.default_rng(6)
mag = rng.normal(18.0, 0.3, 150).astype(np.float32)  # chronological order

model = Chronos2.from_hf(output="mean")
embedding = model(mag)
print(embedding.shape)  # (1, 1, 1, 768)

bolt = ChronosBolt.from_hf(size="small", output="mean")
print(bolt(mag).shape)  # (1, 1, 1, 512)

Series longer than the native context (8192 for Chronos 2, 2048 for Chronos-Bolt) are reduced first; the default reduction="end" keeps the most recent observations.
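
Because timestamps are discarded, the order of the magnitude array is the only temporal information the model receives. A minimal sketch of preparing such an input, assuming you sort by time yourself and mimicking the effect of the "end" reduction by hand (`chronological_tail` is a hypothetical helper, not the library's reduction code):

```python
import numpy as np

def chronological_tail(time, mag, context):
    """Sort magnitudes by time and keep at most the last `context` values."""
    order = np.argsort(time, kind="stable")
    mag_sorted = np.asarray(mag)[order]
    return mag_sorted[-context:]  # returns everything if the series is shorter

time = np.array([3.0, 1.0, 2.0, 4.0])
mag = np.array([18.3, 18.1, 18.2, 18.4], dtype=np.float32)

tail = chronological_tail(time, mag, context=2)
print(len(tail))  # 2

# With the Chronos-Bolt context quoted above, a short series is left untouched
full = chronological_tail(time, mag, context=2048)
print(len(full))  # 4
```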

Single-band: MOMENT-1

MOMENT is a T5-based time-series foundation model. Like Chronos, it embeds only the magnitude sequence and discards timestamps. It comes in three sizes (small/base/large → 512/768/1024-dim) and uses a fixed 512-observation context (64 patches of 8):

import numpy as np
from light_curve.embed import Moment1

rng = np.random.default_rng(7)
mag = rng.normal(18.0, 0.3, 150).astype(np.float32)  # chronological order

model = Moment1.from_hf(size="base", output="mean")
embedding = model(mag)
print(embedding.shape)  # (1, 1, 1, 768)

Light curves longer than 512 observations are reduced first; the default reduction="end" keeps the most recent 512. The "sequence" output always has 64 patches and supports only single-window reductions.
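
The context arithmetic (512 observations = 64 patches × 8) can be illustrated with NumPy. In this sketch, left-padding with zeros and returning a validity mask are assumptions made for illustration; the wrapper's actual padding scheme may differ:

```python
import numpy as np

CONTEXT = 512  # fixed context length quoted above
PATCH = 8      # observations per patch, so 512 / 8 = 64 patches

def to_patches(mag):
    """Return (patches, mask): a (64, 8) array and a matching boolean mask."""
    mag = np.asarray(mag, dtype=np.float32)[-CONTEXT:]  # keep the most recent 512
    padded = np.zeros(CONTEXT, dtype=np.float32)
    mask = np.zeros(CONTEXT, dtype=bool)
    if len(mag):
        padded[-len(mag):] = mag   # left-pad: real data sits at the end
        mask[-len(mag):] = True    # True marks real observations
    return padded.reshape(-1, PATCH), mask.reshape(-1, PATCH)

long_patches, long_mask = to_patches(np.arange(600))
short_patches, short_mask = to_patches(np.arange(150))
print(long_patches.shape, int(short_mask.sum()))  # (64, 8) 150
```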

GPU and alternative runtimes

Pass ort_session_kwargs to select an execution provider:

model = Astromer2.from_hf(
    output="mean",
    ort_session_kwargs={"providers": ["CUDAExecutionProvider"]},
)
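
Requesting a provider that is not installed can fail or fall back depending on the onnxruntime build, so it can help to check what is available first. A small sketch using onnxruntime's `get_available_providers()`; `pick_providers` is a hypothetical helper, and the preference order is an assumption:

```python
def pick_providers(available, preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Keep the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]  # CPU as a last resort

try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed; assume CPU only for this illustration
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
print(providers)
```

The resulting list can then be passed as ort_session_kwargs={"providers": providers} in the from_hf call shown above.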

See the onnxruntime tips page for CUDA version compatibility, thread control on shared HPC nodes, and pixi-based GPU environment setup.

See the API reference for full signatures and reduction strategies.