Light-curve embeddings
light_curve.embed provides pretrained neural-network models that map raw photometric
time series to dense fixed-length vectors for downstream ML tasks: classification,
anomaly detection, and similarity search.
Requirements
Install light-curve together with its ML embedding dependencies so all versions
resolve jointly:
pip install light-curve onnxruntime huggingface_hub # CPU
pip install light-curve onnxruntime-gpu huggingface_hub # NVIDIA GPU
onnxruntime is intentionally not bundled in the light-curve[full] extra — pick
the variant that matches your hardware. See the onnxruntime install guide for other providers.
If you already have the ONNX model file locally, huggingface_hub is not required.
Available models
| Model | Bands | Input | Embedding dim | Pretrained on |
|---|---|---|---|---|
| AstraCLR | 3 (gri jointly) | time, mag, mag_err, band index | 512 | ZTF (Zubercal DR16) |
| Astromer1 | single (or per-band) | time, mag | 256 | MACHO R-band |
| Astromer1ZTF | single (or per-band) | time, mag | 256 | ZTF DR20 g-band |
| Astromer2 | single (or per-band) | time, mag | 256 | MACHO (1.5 M light curves) |
| ATAT | 6 (ugrizY jointly) | time, flux, band index | 192 | ELAsTiCC |
| ATCAT | 6 (ugrizY jointly) | time, flux, flux_err, band index | 384 | ELAsTiCC |
| Chronos2 | single (magnitude only) | mag | 768 | time-series corpus |
| ChronosBolt | single (magnitude only) | mag | 256–768 | time-series corpus |
| Moment1 | single (magnitude only) | mag | 512–1024 | Time-series Pile |
Single-band: Astromer2
Astromer2 accepts
irregularly sampled (time, mag) pairs and returns 256-dimensional embeddings:
import numpy as np
from light_curve.embed import Astromer2
model = Astromer2.from_hf(output="mean")
rng = np.random.default_rng(0)
time = np.sort(rng.uniform(0, 500, 120)).astype(np.float64)
mag = rng.normal(15, 0.5, 120).astype(np.float64)
embedding = model(time, mag)
print(embedding.shape) # (1, 1, 1, 256)
# squeeze to (256,) for a single object
vec = embedding.squeeze()
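For similarity search, per-object vectors such as vec can be stacked into a matrix and compared by cosine similarity. The sketch below uses random vectors as stand-ins for real embeddings so that it runs without downloading a model; the helper name cosine_top_k is hypothetical and is not part of light_curve.embed.

```python
import numpy as np


def cosine_top_k(query, bank, k=3):
    """Return indices of the k rows of `bank` most similar to `query`."""
    bank_unit = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    similarity = bank_unit @ query_unit
    # Sort by descending cosine similarity and keep the first k indices
    return np.argsort(-similarity)[:k]


rng = np.random.default_rng(0)
# Stand-in for 100 stacked embeddings, each squeezed to (256,)
bank = rng.normal(size=(100, 256))
# A slightly perturbed copy of object 42 should retrieve object 42 first
query = bank[42] + 0.01 * rng.normal(size=256)
nearest = cosine_top_k(query, bank, k=3)
print(nearest[0])
```

Normalising once and reusing bank_unit is usually worthwhile when many queries are run against the same set of embeddings.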
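For multi-band data, pass bands=["g", "r"] to get one embedding per band:
# per-observation band labels; reuses time, mag and rng from the example above
band = np.array(["g", "r"])[rng.integers(0, 2, 120)]
model = Astromer2.from_hf(output="mean", bands=["g", "r"])
embedding = model(time, mag, band=band)
print(embedding.shape) # (2, 1, 1, 256)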
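Downstream classifiers usually expect one feature vector per object, so per-band embeddings are often concatenated. A minimal sketch, using a random array as a stand-in for the (2, 1, 1, 256) output above and assuming the first axis follows the order of the bands argument (that ordering is an assumption, not something this page states):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the (2, 1, 1, 256) output: one 256-dim embedding per band
embedding = rng.normal(size=(2, 1, 1, 256)).astype(np.float32)

per_band = embedding.reshape(2, 256)  # row 0 is "g", row 1 is "r" (assumed order)
features = per_band.reshape(-1)       # g block followed by r block
print(features.shape)  # (512,)
```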
Multi-band: AstraCLR
AstraCLR processes ZTF g, r, i bands jointly using a contrastive-learning transformer and returns 512-dimensional embeddings. Inputs are magnitudes, magnitude errors, MJD observation times, and band labels.
Important
Times must be in Modified Julian Date (MJD) — the model subtracts a fixed offset of 58 000 internally, so arbitrary time units will produce incorrect embeddings. The model was pretrained on ZTF DR16 (Zubercal DR16 × Gaia DR3), which covers MJD 58 194 – 59 951 (roughly 2018 Feb – 2023 Jan).
import numpy as np
from light_curve.embed import AstraCLR
model = AstraCLR.from_hf(band_groups={"g": 0, "r": 1, "i": 2})
rng = np.random.default_rng(3)
n = 300
# Times must be MJD — ZTF DR16 covers MJD 58 194 to 59 951
mjd = np.sort(rng.uniform(58_194, 59_951, n)).astype(np.float64)
mag = rng.normal(17, 0.5, n).astype(np.float32)
magerr = np.full(n, 0.02, dtype=np.float32)
band = np.array(["g", "r", "i"])[rng.integers(0, 3, n)]
embedding = model(mjd, mag, magerr, band)
print(embedding.shape) # (1, 1, 1, 512)
vec = embedding.squeeze() # (512,)
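Because the MJD requirement is easy to violate silently, it can help to convert and check times before calling the model. The sketch below relies only on the standard relation MJD = JD - 2400000.5 and on the pretraining window quoted above; the helper names are hypothetical, and this page does not say how embeddings behave for observations outside that window.

```python
import numpy as np

JD_TO_MJD_OFFSET = 2_400_000.5             # MJD = JD - 2400000.5
ZTF_DR16_MJD_RANGE = (58_194.0, 59_951.0)  # pretraining coverage quoted above


def jd_to_mjd(jd):
    """Convert Julian Date to Modified Julian Date."""
    return np.asarray(jd, dtype=np.float64) - JD_TO_MJD_OFFSET


def fraction_in_range(mjd, mjd_range=ZTF_DR16_MJD_RANGE):
    """Fraction of observations that fall inside the given MJD window."""
    mjd = np.asarray(mjd, dtype=np.float64)
    lo, hi = mjd_range
    return float(np.mean((mjd >= lo) & (mjd <= hi)))


jd = np.array([2_458_200.5, 2_459_000.5, 2_460_500.5])
mjd = jd_to_mjd(jd)
frac = fraction_in_range(mjd)
print(mjd)   # [58200. 59000. 60500.]
print(frac)  # two of the three epochs lie inside the window
```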
Integer band indices (0=g, 1=r, 2=i) can be used directly without band_groups:
import numpy as np
from light_curve.embed import AstraCLR
model = AstraCLR.from_hf()
rng = np.random.default_rng(4)
n = 300
mjd = np.sort(rng.uniform(58_194, 59_951, n)).astype(np.float64)
mag = rng.normal(17, 0.5, n).astype(np.float32)
magerr = np.full(n, 0.02, dtype=np.float32)
band = rng.integers(0, 3, n) # 0=g, 1=r, 2=i
embedding = model(mjd, mag, magerr, band)
print(embedding.shape) # (1, 1, 1, 512)
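If a catalogue stores band labels as strings and you prefer to pass integer indices, the conversion can be done up front. A minimal sketch with plain NumPy, assuming the same g/r/i mapping as above; the variable names are illustrative only:

```python
import numpy as np

band_to_index = {"g": 0, "r": 1, "i": 2}  # same mapping as band_groups above

labels = np.array(["r", "g", "i", "g", "r"])
# Fail early on labels the mapping does not cover
unknown = set(labels) - set(band_to_index)
if unknown:
    raise ValueError(f"unexpected band labels: {sorted(unknown)}")

band = np.array([band_to_index[b] for b in labels], dtype=np.int64)
print(band)  # [1 0 2 0 1]
```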
Multiple subsampling strategies (e.g. for test-time augmentation) are supported via
MultipleReductions. The new Middle reduction selects observations
centred on the light curve's temporal midpoint:
import numpy as np
from light_curve.embed import AstraCLR
rng = np.random.default_rng(5)
n = 400
mjd = np.sort(rng.uniform(58_194, 59_951, n)).astype(np.float64)
mag = rng.normal(17, 0.5, n).astype(np.float32)
magerr = np.full(n, 0.02, dtype=np.float32)
band = rng.integers(0, 3, n)
model = AstraCLR.from_hf(reduction=["beginning", "end", "middle"])
embedding = model(mjd, mag, magerr, band)
print(embedding.shape) # (1, 3, 1, 512) — one embedding per reduction
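To make the three window choices concrete, the sketch below selects index windows from a sorted time array. It is an illustration of the idea only, not the library's implementation: the function select_window, its arguments, and the tie-breaking around the midpoint are all assumptions.

```python
import numpy as np


def select_window(time, max_len, strategy):
    """Return indices of at most `max_len` observations (time must be sorted)."""
    n = len(time)
    if n <= max_len:
        return np.arange(n)
    if strategy == "beginning":
        start = 0
    elif strategy == "end":
        start = n - max_len
    elif strategy == "middle":
        # Centre the window on the observation nearest the temporal midpoint
        midpoint = 0.5 * (time[0] + time[-1])
        centre = int(np.searchsorted(time, midpoint))
        start = int(np.clip(centre - max_len // 2, 0, n - max_len))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return np.arange(start, start + max_len)


time = np.arange(10, dtype=np.float64)  # evenly spaced, already sorted
windows = {s: select_window(time, 4, s) for s in ("beginning", "end", "middle")}
for strategy, idx in windows.items():
    print(strategy, idx)
```

Each window would yield one embedding, which matches the size-3 second axis in the output shape above.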
Multi-band: ATAT
ATAT processes all six
LSST ugrizY bands jointly and returns 192-dimensional embeddings. Each band is embedded with
a learned sinusoidal time modulation, the bands are merged and sorted by time, and a learnable
CLS token is read out (the "token" output used in the paper). Inputs are flux
(AB, zero-point 31.4 by default, no normalisation), time, and integer band index
(u=0, g=1, r=2, i=3, z=4, Y=5):
import numpy as np
from light_curve.embed import ATAT
model = ATAT.from_hf(output="token")
rng = np.random.default_rng(4)
n = 120
time = np.sort(rng.uniform(0, 500, n)).astype(np.float32)
flux = rng.normal(100, 10, n).astype(np.float32)
band = np.array([i % 6 for i in range(n)]) # ugrizY → 0–5
embedding = model(time, flux, band)
print(embedding.shape) # (1, 1, 1, 192)
Set mag_zp=27.5 for ELAsTiCC/SNANA FITS data, or mag_zp=8.9 for Jy.
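The zero points above are tied together by the standard relation mag = zp - 2.5 log10(flux). The following sketch shows that arithmetic on its own, independent of light_curve.embed; the helper names rescale_flux and mag_to_flux are hypothetical.

```python
import numpy as np


def rescale_flux(flux, zp_from, zp_to):
    """Rescale flux so that zp - 2.5 * log10(flux) gives the same magnitude."""
    return np.asarray(flux, dtype=np.float64) * 10.0 ** ((zp_to - zp_from) / 2.5)


def mag_to_flux(mag, zp=31.4):
    """Convert AB magnitude to flux at the given zero point."""
    return 10.0 ** ((zp - np.asarray(mag, dtype=np.float64)) / 2.5)


# 1 Jy (zero point 8.9) expressed at zero point 31.4 is about 1e9, i.e. nanojansky
flux_njy = rescale_flux(1.0, zp_from=8.9, zp_to=31.4)
print(flux_njy)

# Flux of a 20th-magnitude AB source at zero point 31.4
flux_20 = mag_to_flux(20.0)
print(flux_20)
```

The difference 31.4 - 8.9 = 22.5 magnitudes corresponds to exactly nine decades in flux, which is why a zero point of 31.4 is equivalent to working in nanojansky.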
Multi-band: ATCAT
ATCAT processes all six LSST ugrizY bands jointly and returns 384-dimensional embeddings. Inputs are flux (AB, zero-point 31.4 by default), flux error, time, and integer band index (u=0, g=1, r=2, i=3, z=4, Y=5):
import numpy as np
from light_curve.embed import ATCAT
model = ATCAT.from_hf(output="last")
rng = np.random.default_rng(2)
n = 120
time = np.sort(rng.uniform(0, 500, n)).astype(np.float32)
flux = rng.normal(100, 10, n).astype(np.float32)
flux_err = np.full(n, 5.0, dtype=np.float32)
band = np.array([i % 6 for i in range(n)]) # ugrizY → 0–5
embedding = model(time, flux, flux_err, band)
print(embedding.shape) # (1, 1, 1, 384)
Set mag_zp=27.5 for ELAsTiCC/SNANA FITS data, or mag_zp=8.9 for Jy.
Single-band: Chronos 2 and Chronos-Bolt
Chronos models are univariate
time-series foundation models. They embed a magnitude sequence only —
timestamps are discarded and observations are treated as sequentially ordered
(the StarEmbed approach). Chronos2 returns 768-dim embeddings; ChronosBolt
comes in four sizes (tiny/mini/small/base → 256/384/512/768-dim).
import numpy as np
from light_curve.embed import Chronos2, ChronosBolt
rng = np.random.default_rng(6)
mag = rng.normal(18.0, 0.3, 150).astype(np.float32) # chronological order
model = Chronos2.from_hf(output="mean")
embedding = model(mag)
print(embedding.shape) # (1, 1, 1, 768)
bolt = ChronosBolt.from_hf(size="small", output="mean")
print(bolt(mag).shape) # (1, 1, 1, 512)
Series longer than the native context (8192 for Chronos 2, 2048 for
Chronos-Bolt) are reduced first; the default reduction="end" keeps the most
recent observations.
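Since timestamps are discarded, the order of the magnitude array is the only temporal information these models see. If your observations are not already sorted, a minimal sketch for putting them in chronological order (plain NumPy, no library calls assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
time = rng.uniform(0, 500, 150)  # unsorted observation times
mag = rng.normal(18.0, 0.3, 150).astype(np.float32)

order = np.argsort(time, kind="stable")  # stable sort keeps ties in input order
mag_chrono = mag[order]                  # magnitudes in chronological order
```

mag_chrono can then be passed to the model in place of mag.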
Single-band: MOMENT-1
MOMENT is a T5-based time-series
foundation model. Like Chronos, it embeds only the magnitude sequence and discards
timestamps. It comes in three sizes (small/base/large →
512/768/1024-dim) and uses a fixed 512-observation context (64 patches of 8):
import numpy as np
from light_curve.embed import Moment1
rng = np.random.default_rng(7)
mag = rng.normal(18.0, 0.3, 150).astype(np.float32) # chronological order
model = Moment1.from_hf(size="base", output="mean")
embedding = model(mag)
print(embedding.shape) # (1, 1, 1, 768)
Light curves longer than 512 observations are reduced first; the default
reduction="end" keeps the most recent 512. The "sequence" output always has
64 patches and supports only single-window reductions.
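The context arithmetic can be checked with a few lines of NumPy. This sketch only illustrates how 512 observations divide into 64 patches of 8 after keeping the most recent window; it does not reproduce the model's own patching or its handling of series shorter than 512, neither of which is described here.

```python
import numpy as np

CONTEXT = 512  # fixed context length in observations
PATCH = 8      # observations per patch

mag = np.arange(600, dtype=np.float32)  # stand-in series longer than the context
window = mag[-CONTEXT:]                 # the most recent 512 observations
patches = window.reshape(CONTEXT // PATCH, PATCH)
print(patches.shape)  # (64, 8)
```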
GPU and alternative runtimes
Pass ort_session_kwargs to select an execution provider:
model = Astromer2.from_hf(
output="mean",
ort_session_kwargs={"providers": ["CUDAExecutionProvider"]},
)
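On machines where the GPU build may or may not be present, it can be convenient to build the provider list from what onnxruntime reports. A minimal sketch: ort.get_available_providers() is an onnxruntime function, while pick_providers is a hypothetical helper written for this example, and the CPU-only fallback when onnxruntime cannot be imported is an assumption made so the snippet runs anywhere.

```python
def pick_providers(available, preferred=("CUDAExecutionProvider",)):
    """Keep preferred providers that are available, then fall back to CPU."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # Assumed fallback so the sketch still runs without onnxruntime installed
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
print(providers)
```

The resulting list can be passed as ort_session_kwargs={"providers": providers}.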
See the onnxruntime tips page for CUDA version compatibility, thread control on shared HPC nodes, and pixi-based GPU environment setup.
See the API reference for full signatures and reduction strategies.