Embedding Tutorial¶
This tutorial shows how to embed light curves with pretrained ONNX models
from the light_curve.embed submodule.
Models are distributed as ONNX files and downloaded from the Hugging Face Hub by from_hf().
Requires: pip install onnxruntime (and, optionally, huggingface_hub for automatic downloads).
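onnxruntime is required and huggingface_hub is optional, so a quick check of which packages are importable can save a confusing error later. This is a minimal sketch using only the standard library; the helper name check_embed_deps is hypothetical and not part of light_curve, and it checks importability only, not versions.

```python
import importlib.util


def check_embed_deps():
    """Report which packages used for embedding are importable.

    Hypothetical helper: checks importability only, not versions.
    """
    # onnxruntime is required; huggingface_hub is only needed for from_hf() downloads
    packages = {"onnxruntime": True, "huggingface_hub": False}
    status = {}
    for name, required in packages.items():
        found = importlib.util.find_spec(name) is not None
        status[name] = found
        if not found:
            kind = "required" if required else "optional (only for from_hf() downloads)"
            print(f"{name} is missing: {kind}")
    return status


status = check_embed_deps()
```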
# %pip install light-curve huggingface_hub onnxruntime
import numpy as np
from light_curve.embed import Astromer2
# Load the ONNX model from the Hugging Face Hub (needs huggingface_hub)
model = Astromer2.from_hf(output="mean")
print(f"Model loaded. Max sequence length: {model.seq_size}")
# Synthetic single-band light curve: 120 observations, sorted in time
rng = np.random.default_rng(0)
time = np.sort(rng.uniform(0, 500, 120)).astype(np.float64)
mag = rng.normal(15, 0.5, 120).astype(np.float64)
# Embed the light curve; the result is a 4-D array
embedding = model(time, mag)
print(f"Output shape: {embedding.shape} # (n_bands, n_subsamples, seq_windows, embed_dim)")
print(f"Squeezed: {embedding.squeeze().shape}")
Model loaded. Max sequence length: 200
Output shape: (1, 1, 1, 256) # (n_bands, n_subsamples, seq_windows, embed_dim)
Squeezed: (256,)
ExperimentalWarning: light_curve.embed.astromer.Astromer2 is experimental and may change in future versions
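The model is called on one light curve at a time. To build a feature matrix for a sample of objects, one option is a plain loop that embeds each object and stacks the flattened results. This is a sketch, not a batch API of light_curve.embed: embed_many and fake_model are hypothetical names, and the sketch assumes every call returns the same shape (for the single-band model above, (1, 1, 1, 256)).

```python
import numpy as np


def embed_many(model, light_curves):
    """Embed a list of (time, mag) pairs and stack them into a 2-D array.

    `model` is any callable with the call signature used above,
    e.g. the Astromer2 instance. Hypothetical helper.
    """
    rows = []
    for time, mag in light_curves:
        emb = model(time, mag)  # (n_bands, n_subsamples, seq_windows, embed_dim)
        rows.append(np.asarray(emb).reshape(-1))  # flatten to one row per object
    return np.vstack(rows)  # (n_objects, flattened_size); needs equal shapes


# Stand-in model so the sketch runs without downloading weights;
# replace fake_model with `model` from the cell above.
def fake_model(time, mag):
    return np.full((1, 1, 1, 256), np.mean(mag))


rng = np.random.default_rng(0)
curves = []
for _ in range(3):
    t = np.sort(rng.uniform(0, 500, 80))
    m = rng.normal(15, 0.5, 80)
    curves.append((t, m))

X = embed_many(fake_model, curves)
print(X.shape)  # (3, 256)
```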
Astromer2 — multi-band¶
Pass bands=[...] to from_hf() and a per-observation band array to the model call.
Each band is embedded independently, so the model returns one embedding per band:
# Embed g and r separately: one embedding per band listed in `bands`
model_gr = Astromer2.from_hf(output="mean", bands=["g", "r"])
rng2 = np.random.default_rng(1)
n = 120
time_gr = np.sort(rng2.uniform(0, 400, n)).astype(np.float64)
mag_gr = rng2.normal(15, 0.4, n).astype(np.float64)
# One band label per observation, alternating g and r
band_gr = np.array(["g", "r"] * (n // 2))
emb_gr = model_gr(time_gr, mag_gr, band=band_gr)
print(f"Output shape: {emb_gr.shape} # (2 bands, n_subsamples, seq_windows, embed_dim)")
Output shape: (2, 1, 1, 256) # (2 bands, n_subsamples, seq_windows, embed_dim)
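The example above uses alternating synthetic labels. Survey data often arrive as separate arrays per band; the sketch below merges them into the single time, mag, band triplet that the multi-band call takes. merge_bands is a hypothetical helper, and sorting by time is a precaution on our part, not a documented requirement of the model.

```python
import numpy as np


def merge_bands(per_band):
    """Merge {band: (time, mag)} into time-sorted time, mag, band arrays."""
    times, mags, bands = [], [], []
    for band, (t, m) in per_band.items():
        t = np.asarray(t, dtype=np.float64)
        m = np.asarray(m, dtype=np.float64)
        times.append(t)
        mags.append(m)
        bands.append(np.full(t.shape, band))  # one label per observation
    time = np.concatenate(times)
    mag = np.concatenate(mags)
    band = np.concatenate(bands)
    order = np.argsort(time, kind="stable")  # put all observations in time order
    return time[order], mag[order], band[order]


time_m, mag_m, band_m = merge_bands({
    "g": ([10.0, 30.0], [15.1, 15.3]),
    "r": ([20.0, 5.0], [14.8, 14.9]),
})
print(time_m)  # [ 5. 10. 20. 30.]
print(band_m)  # ['r' 'g' 'r' 'g']
```

The merged arrays can then be passed as model_gr(time_m, mag_m, band=band_m), with band labels matching the bands=[...] given to from_hf().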
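ATCAT¶
ATCAT takes time, flux, flux error and an integer band code for every observation, and returns one embedding for the whole multi-band light curve: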
from light_curve.embed import ATCAT
model_atcat = ATCAT.from_hf(output="last")
print(f"ATCAT loaded. Max sequence length: {model_atcat.seq_size}")
rng3 = np.random.default_rng(2)
n3 = 150
time3 = np.sort(rng3.uniform(0, 500, n3)).astype(np.float32)
flux3 = rng3.normal(100, 15, n3).astype(np.float32) # flux in nJy
flux_err3 = np.full(n3, 5.0, dtype=np.float32)
band3 = np.array([i % 6 for i in range(n3)]) # u=0, g=1, r=2, i=3, z=4, Y=5
emb3 = model_atcat(time3, flux3, flux_err3, band3)
print(f"Output shape: {emb3.shape} # (1, 1, 1, {emb3.shape[-1]})")
ATCAT loaded. Max sequence length: 243
Output shape: (1, 1, 1, 384) # (1, 1, 1, 384)
ExperimentalWarning: light_curve.embed.atcat.ATCAT is experimental and may change in future versions
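The ATCAT cell feeds flux in nJy and integer band codes (u=0, g=1, r=2, i=3, z=4, Y=5), as its comments indicate. If your data are AB magnitudes with band letters, a sketch of the conversion follows. In the AB system 0 mag corresponds to 3631 Jy, so the nJy zero point is 31.4 mag and f = 10^(-0.4 (m - 31.4)) nJy; the first-order error is sigma_f = 0.4 ln(10) f sigma_m. The helper ab_mag_to_atcat_inputs and the LSST_BANDS mapping are hypothetical, written from the comments in that cell.

```python
import numpy as np

# Band codes taken from the comment in the ATCAT cell: u=0, g=1, r=2, i=3, z=4, Y=5
LSST_BANDS = {"u": 0, "g": 1, "r": 2, "i": 3, "z": 4, "Y": 5}


def ab_mag_to_atcat_inputs(mag, mag_err, band_names):
    """Convert AB mags and band letters to float32 flux (nJy), flux error and int band codes."""
    mag = np.asarray(mag, dtype=np.float64)
    mag_err = np.asarray(mag_err, dtype=np.float64)
    # AB system: 0 mag = 3631 Jy = 3631e9 nJy, hence f[nJy] = 10**(-0.4 * (m - 31.4))
    flux = 10.0 ** (-0.4 * (mag - 31.4))
    # First-order error propagation: sigma_f = 0.4 * ln(10) * f * sigma_m
    flux_err = 0.4 * np.log(10.0) * flux * mag_err
    band = np.array([LSST_BANDS[b] for b in band_names])
    return flux.astype(np.float32), flux_err.astype(np.float32), band


flux, flux_err, band = ab_mag_to_atcat_inputs([31.4, 26.4], [0.1, 0.1], ["g", "Y"])
print(flux)  # about 1 and 100 nJy
print(band)  # [1 5]
```

Together with a float32 time array, the outputs go into the call in the same order as model_atcat(time3, flux3, flux_err3, band3).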
Notes¶
- Embeddings have shape (n_bands, n_subsamples, seq_windows, embed_dim). Use .squeeze() to get a flat vector for a single object; with a multi-band Astromer2 model the band axis remains, e.g. (2, 256).
- huggingface_hub is only needed for automatic downloads via from_hf(). If you already have the ONNX file, it is not required.
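When a fixed-length vector per object is needed regardless of the number of bands, e.g. as classifier input, one option is to average over the subsample and window axes and concatenate the bands. Averaging is our choice here, not something the library prescribes, and to_feature_vector is a hypothetical helper.

```python
import numpy as np


def to_feature_vector(emb):
    """Collapse (n_bands, n_subsamples, seq_windows, embed_dim) to 1-D.

    Averages over subsamples and windows, then concatenates bands,
    giving a vector of length n_bands * embed_dim.
    """
    emb = np.asarray(emb)
    if emb.ndim != 4:
        raise ValueError(f"expected a 4-D embedding, got shape {emb.shape}")
    per_band = emb.mean(axis=(1, 2))  # (n_bands, embed_dim)
    return per_band.reshape(-1)


single = np.zeros((1, 1, 1, 256))
multi = np.zeros((2, 1, 1, 256))
print(single.squeeze().shape, to_feature_vector(single).shape)  # (256,) (256,)
print(multi.squeeze().shape, to_feature_vector(multi).shape)  # (2, 256) (512,)
```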
Next steps¶
- Similarity search — nearest-neighbour retrieval with embeddings
- Classification — training a classifier on embeddings
- onnxruntime tips — thread control on shared HPC nodes, GPU/CUDA setup
- API reference