Embeddings API

light_curve.embed.EmbeddingSession

Bases: abc.ABC

Abstract base for ONNX-backed embedding models.

Subclasses implement :meth:preprocess_lc (convert raw arrays to model tensors) and :meth:predict_tensors (run the session and return embeddings).

Parameters:

Name Type Description Default
session InferenceSession

An onnxruntime.InferenceSession (or any object with a compatible .run() interface).

required
reduction str, list of str, or Reduction

Strategy for mapping variable-length light curves to fixed-length sequences.

required
reduction_kwargs dict

Extra keyword arguments forwarded to :func:reduction_from_str when reduction is given as a string.

None

from_hf classmethod

Load a model from the HuggingFace Hub.

Downloads (and caches) the ONNX model file, creates an onnxruntime.InferenceSession, and returns a ready-to-use instance.

Parameters:

Name Type Description Default
filename str

Path to the model file inside self.hf_repo.

required
ort_session_kwargs dict[str, object] | None

Options to pass to the onnxruntime.InferenceSession constructor: "sess_options", "providers", "provider_options".

None
**kwargs

Forwarded verbatim to the class constructor.

required

Returns:

Type Description
instance of the calling class

Instance with a live ONNX inference session.

Raises:

Type Description
ImportError

If huggingface_hub is not installed.

ImportError

If no onnxruntime variant is installed.
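
A rough sketch of what the download-and-load step described above might look like. This is not the library's implementation: the helper name `load_onnx_from_hub` is hypothetical, and only `huggingface_hub.hf_hub_download` and `onnxruntime.InferenceSession` are real entry points of their packages. Both imports are deferred so that a missing dependency surfaces as an `ImportError`, mirroring the "Raises" table.

```python
def load_onnx_from_hub(repo_id, filename, ort_session_kwargs=None):
    """Download an ONNX file from the Hub and open an inference session.

    Hypothetical helper for illustration only.
    """
    try:
        from huggingface_hub import hf_hub_download
    except ImportError as exc:
        raise ImportError("huggingface_hub is required to download models") from exc
    try:
        import onnxruntime as ort
    except ImportError as exc:
        raise ImportError("an onnxruntime variant is required for inference") from exc

    # hf_hub_download caches the file locally and returns the cached path.
    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
    # Optional keys such as "providers" are passed straight to the constructor.
    return ort.InferenceSession(model_path, **(ort_session_kwargs or {}))
```

Under these assumptions, `load_onnx_from_hub("light-curve/astromer1", "astromer1.onnx", {"providers": ["CPUExecutionProvider"]})` would return a CPU-only session for the Astromer 1 file.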

predict_tensors

Run the ONNX session on pre-processed tensors and return embeddings.

Parameters:

Name Type Description Default
tensors InputTensors

Pre-processed model inputs as returned by :meth:preprocess_lc.

required

Returns:

Type Description
ndarray

Embedding array with shape depending on the model and time reduction.

preprocess_lc

Convert raw light curve arrays to model input tensors.

Parameters:

Name Type Description Default
*arrays array-like

Raw light curve arrays (e.g. time, magnitude).

required

Returns:

Type Description
InputTensors

Tensors ready to be passed to :meth:predict_tensors.
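
To make the two-step contract concrete, here is a minimal toy version of the pattern, assuming nothing about the real base class beyond the two method names above. `FakeSession`, `ToySession` and `ToyModel` are hypothetical names; the fake session only imitates the `run(output_names, input_feed)` call shape of an ONNX Runtime session.

```python
from abc import ABC, abstractmethod

import numpy as np


class FakeSession:
    """Hypothetical stand-in exposing an onnxruntime-style ``run`` method."""

    def run(self, output_names, input_feed):
        # Pretend the "model" embeds each sequence as [mean, max] of its values.
        x = input_feed["x"]
        return [np.stack([x.mean(axis=1), x.max(axis=1)], axis=1)]


class ToySession(ABC):
    """Illustrative two-step pipeline: preprocess, then predict."""

    def __init__(self, session):
        self.session = session

    @abstractmethod
    def preprocess_lc(self, *arrays):
        """Convert raw arrays to a dict of model input tensors."""

    @abstractmethod
    def predict_tensors(self, tensors):
        """Run the session and return embeddings."""

    def __call__(self, *arrays):
        return self.predict_tensors(self.preprocess_lc(*arrays))


class ToyModel(ToySession):
    def preprocess_lc(self, *arrays):
        _time, mag = (np.asarray(a, dtype=np.float32) for a in arrays)
        # Add a leading batch axis; a real model would also window and pad.
        return {"x": mag[None, :]}

    def predict_tensors(self, tensors):
        (embedding,) = self.session.run(None, tensors)
        return embedding


model = ToyModel(FakeSession())
embedding = model([0.0, 1.0, 2.0], [15.0, 15.5, 16.0])
print(embedding.shape)  # (1, 2)
```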

light_curve.embed.SingleBandModel

Bases: light_curve.embed.model.EmbeddingSession, abc.ABC

Embedding model that processes one photometric band at a time.

When bands is None the full light curve is treated as a single band. When bands is provided, the light curve is split by band label, each band is embedded independently, and the results are concatenated along :attr:Dim.BAND.

Parameters:

Name Type Description Default
session InferenceSession

ONNX inference session.

required
bands sequence of str or int

Ordered band labels to embed. None treats the whole light curve as one band.

None
reduction str, list of str, or Reduction

Windowing / subsampling strategy. Defaults to "non-overlapping-windows".

'non-overlapping-windows'
reduction_kwargs dict

Extra kwargs forwarded to :func:reduction_from_str.

None
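
The per-band behaviour described above can be sketched with plain NumPy. The embedder `toy_embed` and the helper `embed_per_band` are hypothetical, and placing the band axis first is an assumption made for the illustration; the real position of :attr:Dim.BAND is not stated here.

```python
import numpy as np


def toy_embed(time, mag):
    """Hypothetical single-band embedder returning a length-2 vector."""
    return np.array([mag.mean(), mag.std()])


def embed_per_band(time, mag, band=None, bands=None):
    time, mag = np.asarray(time), np.asarray(mag)
    if bands is None:
        # Whole light curve treated as one band.
        return toy_embed(time, mag)[None, :]
    band = np.asarray(band)
    per_band = []
    for label in bands:
        selected = band == label
        per_band.append(toy_embed(time[selected], mag[selected]))
    # Stack along a leading band axis, in the order given by `bands`.
    return np.stack(per_band, axis=0)


time = np.arange(6.0)
mag = np.array([15.0, 18.0, 15.2, 18.2, 15.4, 18.4])
band = np.array(["g", "r", "g", "r", "g", "r"])

multi = embed_per_band(time, mag, band, bands=["g", "r"])
single = embed_per_band(time, mag)
print(multi.shape)   # (2, 2)
print(single.shape)  # (1, 2)
```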

light_curve.embed.Astromer1

Bases: light_curve.embed.astromer._AstromerModel

Astromer 1 embedding model.

Transformer encoder pretrained on MACHO R-band light curves via masked magnitude prediction. Accepts single-band photometry and returns a 256-dimensional embedding (2 layers, 4 attention heads).

The ONNX model is hosted on HuggingFace at https://huggingface.co/light-curve/astromer1 (astromer1.onnx). Three named outputs are available; select with the output parameter:

  • "mean" (default) — masked mean pooling → shape (batch, 256)
  • "max" — masked max pooling → shape (batch, 256)
  • "sequence" — per-timestep features → shape (batch, 200, 256)

Use :meth:from_hf to download and load the model directly.
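
As an illustration of how the three outputs relate, the sketch below derives "mean"-like and "max"-like vectors from a "sequence"-shaped array and a validity mask. It is a NumPy approximation under the assumption that padded timesteps are simply excluded; the function names are hypothetical and the real pooling is part of the ONNX model.

```python
import numpy as np


def masked_mean(sequence, mask):
    """Mean over valid timesteps; `mask` is True where an observation exists."""
    weights = mask[..., None].astype(sequence.dtype)
    counts = np.maximum(weights.sum(axis=1), 1.0)  # avoid division by zero
    return (sequence * weights).sum(axis=1) / counts


def masked_max(sequence, mask):
    """Max over valid timesteps; padded positions are set to -inf first."""
    return np.where(mask[..., None], sequence, -np.inf).max(axis=1)


rng = np.random.default_rng(0)
sequence = rng.normal(size=(3, 200, 256)).astype(np.float32)
mask = np.zeros((3, 200), dtype=bool)
mask[:, :120] = True  # only the first 120 timesteps are real observations

pooled_mean = masked_mean(sequence, mask)
pooled_max = masked_max(sequence, mask)
print(pooled_mean.shape)  # (3, 256)
print(pooled_max.shape)   # (3, 256)
```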

Model license

MIT.

References

Donoso-Oliva et al. (2023), ASTROMER: A transformer-based embedding for the representation of light curves, A&A 670, A54. https://ui.adsabs.harvard.edu/abs/2023A%26A...670A..54D/abstract

Parameters:

Name Type Description Default
session

ONNX inference session for the Astromer 1 model file.

required
output str

Which named output to return: "mean", "max", or "sequence". Defaults to "mean".

'mean'
bands sequence of str or int

Band labels. None (default) treats the whole light curve as one band.

None
reduction str, list of str, or Reduction

Windowing strategy. Defaults to :class:NonOverlappingWindows.

'non-overlapping-windows'

hf_filename = 'astromer1.onnx' class-attribute

hf_repo = 'light-curve/astromer1' class-attribute

light_curve.embed.Astromer2

Bases: light_curve.embed.astromer._AstromerModel

Astromer 2 embedding model.

Pretrained on 1.5 million MACHO light curves. Accepts single-band photometry and returns a 256-dimensional embedding.

The ONNX model is hosted on HuggingFace at https://huggingface.co/light-curve/astromer2 (astromer2.onnx). Three named outputs are available; select with the output parameter:

  • "mean" (default) — masked mean pooling → shape (batch, 256)
  • "max" — masked max pooling → shape (batch, 256)
  • "sequence" — per-timestep features → shape (batch, 200, 256)

Use :meth:from_hf to download and load the model directly.

Model license

MIT.

References

Donoso-Oliva et al. (2026), Generalizing across astronomical surveys: Few-shot light curve classification with Astromer 2, A&A 707, A170. https://ui.adsabs.harvard.edu/abs/2026A%26A...707A.170D/abstract

Parameters:

Name Type Description Default
session

ONNX inference session for the Astromer 2 model file.

required
output str

Which named output to return: "mean", "max", or "sequence". Defaults to "mean".

'mean'
bands sequence of str or int

Band labels. None (default) treats the whole light curve as one band.

None
reduction str, list of str, or Reduction

Windowing strategy. Defaults to :class:NonOverlappingWindows, which matches the sequential-window preprocessing used to produce the reference embeddings on HuggingFace.

'non-overlapping-windows'

hf_filename = 'astromer2.onnx' class-attribute

hf_repo = 'light-curve/astromer2' class-attribute

light_curve.embed.ATCAT

Bases: light_curve.embed.model.ExplicitMultiBandModel

ATCAT multiband transformer embedding model.

ATCAT (Astronomical Timeseries CAusal Transformer) is a transformer-based model trained on LSST-like multiband light curves. It accepts flux, flux-error, time, and integer channel-index arrays and produces dense embeddings.

The model expects fluxes calibrated to AB zero-point 27.5 (ELAsTiCC / SNANA FITS convention). Use mag_zp to convert from a different zero-point at call time — common values are 31.4 (LSST nJy) and 8.9 (Jy).

Valid model band indices are 0–5, corresponding to LSST u g r i z Y. Pass a band_groups dict (e.g. {"u": 0, "g": 1, ...}) to use string band labels instead of integers.
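
A small sketch of the label-to-index mapping and the extra-band check, written with plain NumPy. The helper `map_bands` is hypothetical, and dropping unknown bands when `allow_extra_bands` is true is an assumption of this illustration; the text only states that the strict mode raises.

```python
import numpy as np

VALID_MODEL_BANDS = frozenset(range(6))  # LSST u g r i z Y


def map_bands(band, band_groups=None, allow_extra_bands=False):
    """Return (integer band indices, known-mask) for a sequence of labels."""
    labels = np.asarray(band).tolist()
    if band_groups is None:
        lookup = {b: b for b in VALID_MODEL_BANDS}
    else:
        lookup = dict(band_groups)
    known = np.array([label in lookup for label in labels], dtype=bool)
    if not allow_extra_bands and not known.all():
        extra = sorted({str(l) for l, ok in zip(labels, known) if not ok})
        raise ValueError(f"unexpected band labels: {extra}")
    # Assumption: observations in unknown bands are dropped when extras are allowed.
    indices = np.array([lookup[l] for l in labels if l in lookup], dtype=np.int64)
    return indices, known


band_groups = {"u": 0, "g": 1, "r": 2, "i": 3, "z": 4, "Y": 5}
indices, known = map_bands(["g", "r", "g", "Y"], band_groups)
print(indices)  # [1 2 1 5]
```

Calling `map_bands(["g", "w"], band_groups)` raises `ValueError` in this sketch, while passing `allow_extra_bands=True` keeps only the "g" observation.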

Parameters:

Name Type Description Default
session InferenceSession

An onnxruntime.InferenceSession for the ATCAT ONNX model.

required
output (last, mean, sequence)

Which model head to return:

  • "last" — embedding of the last valid timestep, output shape (n_band_groups, n_subsamples, 1, 384)
  • "mean" — masked mean pooling over valid timesteps, output shape (n_band_groups, n_subsamples, 1, 384)
  • "sequence" — per-timestep embeddings, output shape (n_band_groups, n_subsamples, 243, 384)
"last"
band_groups Mapping, list of Mapping, or None

Band label → model integer mapping(s). See :class:~light_curve.embed.model.ExplicitMultiBandModel for details.

None
allow_extra_bands bool

If False (default), raises :exc:ValueError when the input contains band labels not in band_groups (or not in 0–5 when band_groups is None).

False
reduction str, list of str, or Reduction

Windowing / subsampling strategy. Defaults to "non-overlapping-windows".

'non-overlapping-windows'
reduction_kwargs dict

Extra keyword arguments forwarded to :func:reduction_from_str.

None
mag_zp float

AB zero-point of the input fluxes. Fluxes are rescaled to ZP = 27.5 (ELAsTiCC / SNANA FITS convention) before inference. Common values: 31.4 (LSST nJy, default), 27.5 (no rescaling needed), 8.9 (Jy).

31.4
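
The zero-point rescaling follows from requiring that the AB magnitude, m = -2.5 log10(flux) + ZP, stays the same in both systems. The sketch below works this out in NumPy; `rescale_flux` and `MODEL_ZP` are hypothetical names, not part of the library.

```python
import numpy as np

MODEL_ZP = 27.5  # zero-point the model expects, per the description above


def rescale_flux(flux, flux_err, mag_zp=31.4):
    """Rescale fluxes from AB zero-point `mag_zp` to MODEL_ZP."""
    # m = -2.5 * log10(flux) + zp must be unchanged, so
    # flux_model = flux * 10 ** (0.4 * (MODEL_ZP - mag_zp)).
    factor = 10.0 ** (0.4 * (MODEL_ZP - mag_zp))
    return np.asarray(flux) * factor, np.asarray(flux_err) * factor


flux_njy = np.array([1000.0, 2500.0])
flux, flux_err = rescale_flux(flux_njy, flux_njy * 0.05, mag_zp=31.4)

# The implied AB magnitude is identical before and after rescaling.
mag_before = -2.5 * np.log10(flux_njy) + 31.4
mag_after = -2.5 * np.log10(flux) + MODEL_ZP
print(np.allclose(mag_before, mag_after))  # True
```

Because flux and flux error share one multiplicative factor, the signal-to-noise ratio of every observation is unchanged by the conversion.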

Examples:

>>> import numpy as np
>>> import light_curve.embed as lce
>>> model = lce.ATCAT.from_hf(
...     output="last",
...     band_groups={"u": 0, "g": 1, "r": 2, "i": 3, "z": 4, "Y": 5},
... )
>>> time = np.linspace(0, 200, 100, dtype=np.float32)
>>> flux = np.ones(100, dtype=np.float32)
>>> flux_err = np.full(100, 0.1, dtype=np.float32)
>>> band = np.array(["g", "r"] * 50)
>>> embedding = model(time, flux, flux_err, band)
>>> embedding.shape
(1, 1, 1, 384)

Model license

Modified MIT with a non-military-use restriction (upstream ATCAT license).

References

Tung (2025), ATCAT: Astronomical Timeseries CAusal Transformer, arXiv:2511.00614. https://ui.adsabs.harvard.edu/abs/2025arXiv251100614T/abstract

hf_filename = 'atcat.onnx' class-attribute

hf_repo = 'light-curve/atcat' class-attribute

model_outputs = frozenset({'mean', 'last', 'sequence'}) class-attribute

seq_size = 243 class-attribute

valid_model_bands = frozenset({0, 1, 2, 3, 4, 5}) class-attribute

from_hf classmethod

Load a model from the HuggingFace Hub.

Downloads (and caches) the ONNX model file, creates an onnxruntime.InferenceSession, and returns a ready-to-use instance. Only the requested output is computed at inference time — onnxruntime prunes the unused computation graph automatically.

Parameters:

Name Type Description Default
output str

Named ONNX output to return. One of:

  • "last" (default) — embedding of the last valid timestep, output shape (band_groups, reductions, 1, 384)
  • "mean" — masked mean pooling over valid timesteps, output shape (band_groups, reductions, 1, 384)
  • "sequence" — per-timestep embeddings (no aggregation), output shape (band_groups, reductions, 243, 384)
'last'
use_fp16 bool

Whether to load the model in float16 precision if supported. Defaults to False (float32) for maximum compatibility; set to True to use original model precision and reduce memory usage if your hardware supports it.

False
band_groups Mapping, list of Mapping, or None

Band label → model integer mapping(s). None (default) expects integer band indices 0–5 (LSST ugrizY).

None
allow_extra_bands bool

If False (default), raises an error when the input light curve contains band labels the model cannot use: labels that are not keys of band_groups, or, when band_groups is None, labels other than the integers 0–5 (LSST ugrizY).

False
reduction str, list of str, or Reduction

Windowing / subsampling strategy. Defaults to "non-overlapping-windows".

'non-overlapping-windows'
reduction_kwargs dict or None

Extra keyword arguments forwarded to :func:reduction_from_str when reduction is given as a string.

None
mag_zp float

AB zero-point of the input fluxes. Fluxes are rescaled to ZP = 27.5 (ELAsTiCC / SNANA FITS convention) before inference. Common values: 31.4 (LSST nJy, default), 27.5 (no rescaling needed), 8.9 (Jy).

31.4
ort_session_kwargs dict or None

Additional keyword arguments forwarded to onnxruntime.InferenceSession: "sess_options", "providers", "provider_options".

None

Returns:

Type Description
instance of the calling class

Instance with a live ONNX inference session.

Raises:

Type Description
ValueError

If output is not one of the recognized output names.

ImportError

If huggingface_hub is not installed.

ImportError

If no onnxruntime variant is installed.

predict_tensors

Run the ONNX model on pre-processed tensors and return reduced embeddings.

Parameters:

Name Type Description Default
tensors ATCATInputs

As returned by :meth:preprocess_lc.

required

Returns:

Type Description
np.ndarray, shape (n_subsamples, seq_size, embed_dim)

Embeddings after applying the time reduction's aggregation. For aggregated models (mean / last) seq_size is 1.

preprocess_lc

Preprocess a light curve into ATCAT model input tensors.

Parameters:

Name Type Description Default
time ArrayLike

Observation times in days (e.g. MJD).

required
flux ArrayLike

AB-calibrated bandflux, zero-point is given by self.mag_zp.

required
flux_err ArrayLike

Uncertainties on the fluxes, in the same units as flux.

required
band ArrayLike

Passband labels: integers 0–5 (LSST ugrizY), or the keys of band_groups when a mapping is configured.

required

Returns:

Type Description
ATCATInputs

Reduction strategies

light_curve.embed.Beginning

Bases: light_curve.embed.reduction.SingleSubsampleReduction

Select the chronologically first seq_size observations of the light curve.

single_subsample_lc

Return the leading seq_size elements of each array.

Parameters:

Name Type Description Default
*arrays ndarray

1-D arrays of equal length.

required
seq_size int

Number of observations to keep from the start.

required

Returns:

Type Description
tuple of np.ndarray

First seq_size elements of each input array.

light_curve.embed.End

Bases: light_curve.embed.reduction.SingleSubsampleReduction

Select the chronologically last seq_size observations of the light curve.

single_subsample_lc

Return the trailing seq_size elements of each array.

Parameters:

Name Type Description Default
*arrays ndarray

1-D arrays of equal length.

required
seq_size int

Number of observations to keep from the end.

required

Returns:

Type Description
tuple of np.ndarray

Last seq_size elements of each input array.
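
The Beginning and End strategies amount to plain slicing. A minimal sketch, assuming the arrays are already sorted by time and that `seq_size` is positive; the function names `beginning` and `end` and the keyword-only `seq_size` are choices made for this illustration, not the library's signatures.

```python
import numpy as np


def beginning(*arrays, seq_size):
    """Keep the first `seq_size` elements of each array."""
    return tuple(np.asarray(a)[:seq_size] for a in arrays)


def end(*arrays, seq_size):
    """Keep the last `seq_size` elements of each array."""
    return tuple(np.asarray(a)[-seq_size:] for a in arrays)


time = np.arange(10.0)
mag = 15.0 + 0.1 * time

t_first, m_first = beginning(time, mag, seq_size=4)
t_last, m_last = end(time, mag, seq_size=4)
print(t_first)  # [0. 1. 2. 3.]
print(t_last)   # [6. 7. 8. 9.]
```

Light curves shorter than `seq_size` pass through unchanged in both cases, since slicing past the array bounds is not an error in NumPy.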

light_curve.embed.RandomSubsample

Bases: light_curve.embed.reduction.SingleSubsampleReduction

Draw up to seq_size observations uniformly at random without replacement, preserving their temporal order.

Parameters:

Name Type Description Default
rng int, np.random.Generator, or None

Seed or generator for reproducible sampling.

required

single_subsample_lc

Return a random subsample of at most seq_size observations, in original order.

Parameters:

Name Type Description Default
*arrays ndarray

1-D arrays of equal length.

required
seq_size int

Maximum number of observations to sample.

required

Returns:

Type Description
tuple of np.ndarray

min(len, seq_size) randomly selected observations, sorted by original index so temporal order is preserved.
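
One way to obtain an order-preserving random subsample is to draw indices without replacement and sort them before indexing. The sketch below assumes this approach; `random_subsample` is a hypothetical helper, while `np.random.default_rng` and `Generator.choice` are standard NumPy calls.

```python
import numpy as np


def random_subsample(*arrays, seq_size, rng=None):
    """Pick min(len, seq_size) observations at random, keeping original order."""
    rng = np.random.default_rng(rng)  # accepts None, an int seed, or a Generator
    n = len(arrays[0])
    k = min(n, seq_size)
    # Sorting the drawn indices restores the temporal order of the selection.
    idx = np.sort(rng.choice(n, size=k, replace=False))
    return tuple(np.asarray(a)[idx] for a in arrays)


time = np.arange(20.0)
mag = 15.0 + 0.01 * time
t_sub, m_sub = random_subsample(time, mag, seq_size=5, rng=42)
print(len(t_sub))                     # 5
print(bool(np.all(np.diff(t_sub) > 0)))  # True
```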

light_curve.embed.NonOverlappingWindows

Bases: light_curve.embed.reduction.Reduction

Split the light curve into consecutive non-overlapping windows of seq_size observations.

A light curve of length L yields ceil(L / seq_size) windows; the last window may be shorter than seq_size and is zero-padded. Per-window embeddings are averaged to produce a single embedding per light curve.
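
The windowing and zero-padding can be sketched for a single 1-D array as follows. `split_into_windows` is a hypothetical helper, and returning a boolean validity mask alongside the padded windows is an assumption made so that padded positions can be told apart later.

```python
import math

import numpy as np


def split_into_windows(values, seq_size):
    """Return (padded windows, validity mask), both of shape (n_windows, seq_size)."""
    values = np.asarray(values, dtype=np.float32)
    n_windows = math.ceil(len(values) / seq_size)
    padded = np.zeros((n_windows, seq_size), dtype=np.float32)
    mask = np.zeros((n_windows, seq_size), dtype=bool)
    # Fill row by row through flat views; the tail of the last window stays zero.
    padded.reshape(-1)[: len(values)] = values
    mask.reshape(-1)[: len(values)] = True
    return padded, mask


mag = np.arange(1.0, 8.0)  # 7 observations
windows, mask = split_into_windows(mag, seq_size=3)
print(windows.shape)     # (3, 3)
print(mask.sum(axis=1))  # [3 3 1]
```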

reduce_embeddings

Reduce per-window embeddings to a single representation.

For aggregated outputs (output != "sequence") the window embeddings are averaged, yielding shape (1, 1, embed_dim).

For output == "sequence" a masked mean is computed across windows for each timestep position, yielding shape (1, seq_size, embed_dim) regardless of how many windows the light curve was split into.

Parameters:

Name Type Description Default
embeddings np.ndarray, shape (n_windows, seq_size, embed_dim)

Per-window embeddings.

required
tensors InputTensors

Preprocessed input tensors; tensors.bool_mask (shape (n_windows, seq_size)) identifies valid vs. padded positions for the "sequence" output.

required
output str

Model output name. Determines aggregation strategy.

required

Returns:

Type Description
np.ndarray, shape (1, 1, embed_dim) or (1, seq_size, embed_dim)

For aggregated outputs (for example mean or max): mean over windows, shape (1, 1, embed_dim). For sequence: masked mean over windows per timestep, shape (1, seq_size, embed_dim).
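
The two aggregation paths can be illustrated in NumPy. This is a sketch of the behaviour described above, not the library's code: `reduce_window_embeddings` is a hypothetical function, and the mask is passed directly rather than through an InputTensors object.

```python
import numpy as np


def reduce_window_embeddings(embeddings, bool_mask, output):
    """Collapse the window axis of (n_windows, seq_size, embed_dim) embeddings."""
    if output != "sequence":
        # Aggregated outputs: plain mean over the window axis.
        return embeddings.mean(axis=0, keepdims=True)
    # Sequence output: average each timestep position over the windows
    # in which that position holds a real (non-padded) observation.
    weights = bool_mask[..., None].astype(embeddings.dtype)
    counts = np.maximum(weights.sum(axis=0), 1.0)
    return ((embeddings * weights).sum(axis=0) / counts)[None, ...]


aggregated = np.array([[[1.0, 2.0]], [[3.0, 4.0]]])  # 2 windows, seq_size 1
reduced_agg = reduce_window_embeddings(aggregated, None, "mean")
print(reduced_agg)  # [[[2. 3.]]]

sequence = np.ones((2, 3, 2))
sequence[1] *= 3.0
bool_mask = np.array([[True, True, True], [True, False, False]])
reduced_seq = reduce_window_embeddings(sequence, bool_mask, "sequence")
print(reduced_seq.shape)  # (1, 3, 2)
```

In the second call only the first timestep position is valid in both windows, so it averages to 2.0, while the remaining positions keep the value 1.0 from the first window.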

subsample_lc

Yield consecutive slices of length seq_size.

Parameters:

Name Type Description Default
*arrays ndarray

1-D arrays of equal length.

required
seq_size int

Window size.

required

Returns:

Type Description
list of tuple of np.ndarray

ceil(len / seq_size) windows, each a tuple of sliced arrays.

light_curve.embed.MultipleReductions

Bases: light_curve.embed.reduction.Reduction

Apply several :class:SingleSubsampleReduction strategies in parallel.

Each strategy produces one window; embeddings are stacked along the subsample axis rather than aggregated, giving one embedding per strategy.

Parameters:

Name Type Description Default
reductions list of SingleSubsampleReduction

Ordered list of strategies to apply.

required

Raises:

Type Description
ValueError

If any element of reductions is not a :class:SingleSubsampleReduction.
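
A toy illustration of running several single-window strategies side by side and stacking the results instead of averaging them. All names here (`first_k`, `last_k`, `subsample_with_all`, `toy_embed`) are hypothetical stand-ins, with strategies modelled as plain functions rather than :class:SingleSubsampleReduction instances.

```python
import numpy as np


def first_k(*arrays, seq_size):
    return tuple(np.asarray(a)[:seq_size] for a in arrays)


def last_k(*arrays, seq_size):
    return tuple(np.asarray(a)[-seq_size:] for a in arrays)


def subsample_with_all(strategies, *arrays, seq_size):
    """Apply each single-window strategy and return one window per strategy."""
    return [strategy(*arrays, seq_size=seq_size) for strategy in strategies]


def toy_embed(mag):
    """Hypothetical embedder returning a length-2 vector."""
    return np.array([mag.mean(), mag.max()])


time = np.arange(10.0)
mag = 15.0 + 0.1 * time

windows = subsample_with_all([first_k, last_k], time, mag, seq_size=3)
# One embedding per strategy, stacked along a leading subsample axis.
stacked = np.stack([toy_embed(m) for _, m in windows], axis=0)
print(len(windows))   # 2
print(stacked.shape)  # (2, 2)
```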

from_strings classmethod

Construct from a list of strategy name strings.

Parameters:

Name Type Description Default
reductions list of str

Strategy names recognised by :func:reduction_from_str.

required
**kwargs

Forwarded to each strategy constructor. If rng is an integer seed it is converted to a :class:numpy.random.Generator first so that each stochastic strategy gets an independent random stream.

required

Returns:

Type Description
MultipleReductions

Instance wrapping the instantiated strategies.

reduce_embeddings

Return embeddings unchanged — one per strategy, already stacked.

Parameters:

Name Type Description Default
embeddings array-like

Per-strategy embeddings from the model.

required
tensors InputTensors

Unused; accepted for interface compatibility.

required
output str

Unused; accepted for interface compatibility.

required

Returns:

Type Description
array-like

The input unchanged.

subsample_lc

Apply each strategy and return one window per strategy.

Parameters:

Name Type Description Default
*arrays ndarray

1-D arrays of equal length.

required
seq_size int

Maximum observations per window.

required

Returns:

Type Description
list of tuple of np.ndarray

One element per strategy, each a tuple of subsampled arrays.

light_curve.embed.SingleSubsampleReduction

Bases: light_curve.embed.reduction.Reduction, abc.ABC

Base for strategies that produce exactly one window per light curve.

reduce_embeddings

Return embeddings unchanged (single window — no aggregation needed).

Parameters:

Name Type Description Default
embeddings array-like

Per-window embeddings from the model.

required
tensors InputTensors

Unused; accepted for interface compatibility.

required
output str

Unused; accepted for interface compatibility.

required

Returns:

Type Description
array-like

The input unchanged.

single_subsample_lc

Return one subsampled window of at most seq_size observations.

Parameters:

Name Type Description Default
*arrays ndarray

1-D arrays of equal length.

required
seq_size int

Maximum number of observations to keep.

required

Returns:

Type Description
tuple of np.ndarray

Subsampled arrays, each of length <= seq_size.

subsample_lc

Return a single-element list wrapping :meth:single_subsample_lc.

Parameters:

Name Type Description Default
*arrays ndarray

1-D arrays of equal length.

required
seq_size int

Maximum observations per window.

required

Returns:

Type Description
list of tuple of np.ndarray

A one-element list containing the subsampled arrays.
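
The relationship between the three methods can be summarised with a toy abstract class. `ToySingleSubsample` and `EveryOther` are hypothetical; the method names follow the ones documented above, while the keyword-only `seq_size` is an assumption of this sketch.

```python
from abc import ABC, abstractmethod

import numpy as np


class ToySingleSubsample(ABC):
    """Illustrative base: subclasses only define how one window is chosen."""

    @abstractmethod
    def single_subsample_lc(self, *arrays, seq_size):
        """Return one window of at most `seq_size` observations."""

    def subsample_lc(self, *arrays, seq_size):
        # Wrap the single window in a list to match the multi-window interface.
        return [self.single_subsample_lc(*arrays, seq_size=seq_size)]

    def reduce_embeddings(self, embeddings, tensors, output):
        # A single window needs no aggregation.
        return embeddings


class EveryOther(ToySingleSubsample):
    """Hypothetical strategy: keep every second point, then truncate."""

    def single_subsample_lc(self, *arrays, seq_size):
        return tuple(np.asarray(a)[::2][:seq_size] for a in arrays)


time = np.arange(10.0)
mag = 15.0 + 0.1 * time
windows = EveryOther().subsample_lc(time, mag, seq_size=3)
print(len(windows))   # 1
print(windows[0][0])  # [0. 2. 4.]
```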