Feature extractors: basics

This tutorial introduces the light_curve feature extractor interface: creating features, combining them with Extractor, reading names and descriptions, and batch processing.

In [1]:

# %pip install light-curve nested-pandas polars universal-pathlib

Single feature

Each feature is a callable object: it accepts (t, m, sigma) arrays and returns a NumPy array with one value per output. The .names attribute lists the output names in the same order.

Here we use Amplitude — the half peak-to-peak amplitude:

In [2]:

import light_curve as licu
import numpy as np

rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 100, 200))
m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.05, 200)
err = np.full(200, 0.05)

amp = licu.Amplitude()
result = amp(t, m, err)
print(f'names:  {amp.names}')
print(f'result: {result}')

names:  ['amplitude']
result: [0.40518549]
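
As a cross-check, the half peak-to-peak definition can be reproduced with plain NumPy. This is an independent sketch of that definition, not the library's implementation, and `half_peak_to_peak` is a hypothetical helper name:

```python
import numpy as np


def half_peak_to_peak(m):
    """Half of the difference between the largest and smallest magnitude."""
    m = np.asarray(m, dtype=float)
    return 0.5 * (m.max() - m.min())


# Same synthetic light curve as in the cell above.
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 100, 200))
m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.05, 200)

print(half_peak_to_peak(m))
```

Assuming Amplitude depends on the magnitudes alone, the printed value should agree with the library result above up to floating-point rounding.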

.descriptions gives a human-readable explanation of each output:

In [3]:

import light_curve as licu

f = licu.EtaE()
print(f.descriptions)

['generalised Von Neummann eta for irregular time-series']
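
To make that description more concrete, here is a NumPy sketch of a von Neumann eta generalised to uneven sampling. The formula used, (t_last - t_first)^2 / (N - 1)^3 times the sum of squared finite-difference slopes, divided by the sample variance of m, is an assumption about what EtaE computes; the API reference gives the exact definition. Both function names are hypothetical:

```python
import numpy as np


def eta_e(t, m):
    """Generalised von Neumann eta for irregular sampling (assumed definition)."""
    t = np.asarray(t, dtype=float)
    m = np.asarray(m, dtype=float)
    n = t.size
    slopes = np.diff(m) / np.diff(t)   # finite-difference slopes between neighbours
    variance = np.var(m, ddof=1)       # sample variance of the magnitudes
    return (t[-1] - t[0]) ** 2 / (n - 1) ** 3 * np.sum(slopes**2) / variance


def eta_classic(m):
    """Classic von Neumann eta, which ignores the time stamps."""
    m = np.asarray(m, dtype=float)
    return np.sum(np.diff(m) ** 2) / ((m.size - 1) * np.var(m, ddof=1))


# On a regular grid the two definitions coincide.
t_regular = np.arange(50, dtype=float)
m_regular = np.sin(t_regular / 5.0)
print(eta_e(t_regular, m_regular), eta_classic(m_regular))
```

Under this definition the statistic is also unchanged when all time stamps are rescaled by a constant factor, which is what makes it suitable for comparing light curves with different cadences.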

Combining features with `Extractor`

Extractor combines multiple features into a single callable evaluated in one pass. It is especially efficient for cheap features (statistical moments, variability indices, etc.) because it avoids recomputing intermediate quantities shared between features and pays the Python–Rust call overhead once per light curve rather than once per feature:

In [4]:

import light_curve as licu
import numpy as np

rng = np.random.default_rng(42)
n = 200
t = np.sort(rng.uniform(0, 100, n))
m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.1, n)
err = np.full(n, 0.1)

ext = licu.Extractor(
    licu.InterPercentileRange(quantile=0.25),
    licu.BeyondNStd(nstd=1),
    licu.BeyondNStd(nstd=2),
    licu.StandardDeviation(),
    licu.WeightedMean(),
    licu.LinearFit(),
    licu.StetsonK(),
)
result = ext(t, m, err)
for name, value in zip(ext.names, result):
    print(f'  {name:35s} = {value:.5f}')

  inter_percentile_range_25           = 0.38027
  beyond_1_std                        = 0.37500
  beyond_2_std                        = 0.02000
  standard_deviation                  = 0.23862
  weighted_mean                       = 14.99949
  linear_fit_slope                    = -0.00081
  linear_fit_slope_sigma              = 0.00025
  linear_fit_reduced_chi2             = 5.66968
  stetson_K                           = 0.85607
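
The benefit of evaluating cheap features together can be illustrated in NumPy: several statistics need the same mean and standard deviation, so computing them in one function lets those values be calculated once. The sketch below uses the textbook definitions (sample standard deviation, fraction of points more than n standard deviations from the mean, inverse-variance weighted mean); `cheap_features` and its output keys are hypothetical and only mimic the naming seen above:

```python
import numpy as np


def cheap_features(m, sigma, nstd=1.0):
    """Compute several statistics while reusing the same mean and std."""
    m = np.asarray(m, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    mean = m.mean()            # computed once, reused below
    std = m.std(ddof=1)        # sample standard deviation

    weights = 1.0 / sigma**2   # inverse-variance weights
    return {
        "standard_deviation": std,
        f"beyond_{nstd:g}_std": np.mean(np.abs(m - mean) > nstd * std),
        "weighted_mean": np.sum(weights * m) / np.sum(weights),
    }


values = cheap_features([1.0, 2.0, 3.0, 4.0, 10.0], [0.1, 0.1, 0.1, 0.1, 0.1])
for name, value in values.items():
    print(f"  {name:35s} = {value:.5f}")
```

How the library shares work internally is not shown here; the point is only that a combined evaluation has the opportunity to do so, while separate calls do not.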

Batch processing with `.many()`

.many() processes a list of (t, m, sigma) tuples and returns a 2-D NumPy array (shape (N, n_features)). It supports multi-threading (enabled by default via the n_jobs parameter) and is the preferred path for large datasets:

In [5]:

import light_curve as licu
import numpy as np

rng = np.random.default_rng(0)
light_curves = [
    (np.sort(rng.random(50)), rng.random(50), rng.random(50) * 0.1)
    for _ in range(1000)
]

feature = licu.Extractor(licu.Skew(), licu.Kurtosis(), licu.ReducedChi2())
results = feature.many(light_curves)
print(f'Extracted from {len(light_curves)} light curves: shape = {results.shape}')
for name, col in zip(feature.names, results.T):
    print(f'  {name:20s} mean = {col.mean():.4f}')

Extracted from 1000 light curves: shape = (1000, 3)
  skew                 mean = 0.0003
  kurtosis             mean = -1.1438
  chi2                 mean = 2693.5912
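
The output layout of a batch call, one row per light curve and one column per feature, can be mimicked in pure Python. The sketch below is a conceptual analogue only: `toy_feature` and `many` are hypothetical helpers, the thread pool is a stand-in for whatever the library does in compiled code, and `n_jobs=4` is an arbitrary choice:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def toy_feature(light_curve):
    """Hypothetical stand-in for a feature: returns two values per light curve."""
    t, m, sigma = light_curve
    return np.array([m.mean(), m.std(ddof=1)])


def many(feature, light_curves, n_jobs=4):
    """Apply `feature` to every light curve and stack the rows into (N, n_features)."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        rows = list(pool.map(feature, light_curves))  # map preserves input order
    return np.vstack(rows)


rng = np.random.default_rng(0)
light_curves = [
    (np.sort(rng.random(50)), rng.random(50), rng.random(50) * 0.1)
    for _ in range(1000)
]
results = many(toy_feature, light_curves)
print(results.shape)
```

Row i of the result always corresponds to light curve i, so the array can be joined back to a catalogue by position.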

Batch processing with nested-pandas

nested-pandas stores each light curve as a nested Arrow column, letting .many() consume it with zero copies. The generate_data helper creates a toy NestedFrame — its nested column holds t, flux, and band fields:

In [6]:

# %pip install light-curve nested-pandas

In [7]:

import light_curve as licu
import pyarrow as pa
from nested_pandas.datasets import generate_data

ndf = generate_data(100, 50, seed=42)

feature = licu.Extractor(licu.ObservationCount(bands=["g", "r"]), licu.StandardDeviation(bands=["g", "r"]))
result = feature.many(pa.array(ndf["nested"]), arrow_fields={"t": "t", "m": "flux", "band": "band"})
ndf = ndf.assign(**dict(zip(feature.names, result.T)))
ndf[["a", "b", *feature.names]].head()

Out[7]:

	a	b	observation_count_g	observation_count_r	standard_deviation_g	standard_deviation_r
0	0.374540	0.062858	29.0	21.0	28.647131	29.084164
1	0.950714	1.272821	27.0	23.0	26.316460	32.761964
2	0.731994	0.628712	23.0	27.0	29.437034	27.870809
3	0.598658	1.017141	22.0	28.0	27.637694	28.605570
4	0.156019	1.815133	23.0	27.0	26.461527	28.822292

5 rows × 6 columns
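
Zero-copy access is possible because Arrow list arrays keep all nested values in flat buffers, with an offsets array marking where each row begins and ends. The sketch below imitates that layout with NumPy arrays standing in for the Arrow buffers; the values are made up for illustration and no Arrow API is used:

```python
import numpy as np

# Flat child buffers: observations of all objects stored back to back.
t_values = np.array([0.0, 1.0, 2.0, 0.5, 1.5, 0.0, 3.0, 4.0, 6.0])
flux_values = np.array([10.0, 11.0, 9.0, 5.0, 7.0, 1.0, 2.0, 3.0, 4.0])

# offsets[i]:offsets[i + 1] delimits the observations of object i.
offsets = np.array([0, 3, 5, 9])
counts = np.diff(offsets)  # number of observations per object

for i in range(len(offsets) - 1):
    start, stop = offsets[i], offsets[i + 1]
    t_i = t_values[start:stop]        # a view into the flat buffer, not a copy
    flux_i = flux_values[start:stop]
    assert np.shares_memory(t_i, t_values)
    print(i, counts[i], flux_i.std(ddof=1))
```

Each light curve is a slice of the shared buffers, so iterating over objects never duplicates the observation data.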

Multiband light curves

Every feature accepts a bands= constructor argument to switch into per-passband mode. When set, __call__ expects a fourth band string array; outputs are named with a passband suffix (e.g. amplitude_g, amplitude_r).

Extractor freely mixes single-band and multiband features — it filters the band array automatically for each sub-feature:

In [8]:

import light_curve as licu
import numpy as np

rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 100, 200))
m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.05, 200)
err = np.full(200, 0.05)
band = np.tile(["g", "r"], 100)

sk = licu.StetsonK(bands=["g", "r"])
print("StetsonK per band:", dict(zip(sk.names, sk(t, m, err, band))))

ext = licu.Extractor(
    licu.EtaE(bands=["g", "r"]),
    licu.LinearFit(bands=["g", "r"]),
    licu.WeightedMean(),
)
result = ext(t, m, err, band)
for name, val in zip(ext.names, result):
    print(f"  {name:35s} = {val:.4f}")

StetsonK per band: {'stetson_K_g': np.float64(0.8993250473662492), 'stetson_K_r': np.float64(0.88314411217402)}
  eta_e_g                             = 0.4937
  eta_e_r                             = 1.1176
  linear_fit_slope_g                  = -0.0010
  linear_fit_slope_sigma_g            = 0.0002
  linear_fit_reduced_chi2_g           = 20.8226
  linear_fit_slope_r                  = -0.0008
  linear_fit_slope_sigma_r            = 0.0002
  linear_fit_reduced_chi2_r           = 18.8707
  weighted_mean                       = 14.9987
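
Conceptually, per-passband mode amounts to masking the observations by band, applying the feature to each subset, and suffixing the output names. The NumPy sketch below illustrates that idea; `per_band` and `amplitude` are hypothetical helpers, and the library's internal handling may differ:

```python
import numpy as np


def per_band(feature, name, t, m, band, bands):
    """Apply `feature` to each passband separately and suffix the output names."""
    out = {}
    for b in bands:
        mask = band == b  # select observations in this passband
        out[f"{name}_{b}"] = feature(t[mask], m[mask])
    return out


def amplitude(t, m):
    """Half peak-to-peak amplitude; `t` is unused but kept for a uniform signature."""
    return 0.5 * (m.max() - m.min())


t = np.arange(6, dtype=float)
m = np.array([1.0, 10.0, 3.0, 14.0, 2.0, 12.0])
band = np.tile(["g", "r"], 3)

result = per_band(amplitude, "amplitude", t, m, band, ["g", "r"])
print(result)
```

Here the g band holds the magnitudes 1, 3, 2 and the r band holds 10, 14, 12, so the two amplitudes are computed independently even though the observations are interleaved in time.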

Next steps

Feature table — all 40+ extractors grouped by category
API reference — full signatures and equations
Periodogram tutorial — Lomb–Scargle and period search
Multiband tutorial — per-band and cross-band features
Rainbow fit tutorial — blackbody temperature and radius evolution
Batch processing tutorial — nested-pandas with real survey data, PyArrow, Polars