Batch processing
This tutorial covers the .many() method for efficient bulk feature extraction:
- Plain Python lists of (t, m, sigma) tuples
- nested-pandas with real ZTF survey data
- PyArrow List<Struct> arrays
- Polars Series
All Arrow-compatible inputs avoid Python-level iteration and pass data to Rust with zero copies.
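To see why a List<Struct> column needs no per-row Python loop, it helps to picture the Arrow layout: each struct field is one flat array covering all light curves, and a single offsets array marks where each light curve starts and stops. The sketch below mimics that layout with plain NumPy. The names `offsets`, `t_flat`, and `m_flat` are illustrative only and are not part of the light-curve API.

```python
import numpy as np

# Three light curves with 3, 2 and 4 observations, stored back to back.
t_flat = np.array([0.0, 1.0, 2.0, 0.5, 1.5, 0.0, 0.1, 0.2, 0.3])
m_flat = np.array([15.0, 15.1, 15.2, 14.0, 14.2, 16.0, 16.1, 16.0, 15.9])

# offsets[i]:offsets[i + 1] is the slice of the flat arrays owned by light curve i.
offsets = np.array([0, 3, 5, 9])

# Basic slicing returns views, so no observation is copied here.
light_curves = [
    (t_flat[start:stop], m_flat[start:stop])
    for start, stop in zip(offsets[:-1], offsets[1:])
]

lengths = np.diff(offsets)
print(lengths)  # [3 2 4]
print(np.shares_memory(light_curves[1][0], t_flat))  # True
```

A consumer that understands this layout can walk the offsets in compiled code and read each slice in place, which is the idea behind passing Arrow data without copies.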
In [1]:
# %pip install light-curve
Plain list of tuples
.many() accepts a list of (t, m, sigma) tuples and returns a 2-D NumPy array of shape (N, n_features), where N is the number of light curves. Multi-threading is enabled by default and is controlled by the n_jobs parameter:
In [2]:
import light_curve as licu
import numpy as np

rng = np.random.default_rng(0)
# 1000 synthetic light curves: sorted times, magnitudes, and uncertainties
light_curves = [
    (np.sort(rng.random(50)), rng.random(50), rng.random(50) * 0.1)
    for _ in range(1000)
]
# One row per light curve, one column per feature
results = licu.Amplitude().many(light_curves)
print(f'Extracted from {len(light_curves)} light curves: shape = {results.shape}')
print(f'Mean amplitude = {results.mean():.4f} mag')
Extracted from 1000 light curves: shape = (1000, 1)
Mean amplitude = 0.4806 mag
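As a sanity check on the output above, similar numbers can be reproduced in plain NumPy. This is only a sketch: it assumes Amplitude is the half peak-to-peak magnitude, (max(m) - min(m)) / 2, and `half_amplitude` is a hypothetical helper rather than a library function. For 50 uniform draws on [0, 1) the expected range is 49/51, so the mean half-amplitude should land near 0.48.

```python
import numpy as np

rng = np.random.default_rng(0)
light_curves = [
    (np.sort(rng.random(50)), rng.random(50), rng.random(50) * 0.1)
    for _ in range(1000)
]

def half_amplitude(m):
    """Half of the peak-to-peak magnitude range (assumed definition)."""
    return 0.5 * (np.max(m) - np.min(m))

# One row per light curve, one column per feature: shape (N, n_features).
reference = np.array([[half_amplitude(m)] for _, m, _ in light_curves])

# E[max - min] of 50 independent uniform draws is (50 - 1) / (50 + 1).
expected_mean = 0.5 * 49 / 51
print(reference.shape)
print(f"mean = {reference.mean():.4f}, expected about {expected_mean:.4f}")
```

The pure-Python loop is only there for comparison; the point of .many() is to do the same per-light-curve work in compiled, multi-threaded code.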
nested-pandas with ZTF survey data
nested-pandas extends pandas with nested Arrow column support, useful for catalog data such as ZTF or Rubin LSST.
In [3]:
# %pip install light-curve nested-pandas s3fs universal-pathlib
In [4]:
import light_curve as licu
import nested_pandas as npd
import numpy as np
import pyarrow as pa
from upath import UPath

# One partition of the ZTF DR23 light-curve catalog, read anonymously from S3
s3_path = UPath(
    "s3://ipac-irsa-ztf/contributed/dr23/lc/hats/ztf_dr23_lc-hats/dataset/Norder=6/Dir=30000/Npix=34623/part0.snappy.parquet",
    anon=True,
)
ndf = npd.read_parquet(
    s3_path,
    columns=["objectid", "lightcurve.hmjd", "lightcurve.mag", "lightcurve.magerr"],
)
# Keep only objects with more than 10 observations
ndf = ndf.loc[ndf["lightcurve"].list_lengths > 10]
# Time relative to HMJD 58000, stored as float32 in a new nested field "t"
ndf["lightcurve.t"] = np.asarray(ndf["lightcurve.hmjd"] - 58000, dtype=np.float32)

feature = licu.Extractor(licu.Chi2Pvar(), licu.InterPercentileRange(quantile=0.25), licu.LinearFit())
# arrow_fields maps the extractor inputs (t, m, sigma) to struct field names
result = feature.many(
    pa.array(ndf["lightcurve"]),
    n_jobs=-1,
    arrow_fields={"t": "t", "m": "mag", "sigma": "magerr"},
)
# Attach one column per extracted feature
ndf = ndf.assign(**dict(zip(feature.names, result.T)))
ndf.head()
Out[4]:
| | objectid | lightcurve | chi2_pvar | inter_percentile_range_25 | linear_fit_slope | linear_fit_slope_sigma | linear_fit_reduced_chi2 |
|---|---|---|---|---|---|---|---|
| 0 | 1248202100002710 | | 0.000000 | 0.305351 | 0.001163 | 0.000185 | 5.509358 |
| 1 | 1248202100002733 | | 0.204448 | 0.216278 | 0.000245 | 0.000145 | 1.167804 |
| 2 | 1248202100002739 | | 0.734284 | 0.092379 | 0.000048 | 0.000083 | 0.818141 |
| 5 | 1248202100002819 | | 0.000120 | 0.042589 | 0.000031 | 0.000028 | 2.430087 |
| 8 | 1248202100002918 | | 0.008479 | 0.045643 | 0.000043 | 0.000040 | 1.825333 |
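The `dict(zip(feature.names, result.T))` idiom used in this cell turns the (N, n_features) result into one named column per feature. Here is a minimal sketch with made-up numbers; the `names` list and the values in `result` are placeholders, not real extractor output.

```python
import numpy as np

# Stand-ins for feature.names and for the array returned by .many().
names = ["chi2_pvar", "inter_percentile_range_25", "linear_fit_slope"]
result = np.array([
    [0.10, 0.30, 0.001],
    [0.20, 0.21, 0.002],
])  # shape (N=2, n_features=3)

# Transposing gives one row per feature, so zip pairs each name with its column.
columns = dict(zip(names, result.T))

for name, values in columns.items():
    print(name, values)
```

Because zip stops at the shorter input, it is worth checking that `len(names) == result.shape[1]` before relying on the mapping.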
PyArrow List<Struct> arrays
In [5]:
# %pip install light-curve pyarrow
In [6]:
import light_curve as licu
import numpy as np
import pyarrow as pa

BANDS = ["g", "r"]
rng = np.random.default_rng(42)
n_lc, n_per_band = 200, 40

# Each observation is a struct with time, magnitude, and passband
struct_type = pa.struct([
    ("t", pa.float64()),
    ("m", pa.float64()),
    ("band", pa.string()),
])

def make_lc():
    """Build one two-band light curve as a time-sorted list of row dicts."""
    rows = []
    for b in BANDS:
        t = rng.uniform(0, 100, n_per_band)
        m = rng.normal(15.0 if b == "g" else 15.3, 0.3, n_per_band)
        rows.extend({"t": float(ti), "m": float(mi), "band": b} for ti, mi in zip(t, m))
    rows.sort(key=lambda r: r["t"])
    return rows

lcs_arrow = pa.array([make_lc() for _ in range(n_lc)], type=pa.list_(struct_type))

feature = licu.Extractor(
    licu.InterPercentileRange(quantile=0.1, bands=BANDS),  # robust amplitude per band
    licu.AndersonDarlingNormal(bands=BANDS),  # normality test per band
    licu.ColorOfMaximum(BANDS),  # colour at brightness peak
    licu.ColorOfMinimum(BANDS),  # colour at brightness trough
)
result = feature.many(
    lcs_arrow,
    sorted=True,
    arrow_fields={"t": "t", "m": "m", "band": "band"},
)
print(f"shape: {result.shape}")  # (200, 6)
print("names:", feature.names)
shape: (200, 6)
names: ['inter_percentile_range_10_g', 'inter_percentile_range_10_r', 'anderson_darling_normal_g', 'anderson_darling_normal_r', 'color_max_g_r', 'color_min_g_r']
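The call above passes sorted=True, which (as far as the parameter name and usage suggest) tells the extractor to trust that every light curve is already ordered by time. That is why make_lc sorts the merged rows before returning them. The sketch below shows a pre-flight check in plain NumPy; `is_time_sorted` is a hypothetical helper, not a light-curve function.

```python
import numpy as np

def is_time_sorted(t):
    """Return True if the time array is non-decreasing."""
    t = np.asarray(t)
    return bool(np.all(np.diff(t) >= 0))

rng = np.random.default_rng(42)

# Merge two bands into one light curve: the concatenation is not in time order.
t = np.concatenate([rng.uniform(0, 100, 40), rng.uniform(0, 100, 40)])
band = np.array(["g"] * 40 + ["r"] * 40)

# Reorder every column with the same permutation so rows stay aligned.
order = np.argsort(t, kind="stable")
t_sorted, band_sorted = t[order], band[order]

print(is_time_sorted(t))         # False for this random draw
print(is_time_sorted(t_sorted))  # True
```

Running such a check once on a sample of the data is cheap compared with debugging features computed on silently unsorted input.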
Polars Series
In [7]:
# %pip install light-curve polars
In [8]:
import light_curve as licu
import numpy as np
import polars as pl

BANDS = ["g", "r"]
rng = np.random.default_rng(42)
n_obj, n_per_band = 200, 40

# Long-format table: one row per observation
object_id = np.repeat(np.arange(n_obj), n_per_band * len(BANDS))
band_col = np.tile(np.repeat(BANDS, n_per_band), n_obj)
t = np.sort(rng.uniform(0, 100, n_obj * n_per_band * len(BANDS)))
m = rng.normal(15.0, 0.3, len(object_id))
sigma = rng.uniform(0.01, 0.1, len(object_id))
df = pl.DataFrame({"object_id": object_id, "band": band_col, "t": t, "m": m, "sigma": sigma})

# Nest the observations into one list of structs per object
nested = df.group_by("object_id").agg(pl.struct("t", "m", "sigma", "band").alias("lc"))

feature = licu.Extractor(
    licu.ExcessVariance(bands=BANDS),  # variability excess over noise per band
    licu.StetsonK(bands=BANDS),  # variability index per band
    licu.BeyondNStd(nstd=1.5, bands=BANDS),  # outlier fraction per band
    licu.ColorOfMedian(BANDS),  # colour at median brightness
    licu.ColorSpread(BANDS),  # std dev of per-band means
)
result = feature.many(
    nested["lc"],
    arrow_fields={"t": "t", "m": "m", "sigma": "sigma", "band": "band"},
)
# Append one column per extracted feature
nested = nested.with_columns(
    [pl.Series(name, result[:, i]) for i, name in enumerate(feature.names)]
)
nested.select(["object_id"] + feature.names)
Out[8]:
shape: (200, 9)
| object_id | excess_variance_g | excess_variance_r | stetson_K_g | stetson_K_r | beyond_2_std_g | beyond_2_std_r | color_median_g_r | color_spread |
|---|---|---|---|---|---|---|---|---|
| i64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 42 | 0.000477 | 0.000423 | 0.725276 | 0.685257 | 0.075 | 0.075 | -0.030723 | 0.008347 |
| 161 | 0.000328 | 0.000332 | 0.713015 | 0.773202 | 0.125 | 0.05 | -0.036714 | 0.042266 |
| 191 | 0.000426 | 0.000458 | 0.686024 | 0.678284 | 0.125 | 0.1 | 0.243003 | 0.136195 |
| 188 | 0.000251 | 0.000381 | 0.799913 | 0.628956 | 0.1 | 0.1 | -0.016024 | 0.023173 |
| 197 | 0.000271 | 0.000462 | 0.692505 | 0.657601 | 0.1 | 0.125 | -0.029821 | 0.029909 |
| … | … | … | … | … | … | … | … | … |
| 83 | 0.000351 | 0.000289 | 0.61676 | 0.738528 | 0.15 | 0.175 | -0.05787 | 0.084151 |
| 175 | 0.000302 | 0.000372 | 0.624619 | 0.613073 | 0.075 | 0.15 | -0.117935 | 0.060557 |
| 44 | 0.00048 | 0.00041 | 0.633319 | 0.74991 | 0.15 | 0.05 | 0.108399 | 0.005772 |
| 95 | 0.000245 | 0.000263 | 0.705973 | 0.784646 | 0.075 | 0.15 | 0.02017 | 0.026593 |
| 47 | 0.000514 | 0.000542 | 0.618895 | 0.710341 | 0.125 | 0.075 | -0.049602 | 0.062047 |
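The group_by step in this example turns a long table (one row per observation) into one nested entry per object. For comparison, the same regrouping can be sketched in plain NumPy to produce the list-of-tuples input from the first section. The sketch assumes that rows of the same object are contiguous, as they are in the synthetic table; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obj, n_obs = 5, 8

# Long format: one row per observation, rows of one object stored contiguously.
object_id = np.repeat(np.arange(n_obj), n_obs)
t = np.sort(rng.uniform(0, 100, n_obj * n_obs))
m = rng.normal(15.0, 0.3, n_obj * n_obs)
sigma = rng.uniform(0.01, 0.1, n_obj * n_obs)

# Index of the first row of each object; valid because object_id is sorted.
ids, first_row = np.unique(object_id, return_index=True)
split_at = first_row[1:]

# One (t, m, sigma) tuple per object.
light_curves = list(zip(
    np.split(t, split_at),
    np.split(m, split_at),
    np.split(sigma, split_at),
))

print(len(light_curves), light_curves[0][0].shape)
```

This version drops the band column, so it only suits single-band features; the nested Polars column keeps the band with each observation.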
Next steps
- Feature basics tutorial — single features, Extractor, multiband intro
- Multiband tutorial — per-band and cross-band features
- Periodogram tutorial — Lomb–Scargle and period search
- API reference — full signatures and equations