I have recently entered 12th grade, and I've been obsessed with exoplanets for a while now. Not in a casual way — I mean the kind of obsession where you start wondering if you could just... build something that finds them.
So I did.
What even is an exoplanet classifier?
When a planet passes in front of a star, it blocks a tiny fraction of the star's light. Kepler spent four years staring at roughly 150,000 stars looking for exactly that — those tiny dips. The result is thousands of light curves, each one a time series of a star's brightness.
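For intuition about how tiny "tiny" is: the fractional dip is roughly the planet-to-star area ratio, (Rp/Rs)². A quick back-of-the-envelope with round figures for Earth and the Sun (my numbers, just for scale):

```python
# Transit depth ~ (planet radius / star radius)^2.
# Round radii in km for Earth and the Sun.
r_earth = 6_371
r_sun = 696_000

depth = (r_earth / r_sun) ** 2
print(f"{depth:.2e}")  # an Earth-Sun transit dims the star by ~0.008%
```

That's why instrument noise is such a serious competitor — the signal is less than a ten-thousandth of the star's brightness.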
Some dips are planets. A lot aren't — instrument noise, binary stars, other stuff. NASA labels them as confirmed, false positive, or candidate.
I wanted to see if a neural network could learn the difference.
What I built
A 1D CNN that takes a phase-folded light curve — 400 data points representing one orbit's worth of brightness — and outputs a probability: real planet or not.
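The post doesn't spell out the architecture, so here's a minimal sketch of what a 1D CNN over a 400-point phase-folded curve could look like — layer sizes and kernel widths are my guesses, not the author's:

```python
import torch
import torch.nn as nn

class TransitCNN(nn.Module):
    """Minimal 1D CNN: (batch, 1, 400) flux values -> planet logit."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(4),                       # 400 -> 100
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(4),                       # 100 -> 25
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 25, 64), nn.ReLU(),
            nn.Linear(64, 1),                      # logit; sigmoid gives the probability
        )

    def forward(self, x):
        return self.head(self.features(x))

model = TransitCNN()
probs = torch.sigmoid(model(torch.randn(8, 1, 400)))  # batch of 8 fake curves
```

The convolutions scan for local dip shapes regardless of where they sit in the fold, which is the whole reason a CNN fits this problem.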
Ended up hitting 0.96 ROC-AUC on the test set, which honestly surprised me.
The stuff that actually mattered
Most tutorials would've had me just throw data at a model and call it done. A few things made a real difference:
I excluded CANDIDATE labels entirely. They're unverified — could be planets, could be noise. Training on them just teaches the model to be confidently wrong.
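The filtering itself is a one-liner if you start from the KOI cumulative table, where dispositions live in a column — I'm assuming a `koi_disposition`-style column as in the NASA Exoplanet Archive, and using a toy table here:

```python
import pandas as pd

# Toy stand-in for the KOI table; the real one comes from the NASA Exoplanet Archive.
koi = pd.DataFrame({
    "kepoi_name": ["K00001.01", "K00002.01", "K00003.01", "K00004.01"],
    "koi_disposition": ["CONFIRMED", "FALSE POSITIVE", "CANDIDATE", "CONFIRMED"],
})

# Keep only verified labels so the model never trains on unverified guesses.
labeled = koi[koi["koi_disposition"].isin(["CONFIRMED", "FALSE POSITIVE"])].copy()
labeled["label"] = (labeled["koi_disposition"] == "CONFIRMED").astype(int)
```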
I was careful about the train/val/test split. Easy to accidentally let information from the test set leak into training. Took a while to get this right.
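One classic leak with Kepler data: the same star can produce multiple signals, so a random row-level split can put near-identical curves in both train and test. A stdlib-only sketch of splitting by star ID instead — the hashing scheme is mine, not necessarily what the author did:

```python
import hashlib

def split_for(star_id: str, val_frac=0.1, test_frac=0.1) -> str:
    """Deterministically assign every light curve of a star to one split."""
    h = int(hashlib.sha256(star_id.encode()).hexdigest(), 16) % 100
    if h < test_frac * 100:
        return "test"
    if h < (test_frac + val_frac) * 100:
        return "val"
    return "train"

# All samples from the same star always land in the same split.
assert split_for("KIC 11446443") == split_for("KIC 11446443")
```

Hashing the ID (instead of shuffling) also means the split stays stable when you re-download or add data.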
Class weights saved me. Confirmed planets are rare — about 1% of the dataset. Without class weights, the model just learned to predict "not a planet" for everything and got 99% accuracy.
Technically correct, completely useless.
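Class weighting just scales the loss so the rare class counts more. With ~1% positives, the usual "balanced" formula — n_samples / (n_classes * n_in_class) — gives the planet class roughly 50x the weight; a stdlib sketch with made-up counts matching that ratio:

```python
from collections import Counter

labels = [1] * 10 + [0] * 990          # ~1% confirmed planets, like the real dataset
counts = Counter(labels)
n, k = len(labels), len(counts)

# "Balanced" weighting: rarer classes get proportionally larger loss weight.
weights = {cls: n / (k * c) for cls, c in counts.items()}
print(weights)  # planets (class 1) weighted ~50x more than non-planets
```

Passing these as per-class loss weights makes "predict negative for everything" an expensive strategy instead of a free one.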
The data pipeline runs 8 workers in parallel to fetch light curves from NASA's archive. This one was just satisfying to build.
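The 8-worker pattern is a few lines with `concurrent.futures` — `fetch_light_curve` below is a placeholder for whatever actually hits the archive (e.g. a lightkurve download), not the author's code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_light_curve(target: str) -> str:
    """Placeholder: in the real pipeline this would download from the archive."""
    return f"light curve for {target}"

targets = [f"KIC {i}" for i in range(100)]

# Downloads are I/O-bound, so threads (rather than processes) are usually enough.
with ThreadPoolExecutor(max_workers=8) as pool:
    curves = list(pool.map(fetch_light_curve, targets))
```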
There's a live demo
I made a Streamlit app where you can load the model and see it run on real test data — ROC curve, confusion matrix, the works. You can also see the actual light curves it's most confident about.
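Under the hood, the ROC curve and confusion matrix in a demo like this are a few lines of scikit-learn — here with fake scores just to exercise the metrics, not the app's actual code:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)
y_true = np.array([0] * 90 + [1] * 10)
# Fake scores where positives tend to score higher than negatives.
y_score = np.concatenate([rng.uniform(0, 0.6, 90), rng.uniform(0.4, 1.0, 10)])

auc = roc_auc_score(y_true, y_score)
cm = confusion_matrix(y_true, y_score > 0.5)  # rows: true class, cols: predicted
```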
Code is here if you want to poke around:
What I want to try next
The precision on confirmed planets is still rough — there are only 5 confirmed planets in the test set vs 565 false positives, so even a good model looks bad on that metric. I want to try LSTMs, maybe attention, and eventually TESS data, which is more recent than Kepler.
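To see why precision looks bad with 5 positives against 565 negatives: even a model with high recall and a low false-positive rate gets its true detections drowned out. Quick arithmetic with hypothetical (my) rates:

```python
n_planets, n_fp = 5, 565            # test-set class counts from the post

recall = 0.9                         # hypothetical: model finds 90% of planets
false_positive_rate = 0.05           # hypothetical: flags 5% of non-planets

tp = n_planets * recall              # 4.5 expected true detections
fp = n_fp * false_positive_rate      # 28.25 expected false alarms
precision = tp / (tp + fp)
print(f"{precision:.2f}")            # ~0.14: a decent model, ugly precision
```

With so few positives, precision mostly measures the class imbalance, not the model.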
If you know this space and think I'm doing something dumb, genuinely tell me. I'd rather know.