Does Your Model Have What It Takes?

Competition format: RALLY

Join the DataCrunch rally, predict the market, and shape the future of a market-neutral hedge fund.



Market Noise: Can Your Predictions Cut Through?

The Cross-Section Forecast Problem

The cross-sectional forecast problem involves tracking a predefined set of investment vehicles, "the universe" (for example, the companies in the S&P 500), at successive dates. The competition's aim is to predict the relative performance of these assets on each date. Predictions are evaluated with a risk-adjusted metric that accounts for consistency and portfolio management constraints.
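To make "relative performance for each date" concrete, here is a minimal sketch of one way to score cross-sectional predictions locally. It is not the competition's metric, which this page does not specify: it assumes a per-date Spearman rank correlation between predictions and realized outcomes, summarized as mean over standard deviation to reward consistency. The function names and the (date, prediction, realized) row layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def per_date_rank_correlation(rows):
    """rows: iterable of (date, prediction, realized). Returns {date: spearman}."""
    by_date = defaultdict(list)
    for date, pred, real in rows:
        by_date[date].append((pred, real))
    return {
        date: pearson(ranks([p for p, _ in pairs]), ranks([r for _, r in pairs]))
        for date, pairs in by_date.items()
    }


def consistency_ratio(scores):
    """Mean of per-date scores divided by their standard deviation (assumed summary)."""
    vals = list(scores.values())
    sd = pstdev(vals)
    return mean(vals) / sd if sd > 0 else math.nan
```

Because only ranks within each date matter here, a model is rewarded for ordering assets correctly rather than for forecasting the overall market direction.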

Avoiding Overfitting

Overfitting in finance, where models capture noise instead of genuine market signals, severely impairs future predictive accuracy. Competitors must prioritize model simplicity and generalization to prevent it, employing strategies like cross-validation and regularization. The true challenge lies in crafting models that perform well on unseen data, not just historical patterns. Avoiding overfitting is crucial for developing robust, actionable financial insights.
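One common way to apply cross-validation to dated data is walk-forward validation, where every training window ends before its test window begins. The sketch below is an illustration under assumptions, not a DataCrunch tool: the function name, the fold layout, and the `embargo` gap (dates dropped between train and test to limit leakage) are all hypothetical choices.

```python
def walk_forward_splits(dates, n_folds=3, embargo=1):
    """Yield (train_dates, test_dates) pairs in which training precedes testing.

    dates: iterable of sortable date labels (duplicates allowed, one per row).
    embargo: number of distinct dates skipped between train and test windows.
    """
    unique = sorted(set(dates))
    fold_size = len(unique) // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough distinct dates for the requested folds")
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        # The last fold absorbs any leftover dates.
        test_end = test_start + fold_size if k < n_folds else len(unique)
        train_end = max(test_start - embargo, 0)
        yield unique[:train_end], unique[test_start:test_end]


# Example with 8 weekly snapshots labeled 0..7.
for train, test in walk_forward_splits(range(8), n_folds=3, embargo=1):
    print(train, test)
```

Splitting by date rather than by row keeps all stocks from the same week on the same side of the split, which a random row-level shuffle would not guarantee.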

Navigating with Obfuscated Data

DataCrunch provides its community with curated, high-quality obfuscated datasets, ensuring ethical use of institutional data while minimizing model bias. This approach promotes domain-agnostic, objective analysis and encourages unbiased, innovative solutions in data science.
Workshops, discussion and much more.

First Round Prize Pool: 3125 $CRUNCH



Key elements

The dataset is a weekly stock market snapshot, capturing the ups and downs of market dynamics over 8 years.

With over 700 features, this dataset provides a comprehensive view of stocks. These features are thoughtfully organised into 6 distinct groups, each offering a unique perspective on market behaviour.
Weekly observations: Each row in the dataset corresponds to a particular stock on a particular date.
Refined attributes: The DataCrunch team has carefully curated over 1100 attributes. They cover a wide range, and each offers a different insight into a stock at a given date.
Industry Feature: An outstanding feature is the Feature_Industry. It reveals the underlying structure of stocks by assigning them to specific categories. This insight can guide your modelling work.
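As one illustration of how a grouping like Feature_Industry might guide modelling, the sketch below removes the per-date industry average from each prediction, so that scores express a view on a stock relative to its peers. This is an assumption about a possible use, not a documented recommendation; the `date` and `prediction` keys and the function name are hypothetical, and only Feature_Industry comes from the dataset description.

```python
from collections import defaultdict


def demean_within_groups(rows):
    """Subtract the (date, industry) group mean from each prediction.

    rows: list of dicts with keys 'date', 'Feature_Industry', 'prediction'
    (the first and last key names are assumed for illustration).
    Returns new dicts; the input rows are left unchanged.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["date"], row["Feature_Industry"])].append(row["prediction"])
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return [
        {**row, "prediction": row["prediction"] - means[(row["date"], row["Feature_Industry"])]}
        for row in rows
    ]
```

After this step the predictions within each industry sum to zero on every date, so no industry as a whole is favoured over another.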
Training and Scoring Sets
Training Set:
Containing approximately 6 years of data, this set serves as a playground for cross-validation.
Participants can fine-tune their models, experiment with different algorithms and validate their predictions against historical data.
Out-of-sample set:
This subset spans approximately 2 years.
It's the real thing - the period during which competitors' predictions are evaluated.
The dataset is tabular and deliberately obfuscated. We've masked certain details to protect proprietary data.
Submission Phase: 4 Weeks (March 15th - April 15th)
Scoring period: 4 Weeks (April 15th - May 10th)
Awards Ceremony: May 23rd

About DataCrunch

DataCrunch uses the quantitative research of the CrunchDAO to manage its systematic equity long-short portfolio.

The Alternative Risk Premia portfolio is rebalanced weekly with daily calibrations, targeting low turnover and low exposure to the factors of BARRA's multi-factor risk model (Market, Size, Momentum…).