Evaluation of Industrial Factors

Typical Workflow (copy-paste ready)

start_date = '2010-01-01'
end_date   = '2023-03-31'
#%%
# Load data
from firefin.data.gateway import *
data = fetch_data(["open", "close", "volume","return_adj"])

close_price = data["close"].loc[start_date:end_date]
open_price = data["open"].loc[start_date:end_date]
volume = data["volume"].loc[start_date:end_date]

#%%
# Compute forward returns for 1, 5, 10 days
from firefin.core.eva_utils import *
# compute forward returns
fr = compute_forward_returns(open_price.shift(-1), [1, 5, 10])
#%%
# Construct a factor (shown here just for demonstration; the factor format is a T×N pd.DataFrame)
def ts_corr(x, y, window=10):
    """
    Wrapper function to estimate rolling correlations.
    :param x, y: pandas DataFrames.
    :param window: the rolling window.
    :return: a pandas DataFrame with the time-series min over the past 'window' days.
    """
    return x.rolling(window).corr(y)

factor = ts_corr(close_price, volume, 20)
#%%
# Evaluator for industrial factors; it can evaluate the parameters of a new factor. This tool is not yet fully refined.
from firefin.evaluation.industry.evaluator import Evaluator
mng = Evaluator(factor, fr)
#%%
# Get the factor’s IC backtest data
df_ic = mng.get_ic("pearson")
#%%
# Get the factor’s quantile return backtest data
df_qr = mng.get_quantile_returns(5)

Step 0. Prepare Two Types of Data

  • Shape: both should be T×N pandas.DataFrames; rows are dates, columns are stocks (or industry constituents).
  • Date index: ideally a DatetimeIndex, but it’s fine if not—Evaluator will convert automatically.
  • Data: build a forward-return dictionary where the key is the holding period in days, and the value is the DataFrame of returns.

Example:

# Set the backtest period
start_date = '2019-01-01'
end_date   = '2023-03-31'
# Pull data
from firefin.data.gateway import fetch_data
data = fetch_data(["open", "close", "volume","return_adj"])
close_price = data["close"].loc[start_date:end_date]
open_price  = data["open"].loc[start_date:end_date]
return_adj  = data["return_adj"].loc[start_date:end_date]
# Build the forward return dictionary
from firefin.core.eva_utils import compute_forward_returns
# Note shift(-1): align the "next day’s open" to the signal day
fr = compute_forward_returns(open_price.shift(-1), [1, 5, 10])  # returns a dict: {1: df, 5: df, 10: df}

Step 1. Use eva_utils to Compute “Forward Returns” and Understand IC/Quantiles

eva_utils provides the low-level evaluation capabilities that Evaluator uses directly.

  • compute_forward_returns(price, periods): given adjusted prices, compute forward returns for 1/5/10 days.
  • compute_ic(factor, forward_returns, method): IC = cross-sectional correlation (most commonly Spearman, i.e., RankIC).
  • compute_quantile_returns(factor, forward_returns, quantiles): portfolio returns grouped by factor quantiles.
  • summarise_ic(ic): summarize IC mean/standard deviation/IR/proportions, etc., into a table.

Step 2. Prepare the Factor (T×N)

# Construct a factor (shown here just for demonstration; the factor format is a T×N pd.DataFrame)
def ts_corr(x, y, window=10):
    """
    Wrapper function to estimate rolling correlations.
    :param x, y: pandas DataFrames.
    :param window: the rolling window.
    :return: a pandas DataFrame with the time-series min over the past 'window' days.
    """
    return x.rolling(window).corr(y)

factor = ts_corr(close_price, volume, 20)

Step 3. One-Click Evaluation

Evaluator wraps the core capabilities of eva_utils into two out-of-the-box tasks—IC and quantile portfolio returns—and plots them automatically.

from firefin.evaluation.industry.evaluator import Evaluator
# Initialize the evaluator class; provide two parameters: the factor DataFrame `factor` and the factor’s forward returns `fr`
evaluator = Evaluator(factor, fr)

Get IC-related test metrics for this factor:

evaluator.get_ic(method="spearman", plot=True) 
# methods: Spearman/Kendall/Pearson; the most common is Spearman

img

  • IC time series plots (the more stable the better; high IR, high proportion > 0; strong factors typically have average RankIC > 0).
  • Top-left three plots: 20-day moving averages of the IC between the factor values and the 1-, 5-, and 10-day forward returns.
  • Bottom-left plot: cumulative IC for 1/5/10 days, along with the IC mean, standard deviation, and IR (IC mean / standard deviation).
  • Middle three plots: distributions of IC values—top to bottom correspond to 1-, 5-, and 10-day ICs (the bottom of each shows the mean, kurtosis, and skewness).
  • Right-hand three plots: IC normality (values above the red line indicate probabilities exceeding those under a normal distribution in that region; if both tails are above, it suggests the IC may take more extreme values).

Get the factor’s grouped return series metrics:

# Get the factor’s quantile return backtest data
df_qr = evaluator.get_quantile_returns(5)

img

This figure shows the factor’s quantile returns.

The top, middle, and bottom regions correspond to the 1-, 5-, and 10-day return charts, respectively.

The left side shows the return curves for the specified period start_date = '2019-01-01', end_date = '2023-03-31'.

The right side shows the return curves for the most recent three years.

Each quantile (five quantiles plus a max-minus-min spread portfolio) is marked in a different color. The top-left of each chart shows the annualized return (ANN) and the total return over the backtest period.

Other Functions

Compute Forward Return

compute_forward_returns function

Used to compute forward returns for assets under different holding periods.

  • Computes forward returns for specified holding periods
  • Supports multiple holding periods (e.g., 1, 5, 10, 20 days)
  • Uses log returns and then converts to simple returns
# Compute quantile returns for a value factor
value_factor = calculate_value_factor()  # assume this computes the value factor
forward_returns = compute_forward_returns(close_prices, [1, 5, 20])

Compute Quantile Return

compute_quantile_returns function

This is a core function in factor investing, used to compute quantile returns after grouping by factor values. Functionality:

  • Group factor values into quantiles (5-quantile by default)
  • Compute the average return for each quantile
  • Supports weighting (e.g., market-cap weighting)
  • Supports multiple holding periods
# 5-quantile analysis
quantile_returns = compute_quantile_returns(value_factor, forward_returns, quantiles=5)

# 10-quantile analysis (more granular)
quantile_returns_10 = compute_quantile_returns(value_factor, forward_returns, quantiles=10)

# View the return spread between the highest and lowest quantiles
high_low_spread = quantile_returns[1][5] - quantile_returns[1][1]

Industrial applications:

  • Factor stratification analysis: compare performance of high-factor vs. low-factor stocks
  • Long–short strategy construction: build long-short portfolios
  • Factor efficacy evaluation: judge predictive power via quantile return spreads

Winsorizer Class

An important tool for handling outliers, widely used in financial data preprocessing.

MAD_winsorization — Median Absolute Deviation (MAD) Method

Description:

  • Uses median absolute deviation (MAD) to identify outliers
  • More robust than the mean–standard deviation method; less sensitive to extreme values
# Handle outliers using the MAD method
cleaned_factor = Winsorizer.MAD_winsorization(factor_exposure, scaled=True, k=3)

Industrial applications:

  • Handling outliers in financial data
  • Data cleaning before factor standardization
  • Outlier processing in risk models
sigma_winsorization — K-σ Rule

Description:

  • Uses mean ± K×standard deviation as truncation bounds
  • A classic statistical outlier handling method
# 3-sigma rule
cleaned_factor_sigma = Winsorizer.sigma_winsorization(factor_exposure, k=3)

# 2.5-sigma rule (more aggressive)
cleaned_factor_sigma_aggressive = Winsorizer.sigma_winsorization(factor_exposure, k=2.5)

Industrial applications:

  • Outlier handling for return data
  • Standardizing factor exposures
  • Computing risk metrics
percentile_winsorization — Percentile Truncation

Description:

  • Uses specified percentiles as truncation bounds
  • The most straightforward outlier handling method

Key parameters:

  • percentile: truncation percentiles, default (0.01, 0.99)
  • set_outlier_nan: whether to set outliers to NaN
# Truncate at the 1% and 99% percentiles
cleaned_factor_percentile = Winsorizer.percentile_winsorization(
    factor_exposure, 
    percentile=(0.01, 0.99)
)

# Truncate at the 5% and 95% percentiles (more conservative)
cleaned_factor_percentile_conservative = Winsorizer.percentile_winsorization(
    factor_exposure, 
    percentile=(0.05, 0.95)
)

# Set outliers to NaN instead of truncating
cleaned_factor_nan = Winsorizer.percentile_winsorization(
    factor_exposure, 
    percentile=(0.01, 0.99),
    set_outlier_nan=True
)

Industrial applications:

  • Standardizing factor data
  • Data cleaning before portfolio construction
  • Outlier handling in backtesting systems