Custom Calculator Workflow#
In “custom calculator” mode, you start from a Zero-Coder installation of Data Curator (i.e., you have already installed Data Curator, run kaxanuk.data_curator init excel, and populated Config/parameters_datacurator.xlsx as described in the Zero-Coder guide). Then, in addition to configuring providers, dates, tickers, and default output columns via Excel, you add one or more Python functions that generate extra columns on a per-row basis. Follow these steps to install (if you haven’t already), configure, and run Data Curator with your own calculations.
Prerequisites (Zero-Coder Setup)#
Before adding custom calculations, ensure you have completed the Zero-Coder steps.
Create a Python 3.12 Environment#
Use Conda or venv to isolate Data Curator’s dependencies.
Conda example (Windows/macOS/Linux):
conda create --name datacurator_env python=3.12
conda activate datacurator_env
venv example:
python3.12 -m venv datacurator_env
source datacurator_env/bin/activate # macOS/Linux
datacurator_env\Scripts\activate.bat # Windows
Install Data Curator via pip#
With the virtual environment active, run:
pip install --upgrade kaxanuk.data_curator
This installs Data Curator along with its dependencies (e.g., openpyxl, pandas, pyarrow, pandas_ta, etc.).
Initialize the Configuration#
Choose or create a project directory and move into it:
mkdir ~/data_curator_project
cd ~/data_curator_project
Run the initializer:
kaxanuk.data_curator init excel
After this command, your directory will contain:
__main__.pyConfig/(empty configuration folder)Output/(empty output folder)
Configure Data Curator (Zero-Coder Settings)#
Open Config/parameters_datacurator.xlsx and fill in the worksheets as follows:
Providers -
market_data_provider: select a market-data vendor. -market_data_api_key: enter its API key here (or leave blank to use.env). -fundamental_data_provider: select a fundamental-data vendor. -fundamental_data_api_key: enter its API key here (or leave blank to use.env).Date Range -
start_date(YYYY-MM-DD): first date of data fetch. -end_date(YYYY-MM-DD): last date of data fetch. -period: frequency (e.g.,1d,1w,1m).Instruments - List ticker symbols (one per row), e.g.,
AAPL,MSFT.Output Settings -
output_format: choose betweencsvorparquet. -logger_level: e.g.,INFO,DEBUG.Columns/Calculations - Tick the raw data columns you want (e.g.,
open,close,volume). - Under Predefined Calculations, tick any built-in features (e.g., “Simple Moving Average 5d”). - Under Custom Calculations, list any function names defined inConfig/custom_calculations.py(each prefixed withc_).
If you left any API keys blank in Excel, create or edit Config/.env:
KNDC_API_KEY_MARKET_DATA=<your_market_data_api_key>
KNDC_API_KEY_FUNDAMENTAL_DATA=<your_fundamental_data_api_key>
After saving parameters_datacurator.xlsx and (if needed) .env, you can run:
python /path/to/data_curator_project
to verify that Data Curator fetches default data and writes output into Output/.
Create Your Custom Calculation Function#
Data Curator looks for any Python function in Config/custom_calculations.py whose name begins with c_. Each such function is applied row-wise over the assembled dataset once the raw market/fundamental data has been collected. A custom function should:
Be defined in
Config/custom_calculations.py.Take as positional arguments the column names (as Pandas Series) needed for the computation.
Return a Pandas Series of the same length, with
NoneorNaNin rows where inputs are missing or the operation is undefined.
Locate the Custom Calculations File#
In your project directory, open:
Config/custom_calculations.py
This file already contains template functions and import statements. At the top you’ll see helper imports such as:
import pandas as pd
from datetime import datetime
from kaxanuk.data_curator.features.helpers import (
cumulative_return,
log_return,
...
)
Define a New Function#
Choose a clear, snake_case name prefixed with c_. For example, to compute a 10-day price difference, you might write:
def c_price_difference_10d(m_close: pd.Series) -> pd.Series:
"""
Returns the difference between the close price and its value 10 trading days ago.
Leaves first 10 rows as NaN.
"""
# Use Pandas to shift by 10 rows
return m_close - m_close.shift(10)
If you need multiple input columns, add them as separate parameters. For example:
def c_return_over_volume(m_close: pd.Series, m_volume: pd.Series) -> pd.Series:
"""
Returns the ratio of daily log returns to volume.
Rows with zero or missing volume will be NaN.
"""
# Compute the log return using a helper
log_ret = log_return(m_close)
# Avoid division by zero
return log_ret.where(m_volume != 0, None) / m_volume
Save your changes. Any function name not prefixed with c_ will be ignored.
Best Practices for Custom Functions#
Use only Pandas operations or existing helper functions for performance and consistency.
Handle missing data explicitly (e.g., avoid dividing by zero; propagate
NaNwhere appropriate).Document your function with a short docstring explaining inputs, outputs, and any edge-case behavior.
If you import new libraries (e.g.,
numpy), ensure they are already installed in your environment.
Add Your Custom Calculation to the Excel File#
After defining one or more functions in Config/custom_calculations.py, you must tell Data Curator to include them in the output.
Open Config/parameters_datacurator.xlsx#
Switch to the Columns/Calculations worksheet.
Under the Custom Calculations section, add each function name (including the
c_prefix) on its own row. For example, if your function is:def c_price_difference_10d(m_close: pd.Series) -> pd.Series: …
then enter:
c_price_difference_10d
Verify the Naming#
The Excel entry must exactly match the function name in
custom_calculations.py.Do not include parentheses or arguments—only the bare function name.
Save the Workbook#
Once you’ve added all desired custom-calculation names, save parameters_datacurator.xlsx.
If you are editing on macOS and don’t see hidden files (e.g., .env), press Command+Shift+Period in Finder dialogs to reveal them.
Run Data Curator with Custom Calculations#
With both Config/custom_calculations.py and Config/parameters_datacurator.xlsx updated, run:
python /path/to/data_curator_project
What happens under the hood:
Data Curator loads all raw data providers and writes default columns into memory.
It then imports
Config/custom_calculations.pyand looks for any functions whose names start withc_.For each such function, it calls the function with the specified input columns (as Pandas Series).
The returned Series is appended as a new column in the in-memory DataFrame.
Finally, Data Curator writes one output file per ticker under
Output/, with separate sheets (or sections) for: - Market data - Fundamental data - Dividends (if enabled) - Splits (if enabled) - Calculations (including your custom columns prefixedc_)
Troubleshooting & Tips#
No output for your custom column?
- Verify there are no syntax errors in custom_calculations.py.
- Ensure the function name appears under Custom Calculations in parameters_datacurator.xlsx.
- Check that the input column names you referenced (e.g., m_close, m_volume) match the raw-data columns exactly.
Getting many NaNs in your new column?
- By design, custom calculations propagate NaN for rows where inputs are missing or invalid.
- Review your logic to see if you need to “forward-fill” or otherwise handle gaps before applying the calculation.
Want to test a function interactively? 1. Open a Python REPL (or Jupyter Notebook) in the same virtual environment. 2. Run:
import pandas as pd # Load a small sample of raw data to a DataFrame df = pd.read_parquet("Output/AAPL_Market_and_Fundamental_Data.parquet", engine="pyarrow") from Config.custom_calculations import c_price_difference_10d # Apply it to the 'm_close' column sample = c_price_difference_10d(df["m_close"]) print(sample.head())
Reordering or renaming columns If you need to change the column order or rename your custom columns, do so in the Output Settings section of the Excel file before rerunning.
(Optional) Containerized Workflow#
If you prefer using containers (Podman/Docker) instead of installing locally, follow these steps once you’ve added your custom functions.
Pull and Run the Data Curator Image#
See the Zero-Coder Container Setup under “Pull the Data Curator Image” and “Run the Container for the First Time.” Ensure your host directory (containing Config/ and Output/) is mounted at /app inside the container.
Edit Custom Calculations Inside the Container#
In
Config/custom_calculations.py, create or modify your functions as described above.Update
Config/parameters_datacurator.xlsxto reference your newc_-functions.
Start the Container#
In Podman Desktop or via the CLI:
podman start data-curator
The container will read the updated configuration and write output (including your custom columns) into the host’s Output/ folder.
Next Steps#
Organize Multiple Custom Functions If you plan to maintain many custom calculations, group related helpers into separate Python modules under
Config/and import them fromcustom_calculations.py.Version Control Commit both
custom_calculations.pyandparameters_datacurator.xlsxinto your git repository to track changes to your custom logic.Automated Testing Write small unit tests for your custom functions (e.g., using
pytest) to ensure they behave as expected when inputs have gaps or extreme values.
See also#
Zero-Coder Workflow for end-user installation and usage.
Component Integrator Workflow for programmatic integration.
Developer/Tester Workflow for contributing code and running tests.