Custom Calculator Workflow#

In “custom calculator” mode, you start from a Zero-Coder installation of Data Curator: you have already installed Data Curator, run kaxanuk.data_curator init excel, and populated Config/parameters_datacurator.xlsx as described in the Zero-Coder guide. In addition to configuring providers, dates, tickers, and default output columns in Excel, you add one or more Python functions that each generate an extra output column, with one value per row. Follow these steps to install (if you haven’t already), configure, and run Data Curator with your own calculations.

Prerequisites (Zero-Coder Setup)#

Before adding custom calculations, ensure you have completed the Zero-Coder steps.

Create a Python 3.12 Environment#

Use Conda or venv to isolate Data Curator’s dependencies.

Conda example (Windows/macOS/Linux):

conda create --name datacurator_env python=3.12
conda activate datacurator_env

venv example:

python3.12 -m venv datacurator_env
source datacurator_env/bin/activate    # macOS/Linux
datacurator_env\Scripts\activate.bat   # Windows

Install Data Curator via pip#

With the virtual environment active, run:

pip install --upgrade kaxanuk.data_curator

This installs Data Curator along with its dependencies (e.g., openpyxl, pandas, pyarrow, and pandas_ta).

Initialize the Configuration#

Choose or create a project directory and move into it:

mkdir ~/data_curator_project
cd ~/data_curator_project

Run the initializer:

kaxanuk.data_curator init excel

After this command, your directory will contain:

__main__.py
Config/ (configuration folder, including parameters_datacurator.xlsx and custom_calculations.py)
Output/ (empty output folder)

Configure Data Curator (Zero-Coder Settings)#

Open Config/parameters_datacurator.xlsx and fill in the worksheets as follows:

Providers
- market_data_provider: select a market-data vendor.
- market_data_api_key: enter its API key here (or leave blank to use .env).
- fundamental_data_provider: select a fundamental-data vendor.
- fundamental_data_api_key: enter its API key here (or leave blank to use .env).
Date Range
- start_date (YYYY-MM-DD): first date of data fetch.
- end_date (YYYY-MM-DD): last date of data fetch.
- period: frequency (e.g., 1d, 1w, 1m).
Instruments
- List ticker symbols (one per row), e.g., AAPL, MSFT.
Output Settings
- output_format: choose between csv or parquet.
- logger_level: e.g., INFO, DEBUG.
Columns/Calculations
- Tick the raw data columns you want (e.g., open, close, volume).
- Under Predefined Calculations, tick any built-in features (e.g., “Simple Moving Average 5d”).
- Under Custom Calculations, list any function names defined in Config/custom_calculations.py (each prefixed with c_).

If you left any API keys blank in Excel, create or edit Config/.env:

KNDC_API_KEY_MARKET_DATA=<your_market_data_api_key>
KNDC_API_KEY_FUNDAMENTAL_DATA=<your_fundamental_data_api_key>
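
If a run fails with an authentication error, it can help to confirm that the .env file really defines both keys. The following is a minimal sketch of such a check; the function name missing_env_keys is hypothetical, the parsing is deliberately simplistic, and it is not how Data Curator itself reads the file.

```python
REQUIRED_KEYS = ("KNDC_API_KEY_MARKET_DATA", "KNDC_API_KEY_FUNDAMENTAL_DATA")


def missing_env_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [key for key in required if not values.get(key)]


# Made-up sample text; in practice you would read Config/.env instead
sample = "KNDC_API_KEY_MARKET_DATA=abc123\nKNDC_API_KEY_FUNDAMENTAL_DATA=\n"
print(missing_env_keys(sample))
```

A key reported here is only a problem if you also left the matching cell blank in the Providers worksheet.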

After saving parameters_datacurator.xlsx and (if needed) .env, you can run:

python /path/to/data_curator_project

to verify that Data Curator fetches default data and writes output into Output/.

Create Your Custom Calculation Function#

Data Curator looks for any Python function in Config/custom_calculations.py whose name begins with c_. Each such function is called once the raw market/fundamental data has been collected; it receives whole columns and returns one value per row. A custom function should:

Be defined in Config/custom_calculations.py.
Take each column it needs as a parameter named after that column (e.g., m_close); each argument is passed as a Pandas Series.
Return a Pandas Series of the same length, with None or NaN in rows where inputs are missing or the operation is undefined.
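
As an illustration of these three requirements, here is a minimal sketch of a function that satisfies them, exercised on made-up data. The column names m_high and m_low are assumptions chosen for the example; check the Columns/Calculations worksheet for the names available in your installation.

```python
import pandas as pd


def c_high_low_range(m_high: pd.Series, m_low: pd.Series) -> pd.Series:
    """Daily high minus daily low; NaN wherever either input is missing."""
    # Element-wise subtraction keeps the input length and propagates NaN
    return m_high - m_low


# Toy data standing in for real columns (values are made up)
high = pd.Series([10.0, 11.0, None, 12.5])
low = pd.Series([9.0, 10.5, 10.0, None])
result = c_high_low_range(high, low)
print(result)
```

The result has the same four rows as the inputs, and the last two are NaN because one of the inputs is missing in each.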

Locate the Custom Calculations File#

In your project directory, open:

Config/custom_calculations.py

This file already contains template functions and import statements. At the top you’ll see helper imports such as:

import pandas as pd
from datetime import datetime
from kaxanuk.data_curator.features.helpers import (
    cumulative_return,
    log_return,
    ...
)

Define a New Function#

Choose a clear, snake_case name prefixed with c_. For example, to compute a 10-day price difference, you might write:

def c_price_difference_10d(m_close: pd.Series) -> pd.Series:
    """
    Returns the difference between the close price and its value 10 trading days ago.
    Leaves first 10 rows as NaN.
    """
    # Use Pandas to shift by 10 rows
    return m_close - m_close.shift(10)
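
To see what the shift does, the same function can be run on a short made-up series. This is only a sketch for checking the behavior outside Data Curator; the 12 sample prices are invented.

```python
import pandas as pd


def c_price_difference_10d(m_close: pd.Series) -> pd.Series:
    """Close price minus the close price 10 rows earlier."""
    return m_close - m_close.shift(10)


# 12 consecutive made-up closing prices: 100.0, 101.0, ..., 111.0
close = pd.Series(range(100, 112), dtype="float64")
diff = c_price_difference_10d(close)
print(diff)
```

Only the last two rows have a value, because the first ten rows have no observation ten rows back.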

If you need multiple input columns, add them as separate parameters. For example:

def c_return_over_volume(m_close: pd.Series, m_volume: pd.Series) -> pd.Series:
    """
    Returns the ratio of daily log returns to volume.
    Rows with zero or missing volume will be NaN.
    """
    # Compute the log return using the imported helper
    log_ret = log_return(m_close)
    # Mask zero volume as NaN so those rows yield NaN instead of dividing by zero
    return log_ret / m_volume.where(m_volume != 0)
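
The zero-volume handling can be checked in isolation. The sketch below uses a variant that masks the denominator, and replaces the library helper with a local stand-in, log_return_stand_in, which is hypothetical and assumes the helper computes ln(p_t / p_{t-1}); the real helper may differ.

```python
import numpy as np
import pandas as pd


def log_return_stand_in(prices: pd.Series) -> pd.Series:
    """Hypothetical stand-in for the library helper: ln(p_t / p_{t-1})."""
    return np.log(prices / prices.shift(1))


def c_return_over_volume(m_close: pd.Series, m_volume: pd.Series) -> pd.Series:
    """Log return divided by volume; NaN where volume is zero or missing."""
    log_ret = log_return_stand_in(m_close)
    # Replace zero volume with NaN so the division yields NaN, not +/-inf
    safe_volume = m_volume.where(m_volume != 0)
    return log_ret / safe_volume


# Made-up prices and volumes, including a zero and a missing volume
close = pd.Series([100.0, 110.0, 121.0, 121.0])
volume = pd.Series([1000.0, 0.0, 500.0, None])
ratio = c_return_over_volume(close, volume)
print(ratio)
```

Only the third row is defined here: the first row has no previous price, the second has zero volume, and the fourth has missing volume.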

Save your changes. Any function name not prefixed with c_ will be ignored.

Best Practices for Custom Functions#

Use only Pandas operations or existing helper functions for performance and consistency.
Handle missing data explicitly (e.g., avoid dividing by zero; propagate NaN where appropriate).
Document your function with a short docstring explaining inputs, outputs, and any edge-case behavior.
If you import new libraries (e.g., numpy), ensure they are already installed in your environment.

Add Your Custom Calculation to the Excel File#

After defining one or more functions in Config/custom_calculations.py, you must tell Data Curator to include them in the output.

Open `Config/parameters_datacurator.xlsx`#

Switch to the Columns/Calculations worksheet.
Under the Custom Calculations section, add each function name (including the c_ prefix) on its own row. For example, if your function is:
```
def c_price_difference_10d(m_close: pd.Series) -> pd.Series: …
```
then enter:
```
c_price_difference_10d
```

Verify the Naming#

The Excel entry must exactly match the function name in custom_calculations.py.
Do not include parentheses or arguments—only the bare function name.
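
To avoid typos, you can print the candidate names and copy them into the worksheet. The following is a sketch, not part of Data Curator: custom_function_names is a hypothetical helper, and the throwaway module stands in for Config/custom_calculations.py.

```python
import inspect
import types


def custom_function_names(module: types.ModuleType) -> list[str]:
    """Return the sorted names of functions in `module` that start with 'c_'."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if name.startswith("c_")
    )


# Throwaway module standing in for Config/custom_calculations.py
demo = types.ModuleType("demo_calculations")
exec(
    "def c_price_difference_10d(m_close): return m_close - m_close.shift(10)\n"
    "def helper(x): return x\n",
    demo.__dict__,
)
names = custom_function_names(demo)
print(names)
```

In a real project you would pass the imported custom calculations module instead of the demo module, running the script from the project directory.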

Save the Workbook#

Once you’ve added all desired custom-calculation names, save parameters_datacurator.xlsx. If you are editing on macOS and don’t see hidden files (e.g., .env), press Command+Shift+Period in Finder dialogs to reveal them.

Run Data Curator with Custom Calculations#

With both Config/custom_calculations.py and Config/parameters_datacurator.xlsx updated, run:

python /path/to/data_curator_project

What happens under the hood:

Data Curator fetches raw data from the configured providers and loads the default columns into memory.
It then imports Config/custom_calculations.py and looks for any functions whose names start with c_.
For each such function, it calls the function with the input columns named by its parameters (as Pandas Series).
The returned Series is appended as a new column in the in-memory DataFrame.
Finally, Data Curator writes one output file per ticker under Output/, with separate sheets (or sections) for:
- Market data
- Fundamental data
- Dividends (if enabled)
- Splits (if enabled)
- Calculations (including your custom columns prefixed c_)
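
The call-and-append steps above can be pictured with a small sketch. This is an illustration of the idea only, not Data Curator's actual implementation: apply_custom_functions is a hypothetical name, and it assumes that each parameter name matches a column in the DataFrame.

```python
import inspect

import pandas as pd


def apply_custom_functions(df: pd.DataFrame, functions: list) -> pd.DataFrame:
    """Append one column per function, feeding each the columns its parameters name."""
    out = df.copy()
    for func in functions:
        # Parameter names double as input column names in this sketch
        params = list(inspect.signature(func).parameters)
        inputs = [out[name] for name in params]
        out[func.__name__] = func(*inputs)
    return out


def c_price_difference_1d(m_close: pd.Series) -> pd.Series:
    """One-row price difference, used here only as a small example."""
    return m_close - m_close.shift(1)


# Made-up closing prices
frame = pd.DataFrame({"m_close": [10.0, 12.0, 11.0]})
enriched = apply_custom_functions(frame, [c_price_difference_1d])
print(enriched)
```

In this sketch a parameter name that matches no column raises a KeyError, which is one reason to check the spelling of names such as m_close.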

Troubleshooting & Tips#

No output for your custom column?
- Verify there are no syntax errors in custom_calculations.py.
- Ensure the function name appears under Custom Calculations in parameters_datacurator.xlsx.
- Check that the input column names you referenced (e.g., m_close, m_volume) match the raw-data columns exactly.

Getting many NaNs in your new column?
- By design, custom calculations propagate NaN for rows where inputs are missing or invalid.
- Review your logic to see if you need to “forward-fill” or otherwise handle gaps before applying the calculation.
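
As a sketch of the forward-fill option, the example below compares a plain two-row difference with one computed on a forward-filled series. The function name c_price_difference_2d_ffill and the sample prices are made up. Whether carrying the last known price across a gap is appropriate depends on your use case.

```python
import pandas as pd


def c_price_difference_2d_ffill(m_close: pd.Series) -> pd.Series:
    """Two-row price difference computed on a forward-filled close series."""
    filled_close = m_close.ffill()  # carry the last known close across gaps
    return filled_close - filled_close.shift(2)


# Made-up closes with one missing value in the second row
close = pd.Series([100.0, None, 104.0, 105.0])
plain = close - close.shift(2)
filled = c_price_difference_2d_ffill(close)
print(plain.isna().sum(), filled.isna().sum())
```

The plain version loses the last row because it depends on the missing value, while the forward-filled version only has the two leading NaNs caused by the shift itself.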

Want to test a function interactively?
1. Open a Python REPL (or Jupyter Notebook) in the same virtual environment.
2. Run:

import pandas as pd

# Run this from the project directory so that the Config folder is importable
from Config.custom_calculations import c_price_difference_10d

# Load previously written output into a DataFrame (requires parquet output)
df = pd.read_parquet("Output/AAPL_Market_and_Fundamental_Data.parquet", engine="pyarrow")

# Apply the function to the 'm_close' column
sample = c_price_difference_10d(df["m_close"])
print(sample.head())

Reordering or renaming columns? If you need to change the column order or rename your custom columns, do so in the Output Settings section of the Excel file before rerunning.

(Optional) Containerized Workflow#

If you prefer using containers (Podman/Docker) instead of installing locally, follow these steps once you’ve added your custom functions.

Pull and Run the Data Curator Image#

See the Zero-Coder Container Setup under “Pull the Data Curator Image” and “Run the Container for the First Time.” Ensure your host directory (containing Config/ and Output/) is mounted at /app inside the container.

Edit Custom Calculations Inside the Container#

In Config/custom_calculations.py, create or modify your functions as described above.
Update Config/parameters_datacurator.xlsx to reference your new c_-functions.

Start the Container#

In Podman Desktop or via the CLI:

podman start data-curator

The container will read the updated configuration and write output (including your custom columns) into the host’s Output/ folder.

Next Steps#

Organize Multiple Custom Functions: If you plan to maintain many custom calculations, group related helpers into separate Python modules under Config/ and import them from custom_calculations.py.
Version Control: Commit both custom_calculations.py and parameters_datacurator.xlsx to your Git repository to track changes to your custom logic.
Automated Testing: Write small unit tests for your custom functions (e.g., using pytest) to ensure they behave as expected when inputs have gaps or extreme values.
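
A minimal sketch of such tests is shown below, written as plain functions that pytest would collect. The function under test is copied into the file only to keep the example self-contained; a real test module would import it from custom_calculations.py. The test names and sample data are made up.

```python
import pandas as pd


def c_price_difference_10d(m_close: pd.Series) -> pd.Series:
    # Copied here to keep the sketch self-contained; a real test would import it
    return m_close - m_close.shift(10)


def test_first_ten_rows_are_nan():
    close = pd.Series(range(1, 16), dtype="float64")  # 15 made-up prices
    result = c_price_difference_10d(close)
    assert result.iloc[:10].isna().all()
    assert (result.iloc[10:] == 10.0).all()


def test_gap_in_input_propagates_nan():
    close = pd.Series(range(1, 16), dtype="float64")
    close.iloc[2] = float("nan")  # simulate a missing price
    result = c_price_difference_10d(close)
    # Row 12 depends on row 2, so it must be NaN as well
    assert pd.isna(result.iloc[12])
    assert len(result) == len(close)
```

Saved as a file whose name starts with test_, these functions are picked up when you run pytest in that directory.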