Custom Calculations

You can extend Data Curator by defining your own custom feature functions. They are written in pure Python and live in the file:

Config/custom_calculations.py

This file must exist under your project’s root `Config/` directory, not in the templates.

How It Works

Custom calculations allow you to define reusable features or metrics that are computed using one or more input columns.

Each custom function:

  • Must start with the prefix c_.

  • Can have any valid Python name after the prefix.

  • Receives `DataColumn` objects as arguments.

  • Must return an iterable of the same length as its inputs, preferably a DataColumn or pyarrow.Array.

  • Can be composed: it may be used directly or as an intermediate step within other calculations.

Example

Here’s a simple example of a custom function that calculates net margin:

def c_net_margin(net_income, revenue):
    return net_income / revenue

When this function is called, net_income and revenue are passed as DataColumn objects, so the function automatically supports arithmetic such as +, -, and /, as well as comparisons and null-safe logic.

Why DataColumn?

All arguments to your functions are instances of DataColumn. This provides:

  • Arithmetic like col1 + col2

  • Logical operators like col1 > 5

  • Null-safe operations (any operation involving a null yields null)

  • Seamless conversion to pandas, pyarrow, or numpy if needed
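To make the null-safe behavior concrete, here is a minimal sketch of the semantics described in the list above. SketchColumn is a hypothetical stand-in written only for this illustration. It is not the real DataColumn class and does not reflect how the library implements it. It only mimics element-wise arithmetic, comparisons, and the rule that any operation involving a null yields null.

```python
import operator


class SketchColumn:
    """Hypothetical stand-in for DataColumn, for illustration only."""

    def __init__(self, values):
        self.values = list(values)

    def _combine(self, other, op):
        # Repeat a scalar operand so it lines up with every row.
        if isinstance(other, SketchColumn):
            others = other.values
        else:
            others = [other] * len(self.values)
        # A null (None) on either side yields a null in the result.
        return SketchColumn(
            None if a is None or b is None else op(a, b)
            for a, b in zip(self.values, others)
        )

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __truediv__(self, other):
        return self._combine(other, operator.truediv)

    def __gt__(self, other):
        return self._combine(other, operator.gt)


def c_net_margin(net_income, revenue):
    return net_income / revenue


net_income = SketchColumn([10.0, None, 30.0])
revenue = SketchColumn([100.0, 200.0, None])

margin = c_net_margin(net_income, revenue)
print(margin.values)           # [0.1, None, None]
print((margin > 0.05).values)  # [True, None, None]
```

The result has the same length as the inputs, and rows where either input is null stay null instead of raising an error. The sketch does not cover division by zero or type conversion.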

Usage in Excel

Once you’ve written your function in custom_calculations.py, reference it by name directly in the columns list of your Excel configuration file.

For example, to use a function called c_net_margin, simply add it like this:

c_net_margin

Each entry must exactly match the name of the function defined in your code, including the c_ prefix.
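Because the names must match exactly, it can help to print them from Python and copy them into the Excel file. The sketch below is one way to do that. list_calculation_names is a hypothetical helper written for this example and is not part of Data Curator. The functions defined here stand in for the contents of custom_calculations.py.

```python
import inspect


def list_calculation_names(namespace):
    """Return the sorted names of functions in `namespace` that start with 'c_'."""
    return sorted(
        name
        for name, obj in namespace.items()
        if name.startswith("c_") and inspect.isfunction(obj)
    )


def c_net_margin(net_income, revenue):
    return net_income / revenue


def helper(value):
    # No c_ prefix, so it is not listed.
    return value


c_threshold = 0.1  # Has the prefix but is not a function, so it is not listed.

names = list_calculation_names(
    {"c_net_margin": c_net_margin, "helper": helper, "c_threshold": c_threshold}
)
print(names)  # ['c_net_margin']
```

With a real project you could pass vars(module) for the imported custom_calculations module instead of a hand-built dictionary. How that module is imported depends on your project layout.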

Best Practices

  • Chain your functions to make them modular and testable.

  • Prefer returning a DataColumn directly for consistency.
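As a sketch of the chaining advice, the example below builds one calculation on top of another using ordinary Python function calls. The column names revenue and cost_of_revenue and both function names are assumptions chosen for illustration. Data Curator may offer other ways to wire calculations together that are not shown here.

```python
def c_gross_profit(revenue, cost_of_revenue):
    return revenue - cost_of_revenue


def c_gross_margin(revenue, cost_of_revenue):
    # Reuse the smaller calculation instead of repeating its formula.
    return c_gross_profit(revenue, cost_of_revenue) / revenue


# Plain floats support the same operators, so they can check the arithmetic.
assert c_gross_profit(200.0, 150.0) == 50.0
assert c_gross_margin(200.0, 150.0) == 0.25
```

Checking with plain numbers only verifies the formula. It does not exercise null handling or column lengths, which still need a test against real DataColumn inputs.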

More Examples

To explore how advanced functions are built using DataColumn, see:

src/kaxanuk/data_curator/features/calculations.py