Zero-Coder Workflow#
In “zero-coder” mode, everything is configured via an Excel workbook (Config/parameters_datacurator.xlsx) and an optional .env file—no Python coding is required. Follow these steps to install, configure, and run Data Curator.
Direct Setup (No Containers)#
Create a Python 3.12 Environment#
Use Conda or venv to isolate Data Curator’s dependencies.
Conda example (Windows/macOS/Linux):
conda create --name datacurator_env python=3.12
conda activate datacurator_env
venv example:
python3.12 -m venv datacurator_env
source datacurator_env/bin/activate # macOS/Linux
datacurator_env\Scripts\activate.bat # Windows
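As an optional check that the activated interpreter is the one you just created, the following sketch inspects the running Python. It is illustrative only: the function names are hypothetical and not part of Data Curator, and the virtual-environment test relies on the standard behavior of venv (Conda environments are not detected by it; there, the CONDA_DEFAULT_ENV environment variable is the usual indicator).

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base installation.
    return sys.prefix != sys.base_prefix


def is_python_312(version_info=sys.version_info) -> bool:
    """Return True when the major/minor version is 3.12."""
    return tuple(version_info[:2]) == (3, 12)


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Python 3.12:", is_python_312())
    print("Inside a venv:", in_virtual_env())
```

Save it under any file name and run it with the environment active; no Data Curator code is involved.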
Install Data Curator via pip#
With the virtual environment active, run:
pip install --upgrade kaxanuk.data_curator
This installs Data Curator plus all dependencies specified in pyproject.toml (e.g., openpyxl, pandas, pyarrow, pandas_ta).
Initialize the Configuration#
Choose or create a project directory and move into it:
mkdir ~/data_curator_project
cd ~/data_curator_project
Run the initializer:
kaxanuk.data_curator init excel
After this command, your directory will contain:
- __main__.py
- Config/ (empty configuration folder)
- Output/ (empty output folder)
Configure Data Curator#
Edit the Excel workbook and, if needed, the .env file as follows.
Edit Config/parameters_datacurator.xlsx#
Open the Excel workbook and fill in these worksheets:
- Providers
  - market_data_provider: pick a market-data vendor (e.g., “Financial Modeling Prep”, “AlphaVantage”).
  - market_data_api_key: enter its API key here, or leave blank to use .env.
  - fundamental_data_provider: pick a fundamental-data vendor.
  - fundamental_data_api_key: enter its API key here, or leave blank to use .env.
- Date Range
  - start_date (YYYY-MM-DD): first date of data fetch.
  - end_date (YYYY-MM-DD): last date of data fetch.
  - period: frequency (e.g., 1d for daily, 1w for weekly, 1m for monthly).
- Instruments
  - List ticker symbols (one per row), e.g., AAPL, MSFT.
- Output Settings
  - output_format: choose between csv or parquet.
  - logger_level: log verbosity (e.g., INFO, DEBUG).
- Columns/Calculations
  - Tick raw data columns you want (e.g., open, close, volume).
  - Under Predefined Calculations, tick any built-in features (e.g., “Simple Moving Average 5d”, “Annualized Volatility 21d”).
  - Under Custom Calculations, list any function names defined in Config/custom_calculations.py (prefix with c_).
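Before typing dates into the Date Range worksheet, it can help to confirm that they follow the YYYY-MM-DD pattern and are in order. The sketch below is a standalone check using only the standard library; parse_date_range is a hypothetical helper, and how Data Curator itself validates these fields is not described here.

```python
from datetime import date, datetime


def parse_date_range(start_date: str, end_date: str) -> tuple[date, date]:
    """Parse two YYYY-MM-DD strings and check that start is not after end."""
    # strptime raises ValueError when a string does not match the pattern.
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    if start > end:
        raise ValueError(f"start_date {start} is after end_date {end}")
    return start, end


if __name__ == "__main__":
    start, end = parse_date_range("2020-01-01", "2020-12-31")
    print("Range covers", (end - start).days + 1, "calendar days")
```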
Edit Config/.env#
If you left API keys blank in Excel, create or open Config/.env and add:
KNDC_API_KEY_MARKET_DATA=<your_market_data_api_key>
KNDC_API_KEY_FUNDAMENTAL_DATA=<your_fundamental_data_api_key>
On macOS, press Command+Shift+Period in Finder dialogs to reveal hidden files when editing.
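If a run fails with an authentication error, one thing to rule out is a missing or empty key in Config/.env. The sketch below parses simple KEY=VALUE lines and reports which of the two variable names shown above have no value. It is a simplified reader written for illustration, not the parser Data Curator uses, and keys entered directly in the Excel workbook will not show up in this check.

```python
from pathlib import Path

# Variable names taken from the .env example in this section.
EXPECTED_KEYS = ("KNDC_API_KEY_MARKET_DATA", "KNDC_API_KEY_FUNDAMENTAL_DATA")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the expected keys that are absent or empty."""
    return [key for key in EXPECTED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("Config/.env")
    if env_path.exists():
        print("Missing keys:", missing_keys(parse_env_text(env_path.read_text())))
    else:
        print("Config/.env not found")
```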
Run Data Curator#
After saving parameters_datacurator.xlsx and (if needed) .env, run:
python /path/to/data_curator_project
Replace /path/to/data_curator_project with the directory containing __main__.py. Data Curator will:
Load parameters_datacurator.xlsx and read your settings.
Read any API keys from .env.
Fetch market, fundamental, and optional alternative data (e.g., transcripts, news, economic).
Apply predefined and custom calculations.
Write one file per ticker into Output/, named: <TICKER>_Market_and_Fundamental_Data.<csv|parquet>
Each output file contains separate sheets/tables for:
- Market data (date, open, high, low, close, volume, etc.)
- Fundamental data (income statement and balance-sheet fields)
- Dividends (if requested)
- Splits (if requested)
- Calculations (columns prefixed with c_)
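The run command passes a directory rather than a script; Python then executes the __main__.py it finds inside that directory. The sketch below demonstrates the same mechanism with the standard-library runpy module and a throwaway directory. The placeholder file content is an assumption for the demo and is not Data Curator's actual entry script.

```python
import runpy
import tempfile
from pathlib import Path


def run_directory(project_dir: str) -> dict:
    """Execute the __main__.py in project_dir, similar to `python project_dir`."""
    # run_path accepts a directory that contains a top-level __main__.py
    # and returns the globals of the executed module.
    return runpy.run_path(project_dir, run_name="__main__")


def demo() -> str:
    """Create a temporary project with a placeholder __main__.py and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "__main__.py").write_text("MESSAGE = 'ran __main__.py'\n")
        return run_directory(tmp)["MESSAGE"]


if __name__ == "__main__":
    print(demo())
```

For normal use the plain command line shown earlier is sufficient; this only illustrates why pointing Python at the project directory works.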
Container Setup (Recommended)#
Using a container runtime eliminates local dependency issues and guarantees a reproducible environment. This guide uses Podman Desktop (open-source and free).
Install Podman Desktop#
Download the installer for your OS: https://podman-desktop.io/downloads
Run the installer:
- On Windows, ensure “Install WSL if not present” is checked.
- On macOS/Linux, follow on-screen instructions.
Launch Podman Desktop; accept defaults if prompted to create a Podman Machine.
If Windows complains about WSL version < 1.2.5, open an elevated command prompt and run:
wsl --update
Then re-open Podman Desktop.
Pull the Data Curator Image#
In Podman Desktop’s left menu, click Images.
Click Pull (top-right).
Enter the image URI:
ghcr.io/kaxanuk-community/data-curator:dev
Click Pull Image.
Run the Container for the First Time#
In Images, locate data-curator:dev and click the triangular Run icon.
In the Basic tab:
- Container name: data-curator
- Volumes:
  - Host path: select or create the directory where you want Config/ and Output/ to reside (e.g., ~/data_curator_project).
  - Container path: /app
- Environment variables (only if you did not set API keys in Excel):
  - KNDC_API_KEY_MARKET_DATA=<your_market_data_api_key>
  - KNDC_API_KEY_FUNDAMENTAL_DATA=<your_fundamental_data_api_key>
Leave other fields at defaults, then click Start container. Podman will:
- Create a container named data-curator.
- Mount your chosen host directory to /app inside the container.
- Run __main__.py once, which initializes Data Curator in Excel mode (equivalent to kaxanuk.data_curator init excel).
Configure Inside the Container#
After the first run, your host directory contains:
- Config/: contains parameters_datacurator.xlsx, custom_calculations.py, and (if present) .env.
- Output/: initially empty; output will be written here on subsequent runs.
Edit these files exactly as in Configure Data Curator:
Config/parameters_datacurator.xlsx: set providers, API keys, date range, tickers, output format, and calculations.
Config/.env: add API keys if not set in Excel.
Run the Fully Configured Container#
In Podman Desktop, click Containers.
Locate data-curator and click Start.
The container reads the updated configuration and writes output to Output/.
Iterate#
Modify Config/parameters_datacurator.xlsx or Config/custom_calculations.py.
In Podman Desktop’s Containers view, click Stop (if running) and then Start again.
The container reruns Data Curator with the new settings, overwriting previous outputs.
Output Structure#
After running (Direct or Container), inspect Output/:
<TICKER>_Market_and_Fundamental_Data.<csv|parquet>
- Market data: one row per date with columns date, open, high, low, close, volume, adjusted_close, etc.
- Fundamental data: income statement and balance-sheet fields (e.g., total_revenue, net_income, total_assets).
- Dividends: date, dividend_amount (if enabled).
- Splits: date, split_ratio (if enabled).
- Calculations: columns prefixed with c_ (e.g., c_simple_moving_average_5d, c_log_returns_adjusted_close).
Earnings_Transcripts/ (optional): if earnings transcripts were enabled, JSON/text files appear here.
News/ (optional): if news data was enabled, files per ticker or an aggregate news feed appear here.
Economic_Data/ (optional): contains macroeconomic series (e.g., GDP, CPI) if enabled.
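To see at a glance which tickers produced output, the per-ticker file names can be matched against the naming pattern described above. The following sketch uses only the standard library; find_ticker_outputs is a hypothetical helper, and it assumes the ticker is everything before the _Market_and_Fundamental_Data suffix.

```python
import re
from pathlib import Path

# Pattern derived from the file naming described in this section.
OUTPUT_PATTERN = re.compile(
    r"^(?P<ticker>.+)_Market_and_Fundamental_Data\.(?P<ext>csv|parquet)$"
)


def find_ticker_outputs(output_dir: str) -> dict[str, Path]:
    """Map each ticker to its output file found directly in output_dir."""
    results: dict[str, Path] = {}
    for path in sorted(Path(output_dir).iterdir()):
        match = OUTPUT_PATTERN.match(path.name)
        # Skip subfolders (e.g., News/) and files that do not match the pattern.
        if path.is_file() and match:
            results[match.group("ticker")] = path
    return results


if __name__ == "__main__":
    if Path("Output").is_dir():
        for ticker, path in find_ticker_outputs("Output").items():
            print(f"{ticker}: {path.name}")
    else:
        print("Output/ not found; run this from the project directory")
```

Run it from the project directory that contains Output/; the optional subfolders are ignored.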
See also#
Custom Calculator Workflow for adding Python-based features.
Component Integrator Workflow for programmatic integration.
Developer/Tester Workflow for contributing code and running tests.