Zero-Coder Workflow

In “zero-coder” mode, you configure everything through an Excel workbook (Config/parameters_datacurator.xlsx) and an optional .env file; no Python coding is required. Follow these steps to install, configure, and run Data Curator.

Direct Setup (No Containers)

Create a Python 3.12 Environment

Use Conda or venv to isolate Data Curator’s dependencies.

Conda example (Windows/macOS/Linux):

conda create --name datacurator_env python=3.12
conda activate datacurator_env

venv example:

python3.12 -m venv datacurator_env
source datacurator_env/bin/activate    # macOS/Linux
datacurator_env\Scripts\activate.bat   # Windows
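
To confirm that the activated environment really runs the interpreter version this guide uses, a short check like the one below can help. It is a minimal sketch: the helper name meets_requirement is hypothetical, and the exact-match rule on 3.12 simply mirrors the version named above rather than any requirement stated by Data Curator itself.

```python
import sys

REQUIRED = (3, 12)  # major.minor version named in this guide (assumption: exact match wanted)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_requirement() else "unexpected version"
    print(f"Python {found}: {status}")
    # The path should point inside datacurator_env when the environment is active.
    print(f"Interpreter path: {sys.executable}")
```

Run it with the environment active; if the interpreter path points outside datacurator_env, the activation step did not take effect in the current shell.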

Install Data Curator via pip

With the virtual environment active, run:

pip install --upgrade kaxanuk.data_curator

This installs Data Curator along with the dependencies declared in its pyproject.toml (e.g., openpyxl, pandas, pyarrow, and pandas_ta).
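
One way to check that the install succeeded is to ask the standard library which versions pip recorded. The sketch below assumes the distribution names match the pip command and the dependency examples above; installed_version is a hypothetical helper, not part of Data Curator.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Names taken from this guide; adjust them if your install differs.
    for name in ("kaxanuk.data_curator", "openpyxl", "pandas", "pyarrow"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```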

Initialize the Configuration

Choose or create a project directory and move into it:

mkdir ~/data_curator_project
cd ~/data_curator_project

Run the initializer:

kaxanuk.data_curator init excel

After this command, your directory will contain:

  • __main__.py

  • Config/ (empty configuration folder)

  • Output/ (empty output folder)
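
If you want to confirm the initializer produced the entries listed above, a small check along these lines may be useful. It is only a sketch: missing_entries is a hypothetical helper, and the expected names are copied from the list in this guide.

```python
from pathlib import Path

# Entries this guide says the initializer creates.
EXPECTED = ("__main__.py", "Config", "Output")


def missing_entries(project_dir, expected=EXPECTED):
    """Return the expected names that do not exist under `project_dir`."""
    root = Path(project_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(Path.cwd())
    print("Missing:", ", ".join(missing) if missing else "nothing, layout looks complete")
```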

Configure Data Curator

Edit the Excel workbook and, if needed, the .env file as follows.

Edit Config/parameters_datacurator.xlsx

Open the Excel workbook and fill in these worksheets:

  • Providers
      - market_data_provider: pick a market-data vendor (e.g., “Financial Modeling Prep”, “AlphaVantage”).
      - market_data_api_key: enter its API key here, or leave blank to use .env.
      - fundamental_data_provider: pick a fundamental-data vendor.
      - fundamental_data_api_key: enter its API key here, or leave blank to use .env.

  • Date Range
      - start_date (YYYY-MM-DD): first date of data fetch.
      - end_date (YYYY-MM-DD): last date of data fetch.
      - period: frequency (e.g., 1d for daily, 1w for weekly, 1m for monthly).

  • Instruments
      - List ticker symbols (one per row), e.g., AAPL, MSFT.

  • Output Settings
      - output_format: choose between csv or parquet.
      - logger_level: log verbosity (e.g., INFO, DEBUG).

  • Columns/Calculations
      - Tick the raw data columns you want (e.g., open, close, volume).
      - Under Predefined Calculations, tick any built-in features (e.g., “Simple Moving Average 5d”, “Annualized Volatility 21d”).
      - Under Custom Calculations, list any function names defined in Config/custom_calculations.py (prefix with c_).
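
Because the Date Range worksheet expects YYYY-MM-DD values, it can be worth sanity-checking a pair of dates before typing them in. The sketch below is not Data Curator's own validation; validate_date_range is a hypothetical helper that relies only on the standard library.

```python
from datetime import date


def validate_date_range(start_date, end_date):
    """Parse two ISO date strings and check that start is not after end."""
    start = date.fromisoformat(start_date)  # raises ValueError on malformed input
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError(f"start_date {start} is after end_date {end}")
    return start, end


if __name__ == "__main__":
    start, end = validate_date_range("2020-01-01", "2020-12-31")
    print(f"Range spans {(end - start).days + 1} calendar days")
```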

Edit Config/.env

If you left API keys blank in Excel, create or open Config/.env and add:

KNDC_API_KEY_MARKET_DATA=<your_market_data_api_key>
KNDC_API_KEY_FUNDAMENTAL_DATA=<your_fundamental_data_api_key>

  • On macOS, files whose names start with a dot (such as .env) are hidden by default. Press Command+Shift+Period in Finder or in a file dialog to reveal them.
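
To check that both keys are actually set before a run, a small parser such as the following can read the file. This is a sketch under assumptions: it handles only plain KEY=VALUE lines, the helper names parse_env and unset_keys are hypothetical, and it is not how Data Curator itself loads the file.

```python
from pathlib import Path

# Variable names shown in this guide.
REQUIRED_KEYS = ("KNDC_API_KEY_MARKET_DATA", "KNDC_API_KEY_FUNDAMENTAL_DATA")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def unset_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("Config/.env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    print("Missing keys:", unset_keys(parse_env(text)) or "none")
```

A key reported as missing here is not necessarily a problem: it may have been entered in the Providers worksheet instead.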

Run Data Curator

After saving parameters_datacurator.xlsx and (if needed) .env, run:

python /path/to/data_curator_project

Replace /path/to/data_curator_project with the directory containing __main__.py. Data Curator will:

  • Load parameters_datacurator.xlsx and read your settings.

  • Read any API keys from .env.

  • Fetch market, fundamental, and optional alternative data (e.g., transcripts, news, and economic data).

  • Apply predefined and custom calculations.

  • Write one file per ticker into Output/, named:

    <TICKER>_Market_and_Fundamental_Data.<csv|parquet>
    

    Each output file contains separate sheets/tables for:

      - Market data (date, open, high, low, close, volume, etc.)
      - Fundamental data (income statement and balance-sheet fields)
      - Dividends (if requested)
      - Splits (if requested)
      - Calculations (columns prefixed with c_)

Output Structure

After running (Direct or Container), inspect Output/:

  • <TICKER>_Market_and_Fundamental_Data.<csv|parquet>
      - Market data: one row per date with columns date, open, high, low, close, volume, adjusted_close, etc.
      - Fundamental data: income statement and balance-sheet fields (e.g., total_revenue, net_income, total_assets).
      - Dividends: date, dividend_amount (if enabled).
      - Splits: date, split_ratio (if enabled).
      - Calculations: columns prefixed with c_ (e.g., c_simple_moving_average_5d, c_log_returns_adjusted_close).

  • Earnings_Transcripts/ (optional): if earnings transcripts were enabled, JSON/text files appear here.

  • News/ (optional): if news data was enabled, per-ticker files or an aggregate news feed appear here.

  • Economic_Data/ (optional): contains macroeconomic series (e.g., GDP, CPI) if enabled.
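
To see at a glance which tickers produced a file, the naming pattern above can be matched with the standard library. The sketch below assumes files follow the <TICKER>_Market_and_Fundamental_Data.<csv|parquet> pattern exactly; output_path and tickers_with_output are hypothetical helpers, not Data Curator functions.

```python
from pathlib import Path

SUFFIX = "_Market_and_Fundamental_Data"  # fixed part of the file name in this guide
FORMATS = ("csv", "parquet")


def output_path(output_dir, ticker, output_format):
    """Build the expected output path for one ticker."""
    if output_format not in FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    return Path(output_dir) / f"{ticker}{SUFFIX}.{output_format}"


def tickers_with_output(output_dir):
    """Return the sorted tickers that have an output file in `output_dir`."""
    found = set()
    for path in Path(output_dir).glob(f"*{SUFFIX}.*"):
        # Keep only the two documented formats and names ending in the suffix.
        if path.suffix.lstrip(".") in FORMATS and path.stem.endswith(SUFFIX):
            found.add(path.stem[: -len(SUFFIX)])
    return sorted(found)


if __name__ == "__main__":
    print("Tickers with output:", tickers_with_output("Output") or "none found")
```

Comparing this list against the Instruments worksheet is a quick way to spot tickers that failed to download.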

See also