Pyteryx/README.md

7.3 KiB

Pyteryx — Alteryx Runner

A Python-native runner for Alteryx .yxmd workflow files — no Alteryx installation required. Run workflows directly or convert them to standalone Python scripts.

Prerequisites

  • Python 3.11+
  • uv — fast Python package manager

Setup

# Install all dependencies
uv sync

Running a Workflow

uv run python -m alteryx_runner run <path/to/workflow.yxmd> [options]

Example

uv run python -m alteryx_runner run ./Alteryx_TestWorkflows/Unique\&Sample/Unique\&Sample.yxmd --verbose

Options

Flag Description
--verbose Print Browse results and detailed execution log
--dry-run Parse and validate only — do not execute
--output-dir PATH Write output files to a specific directory
--param KEY=VALUE Set a workflow constant (repeatable)
--format [csv|json|parquet] Default output format for Browse nodes (default: csv)

Dry-Run (Validate Only)

Check that a workflow parses correctly without executing it:

uv run python -m alteryx_runner run ./Alteryx_TestWorkflows/JoinTesting/JoinTesting.yxmd --dry-run

Custom Output Directory & Format

uv run python -m alteryx_runner run ./workflow.yxmd --output-dir ./results --format parquet

Workflow Parameters

Pass runtime constants with --param:

uv run python -m alteryx_runner run ./workflow.yxmd --param "StartDate=2024-01-01" --param "Region=West"

Listing Supported Tools

See all registered Alteryx tool plugins:

uv run python -m alteryx_runner list-tools

Converting an Alteryx Workflow to a Python Script

Pyteryx can convert an .yxmd workflow into a standalone Python script that uses Polars and DuckDB — the same libraries the runner uses internally. The generated script reproduces every step of your workflow as readable, sequential Python code that you can run, edit, and commit without any Alteryx dependency.

How It Works

  1. Parse — The .yxmd XML file is parsed into a directed acyclic graph (DAG) of tool nodes and connections (alteryx_runner/engine/parser.py).
  2. Topological Sort — Nodes are ordered so each tool runs only after its upstream inputs are ready (alteryx_runner/engine/executor.py).
  3. Transpile Expressions — Alteryx formula expressions (e.g. IF [Amount] > 100 THEN "High" ELSE "Low" ENDIF) are transpiled to DuckDB SQL fragments (alteryx_runner/expression/transpiler.py).
  4. Emit Python — Each tool node is converted to its Polars/DuckDB equivalent. The final output is a self-contained .py file.

Quick Start

# Convert an Alteryx workflow to a Python script
uv run python -m alteryx_runner convert <path/to/workflow.yxmd> -o <output_script.py>

Examples

# Convert the Join testing workflow
uv run python -m alteryx_runner convert ./Alteryx_TestWorkflows/JoinTesting/JoinTesting.yxmd -o join_pipeline.py

# Convert and immediately run the generated script
uv run python -m alteryx_runner convert ./workflow.yxmd -o pipeline.py
uv run python pipeline.py

What Gets Generated

The output script follows a predictable structure:

#!/usr/bin/env python3
"""Auto-generated by Pyteryx from MyWorkflow.yxmd"""
import polars as pl
import duckdb

# ── Tool 1: Input Data ────────────────────────────
df_1 = pl.read_csv("data/sales.csv")

# ── Tool 3: Filter ────────────────────────────────
df_3 = df_1.filter(pl.sql_expr("\"Region\" = 'West'"))

# ── Tool 5: Formula ───────────────────────────────
con = duckdb.connect(":memory:")
con.register("t", df_3.to_arrow())
df_5 = con.execute("SELECT *, (\"Price\" * \"Qty\") AS \"Total\" FROM t").pl()

# ── Tool 7: Output Data ──────────────────────────
df_5.write_csv("output/results.csv")

Conversion Reference

The table below shows how each supported Alteryx tool maps to its Python equivalent:

Alteryx Tool Python / Polars Equivalent
Input Data pl.read_csv() / pl.read_excel() / pl.read_parquet()
Output Data df.write_csv() / df.write_parquet()
Text Input pl.DataFrame({...}) (inline literal data)
Browse print(df) or df.write_csv()
Filter df.filter(pl.sql_expr(...))
Formula DuckDB SELECT *, <expr> AS <field>
Multi-Row Formula DuckDB window functions (LAG / LEAD)
Multi-Field Formula Loop over matching columns with DuckDB
Select df.select() / df.rename() / df.cast()
Sort df.sort(...)
Sample df.head() / df.sample()
Unique df.unique()
Record ID df.with_row_index(...)
Auto Field Automatic dtype optimization
Generate Rows Loop / pl.DataFrame construction
Join df.join(other, ...)
Join Multiple Chained df.join(...) calls
Union pl.concat([...])
Append Fields df.join(other, how="cross")
Find Replace DuckDB REPLACE() / Polars .str.replace()
DateTime DuckDB STRFTIME() / STRPTIME()
RegEx df.with_columns(pl.col(...).str.extract(...))
Text To Columns df.with_columns(pl.col(...).str.split(...))
Summarize df.group_by(...).agg(...)
Cross Tab df.pivot(...)
Transpose df.transpose()

Expression Transpilation

Alteryx formula expressions are automatically transpiled to DuckDB SQL. Common patterns:

Alteryx Expression Generated SQL
[ColumnName] "ColumnName"
IF [X] > 0 THEN "Y" ENDIF CASE WHEN "X" > 0 THEN 'Y' END
IIF([A]=1, "yes", "no") CASE WHEN "A"=1 THEN 'yes' ELSE 'no' END
IsNull([F]) "F" IS NULL
Left([Name], 3) LEFT("Name", 3)
DateTimeAdd([Date], 7, "days") "Date" + INTERVAL (7) DAY
[Row-1:Total] LAG("Total", 1) OVER ()

Supported Tool Categories

Category Tools
In/Out Input Data, Output Data, Browse, Text Input
Preparation Filter, Formula, Multi-Field Formula, Multi-Row Formula, Select, Sort, Sample, Unique, Record ID, Auto Field, Generate Rows
Join Join, Join Multiple, Union, Append Fields, Find Replace
Parse DateTime, RegEx, Text To Columns
Transform Summarize, Cross Tab, Transpose

Project Structure

alteryx_runner/
├── cli.py              # CLI entry point
├── engine/
│   ├── parser.py       # .yxmd XML → workflow graph
│   ├── executor.py     # Topological execution engine
│   ├── graph.py        # DAG data structures
│   └── context.py      # Runtime context & config
├── expression/         # Alteryx expression parser
└── tools/              # Tool implementations
    ├── inout/          # Input Data, Output Data, Browse, Text Input
    ├── join/           # Join, Union, Append Fields, Find Replace
    ├── parse/          # DateTime, RegEx, Text To Columns
    ├── preparation/    # Filter, Formula, Sort, Sample, Unique, …
    └── transform/      # Summarize, Cross Tab, Transpose