|
|
||
|---|---|---|
| Alteryx_TestWorkflows | ||
| alteryx_runner | ||
| .gitignore | ||
| AGENTS.md | ||
| README.md | ||
| alteryx_runner_spec.md | ||
| pyproject.toml | ||
README.md
Pyteryx — Alteryx Runner
A Python-native runner for Alteryx .yxmd workflow files — no Alteryx installation required. Run workflows directly or convert them to standalone Python scripts.
Prerequisites
- Python 3.11+
- uv — fast Python package manager
Setup
# Install all dependencies
uv sync
Running a Workflow
uv run python -m alteryx_runner run <path/to/workflow.yxmd> [options]
Example
uv run python -m alteryx_runner run ./Alteryx_TestWorkflows/Unique\&Sample/Unique\&Sample.yxmd --verbose
Options
| Flag | Description |
|---|---|
--verbose |
Print Browse results and detailed execution log |
--dry-run |
Parse and validate only — do not execute |
--output-dir PATH |
Write output files to a specific directory |
--param KEY=VALUE |
Set a workflow constant (repeatable) |
--format [csv|json|parquet] |
Default output format for Browse nodes (default: csv) |
Dry-Run (Validate Only)
Check that a workflow parses correctly without executing it:
uv run python -m alteryx_runner run ./Alteryx_TestWorkflows/JoinTesting/JoinTesting.yxmd --dry-run
Custom Output Directory & Format
uv run python -m alteryx_runner run ./workflow.yxmd --output-dir ./results --format parquet
Workflow Parameters
Pass runtime constants with --param:
uv run python -m alteryx_runner run ./workflow.yxmd --param "StartDate=2024-01-01" --param "Region=West"
Listing Supported Tools
See all registered Alteryx tool plugins:
uv run python -m alteryx_runner list-tools
Converting an Alteryx Workflow to a Python Script
Pyteryx can convert an .yxmd workflow into a standalone Python script that uses Polars and DuckDB — the same libraries the runner uses internally. The generated script reproduces every step of your workflow as readable, sequential Python code that you can run, edit, and commit without any Alteryx dependency.
How It Works
- Parse — The
.yxmdXML file is parsed into a directed acyclic graph (DAG) of tool nodes and connections (alteryx_runner/engine/parser.py). - Topological Sort — Nodes are ordered so each tool runs only after its upstream inputs are ready (
alteryx_runner/engine/executor.py). - Transpile Expressions — Alteryx formula expressions (e.g.
IF [Amount] > 100 THEN "High" ELSE "Low" ENDIF) are transpiled to DuckDB SQL fragments (alteryx_runner/expression/transpiler.py). - Emit Python — Each tool node is converted to its Polars/DuckDB equivalent. The final output is a self-contained
.pyfile.
Quick Start
# Convert an Alteryx workflow to a Python script
uv run python -m alteryx_runner convert <path/to/workflow.yxmd> -o <output_script.py>
Examples
# Convert the Join testing workflow
uv run python -m alteryx_runner convert ./Alteryx_TestWorkflows/JoinTesting/JoinTesting.yxmd -o join_pipeline.py
# Convert and immediately run the generated script
uv run python -m alteryx_runner convert ./workflow.yxmd -o pipeline.py
uv run python pipeline.py
What Gets Generated
The output script follows a predictable structure:
#!/usr/bin/env python3
"""Auto-generated by Pyteryx from MyWorkflow.yxmd"""
import polars as pl
import duckdb
# ── Tool 1: Input Data ────────────────────────────
df_1 = pl.read_csv("data/sales.csv")
# ── Tool 3: Filter ────────────────────────────────
df_3 = df_1.filter(pl.sql_expr("\"Region\" = 'West'"))
# ── Tool 5: Formula ───────────────────────────────
con = duckdb.connect(":memory:")
con.register("t", df_3.to_arrow())
df_5 = con.execute("SELECT *, (\"Price\" * \"Qty\") AS \"Total\" FROM t").pl()
# ── Tool 7: Output Data ──────────────────────────
df_5.write_csv("output/results.csv")
Conversion Reference
The table below shows how each supported Alteryx tool maps to its Python equivalent:
| Alteryx Tool | Python / Polars Equivalent |
|---|---|
| Input Data | pl.read_csv() / pl.read_excel() / pl.read_parquet() |
| Output Data | df.write_csv() / df.write_parquet() |
| Text Input | pl.DataFrame({...}) (inline literal data) |
| Browse | print(df) or df.write_csv() |
| Filter | df.filter(pl.sql_expr(...)) |
| Formula | DuckDB SELECT *, <expr> AS <field> |
| Multi-Row Formula | DuckDB window functions (LAG / LEAD) |
| Multi-Field Formula | Loop over matching columns with DuckDB |
| Select | df.select() / df.rename() / df.cast() |
| Sort | df.sort(...) |
| Sample | df.head() / df.sample() |
| Unique | df.unique() |
| Record ID | df.with_row_index(...) |
| Auto Field | Automatic dtype optimization |
| Generate Rows | Loop / pl.DataFrame construction |
| Join | df.join(other, ...) |
| Join Multiple | Chained df.join(...) calls |
| Union | pl.concat([...]) |
| Append Fields | df.join(other, how="cross") |
| Find Replace | DuckDB REPLACE() / Polars .str.replace() |
| DateTime | DuckDB STRFTIME() / STRPTIME() |
| RegEx | df.with_columns(pl.col(...).str.extract(...)) |
| Text To Columns | df.with_columns(pl.col(...).str.split(...)) |
| Summarize | df.group_by(...).agg(...) |
| Cross Tab | df.pivot(...) |
| Transpose | df.transpose() |
Expression Transpilation
Alteryx formula expressions are automatically transpiled to DuckDB SQL. Common patterns:
| Alteryx Expression | Generated SQL |
|---|---|
[ColumnName] |
"ColumnName" |
IF [X] > 0 THEN "Y" ENDIF |
CASE WHEN "X" > 0 THEN 'Y' END |
IIF([A]=1, "yes", "no") |
CASE WHEN "A"=1 THEN 'yes' ELSE 'no' END |
IsNull([F]) |
"F" IS NULL |
Left([Name], 3) |
LEFT("Name", 3) |
DateTimeAdd([Date], 7, "days") |
"Date" + INTERVAL (7) DAY |
[Row-1:Total] |
LAG("Total", 1) OVER () |
Supported Tool Categories
| Category | Tools |
|---|---|
| In/Out | Input Data, Output Data, Browse, Text Input |
| Preparation | Filter, Formula, Multi-Field Formula, Multi-Row Formula, Select, Sort, Sample, Unique, Record ID, Auto Field, Generate Rows |
| Join | Join, Join Multiple, Union, Append Fields, Find Replace |
| Parse | DateTime, RegEx, Text To Columns |
| Transform | Summarize, Cross Tab, Transpose |
Project Structure
alteryx_runner/
├── cli.py # CLI entry point
├── engine/
│ ├── parser.py # .yxmd XML → workflow graph
│ ├── executor.py # Topological execution engine
│ ├── graph.py # DAG data structures
│ └── context.py # Runtime context & config
├── expression/ # Alteryx expression parser
└── tools/ # Tool implementations
├── inout/ # Input Data, Output Data, Browse, Text Input
├── join/ # Join, Union, Append Fields, Find Replace
├── parse/ # DateTime, RegEx, Text To Columns
├── preparation/ # Filter, Formula, Sort, Sample, Unique, …
└── transform/ # Summarize, Cross Tab, Transpose