Updating readme

main
casey 2026-06-13 14:47:08 +10:00
parent 16ea369727
commit d7e5a36a40
1 changed files with 107 additions and 2 deletions

109
README.md
View File

@ -1,6 +1,6 @@
# Alteryx Runner # Pyteryx — Alteryx Runner
A Python-native runner for Alteryx `.yxmd` workflow files — no Alteryx installation required. A Python-native runner for Alteryx `.yxmd` workflow files — no Alteryx installation required. Run workflows directly **or** convert them to standalone Python scripts.
## Prerequisites ## Prerequisites
@ -14,6 +14,8 @@ A Python-native runner for Alteryx `.yxmd` workflow files — no Alteryx install
uv sync uv sync
``` ```
---
## Running a Workflow ## Running a Workflow
```bash ```bash
@ -66,6 +68,109 @@ See all registered Alteryx tool plugins:
uv run python -m alteryx_runner list-tools uv run python -m alteryx_runner list-tools
``` ```
---
## Converting an Alteryx Workflow to a Python Script
Pyteryx can convert an `.yxmd` workflow into a standalone Python script that uses **Polars** and **DuckDB** — the same libraries the runner uses internally. The generated script reproduces every step of your workflow as readable, sequential Python code that you can run, edit, and commit without any Alteryx dependency.
### How It Works
1. **Parse** — The `.yxmd` XML file is parsed into a directed acyclic graph (DAG) of tool nodes and connections (`alteryx_runner/engine/parser.py`).
2. **Topological Sort** — Nodes are ordered so each tool runs only after its upstream inputs are ready (`alteryx_runner/engine/executor.py`).
3. **Transpile Expressions** — Alteryx formula expressions (e.g. `IF [Amount] > 100 THEN "High" ELSE "Low" ENDIF`) are transpiled to DuckDB SQL fragments (`alteryx_runner/expression/transpiler.py`).
4. **Emit Python** — Each tool node is converted to its Polars/DuckDB equivalent. The final output is a self-contained `.py` file.
### Quick Start
```bash
# Convert an Alteryx workflow to a Python script
uv run python -m alteryx_runner convert <path/to/workflow.yxmd> -o <output_script.py>
```
#### Examples
```bash
# Convert the Join testing workflow
uv run python -m alteryx_runner convert ./Alteryx_TestWorkflows/JoinTesting/JoinTesting.yxmd -o join_pipeline.py
# Convert and immediately run the generated script
uv run python -m alteryx_runner convert ./workflow.yxmd -o pipeline.py
uv run python pipeline.py
```
### What Gets Generated
The output script follows a predictable structure:
```python
#!/usr/bin/env python3
"""Auto-generated by Pyteryx from MyWorkflow.yxmd"""
import polars as pl
import duckdb
# ── Tool 1: Input Data ────────────────────────────
df_1 = pl.read_csv("data/sales.csv")
# ── Tool 3: Filter ────────────────────────────────
df_3 = df_1.filter(pl.sql_expr("\"Region\" = 'West'"))
# ── Tool 5: Formula ───────────────────────────────
con = duckdb.connect(":memory:")
con.register("t", df_3.to_arrow())
df_5 = con.execute("SELECT *, (\"Price\" * \"Qty\") AS \"Total\" FROM t").pl()
# ── Tool 7: Output Data ──────────────────────────
df_5.write_csv("output/results.csv")
```
### Conversion Reference
The table below shows how each supported Alteryx tool maps to its Python equivalent:
| Alteryx Tool | Python / Polars Equivalent |
|---|---|
| **Input Data** | `pl.read_csv()` / `pl.read_excel()` / `pl.read_parquet()` |
| **Output Data** | `df.write_csv()` / `df.write_parquet()` |
| **Text Input** | `pl.DataFrame({...})` (inline literal data) |
| **Browse** | `print(df)` or `df.write_csv()` |
| **Filter** | `df.filter(pl.sql_expr(...))` |
| **Formula** | DuckDB `SELECT *, <expr> AS <field>` |
| **Multi-Row Formula** | DuckDB window functions (`LAG` / `LEAD`) |
| **Multi-Field Formula** | Loop over matching columns with DuckDB |
| **Select** | `df.select()` / `df.rename()` / `df.cast()` |
| **Sort** | `df.sort(...)` |
| **Sample** | `df.head()` / `df.sample()` |
| **Unique** | `df.unique()` |
| **Record ID** | `df.with_row_index(...)` |
| **Auto Field** | Automatic dtype optimization |
| **Generate Rows** | Loop / `pl.DataFrame` construction |
| **Join** | `df.join(other, ...)` |
| **Join Multiple** | Chained `df.join(...)` calls |
| **Union** | `pl.concat([...])` |
| **Append Fields** | `df.join(other, how="cross")` |
| **Find Replace** | DuckDB `REPLACE()` / Polars `.str.replace()` |
| **DateTime** | DuckDB `STRFTIME()` / `STRPTIME()` |
| **RegEx** | `df.with_columns(pl.col(...).str.extract(...))` |
| **Text To Columns** | `df.with_columns(pl.col(...).str.split(...))` |
| **Summarize** | `df.group_by(...).agg(...)` |
| **Cross Tab** | `df.pivot(...)` |
| **Transpose** | `df.transpose()` |
### Expression Transpilation
Alteryx formula expressions are automatically transpiled to DuckDB SQL. Common patterns:
| Alteryx Expression | Generated SQL |
|---|---|
| `[ColumnName]` | `"ColumnName"` |
| `IF [X] > 0 THEN "Y" ENDIF` | `CASE WHEN "X" > 0 THEN 'Y' END` |
| `IIF([A]=1, "yes", "no")` | `CASE WHEN "A"=1 THEN 'yes' ELSE 'no' END` |
| `IsNull([F])` | `"F" IS NULL` |
| `Left([Name], 3)` | `LEFT("Name", 3)` |
| `DateTimeAdd([Date], 7, "days")` | `"Date" + INTERVAL (7) DAY` |
| `[Row-1:Total]` | `LAG("Total", 1) OVER ()` |
## Supported Tool Categories ## Supported Tool Categories
| Category | Tools | | Category | Tools |