Updating readme
parent 16ea369727
commit d7e5a36a40
109 README.md
@@ -1,6 +1,6 @@
# Alteryx Runner
# Pyteryx — Alteryx Runner

A Python-native runner for Alteryx `.yxmd` workflow files — no Alteryx installation required.
A Python-native runner for Alteryx `.yxmd` workflow files — no Alteryx installation required. Run workflows directly **or** convert them to standalone Python scripts.

## Prerequisites

@@ -14,6 +14,8 @@ A Python-native runner for Alteryx `.yxmd` workflow files — no Alteryx install
uv sync
```

---

## Running a Workflow

```bash
@@ -66,6 +68,109 @@ See all registered Alteryx tool plugins:
uv run python -m alteryx_runner list-tools
```

---

## Converting an Alteryx Workflow to a Python Script

Pyteryx can convert a `.yxmd` workflow into a standalone Python script that uses **Polars** and **DuckDB**, the same libraries the runner uses internally. The generated script reproduces every step of your workflow as readable, sequential Python code that you can run, edit, and commit without any Alteryx dependency.

### How It Works

1. **Parse** — The `.yxmd` XML file is parsed into a directed acyclic graph (DAG) of tool nodes and connections (`alteryx_runner/engine/parser.py`).
2. **Topological Sort** — Nodes are ordered so each tool runs only after its upstream inputs are ready (`alteryx_runner/engine/executor.py`).
3. **Transpile Expressions** — Alteryx formula expressions (e.g. `IF [Amount] > 100 THEN "High" ELSE "Low" ENDIF`) are transpiled to DuckDB SQL fragments (`alteryx_runner/expression/transpiler.py`).
4. **Emit Python** — Each tool node is converted to its Polars/DuckDB equivalent. The final output is a self-contained `.py` file.

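
The parse and topological-sort steps above can be sketched with the standard library alone. This is an illustrative approximation, not the code in `parser.py` or `executor.py`: the XML is a simplified stand-in for a real `.yxmd` file, the element and attribute names it reads (`Node`, `Connection`, `Origin`, `Destination`, `ToolID`) are assumptions, and `execution_order` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# Simplified stand-in for a .yxmd file. The element and attribute names
# are assumptions made for this sketch, not a documented schema.
SAMPLE_YXMD = """
<AlteryxDocument>
  <Nodes>
    <Node ToolID="1"/>
    <Node ToolID="3"/>
    <Node ToolID="5"/>
    <Node ToolID="7"/>
  </Nodes>
  <Connections>
    <Connection><Origin ToolID="1"/><Destination ToolID="3"/></Connection>
    <Connection><Origin ToolID="3"/><Destination ToolID="5"/></Connection>
    <Connection><Origin ToolID="5"/><Destination ToolID="7"/></Connection>
  </Connections>
</AlteryxDocument>
"""


def execution_order(xml_text: str) -> list[str]:
    """Return tool IDs ordered so every tool follows its upstream inputs."""
    root = ET.fromstring(xml_text.strip())

    # Map each tool ID to the set of tool IDs it depends on.
    upstream: dict[str, set[str]] = {
        node.get("ToolID"): set() for node in root.iter("Node")
    }
    for conn in root.iter("Connection"):
        src = conn.find("Origin").get("ToolID")
        dst = conn.find("Destination").get("ToolID")
        upstream.setdefault(dst, set()).add(src)

    # TopologicalSorter expects {node: predecessors} and raises
    # graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(upstream).static_order())


print(execution_order(SAMPLE_YXMD))  # ['1', '3', '5', '7']
```

`graphlib` ships with Python 3.9 and later, so a sketch like this needs no extra dependency, and a workflow containing a cycle fails with an explicit `CycleError` instead of producing a wrong order.
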
### Quick Start

```bash
# Convert an Alteryx workflow to a Python script
uv run python -m alteryx_runner convert <path/to/workflow.yxmd> -o <output_script.py>
```

#### Examples

```bash
# Convert the Join testing workflow
uv run python -m alteryx_runner convert ./Alteryx_TestWorkflows/JoinTesting/JoinTesting.yxmd -o join_pipeline.py

# Convert and immediately run the generated script
uv run python -m alteryx_runner convert ./workflow.yxmd -o pipeline.py
uv run python pipeline.py
```

### What Gets Generated

The output script follows a predictable structure:

```python
#!/usr/bin/env python3
"""Auto-generated by Pyteryx from MyWorkflow.yxmd"""
import polars as pl
import duckdb

# ── Tool 1: Input Data ────────────────────────────
df_1 = pl.read_csv("data/sales.csv")

# ── Tool 3: Filter ────────────────────────────────
df_3 = df_1.filter(pl.sql_expr("\"Region\" = 'West'"))

# ── Tool 5: Formula ───────────────────────────────
con = duckdb.connect(":memory:")
con.register("t", df_3.to_arrow())
df_5 = con.execute("SELECT *, (\"Price\" * \"Qty\") AS \"Total\" FROM t").pl()

# ── Tool 7: Output Data ──────────────────────────
df_5.write_csv("output/results.csv")
```

### Conversion Reference

The table below shows how each supported Alteryx tool maps to its Python equivalent:

| Alteryx Tool | Python / Polars Equivalent |
|---|---|
| **Input Data** | `pl.read_csv()` / `pl.read_excel()` / `pl.read_parquet()` |
| **Output Data** | `df.write_csv()` / `df.write_parquet()` |
| **Text Input** | `pl.DataFrame({...})` (inline literal data) |
| **Browse** | `print(df)` or `df.write_csv()` |
| **Filter** | `df.filter(pl.sql_expr(...))` |
| **Formula** | DuckDB `SELECT *, <expr> AS <field>` |
| **Multi-Row Formula** | DuckDB window functions (`LAG` / `LEAD`) |
| **Multi-Field Formula** | Loop over matching columns with DuckDB |
| **Select** | `df.select()` / `df.rename()` / `df.cast()` |
| **Sort** | `df.sort(...)` |
| **Sample** | `df.head()` / `df.sample()` |
| **Unique** | `df.unique()` |
| **Record ID** | `df.with_row_index(...)` |
| **Auto Field** | Automatic dtype optimization |
| **Generate Rows** | Loop / `pl.DataFrame` construction |
| **Join** | `df.join(other, ...)` |
| **Join Multiple** | Chained `df.join(...)` calls |
| **Union** | `pl.concat([...])` |
| **Append Fields** | `df.join(other, how="cross")` |
| **Find Replace** | DuckDB `REPLACE()` / Polars `.str.replace()` |
| **DateTime** | DuckDB `STRFTIME()` / `STRPTIME()` |
| **RegEx** | `df.with_columns(pl.col(...).str.extract(...))` |
| **Text To Columns** | `df.with_columns(pl.col(...).str.split(...))` |
| **Summarize** | `df.group_by(...).agg(...)` |
| **Cross Tab** | `df.pivot(...)` |
| **Transpose** | `df.transpose()` |

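
One way a mapping like the table above could drive the "Emit Python" step is a dispatch dictionary from tool name to a small function that returns one line of source code. The sketch below is a guess at the general shape only, not Pyteryx's emitter: the node dictionary layout, `EMITTERS`, the `emit_*` helpers, and `emit_script` are all hypothetical names, and only three tools are covered.

```python
from typing import Callable

# Hypothetical node shape used only in this sketch:
# {"id": int, "tool": str, "inputs": [int, ...], "config": {...}}
Node = dict


def emit_input(node: Node) -> str:
    # Input Data -> pl.read_csv(<path>)
    return f'df_{node["id"]} = pl.read_csv({node["config"]["path"]!r})'


def emit_sort(node: Node) -> str:
    # Sort -> df.sort(<column>)
    src = node["inputs"][0]
    return f'df_{node["id"]} = df_{src}.sort({node["config"]["by"]!r})'


def emit_unique(node: Node) -> str:
    # Unique -> df.unique()
    src = node["inputs"][0]
    return f'df_{node["id"]} = df_{src}.unique()'


# Tool name -> function that renders one line of generated code.
EMITTERS: dict[str, Callable[[Node], str]] = {
    "Input Data": emit_input,
    "Sort": emit_sort,
    "Unique": emit_unique,
}


def emit_script(nodes: list[Node]) -> str:
    """Render nodes (assumed to be in topological order) as Python source."""
    lines = ["import polars as pl", ""]
    for node in nodes:
        lines.append(f'# Tool {node["id"]}: {node["tool"]}')
        lines.append(EMITTERS[node["tool"]](node))
    return "\n".join(lines) + "\n"


EXAMPLE_NODES = [
    {"id": 1, "tool": "Input Data", "inputs": [], "config": {"path": "data/sales.csv"}},
    {"id": 3, "tool": "Sort", "inputs": [1], "config": {"by": "Region"}},
    {"id": 5, "tool": "Unique", "inputs": [3], "config": {}},
]

print(emit_script(EXAMPLE_NODES))
```

Keeping one emitter per tool means an unsupported tool fails with a `KeyError` on its name, which is easier to diagnose than silently skipping a step.
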
### Expression Transpilation

Alteryx formula expressions are automatically transpiled to DuckDB SQL. Common patterns:

| Alteryx Expression | Generated SQL |
|---|---|
| `[ColumnName]` | `"ColumnName"` |
| `IF [X] > 0 THEN "Y" ENDIF` | `CASE WHEN "X" > 0 THEN 'Y' END` |
| `IIF([A]=1, "yes", "no")` | `CASE WHEN "A"=1 THEN 'yes' ELSE 'no' END` |
| `IsNull([F])` | `"F" IS NULL` |
| `Left([Name], 3)` | `LEFT("Name", 3)` |
| `DateTimeAdd([Date], 7, "days")` | `"Date" + INTERVAL (7) DAY` |
| `[Row-1:Total]` | `LAG("Total", 1) OVER ()` |

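
To make the first two rows of the table concrete, here is a deliberately small, regex-based sketch of that kind of rewrite. It is not the transpiler in `alteryx_runner/expression/transpiler.py`; `transpile_simple` is a hypothetical name, and it handles only field references, string literals, and `IF`/`ELSEIF`/`ENDIF` keywords. Because it does not tokenize the input, it would also rewrite a keyword that appears inside a string literal, and it does not cover function calls or `[Row-1:Field]` references.

```python
import re


def transpile_simple(expr: str) -> str:
    """Rewrite a small subset of Alteryx formula syntax as SQL (toy sketch)."""
    # 1. Alteryx string literals use double quotes; SQL uses single quotes.
    #    Any single quote inside the literal is doubled to escape it.
    sql = re.sub(
        r'"([^"]*)"',
        lambda m: "'" + m.group(1).replace("'", "''") + "'",
        expr,
    )
    # 2. [Field] references become double-quoted SQL identifiers.
    #    This runs after step 1 so the new double quotes are left alone.
    sql = re.sub(r"\[([^\]]+)\]", r'"\1"', sql)
    # 3. IF ... THEN ... ELSEIF ... ELSE ... ENDIF maps onto CASE WHEN ... END.
    sql = re.sub(r"\bELSEIF\b", "WHEN", sql, flags=re.IGNORECASE)
    sql = re.sub(r"\bENDIF\b", "END", sql, flags=re.IGNORECASE)
    sql = re.sub(r"\bIF\b", "CASE WHEN", sql, flags=re.IGNORECASE)
    return sql


print(transpile_simple('IF [Amount] > 100 THEN "High" ELSE "Low" ENDIF'))
# CASE WHEN "Amount" > 100 THEN 'High' ELSE 'Low' END
```

A production transpiler would more likely parse the expression into a syntax tree first, which is what makes nested function calls such as `IIF(...)` or `DateTimeAdd(...)` tractable.
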
## Supported Tool Categories

| Category | Tools |