Normalize Vendor CSV Imports: Reshape, Filter, Enrich, Group
Updated April 2026.
Vendors send CSVs that are almost what you need. Junk columns, currency strings, EU date formats, cryptic carrier codes, one row per line item when you want grouped, mixed statuses you want to split. This page is the working reference for using /v1/map as a per-vendor normalizer: five worked examples (drop columns, parse mixed types, group by order, split by status, decode codes), plus an honest comparison with pandas, OpenRefine, and a candid note on when you should write the script yourself.
New here? The full /v1/map endpoint reference (contract, sandbox scope, errors, limits, all use cases) lives at Transformation API overview →. This page is the vendor-CSV deep dive.
1. 30-second quickstart
import httpx # Read the CSV the vendor uploaded. csv_text = open("/path/to/vendor_export.csv").read() # Describe what your canonical schema is. TARGET = ( "Parse vendor CSV. Drop internal_id, _legacy_field, _qa_check columns. " "Return array of objects with sku (from vendor_sku), name, price (float " "from price_usd), stock (int)." ) r = httpx.post( "https://streamfix.dev/v1/map", headers={"Authorization": "Bearer sk_..."}, json={"payload": csv_text, "target": TARGET}, timeout=30, ) records = r.json()["output"] for rec in records: db.products.upsert(rec, on_conflict="sku")
First call with a given vendor's CSV shape compiles a normalizer; subsequent files of the same shape hit the cache. Each vendor gets its own (shape, target) cache entry. See section 4 for the per-vendor pattern.
2. Endpoint reference
POST /v1/map. Auth: Authorization: Bearer sk_YOUR_KEY.
payload- the CSV as a string. RFC 4180 quoted-field rules apply (quoted fields can contain commas, escaped quotes are doubled).target- description of the output: which columns to keep, type coercions, renames, grouping, filtering.
3. Worked examples
Five common vendor-CSV pains. All outputs verified.
3.1 Drop junk columns and rename
Vendor exports include internal IDs, legacy fields, QA flags. You only want the product data, with their column names mapped to yours.
internal_id,vendor_sku,name,_legacy_field,price_usd,_qa_check,stock
abc-1,V-001,Widget,UNUSED,29.99,OK,42
abc-2,V-002,Gadget,UNUSED,49.50,OK,17"Parse vendor CSV. Drop internal_id, _legacy_field, _qa_check columns. Return array of objects with sku (from vendor_sku), name, price (float from price_usd), stock (int)."[
{"sku": "V-001", "name": "Widget", "price": 29.99, "stock": 42},
{"sku": "V-002", "name": "Gadget", "price": 49.5, "stock": 17}
]3.2 Currency strings, EU dates, mixed types, ugly headers
A real-world bookkeeping export: $1,234.56 currency strings, DD/MM/YYYY dates, capitalized headers with spaces, quoted fields containing commas.
Order Ref,Order Date,Customer,Subtotal,Tax,Total
ORD-001,15/04/2024,"Smith, John","$1,234.56","$98.76","$1,333.32"
ORD-002,16/04/2024,Jane Doe,"$50.00","$4.00","$54.00""Parse vendor CSV with quoted fields. Return array of objects with: order_ref, order_date (DD/MM/YYYY -> ISO YYYY-MM-DD), customer, subtotal (float, strip $ and commas), tax (float), total (float). Lowercase all keys, strip spaces."[
{"order_ref": "ORD-001", "order_date": "2024-04-15", "customer": "Smith, John",
"subtotal": 1234.56, "tax": 98.76, "total": 1333.32},
{"order_ref": "ORD-002", "order_date": "2024-04-16", "customer": "Jane Doe",
"subtotal": 50.0, "tax": 4.0, "total": 54.0}
]3.3 Group flat rows back into orders
Many vendors export "one row per line item" - the order number repeats. Your DB wants one record per order with line items as a nested array. Plus a computed total.
order_id,customer,sku,qty,price
ORD-100,Alice,A-1,2,10.00
ORD-100,Alice,A-2,1,5.00
ORD-101,Bob,B-1,3,7.50
ORD-100,Alice,A-3,1,12.00"Parse CSV. Group rows by order_id. Return array of {order_id, customer, line_items (array of {sku, qty: int, price: float}), total (float, sum of qty*price across line items)}."[
{"order_id": "ORD-100", "customer": "Alice",
"line_items": [
{"sku": "A-1", "qty": 2, "price": 10.0},
{"sku": "A-2", "qty": 1, "price": 5.0},
{"sku": "A-3", "qty": 1, "price": 12.0}
],
"total": 37.0},
{"order_id": "ORD-101", "customer": "Bob",
"line_items": [{"sku": "B-1", "qty": 3, "price": 7.5}],
"total": 22.5}
]3.4 Split by status with totals
A transaction CSV with mixed statuses. Split into separate buckets and compute totals per bucket - the kind of summary your operations team wants in Slack every morning.
"Parse CSV. Split rows by status. Return object with keys 'paid', 'pending', 'refunded', each containing array of {txn_id, amount: float}. Also include 'totals' object: {paid: float sum, pending: float sum, refunded: float sum, all: float sum of all amounts}."{
"paid": [
{"txn_id": "TX-1", "amount": 99.99},
{"txn_id": "TX-3", "amount": 150.0},
{"txn_id": "TX-6", "amount": 300.0}
],
"pending": [
{"txn_id": "TX-2", "amount": 45.5},
{"txn_id": "TX-5", "amount": 89.0}
],
"refunded": [{"txn_id": "TX-4", "amount": 22.0}],
"totals": {"paid": 549.99, "pending": 134.5, "refunded": 22.0, "all": 706.49}
}3.5 Decode cryptic carrier/service codes
Vendor sends shipping records with two-letter carrier codes and three-letter service codes. You want human-readable names plus a unit conversion (oz to lb).
ship_id,carrier_code,service_code,tracking,weight_oz
SH-1,UPS,GND,1Z123,32
SH-2,FDX,2DA,793844,18
SH-3,USPS,PMI,9400111,8"Parse CSV. Return array of objects: ship_id, carrier (UPS->'United Parcel Service', FDX->'FedEx', USPS->'United States Postal Service', else carrier_code), service ({UPS+GND:'Ground', FDX+2DA:'2-Day Air', USPS+PMI:'Priority Mail International', else service_code}), tracking, weight_lb (float, weight_oz/16, round 2dp)."[
{"ship_id": "SH-1", "carrier": "United Parcel Service", "service": "Ground", "tracking": "1Z123", "weight_lb": 2.0},
{"ship_id": "SH-2", "carrier": "FedEx", "service": "2-Day Air", "tracking": "793844", "weight_lb": 1.12},
{"ship_id": "SH-3", "carrier": "United States Postal Service", "service": "Priority Mail International", "tracking": "9400111", "weight_lb": 0.5}
]4. The per-vendor pattern
If you ingest from N vendors, write N target strings (one per vendor's CSV shape) and store them somewhere - a database table, a config file, a YAML doc next to your import job. Each vendor gets its own cached normalizer; you don't maintain N parsing scripts.
VENDOR_TARGETS = { "acme": "Parse vendor CSV. Drop internal_id, _legacy_field. Return ...", "globex": "Parse CSV. Currency strings start with EUR; convert ...", "initech": "Parse CSV. Group by order_id. Compute total ...", } def normalize(vendor: str, csv_text: str) -> list: target = VENDOR_TARGETS[vendor] r = httpx.post( "https://streamfix.dev/v1/map", headers={"Authorization": "Bearer sk_..."}, json={"payload": csv_text, "target": target}, ) return r.json()["output"]
When a vendor changes their export format (renames a column, adds a new one), you change one string. The cache picks up the new shape automatically.
5. Errors and status codes
| Status | Meaning | What to do |
|---|---|---|
200 | Normalized. | - |
400 | Body missing fields. | Check request. |
401 | Auth. | Bearer token. |
402 | No credits. | Top up. |
422 | Normalizer raised. Common cause: a column the target named is missing in the actual CSV (vendor renamed it; vendor sent a different schema this time). | Update the target text. Or branch by detected header in your import job. |
502 | Internal generation failure. | Retry. |
6. Limits and behavior
- Compile-time vs run-time: when generating the normalizer, the API sees the first ~4,000 characters of the CSV. Once compiled, the cached normalizer runs on the full CSV regardless of size.
- Sandbox:
csv,decimal,datetime,reare available. No filesystem, no network. - Determinism: same input + same target = same output every time.
- RFC 4180 quoting: handled correctly via Python's
csvmodule under the hood.
7. Alternatives and how this differs
CSV normalization is one of the most-served problems in software. Here's how /v1/map differs structurally.
| Tool | Shape | What it does | Best for |
|---|---|---|---|
| pandas read_csv + DataFrame ops | Library; data analysis stack | Read, filter, group, type-coerce in code | Engineers comfortable in pandas; pipelines where pandas is already in the stack. |
| OpenRefine | Open-source desktop tool | Interactive cleanup with an undo history; great for one-off projects | Analysts cleaning a single dataset interactively. |
| csvkit | CLI tool collection | SQL-like queries over CSVs from the shell (csvgrep, csvjoin, csvsql) | Quick command-line cleanup; ad-hoc shell pipelines. |
Python csv module + your own code | Stdlib; no external deps | You write the mapping logic line-by-line | Stable per-vendor parsers you're willing to maintain. |
StreamFix /v1/map | HTTP API; one POST per file | Plain-English target text describes the normalization | You ingest from many vendors with different shapes and don't want to write a new script per vendor. |
Structural difference: pandas / csvkit / OpenRefine all assume you write the transformation. /v1/map takes the description and writes the transformation function for you. The win is when you have many vendors (N target strings beats N hand-written scripts); the loss is the network round-trip for the first call per shape.
8. When NOT to use this
- You ingest from one vendor with one stable schema. Write the parser once with pandas or the
csvmodule. Free, no external dependency, easy to debug. - The CSV is straightforward and you trust the data. No nested groups, no weird formats - just
pd.read_csv()handles it. - You're doing interactive exploration. Use OpenRefine or a Jupyter notebook with pandas. The interactive feedback loop is faster than iterating on target strings.
- You need full control over the parser for edge cases. When a vendor sends genuinely-malformed CSV (mismatched quotes, headers in row 5, embedded line breaks), you'll spend longer wrestling with target text than just writing the parser.
- You need PII to never leave your network. The first call sends the CSV's first ~4,000 chars to OpenAI to compile the normalizer. Cached calls don't, but if first calls are unacceptable, this isn't the right fit.
9. Get an API key
Free trial credits on signup.
Sign up