Home/Normalize Vendor CSV Imports

Normalize Vendor CSV Imports: Reshape, Filter, Enrich, Group

Updated April 2026.

Vendors send CSVs that are almost what you need. Junk columns, currency strings, EU date formats, cryptic carrier codes, one row per line item when you want grouped, mixed statuses you want to split. This page is the working reference for using /v1/map as a per-vendor normalizer: five worked examples (drop columns, parse mixed types, group by order, split by status, decode codes), plus an honest comparison with pandas, OpenRefine, and a candid note on when you should write the script yourself.

New here? The full /v1/map endpoint reference (contract, sandbox scope, errors, limits, all use cases) lives at Transformation API overview →. This page is the vendor-CSV deep dive.

1. 30-second quickstart

Python (vendor import job)

import httpx

# Read the CSV the vendor uploaded.
csv_text = open("/path/to/vendor_export.csv").read()

# Describe what your canonical schema is.
TARGET = (
    "Parse vendor CSV. Drop internal_id, _legacy_field, _qa_check columns. "
    "Return array of objects with sku (from vendor_sku), name, price (float "
    "from price_usd), stock (int)."
)

r = httpx.post(
    "https://streamfix.dev/v1/map",
    headers={"Authorization": "Bearer sk_..."},
    json={"payload": csv_text, "target": TARGET},
    timeout=30,
)
records = r.json()["output"]
for rec in records:
    db.products.upsert(rec, on_conflict="sku")

First call with a given vendor's CSV shape compiles a normalizer; subsequent files of the same shape hit the cache. Each vendor gets its own (shape, target) cache entry. See section 4 for the per-vendor pattern.

2. Endpoint reference

POST /v1/map. Auth: Authorization: Bearer sk_YOUR_KEY.

Request body

payload - the CSV as a string. RFC 4180 quoted-field rules apply (quoted fields can contain commas, escaped quotes are doubled).
target - description of the output: which columns to keep, type coercions, renames, grouping, filtering.

3. Worked examples

Five common vendor-CSV pains. All outputs verified.

3.1 Drop junk columns and rename

Vendor exports include internal IDs, legacy fields, QA flags. You only want the product data, with their column names mapped to yours.

payload

internal_id,vendor_sku,name,_legacy_field,price_usd,_qa_check,stock
abc-1,V-001,Widget,UNUSED,29.99,OK,42
abc-2,V-002,Gadget,UNUSED,49.50,OK,17

target

"Parse vendor CSV. Drop internal_id, _legacy_field, _qa_check columns. Return array of objects with sku (from vendor_sku), name, price (float from price_usd), stock (int)."

output

[
  {"sku": "V-001", "name": "Widget", "price": 29.99, "stock": 42},
  {"sku": "V-002", "name": "Gadget", "price": 49.5, "stock": 17}
]

3.2 Currency strings, EU dates, mixed types, ugly headers

A real-world bookkeeping export: $1,234.56 currency strings, DD/MM/YYYY dates, capitalized headers with spaces, quoted fields containing commas.

payload

Order Ref,Order Date,Customer,Subtotal,Tax,Total
ORD-001,15/04/2024,"Smith, John","$1,234.56","$98.76","$1,333.32"
ORD-002,16/04/2024,Jane Doe,"$50.00","$4.00","$54.00"

target

"Parse vendor CSV with quoted fields. Return array of objects with: order_ref, order_date (DD/MM/YYYY -> ISO YYYY-MM-DD), customer, subtotal (float, strip $ and commas), tax (float), total (float). Lowercase all keys, strip spaces."

output

[
  {"order_ref": "ORD-001", "order_date": "2024-04-15", "customer": "Smith, John",
   "subtotal": 1234.56, "tax": 98.76, "total": 1333.32},
  {"order_ref": "ORD-002", "order_date": "2024-04-16", "customer": "Jane Doe",
   "subtotal": 50.0, "tax": 4.0, "total": 54.0}
]

3.3 Group flat rows back into orders

Many vendors export "one row per line item" - the order number repeats. Your DB wants one record per order with line items as a nested array. Plus a computed total.

payload

order_id,customer,sku,qty,price
ORD-100,Alice,A-1,2,10.00
ORD-100,Alice,A-2,1,5.00
ORD-101,Bob,B-1,3,7.50
ORD-100,Alice,A-3,1,12.00

target

"Parse CSV. Group rows by order_id. Return array of {order_id, customer, line_items (array of {sku, qty: int, price: float}), total (float, sum of qty*price across line items)}."

output

[
  {"order_id": "ORD-100", "customer": "Alice",
   "line_items": [
     {"sku": "A-1", "qty": 2, "price": 10.0},
     {"sku": "A-2", "qty": 1, "price": 5.0},
     {"sku": "A-3", "qty": 1, "price": 12.0}
   ],
   "total": 37.0},
  {"order_id": "ORD-101", "customer": "Bob",
   "line_items": [{"sku": "B-1", "qty": 3, "price": 7.5}],
   "total": 22.5}
]

3.4 Split by status with totals

A transaction CSV with mixed statuses. Split into separate buckets and compute totals per bucket - the kind of summary your operations team wants in Slack every morning.

target

"Parse CSV. Split rows by status. Return object with keys 'paid', 'pending', 'refunded', each containing array of {txn_id, amount: float}. Also include 'totals' object: {paid: float sum, pending: float sum, refunded: float sum, all: float sum of all amounts}."

output

{
  "paid": [
    {"txn_id": "TX-1", "amount": 99.99},
    {"txn_id": "TX-3", "amount": 150.0},
    {"txn_id": "TX-6", "amount": 300.0}
  ],
  "pending": [
    {"txn_id": "TX-2", "amount": 45.5},
    {"txn_id": "TX-5", "amount": 89.0}
  ],
  "refunded": [{"txn_id": "TX-4", "amount": 22.0}],
  "totals": {"paid": 549.99, "pending": 134.5, "refunded": 22.0, "all": 706.49}
}

3.5 Decode cryptic carrier/service codes

Vendor sends shipping records with two-letter carrier codes and three-letter service codes. You want human-readable names plus a unit conversion (oz to lb).

payload

ship_id,carrier_code,service_code,tracking,weight_oz
SH-1,UPS,GND,1Z123,32
SH-2,FDX,2DA,793844,18
SH-3,USPS,PMI,9400111,8

target

"Parse CSV. Return array of objects: ship_id, carrier (UPS->'United Parcel Service', FDX->'FedEx', USPS->'United States Postal Service', else carrier_code), service ({UPS+GND:'Ground', FDX+2DA:'2-Day Air', USPS+PMI:'Priority Mail International', else service_code}), tracking, weight_lb (float, weight_oz/16, round 2dp)."

output

[
  {"ship_id": "SH-1", "carrier": "United Parcel Service", "service": "Ground", "tracking": "1Z123", "weight_lb": 2.0},
  {"ship_id": "SH-2", "carrier": "FedEx", "service": "2-Day Air", "tracking": "793844", "weight_lb": 1.12},
  {"ship_id": "SH-3", "carrier": "United States Postal Service", "service": "Priority Mail International", "tracking": "9400111", "weight_lb": 0.5}
]

4. The per-vendor pattern

If you ingest from N vendors, write N target strings (one per vendor's CSV shape) and store them somewhere - a database table, a config file, a YAML doc next to your import job. Each vendor gets its own cached normalizer; you don't maintain N parsing scripts.

Per-vendor config example

VENDOR_TARGETS = {
    "acme": "Parse vendor CSV. Drop internal_id, _legacy_field. Return ...",
    "globex": "Parse CSV. Currency strings start with EUR; convert ...",
    "initech": "Parse CSV. Group by order_id. Compute total ...",
}

def normalize(vendor: str, csv_text: str) -> list:
    target = VENDOR_TARGETS[vendor]
    r = httpx.post(
        "https://streamfix.dev/v1/map",
        headers={"Authorization": "Bearer sk_..."},
        json={"payload": csv_text, "target": target},
    )
    return r.json()["output"]

When a vendor changes their export format (renames a column, adds a new one), you change one string. The cache picks up the new shape automatically.

5. Errors and status codes

Status	Meaning	What to do
`200`	Normalized.	-
`400`	Body missing fields.	Check request.
`401`	Auth.	Bearer token.
`402`	No credits.	Top up.
`422`	Normalizer raised. Common cause: a column the target named is missing in the actual CSV (vendor renamed it; vendor sent a different schema this time).	Update the target text. Or branch by detected header in your import job.
`502`	Internal generation failure.	Retry.

6. Limits and behavior

Compile-time vs run-time: when generating the normalizer, the API sees the first ~4,000 characters of the CSV. Once compiled, the cached normalizer runs on the full CSV regardless of size.
Sandbox: csv, decimal, datetime, re are available. No filesystem, no network.
Determinism: same input + same target = same output every time.
RFC 4180 quoting: handled correctly via Python's csv module under the hood.

7. Alternatives and how this differs

CSV normalization is one of the most-served problems in software. Here's how /v1/map differs structurally.

Tool	Shape	What it does	Best for
pandas read_csv + DataFrame ops	Library; data analysis stack	Read, filter, group, type-coerce in code	Engineers comfortable in pandas; pipelines where pandas is already in the stack.
OpenRefine	Open-source desktop tool	Interactive cleanup with an undo history; great for one-off projects	Analysts cleaning a single dataset interactively.
csvkit	CLI tool collection	SQL-like queries over CSVs from the shell (csvgrep, csvjoin, csvsql)	Quick command-line cleanup; ad-hoc shell pipelines.
Python `csv` module + your own code	Stdlib; no external deps	You write the mapping logic line-by-line	Stable per-vendor parsers you're willing to maintain.
StreamFix `/v1/map`	HTTP API; one POST per file	Plain-English target text describes the normalization	You ingest from many vendors with different shapes and don't want to write a new script per vendor.

Structural difference: pandas / csvkit / OpenRefine all assume you write the transformation. /v1/map takes the description and writes the transformation function for you. The win is when you have many vendors (N target strings beats N hand-written scripts); the loss is the network round-trip for the first call per shape.

8. When NOT to use this

You ingest from one vendor with one stable schema. Write the parser once with pandas or the csv module. Free, no external dependency, easy to debug.
The CSV is straightforward and you trust the data. No nested groups, no weird formats - just pd.read_csv() handles it.
You're doing interactive exploration. Use OpenRefine or a Jupyter notebook with pandas. The interactive feedback loop is faster than iterating on target strings.
You need full control over the parser for edge cases. When a vendor sends genuinely-malformed CSV (mismatched quotes, headers in row 5, embedded line breaks), you'll spend longer wrestling with target text than just writing the parser.
You need PII to never leave your network. The first call sends the CSV's first ~4,000 chars to OpenAI to compile the normalizer. Cached calls don't, but if first calls are unacceptable, this isn't the right fit.

9. Get an API key

Free trial credits on signup.

Related transformations

Stripe Webhook to CRM →

Map Stripe events into Salesforce / HubSpot / Airtable shapes.

Shopify Order to QuickBooks →

orders/create webhook to QB Invoice / Sales Receipt / Credit Memo.

Parse EDI X12 to JSON →

810 invoices, 850 POs, 856 ASNs - structured records, no DSL.

Parse Apache Logs to JSON →

Combined log format -> structured event with parsed timestamps.