Home/Normalize Vendor CSV Imports

Normalize Vendor CSV Imports: Reshape, Filter, Enrich, Group

Updated April 2026.

Vendors send CSVs that are almost what you need. Junk columns, currency strings, EU date formats, cryptic carrier codes, one row per line item when you want grouped, mixed statuses you want to split. This page is the working reference for using /v1/map as a per-vendor normalizer: five worked examples (drop columns, parse mixed types, group by order, split by status, decode codes), plus an honest comparison with pandas, OpenRefine, and a candid note on when you should write the script yourself.

New here? The full /v1/map endpoint reference (contract, sandbox scope, errors, limits, all use cases) lives at Transformation API overview →. This page is the vendor-CSV deep dive.

1. 30-second quickstart

Python (vendor import job)
import httpx

# Read the CSV the vendor uploaded.
csv_text = open("/path/to/vendor_export.csv").read()

# Describe what your canonical schema is.
TARGET = (
    "Parse vendor CSV. Drop internal_id, _legacy_field, _qa_check columns. "
    "Return array of objects with sku (from vendor_sku), name, price (float "
    "from price_usd), stock (int)."
)

r = httpx.post(
    "https://streamfix.dev/v1/map",
    headers={"Authorization": "Bearer sk_..."},
    json={"payload": csv_text, "target": TARGET},
    timeout=30,
)
records = r.json()["output"]
for rec in records:
    db.products.upsert(rec, on_conflict="sku")

First call with a given vendor's CSV shape compiles a normalizer; subsequent files of the same shape hit the cache. Each vendor gets its own (shape, target) cache entry. See section 4 for the per-vendor pattern.

2. Endpoint reference

POST /v1/map. Auth: Authorization: Bearer sk_YOUR_KEY.

Request body
  • payload  -  the CSV as a string. RFC 4180 quoted-field rules apply (quoted fields can contain commas, escaped quotes are doubled).
  • target  -  description of the output: which columns to keep, type coercions, renames, grouping, filtering.

3. Worked examples

Five common vendor-CSV pains. All outputs verified.

3.1 Drop junk columns and rename

Vendor exports include internal IDs, legacy fields, QA flags. You only want the product data, with their column names mapped to yours.

payload
internal_id,vendor_sku,name,_legacy_field,price_usd,_qa_check,stock
abc-1,V-001,Widget,UNUSED,29.99,OK,42
abc-2,V-002,Gadget,UNUSED,49.50,OK,17
target
"Parse vendor CSV. Drop internal_id, _legacy_field, _qa_check columns. Return array of objects with sku (from vendor_sku), name, price (float from price_usd), stock (int)."
output
[
  {"sku": "V-001", "name": "Widget", "price": 29.99, "stock": 42},
  {"sku": "V-002", "name": "Gadget", "price": 49.5, "stock": 17}
]

3.2 Currency strings, EU dates, mixed types, ugly headers

A real-world bookkeeping export: $1,234.56 currency strings, DD/MM/YYYY dates, capitalized headers with spaces, quoted fields containing commas.

payload
Order Ref,Order Date,Customer,Subtotal,Tax,Total
ORD-001,15/04/2024,"Smith, John","$1,234.56","$98.76","$1,333.32"
ORD-002,16/04/2024,Jane Doe,"$50.00","$4.00","$54.00"
target
"Parse vendor CSV with quoted fields. Return array of objects with: order_ref, order_date (DD/MM/YYYY -> ISO YYYY-MM-DD), customer, subtotal (float, strip $ and commas), tax (float), total (float). Lowercase all keys, strip spaces."
output
[
  {"order_ref": "ORD-001", "order_date": "2024-04-15", "customer": "Smith, John",
   "subtotal": 1234.56, "tax": 98.76, "total": 1333.32},
  {"order_ref": "ORD-002", "order_date": "2024-04-16", "customer": "Jane Doe",
   "subtotal": 50.0, "tax": 4.0, "total": 54.0}
]

3.3 Group flat rows back into orders

Many vendors export "one row per line item" - the order number repeats. Your DB wants one record per order with line items as a nested array. Plus a computed total.

payload
order_id,customer,sku,qty,price
ORD-100,Alice,A-1,2,10.00
ORD-100,Alice,A-2,1,5.00
ORD-101,Bob,B-1,3,7.50
ORD-100,Alice,A-3,1,12.00
target
"Parse CSV. Group rows by order_id. Return array of {order_id, customer, line_items (array of {sku, qty: int, price: float}), total (float, sum of qty*price across line items)}."
output
[
  {"order_id": "ORD-100", "customer": "Alice",
   "line_items": [
     {"sku": "A-1", "qty": 2, "price": 10.0},
     {"sku": "A-2", "qty": 1, "price": 5.0},
     {"sku": "A-3", "qty": 1, "price": 12.0}
   ],
   "total": 37.0},
  {"order_id": "ORD-101", "customer": "Bob",
   "line_items": [{"sku": "B-1", "qty": 3, "price": 7.5}],
   "total": 22.5}
]

3.4 Split by status with totals

A transaction CSV with mixed statuses. Split into separate buckets and compute totals per bucket - the kind of summary your operations team wants in Slack every morning.

target
"Parse CSV. Split rows by status. Return object with keys 'paid', 'pending', 'refunded', each containing array of {txn_id, amount: float}. Also include 'totals' object: {paid: float sum, pending: float sum, refunded: float sum, all: float sum of all amounts}."
output
{
  "paid": [
    {"txn_id": "TX-1", "amount": 99.99},
    {"txn_id": "TX-3", "amount": 150.0},
    {"txn_id": "TX-6", "amount": 300.0}
  ],
  "pending": [
    {"txn_id": "TX-2", "amount": 45.5},
    {"txn_id": "TX-5", "amount": 89.0}
  ],
  "refunded": [{"txn_id": "TX-4", "amount": 22.0}],
  "totals": {"paid": 549.99, "pending": 134.5, "refunded": 22.0, "all": 706.49}
}

3.5 Decode cryptic carrier/service codes

Vendor sends shipping records with two-letter carrier codes and three-letter service codes. You want human-readable names plus a unit conversion (oz to lb).

payload
ship_id,carrier_code,service_code,tracking,weight_oz
SH-1,UPS,GND,1Z123,32
SH-2,FDX,2DA,793844,18
SH-3,USPS,PMI,9400111,8
target
"Parse CSV. Return array of objects: ship_id, carrier (UPS->'United Parcel Service', FDX->'FedEx', USPS->'United States Postal Service', else carrier_code), service ({UPS+GND:'Ground', FDX+2DA:'2-Day Air', USPS+PMI:'Priority Mail International', else service_code}), tracking, weight_lb (float, weight_oz/16, round 2dp)."
output
[
  {"ship_id": "SH-1", "carrier": "United Parcel Service", "service": "Ground", "tracking": "1Z123", "weight_lb": 2.0},
  {"ship_id": "SH-2", "carrier": "FedEx", "service": "2-Day Air", "tracking": "793844", "weight_lb": 1.12},
  {"ship_id": "SH-3", "carrier": "United States Postal Service", "service": "Priority Mail International", "tracking": "9400111", "weight_lb": 0.5}
]

4. The per-vendor pattern

If you ingest from N vendors, write N target strings (one per vendor's CSV shape) and store them somewhere - a database table, a config file, a YAML doc next to your import job. Each vendor gets its own cached normalizer; you don't maintain N parsing scripts.

Per-vendor config example
VENDOR_TARGETS = {
    "acme": "Parse vendor CSV. Drop internal_id, _legacy_field. Return ...",
    "globex": "Parse CSV. Currency strings start with EUR; convert ...",
    "initech": "Parse CSV. Group by order_id. Compute total ...",
}

def normalize(vendor: str, csv_text: str) -> list:
    target = VENDOR_TARGETS[vendor]
    r = httpx.post(
        "https://streamfix.dev/v1/map",
        headers={"Authorization": "Bearer sk_..."},
        json={"payload": csv_text, "target": target},
    )
    return r.json()["output"]

When a vendor changes their export format (renames a column, adds a new one), you change one string. The cache picks up the new shape automatically.

5. Errors and status codes

StatusMeaningWhat to do
200Normalized.-
400Body missing fields.Check request.
401Auth.Bearer token.
402No credits.Top up.
422Normalizer raised. Common cause: a column the target named is missing in the actual CSV (vendor renamed it; vendor sent a different schema this time).Update the target text. Or branch by detected header in your import job.
502Internal generation failure.Retry.

6. Limits and behavior

7. Alternatives and how this differs

CSV normalization is one of the most-served problems in software. Here's how /v1/map differs structurally.

ToolShapeWhat it doesBest for
pandas read_csv + DataFrame opsLibrary; data analysis stackRead, filter, group, type-coerce in codeEngineers comfortable in pandas; pipelines where pandas is already in the stack.
OpenRefineOpen-source desktop toolInteractive cleanup with an undo history; great for one-off projectsAnalysts cleaning a single dataset interactively.
csvkitCLI tool collectionSQL-like queries over CSVs from the shell (csvgrep, csvjoin, csvsql)Quick command-line cleanup; ad-hoc shell pipelines.
Python csv module + your own codeStdlib; no external depsYou write the mapping logic line-by-lineStable per-vendor parsers you're willing to maintain.
StreamFix /v1/mapHTTP API; one POST per filePlain-English target text describes the normalizationYou ingest from many vendors with different shapes and don't want to write a new script per vendor.

Structural difference: pandas / csvkit / OpenRefine all assume you write the transformation. /v1/map takes the description and writes the transformation function for you. The win is when you have many vendors (N target strings beats N hand-written scripts); the loss is the network round-trip for the first call per shape.

8. When NOT to use this

9. Get an API key

Free trial credits on signup.

Sign up

Related transformations