
Parse Apache Logs to JSON: Combined, NGINX, Categorization, Bot Detection

Updated April 2026.

Apache combined log format and NGINX log_format are still the default text formats for most web servers, while modern observability tools want structured JSON. This page is the working reference: five worked examples (combined log line, NGINX log_format, error categorization, bot detection, multi-line aggregation), an error reference, and a comparison with the log-platform alternatives (Vector, Logstash, Fluent Bit, grok).

New here? The full /v1/map endpoint reference (contract, sandbox scope, errors, limits, all use cases) lives in the Transformation API overview. This page is the log-parsing deep dive.

1. 30-second quickstart

curl
curl https://streamfix.dev/v1/map \
  -H "Authorization: Bearer sk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "payload": "127.0.0.1 - alice [10/Oct/2023:13:55:36 -0700] \"GET /api/users HTTP/1.1\" 200 2326 \"https://referrer.com/\" \"Mozilla/5.0\"",
    "target": "Parse Apache combined log format. Return object with: ip, user (null if dash), timestamp_iso (ISO 8601 with timezone), method, path, status (int), bytes (int), referrer (null if dash), user_agent."
  }'

The first call compiles a parser; subsequent calls with the same log shape and target description run the cached parser in milliseconds. For high-volume ingestion, see the bulk-processing pattern in section 4.

2. Endpoint reference

POST /v1/map. The payload is the log line as a string; target is the description of the structured event you want.

Request body
  • payload  -  the raw log line. Multi-line strings work too (see example 3.5).
  • target  -  description of fields, types, and any computed fields.
Response body
  • output  -  the parsed event.
  • cached  -  true if the parser was reused.
  • fingerprint, elapsed_ms, code_length  -  debugging metadata.

3. Worked examples

3.1 Apache combined log line

The canonical case: one log line in, one structured event out, with timestamp parsed to ISO 8601 with timezone.

payload
127.0.0.1 - alice [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 2326 "https://referrer.com/" "Mozilla/5.0"
target
"Parse Apache combined log format. Return object with: ip, user (null if '-'), timestamp_iso (ISO 8601 with timezone), method, path, status (int), bytes (int), referrer (null if '-'), user_agent."
output
{
  "ip": "127.0.0.1",
  "user": "alice",
  "timestamp_iso": "2023-10-10T13:55:36-07:00",
  "method": "GET",
  "path": "/api/users",
  "status": 200,
  "bytes": 2326,
  "referrer": "https://referrer.com/",
  "user_agent": "Mozilla/5.0"
}
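
For cross-checking the API output locally, the same mapping can be sketched with a regular expression and the standard library. This is an illustrative reference, not how /v1/map works internally; the names COMBINED, dash_to_none, and parse_combined are hypothetical, and treating a "-" byte count as 0 is an assumption.

```python
import re
from datetime import datetime

# Apache combined: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def dash_to_none(value):
    """Apache writes '-' for an absent field; map it to None."""
    return None if value == "-" else value

def parse_combined(line):
    m = COMBINED.match(line)
    if m is None:
        raise ValueError(f"not a combined log line: {line!r}")
    # %z accepts the -0700 offset; isoformat() renders it as -07:00
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": m["ip"],
        "user": dash_to_none(m["user"]),
        "timestamp_iso": ts.isoformat(),
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
        # Assumption: a '-' byte count (no body sent) is reported as 0
        "bytes": 0 if m["bytes"] == "-" else int(m["bytes"]),
        "referrer": dash_to_none(m["referrer"]),
        "user_agent": m["user_agent"],
    }

LINE = (
    '127.0.0.1 - alice [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" '
    '200 2326 "https://referrer.com/" "Mozilla/5.0"'
)
event = parse_combined(LINE)
```

Note that %b month parsing depends on the process locale, so this sketch assumes an English or C locale.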

3.2 NGINX log_format with x_forwarded_for

NGINX's log_format is configurable. The default closely matches Apache combined; behind a load balancer you usually add $http_x_forwarded_for. Same endpoint, different target string.

payload
203.0.113.5 - - [22/Apr/2024:08:14:55 +0000] "POST /api/v2/orders HTTP/2.0" 201 412 "-" "curl/8.4.0" "203.0.113.5"
target
"Parse NGINX default log_format with x_forwarded_for. Return object with: remote_addr, time_iso (ISO 8601 with TZ), method, path, http_version, status (int), body_bytes (int), referer (null if '-'), user_agent, x_forwarded_for (null if '-')."
output
{
  "remote_addr": "203.0.113.5",
  "time_iso": "2024-04-22T08:14:55+00:00",
  "method": "POST",
  "path": "/api/v2/orders",
  "http_version": "2.0",
  "status": 201,
  "body_bytes": 412,
  "referer": null,
  "user_agent": "curl/8.4.0",
  "x_forwarded_for": "203.0.113.5"
}
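
Because log_format is configurable, a local parser has to follow whatever directive the server uses. As a rough sketch of that idea, the snippet below turns a log_format string into a named-group regex. LOG_FORMAT is assumed to be NGINX's predefined combined format with $http_x_forwarded_for appended, which matches the payload above; log_format_to_regex is a hypothetical helper, and the lazy wildcard it uses can misparse fields that contain escaped quotes.

```python
import re

# Assumption: NGINX "combined" format plus a trailing quoted $http_x_forwarded_for
LOG_FORMAT = (
    '$remote_addr - $remote_user [$time_local] "$request" $status '
    '$body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'
)

def log_format_to_regex(fmt):
    """Build a regex with one named group per $variable in the format string."""
    # re.split with a capture group alternates: literal, variable name, literal, ...
    parts = re.split(r"\$(\w+)", fmt)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.*?)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")

NGINX_LINE = (
    '203.0.113.5 - - [22/Apr/2024:08:14:55 +0000] "POST /api/v2/orders HTTP/2.0" '
    '201 412 "-" "curl/8.4.0" "203.0.113.5"'
)

fields = log_format_to_regex(LOG_FORMAT).match(NGINX_LINE).groupdict()

# $request is "METHOD PATH PROTOCOL"; split it to get the http_version field
method, path, protocol = fields["request"].split(" ")
http_version = protocol.split("/")[1]
```

With /v1/map the equivalent change is only in the target string; the sketch is mainly useful for spot-checking a few lines offline.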

3.3 Error categorization at parse time

Adding a computed field at parse time saves a SQL CASE downstream. The target text describes the categorization.

target
"Parse Apache combined log. Return object with: ip, path, method, timestamp_iso, status (int), error_category ('client_error' if status is 400-499, 'server_error' if status >= 500, 'redirect' if status is 300-399, 'ok' otherwise), is_error (true if status >= 400)."
output
{
  "ip": "10.0.0.5",
  "path": "/missing",
  "method": "GET",
  "timestamp_iso": "2023-10-10T14:01:12-07:00",
  "status": 404,
  "error_category": "client_error",
  "is_error": true
}
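
The rule in the target string is easiest to check when written out as code. The sketch below mirrors the wording of the target; categorize is a hypothetical name, and the generated parser may implement the same rule differently.

```python
def categorize(status):
    """Mirror the target text: 4xx client_error, 5xx+ server_error, 3xx redirect, else ok."""
    if 400 <= status <= 499:
        category = "client_error"
    elif status >= 500:
        category = "server_error"
    elif 300 <= status <= 399:
        category = "redirect"
    else:
        # 'ok otherwise' covers 2xx and also 1xx informational responses
        category = "ok"
    return {"error_category": category, "is_error": status >= 400}

result = categorize(404)
```

Writing the rule this way makes the edge cases visible: 1xx responses fall into 'ok' because the target says "otherwise".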

3.4 Bot detection

The user-agent string is one of the noisiest fields in any access log. Extracting bot information at parse time means downstream tables don't have to re-parse it on every query.

target
"Parse Apache combined log. Return object with all standard fields (ip, timestamp_iso, method, path, status int, bytes int, user_agent) PLUS is_bot (true if user_agent contains 'bot' or 'crawl' or 'spider' case-insensitive) AND bot_name (extracted name like 'Googlebot' or null)."
output
{
  "ip": "66.249.66.1",
  "timestamp_iso": "2023-10-10T14:30:00-07:00",
  "method": "GET",
  "path": "/robots.txt",
  "status": 200,
  "bytes": 256,
  "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "is_bot": true,
  "bot_name": "Googlebot"
}
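
A local version of the same heuristic might look like the sketch below. The is_bot test follows the target string; the bot_name rule (the first token containing one of the hint words) is an assumption, since the target only says "extracted name", and classify_user_agent is a hypothetical helper.

```python
import re

BOT_HINTS = ("bot", "crawl", "spider")

# Assumption: the bot name is the first run of word characters containing a hint word
BOT_TOKEN = re.compile(
    r"[A-Za-z0-9_-]*(?:bot|crawl|spider)[A-Za-z0-9_-]*", re.IGNORECASE
)

def classify_user_agent(user_agent):
    lowered = user_agent.lower()
    is_bot = any(hint in lowered for hint in BOT_HINTS)
    match = BOT_TOKEN.search(user_agent) if is_bot else None
    return {"is_bot": is_bot, "bot_name": match.group(0) if match else None}

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
verdict = classify_user_agent(GOOGLEBOT_UA)
```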

Heads up: the substring-match heuristic shown here catches obvious crawlers but is not production-grade bot detection. Real classifiers use curated UA lists (e.g. crawler-user-agents), reverse-DNS verification (a request is only from Googlebot if the source IP reverse-resolves to a Google-owned hostname such as *.googlebot.com and that hostname resolves back to the same IP), and IP-range allowlists. Use this example as a starter; swap in a real classifier downstream.

3.5 Multi-line aggregation

Pass a multi-line log payload (newline-separated) and ask for a summary. Useful for batch reports, hourly rollups, or quick triage of a small log file.

target
"Parse multi-line Apache combined log (newline-separated). Return summary: total_requests (int), by_status (object keyed by status int, value count), top_paths (array of {path, count} sorted by count desc, top 3), total_bytes (int sum)."
output
{
  "total_requests": 4,
  "by_status": {"200": 2, "404": 1, "500": 1},
  "top_paths": [
    {"path": "/a", "count": 2},
    {"path": "/b", "count": 1},
    {"path": "/c", "count": 1}
  ],
  "total_bytes": 430
}
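
The summary fields can be reproduced locally from already-parsed events, which is handy for sanity-checking a rollup. In this sketch, summarize is a hypothetical helper and EVENTS is made-up sample data chosen only to yield a summary of the same shape as the output above; breaking ties in top_paths by path name is an assumption.

```python
from collections import Counter

def summarize(events):
    """Aggregate parsed events into the summary shape described in the target."""
    by_status = Counter(str(e["status"]) for e in events)  # JSON object keys are strings
    path_counts = Counter(e["path"] for e in events)
    # Sort by count descending; ties broken alphabetically by path (assumption)
    top = sorted(path_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
    return {
        "total_requests": len(events),
        "by_status": dict(by_status),
        "top_paths": [{"path": p, "count": c} for p, c in top],
        "total_bytes": sum(e["bytes"] for e in events),
    }

# Hypothetical events, not taken from a real log
EVENTS = [
    {"path": "/a", "status": 200, "bytes": 100},
    {"path": "/b", "status": 404, "bytes": 120},
    {"path": "/a", "status": 200, "bytes": 150},
    {"path": "/c", "status": 500, "bytes": 60},
]
summary = summarize(EVENTS)
```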

4. Bulk-processing pattern

For high-volume log ingestion, send one representative line first to compile the parser. Subsequent lines of the same shape hit the cache. Most production access logs have just one or two distinct shapes.

Python
import httpx

TARGET = (
    "Parse Apache combined log format. Return object with: "
    "ip, user (null if dash), timestamp_iso (ISO 8601 with TZ), "
    "method, path, status (int), bytes (int), referrer (null if dash), user_agent."
)

def parse_line(client, line):
    """POST one log line to /v1/map and return the parsed event."""
    r = client.post(
        "https://streamfix.dev/v1/map",
        headers={"Authorization": "Bearer sk_..."},
        json={"payload": line, "target": TARGET},
        timeout=30,
    )
    # Surface 4xx/5xx responses instead of failing on a missing "output" key
    r.raise_for_status()
    return r.json()["output"]

with httpx.Client() as client, open("access.log") as f:
    # Skip blank lines (e.g. a trailing newline) rather than posting an empty payload
    events = [parse_line(client, ln.strip()) for ln in f if ln.strip()]

# First line: parser compiled
# Every subsequent line of same shape: cached, fast

If lines from different sources have different shapes (e.g. a mix of Apache and NGINX in the same file), they get separate cached parsers. Each unique shape costs one compile.

5. Errors and status codes

| Status | Meaning | What to do |
| --- | --- | --- |
| 200 | Parsed successfully. | - |
| 400 | Body missing payload or target. | Check the request body. |
| 401 | Missing or invalid Bearer token. | Include Authorization: Bearer sk_.... |
| 402 | API key has zero credits remaining. | Top up at streamfix.dev. |
| 422 | The parser ran but raised an exception (e.g. a line that doesn't match the format the target described). | Catch and log the line. Adjust the target string if the format has variations you didn't account for. |
| 502 | Internal generation failure (rare). | Retry. |
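
One way to act on these codes in client code is sketched below: retry on 502, skip and log on 422, and fail fast on everything else. The helper call_with_retry, its send callable (assumed to return a status code and a decoded JSON body), and the attempt and backoff values are all illustrative assumptions rather than part of the API.

```python
import time

RETRYABLE = {502}  # internal generation failure: worth retrying

def call_with_retry(send, max_attempts=3, backoff_s=0.5):
    """Call send() until it succeeds or the error is not retryable.

    send is a hypothetical zero-argument callable returning (status_code, body_dict).
    Returns the parsed event, or None when the line did not match (HTTP 422).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body["output"]
        if status == 422:
            # The line did not match the described format: caller logs it and moves on
            return None
        if status in RETRYABLE and attempt < max_attempts:
            time.sleep(backoff_s * attempt)  # simple linear backoff
            continue
        # 400 / 401 / 402, or retries exhausted: needs a human, not another attempt
        raise RuntimeError(f"/v1/map failed with HTTP {status}")

# Simulated responses: one 502, then success
responses = iter([(502, {}), (200, {"output": {"status": 200}})])
parsed = call_with_retry(lambda: next(responses), backoff_s=0)
```

Taking send as a callable keeps the retry policy separate from the HTTP client, so the same function can wrap an httpx call or a test stub.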

6. Limits and behavior

7. Alternatives and how this differs

Log parsing is a mature space. Here's how /v1/map differs structurally. For high-throughput observability pipelines, the tools below will fit better; for ad-hoc or in-app parsing, the API is simpler.

| Tool | Shape | Setup | Best for |
| --- | --- | --- | --- |
| Vector | Open-source agent (Rust); pipeline DSL with VRL transforms | Run as a daemon; configure source -> transform -> sink | High-throughput observability pipelines; you already run agents. |
| Logstash | JVM-based; grok patterns + filter plugins | Run as a daemon; conf file with input / filter / output blocks | Elastic stack users; existing grok pattern library. |
| Fluent Bit | Lightweight C agent; regex parsers | Run as a sidecar; INI-style config | Containerized environments; per-pod log shipping. |
| grok regex | Named capture-group regex syntax | Embedded in Logstash, Vector, Elastic ingest pipelines | You're inside one of those pipelines and need a one-off pattern. |
| StreamFix /v1/map | HTTP API; one POST per line (or per batch) | Bearer key, no daemon, no config file | An app that needs to parse the occasional log line, ad-hoc analysis, or a workflow where log parsing is one step of many. |

Structural difference: Vector / Logstash / Fluent Bit are pipelines (input source -> parse transform -> output sink). /v1/map is just the parse transform, called from your code. If you need the full pipeline, use one of those. If you only need the parse step inside something else you're already building, the API is simpler.

8. When NOT to use this

9. Get an API key

Free trial credits on signup.
