Where are 2 ways to extract fields:
- By default Splunk recognise “access_combined” log format which is default format for Nginx. If it is your case congratulations nothing to do for you!
- For custom format of logs you will need to create regular expression. Splunk has built in user interface to extract fields or you can provide regular expression manually.
Website traffic over time and error rate
timechart count(status) span=1m by status
For response time I suggest to use 20, 85 and 95 percentile as metrics.
You also can think of average response time metric, but low average response time doesn’t show that website is OK, so I am not using that metric in the query.
timechart perc20(request_time), perc85(request_time), perc95(request_time) span=1m
Traffic by IP
top limit=20 clientip
Top of error page
Top error pages
search status >= 500 | stats count(status) as cnt by uri, status | sort cnt desc
Top 40x error pages
search status >= 400 AND status < 500 | stats count(status) by uri, status | sort cnt desc
Number of timeouts(>30s) per upstream
Timeouts could be a symptom for: slow application performance, not enough system resources or just upstream server is down.
search upstream_response_time >= 30 | stats count(upstream_response_time) as upstreams by upstream
Most time consuming upstreams
stats sum(upstream_response_time), count(upstream) by upstream
Splunk functions like timechart, stats and top is your best friends for data aggregation. They are like unix tools - the more tools you know the more easier is to build powerful commands.