I have 2 datasets: 2.5 GB of IIS logs and 10 GB of Apache logs. For the 2.5 GB set, ingestion, lookup generation and data model acceleration completed within 30 minutes, and I have a number of dashboards working. For the 10 GB set, ingestion took less than 30 minutes, but page generation and session generation did not complete within 1 hour. The following is a list of my observations:
I think 10 GB is not that much for Splunk. How can I perform the initial load and incremental loads? What have I done wrong? Further attempts probably include
My Splunk servers included:
- 4-core 2GB, centos7, splunk 6.3, web analytics app 1.42
- 8-core 8GB, centos6.7, splunk 6.3, web analytics app 1.42
- 4-core 8GB, win2k12, splunk 6.2.5, web analytics app 1.42
You are right, 10 GB is not very much for Splunk. I wrote the app and have loaded bigger data sets on my MacBook Air than what you have tried here. I don't think it is a good idea to rewrite the Generate Sessions search, as it's highly tuned to work exactly as it is: it offloads data into a lookup, which is then read into a data model that drives the dashboards. Running that search every time a dashboard loads won't work, so we run it in scheduled batches. The search uses the transaction command to create sessions, and that puts a heavy load on the search head CPU. The shortage of RAM could also be an issue here.
To get around the browser timeout issues, I propose this:
- Change the initial run of the Generate Sessions search to cover only the last 7 days. By default it's set to All Time.
- After the search has kicked off, select "Send to background" in the drop-down menu for the search. Splunk will either send you an email when it's done (if you have that configured) or let you know in the activity menu that the job is done.
The scheduled searches run every 10 minutes with an overlapping time period, to catch open sessions that had not yet been closed at the time of the previous search. You can alter the schedule and timings within the search, but do it carefully so you don't lose sessions.
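To illustrate why the overlap matters, here is a minimal sketch of how consecutive scheduled runs could cover overlapping windows. The 10-minute interval comes from the description above; the 20-minute overlap, the dates and the function name are hypothetical and are not the app's actual settings.

```python
from datetime import datetime, timedelta

def search_window(run_time, interval=timedelta(minutes=10),
                  overlap=timedelta(minutes=20)):
    """Return (earliest, latest) for one scheduled run.

    Each run looks back one interval plus an extra overlap, so a session
    that was still open during the previous run is seen again.
    The overlap value is an illustrative assumption.
    """
    earliest = run_time - interval - overlap
    return earliest, run_time

# Three consecutive runs, 10 minutes apart (arbitrary example date).
first_run = datetime(2015, 11, 1, 12, 0)
for i in range(3):
    earliest, latest = search_window(first_run + i * timedelta(minutes=10))
    print(f"run {i}: {earliest:%H:%M} -> {latest:%H:%M}")
```

If the schedule is stretched without widening the look-back, the windows stop overlapping and sessions that span the boundary can be cut in two or missed.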
Let me know how you get on.
/opt/splunkforwarder/etc/system/local/limits.conf
[thruput]
maxKBps = 64
124 MB of gz files, which expand to 1.6 GB of data, took 8 hours to load.
index=_internal series=/tmp/*gz | timechart avg(kbps)
The throttling seems to apply to UNCOMPRESSED data. I have yet to find an official answer on whether the UF throttling applies to compressed or uncompressed data.
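A back-of-the-envelope check is consistent with that reading. The sketch below uses only the numbers quoted above (maxKBps = 64, 124 MB compressed, 1.6 GB uncompressed) and assumes 1 MB = 1024 KB; the function name is just for illustration.

```python
def load_hours(size_kb: float, max_kbps: float = 64) -> float:
    """Hours needed to push size_kb kilobytes through a max_kbps throttle."""
    return size_kb / max_kbps / 3600

compressed_kb = 124 * 1024            # 124 MB of .gz files
uncompressed_kb = 1.6 * 1024 * 1024   # 1.6 GB after expansion

print(f"compressed:   {load_hours(compressed_kb):.2f} h")
print(f"uncompressed: {load_hours(uncompressed_kb):.2f} h")
```

At 64 KBps the compressed files alone would need roughly half an hour, while the expanded data would need a little over 7 hours, which is much closer to the 8 hours observed.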
Does the order of ingestion matter? The wildcard monitor does not guarantee chronological order. The ls output is sorted by server name and then by date. Initially I set it to load 1 day's data from all servers; later I changed it to server*2015*gz.
If you are running 12 cores and 12 GB of RAM and you still can't get this to work with 10 GB of data for 7 days, there must be something else at play here. I'm running more than that volume on my consumer-grade laptop. Perhaps you can modify eventtype=web-traffic to contain an index filter?
To answer some of your questions about the app:
Where in the code does it consume WA_sessions.csv or WA_pages.csv and write to data model?
Sessions - This is done in the data model, which is updated every 10 minutes. Check the Data Model Audit dashboard.
Pages - This is just used for some dropdowns. It's not essential for the functionality of the app.
WA_pages.csv is truncated regularly. WA_sessions.csv keeps growing as I add data. How much does it scale to?
Sessions - This process keeps 72 hours' worth of data in the lookup file. The rest is truncated on every scheduled search, provided the session has been moved into the data model. If you haven't enabled data model acceleration yet, it will keep all data in the session lookup.
The last bit of data was ingested and accelerated when I was asleep. From the search app data summary I could see the last ingestion or index time. How could I check the acceleration completion time?
There is a Data Model Audit dashboard under Setup.
You can also check all the other dashboards (except Real-Time) to see when the last bit of data was put in the data model.
Which column should I read on the "Data Model Audit" dashboard? None of these look correct: Min_Time, Max_Time, Now_Time. I am loading historical data from a few months ago. The Data Summary of the search dashboard shows the last index time. I would expect the data model acceleration to be complete within 10-60 minutes of the last index time.
The 2.5 GB IIS log that processed quickly came from a simple departmental intranet and shows a boring Sankey chart. The 10 GB, 7-day Apache access log is from a public web site with 5 nodes and shows an intertwined Sankey chart. I think the transaction command during lookup generation is one of the bottlenecks. Is data model building equivalent to searching "tag=web"?
eventtype=web-traffic is equivalent to the following
sourcetype="My Access Logs" OR sourcetype="iis" OR sourcetype="access_combined" OR sourcetype="access_common" OR sourcetype="access_combined_wcookie"
Then I piped it to the following:
| stats count by index sourcetype
The only index is web. The only sourcetype is access_combined. Is it possible to parallelize data model initial builds and updates?
Can I keep something like a database sequence that increments after each invocation? I am thinking of looping through the /16 IPv4 CIDR prefixes in 64k iterations. The loop counter is "seq mod 64k".
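To make the "seq mod 64k" idea concrete, here is a minimal sketch of the mapping from a sequence number to a /16 prefix. It only shows the arithmetic; the function name is hypothetical, and how the counter is persisted between invocations (a file, a lookup, or something else) is left open.

```python
import ipaddress

def cidr16_for(seq: int) -> str:
    """Map an ever-increasing sequence number to one of the 65536 /16 prefixes."""
    index = seq % 65536                      # 2**16 possible /16 prefixes
    # The upper 16 bits of the address select the /16 network.
    return str(ipaddress.ip_network((index << 16, 16)))

print(cidr16_for(0))      # first prefix
print(cidr16_for(65535))  # last prefix before the counter wraps
print(cidr16_for(65536))  # wraps back to the first prefix
```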
Instead of relying on the memory-bound transaction command, is it possible to iterate over the range of http_session hash keys 64k or 16M times and stream out the results? On the 12-core, 12 GB reference hardware, I could dedicate 6 cores to running this all day. This should work in parallel over disjoint partitions. Does the Splunk index support string prefix matching? Can I configure the map command to run 64k subsearches? If it cannot be done within the browser UI, can I write a shell script that calls the SDK using "xargs -n1 -P6" over the domain of hash prefixes?
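A rough sketch of what I have in mind, in Python rather than xargs. It assumes http_session is a lowercase hex string and that a trailing-wildcard term is acceptable for that field; neither is confirmed. run_search is a hypothetical placeholder that only returns the query string; the real SDK or REST call would go there.

```python
from concurrent.futures import ThreadPoolExecutor

def hash_prefixes(hex_digits: int = 4) -> list[str]:
    """All hex prefixes of the given length: 4 digits -> 64k, 6 digits -> 16M."""
    return [format(i, f"0{hex_digits}x") for i in range(16 ** hex_digits)]

def build_search(prefix: str) -> str:
    """Search string restricted to one disjoint partition of session keys."""
    return f'eventtype=web-traffic http_session="{prefix}*"'

def run_search(query: str) -> str:
    # Hypothetical placeholder: submit the query to Splunk here.
    return query

def run_partitioned(prefixes, workers: int = 6) -> list[str]:
    """Run one search per prefix with at most `workers` in flight (like xargs -P6)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_search, (build_search(p) for p in prefixes)))

# Small demo with 1-digit prefixes (16 partitions) instead of the full 64k.
for query in run_partitioned(hash_prefixes(1))[:3]:
    print(query)
```

Because the prefixes are disjoint, no session would be split across two workers, which is the property the transaction-based search currently gets from processing everything in one pass.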