All Apps and Add-ons

Generate lookups took too long to run

hylam
Contributor

I have 2 datasets: 2.5 GB of IIS logs and 10 GB of Apache logs. For the 2.5 GB, ingestion, lookup generation and data model acceleration completed within 30 minutes, and I have a number of dashboards working. For the 10 GB, ingestion took less than 30 minutes, but page generation and session generation did not complete in 1 hour. My observations:

  • IE gave up running the Splunk JavaScript, so I switched to Chrome
  • the browser session timed out, so I went to Settings and increased the timeout from 1 h to 48 h
  • CPU utilization maxed out at 280% on both 4-core and 8-core servers, so there is lots of headroom for parallelization
  • atop showed a lot of headroom on CPU, memory and disk
  • the 10 GB was from www.company.com hosted on server[1-7]
  • site=www.company.com, host=, source=
  • 7 days of data in total, in 35 Apache access logs and 35 Apache error logs
  • the real-time dashboard runs fine for a 15-60 minute window before I run out of patience
  • the longest WA_pages.csv was 400+ lines
  • the longest WA_sessions.csv was 10k+ lines
  • if I leave it running overnight or over lunch, I usually get "job expired" or "sid not found" when I come back
  • I rarely see the green tick
  • the dispatch directory was 2.7 GB at most; I know the default disk quota is 10 GB

I think 10 GB is not that much for Splunk. How can I perform the initial load and incremental loads? What have I done wrong? Further attempts will probably include:

  • disabling the 10-minute scheduled searches
  • launching the job from the command line to avoid browser problems
  • examining each part of the query pipeline
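
For the command-line route, something like this should work (the saved-search name, path and flags are assumptions based on the app and the Splunk CLI; verify against your install before relying on it):

```
# Run the session-generating saved search detached from any browser.
# "Generate Sessions" is assumed from the app's UI label; check the
# app's savedsearches.conf for the exact stanza name.
/opt/splunk/bin/splunk search '| savedsearch "Generate Sessions"' \
    -earliest_time '-7d@d' -latest_time 'now' -maxout 0
```

This sidesteps browser timeouts entirely, since the job lifetime is no longer tied to a web session.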

My Splunk servers include:
- 4-core 2GB, centos7, splunk 6.3, web analytics app 1.42
- 8-core 8GB, centos6.7, splunk 6.3, web analytics app 1.42
- 4-core 8GB, win2k12, splunk 6.2.5, web analytics app 1.42


jbjerke_splunk
Splunk Employee

Hi hylam

You are right, 10 GB is not very much for Splunk. I wrote the app and have loaded bigger data sets on my MacBook Air than what you have tried here. I don't think it is a good idea to rewrite the Generate Sessions search, as it's highly tuned to work exactly as it is: it offloads data into a lookup, which is then read into a data model that drives the dashboards. Running that search every time a dashboard loads won't work, so we run it in scheduled batches. The search uses the transaction command to create sessions, and that puts heavy load on the search head CPU. A shortage of RAM could also be an issue here.

To get around the browser timeout issues, I propose this:
- Change the initial run of Generate Sessions to search only the last 7 days - by default it's set to All Time.
- After the search has kicked off, select "Send to background" in the drop-down menu for the search. Splunk will either send you an email when it's done (if you have that configured) or let you know in the activity menu when the job is done.
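
If you prefer to set the time range in configuration rather than the UI, the equivalent savedsearches.conf keys look roughly like this (the stanza name is assumed from the search's UI label; the app ships its own values):

```
# savedsearches.conf in the app's local directory
[Generate Sessions]
dispatch.earliest_time = -7d@d
dispatch.latest_time   = now
```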

The scheduled searches run every 10 minutes with an overlapping time period, to catch open sessions that had not yet closed at the time of the previous search. You can alter the schedule and timings within the search, but do it carefully so you don't lose sessions.
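
As a sketch of what that schedule and overlap look like in savedsearches.conf (the stanza name and the 30-minute overlap window are illustrative, not the app's actual values):

```
# savedsearches.conf - illustrative values only
[Generate Sessions]
enableSched = 1
cron_schedule = */10 * * * *
# Overlap: each run looks back further than 10 minutes so sessions
# still open at the previous run's boundary are picked up again.
dispatch.earliest_time = -30m
dispatch.latest_time = now
```

Widening the lookback increases per-run cost; narrowing it risks dropping sessions that straddle two runs.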

Let me know how you get on.

j

hylam
Contributor

The CIM app was up and running in a few hours. All 4 CPU cores were utilized during parallel ingestion, and data model acceleration is done. I used the Web data model.


hylam
Contributor
/opt/splunkforwarder/etc/system/local/limits.conf
[thruput]
maxKBps = 64

124 MB of gz files, which expand to 1.6 GB of data, took 8 hours to load.

https://answers.splunk.com/topics/maxkbps.html?sort=votes&filter=all
index=_internal series=/tmp/*gz | timechart avg(kbps)
The throttling seems to apply to UNCOMPRESSED data: 64 KBps × 8 hours ≈ 1.8 GB, which is in line with the 1.6 GB uncompressed size rather than the 124 MB compressed size. I have yet to find an official answer on whether UF throttling applies to compressed or uncompressed data.
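
If the 64 KBps cap is the bottleneck, it can be raised or removed on the forwarder (maxKBps = 0 disables throttling; restart the forwarder after changing it):

```
# /opt/splunkforwarder/etc/system/local/limits.conf
[thruput]
# 0 = unthrottled; or pick a higher cap such as 1024 (1 MB/s)
maxKBps = 0
```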


hylam
Contributor
monitor:///tmp/logs/server*2015*gz

Does the order of ingestion matter? The wildcard monitor does not guarantee chronological order; the ls output is sorted by server name and then by date. Initially I set it to load one day's data from all servers; I changed it to server*2015*gz later.
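
For reference, the monitor above as a complete inputs.conf stanza (the index and sourcetype names are taken from this thread, not the app's defaults):

```
# inputs.conf on the forwarder
[monitor:///tmp/logs/server*2015*gz]
index = web
sourcetype = access_combined
```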


hylam
Contributor

WA_pages.csv is truncated regularly. WA_sessions.csv keeps growing as I add data. How far does it scale?


jbjerke_splunk
Splunk Employee

Hi hylam

If you are running 12 cores and 12 GB of RAM and you still can't get this to work with 10 GB of data over 7 days, there must be something else at play here. I'm running more than that volume on my consumer-grade laptop. Perhaps you can modify eventtype=web-traffic to contain an index filter?
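
One way to do that, assuming the data lives in index=web as reported elsewhere in this thread, is to override the eventtype in the app's local directory:

```
# local/eventtypes.conf - adds an index filter to the packaged
# eventtype; the sourcetype list mirrors the one hylam posted
[web-traffic]
search = index=web (sourcetype="My Access Logs" OR sourcetype="iis" OR sourcetype="access_combined" OR sourcetype="access_common" OR sourcetype="access_combined_wcookie")
```

Restricting the eventtype to one index keeps every downstream search and the data model from scanning unrelated indexes.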

To answer some of your questions about the app:
Where in the code does it consume WA_sessions.csv or WA_pages.csv and write to data model?
Sessions - This is done in the data model, which is updated every 10 minutes. Check the Data Model Audit dashboard.
Pages - This is just used for some dropdowns. It's not essential for the functionality of the app.

WA_pages.csv is truncated regularly. WA_sessions.csv keeps growing as I add data. How much does it scale to?
Sessions - This process keeps 72 hours worth of data in the lookup file. The rest is truncated on every scheduled search if the session has been moved into the data model. If you haven't enabled data model acceleration yet, it will keep all data in the session lookup.

j


hylam
Contributor

The last bit of data was ingested and accelerated while I was asleep. From the search app's Data Summary I can see the last ingestion (index) time. How can I check the acceleration completion time?


jbjerke_splunk
Splunk Employee

Hi hylam

There is a Data Model Audit dashboard under Setup.

You can also check all the other dashboards (except Real-Time) to see when the last bit of data was put in the data model.

j


hylam
Contributor

https://answers.splunk.com/answers/138860/splunk-acceleration-summary-stuck-at-33.html

Will changing the cron schedule from */10 to */20 improve my chances?


hylam
Contributor

Which column should I read on the "Data Model Audit" dashboard? None of these looks right: Min_Time, Max_Time, Now_Time. I am loading historical data from a few months ago. The Data Summary of the search dashboard shows the last index time, and I would expect data model acceleration to complete within 10-60 minutes of that.


hylam
Contributor

The 2.5 GB of IIS logs that processed quickly came from a simple departmental intranet and shows a boring Sankey chart. The 10 GB of Apache access logs over 7 days is from a public web site with 5 nodes and shows an intertwined Sankey chart. I think the transaction command during lookup generation is one of the bottlenecks. Is data model building equivalent to searching "tag=web"?


hylam
Contributor

eventtype=web-traffic is equivalent to the following:

sourcetype="My Access Logs" OR sourcetype="iis" OR sourcetype="access_combined" OR sourcetype="access_common" OR sourcetype="access_combined_wcookie"

Then I piped it to the following:

| stats count by index sourcetype

The only index is web; the only sourcetype is access_combined. Is it possible to parallelize data model initial builds and updates?


hylam
Contributor

http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/

Can I keep something like a database sequence that increments after each invocation? I am thinking of looping through CIDR /16 IPv4 prefixes within 64k iterations, with the loop counter being "seq mod 64k".
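
A rough SPL sketch of such a persisted counter, using a lookup file as the state store (seq.csv is a made-up name; seed it once with a single row seq=0):

```
| inputlookup seq.csv
| eval seq = seq + 1
| outputlookup seq.csv
| eval prefix = seq % 65536
```

Each scheduled run reads the previous value, writes the incremented one back, and derives the /16 prefix index for that iteration. There is no locking, so concurrent runs could race.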


hylam
Contributor

Where in the code does it consume WA_sessions.csv or WA_pages.csv and write to data model?


hylam
Contributor

Instead of relying on the memory-bound transaction command, is it possible to iterate over the range of http_session hash keys 64k or 16M times and stream out the results? On 12-core, 12 GB reference hardware, I could dedicate 6 cores to running this all day, and it should work in parallel over disjoint partitions. Does the Splunk index support string prefix matching? Can I configure the map command to run 64k subsearches? If it cannot be done within the browser UI, can I write a shell script calling the SDK with "xargs -n1 -P6" over the domain of hash prefixes?
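
A hedged sketch of the map idea, partitioning on the first hex digit of the session hash (the session_id field name and prefix scheme are hypothetical, not fields the app is known to produce):

```
| makeresults count=16
| streamstats count as n
| eval prefix = substr("0123456789abcdef", n, 1)
| map maxsearches=16 search="search eventtype=web-traffic
    session_id=$prefix$* | stats count by session_id"
```

Note that map runs its subsearches sequentially on the search head, so this partitions the work but does not by itself parallelize it; parallelism would have to come from launching multiple jobs externally (e.g. via the SDK, as suggested above).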
