OLD Splunk Server: lookups and other slow-downs?

jewettg
Explorer

All..

I have inherited the task of learning about an older Splunk installation (4.1.5). It is working, but it has started to show high CPU usage and other signs that it is struggling. The box has not been touched or changed in a while, yet the amount of data it indexes and the number of queries it serves have kept growing. I suspect the main cause of the issues is that the hardware and software have not been kept up to date with the load being put on the system.

I am soliciting ideas to help me find weak spots and areas of bad implementation. Since I did not design this thing, I need to know where to look.

So far, my research within Splunk has turned up the following:

  • Found numerous lookups taking place, some implemented as Python scripts, others using external lookup files.
  • Found that the queue has been hitting its max_size frequently, even though the machine's memory is not fully utilized (see the diagnostic search after this list).
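
A quick way to see which pipeline queues are filling up is to search Splunk's own metrics.log in the _internal index. This is a minimal sketch; the current_size_kb and max_size_kb field names come from recent Splunk versions and may differ or be absent on 4.1.x:

    index=_internal source=*metrics.log* group=queue
    | eval fill_pct = round(current_size_kb / max_size_kb * 100, 1)
    | timechart max(fill_pct) by name

Queues that sit near 100% are bottleneck candidates, and metrics.log events with blocked=true mark moments when a queue refused new data.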

Questions:

  • If there are lookups defined but their files are missing, will this cause Splunk to slow down looking for those files, or to time out?
  • Is there a way to find out whether a lookup is being triggered or used?
  • Is there a faster or more efficient lookup method than external files or Python scripts?
  • Is there a way to increase the queue's max_size so it can hold more items? Is this recommended?

Thanks!

martin_mueller
SplunkTrust

Obvious answer is obvious: upgrade to a newer version. 4.1.x is really old, and it is not supported any more!

On the individual points:

  • Scripted lookups are always slower than CSV file lookups, but whether they can be replaced 1:1 depends on the use case (see the first sketch after this list).
  • Missing lookup files should generate error messages, but not cause much of a slowdown. If a file isn't there, there's no need to wait for a timeout; it's not going to magically reappear.
  • Queue sizes can be configured in server.conf (insert disclaimer about not remembering how much of this was exposed in a version as old as 4.1.x; see the second sketch after this list). Which queues are full on your end?
  • Full queues are usually an indicator of bigger issues further down the pipeline; simply increasing the queue size isn't going to add performance.
  • If your machine is as old as 4.1.x, upgrading the hardware, even to a small box by modern standards, is going to be much cheaper than sinking human time into tuning it.
  • Finding out whether a lookup is actually useful to your search results is hard. Finding out whether it is being evaluated is easier: if your lookup is defined for sourcetype foo, it will be evaluated whenever a search scans events from sourcetype foo (see the third sketch after this list). I'll bash the version again: in the past few years there have been many optimizations, certainly around searching with lookups as well. UPGRADE!!
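
For the first point, here is a minimal transforms.conf sketch contrasting the two lookup styles; the stanza names, CSV file, script name, and field names below are hypothetical, modeled on the classic external DNS lookup example:

    # transforms.conf (hypothetical stanzas)

    # File-based lookup: Splunk reads the CSV directly, which is fast.
    [dns_lookup_csv]
    filename = dns_hosts.csv

    # Scripted lookup: Splunk runs an external Python script for each
    # search that uses it, adding process-spawn and per-row I/O overhead.
    [dns_lookup_script]
    external_cmd = dns_lookup.py clienthost clientip
    fields_list = clienthost, clientip

If a scripted lookup only wraps static data, exporting that data to a CSV and switching to a file-based lookup is usually the easy win.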
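
For the queue point, a minimal server.conf sketch, assuming [queue] stanzas with maxSize are available on your version (they are on modern Splunk; verify against the 4.1.x documentation):

    # server.conf (hypothetical values; verify against your version's docs)
    # Raise the indexing queue's maximum size. On modern versions maxSize
    # accepts a plain event count or a size with units such as KB/MB/GB.
    [queue=indexQueue]
    maxSize = 10MB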
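
For the last point, automatic lookups are bound to a sourcetype in props.conf, so the configuration itself tells you which searches will trigger them. A hypothetical example, reusing the stanza name from the transforms.conf sketch above:

    # props.conf (hypothetical; binds the lookup to sourcetype foo)
    # Every search that scans events of sourcetype foo will evaluate it.
    [foo]
    LOOKUP-dns = dns_lookup_csv clientip OUTPUT clienthost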

jewettg
Explorer

Thank you so much for responding. You are preaching to the choir regarding the version. We are stuck on this version until I can figure out the pre- and post-processing code (APIs) written to support our environment. The person who built this monstrosity left the University without leaving any documentation. It has been solid, but everyone is scared to death to touch it.

I am poking around to see if I can extend its life and eke out a bit more performance, all while I read through the code and figure out whether I can move everything over to a newer version.
