Monitoring Splunk

Performance issue with large number of sources - even after cleaning the index

grundsch
Communicator

Hi,

This is a tricky question about the internals of Splunk.

We had an issue with our installation: basically a single Splunk instance on a syslog server consuming the logs. Due to some garbage syslog data, the syslog daemon spat out on the order of 50,000 files with just one line each, which got indexed as individual sources.

After that, search performance was ridiculously slow.

We decided to delete the index and not reindex the old data, i.e. just start indexing fresh data with the garbage properly filtered out.

Still, search performance was very slow.

Finally, we deleted all indexes (incl. internal ones), and now everything is amazingly fast.

Does anybody have an explanation for why cleaning just the faulty index was not enough?

Thanks,

Steph


yannK
Splunk Employee

Question: did you have millions of hosts, or millions of sources? (A quick way to check is sketched after the list below.)

  • Are you on an old version (3.x, 4.1.x, 4.2.x)? Those versions were sensitive to a large amount of metadata, stored in $SPLUNK_HOME/var/lib/splunk//db/.meta in old versions.
    However, deleting the data and the global metadata should have helped.

  • The fishbucket keeps track of file sources, not syslog TCP inputs.

  • Finally, you may have a large number of learned sourcetypes in your learned app; please check and clean $SPLUNK_HOME/etc/apps/learned/local/
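
If it helps, a quick way to gauge how many distinct sources and hosts an index has accumulated is the metadata generating command. A minimal sketch, run as two separate searches, with "main" standing in for your syslog index:

    | metadata type=sources index=main | stats count AS distinct_sources

    | metadata type=hosts index=main | stats count AS distinct_hosts

Counts in the millions are the kind of metadata volume I am asking about above.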


yannK
Splunk Employee

This is strange. Maybe, as Kristian said, the number of files to tail is high and it is the tailing processor that is busy.


grundsch
Communicator
  • lots of sources; it could actually also have triggered lots of hosts, as the hosts were extracted from the file path...
  • Splunk version is the latest one, 5.x
  • syslog was written to files and the files were indexed; no direct TCP input
  • the sourcetype was fixed (syslog)

kristian_kolb
Ultra Champion

Hmm, if you [monitor] a directory with tens of thousands of files, it will surely tax the tailing processor (which is responsible for checking whether a file has been updated or not).
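
For what it's worth, one way to see what the tailing processor is currently tracking is the inputstatus REST endpoint; a sketch, run on the instance doing the monitoring (the endpoint name is from memory, so treat it as an assumption):

    | rest /services/admin/inputstatus/TailingProcessor:FileStatus splunk_server=local

That should list each file the monitor input knows about and how far it has read into it, which gives a feel for how large the tailing workload is.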

When you cleaned the data in your index, did you also clean out all those files from /var/log/syslog (or wherever they are created)?

Also, the internal fishbucket index will keep track of all files it has seen, and it will not be deleted when you remove the source files or clean out the data index. That could also play a part.

The sheer amount of indexed bytes probably has little to do with it.

/K


grundsch
Communicator

yeah, thought about it too late...


kristian_kolb
Ultra Champion

Well, it's hard to say now that you've cleaned it. I guess you didn't make a diag dump before you started cleaning?

But I agree with you, a large fishbucket shouldn't really affect searching, should it? Unless it also degrades general performance in some way.


grundsch
Communicator

Yes, we also cleaned all the files to avoid reindexing. I was also thinking about the fishbucket, but had difficulty understanding how it would be related to a search...

Another possibility is that it was actually still the indexer that had a problem and was blocking some resources from the searches...
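
If anyone wants to verify that theory on a similar setup: a rough way to see whether the indexer is choking is to look for blocked queues in metrics.log, along the lines of the sketch below (this assumes the _internal index is intact, which of course it no longer was after our cleanup):

    index=_internal source=*metrics.log* group=queue blocked=true | stats count BY name

A steadily growing count for parsingqueue or indexqueue would point at the indexer rather than the search side.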
