Getting Data In

Indexing performance has slowed to a stop

Rob
Splunk Employee

Indexing performance has degraded to the point where my Splunk instance has stopped indexing events, or at least appears to have. Restarting Splunk does not resolve the problem, and I have not made any changes to my working configuration.

There are plenty of system resources available.

The only thing that seems out of the ordinary is that the Sources.data file for some of my indexes is somewhat large (5 MB - 140 MB).

Could this be causing an issue? If so, what can I do to bring my indexing back to an acceptable level?
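To see which indexes are affected, you can look for oversized global metadata files under the index root. This is a minimal sketch assuming a default standalone layout; adjust the path for your deployment.

```shell
# Find global metadata files larger than 50 MB under the index root.
# Assumes the default $SPLUNK_HOME layout; the /opt/splunk fallback is illustrative.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"
find "$SPLUNK_HOME/var/lib/splunk" -maxdepth 3 -name '*.data' -size +50M \
    -exec ls -lh {} \; 2>/dev/null
```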

1 Solution

yannK
Splunk Employee

You are correct, the large global metadata files are the culprit.

Sources.data is the global metadata file for sources, stored in the root folder of each index.
If such a file grows larger than about 50 MB, you will start to notice a performance impact.
The consequence is slow indexing, because those large files are manipulated at index time.

Which files are impacted depends on the type of data you are receiving.
For example, for the main index, the files are:

$SPLUNK_HOME/var/lib/splunk/defaultdb/db/Sources.data
$SPLUNK_HOME/var/lib/splunk/defaultdb/db/SourceTypes.data
$SPLUNK_HOME/var/lib/splunk/defaultdb/db/Hosts.data

  • Sources.data grows too large when you have a very large number of sources. In your case, it seems that each of your log files is a new source because they use incremental names, so a new line is added to Sources.data for each new source. Example: source::/opt/var/log/mylogog-archive/event_recorder/2012-11-18_2000/4759297159506607
  • SourceTypes.data grows when your inputs do not specify a sourcetype for your logs, so Splunk creates a new one automatically for each new source.
  • Hosts.data may grow when the host is extracted from a changing field. For example, with the syslog sourcetype the host is extracted from the events, from the word after the timestamp. So if your events are not valid syslog, the host will be effectively random.
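All three causes above can be addressed at the input layer by pinning the metadata fields explicitly. A sketch in inputs.conf, where the monitor path, host, and source values are illustrative assumptions, not taken from the original post:

```ini
# inputs.conf -- illustrative example; stanza path and values are assumptions
[monitor:///opt/var/log/mylogog-archive/event_recorder]
sourcetype = event_recorder       # avoid one auto-generated sourcetype per file
host = recorder01                 # fixed host instead of extracting it from events
source = event_recorder_archive   # collapse many rotating file names into one source
```

Overriding source this way trades per-file granularity for a bounded Sources.data, so only do it where the individual file path carries no search value.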

The simple workaround is to upgrade to Splunk version 5.x or later, because this global metadata feature has been removed since 5.0 (see disableGlobalMetadata = true in indexes.conf).
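On versions where the setting is available, the behavior can also be set per index in indexes.conf. A minimal sketch, assuming the main index; in 5.x and later this is already the default:

```ini
# indexes.conf -- illustrative; disables global metadata files for this index
[main]
disableGlobalMetadata = true
```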

Important remarks:

  • Do not try to trim those files manually; before 5.x they are automatically recreated.
  • Never touch the *.data files in the bucket folders; they are critical for search.


yannK
Splunk Employee

Changing the sources or sourcetypes will prevent the issue from growing over the long term, but it will not improve indexing speed, unless you can roll the previous buckets to frozen.

Paolo_Prigione
Builder

It would probably help only if you rolled the buckets with big *.data files to frozen (thus removing them from the searchable data).
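Rolling buckets to frozen is controlled per index in indexes.conf. A hedged sketch, where the retention values are examples only and not a recommendation from this thread:

```ini
# indexes.conf -- example retention settings; tune for your data and compliance needs
[main]
frozenTimePeriodInSecs = 7776000   # roll buckets to frozen after ~90 days
# coldToFrozenDir = /archive/main  # optional: archive frozen buckets instead of deleting
```

By default, frozen buckets are deleted; set coldToFrozenDir only if you need to keep an archive copy.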

Rob
Splunk Employee

Would it help if I reduced the number of sources and specified the sourcetype and the host for my inputs?
