Monitoring Splunk

Need Help Troubleshooting Poor SplunkWeb Performance

Path Finder

Hi Folks,

I could use some pointers troubleshooting some Splunk Web performance issues.

Over the last few weeks, our team has noticed that Splunk Web has become somewhat unresponsive. People would submit searches, but the UI would just sit there and not actually do anything, sometimes for up to a minute.

So in true Splunk style, I thought I'd use Splunk itself to try and investigate.

I've started with the web_service.log file, using this basic search:

index=_internal source="/opt/splunk/var/log/splunk/web_service.log" WARNING earliest=-60d | timechart count

This shows that from around the end of August, we've been getting over 10,000 'WARNING' messages per day. (I'd love to insert the chart, but I can't).

These messages are all similar to:

An unknown view name "xxxxxx" is referenced in the navigation definition for "search"

Using this search:

index=_internal source="/opt/splunk/var/log/splunk/web_service.log" WARNING earliest=-60d | rex "view:[\d]{3}\s-\s(?<body>[^$]*)" | stats count by body

The breakdown is as follows:

An unknown view name "index_status" is referenced in the navigation definition for "search".
An unknown view name "indexing_volume" is referenced in the navigation definition for "search".
An unknown view name "inputs_status" is referenced in the navigation definition for "search".
An unknown view name "pdf_activity" is referenced in the navigation definition for "search".
An unknown view name "scheduler_savedsearch" is referenced in the navigation definition for "search".
An unknown view name "scheduler_status" is referenced in the navigation definition for "search".
An unknown view name "scheduler_user_app" is referenced in the navigation definition for "search".
An unknown view name "search_status" is referenced in the navigation definition for "search".
An unknown view name "splunkd_status" is referenced in the navigation definition for "search".
An unknown view name "splunkweb_status" is referenced in the navigation definition for "search".

Each of these messages appears 19,291 times.
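My guess is that these warnings fire every time the navigation is rendered, because the search app's navigation XML still references views that no longer exist (possibly removed by an upgrade). To check which views actually exist in the search app, you could compare the names in the warnings against a REST search like this (this is a sketch using the standard /services/data/ui/views endpoint; field names assumed):

| rest /services/data/ui/views splunk_server=local | search eai:acl.app=search | table title eai:acl.app

Any view named in the navigation file (e.g. /opt/splunk/etc/apps/search/default/data/ui/nav/default.xml, or a local override under etc/apps/search/local/data/ui/nav/) that doesn't show up in that list is a stale reference; removing those entries from the nav XML should stop the warnings.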

I tried to query my _internal index a bit more, but noticed a slight issue. Although this Splunk Server has been running since April 2010, this index doesn't have any events prior to 28th August.

This is a totally weird one!

The index size isn't anywhere near a size where events would be aged out. It's about 2GB with a limit of 500GB.
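To rule retention settings in or out, it may be worth inspecting what Splunk actually has configured for _internal, rather than assuming the defaults. A REST search along these lines should show the relevant limits (endpoint and field names are from the standard indexes endpoint, but treat this as a sketch):

| rest /services/data/indexes splunk_server=local | search title=_internal | table title frozenTimePeriodInSecs maxTotalDataSizeMB currentDBSizeMB

If frozenTimePeriodInSecs is much lower than expected, that would explain events disappearing long before the size limit is reached.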

All I can think is that an upgrade (probably 4.1.3 to 4.1.4) has somehow wiped the index?!

So any help on either of these points would be great.

The SplunkWeb / UI one is most pressing, but thoughts on the _internal index question would be useful too!

Thanks in advance,

Graham.

SplunkTrust

I would download and install Firebug, a Firefox extension. After it's installed and enabled, log into Splunk (using Firefox), open Firebug's panels, and switch to the 'Net' panel (you will have to enable it).

The Net panel will show you the HTTP requests and responses along with the time spent in each. This will quickly show you which requests are hanging Splunk Web for a few seconds, and which are blameless.

Splunk Employee

The _internal index has a default frozen time period of about 30 days, so anything older than that is eligible to be deleted.
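If longer retention is needed for _internal, one option is to override the frozen time period in a local indexes.conf. A minimal sketch, assuming the default $SPLUNK_HOME layout (a restart of splunkd is required for this to take effect):

# $SPLUNK_HOME/etc/system/local/indexes.conf
[_internal]
# extend retention from the ~30-day default to ~90 days (value in seconds)
frozenTimePeriodInSecs = 7776000

Keep in mind this only preserves events going forward; anything already frozen out is gone.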

0 Karma

Path Finder

Ahhh... OK, I didn't know that. That certainly clears up the second point. Now just the performance issue to work out!

Thanks very much for your reply.

0 Karma