
Splunk freezing for other users when doing large base search.

Motivator

Hi

Other users are unable to open Splunk screens for up to a minute while one user is running a large base search.

We have one search head and one indexer.
The base search runs off data models and takes 2 minutes to complete.

While this search is running, other users can't even open Splunk to do anything.

Would increasing the number of search heads or indexers help?

Any help would be great, as I don't want to move away from the base search approach.

Thanks in advance
Robert Lynch


Re: Splunk freezing for other users when doing large base search.

Champion

Can you share your server configuration and the search query you are using?


Re: Splunk freezing for other users when doing large base search.

Motivator

Hi

The server is a total beast:
2 CPUs x 14 cores, with Hyper-Threading on: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, so 56 logical cores.
384 GB RAM
6 TB SSD
Red Hat

At the moment I can't get it to go over 20% CPU; that is another issue I am looking to improve.
I have one search head and one indexer, and I am thinking I might need to add more.

This is the base search. It's long, and when I run it over 200 million lines of data it can take 2 minutes to complete; this is when most other things freeze.

| tstats summariesonly=true max(MXTIMING.Elapsed) AS Elapsed max(MXTIMING.CPU) AS CPU max(MXTIMING.CPUPER) AS CPUPER values(MXTIMING.RDBCOM1) AS RDBCOM values(MXTIMING.RDBCOMPER1) AS RDBCOMPER max(MXTIMING.Memory) AS Memory max(MXTIMING.ElapsedC) AS ElapsedC values(source) AS sourceMXTIMING avg(MXTIMING.Elapsed) AS average, count(MXTIMING.Elapsed) AS count, stdev(MXTIMING.Elapsed) AS stdev, median(MXTIMING.Elapsed) AS median, exactperc95(MXTIMING.Elapsed) AS perc95, exactperc99.5(MXTIMING.Elapsed) AS perc99.5, min(MXTIMING.Elapsed) AS min, earliest(time) as start, latest(time) as stop
    FROM datamodel=MXTIMINGV85Seconds
    WHERE host=QCSTRSAT40 AND MXTIMING.Elapsed > 5
    GROUPBY time MXTIMING.MachineName MXTIMING.Context+Command MXTIMING.NPID MXTIMING.Date MXTIMING.Time MXTIMING.MXTIMINGTYPEDM source MXTIMING.UserName2 MXTIMING.sourcepath MXTIMING.Command3 MXTIMING.Context3 span=1s
| rename MXTIMING.Context+Command as Context+Command
| rename MXTIMING.NPID as NPID
| rename MXTIMING.MXTIMINGTYPEDM as TYPE
| rename MXTIMING.Date as Date
| rename MXTIMING.Time as Time
| rename MXTIMING.MachineName as MachineName
| rename MXTIMING.UserName2 as UserName
| rename MXTIMING.sourcepath as sourcepath
| eval Date=strftime(strptime(Date,"%Y%m%d"),"%d/%m/%Y")
| eval Time = Date." ".Time
| eval FULLEVENT=ElapsedC
| eval FULLEVENT=replace(FULLEVENT,"\d+.\d+","FULLEVENT")
| join MachineName NPID type=left
    [| tstats summariesonly=true count(SERVICE.NPID) AS count2 values(source) AS sourceSERVICES
        FROM datamodel=SERVICEV5
        WHERE ( host=QCSTRSAT40 earliest=1525269600 latest=1525357584) AND SERVICE.NICKNAME IN (*)
        GROUPBY SERVICE.MachineName SERVICE.NICKNAME SERVICE.NPID
    | rename SERVICE.NPID AS NPID
    | rename SERVICE.NICKNAME AS NICKNAME
    | rename SERVICE.MachineName as MachineName
    | table NICKNAME NPID sourceSERVICES MachineName ]
| lookup MXTIMINGlookupBase ContextCommand AS "Context+Command" Type as "TYPE" OUTPUT Tags CCDescription Threshold Alert
| appendpipe
    [| where isnull(Threshold)
    | rename TYPE AS BACKUPTYPE
    | eval TYPE="*"
    | lookup MXTIMINGlookupBase ContextCommand AS "Context+Command" Type as "TYPE" OUTPUT Tags CCDescription Threshold Alert
    | rename BACKUPTYPE AS TYPE]
| dedup Time, NPID, Context+Command
| where Elapsed > Threshold OR isnull('Threshold')
| fillnull Tags
| eval Tags=if(Tags=0,"PLEASEADDTAG",Tags)
| makemv Tags delim=","
| eval Tags=split(Tags,",")
| search Tags IN (*)
| eval sourceSERVICEScount=mvcount(split(sourceSERVICES, " "))
| eval NICKNAME=if(sourceSERVICEScount > 1, "MULTIPLEOPTIONS_FOUND", NICKNAME)
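
For reference, a base search like this is normally declared once in the dashboard's Simple XML and shared by the panels through the base attribute, so the tstats query runs a single time per dashboard load. The sketch below is illustrative only (the id, time range and panel query are placeholders, not taken from the actual dashboard):

<dashboard>
  <label>MXTIMING overview (illustrative)</label>
  <!-- Base search: the long tstats query above goes here and runs once -->
  <search id="mxtiming_base">
    <query>| tstats summariesonly=true ... FROM datamodel=MXTIMINGV85Seconds ...</query>
    <earliest>-24h@h</earliest>
    <latest>now</latest>
  </search>
  <row>
    <panel>
      <table>
        <!-- Post-process search: reuses the base results instead of re-running tstats -->
        <search base="mxtiming_base">
          <query>| stats count BY Tags</query>
        </search>
      </table>
    </panel>
  </row>
</dashboard>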


Re: Splunk freezing for other users when doing large base search.

SplunkTrust

Think about beefing up your servers to the recommended reference hardware. If you're already there, then I would add both search heads and indexers. What's your current hardware configuration, by the way?
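
If it is easier, the instance can report its own hardware through the REST API; something along these lines from the search bar should do it (assuming your role is allowed to call rest):

| rest /services/server/info
| table splunk_server version numberOfCores physicalMemoryMB os_name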


Re: Splunk freezing for other users when doing large base search.

Motivator

Hi

2 CPUs x 14 cores, with Hyper-Threading on: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, so 56 logical cores.
384 GB RAM
6 TB SSD
Red Hat

When you say increase, do you mean add more instances? At the moment I have one of each, and I think I need to add more.


Re: Splunk freezing for other users when doing large base search.

Path Finder

Run top on your search head while the user executes the search; I expect you will see utilization jump to 100%. This sounds like a resource utilization problem where your hardware cannot keep up with the demand.


Re: Splunk freezing for other users when doing large base search.

Motivator

Hi

I can't get the box over 20% CPU.

It's a big, big box, but I am looking to push it harder.
I think I might have to add indexers and search heads. I have not done that before, but I will try.
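
One thing worth ruling out before scaling out is the concurrent-search limit: a search head only runs roughly max_searches_per_cpu x number_of_cores + base_max_searches historical searches at once (limits.conf defaults), and anything beyond that queues, which can look exactly like a frozen UI on a box that is barely touching its CPU. A rough check against the search head's own metrics, assuming the usual metrics.log fields (adjust if your version logs them differently):

index=_internal source=*metrics.log group=search_concurrency
| timechart span=1m max(active_hist_searches) AS active_searches max(active_realtime_searches) AS active_rt_searches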


Re: Splunk freezing for other users when doing large base search.

Motivator

vmstat and top output while the search is running:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
19 0 0 907140 4304 375184512 0 0 28 104 0 3 4 1 95 0 0
15 0 0 3216580 4304 374960896 0 0 336 462 518492 24231 22 6 72 0 0
21 0 0 3744904 4304 374987200 0 0 0 29153 436836 30985 22 7 71 0 0
14 0 0 3243016 4304 375016384 0 0 0 17118 450371 29062 23 7 69 0 0
17 0 0 3439856 4304 375056768 0 0 0 427 517405 22007 24 6 70 0 0
15 0 0 3589468 4304 375072640 0 0 0 1417 499387 40596 24 5 71 0 0
15 0 0 3413696 4304 375063040 0 0 0 11410 473863 25533 23 5 72 0 0
25 0 0 3953944 4304 375152448 0 0 0 21670 492277 35373 23 6 70 0 0
13 0 0 4148080 4304 375221440 0 0 0 16422 373584 40051 20 5 75 0 0
6 0 0 4738224 4304 375207680 0 0 0 66 52522 22534 11 3 86 0 0
11 1 0 4355948 4304 375392992 0 0 0 78 54571 23997 10 4 86 0 0
10 0 0 4507112 4304 375828352 0 0 0 217 51837 17508 10 4 86 0 0
8 0 0 3708840 4304 375964480 0 0 0 46526 60016 17598 11 4 85 0 0
7 0 0 3552936 4304 376003744 0 0 0 87206 40021 10533 9 3 87 0 0
12 0 0 1958064 4304 376103168 0 0 0 806 56752 16889 23 6 71 0 0
10 0 0 2568120 4304 376076704 0 0 0 1843 60440 16156 15 4 81 0 0
9 0 0 2318016 4304 376160064 0 0 0 2604 51968 15052 14 3 82 0 0
8 0 0 2563528 4304 376129504 0 0 0 282 40177 15186 12 3 85 0 0
10 0 0 3183756 4304 376142976 0 0 0 37471 55299 14889 11 4 85 0 0

top - 15:47:42 up 2 days, 7:39, 4 users, load average: 14.84, 12.97, 10.54
Tasks: 658 total, 2 running, 656 sleeping, 0 stopped, 0 zombie
%Cpu(s): 15.5 us, 6.5 sy, 0.0 ni, 77.9 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 39595564+total, 977992 free, 54651488 used, 34032617+buff/cache
KiB Swap: 67108860 total, 67108860 free, 0 used. 33537020+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5254 autoeng+ 20 0 47.470g 0.036t 24340 S 680.1 9.7 777:05.55 splunkd
2274 autoeng+ 20 0 5913624 985.0m 12344 S 199.7 0.3 34:07.72 splunkd
661 autoeng+ 20 0 5807112 1.508g 12336 S 99.7 0.4 5:55.52 splunkd
40803 autoeng+ 20 0 5688312 1.542g 12412 S 99.7 0.4 7:18.46 splunkd
304 root 20 0 0 0 0 S 21.2 0.0 0:04.37 kswapd1
303 root 20 0 0 0 0 S 20.5 0.0 0:04.52 kswapd0
42315 autoeng+ 20 0 92588 41836 15156 R 8.3 0.0 0:00.25 splunkd
10272 autoeng+ 20 0 6034912 5.297g 12348 S 6.6 1.4 329:38.43 splunkd
19849 autoeng+ 20 0 2262496 1.902g 12376 S 6.3 0.5 274:43.51 splunkd
19818 autoeng+ 20 0 394720 152636 12348 S 4.3 0.0 138:57.36 splunkd
42264 autoeng+ 20 0 813372 19864 7740 S 4.3 0.0 0:00.13 java
41966 root 0 -20 0 0 0 S 2.0 0.0 1:49.34 kworker/6:0H
42261 autoeng+ 20 0 107900 13508 5096 S 1.7 0.0 0:00.05 python
19845 autoeng+ 20 0 292320 48484 12364 S 0.7 0.0 1:37.45 splunkd
25248 root 0 -20 0 0 0 S 0.7 0.0 2:06.24 kworker/9:2H
25469 root 20 0 155552 95620 95288 S 0.7 0.0 16:09.59 systemd-journal
39096 root 0 -20 0 0 0 S 0.7 0.0 0:03.88 kworker/10:2H
42073 autoeng+ 20 0 52772 2724 1440 R 0.7 0.0 0:00.14 top


Re: Splunk freezing for other users when doing large base search.

SplunkTrust

I've experienced a limitation on the number of sockets available. When a single dashboard opens many simultaneous searches, it can prevent other users from getting a socket for their dashboards.

To see if that is the case, ask the users who are "dead" to note any messages that they may receive.

Also, check the dashboard and see how the base search works. This may be a candidate for the technique where the base search runs and then saves the job ID of its results. The remaining searches, instead of using the base search as such, use loadjob with the ID that was returned.
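
As a rough Simple XML sketch of that technique (the token name and panel query here are invented for illustration): the base search runs once, its search ID is captured when the job finishes, and each panel reads the cached results back with loadjob instead of attaching to the base search directly.

<dashboard>
  <label>MXTIMING via loadjob (illustrative)</label>
  <search id="mxtiming_base">
    <query>| tstats summariesonly=true ... FROM datamodel=MXTIMINGV85Seconds ...</query>
    <earliest>-24h@h</earliest>
    <latest>now</latest>
    <!-- Capture the finished job's sid in a token -->
    <done>
      <set token="base_sid">$job.sid$</set>
    </done>
  </search>
  <row>
    <panel>
      <!-- Panel only renders once the sid token is set -->
      <table depends="$base_sid$">
        <search>
          <query>| loadjob $base_sid$ | stats count BY Tags</query>
        </search>
      </table>
    </panel>
  </row>
</dashboard>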


Re: Splunk freezing for other users when doing large base search.

Esteemed Legend

If this is the case, be aware that sockets are also inodes, so you may be suffering from inode exhaustion.
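
One quick way to check from inside Splunk is to see what splunkd itself reports about descriptor limits; a rough sketch, assuming standard _internal logging (the exact message text varies by version):

index=_internal sourcetype=splunkd (component=ulimit OR "Too many open files")
| stats count BY host, component, log_level

On the OS side, ulimit -n for the user running splunkd and df -i on the relevant filesystem show the raw limits.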
