Getting Data In

How to improve performance on data models with additional indexers or search heads

robertlynch2020
Motivator

Hi

I have been looking at this doc on Capacity Planning Manual
http://docs.splunk.com/Documentation/Splunk/7.1.0/Capacity/Summaryofperformancerecommendations

But I am not sure what to do. All my searches run against data models, and I have one indexer and one search head.
My CPU does not go over 20% on a 58-core box, and I run some very large searches. So the question is: what should I do to get faster search performance and to scale for more users?

Thanks
Robert Lynch

woodcock
Esteemed Legend

Also, if you are peered to different indexer tiers, or have data imbalance across what should be identical indexer hardware/configurations, you will never reach 100% accelerated without turning on this setting (which defaults to false):

acceleration.poll_buckets_until_maxtime = true
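For context, this setting lives under the accelerated data model's stanza in datamodels.conf. A minimal sketch, assuming a data model named MXTIMING_V8_5_Seconds (the name used later in this thread):

```
# datamodels.conf -- sketch; stanza name must match your data model
[MXTIMING_V8_5_Seconds]
acceleration = true
acceleration.earliest_time = -3mon
# Keep polling buckets for summary data until max_time is reached,
# which helps the summary reach 100% across uneven indexers:
acceleration.poll_buckets_until_maxtime = true
```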

robertlynch2020
Motivator

Hi

Thanks for the answer. As I have one indexer and one search head, is this setting usable for me? In fact, this is what I am asking: should I increase my number of indexers and search heads to 2?

http://docs.splunk.com/Documentation/Splunk/7.1.0/Capacity/Summaryofperformancerecommendations

If I am reading this doc correctly, I think I should go to 2 indexers and 1 search head.

I have up to 100GB a day and 20 users.

Thanks
Robert


woodcock
Esteemed Legend

Did you set the cim_<DM>_index macros for the DMs that you accelerated? You can also add sourcetypes there if your index values are improperly partitioned and have far too many sourcetypes in each index. You should also look at proper parallelization settings:

http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Parallelization
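As a hedged sketch of what those macros look like: in the Splunk Common Information Model add-on, each data model has an index-constraint macro (named on the pattern cim_<DataModel>_indexes) that you can override in macros.conf so accelerations only scan the relevant indexes. The macro name and index below are illustrative, not from this thread:

```
# macros.conf (e.g. in Splunk_SA_CIM/local) -- illustrative sketch
# Restrict the Authentication data model to the index that
# actually holds its events, instead of scanning everything:
[cim_Authentication_indexes]
definition = (index=auth_logs)
```

After changing the macro, the data model summary only has to scan the listed index, which can cut acceleration and tstats search time considerably.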

robertlynch2020
Motivator

Hi Woodcock (Thanks for your help)

I am not sure what the cim_<DM>_index macro is. How do I set it?

I have also added the settings from http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Parallelization, but I did not really see any improvement.

server.conf
[general]
parallelIngestionPipeline = 2

limits.conf
[search]
batch_search_max_pipeline = 4
batch_search_max_results_aggregator_queue_size = 300
batch_search_max_serialized_results_queue_size = 300

datamodels.conf
[MXTIMING_V8_5_Seconds]
acceleration = 1
acceleration.earliest_time = -3mon
acceleration.hunk.dfs_block_size = 0
acceleration.manual_rebuilds = 0
acceleration.poll_buckets_until_maxtime = 0
acceleration.max_concurrent = 3


Example of the tstats search:
| tstats summariesonly=true
    max(MXTIMING.Elapsed) AS Elapsed
    max(MXTIMING.CPU) AS CPU
    max(MXTIMING.CPU_PER) AS CPU_PER
    values(MXTIMING.RDB_COM1) AS RDB_COM
    values(MXTIMING.RDB_COM_PER1) AS RDB_COM_PER
    max(MXTIMING.Memory) AS Memory
    max(MXTIMING.Elapsed_C) AS Elapsed_C
    values(source) AS source_MXTIMING
    avg(MXTIMING.Elapsed) AS average
    count(MXTIMING.Elapsed) AS count
    stdev(MXTIMING.Elapsed) AS stdev
    median(MXTIMING.Elapsed) AS median
    exactperc95(MXTIMING.Elapsed) AS perc95
    exactperc99.5(MXTIMING.Elapsed) AS perc99.5
    min(MXTIMING.Elapsed) AS min
    earliest(_time) AS start
    latest(_time) AS stop
    FROM datamodel=MXTIMING_V8_5_Seconds
    WHERE host=QCST_RSAT_40 AND MXTIMING.Elapsed > 5
    GROUPBY _time MXTIMING.Machine_Name MXTIMING.Context+Command MXTIMING.NPID MXTIMING.Date MXTIMING.Time MXTIMING.MXTIMING_TYPE_DM source MXTIMING.UserName2 MXTIMING.source_path MXTIMING.Command3 MXTIMING.Context3 span=1s
| rename MXTIMING.Context+Command as Context+Command
| rename MXTIMING.NPID as NPID
| rename MXTIMING.MXTIMING_TYPE_DM as TYPE
| rename MXTIMING.Date as Date
| rename MXTIMING.Time as Time
| rename MXTIMING.Machine_Name as Machine_Name
| rename MXTIMING.UserName2 as UserName
| rename MXTIMING.source_path as source_path
| eval Date=strftime(strptime(Date,"%Y%m%d"),"%d/%m/%Y")
| eval Time = Date." ".Time
| eval FULL_EVENT=Elapsed_C
| eval FULL_EVENT=replace(FULL_EVENT,"\d+.\d+","FULL_EVENT") | join Machine_Name NPID type=left [| tstats summariesonly=true count(SERVICE.NPID) AS count2 values(source) AS source_SERVICES FROM datamodel=SERVICE_V5 WHERE ( host=QCST_RSAT_40 earliest=1525770000 latest=1525859918) AND SERVICE.NICKNAME IN ()
GROUPBY SERVICE.Machine_Name SERVICE.NICKNAME SERVICE.NPID
| rename SERVICE.NPID AS NPID
| rename SERVICE.NICKNAME AS NICKNAME
| rename SERVICE.Machine_Name as Machine_Name
| table NICKNAME NPID source_SERVICES Machine_Name ]
| lookup MXTIMING_lookup_Base Context_Command AS "Context+Command" Type as "TYPE" OUTPUT Tags CC_Description Threshold Alert

| appendpipe
[ | where isnull(Threshold)
| rename TYPE AS BACKUP_TYPE
| eval TYPE="
"

| lookup MXTIMING_lookup_Base Context_Command AS "Context+Command" Type as "TYPE" OUTPUT Tags CC_Description Threshold Alert
| rename BACKUP_TYPE AS TYPE]
| dedup Time, NPID,Context+Command
| where Elapsed > Threshold OR isnull('Threshold')
| fillnull Tags
| eval Tags=if(Tags=0,"PLEASE_ADD_TAG",Tags)
| makemv Tags delim=","
| eval Tags=split(Tags,",")
| search Tags IN (*)
| eval source_SERVICES_count=mvcount(split(source_SERVICES, " ")) | eval NICKNAME=if(source_SERVICES_count > 1, "MULTIPLE_OPTIONS_FOUND",NICKNAME) | search
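As an aside, one quick way to check whether the acceleration summary is actually complete (and whether summariesonly=true is silently dropping events) is to compare counts with and without the summary restriction. A sketch against the data model from this thread:

```
| tstats summariesonly=true count FROM datamodel=MXTIMING_V8_5_Seconds
| appendcols
    [| tstats summariesonly=false count AS total FROM datamodel=MXTIMING_V8_5_Seconds]
| eval pct_summarized=round(100*count/total,1)
```

If pct_summarized stays well below 100 over the acceleration window, the summary is lagging and the poll_buckets_until_maxtime setting above becomes relevant.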

