Getting Data In

How to optimize real time search?

SplunkDash
Motivator

Hello,

I have a huge volume of data coming in under different sourcetypes (and indexes) for different applications/projects. In most cases, ACCOUNTID and IPAddress are the unique fields for each application/project. I need to perform real-time searches over a wide time range (30 days to All Time). How can I optimize these searches in real time? Any thoughts or recommendations would be highly appreciated. Thank you so much.

1 Solution

gcusello
Esteemed Legend

Hi @SplunkDash,

I confirm both of @PickleRick's answers: don't use real-time searches; used this way, they will kill your system!

Especially if the RT search is used by several users at the same time!

Having a huge quantity of data and needing to search over a large time period (30 days), the best approach is to schedule a search and save the results in a summary index (https://docs.splunk.com/Documentation/Splunk/8.2.6/Knowledge/Usesummaryindexing) or in an accelerated data model (https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutdatamodels).

By scheduling frequent updates of the summary index or the data model, you get a near real-time search that usually meets the needs of every customer I have encountered.
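As a minimal sketch (the index, sourcetype, and summary index names here are assumptions; ACCOUNTID and IPAddress come from the question), an hourly scheduled search can pre-aggregate events into a summary index:

```
index=app_index sourcetype=app_logs earliest=-1h@h latest=@h
| sistats count dc(IPAddress) BY ACCOUNTID
| collect index=summary_accounts
```

Your 30-day (or All Time) searches then run against the much smaller summary index, e.g. `index=summary_accounts | stats count dc(IPAddress) BY ACCOUNTID`, instead of scanning the raw events.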

Ciao.

Giuseppe


verbal_666
Contributor

Real-time searches are really very dangerous.

I disabled them for ALL users, since they produce a massive number of searches per minute and the indexers suddenly get stressed, even if the hardware environment is good enough. This is especially true if users are not so "clever", and many of them with admin/power grants use this feature at the same time, or badly.

Go into your SHs and other instances, edit the Power role and remove the "rtsearch" capability, which the Admin role inherits from it.
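For example, a sketch of how the capability can be disabled in authorize.conf (place it in an app's local directory on the search heads):

```
# local/authorize.conf
[role_power]
# Explicitly disable real-time searches for the power role.
# Roles that import power (e.g. admin) lose the capability too,
# unless they re-enable it themselves.
rtsearch = disabled
```

The same change can be made from the UI under Settings > Roles by unchecking the rtsearch capability.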


SplunkDash
Motivator

Hello @gcusello and @PickleRick 

Thank you so much, I appreciate your quick response and support in these efforts. I agree; however, even if we use a summary index/saved search/scheduled search/data model based on your recommended approaches, it's still hundreds of millions of events for each of the apps/sourcetypes/indexes. Do you think I can optimize these searches further by creating indexed fields for each of the apps/sourcetypes/indexes with unique attributes like ACCOUNTID or IPAddress? Thank you so much again, your recommendations will be highly appreciated.


PickleRick
Ultra Champion

As @gcusello said - use data models and turn on acceleration for them. Firstly, a data model gives you a great way to normalize your data. You no longer have to remember what a particular field was called in a particular sourcetype. When your data is normalized into a data model, you can just query the data model and get results from all the appropriate indexes/sources/sourcetypes. And acceleration makes those queries very fast _without altering the original data_.
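For instance, a sketch of such a query (assuming a hypothetical accelerated data model named App_Activity containing ACCOUNTID and IPAddress fields), using tstats against the acceleration summaries:

```
| tstats summariesonly=true count
    FROM datamodel=App_Activity
    WHERE App_Activity.ACCOUNTID="12345"
    BY App_Activity.IPAddress
```

`summariesonly=true` restricts the search to the pre-built acceleration summaries, which is where the speed comes from.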

That's the biggest advantage over indexed fields. Indexed fields are indeed very fast, but they are extracted at index time and are immutable - you can't, for example, change how they are parsed from the raw event after they have been indexed.

Data models, however, have one drawback worth knowing: you can't differentiate access permissions for the parts of a data model that come from different source indexes.

gcusello
Esteemed Legend

Hi @SplunkDash,

as I said, try summary indexes or (better) accelerated data models and you'll see that even with millions of records you'll have quick searches.

I used summary indexes to run searches on the proxy logs of a bank (around 1.5 million records every day) and the result was very performant!

Obviously, you have to optimize your searches (e.g. avoid the transaction or join commands) and put in the summary/data model only the fields you need for your searches, not the whole _raw.

Then (if possible) you could move some of the calculation into the summary/data model preparation (e.g. using the stats or timechart commands); by storing those results, you'll have more performant searches.
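For example, a sketch of such a preparation search (index and field names are assumptions), keeping only the needed fields and pre-aggregating before writing to the summary:

```
index=proxy earliest=-1h@h latest=@h
| fields ACCOUNTID, IPAddress, bytes
| stats sum(bytes) AS total_bytes, count BY ACCOUNTID, IPAddress
| collect index=summary_proxy
```

The later searches on summary_proxy then only re-aggregate these small pre-computed rows rather than the raw events.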

Ciao.

Giuseppe

SplunkDash
Motivator

Hello @gcusello and @PickleRick ,

Thank you so much again, these are extremely helpful for optimizing my search.

I have 2 questions on indexed fields:

1. Do I need to perform index-time field extraction to create/use indexed fields? If so, would there be any performance issues, since Splunk always recommends search-time field extraction over index-time field extraction to avoid computational overhead?

2. If I need to use index-time field extraction, do I need to have a HF installed on the Deployment Server, since as far as I know index-time field extraction doesn't work on a UF?

Thank you so much again, I appreciate your support in these efforts.

 


gcusello
Esteemed Legend

Hi @SplunkDash,

these are questions on a different topic, and I suggest (for next time) opening a separate question; that way you can be sure your question will be answered by more people, faster and (probably) better.

Anyway, answering your questions:

1)

The choice between extraction at index time and at search time depends on the quantity of data you have and the searches you usually run.

In other words, you have to decide whether to put the load of field extraction in the indexing phase or in the search phase.

Usually, if you index many logs, it's better to extract fields at search time to avoid additional load on the indexers; if instead you index few logs and run many searches, index-time extraction could be the better choice.

I rarely use index-time extraction!
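If you do decide to go index-time, a minimal sketch looks like this (the sourcetype, regex, and field names are assumptions; props.conf and transforms.conf go on the parsing tier, fields.conf on the search heads):

```
# props.conf (indexers or heavy forwarders)
[app_sourcetype]
TRANSFORMS-accountid = extract_accountid

# transforms.conf
[extract_accountid]
REGEX = ACCOUNTID=(\w+)
FORMAT = ACCOUNTID::$1
WRITE_META = true

# fields.conf (search heads, so ACCOUNTID is searched as an indexed field)
[ACCOUNTID]
INDEXED = true
```

Without the fields.conf entry, the search heads would still try to treat ACCOUNTID as a search-time field and searches on it could return incomplete results.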

2)

Index-time extractions are performed on the indexers or (when present) on heavy forwarders, so how you distribute apps to forwarders isn't important here.

Anyway, I suggest always using the Deployment Server to distribute apps to forwarders (UFs or HFs).

Please accept an answer for the other people of the Community.

Ciao and happy splunking.

Giuseppe

P.S.: Karma Points are appreciated by all the Contributors 😉

PickleRick
Ultra Champion

The indexed-fields use case also depends on data quality 🙂

Due to how Splunk works internally, indexed fields can speed up searches significantly in cases where, because of how the data contents are built and how they are searched, other acceleration methods are not really feasible. But that's a topic for a completely different discussion.

Anyway, indexed fields are really rarely used, because with indexed fields you lose the flexibility that is one of Splunk's core features. Theoretically, you could parse your whole event into indexed fields and store it that way, but then why use Splunk at all?

SplunkDash
Motivator

Hello @gcusello ,

OK... that sounds great to me. Just one quick question: by proxy logs, did you mean predefined or stored/saved logs? Thank you so much again!


gcusello
Esteemed Legend

Hi @SplunkDash,

I had BlueCoat logs, around 1.5 million every day.

I scheduled a search to update a summary index every hour and then ran all my searches on the summary index; the original logs remained available for further investigation.
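Such an hourly schedule can be expressed in savedsearches.conf, roughly like this (the search, index, and stanza names are assumptions, not my actual configuration):

```
# savedsearches.conf (sketch)
[hourly_proxy_summary]
enableSched = 1
cron_schedule = 5 * * * *
dispatch.earliest_time = -1h@h
dispatch.latest_time = @h
search = index=proxy | sistats count BY ACCOUNTID | collect index=summary_proxy
```

Running at five minutes past the hour over the previous whole hour gives late-arriving events a little time to be indexed before they are summarized.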

You can decide to store both the original and the summary logs, or only the latter and a few of the former; it depends on your requirements.

Ciao.

Giuseppe

PickleRick
Ultra Champion

The quick answer is: don't use real-time searches.

The longer answer is: a real-time search permanently occupies one CPU core on every indexer, so it limits your overall search capacity. You are also limited in what you can do in such a search (which commands you can use).

Also, if you're interested in data from 30 days back, it's highly unlikely that a real-time search makes much difference here.
