Knowledge Management

How to create data capture from large datasets?

satyaallaparthi
Communicator

Hello, 

Please help me with the below requirement.

I need to capture usernames from 90 days' worth of data from a large dataset that includes multiple source types and multiple indexes.

The search

    index=* sourcetype=* earliest=-90d@d latest=now
    | eval LOGIN = lower(user)
    | stats count by LOGIN sourcetype

is taking forever.

 

Is there a better way to capture the 90 days' worth of usernames and source types without the search timing out?

Note: I am able to schedule the search to capture them and append the results. However, I am not sure what time modifiers I should use if I want to capture all of them in a single day, and that should be a continuous process every day.

 

 

 


PickleRick
SplunkTrust

1. Searching across _all_ your data will always be slow. That's just the way it is. Sure, if it's your home test rig with hardly any data it might go relatively quickly, but imagine running this on a system with a daily ingest rate of several terabytes; it won't be fast if you're searching through raw data. So the more you can limit your search (specify indexes, sourcetypes, maybe only some sources), the better. But if you simply have a lot of data, it will be slow no matter what.

2. There are several methods of making searches faster.

- Data model acceleration

- Summary indexing

- Report acceleration

Each of them has its pros and cons, so go to https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutsummaryindexing, read about each of them, and decide which is most appropriate for you.
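As an illustrative sketch of the summary-indexing option (the summary index name login_summary and the example indexes/sourcetypes are assumptions, not from this thread): a daily scheduled search over just yesterday's data writes its results into a summary index with the collect command, so the 90-day picture is built up incrementally instead of being re-scanned each time.

```
index=security OR index=web sourcetype=linux_secure OR sourcetype=access_combined
    earliest=-1d@d latest=@d
| eval LOGIN = lower(user)
| stats count by LOGIN sourcetype
| collect index=login_summary
```

Reading 90 days back out of the summary is then cheap, for example: index=login_summary earliest=-90d@d | stats sum(count) as count by LOGIN sourcetype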


satyaallaparthi
Communicator

I tried it in Fast Mode.

However, I got an idea to capture that large set of data: I am planning to create 3 separate scheduled reports with time modifiers earliest=-90d@d latest=-60d@d, earliest=-60d@d latest=-30d@d, and earliest=-30d@d latest=now, and then append all 3 CSV outputs into one using another scheduled report.
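To sketch the appending step described above (the base search terms and the lookup file name login_90d.csv are assumptions): each of the three scheduled reports could write into the same CSV lookup with outputlookup append=true, with the other two reports differing only in their earliest/latest values.

```
index=security sourcetype=linux_secure earliest=-90d@d latest=-60d@d
| eval LOGIN = lower(user)
| stats count by LOGIN sourcetype
| outputlookup append=true login_90d.csv
```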

 

Thanks, 


GaetanVP
Contributor

Hello @satyaallaparthi,

Even if you have a lot of indexes and sourcetypes, I would advise you to specify them in the query (or create eventtypes to group hosts or sourcetypes) in order to have a wrapper and keep the query easy to read. Using index=* always makes searches really slow.
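As a sketch of the eventtype idea (the eventtype name login_sources and the indexes/sourcetypes inside it are hypothetical): define the grouping once in eventtypes.conf,

```
[login_sources]
search = index=security OR index=web sourcetype=linux_secure OR sourcetype=access_combined
```

and then search against the eventtype instead of index=* :

```
eventtype=login_sources earliest=-90d@d latest=now
| eval LOGIN = lower(user)
| stats count by LOGIN sourcetype
```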

I also think that you could use Fast Mode for your search if you haven't already.

Let me know if that helped!

Good Luck

 

 
