Knowledge Management

How to create data capture from large datasets?

satyaallaparthi
Communicator

Hello, 

         Please help me with the below requirement.

         I need to capture usernames from 90 days' worth of data across large datasets that span multiple source types and multiple indexes.

          The search

          index=* sourcetype=* earliest=-90d@d latest=now
          | eval LOGIN = lower(user)
          | stats count by LOGIN, sourcetype

          is taking forever.

 

         Is there a better way to capture the 90 days' worth of usernames and source types without the search timing out?

        Note: I am able to schedule the search to capture them and append the results. However, I am not sure what time modifiers I should use if I want to capture a single day's worth of data per run, as a continuous daily process.
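For the daily scheduled approach described in the note, one common pattern is a search scheduled shortly after midnight that covers exactly the previous day and appends to a lookup. This is only a sketch; the lookup name users_90d.csv is hypothetical and would need to be created first:

```
earliest=-1d@d latest=@d index=* sourcetype=*
| eval LOGIN = lower(user)
| stats count by LOGIN, sourcetype
| eval day = strftime(relative_time(now(), "-1d@d"), "%Y-%m-%d")
| outputlookup append=true users_90d.csv
```

Because both modifiers snap to @d, consecutive daily runs neither overlap nor leave gaps. Trimming rows older than 90 days out of the lookup would be a separate housekeeping search.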

 

 

 


PickleRick
SplunkTrust

1. Searching across _all_ your data will always be slow. That's just the way it is. On a home test rig with hardly any data it might finish relatively quickly, but imagine running this on a system with a daily ingest rate of several terabytes; searching through raw data at that scale will not be fast. So the more you can limit your search (specify indexes, sourcetypes, maybe only some sources), the better. But if you simply have a lot of data, it will be slow no matter what.

2. There are several methods of making searches faster.

- Data model acceleration

- Summary indexing

- Report acceleration

Each of them has its pros and cons, so go to https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutsummaryindexing, read about each of them, and decide which is most appropriate for you.
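Of the three options above, summary indexing maps most directly onto the original question. As a sketch (assuming a summary index exists and the saved search is named, hypothetically, daily_user_summary), a search like this can be scheduled daily over the previous day with summary indexing enabled:

```
earliest=-1d@d latest=@d index=* sourcetype=*
| eval LOGIN = lower(user)
| sistats count by LOGIN, sourcetype
```

The 90-day rollup then reads the small summary index instead of the raw data, using regular stats over the si-generated events:

```
index=summary source="daily_user_summary" earliest=-90d@d
| stats count by LOGIN, sourcetype
```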


satyaallaparthi
Communicator

I tried Fast Mode.

However, I had an idea for capturing that large dataset: I am planning to create 3 separate scheduled reports with time modifiers earliest=-90d@d latest=-60d@d, earliest=-60d@d latest=-30d@d, and earliest=-30d@d latest=now, and then append all 3 CSV outputs into one using another scheduled report.
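A sketch of the combining report for that plan, assuming the three scheduled reports write hypothetical lookups users_part1.csv, users_part2.csv, and users_part3.csv via outputlookup:

```
| inputlookup users_part1.csv
| append [| inputlookup users_part2.csv]
| append [| inputlookup users_part3.csv]
| stats sum(count) as count by LOGIN, sourcetype
| outputlookup users_90d.csv
```

The final stats re-aggregates the rows, so a user seen in more than one 30-day window ends up with a single combined count rather than duplicate rows.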

 

Thanks, 


GaetanVP
Contributor

Hello @satyaallaparthi,

Even if you have a lot of indexes and sourcetypes, I would advise you to specify them in the query (or create event types to group hosts or sourcetypes) in order to have a wrapper and keep the query easy to read. index=* always makes searches really slow.
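As a sketch of the event type idea (the stanza name login_sources and the index/sourcetype values are purely hypothetical examples), an eventtypes.conf entry can bundle the relevant indexes and sourcetypes:

```
[login_sources]
search = (index=wineventlog sourcetype="WinEventLog:Security") OR (index=os sourcetype=linux_secure)
```

The original search then becomes easier to read and no longer scans every index:

```
eventtype=login_sources earliest=-90d@d latest=now
| eval LOGIN = lower(user)
| stats count by LOGIN, sourcetype
```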

I also think you could use Fast Mode for your search, if you aren't doing so already.

Let me know if that helped !

Good Luck

 

 
