Knowledge Management

How to create data capture from large datasets?

satyaallaparthi
Communicator

Hello, 

         Please help me with the below requirement.

         I need to capture usernames from 90 days' worth of data across large datasets that span multiple source types and multiple indexes.

          The search

          index=* sourcetype=* earliest=-90d@d latest=now
          | eval LOGIN = lower(user)
          | stats count by LOGIN, sourcetype

          is taking forever.

 

         Is there a better way to capture the 90 days' worth of usernames and source types without the search timing out?

        Note: I am able to schedule the search to capture them and append the results. However, I am not sure what time modifiers I should use if I want to capture a single day's worth of data per run, as a continuous daily process.
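For the daily scheduled approach described in the note, one common pattern is a search scheduled shortly after midnight that covers exactly the previous day and appends to a lookup. This is only a sketch; the lookup name users_90d.csv is hypothetical and would need to be created first:

```
earliest=-1d@d latest=@d index=* sourcetype=*
| eval LOGIN = lower(user)
| stats count by LOGIN, sourcetype
| eval day = strftime(relative_time(now(), "-1d@d"), "%Y-%m-%d")
| outputlookup append=true users_90d.csv
```

Because both modifiers snap to @d, consecutive daily runs neither overlap nor leave gaps. Trimming rows older than 90 days out of the lookup would be a separate housekeeping search.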

 

 

 


PickleRick
SplunkTrust

1. Searching across _all_ your data will always be slow. That's just the way it is. On a home test rig with hardly any data it might finish relatively quickly, but imagine running this on a system with a daily ingest rate of several terabytes; searching through raw data at that scale will not be fast. So the more you can limit your search (specify indexes, sourcetypes, maybe only some sources), the better. But if you simply have a lot of data, it will be slow no matter what.

2. There are several methods of making searches faster.

- Data model acceleration

- Summary indexing

- Report acceleration

Each of them has its pros and cons, so go to https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutsummaryindexing, read about each of them, and decide which is most appropriate for you.
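Of the three options above, summary indexing maps most directly onto the original question. As a sketch (assuming a summary index exists and the saved search is named, hypothetically, daily_user_summary), a search like this can be scheduled daily over the previous day with summary indexing enabled:

```
earliest=-1d@d latest=@d index=* sourcetype=*
| eval LOGIN = lower(user)
| sistats count by LOGIN, sourcetype
```

The 90-day rollup then reads the small summary index instead of the raw data, using regular stats over the si-generated events:

```
index=summary source="daily_user_summary" earliest=-90d@d
| stats count by LOGIN, sourcetype
```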


satyaallaparthi
Communicator

I tried Fast Mode.

However, I had an idea for capturing that large dataset: I am planning to create 3 separate scheduled reports with time modifiers earliest=-90d@d latest=-60d@d, earliest=-60d@d latest=-30d@d, and earliest=-30d@d latest=now, and then append all 3 CSV outputs into one using another scheduled report.
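A sketch of the combining report for that plan, assuming the three scheduled reports write hypothetical lookups users_part1.csv, users_part2.csv, and users_part3.csv via outputlookup:

```
| inputlookup users_part1.csv
| append [| inputlookup users_part2.csv]
| append [| inputlookup users_part3.csv]
| stats sum(count) as count by LOGIN, sourcetype
| outputlookup users_90d.csv
```

The final stats re-aggregates the rows, so a user seen in more than one 30-day window ends up with a single combined count rather than duplicate rows.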

 

Thanks, 


GaetanVP
Contributor

Hello @satyaallaparthi,

Even if you have a lot of indexes and sourcetypes, I would advise you to specify them in the query (or create event types to group hosts or sourcetypes) in order to have a wrapper and keep the query easy to read. index=* always makes searches really slow.
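As a sketch of the event type idea (the stanza name login_sources and the index/sourcetype values are purely hypothetical examples), an eventtypes.conf entry can bundle the relevant indexes and sourcetypes:

```
[login_sources]
search = (index=wineventlog sourcetype="WinEventLog:Security") OR (index=os sourcetype=linux_secure)
```

The original search then becomes easier to read and no longer scans every index:

```
eventtype=login_sources earliest=-90d@d latest=now
| eval LOGIN = lower(user)
| stats count by LOGIN, sourcetype
```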

I also think you could use Fast Mode for your search, if you aren't doing so already.

Let me know if that helped !

Good Luck

 

 
