Knowledge Management

How to capture usernames from large datasets?

satyaallaparthi
Communicator

Hello, 

         Please help me with the below requirement.

         I need to capture usernames from 90 days' worth of data across large datasets that include multiple source types and multiple indexes.

          The search below is taking forever:

          index=* sourcetype=* earliest=-90d@d latest=now
          | eval LOGIN = lower(user)
          | stats count by LOGIN sourcetype

 

         Is there a better way to capture the 90 days' worth of usernames and source types without a timeout?

        Note: I can schedule the search to capture them and append the results. However, I am not sure what time modifiers I should use if I want to capture all of them in a single day, and that should be a continuous process every day.
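For reference, a daily capture along those lines could look like the sketch below: a search scheduled once per day that summarizes the previous day's logins into a summary index. The index name summary_logins is an assumption, and the summary index must already exist.

```
index=* sourcetype=* earliest=-1d@d latest=@d
| eval LOGIN = lower(user)
| stats count by LOGIN sourcetype
| collect index=summary_logins
```

The rolling 90-day view then reads from the small summary index instead of raw data, e.g. `index=summary_logins earliest=-90d@d | stats sum(count) as count by LOGIN sourcetype`.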

 

 

 


PickleRick
SplunkTrust

1. Searching across _all_ your data will always be slow. That's just the way it is. On a home test rig with hardly any data it might finish relatively quickly, but imagine running this on a system with a daily ingest rate of several terabytes; it won't be fast if you're searching through raw data. So the more you can limit your search (specify indexes, sourcetypes, maybe only some sources), the better. But if you simply have a lot of data, it will be slow no matter what.

2. There are several methods of making searches faster.

- Data model acceleration

- Summary indexing

- Report acceleration

Each of them has its pros and cons, so go to https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutsummaryindexing, read about each of them, and decide which is most appropriate for you.
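As an illustration of the first option: if an accelerated data model covers these events (the CIM Authentication data model is assumed here; field names depend on which model your data is mapped to), tstats reads the acceleration summaries instead of raw events, which is typically far faster over 90 days:

```
| tstats count from datamodel=Authentication where earliest=-90d@d latest=now by Authentication.user, sourcetype
| rename Authentication.user as LOGIN
| eval LOGIN = lower(LOGIN)
```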


satyaallaparthi
Communicator

I tried in fast mode.

However, I got an idea to capture that large dataset: I am planning to create 3 separate scheduled reports with time modifiers earliest=-90d@d latest=-60d@d, earliest=-60d@d latest=-30d@d, and earliest=-30d@d latest=now, and then append all 3 CSV outputs into one using another scheduled report.
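If each of the three reports writes its results out with outputlookup, the combining report could look like this sketch (the lookup file names are assumptions):

```
| inputlookup logins_90d_60d.csv
| append [| inputlookup logins_60d_30d.csv]
| append [| inputlookup logins_30d_now.csv]
| stats sum(count) as count by LOGIN sourcetype
```

The final stats merges any LOGIN/sourcetype pairs that appear in more than one time window.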

 

Thanks, 


GaetanVP
Contributor

Hello @satyaallaparthi,

Even if you have a lot of indexes and sourcetypes, I would advise you to specify them in the query (or create eventtypes to group hosts or sourcetypes) in order to have a wrapper and keep the query easy to read. index=* always makes searches really slow.
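For example, an eventtype defined in eventtypes.conf can wrap the index/sourcetype list (the stanza name and the index/sourcetype pairs below are hypothetical):

```
[auth_logs]
search = (index=wineventlog sourcetype="WinEventLog:Security") OR (index=os_linux sourcetype=linux_secure)
```

The original search then becomes `eventtype=auth_logs earliest=-90d@d latest=now | eval LOGIN=lower(user) | stats count by LOGIN sourcetype`, which stays readable as more sources are added.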

I also think that you could use "Fast Mode" for your search if you haven't already.

Let me know if that helped!

Good Luck

 

 
