Best way to index a file for once-off analysis

RVDowning
Contributor

What is the best way to index one or two files (user application files) for a one-time analysis? Should I create a new index, use a new source type, copy the file(s) into a directory on the server, and then use the web-based data upload?

Then just delete everything afterwards?

kristian_kolb
Ultra Champion

Yes, in general that is a good idea. There is no need to invent sourcetypes if the existing ones are good and relevant. Remember that the retention time of an index relates to the event timestamps, not to when the logs were ingested. If the retention time is too short, the events could be deleted before you have had time to do the analysis.
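To illustrate the retention point, here is a minimal sketch. It assumes a bucket becomes eligible for removal once its newest event is older than the index's retention period. The function name and the dates are hypothetical, chosen only for the example.

```python
from datetime import datetime, timedelta, timezone

def already_past_retention(newest_event_time, retention, now=None):
    """Return True if the newest event is already older than the retention window.

    Retention is measured against the event timestamp, not the ingestion time.
    """
    now = now or datetime.now(timezone.utc)
    return now - newest_event_time > retention

# Hypothetical values: analysis done on 10 June with a 2-day retention.
now = datetime(2024, 6, 10, tzinfo=timezone.utc)
retention = timedelta(days=2)

old_log_newest = datetime(2024, 6, 1, tzinfo=timezone.utc)        # 9 days old
fresh_log_newest = datetime(2024, 6, 9, 12, tzinfo=timezone.utc)  # 12 hours old

print(already_past_retention(old_log_newest, retention, now))    # True
print(already_past_retention(fresh_log_newest, retention, now))  # False
```

In this example the older file would be a candidate for deletion as soon as it was indexed, even though it had only just been uploaded.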

grijhwani
Motivator

1) If you have full access to the deployment, by far the simplest way is to create an index for the purpose, give yourself search rights to it, and delete the entire index once you are sure you are finished with the data. That keeps everything neatly encapsulated and makes releasing the storage a doddle. The alternative, selectively deleting the event data with the delete command, is laborious and risks selecting and deleting unexpected data. It also will not fully release the storage used until the surrounding index content expires and is expunged by housekeeping, because the data is not removed from storage as such, only de-indexed. Finally, it carries a high processing overhead, because each event's index entry is deleted in turn.

2) Bear in mind that whichever way you choose to go, deleting the data/index will not cancel the licence usage incurred. If the data is large in comparison with your licence cap you may cause yourself problems if this is a repeated occurrence.
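As a rough way to judge whether a one-off upload is "large in comparison with your licence cap", the arithmetic can be sketched as below. It assumes the raw file size approximates the volume counted against the licence. The quota and file size are hypothetical figures, not taken from any real deployment.

```python
GB = 1024 ** 3

def licence_fraction(file_bytes, daily_quota_bytes):
    """Return the share of the daily licence quota that one upload would consume."""
    return file_bytes / daily_quota_bytes

# Hypothetical figures: a 2 GB file against a 5 GB/day licence.
daily_quota = 5 * GB
file_size = 2 * GB

fraction = licence_fraction(file_size, daily_quota)
print(f"{fraction:.0%} of the daily licence quota")  # 40% of the daily licence quota
```

Whatever the share turns out to be, it stays counted for that day even if the index is deleted afterwards.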

RVDowning
Contributor

Most helpful, thanks much.

strive
Influencer

I asked about frequency because, if you do this regularly, you can create a separate index for it and set the data retention to 1 or 2 days. That way you need not delete or clean the index every time you finish your analysis.

We followed the same approach while trying out proofs of concept. We created a new index for this and called it temp_analysis_idx, and created a new sourcetype called temp_analysis_srctype. Everything else was as you described in your question. This kept us from messing up the other indexes. Since data retention was set to 2 days, the data was deleted automatically and we did not need to run any commands each time.
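A minimal sketch of what such a setup might look like, assuming the retention is set through the frozenTimePeriodInSecs setting in indexes.conf. The helper function is hypothetical, and the paths follow the usual $SPLUNK_DB layout as an assumption; adjust them to your deployment.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def render_index_stanza(name, retention_days):
    """Render an indexes.conf stanza for a short-lived analysis index."""
    frozen_secs = retention_days * SECONDS_PER_DAY
    return "\n".join([
        f"[{name}]",
        # Assumed paths, following the conventional $SPLUNK_DB layout.
        f"homePath = $SPLUNK_DB/{name}/db",
        f"coldPath = $SPLUNK_DB/{name}/colddb",
        f"thawedPath = $SPLUNK_DB/{name}/thaweddb",
        # Retention in seconds, measured against event timestamps.
        f"frozenTimePeriodInSecs = {frozen_secs}",
    ])

# Index name from the post above, with the 2-day retention it describes.
print(render_index_stanza("temp_analysis_idx", 2))
```

For a 2-day retention this yields frozenTimePeriodInSecs = 172800. With no archive destination configured, data that reaches that age is removed rather than archived.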

RVDowning
Contributor

Probably not often, but why would the frequency really matter?

strive
Influencer

That depends: how frequently do you do this kind of one-time analysis?
