Splunk Search

Is there a way to save the results for parts of a search so when I modify the tail end, I don't have to run the whole search?

CREVITCH
Path Finder

The following search takes a long time to run. Is there a way to save the results of part of a search so that when I modify the tail end I don't have to rerun the whole thing? I.e., can I save the results of user=* | dedup _raw and then run those saved results through subsequent searches?

user=* | dedup _raw | transaction user date_minute date_second

jeffland
SplunkTrust

To save an intermediate result, you could also use

some search | outputlookup temp.csv

and from here on start a new search with

| inputlookup temp.csv | continue search

If some search is a complex (time-consuming) search and you just want to play around with different ways of doing it in continue search, then this method will allow you to do so without any hassle. The only thing you may want to look out for is if the intermediate results are too numerous for a .csv file (say, some hundred thousand lines of result).
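
Applied to the search in the question, this might look like the sketch below. The field names event_time and raw are hypothetical. Copying _time and _raw into ordinary fields is a precaution, on the assumption that internal fields may not survive the trip through the lookup file. The sort is there because transaction expects events in descending time order.

```
user=* | dedup _raw
| eval event_time=_time, raw=_raw
| table event_time raw user date_minute date_second
| outputlookup temp.csv
```

A second search can then iterate on the tail end only:

```
| inputlookup temp.csv
| eval _time=event_time, _raw=raw
| sort 0 - _time
| transaction user date_minute date_second
```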

woodcock
Esteemed Legend

Use | outputcsv to send to disk and then use | inputcsv to pull back in. You can also use Tableau which has a Splunk connector so you can pull in your raw data and save to disk and then do all of the "stuff" to it from the disk image.
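
A minimal sketch of that round trip, with temp as a hypothetical file name. Note that outputcsv saves to the $SPLUNK_HOME/var/run/splunk/csv directory on the search head rather than to a lookups directory, so the file is meant to be read back with inputcsv, not used as a lookup table.

```
some search | outputcsv temp
```

and later, in a new search:

```
| inputcsv temp | continue search
```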

BernardEAI
Communicator

Thanks for this interesting suggestion. 

I have tried applying this, but I'm getting strange results: consecutive identical searches return different results. My suspicion is that different parts of the search are performed asynchronously, so that an earlier version of temp.csv is read before the new version is written.

Could this be possible? 

Note: I'm using "| inputlookup temp.csv" inside a subsearch. Maybe the subsearch is executed asynchronously with the main search?

UPDATE: after looking at the Splunk documentation on subsearches, I read this: "The subsearch is in square brackets and is run first." This explains the strange behaviour: the subsearch reads the old temp.csv before the main search has rewritten it.
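
To illustrate the pitfall, a hypothetical search of the following shape (index and sourcetype are placeholders) does not do what it appears to do. If the bracketed subsearch runs first, as the quoted documentation says, inputlookup reads whatever temp.csv held before this run, and only afterwards does the outer search overwrite the file:

```
index=foo sourcetype=bar user=*
| dedup _raw
| outputlookup temp.csv
| append [| inputlookup temp.csv]
```

Splitting the work into two separate searches, run one after the other, avoids the ordering problem: the first ends with | outputlookup temp.csv and the second starts with | inputlookup temp.csv.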

javiergn
Super Champion

Apply filtering as soon as possible and do not use transaction unless you have to.
Specify your index name and sourcetype because it will speed things up.
Also restrict your search by time using earliest and latest.

If you post the whole query I can try to be more specific:

index=foo sourcetype=bar user=* 
| fields user date_minute date_second
| stats list(user) by date_minute, date_second
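
If a time restriction is acceptable, the earliest and latest time modifiers can go directly in the base search. A sketch with the same placeholder index and sourcetype names and an arbitrary 24-hour window:

```
index=foo sourcetype=bar user=* earliest=-24h@h latest=now
| fields user date_minute date_second
| stats list(user) by date_minute, date_second
```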

Let me know if that helps

CREVITCH
Path Finder

If I only have one index and one sourcetype, will this speed things up? I want to look at all events, and not just within a time window.

Is there a way to reuse the results of a search?

javiergn
Super Champion

Even if there's only one index and one sourcetype it's always better to be as specific as possible and apply that filter as early as possible in your query.

You can reuse the results of a search in several ways, but the right one depends on what you are trying to achieve. If you give us more details we might be able to help.

For instance, you can use subsearches, outputcsv and inputcsv, collect, etc.
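
As a sketch of the collect option: the first search writes its results to a summary index, and later searches read from that index instead of repeating the expensive part. The index name my_summary is hypothetical, and the index would have to exist already. Field extractions tied to the original sourcetype may not apply to the collected events, so it is worth checking that fields such as user are still present.

```
index=foo sourcetype=bar user=* | dedup _raw | collect index=my_summary
```

and later, in a new search:

```
index=my_summary | continue search
```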

CREVITCH
Path Finder

The dedup _raw takes so long that I am hoping to store its result and pipe it to subsequent searches. I need this step because I have many duplicate events for some reason.

javiergn
Super Champion

But why do you need to dedup the whole RAW event if you are then only using the following three fields: user date_minute date_second?

Doesn't the following query work for you?

index=foo sourcetype=bar user=* 
 | fields user date_minute date_second
 | stats list(user) by date_minute, date_second

Or the alternative that uses values instead of list to remove duplicates:

index=foo sourcetype=bar user=* 
 | fields user date_minute date_second
 | stats values(user) by date_minute, date_second

somesoni2
Revered Legend

You'd probably achieve the same result by using just the stats command, which will be much faster. What is the search requirement here?

CREVITCH
Path Finder

I am looking to group events by transaction. Will the stats command do this for me?

I have a lot of events. By doing user=*, I narrow it to login events since they have a user field. I end up with duplicate events, so I run them through dedup. Finally, I am left with events, some of which belong together (e.g. password accepted and session opened). This is why I want to group them as transactions: I want to preserve the individual events, but also know the number of independent transactions.

It would be nice to know if there is a way to re-use the results of previous searches. Is there a way to do this?

somesoni2
Revered Legend

Which fields are you interested in? All of the fields, or just _raw?

As @javiergn mentioned, restrict your base search by specifying index/sourcetype/source etc. To remove duplicates and group events by user, date_minute, and date_second, try this stats option:

index=blah sourcetype=blah user=*
| stats latest(user) as user latest(date_minute) as date_minute latest(date_second) as date_second by _raw
| stats list(_raw) as _raw by user date_minute date_second

If you want to preserve more fields, add them to both stats commands in a similar way.
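
For example, to also carry the host field through (host is only an illustrative choice), both stats commands would gain one extra term:

```
index=blah sourcetype=blah user=*
| stats latest(user) as user latest(date_minute) as date_minute latest(date_second) as date_second latest(host) as host by _raw
| stats list(_raw) as _raw values(host) as host by user date_minute date_second
```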
