Splunk Search

Is there a way to save the results for parts of a search so when I modify the tail end, I don't have to run the whole search?

CREVITCH
Path Finder

I am executing the following search and it is taking a long time to execute. Is there a way to save the results of parts of a search so that when I modify the tail end I don't have to run the whole search? That is, can I save the results of user=* | dedup _raw and then run those saved results through subsequent searches?

user=* | dedup _raw | transaction user date_minute date_second
1 Solution

jeffland
SplunkTrust
SplunkTrust

To save an intermediate result, you could also use

some search | outputlookup temp.csv

and from here on start a new search with

| inputlookup temp.csv | continue search

If some search is complex and time-consuming and you just want to play around with different ways of writing continue search, this method lets you do so without any hassle. The only thing to look out for is intermediate results that are too numerous for a .csv file (say, some hundred thousand result rows).
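Applied to the search from the question, this could look like the following sketch (temp.csv is just an example file name):

user=* | dedup _raw | outputlookup temp.csv

and then, in a new search:

| inputlookup temp.csv | transaction user date_minute date_second

Note that transaction expects events in descending time order, so a | sort - _time between inputlookup and transaction may be needed.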


woodcock
Esteemed Legend

Use | outputcsv to send results to disk and then | inputcsv to pull them back in. You can also use Tableau, which has a Splunk connector, so you can pull in your raw data, save it to disk, and then do all of the "stuff" to it from the saved copy.
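A minimal sketch of that approach using the search from the question (temp_results is just an example file name; outputcsv stores the file under $SPLUNK_HOME/var/run/splunk/csv by default):

user=* | dedup _raw | outputcsv temp_results

and later:

| inputcsv temp_results | transaction user date_minute date_second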



BernardEAI
Communicator

Thanks for this interesting suggestion. 

I have tried applying this, but I'm getting strange results. Consecutive identical searches return different results. My suspicion is that different parts of the search are performed asynchronously, so the data from an earlier version of temp.csv is read before the new version of temp.csv is written.

Could this be possible? 

Note: I'm using "| inputlookup temp.csv" inside a subsearch. Maybe the subsearch is executed asynchronously with the main search?

UPDATE: after looking at the Splunk documentation on subsearches, I read this: "The subsearch is in square brackets and is run first. " This explains the strange behaviour. 
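A sketch of the pattern that can bite here: in a single search like the one below, the bracketed subsearch runs first, so its inputlookup reads whatever temp.csv contained before the outer pipeline's outputlookup writes the new version (temp.csv as in the thread):

user=* | dedup _raw | outputlookup temp.csv | search [ | inputlookup temp.csv | fields user ]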


javiergn
Super Champion

Apply filtering as soon as possible and do not use transaction unless you have to.
Specify your index name and sourcetype because it will speed things up.
Also restrict your search by time using earliest and latest.

If you post the whole query I can try to be more specific:

index=foo sourcetype=bar user=* 
| fields user date_minute date_second
| stats list(user) by date_minute, date_second

Let me know if that helps
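The time restriction mentioned above could be added like this (index and sourcetype names are placeholders, and -24h is an arbitrary window):

index=foo sourcetype=bar user=* earliest=-24h latest=now
| fields user date_minute date_second
| stats list(user) by date_minute, date_second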

CREVITCH
Path Finder

If I only have one index and one sourcetype, will this speed things up? I want to look at all events, and not just within a time window.

Is there a way to reuse the results of a search?


javiergn
Super Champion

Even if there's only one index and one sourcetype it's always better to be as specific as possible and apply that filter as early as possible in your query.

You can reuse the results of a search in different ways, but it all depends on what you are trying to achieve. If you give us more details, we might be able to help.

For instance, you can use subsearches, outputcsv and inputcsv, collect, etc.
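For example, collect can write intermediate results into a summary index for later reuse (my_summary is a placeholder; the summary index must already exist):

user=* | dedup _raw | collect index=my_summary

and later:

index=my_summary | transaction user date_minute date_second

Depending on how fields are extracted at search time, some fields may need to be re-derived from the collected events.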


CREVITCH
Path Finder

The dedup _raw step takes so long that I am hoping to store its result and pipe it to subsequent searches. I need this step because I have many duplicate events for some reason.


javiergn
Super Champion

But why do you need to dedup the whole _raw event if you are then only using the following three fields: user, date_minute and date_second?

Doesn't the following query work for you?

index=foo sourcetype=bar user=* 
 | fields user date_minute date_second
 | stats list(user) by date_minute, date_second

Or the alternative that uses values instead of list to remove duplicates:

index=foo sourcetype=bar user=* 
 | fields user date_minute date_second
 | stats values(user) by date_minute, date_second

somesoni2
Revered Legend

You'd probably achieve the same result by using just the stats command, which will be much faster. What is the search requirement here?
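As a rough sketch of a stats-based alternative to transaction, assuming the fields from the question (index and sourcetype names are placeholders; values() instead of list() would also drop duplicates):

index=foo sourcetype=bar user=*
| stats list(_raw) as events count by user date_minute date_second

This keeps the individual events in the events field, gives a per-group count, and avoids transaction's memory constraints.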


CREVITCH
Path Finder

I am looking to group events by transaction. Will the stats command do this for me?

I have a lot of events. By searching user=*, I narrow it to login events, since they have a user field. I end up with duplicate events, which I remove with dedup. Finally I am left with events, some of which group together (i.e. password accepted and session opened). This is why I want to group them as transactions: I want to preserve individual events, but also know the number of independent transactions.

It would still be nice to know: is there a way to re-use the results of previous searches?


somesoni2
Revered Legend

Which fields are you interested in: all the fields, or just _raw?

As @javiergn mentioned, restrict your base search by specifying index/sourcetype/source etc. To remove duplicates and group events based on user, date_minute and date_second, try this stats option.

index=blah sourcetype=blah user=*
| stats latest(user) as user latest(date_minute) as date_minute latest(date_second) as date_second by _raw
| stats list(_raw) as _raw by user date_minute date_second

If you want to preserve more fields, add them to both stats commands in a similar way.
