
Get source file names containing a specific value without searching through every event in each file

splunkmasterfle
Path Finder

I need to get the names of the source files that contain a specific value. The search takes a long time because each file contains millions of lines. The value I am searching for is repeated on every line. Is there a way to look at only one line per source to speed up the search?

Thanks,

This is my search:

eventtype=perf | stats first(customer) as cust by source | search cust=$customerToken$ | sort -_time | rex field=source "(?<sourcename>([^/]*)\.[^.]*$)" | fields source sourcename
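The rex at the end of the search is presumably meant to pull the file name out of source. Assuming the intended pattern is (?<sourcename>([^/]*)\.[^.]*$) (the * and \ characters look like they were swallowed by the forum's formatting), here is a quick Python check of what it captures. Python writes named groups as (?P<name>...) instead of PCRE's (?<name>...); the function name and the file paths are made up for illustration.

```python
import re

# Python spelling of the (assumed) SPL rex pattern: named groups use (?P<...>).
SOURCENAME_RE = re.compile(r"(?P<sourcename>([^/]*)\.[^.]*$)")

def source_name(source):
    """Return the last path segment including its extension, or None if there is no dot."""
    m = SOURCENAME_RE.search(source)
    return m.group("sourcename") if m else None

print(source_name("/opt/logs/perf/ws_perf_2014-06-11.log"))  # ws_perf_2014-06-11.log
print(source_name("/var/log/messages"))  # None
```

Note that a file without an extension yields no match at all, so sourcename would simply be empty for such sources.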

The customer token is selected from a dropdown.

Here is some data:

<date>;<requestType>;<requestTime>;<customer>;<version>

Values:

2014-06-11;com.ws.rich.content.adapter.RichContentAdapterPortType.getMultipleTextContent; 3; pv-1;1.2.5

2014-06-11;com.ws.distribution.reservation.ReservPortType.quote;186;pv-1;1.2.5

2014-06-11;com.ws.rich.content.adapter.RichContentAdapterPortType.getMultipleTextContent;3;pv-1;1.2.5

2014-06-11;com.ws.sales.book.BookingPortType.createBooking;773;pv-1;1.2.5

This goes on for millions of lines in each source file.

Every line in a given source file has the same customer.

It is really a matter of finding all source files that contain a specific customer without searching every file line by line.
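Outside Splunk, "one line per source" amounts to reading only the first record of each file, which is enough here because every line in a file carries the same customer. A minimal Python sketch of that short-circuit, assuming the semicolon layout above with customer as the fourth field; the in-memory sources dict, its paths, and the pv-2 record are hypothetical stand-ins for the real files.

```python
from typing import Dict, Iterable, List

CUSTOMER_FIELD = 3  # <date>;<requestType>;<requestTime>;<customer>;<version>

def customer_of(record: str) -> str:
    """Extract the customer field from one semicolon-delimited record."""
    return record.split(";")[CUSTOMER_FIELD].strip()

def sources_for_customer(sources: Dict[str, Iterable[str]], customer: str) -> List[str]:
    """Return the sources whose first record belongs to `customer`.

    Only the first line of each source is consumed. This is only valid
    because every line in a given source has the same customer.
    """
    matches = []
    for name, lines in sources.items():
        first = next(iter(lines), None)  # read one line, never the rest
        if first is not None and customer_of(first) == customer:
            matches.append(name)
    return matches

sources = {
    "/data/perf/a.log": [
        "2014-06-11;com.ws.sales.book.BookingPortType.createBooking;773;pv-1;1.2.5",
        "2014-06-11;com.ws.distribution.reservation.ReservPortType.quote;186;pv-1;1.2.5",
    ],
    "/data/perf/b.log": [
        "2014-06-11;com.ws.sales.book.BookingPortType.createBooking;773;pv-2;1.2.5",
    ],
    "/data/perf/empty.log": [],
}
print(sources_for_customer(sources, "pv-1"))  # ['/data/perf/a.log']
```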


martin_mueller
SplunkTrust

If each source file only contains one customer, you can avoid loading all events before the stats by filtering on the customer in the base search, like this:

eventtype=perf customer="$customerToken$" | stats count by source | rex field=source "(?<sourcename>([^/]*)\.[^.]*$)" | fields source sourcename

That way you'll only load events from the sources that contain the customer you're looking for.
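Why moving the customer filter into the base search helps: in the original search every event reaches stats and is only discarded afterwards, while in the answer's version only matching events do. A rough Python model of the two pipeline shapes, counting how many events each one feeds into the aggregation; it models the data flow only, not how Splunk uses its index to skip non-matching events, and the function names and event tuples are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (source, customer)

def stats_then_filter(events: List[Event], customer: str) -> Tuple[Set[str], int]:
    """Original shape: stats first(customer) by source, then search cust=..."""
    first_by_source: Dict[str, str] = {}
    aggregated = 0
    for source, cust in events:
        aggregated += 1  # every event reaches stats
        first_by_source.setdefault(source, cust)
    return {s for s, c in first_by_source.items() if c == customer}, aggregated

def filter_then_stats(events: List[Event], customer: str) -> Tuple[Set[str], int]:
    """Answer's shape: filter in the base search, then stats count by source."""
    counts: Dict[str, int] = {}
    for source, cust in events:
        if cust != customer:
            continue  # dropped before it ever reaches stats
        counts[source] = counts.get(source, 0) + 1
    return set(counts), sum(counts.values())

events = [("a.log", "pv-1")] * 3 + [("b.log", "pv-2")] * 5 + [("c.log", "pv-1")] * 2
found, n = stats_then_filter(events, "pv-1")
print(sorted(found), n)  # ['a.log', 'c.log'] 10
found, n = filter_then_stats(events, "pv-1")
print(sorted(found), n)  # ['a.log', 'c.log'] 5
```

Both shapes return the same sources when each file holds a single customer; the difference is only in how much data reaches stats.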

If that still isn't fast enough, you could define a summary index or a lookup that records one event per source as soon as the source is added to your Splunk, and have your search run off that. See http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/ for an example.
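The summary index / lookup idea boils down to keeping a small table with one row per source, filled in when a source is first seen, so the dashboard query scans that table instead of millions of raw events. A minimal Python sketch of that data structure only; the class and method names are hypothetical, and this is not how Splunk actually stores summary indexes or lookups.

```python
from typing import Dict, List

class SourceCustomerTable:
    """Hypothetical stand-in for a lookup with one row per source."""

    def __init__(self) -> None:
        self._customer_by_source: Dict[str, str] = {}

    def record(self, source: str, customer: str) -> None:
        # Keep the first customer seen for a source; later events add nothing.
        self._customer_by_source.setdefault(source, customer)

    def sources_for(self, customer: str) -> List[str]:
        # The query touches one row per source, not one row per event.
        return sorted(s for s, c in self._customer_by_source.items() if c == customer)

table = SourceCustomerTable()
for source, customer in [("a.log", "pv-1"), ("a.log", "pv-1"), ("b.log", "pv-2"), ("c.log", "pv-1")]:
    table.record(source, customer)
print(table.sources_for("pv-1"))  # ['a.log', 'c.log']
```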

Edit:

If your number of source files is low, you can do this:

| metadata type=sources index=yourindex | map maxsearches=yournumberofsources search="eventtype=perf customer=\"$customerToken$\" source=$$source$$ | head 1" | rex field=source "(?<sourcename>([^/]*)\.[^.]*$)" | fields source sourcename

However, I remember there being a bug around the two layers of dollar-sign tokens: one is for the form value, one is for the map value, and the dashboard may get confused there. If you can get it to work, this should be blazingly fast, because head 1 lets each per-source search stop at the first matching event.
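About the two layers of dollar tokens: in a Simple XML dashboard, $customerToken$ is replaced by the form value and $$ is the escape for a literal $, so $$source$$ should reach map as $source$, which map then fills in from each row returned by metadata. A simplified Python model of those two passes; both functions are hypothetical and ignore Splunk's real token features (filters, defaults, quoting), and the template uses customer as the field name, matching the sample data header.

```python
import re
from typing import Dict

def dashboard_pass(query: str, tokens: Dict[str, str]) -> str:
    """First layer (simplified): $$ becomes a literal $, $name$ becomes the form value."""
    def repl(m: "re.Match") -> str:
        return "$" if m.group(0) == "$$" else tokens[m.group(1)]
    return re.sub(r"\$\$|\$(\w+)\$", repl, query)

def map_pass(query: str, row: Dict[str, str]) -> str:
    """Second layer (simplified): map replaces $field$ with the current input row's value."""
    return re.sub(r"\$(\w+)\$", lambda m: row[m.group(1)], query)

template = 'eventtype=perf customer="$customerToken$" source=$$source$$ | head 1'
after_dashboard = dashboard_pass(template, {"customerToken": "pv-1"})
print(after_dashboard)  # eventtype=perf customer="pv-1" source=$source$ | head 1
print(map_pass(after_dashboard, {"source": "/data/perf/a.log"}))
```

If the dashboard pass ever consumed $source$ itself (for example, with a single $ instead of $$), map would receive no placeholder to fill, which is one way the confusion between the layers can show up.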

martin_mueller
SplunkTrust

Great. I've added that to the answer.


splunkmasterfle
Path Finder

The last comment worked. You can convert it to an answer and I will accept it.


martin_mueller
SplunkTrust

If your number of source files is low, you can do this:

| metadata type=sources index=yourindex | map maxsearches=yournumberofsources search="eventtype=perf customer=\"$customerToken$\" source=$$source$$ | head 1" | rex field=source "(?<sourcename>([^/]*)\.[^.]*$)" | fields source sourcename

However, I remember there being a bug around the two layers of dollar-sign tokens: one is for the form value, one is for the map value, and the dashboard may get confused there. If you can get it to work, this should be blazingly fast, because head 1 lets each per-source search stop at the first matching event.


splunkmasterfle
Path Finder

Preliminary tests indicate that this does not speed up the search. It still takes upwards of thirty seconds to find 3 source files out of a total of 10. I will look into the summary index as soon as I have a bit of spare time.
Thanks anyway.


splunkmasterfle
Path Finder

Ok. Sample data and search posted.


martin_mueller
SplunkTrust

Do post your search and some sample data; then we might see a way to speed things up.
