I need to get the source names of files that contain a specific value. The search takes a long time because each file contains millions of lines. The value I am searching for is repeated on every line. Is there a way to look at just one line per source to speed up the search?
Thanks,
This is my search:
eventtype=perf | stats first(customer) as cust by source | search cust=$customerToken$ | sort -_time | rex field=source "(?<sourcename>([^/]).[^.]$)" | fields source sourcename
The customer token is selected from a drop-down.
Here is some data:
<date>;<requestType>;<requestTime>;<customer>;<version>
Values:
2014-06-11;com.ws.rich.content.adapter.RichContentAdapterPortType.getMultipleTextContent; 3; pv-1;1.2.5
2014-06-11;com.ws.distribution.reservation.ReservPortType.quote;186;pv-1;1.2.5
2014-06-11;com.ws.rich.content.adapter.RichContentAdapterPortType.getMultipleTextContent;3;pv-1;1.2.5
2014-06-11;com.ws.sales.book.BookingPortType.createBooking;773;pv-1;1.2.5
This goes on for millions of lines in each source file.
Within a source file, every line has the same customer.
It is really a matter of finding all source files that contain a specific customer without searching every file line by line.
If each source file only contains one customer, then you can avoid loading all events before the stats by filtering on the customer in the base search,
like this:
eventtype=perf cust="$customerToken$" | stats count by source | rex field=source "(?<sourcename>([^/]).[^.]$)" | fields source sourcename
That way you'll only load sources that contain the customer you're looking for.
If that isn't fast enough, you could define a summary index or a lookup that stores one event for every source as soon as it's added to your Splunk, and have your search run off that. See http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/ for an example.
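A rough sketch of the lookup variant, in case it helps. The lookup file name perf_source_customers.csv is made up for illustration, and the stats clause is borrowed from the search in the question, so adjust both to your setup. This is untested against your data. A scheduled search over a recent time window could merge newly seen sources into the lookup:

```
eventtype=perf
| stats first(customer) as cust by source
| inputlookup append=true perf_source_customers.csv
| dedup source
| outputlookup perf_source_customers.csv
```

The dashboard search would then read only the lookup instead of the raw events:

```
| inputlookup perf_source_customers.csv
| search cust="$customerToken$"
| fields source
```

The lookup holds one row per source, so the dashboard search no longer scales with the number of lines in each file. The trade-off is that a new source only shows up after the next scheduled run.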
Edit:
If your number of source files is low then you can do this:
| metadata type=sources index=yourindex | map maxsearches=yournumberofsources search="eventtype=perf cust=\"$customerToken$\" source=$$source$$ | head 1" | rex field=source "(?<sourcename>([^/]).[^.]$)" | fields source sourcename
However, I remember there being some bug around the different layers of dollar tokens: one is for the form value, one is for the map value. The dashboard may get confused there. If you can get it to work, then this should be blazingly fast because head 1 stops each mapped search at the first matching event.
Great. I've added that to the answer.
The last comment worked, you can convert it to an answer and I will accept it
Preliminary tests indicate that this does not speed up the search. It still takes upwards of thirty seconds to find 3 source files out of a total of 10. I will look into the summary index as soon as I have a bit of spare time.
Thanks anyway
Ok. Sample data and search posted.
Do post your search and some sample data - then we might see a way to speed things up.