Solved: Get source file names containing a specific value without searching through every event

splunkmasterfle · 06-20-2014

I need to get the names of the source files that contain a specific value. The search is taking a long time because each file contains millions of lines. The value I am searching for is repeated on every line. Is there a way to take one line per source to speed up the search?

Thanks,

This is my search

The customer token is selected in a drop-down.

Here is some data:

Values:

2014-06-11;com.ws.rich.content.adapter.RichContentAdapterPortType.getMultipleTextContent; 3; pv-1;1.2.5
2014-06-11;com.ws.distribution.reservation.ReservPortType.quote;186;pv-1;1.2.5
2014-06-11;com.ws.rich.content.adapter.RichContentAdapterPortType.getMultipleTextContent;3;pv-1;1.2.5
2014-06-11;com.ws.sales.book.BookingPortType.createBooking;773;pv-1;1.2.5

This goes on for millions of lines in each source file.

Each source file contains the same customer on every line.

So it is really a matter of finding all source files that contain a specific customer without searching every file line by line.

martin_mueller · 06-23-2014

If each source file contains only one customer, you can avoid loading all events before the stats command, like this:

eventtype=perf cust="$customerToken$" | stats count by source | rex field=source "(?<sourcename>([^/]*).[^.]*$)" | fields source sourcename

That way you only load events from the sources that contain the customer you're looking for.
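A small Python model of what the search above does may help: filter on the customer first, count by source, then pull a file name out of the path. It assumes the rex pattern originally read (?<sourcename>([^/]*).[^.]*$), with the asterisks lost to the forum's formatting; Python spells the named group (?P<sourcename>...). The event dictionaries, their keys, and the paths are hypothetical.

```python
import re
from collections import Counter

# Python version of the rex pattern, ASSUMING the asterisks were restored.
# Note: the unescaped "." matches any character, not only a literal dot.
SOURCENAME_RE = re.compile(r"(?P<sourcename>([^/]*).[^.]*$)")


def sources_for_customer(events, customer):
    """Rough model of: cust=<customer> | stats count by source | rex ... | fields source sourcename

    `events` is a list of dicts with hypothetical "source" and "cust" keys.
    The customer filter runs before counting, so non-matching events never
    reach the stats step.
    """
    counts = Counter(e["source"] for e in events if e["cust"] == customer)
    rows = []
    for source in sorted(counts):
        match = SOURCENAME_RE.search(source)
        rows.append({
            "source": source,
            "sourcename": match.group("sourcename") if match else None,
        })
    return rows


if __name__ == "__main__":
    sample = [
        {"source": "/data/perf/acme.log", "cust": "acme"},
        {"source": "/data/perf/acme.log", "cust": "acme"},
        {"source": "/data/perf/globex.log", "cust": "globex"},
    ]
    # Expected: [{'source': '/data/perf/acme.log', 'sourcename': 'acme.log'}]
    print(sources_for_customer(sample, "acme"))
```

For a path like /data/perf/acme.log the match is the base name including its extension (acme.log). A path whose last segment has no extension can match further back toward the start of the path, so it is worth checking the pattern against your real source paths.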

If that still isn't fast enough, you could define a summary index or a lookup that extracts one event for every source as soon as it's added to Splunk, and have your search run off that. See http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/ for an example.
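The idea behind the summary index or lookup is to pay the cost once per source when data arrives, so the dashboard query only scans a small table with one row per source instead of millions of events. The Python sketch below models that with a dict; the class and method names are hypothetical and do not correspond to any Splunk API.

```python
from typing import Dict, Iterable, List, Tuple


class SourceCustomerTable:
    """Toy stand-in for a summary index or lookup holding one row per source.

    HYPOTHETICAL: in Splunk this would be populated by a scheduled search;
    here it is an in-memory dict keyed by source.
    """

    def __init__(self) -> None:
        self._customer_by_source: Dict[str, str] = {}

    def record(self, events: Iterable[Tuple[str, str]]) -> None:
        """Record (source, customer) pairs as new data arrives.

        Only the first pair seen for each source is kept, since every line in
        a file names the same customer.
        """
        for source, customer in events:
            self._customer_by_source.setdefault(source, customer)

    def sources_for(self, customer: str) -> List[str]:
        """Answer the dashboard question by scanning one row per source."""
        return sorted(
            source
            for source, cust in self._customer_by_source.items()
            if cust == customer
        )


if __name__ == "__main__":
    table = SourceCustomerTable()
    table.record([("/data/a.log", "acme"), ("/data/a.log", "acme"), ("/data/b.log", "globex")])
    table.record([("/data/c.log", "acme")])  # a later batch of new data
    print(table.sources_for("acme"))  # ['/data/a.log', '/data/c.log']
```

The design choice is the trade: the work of recording each new source happens once, in the background, while the interactive query cost grows with the number of sources rather than the number of lines.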

Edit:

If the number of source files is low, you can do this:

| metadata type=sources index=yourindex | map maxsearches=yournumberofsources search="eventtype=perf cust=\"$customerToken$\" source=$$source$$ | head 1" | rex field=source "(?<sourcename>([^/]*).[^.]*$)" | fields source sourcename

However, I remember there being a bug around the two layers of dollar tokens: one for the form value ($customerToken$) and one for the map value ($$source$$). The dashboard may get confused there. If you can get it to work, this should be blazingly fast, because head 1 stops each per-source search at the first matching event.
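Why head 1 helps can be seen in a toy Python model of the map-based search: one search per source, each stopping at its first matching line. The function name, the source-to-lines dictionary, and the substring match are assumptions made for illustration, not a description of how Splunk executes map internally.

```python
from typing import Dict, List, Sequence, Tuple


def first_match_per_source(
    sources: Dict[str, Sequence[str]], customer: str
) -> Tuple[List[str], int]:
    """Toy model of: | metadata type=sources | map search="... source=<src> | head 1"

    `sources` maps a hypothetical source name to its lines. A line "matches"
    if the customer string occurs in it. Returns the matching sources and the
    total number of lines examined across all sources.
    """
    matched: List[str] = []
    lines_read = 0
    for name, lines in sources.items():  # one "map" iteration per source
        for line in lines:
            lines_read += 1
            if customer in line:
                matched.append(name)
                break  # like head 1: stop scanning this source at the first hit
    return matched, lines_read


if __name__ == "__main__":
    data = {
        "a.log": ["acme;1", "acme;2", "acme;3"],
        "b.log": ["globex;1", "globex;2"],
    }
    print(first_match_per_source(data, "acme"))  # (['a.log'], 3)
```

In this model a source that never mentions the customer is still read to the end, since there is no first match to stop at; head 1 only shortens the scan of sources that do match.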


martin_mueller · 08-06-2014

Great. I've added that to the answer.

splunkmasterfle · 08-06-2014

The last comment worked. You can convert it to an answer and I will accept it.

martin_mueller · 06-23-2014

If the number of source files is low, you can do this:

| metadata type=sources index=yourindex | map maxsearches=yournumberofsources search="eventtype=perf cust=\"$customerToken$\" source=$$source$$ | head 1" | rex field=source "(?<sourcename>([^/]*).[^.]*$)" | fields source sourcename

However, I remember there being a bug around the two layers of dollar tokens: one for the form value ($customerToken$) and one for the map value ($$source$$). The dashboard may get confused there. If you can get it to work, this should be blazingly fast, because head 1 stops each per-source search at the first matching event.

splunkmasterfle · 06-23-2014

Preliminary tests indicate that this does not speed up the search. It still takes upwards of thirty seconds to find 3 source files out of a total of 10. I will look into the summary index as soon as I have a bit of spare time.
Thanks anyway.

splunkmasterfle · 06-23-2014

OK. Sample data and search posted.

martin_mueller · 06-20-2014

Do post your search and some sample data - then we might see a way to speed things up.

