Splunk Search

Join/Transaction with field comparison operators

richard_whiffen
Explorer

I have some data sources in splunk that are XML formated. The initial request:

<query id=12345-54321>
<Request_1 initialQueryTime="1294214640000"><SearchParms startNum="1" numResults="14">
<TextQuery type="Title">Children's Hour</TextQuery>
</Request_1>
</query>

The Response:

<query id=c12345-54321>
<?xml version="1.0" encoding="ISO-8859-1"?>
<Response_1>
<NumberOfMatches>0</NumberOfMatches>
<SearchResults>
</SearchResults>
</Response_1>
</query>

or:

<tvsquery id=12345-54321>
<?xml version="1.0" encoding="ISO-8859-1"?>
<Response_1>
<NumberOfMatches>2</NumberOfMatches>
<SearchResults>
<VODShow closedCaptioning="Y" dolby="Y" hd="Y" entitled="Y">
<ResultTitle>Graffiti Bridge HD</ResultTitle>
<AssetID>INTL0609000006823480</AssetID>
</VODShow>
<VODShow closedCaptioning="Y" dolby="Y" entitled="Y">
<ResultTitle>Graffiti Bridge HD</ResultTitle>
<AssetID>INTL0609000005278648</AssetID
</VODShow>
</SearchResults>
</Response_1>
</query>

So I have a field id that should be a transaction. If I do a simple search:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" | transaction id

The transactions marry up correctly by id. If I try to look for transactions with specific attributes, I don't get the results I"m expecting. I've tried several variations of the following (with both the transaction command and join command:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" NumberOfMatches < 0 | transaction id [search host="srchhost*" src="/path/to/tomcat/log/qlogs/*" "Request_1" | top TextQuery ]

I think I'm not looking at the problem the right way and am hoping someone out there can put me on the right path. I have a field id that is common to the search and the result set. I want to know what TextQuery people used that returned zero NumberOfMatches. So I ask for a result set of id values for zero matches. I then want to return the TextQuery field for every one of those IDs.

The individual queries work. I can get a list of the zero result IDs. I can get the TextQuery values for a given ID. I just can't do it in a single search pipeline.

Cheers, Rich

Tags (2)
0 Karma
1 Solution

sideview
SplunkTrust
SplunkTrust

Yep. This is a very common stumbling block when you're first getting started. You have to get away from old SQL-ish ideas of joining two different datasets. Instead you just mash them together from the very beginning with a big OR:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )

and then you pipe to transaction or to stats to put the mess together sensibly:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 ) | transaction id

1) Transaction

the transaction command is good for the use cases where, at the end of the day you need the actual multiline raw text of the various events all stuck together. Usually however you find that's not what you need and rather you just need various statistics and bits of text from the overall set of conjoined/conglommified events. In those cases the stats command is a much better tool.

2) stats

Here's an example that hopefully gives you the idea of how you could use stats:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
| stats first(AssetId) as AssetId first(TextQuery) as TextQuery sum(NumberOfMatches) as totalMatches by id

3) join

As for the 'join' command, it's mostly evil. I'd avoid it unless you really really cant do it with stats or transaction.

Further reading:

http://answers.splunk.com/questions/8694/joining-two-log-files-that-have-a-common-field/8708#8708

http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717

View solution in original post

sideview
SplunkTrust
SplunkTrust

Yep. This is a very common stumbling block when you're first getting started. You have to get away from old SQL-ish ideas of joining two different datasets. Instead you just mash them together from the very beginning with a big OR:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )

and then you pipe to transaction or to stats to put the mess together sensibly:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 ) | transaction id

1) Transaction

the transaction command is good for the use cases where, at the end of the day you need the actual multiline raw text of the various events all stuck together. Usually however you find that's not what you need and rather you just need various statistics and bits of text from the overall set of conjoined/conglommified events. In those cases the stats command is a much better tool.

2) stats

Here's an example that hopefully gives you the idea of how you could use stats:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
| stats first(AssetId) as AssetId first(TextQuery) as TextQuery sum(NumberOfMatches) as totalMatches by id

3) join

As for the 'join' command, it's mostly evil. I'd avoid it unless you really really cant do it with stats or transaction.

Further reading:

http://answers.splunk.com/questions/8694/joining-two-log-files-that-have-a-common-field/8708#8708

http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717

Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...