I have some data sources in splunk that are XML formated. The initial request:
<query id=12345-54321>
<Request_1 initialQueryTime="1294214640000"><SearchParms startNum="1" numResults="14">
<TextQuery type="Title">Children's Hour</TextQuery>
</Request_1>
</query>
The Response:
<query id=c12345-54321>
<?xml version="1.0" encoding="ISO-8859-1"?>
<Response_1>
<NumberOfMatches>0</NumberOfMatches>
<SearchResults>
</SearchResults>
</Response_1>
</query>
or:
<tvsquery id=12345-54321>
<?xml version="1.0" encoding="ISO-8859-1"?>
<Response_1>
<NumberOfMatches>2</NumberOfMatches>
<SearchResults>
<VODShow closedCaptioning="Y" dolby="Y" hd="Y" entitled="Y">
<ResultTitle>Graffiti Bridge HD</ResultTitle>
<AssetID>INTL0609000006823480</AssetID>
</VODShow>
<VODShow closedCaptioning="Y" dolby="Y" entitled="Y">
<ResultTitle>Graffiti Bridge HD</ResultTitle>
<AssetID>INTL0609000005278648</AssetID
</VODShow>
</SearchResults>
</Response_1>
</query>
So I have a field id that should be a transaction. If I do a simple search:
host="srchhost*" src="/path/to/tomcat/log/qlogs/*" | transaction id
The transactions marry up correctly by id. If I try to look for transactions with specific attributes, I don't get the results I"m expecting. I've tried several variations of the following (with both the transaction command and join command:
host="srchhost*" src="/path/to/tomcat/log/qlogs/*" NumberOfMatches < 0 | transaction id [search host="srchhost*" src="/path/to/tomcat/log/qlogs/*" "Request_1" | top TextQuery ]
I think I'm not looking at the problem the right way and am hoping someone out there can put me on the right path. I have a field id that is common to the search and the result set. I want to know what TextQuery people used that returned zero NumberOfMatches. So I ask for a result set of id values for zero matches. I then want to return the TextQuery field for every one of those IDs.
The individual queries work. I can get a list of the zero result IDs. I can get the TextQuery values for a given ID. I just can't do it in a single search pipeline.
Cheers, Rich
Yep. This is a very common stumbling block when you're first getting started. You have to get away from old SQL-ish ideas of joining two different datasets. Instead you just mash them together from the very beginning with a big OR:
host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
and then you pipe to transaction or to stats to put the mess together sensibly:
host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 ) | transaction id
1) Transaction
the transaction command is good for the use cases where, at the end of the day you need the actual multiline raw text of the various events all stuck together. Usually however you find that's not what you need and rather you just need various statistics and bits of text from the overall set of conjoined/conglommified events. In those cases the stats
command is a much better tool.
2) stats
Here's an example that hopefully gives you the idea of how you could use stats:
host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
| stats first(AssetId) as AssetId first(TextQuery) as TextQuery sum(NumberOfMatches) as totalMatches by id
3) join
As for the 'join' command, it's mostly evil. I'd avoid it unless you really really cant do it with stats or transaction.
Further reading:
http://answers.splunk.com/questions/8694/joining-two-log-files-that-have-a-common-field/8708#8708
http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717
Yep. This is a very common stumbling block when you're first getting started. You have to get away from old SQL-ish ideas of joining two different datasets. Instead you just mash them together from the very beginning with a big OR:
host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
and then you pipe to transaction or to stats to put the mess together sensibly:
host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 ) | transaction id
1) Transaction
the transaction command is good for the use cases where, at the end of the day you need the actual multiline raw text of the various events all stuck together. Usually however you find that's not what you need and rather you just need various statistics and bits of text from the overall set of conjoined/conglommified events. In those cases the stats
command is a much better tool.
2) stats
Here's an example that hopefully gives you the idea of how you could use stats:
host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
| stats first(AssetId) as AssetId first(TextQuery) as TextQuery sum(NumberOfMatches) as totalMatches by id
3) join
As for the 'join' command, it's mostly evil. I'd avoid it unless you really really cant do it with stats or transaction.
Further reading:
http://answers.splunk.com/questions/8694/joining-two-log-files-that-have-a-common-field/8708#8708
http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717