Splunk Search

Join/Transaction with field comparison operators

richard_whiffen
Explorer

I have some data sources in splunk that are XML formated. The initial request:

<query id=12345-54321>
<Request_1 initialQueryTime="1294214640000"><SearchParms startNum="1" numResults="14">
<TextQuery type="Title">Children's Hour</TextQuery>
</Request_1>
</query>

The Response:

<query id=c12345-54321>
<?xml version="1.0" encoding="ISO-8859-1"?>
<Response_1>
<NumberOfMatches>0</NumberOfMatches>
<SearchResults>
</SearchResults>
</Response_1>
</query>

or:

<tvsquery id=12345-54321>
<?xml version="1.0" encoding="ISO-8859-1"?>
<Response_1>
<NumberOfMatches>2</NumberOfMatches>
<SearchResults>
<VODShow closedCaptioning="Y" dolby="Y" hd="Y" entitled="Y">
<ResultTitle>Graffiti Bridge HD</ResultTitle>
<AssetID>INTL0609000006823480</AssetID>
</VODShow>
<VODShow closedCaptioning="Y" dolby="Y" entitled="Y">
<ResultTitle>Graffiti Bridge HD</ResultTitle>
<AssetID>INTL0609000005278648</AssetID
</VODShow>
</SearchResults>
</Response_1>
</query>

So I have a field id that should be a transaction. If I do a simple search:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" | transaction id

The transactions marry up correctly by id. If I try to look for transactions with specific attributes, I don't get the results I"m expecting. I've tried several variations of the following (with both the transaction command and join command:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" NumberOfMatches < 0 | transaction id [search host="srchhost*" src="/path/to/tomcat/log/qlogs/*" "Request_1" | top TextQuery ]

I think I'm not looking at the problem the right way and am hoping someone out there can put me on the right path. I have a field id that is common to the search and the result set. I want to know what TextQuery people used that returned zero NumberOfMatches. So I ask for a result set of id values for zero matches. I then want to return the TextQuery field for every one of those IDs.

The individual queries work. I can get a list of the zero result IDs. I can get the TextQuery values for a given ID. I just can't do it in a single search pipeline.

Cheers, Rich

Tags (2)
0 Karma
1 Solution

sideview
SplunkTrust
SplunkTrust

Yep. This is a very common stumbling block when you're first getting started. You have to get away from old SQL-ish ideas of joining two different datasets. Instead you just mash them together from the very beginning with a big OR:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )

and then you pipe to transaction or to stats to put the mess together sensibly:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 ) | transaction id

1) Transaction

the transaction command is good for the use cases where, at the end of the day you need the actual multiline raw text of the various events all stuck together. Usually however you find that's not what you need and rather you just need various statistics and bits of text from the overall set of conjoined/conglommified events. In those cases the stats command is a much better tool.

2) stats

Here's an example that hopefully gives you the idea of how you could use stats:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
| stats first(AssetId) as AssetId first(TextQuery) as TextQuery sum(NumberOfMatches) as totalMatches by id

3) join

As for the 'join' command, it's mostly evil. I'd avoid it unless you really really cant do it with stats or transaction.

Further reading:

http://answers.splunk.com/questions/8694/joining-two-log-files-that-have-a-common-field/8708#8708

http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717

View solution in original post

sideview
SplunkTrust
SplunkTrust

Yep. This is a very common stumbling block when you're first getting started. You have to get away from old SQL-ish ideas of joining two different datasets. Instead you just mash them together from the very beginning with a big OR:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )

and then you pipe to transaction or to stats to put the mess together sensibly:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 ) | transaction id

1) Transaction

the transaction command is good for the use cases where, at the end of the day you need the actual multiline raw text of the various events all stuck together. Usually however you find that's not what you need and rather you just need various statistics and bits of text from the overall set of conjoined/conglommified events. In those cases the stats command is a much better tool.

2) stats

Here's an example that hopefully gives you the idea of how you could use stats:

host="srchhost*" src="/path/to/tomcat/log/qlogs/*" ( NumberOfMatches<0 OR Request_1 )
| stats first(AssetId) as AssetId first(TextQuery) as TextQuery sum(NumberOfMatches) as totalMatches by id

3) join

As for the 'join' command, it's mostly evil. I'd avoid it unless you really really cant do it with stats or transaction.

Further reading:

http://answers.splunk.com/questions/8694/joining-two-log-files-that-have-a-common-field/8708#8708

http://answers.splunk.com/questions/822/simulating-a-sql-join-in-splunk/1717#1717

Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...