Re: need to extract two values from two lines and ...

sukundur · ‎04-21-2016

I am trying to get two files milli seconds from one line and merchant id from another line from the same tomcat tread where milli seconds greater than 9000. below is the sample log

tomcat-http--25 [NTAUtil] parse request
tomcat-http--25 [RequestParser] API Version: urn:schemas-xxxxxxxxxx-com:transaction-data-1.60
tomcat-http--25 [NTAUtil] parse request 19000
tomcat-http--25 [TransformFilter] Transform Request 140
tomcat-http--25 [WSSecurityFilter] merchantID: james_mid; digestAlgorithm: null

From the above log, i need 19000 & james_mid in a table. if milli sec greater than 9000.

sukundur · ‎04-25-2016

I tried this query on below log ( including time stamp as well). when i add the where cluase (Parse request)... "merchant ID" is not getting displayed. include " transaction threadID " ZERO results.

2016-04-21 19:38:00,079 INFO  tomcat-http--25 [ABCUtil] parse request
2016-04-21 19:38:00,117 INFO  tomcat-http--25 [RequestParser] API Version: urn:schemas-google-com:transaction-data-1.00001
2016-04-21 19:38:00,118 INFO  tomcat-http--25 [ABCUtil] parse request 39
2016-04-21 19:38:00,118 INFO  tomcat-http--25 [goooooogleFilter] Transform Request 140
2016-04-21 19:38:00,120 INFO  tomcat-http--25 [googleeeeeeeeeeee] merchantID: james_mid; digestAlgorithm: null

I have used this query

Query 1 : No merchant ID

host=sl*ntaapp* source="/var/mware/nta/nta_service.log"  NTAUtil OR WSSecurityFilter
 | rex "tomcat-http--(?\d+)"
 | rex  "\[NTAUtil\] parse request (?\d+)"
 | rex "merchantID: (?.*?);" | where parseRequest >= 1 
 | table threadID parseRequest merchantID

output

threadID    parseRequest    merchantID
62  5   
102 13  
36  85  
65  5   
111 5   
36  85  
53  162

Query 2 : zero records

 host=sl*ntaapp* source="/var/mware/nta/nta_service.log" 
 | rex "tomcat-http--(?\d+)"
 | rex  "\[NTAUtil\] parse request (?\d+)"
 | rex "merchantID: (?.*?);"
 | transaction threadID
 | where parseRequest >= 1
 | table threadID parseRequest merchantID

am i missing anything here?

lguinn2 · ‎04-25-2016

You need to use the formatting 101010 to for queries, otherwise the editor assumes that characters like underscore are formatting characters and it makes a mess of what you are posting.

lguinn2 · ‎04-25-2016

And the rex you have for merchantID will definitely NOT work.

sukundur · ‎04-22-2016

giving it a try now.

Richfez · ‎04-28-2016

sukundur,

Could you check the additional comment I made yesterday and try some of those examples again? I think we're close and I'd hate to get so close yet not get final resolution on this.

Thanks,
Rich

sukundur · ‎04-28-2016

Hi Rich & IGuinn

Thank you guys for looking at the issue.
Rich. I tried your queries and they provided the results i expected.

similar issues ( with treadID) were troubling me for a long time. this helps me fix thouse issues too.

I have to fix the query a bit so that each treadID (or result) is reported as one line( instead of grouping).
so when i export if does not report all in one cell ( as many of users has same Merchant ID).

i will see and let youknow.

thanks again to both of you.

Richfez · ‎04-22-2016

Assuming the answers to my comment above are favorable.

First, let's extract your threadID from all lines, and extract "everything else" as another field.

... my root search ... | rex field=_raw "^(?<ThreadID>[^\s]+)\s+(?<RemainingMessage>.*)$"

That should extract two fields, ThreadID and RemainingMessage. Now that we have those, let's parse the two RemainingMessage fields into the bits you need.

... everything above plus ... 
| rex field=RemainingMessage "^\[NTAUtil\]\s+parse\s+request\s+(?<ParseTime>\d+)"
| rex field=RemainingMessage "merchantID:\s+(?<MerchantID>[^\s;]+)"

That should extract both ParseTime and MerchantID.

Now that we have all those pieces in place, we can group them in a variety of ways. You could use transaction, which is often more readable code, but in this case using stats is likely to be just as easy to understand and will be far faster.

... everything above plus ... 
| stats values(MerchantID) as MerchantID, values(ParseTime) as ParseTime by ThreadID
| search ParseTime>9000 | table MerchantID, ParseTime, ThreadID

So, we take out the distinct values of MerchantID and ParseTime for each group of ThreadID. I used values instead of some other function just in case your threads overlap (after restart?). Then we search where our ParseTime is greater than your 9000 threshold, then output a pretty table.

So, all together for easy pasting...

... my root search ... 
| rex field=_raw "^(?<ThreadID>[^\s]+)\s+(?<RemainingMessage>.*)$"
| rex field=RemainingMessage "^\[NTAUtil\]\s+parse\s+request\s+(?<ParseTime>\d+)"
| rex field=RemainingMessage "merchantID:\s+(?<MerchantID>[^\s;]+)"
| stats values(MerchantID) as MerchantID, values(ParseTime) as ParseTime by ThreadID
| search ParseTime>9000 | table MerchantID, ParseTime, ThreadID

Give that a try and see how that works.

Richfez · ‎04-26-2016

When I use the snippet of data you provide down below (with the timestamps), I had to make a couple of small changes to get it to completely work:

host=sl*ntaapp* source="/var/mware/nta/nta_service.log"  NTAUtil OR WSSecurityFilter
| rex field=_raw "INFO\s+(?<ThreadID>[^\s]+)\s+(?<RemainingMessage>.*)$"
| rex field=RemainingMessage "^\[NTAUtil\]\s+parse\s+request\s+(?<ParseTime>\d+)"
| rex field=RemainingMessage "merchantID:\s+(?<MerchantID>[^\s;]+)"
| stats values(MerchantID) as MerchantID, values(ParseTime) as ParseTime by ThreadID
| search ParseTime>30

You'll see that I changed the first rex because I had originally set it to start at the beginning of the string, but because the actual lines look like this:

2016-04-26 07:31:00,079 INFO  tomcat-http--25 [NTAUtil] parse request

I changed it to look for "INFO " at the start to find the ThreadID. I also moved the search ParseTime>9000 (which I used 30 for because that's all the data I had) to AFTER the stats. If you do it before, you remove all the lines that ParseTime isn't in, then try to group the lines that are no longer there which doesn't work so well. My bad on the original way.

That gives results:

ThreadID            MerchantID  ParseTime
tomcat-http--25     james_mid   39

If you'd like to use the transaction method outlined by lguinn below, feel free to. Just remove the stats and search and replace it with the transaction, like so:

host=sl*ntaapp* source="/var/mware/nta/nta_service.log"  NTAUtil OR WSSecurityFilter
| rex field=_raw "tomcat-http--(?<ThreadID>\d+)\s+(?<RemainingMessage>.*)$"
| rex field=RemainingMessage "^\[NTAUtil\]\s+parse\s+request\s+(?<ParseTime>\d+)"
| rex field=RemainingMessage "merchantID:\s+(?<MerchantID>[^\s;]+)"
| transaction ThreadID
| search ParseTime>30
| table ThreadID ParseTime MerchantID

If you'd prefer the "just the number" version of ThreadID, you could do something more akin to what lguinn did with the rexes:

host=sl*ntaapp* source="/var/mware/nta/nta_service.log"  NTAUtil OR WSSecurityFilter
| rex field=_raw "tomcat-http--(?<ThreadID>\d+)\s+(?<RemainingMessage>.*)$"
| rex field=RemainingMessage "^\[NTAUtil\]\s+parse\s+request\s+(?<ParseTime>\d+)"
| rex field=RemainingMessage "merchantID:\s+(?<MerchantID>[^\s;]+)"
| stats values(MerchantID) as MerchantID, values(ParseTime) as ParseTime by ThreadID
| search ParseTime>30

Which gives nearly identical output as before, only with "ThreadID" of 25 instead of tomcat-http--25. Your choice there.

Let us know how it goes!

Richfez · ‎04-22-2016

Oh, yeah, the fantastic and talented lguinn has the optimization of making your search only return the two lines from each thread you need:

... my root search ... NTAUtil OR WSSecurityFilter | ...

That'll trim the number of events needing to be handled by (in your small example) by 60%, and I'm assuming there are a lot more lines per thread so the actual savings and speed increase should be FAR larger than that.

lguinn2 · ‎04-22-2016

Assume, as @rich7177 suggested, that 25 is the thread number. Also assume that these three fields exist: threadID parseRequest and merchantID. Then the following will give you the table that you want, I believe

sourcetype=tomcat_or_whatever NTAUtil OR WSSecurityFilter
| transaction threadID
| where parseRequest > 9000
| table threadID parseRequest merchantID

If the fields do not exist, use this search, which also extracts the fields

sourcetype=tomcat_or_whatever NTAUtil OR WSSecurityFilter
| rex "tomcat-http--(?<threadID>\d+)"
| rex  "\[NTAUtil\] parse request (?<parseRequest>\d+)"
| rex "merchantID: (?<merchantID>.*?);"
| transaction threadID
| where parseRequest > 9000
| table threadID parseRequest merchantID

Richfez · ‎04-22-2016

Is that "thread 25?" If not, how can you tell which thread you are in?
Does the ms you need always come in the line [NTAUtil] parse request?
Is merchantID already being extracted (i.e. does it show up in your field list on the left)?

need to extract two values from two lines and display as table

Can’t make it to .conf25? Join us online!

Take Action Automatically on Splunk Alerts with Red Hat Ansible Automation Platform

Calling All Security Pros: Ready to Race Through Boston?

Beyond Detection: How Splunk and Cisco Integrated Security Platforms Transform ...

Are you a member of the Splunk Community?

need to extract two values from two lines and display as table

Can’t make it to .conf25? Join us online!

Take Action Automatically on Splunk Alerts with Red Hat Ansible Automation Platform

Calling All Security Pros: Ready to Race Through Boston?

Beyond Detection: How Splunk and Cisco Integrated Security Platforms Transform ...