Splunk Search

Regex on multiline event - how to match multiple occurrences of a matching group?

Contributor

About the source

I have a SQL report scheduled every 15 minutes that reports the status of the queues in our case handler system. Splunk is instructed to read the whole report as one event, so when searching in Splunk the event is returned like this:

TO_CHAR(SYSDATE,'                                                               
-----------------                                                               
01062016 09:00:00                                                               
HANGING_WOBS_COUNT QUEUE                                                        
------------------ -------------------------                                    
                 2 LIV_InitialCheck                                             
                 1 LIV_InitialCreate                                            
                 1 LIV_LB                                                       
                 0 BPF_Operations                                               
                 0 CE_Operations                                                
                 0 LIV_AttachmentMarkDeleted                                    
                 0 LIV_DeleteIndeksCase                                         
                 0 LIV_InitialLookup                                            
                 0 LIV_InitialMerge                                             
...
25 rows selected.

If I open the source file in Notepad++ and view all characters, it looks like this: [screenshot]

What I have done

  1. Using Rubular.com, created and verified the following regex: ^\s{3,}(?<QueueCount>\d+)\s(?<QueueName>\w+) It captures each queue size and name.
  2. Verified the same regex using the field extractor. It captures only the first queue, LIV_InitialCheck.
  3. Tested the same regex in the search field: ... | rex "^\s{3,}(?<QueueCount>\d+)\s(?<QueueName>\w+)" matches nothing.
  4. Prefixing the regex with (?m) makes it match the first occurrence (same as #2), but not the rest.

So what is my regex missing for Splunk to capture all occurrences the same way Rubular does?
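
The difference between steps 3 and 4 comes down to what ^ anchors to. The following is a minimal Python sketch of that behaviour, run outside Splunk: Python's re module spells named groups (?P<name>...) rather than (?<name>...), and the sample event is abbreviated from the one shown above.

```python
import re

# Abbreviated copy of the event; the header lines come first, as in the original.
EVENT = "\n".join([
    "TO_CHAR(SYSDATE,'",
    "-----------------",
    "01062016 09:00:00",
    "HANGING_WOBS_COUNT QUEUE",
    "------------------ -------------------------",
    "                 2 LIV_InitialCheck",
    "                 1 LIV_InitialCreate",
    "                 0 BPF_Operations",
])

PATTERN = r"^\s{3,}(?P<QueueCount>\d+)\s(?P<QueueName>\w+)"

# Without multiline mode, ^ only matches at the very start of the event,
# where the text is "TO_CHAR(...", so nothing is found.
no_flag = re.search(PATTERN, EVENT)

# With (?m), ^ also matches after every newline, so the first queue row is found.
with_flag = re.search("(?m)" + PATTERN, EVENT)

print(no_flag)                        # None
print(with_flag.group("QueueName"))   # LIV_InitialCheck
```

A single search still stops at the first row, which mirrors what step 4 reports.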

1 Solution

SplunkTrust

Splunk uses Perl-compatible regular expressions (PCRE), not Ruby's. regex101.com is a good site for testing regex strings. Also, the rex command returns only the first match unless the max_match option is used (max_match=0 means no limit). Try this:

... | rex max_match=0 "(?<QueueCount>\d+)\s(?<QueueName>[a-zA-Z_]+)" | ...
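
As a rough illustration of why max_match matters, the Python sketch below contrasts a single-match search with collecting every match, which is approximately what max_match=0 asks rex to do. It is an analogy run outside Splunk: the named groups use Python's (?P<name>...) spelling and the event text is abbreviated. In this sketch the unanchored pattern also picks up the trailing "25 rows selected." line, so it is worth checking whether the same happens in the real search.

```python
import re

# Abbreviated event text, including the footer line of the SQL report.
EVENT = "\n".join([
    "HANGING_WOBS_COUNT QUEUE",
    "------------------ -------------------------",
    "                 2 LIV_InitialCheck",
    "                 1 LIV_InitialCreate",
    "                 0 BPF_Operations",
    "25 rows selected.",
])

PATTERN = re.compile(r"(?P<QueueCount>\d+)\s(?P<QueueName>[a-zA-Z_]+)")

# One match only: comparable to rex with its default max_match=1.
first_only = PATTERN.search(EVENT)

# Every match: comparable to rex max_match=0, which yields multivalue fields.
all_pairs = [
    (m.group("QueueCount"), m.group("QueueName"))
    for m in PATTERN.finditer(EVENT)
]

print(first_only.group("QueueName"))
for count, name in all_pairs:
    print(count, name)
```

If the footer match is unwanted, anchoring the pattern to the leading whitespace of the data rows (with (?m) so that ^ matches on every line) is one way to exclude it.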
---
If this reply helps you, an upvote would be appreciated.


Influencer

Aww Rich beat me to it. But this may also work for you:

| rex "-------------------------(?<QueueName>[\s\S]+)$" 
| makemv tokenizer="(?<token>\d\s[A-Za-z]+_[A-Za-z]+?)\b" QueueName
| mvexpand QueueName
| rex field=QueueName "^(?<QueueCount>\d+?)\s(?<QueueName>[A-Za-z]+_[A-Za-z]+?)\b" 
| table QueueCount QueueName

Contributor

You are very close to answering my next issue, since I immediately realized that indexing it all as one event makes it hard, if not impossible, to use the queuecount and queuesize as a key-value pair for alerting. But I found two errors (at least I think they are errors):

  • There are 25 rows returned, that is, 25 queue names, but your search only returns 12 events.
  • The search only returns 0 as the count; it skips queues with other values.

Influencer

Yes, I strongly suggest that you break them into separate events. But it's your call.

My search should work fine, but I did make some assumptions - I'm assuming your queue names are all XYZ_Something. If they vary you'll need to play with the regex. Also I just copied the data from this website - the actual formatting may be different in your source data - you may need to play around with the rex commands. The regex101 site that Rich posted is excellent for this.
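
If the queue names do not all follow the XYZ_Something shape, one option is to anchor on the row layout (leading whitespace, a number, a name) instead of on the underscore. The Python sketch below compares the two approaches outside Splunk. STRICT is only modelled on the underscore assumption above, LOOSE is a suggested alternative, and the queue name "Archive" is hypothetical, added purely to show the difference.

```python
import re

ROWS = "\n".join([
    "------------------ -------------------------",
    "                 2 LIV_InitialCheck   ",
    "                 0 BPF_Operations",
    "                 0 Archive",  # hypothetical name without an underscore
    "25 rows selected.",
])

# Assumes every queue name contains an underscore between two runs of letters.
STRICT = re.compile(r"(\d+)\s([A-Za-z]+_[A-Za-z]+)\b")

# Assumes only the row layout: indentation, a count, a name, optional padding.
LOOSE = re.compile(r"(?m)^[ \t]+(\d+)[ \t]+(\w+)[ \t]*$")

strict_names = [name for _, name in STRICT.findall(ROWS)]
loose_names = [name for _, name in LOOSE.findall(ROWS)]

print(strict_names)
print(loose_names)
```

The looser pattern keeps the name without an underscore and, because it requires leading whitespace, also skips the "25 rows selected." footer.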

Here's the output from my search: http://imgur.com/2SY7lDA

Contributor

I realized the following:

  • After splitting up the search string you provided and adding one "pipe" at a time, once I fully understood each part, it did work as expected.
  • But I realized that the log format is a drawback for what I want to monitor (queue sizes over time), since the search string becomes so complex. So I decided to create a script that parses the initial logfile and creates an additional file where each line has the format "timestamp queuename queuesize" (of course the best would have been to do it in the initial logfile, but that is too much effort since I would then need to communicate with offshore resources).
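
A script along those lines might look like the minimal Python sketch below. It is not the script that was actually written: the function name flatten_report and both patterns are assumptions based on the sample event shown earlier, and the timestamp is copied through unchanged because the report does not say whether 01062016 is day-month or month-day.

```python
import re
from typing import List

# Assumed layout: a line starting with "DDDDDDDD HH:MM:SS" holds the report time.
TIMESTAMP = re.compile(r"(?m)^(\d{8} \d{2}:\d{2}:\d{2})")
# Assumed layout: data rows are indented and hold a count followed by a name.
ROW = re.compile(r"(?m)^[ \t]+(\d+)[ \t]+(\w+)[ \t]*$")


def flatten_report(report: str) -> List[str]:
    """Turn one multiline report into 'timestamp queuename queuesize' lines."""
    stamp = TIMESTAMP.search(report)
    if stamp is None:
        return []  # no recognisable timestamp, so emit nothing
    return [
        f"{stamp.group(1)} {name} {count}"
        for count, name in ROW.findall(report)
    ]


# Abbreviated copy of the sample report from the question.
SAMPLE = "\n".join([
    "TO_CHAR(SYSDATE,'",
    "-----------------",
    "01062016 09:00:00     ",
    "HANGING_WOBS_COUNT QUEUE",
    "------------------ -------------------------",
    "                 2 LIV_InitialCheck",
    "                 1 LIV_InitialCreate",
    "                 0 BPF_Operations",
    "25 rows selected.",
])

flat_lines = flatten_report(SAMPLE)
for line in flat_lines:
    print(line)
```

With one queue per line, Splunk can index each line as its own event, so no multivalue handling is needed in the search.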

Contributor

Rolling up my sleeves and will dive into the bits and pieces of the search string now to get it working, tnx for the input

Contributor

Tnx!

That was what I needed.

R.


Contributor

This was perfect!!!!
