Splunk Search

Regex on multiline event - how to match multiple occurrences of a matching group?

rune_hellem
Contributor

About the source

I have a SQL report scheduled every 15 minutes that reports the status of the queues in our case handler system. Splunk is instructed to read the whole report as one event, so a search in Splunk returns the event like this:

TO_CHAR(SYSDATE,'                                                               
-----------------                                                               
01062016 09:00:00                                                               
HANGING_WOBS_COUNT QUEUE                                                        
------------------ -------------------------                                    
                 2 LIV_InitialCheck                                             
                 1 LIV_InitialCreate                                            
                 1 LIV_LB                                                       
                 0 BPF_Operations                                               
                 0 CE_Operations                                                
                 0 LIV_AttachmentMarkDeleted                                    
                 0 LIV_DeleteIndeksCase                                         
                 0 LIV_InitialLookup                                            
                 0 LIV_InitialMerge                                             
...
25 rows selected.

If I open the source file in Notepad++ and view all characters, it looks like this: [screenshot not included]

What I have done

  1. Using Rubular.com, I created and verified the following regex: ^\s{3,}(?<QueueCount>\d+)\s(?<QueueName>\w+). It captures each queue size and name.
  2. I verified the same regex using the field extractor. It captures only the first queue, LIV_InitialCheck.
  3. Testing the same regex in the search field, ... | rex "^\s{3,}(?<QueueCount>\d+)\s(?<QueueName>\w+)" matches nothing.
  4. Prefixing the regex with (?m) makes it match the first occurrence (same as #2), but not the rest.

So what is my regex missing for Splunk to capture all occurrences the same way Rubular does?
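
The behaviour in steps 3 and 4 can be reproduced outside Splunk. The following is a minimal sketch, assuming Python's `re` module as a stand-in for Splunk's PCRE engine (Python spells named groups `(?P<name>...)`, and the sample text is a shortened copy of the event with the trailing padding removed). In both engines `^` anchors only at the start of the whole string unless multiline mode `(?m)` is on, whereas Ruby, which Rubular uses, always treats `^` as start of line.

```python
import re

# Shortened copy of the event from the question (trailing padding removed).
EVENT = (
    "TO_CHAR(SYSDATE,'\n"
    "-----------------\n"
    "01062016 09:00:00\n"
    "HANGING_WOBS_COUNT QUEUE\n"
    "------------------ -------------------------\n"
    "                 2 LIV_InitialCheck\n"
    "                 1 LIV_InitialCreate\n"
    "                 0 BPF_Operations\n"
)

# Python spelling of the regex from the question.
PATTERN = r"^\s{3,}(?P<QueueCount>\d+)\s(?P<QueueName>\w+)"

# Without multiline mode, ^ anchors only at the very start of the event,
# and the event starts with "TO_CHAR", so nothing matches (step 3).
no_flag = re.search(PATTERN, EVENT)

# With (?m), ^ also matches after each newline. A single search still
# stops at the first hit, which mirrors step 4.
first_only = re.search("(?m)" + PATTERN, EVENT)

print(no_flag)                 # None
print(first_only.groupdict())  # {'QueueCount': '2', 'QueueName': 'LIV_InitialCheck'}
```

So two separate things are at play: the anchor needs multiline mode, and returning more than one match is a property of how the regex is applied, not of the regex itself.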


richgalloway
SplunkTrust

Splunk uses Perl-compatible regular expressions (PCRE), not Ruby's. regex101.com is a good site for testing regex strings. Also, the rex command returns only the first match unless the max_match option is set; max_match=0 returns every match. Try this:

... | rex max_match=0 "(?<QueueCount>\d+)\s(?<QueueName>[a-zA-Z_]+)" | ...
---
If this reply helps you, Karma would be appreciated.
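
To see what "every match" means for this pattern, here is a minimal sketch that again assumes Python's `re` as a stand-in for PCRE, with `finditer` playing the role of `max_match=0`. The sample is a shortened copy of the event. One thing it shows is that the unanchored pattern also picks up the `25 rows selected.` footer; the anchored variant at the end is an assumption added for illustration, not part of the answer above.

```python
import re

# Shortened copy of the event, including the footer line.
EVENT = (
    "HANGING_WOBS_COUNT QUEUE\n"
    "------------------ -------------------------\n"
    "                 2 LIV_InitialCheck\n"
    "                 1 LIV_LB\n"
    "                 0 BPF_Operations\n"
    "25 rows selected.\n"
)

# Python spelling of the regex from the answer.
PATTERN = re.compile(r"(?P<QueueCount>\d+)\s(?P<QueueName>[a-zA-Z_]+)")

# Collect every match, not just the first one.
pairs = [(m["QueueCount"], m["QueueName"]) for m in PATTERN.finditer(EVENT)]
print(pairs)
# [('2', 'LIV_InitialCheck'), ('1', 'LIV_LB'), ('0', 'BPF_Operations'), ('25', 'rows')]

# Hypothetical tightening: require leading spaces or tabs at the start of a
# line, so the footer (which starts in column 1) is no longer picked up.
ANCHORED = re.compile(r"(?m)^[ \t]+(?P<QueueCount>\d+)\s(?P<QueueName>[a-zA-Z_]+)")
rows = [(m["QueueCount"], m["QueueName"]) for m in ANCHORED.finditer(EVENT)]
print(rows)
# [('2', 'LIV_InitialCheck'), ('1', 'LIV_LB'), ('0', 'BPF_Operations')]
```

Whether the footer matters depends on how the multivalue fields are used afterwards; filtering it out later in the search would work as well.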


jplumsdaine22
Influencer

Aww Rich beat me to it. But this may also work for you:

| rex "-------------------------(?<QueueName>[\s\S]+)$" 
| makemv tokenizer="(?<token>\d\s[A-Za-z]+_[A-Za-z]+?)\b" QueueName
| mvexpand QueueName
| rex field=QueueName "^(?<QueueCount>\d+?)\s(?<QueueName>[A-Za-z]+_[A-Za-z]+?)\b" 
| table QueueCount QueueName
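
The pipeline above works in stages, and it can help to follow them one at a time. This is a rough emulation in Python, assuming `re` as a stand-in for PCRE and a shortened copy of the event; the variable names are illustrative and the Splunk commands are only mimicked, not called.

```python
import re

# Shortened copy of the event from the question.
EVENT = (
    "HANGING_WOBS_COUNT QUEUE\n"
    "------------------ -------------------------\n"
    "                 2 LIV_InitialCheck\n"
    "                 1 LIV_LB\n"
    "                 0 BPF_Operations\n"
    "25 rows selected.\n"
)

# Stage 1 (first rex): keep everything after the first run of 25 dashes.
rest = re.search(r"-------------------------(?P<QueueName>[\s\S]+)$", EVENT)["QueueName"]

# Stage 2 (makemv tokenizer): every hit of the capture group becomes one value.
tokens = re.findall(r"(\d\s[A-Za-z]+_[A-Za-z]+?)\b", rest)
print(tokens)
# ['2 LIV_InitialCheck', '1 LIV_LB', '0 BPF_Operations']

# Stages 3 and 4 (mvexpand + second rex): one result per value, split into
# count and name.
rows = []
for token in tokens:
    m = re.match(r"^(?P<QueueCount>\d+?)\s(?P<QueueName>[A-Za-z]+_[A-Za-z]+?)\b", token)
    rows.append((m["QueueCount"], m["QueueName"]))
print(rows)
# [('2', 'LIV_InitialCheck'), ('1', 'LIV_LB'), ('0', 'BPF_Operations')]
```

Printing the intermediate value after each stage is the same idea as adding one pipe at a time in the Splunk search bar.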

rune_hellem
Contributor

You are very close to answering my next issue, since I immediately realized that indexing it all as one event makes it hard, if not impossible, to use the queue name and queue size as a key-value pair for alerting. But I found two errors (at least I think they are errors):

  • There are 25 rows returned, that is, 25 queue names, but your search only returns 12 events.
  • The search only returns 0 as the count; it skips queues with other values.

jplumsdaine22
Influencer

Yes, I strongly suggest that you break them into separate events. But it's your call.

My search should work fine, but I did make some assumptions - I'm assuming your queue names are all XYZ_Something. If they vary you'll need to play with the regex. Also I just copied the data from this website - the actual formatting may be different in your source data - you may need to play around with the rex commands. The regex101 site that Rich posted is excellent for this.
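
The naming assumption can be checked against the tokenizer directly. This is a small sketch, assuming Python's `re` as a stand-in for PCRE; the rows below are hypothetical and chosen only to show what happens when a row does not look like a single-digit count followed by XYZ_Something. The loosened pattern at the end is likewise just one possible adjustment.

```python
import re

# Tokenizer regex from the search, with the group left unnamed.
TOKENIZER = re.compile(r"(\d\s[A-Za-z]+_[A-Za-z]+?)\b")

# Hypothetical rows (not taken from the real report).
ROWS = (
    "                 0 LIV_InitialCheck\n"   # fits the assumed shape
    "                12 LIV_InitialCreate\n"  # two-digit count
    "                 0 LIV_Queue2\n"         # digit in the name
    "                 0 LIV_Initial_Check\n"  # two underscores in the name
)

tokens = TOKENIZER.findall(ROWS)
print(tokens)
# ['0 LIV_InitialCheck', '2 LIV_InitialCreate']

# One possible loosening: any run of digits, then any word characters.
loose = re.findall(r"(\d+\s\w+)", ROWS)
print(loose)
# ['0 LIV_InitialCheck', '12 LIV_InitialCreate', '0 LIV_Queue2', '0 LIV_Initial_Check']
```

With the original tokenizer, the two-digit count loses its leading digit and the last two rows are dropped without any error, so a lower-than-expected row count is worth checking against the queue names in the real data.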

Here's the output from my search: http://imgur.com/2SY7lDA

rune_hellem
Contributor

Realized the following

  • After splitting up the search string you provided and adding one "pipe" at a time, once I fully understood each part, it did work as expected.
  • But I realized that the log format is a drawback for what I want to monitor (queue sizes over time), since the search string becomes so complex. So I decided to create a script that parses the initial log file and creates an additional file where each line has the format "timestamp queuename queuesize". (Of course, the best option would have been to change the initial log file itself, but that is too much effort, since I would then need to communicate with offshore resources.)
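
A script of that kind could look roughly like the sketch below. It is only an illustration under stated assumptions: Python is an arbitrary choice, `flatten_report` is a hypothetical name, the timestamp is passed through unchanged because the thread does not say whether it is day-first or month-first, and the sample is a shortened copy of the report.

```python
import re

# A line holding only the report timestamp, e.g. "01062016 09:00:00".
TIMESTAMP = re.compile(r"^(\d{8} \d{2}:\d{2}:\d{2})[ \t]*$", re.M)
# An indented row: count, then queue name, with optional trailing padding.
ROW = re.compile(r"^[ \t]+(\d+)[ \t]+(\w+)[ \t]*$", re.M)


def flatten_report(text):
    """Return one 'timestamp queuename queuesize' line per queue row."""
    stamp = TIMESTAMP.search(text)
    if stamp is None:
        return []  # no timestamp found, nothing sensible to emit
    return [f"{stamp.group(1)} {name} {count}" for count, name in ROW.findall(text)]


# Shortened copy of the report, with some trailing padding kept.
REPORT = (
    "TO_CHAR(SYSDATE,'   \n"
    "-----------------   \n"
    "01062016 09:00:00   \n"
    "HANGING_WOBS_COUNT QUEUE   \n"
    "------------------ -------------------------   \n"
    "                 2 LIV_InitialCheck   \n"
    "                 0 BPF_Operations   \n"
    "25 rows selected.\n"
)

lines = flatten_report(REPORT)
print("\n".join(lines))
# 01062016 09:00:00 LIV_InitialCheck 2
# 01062016 09:00:00 BPF_Operations 0
```

With one line per queue, each line can be indexed as its own event, and the queue name and size become ordinary fields.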

rune_hellem
Contributor

Rolling up my sleeves and diving into the bits and pieces of the search string now to get it working. Tnx for the input.


rune_hellem
Contributor

Tnx!

That was what I needed.

R.


jaxjohnny2000
Builder

This was perfect!!!!
