Solved: Regex on multiline event - how to match multiple occurrences of a matching group?

rune_hellem · ‎01-06-2016

About the source

I have a SQL report scheduled every 15 minutes that reports the status of the queues in our case handler system. Splunk is instructed to read the whole report as one event, so when searching in Splunk the event is returned like this:

TO_CHAR(SYSDATE,'                                                               
-----------------                                                               
01062016 09:00:00                                                               
HANGING_WOBS_COUNT QUEUE                                                        
------------------ -------------------------                                    
                 2 LIV_InitialCheck                                             
                 1 LIV_InitialCreate                                            
                 1 LIV_LB                                                       
                 0 BPF_Operations                                               
                 0 CE_Operations                                                
                 0 LIV_AttachmentMarkDeleted                                    
                 0 LIV_DeleteIndeksCase                                         
                 0 LIV_InitialLookup                                            
                 0 LIV_InitialMerge                                             
...
25 rows selected.

If I open the source file in Notepad++ and view all characters it looks like this

What I have done

1. Using Rubular.com, created and verified the following regex: ^\s{3,}(?<QueueCount>\d+)\s(?<QueueName>\w+) It captures each queue size and name.
2. Verified the same regex using the field extractor. It captures only the first queue, LIV_InitialCheck.
3. Testing the same regex in the search field, ... | rex "^\s{3,}(?<QueueCount>\d+)\s(?<QueueName>\w+)" matches nothing.
4. Prefixing the regex with (?m) makes it match the first occurrence (same as #2), but not the rest.

So what is my regex missing for Splunk to capture all occurrences the same way Rubular does?

richgalloway · ‎01-06-2016

Splunk uses Perl-compatible regular expressions (PCRE), not Ruby's. regex101.com is a good site for testing regex strings. Also, the rex command returns only the first match unless the max_match option is used. Try this:

... | rex max_match=0 "(?<QueueCount>\d+)\s(?<QueueName>[a-zA-Z_]+)" | ...
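
The three behaviours from the question (no match, first match only, all matches) can be reproduced outside Splunk. Below is a minimal sketch in Python, whose `re` module stands in for Splunk's PCRE engine. This is an assumption made for illustration: Python spells named groups `(?P<name>...)` rather than `(?<name>...)`, and `SAMPLE` is an abridged copy of the event from the question with trailing padding removed.

```python
import re

# Abridged copy of the event from the question (trailing padding removed).
SAMPLE = """\
TO_CHAR(SYSDATE,'
-----------------
01062016 09:00:00
HANGING_WOBS_COUNT QUEUE
------------------ -------------------------
                 2 LIV_InitialCheck
                 1 LIV_InitialCreate
                 1 LIV_LB
                 0 BPF_Operations
"""

# The regex from the question, with Python-style named groups.
ANCHORED = r"^\s{3,}(?P<QueueCount>\d+)\s(?P<QueueName>\w+)"

# Without multiline mode, ^ only matches at the very start of the event,
# and the event starts with "TO_CHAR", so nothing matches.
no_flag = re.search(ANCHORED, SAMPLE)

# With (?m), ^ matches at the start of every line, but search() still
# stops at the first hit, much like rex without max_match.
first_only = re.search("(?m)" + ANCHORED, SAMPLE)

# finditer() keeps scanning after each hit, which is roughly what
# max_match=0 asks rex to do.
all_rows = [(m["QueueCount"], m["QueueName"])
            for m in re.finditer("(?m)" + ANCHORED, SAMPLE)]

print(no_flag)                   # no match object
print(first_only["QueueName"])   # first queue only
print(all_rows)                  # every queue row
```

In Splunk terms, the multi-match case is what makes QueueCount and QueueName multivalue fields on the single event.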

---
If this reply helps you, Karma would be appreciated.

jplumsdaine22 · ‎01-06-2016

Aww Rich beat me to it. But this may also work for you:

| rex "-------------------------(?<QueueName>[\s\S]+)$" 
| makemv tokenizer="(?<token>\d\s[A-Za-z]+_[A-Za-z]+?)\b" QueueName
| mvexpand QueueName
| rex field=QueueName "^(?<QueueCount>\d+?)\s(?<QueueName>[A-Za-z]+_[A-Za-z]+?)\b" 
| table QueueCount QueueName
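
Read step by step, the search above keeps everything after the dashed column rule, splits that text into one value per queue, expands the values into separate rows, and then extracts the count and name from each row. A rough Python analogue of those steps is sketched below. It is an illustration only and not how Splunk implements `makemv` or `mvexpand`. The `EVENT` text is a shortened, hypothetical copy of the sample, and `-{25}` stands for the 25-dash rule under the QUEUE column.

```python
import re

# Shortened stand-in for the event; the rule line mirrors the report
# header (18 dashes, a space, 25 dashes).
EVENT = "\n".join([
    "HANGING_WOBS_COUNT QUEUE",
    "-" * 18 + " " + "-" * 25,
    "                 2 LIV_InitialCheck",
    "                 1 LIV_InitialCreate",
    "                 0 BPF_Operations",
])

# Step 1 (first rex): keep everything after the 25-dash column rule.
tail = re.search(r"-{25}([\s\S]+)$", EVENT).group(1)

# Step 2 (makemv tokenizer): one token per "count name" pair.
TOKENIZER = r"(\d\s[A-Za-z]+_[A-Za-z]+?)\b"
tokens = re.findall(TOKENIZER, tail)

# Step 3 (mvexpand + second rex): one row per token, split into fields.
ROW = r"^(?P<QueueCount>\d+?)\s(?P<QueueName>[A-Za-z]+_[A-Za-z]+?)\b"
rows = []
for token in tokens:
    m = re.match(ROW, token)
    rows.append((m["QueueCount"], m["QueueName"]))

print(rows)

# The tokenizer takes a single digit, so a two-digit count loses its
# leading digit in this sketch.
print(re.findall(TOKENIZER, "12 LIV_LB"))
```

In this sketch a count of 12 is tokenized as `2 LIV_LB`, because the tokenizer starts with a single `\d`. If counts above 9 can occur, widening it to `\d+` is one possible adjustment. Queue names that do not follow the Letters_Letters shape would need a different pattern.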

rune_hellem · ‎01-06-2016

You are very close to answering my next issue, since I immediately realized that indexing it all as one event makes it hard, if not impossible, to use the queue name and queue size as a key-value pair for alerting. But I found two errors (at least I think they are errors):

There are 25 rows returned, that is, 25 queue names, but your search only returns 12 events.
The search only returns 0 as the count; it skips queues with other values.

jplumsdaine22 · ‎01-06-2016

Yes, I strongly suggest that you break them into separate events. But it's your call.

My search should work fine, but I did make some assumptions - I'm assuming your queue names are all XYZ_Something. If they vary you'll need to play with the regex. Also I just copied the data from this website - the actual formatting may be different in your source data - you may need to play around with the rex commands. The regex101 site that Rich posted is excellent for this.

Here's the output from my search: http://imgur.com/2SY7lDA

rune_hellem · ‎01-08-2016

Realized the following

After splitting up the search string you provided and adding one "pipe" at a time, once I fully understood each part, it did work as expected.
But I realized that the log format is a drawback for what I want to monitor (queue sizes over time), since the search string becomes so complex. So I decided to create a script that parses the initial log file and creates an additional file where each line has the format "timestamp queuename queuesize". Of course the best option would have been to change the initial log file itself, but that is too much effort, since I would then need to coordinate with offshore resources.
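
A script of the kind described could look roughly like the sketch below. It is only an illustration, not the actual script from this thread. The function name `flatten_report` is hypothetical, and both line patterns are guesses based on the sample event: the timestamp is taken to be eight digits followed by HH:MM:SS, and a queue row is taken to be an indented number followed by one word.

```python
import re

# Assumed shape of the timestamp line, e.g. "01062016 09:00:00".
TIMESTAMP = re.compile(r"^(\d{8} \d{2}:\d{2}:\d{2})\s*$")
# Assumed shape of a queue row: indentation, count, queue name.
QUEUE_ROW = re.compile(r"^\s+(\d+)\s+(\w+)\s*$")


def flatten_report(text):
    """Turn one report into 'timestamp queuename queuesize' lines."""
    timestamp = None
    out = []
    for line in text.splitlines():
        ts = TIMESTAMP.match(line)
        if ts:
            timestamp = ts.group(1)
            continue
        row = QUEUE_ROW.match(line)
        # Only emit rows once a timestamp has been seen; headers, rules
        # and the "25 rows selected." footer match neither pattern.
        if row and timestamp is not None:
            count, name = row.groups()
            out.append(f"{timestamp} {name} {count}")
    return out


if __name__ == "__main__":
    report = "\n".join([
        "TO_CHAR(SYSDATE,'",
        "-----------------",
        "01062016 09:00:00   ",
        "HANGING_WOBS_COUNT QUEUE",
        "------------------ -------------------------",
        "                 2 LIV_InitialCheck   ",
        "                 0 BPF_Operations",
        "25 rows selected.",
    ])
    for line in flatten_report(report):
        print(line)
```

With one queue per line, each line can be indexed as its own event, and a plain field extraction replaces the multivalue handling discussed above.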

rune_hellem · ‎01-08-2016

Rolling up my sleeves and will dive into the bits and pieces of the search string now to get it working, tnx for the input

rune_hellem · ‎01-06-2016

Tnx!

That was what I needed.

R.

jaxjohnny2000 · ‎10-02-2019

This was perfect!!!!
