Regex extracting field values, but unable to find them when added to search

rturk
Builder

Hi Splunkers & Splunkettes,

I am currently defining some sourcetypes for some db2 SMF logs and have finally got the field extractions working the way I want to via regex. To give you a snippet example:

EXTRACT-db2_header = (?m)^0(?<primauth>.{8})\s(?<connect>.{8})\s

On this event:

0=======================================================================================================
 PRIMAUTH CONNECT  INSTANCE       END_USER      WS_NAME                       TRANSACT                                               
 ORIGAUTH CORRNAME CONNTYPE       RECORD TIME   DESTNO     IFC DESCRIPTION    DATA                                                   
 PLANNAME CORRNMBR                TCB CPU TIME             ID                                                                        
 -------- -------- ------------ -------------------------- --- -------------- --------------------------
0A1B2C3   SERVER   X'123456789012' A12345           ABCD123                       SQLA.exe                                           
0Z9Y8X7   N/A      REMOTE     M 15:46:05        1234567890 140 Audit Auth Failures                                                   
0DISTSERV 'BLANK' 

Gives me the following fields and their values:

primauth = "A1B2C3"
connect = "SERVER"

Now this appears to work fine. When I apply it to a larger sample size I get the following results for the primauth field in the field picker:

Values        #         %
-------------------------
A1B2C3   29,270   99.996%
Z9Y8X7        1    0.003%

Which is excellent because I need to find all instances where the primauth ISN'T 'A1B2C3'. HOWEVER, when I click on the value for 'Z9Y8X7' to add it to the search query, I get no results, despite Splunk telling me there is one value in my data set??? I've tried both:

sourcetype="db2_header" primauth="Z9Y8X7"
sourcetype="db2_header" Z9Y8X7

But both come up with no matches... am I missing something here? I realise that it's a statistically insignificant value, but so is a needle in a haystack, and that's Splunk's bread & butter.

EDIT: To make matters a little weirder, I DO get the expected values when I enter this:

sourcetype="db2_header" NOT primauth="A1B2C3"

Thanks in advance 🙂

Ayn
Legend

This should help clear up some of the confusion and explain the behaviour you're seeing: http://blogs.splunk.com/2011/10/07/cannot-search-based-on-an-extracted-field/

rturk
Builder

Thanks Ayn, good to know I was on the right path.

rturk
Builder

Right, I think I've made a little headway (I've gotta get out of the habit of asking questions then answering them myself 10 minutes later).

It has to do with the way Splunk performs its searching. The search appears to match only whole terms delimited by non-alphanumeric boundaries. Even though my regex skips the leading zero when extracting primauth, the search still has to match a term in the raw data first, and in the raw data that term is "0Z9Y8X7". So while a search for:

primauth="Z9Y8X7"

won't work, a search for:

primauth="*Z9Y8X7"

WILL work, because the wildcard covers the leading zero in the raw data, which the search has to deal with even though the regex doesn't.

To complicate matters, any NOT search seems to be evaluated against the primauth value AFTER regex extraction, hence:

sourcetype="db2_header" NOT primauth="A1B2C3"

DOES work the way you'd expect.
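
To make the behaviour above concrete, here is a rough Python model of it. This is only a sketch of my understanding, not how Splunk is actually implemented: the whitespace-only tokenizer, the function names and the two one-line sample events are all assumptions made up for illustration.

```python
import re
from fnmatch import fnmatchcase

# Same pattern as the EXTRACT above, in Python's named-group syntax.
EXTRACT = re.compile(r"^0(?P<primauth>.{8})\s(?P<connect>.{8})\s", re.M)

# Hypothetical one-line events (assumption: simplified from the sample above).
EVENTS = [
    "0A1B2C3   SERVER   X'123456789012' A12345",
    "0Z9Y8X7   SERVER   X'123456789012' A12345",
]

def index_tokens(raw):
    # Assumption: terms are split on whitespace only and lowercased.
    return {tok.lower() for tok in raw.split()}

def extract_primauth(raw):
    # Search-time extraction: the regex skips the leading zero.
    m = EXTRACT.search(raw)
    return m.group("primauth").strip() if m else None

def search_primauth(events, value):
    """Model of primauth="value": term lookup first, extraction second."""
    pattern = value.lower()
    hits = []
    for raw in events:
        # Step 1: the value must match some whole term in the raw data.
        if not any(fnmatchcase(tok, pattern) for tok in index_tokens(raw)):
            continue
        # Step 2: the extracted field must match as well.
        field = extract_primauth(raw)
        if field is not None and fnmatchcase(field.lower(), pattern):
            hits.append(raw)
    return hits

def search_not_primauth(events, value):
    """Model of NOT primauth="value": no term lookup, extraction only."""
    return [raw for raw in events
            if (extract_primauth(raw) or "").lower() != value.lower()]

print(search_primauth(EVENTS, "Z9Y8X7"))      # raw term is "0z9y8x7", so step 1 fails
print(search_primauth(EVENTS, "*Z9Y8X7"))     # wildcard absorbs the leading zero
print(search_not_primauth(EVENTS, "A1B2C3"))  # compared after extraction
```

In this model the plain search fails at step 1 because no raw term equals the extracted value, while the wildcard and NOT searches both find the Z9Y8X7 event. As far as I know, the remedy usually suggested for this situation is setting INDEXED_VALUE = false for the field in fields.conf, so the search stops assuming the extracted value exists as a term in the raw data; check the post Ayn linked and the fields.conf documentation before relying on that.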

Not particularly intuitive, but good to know & understand. Hope this helps someone out 🙂
