Splunk Search

Why is my regex not matching for a multivalue field?

spike021
Explorer

I looked through quite a few posts on here and couldn't find an appropriate answer, so please bear with me.

I have events coming into Splunk in JSON format. The top-level fields are extracted fine. However, a nested map/dictionary is giving me issues. When I run a search to get the values from that inner dictionary, it works in that I get a resulting table like:

 A       B
---     ---   
 x       y
         z
         y
         z

 s       m
         n

 u       -  (- means None)

So y and z both belong to x, and occasionally there are more than two items per x. This happens for any x in A.

Since the cell in the table makes the values in B look separated by a newline, I created a regular expression that I've verified to correctly grab the logical groups for each y and z, if, for instance, they were just in a text box like this:

y
z
y
z
y
z

So the regex would properly grab the two values, separately, as many times as necessary.

What I want to do is pull out each pair and separate the two items into two new fields, say C and D, and then later have a table where I have C and D grouped to field A.

The regex part of the command:

rex field="A{}{}"  "(?<C>[\da-z\.-]+\.[a-z\.]{2,6})\n(?<D>\d{1,3})"

Note: the A{}{} together makes up the multivalue field, B, and A is just A as in the earlier part of my example.

The issue I'm running into is that when I pipe what should be the output of that statement into the table command, I don't get anything. The regex is confirmed working on a site like http://regexr.com/, just for sanity-checking.

So there must be something I'm missing. Maybe the initial table in my example only looks as though newlines separate the values into rows when they actually don't. With that in mind, I tried using \s as the separator instead of \n, and it still doesn't work.
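
A minimal sketch of one possible explanation, written in Python rather than SPL so it can be run standalone. It assumes (this is not confirmed anywhere in the thread) that rex tests each value of a multivalue field on its own, so no single value ever contains the \n the pattern requires, even though the table renders the values on separate lines. The sample values are hypothetical stand-ins for y and z, and Python's re needs the named groups spelled (?P<C>...) where rex uses (?<C>...).

```python
import re

# Python spelling of the pattern from the post; rex would write the
# groups as (?<C>...) and (?<D>...).
PATTERN = re.compile(r"(?P<C>[\da-z.-]+\.[a-z.]{2,6})\n(?P<D>\d{1,3})")

# Hypothetical values standing in for the y/z entries of field B.
values = ["example.com", "42", "splunk.example.org", "7"]


def pairs_from_joined(vals):
    """Match against one newline-joined string, as in a regex tester text box."""
    return PATTERN.findall("\n".join(vals))


def pairs_per_value(vals):
    """Match each value separately; no single value contains a newline."""
    return [m.groups() for v in vals for m in PATTERN.finditer(v)]


def pairs_by_position(vals):
    """Pair consecutive values by index instead of relying on a separator."""
    return list(zip(vals[0::2], vals[1::2]))


if __name__ == "__main__":
    print(pairs_from_joined(values))   # pairs found in the joined text
    print(pairs_per_value(values))     # nothing found value by value
    print(pairs_by_position(values))   # same pairs, no regex separator needed
```

If that assumption holds, swapping \n for \s would not help either, because the separator never appears inside any individual value; pairing the values by position avoids the separator question entirely.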

Or maybe there's a super simple explanation for an obvious mistake I'm making.

Regardless I would appreciate some help very much.

Thanks in advance.

1 Solution

alemarzu
Motivator

Spike,

Are these results acceptable to you?

http://postimg.org/image/5ofc2b29v/

alemarzu
Motivator

I just realized that the regex you gave us has an invalid structure. Would you mind sharing some sample data so I can build the proper regex?

spike021
Explorer

I just added a comment with a sample event.

jkat54
SplunkTrust

It only turns invalid when he quotes it inline like this, versus in a code block:

  like this<><\><><><><><><><

But yes, please provide a sample event.

spike021
Explorer

Odd formatting.

So a typical event looks something like this. The priority is to get the keys from the "IMPORTANT" dictionary, but having the values in their own field as well would be very useful if I could get this to work properly.

{
    "timestamp": "2016-01-21T14:44:28", 
    "SOME_FIELD": "etc.",
    "ANOTHER_FIELD": "...", 
    "IMPORTANT": {
        "a_string": 3,
        "another_strong": 44,
        "maybe_another...":95
    }, 
    "test": [
        [
            "something", 
            1.0
        ]
    ]
}

alemarzu
Motivator

U were right, thx 😉

jkat54
SplunkTrust

Please provide a full search or at least the table command you are using.

spike021
Explorer

Mentioned it below, but it looks something like: index="myindex" | rex max_match=0 field="A" "(?[\da-z\.-]+\.[a-z\.]{2,6})\n(?\d{1,3})" | table "A", "C", "D"

So nothing particularly complicated, just to get data output, which isn't happening at all yet.

jkat54
SplunkTrust
   index="myindex"| rex max_match=0 field="A"  "(?<C>[\da-z\.-]+\.[a-z\.]{2,6})\n(?<D>\d{1,3})" | table "A", "C", "D"

jkat54
SplunkTrust

So you're looking for a newline in field A? What's the \n for? Are you turning the JSON into one large event using SHOULD_LINEMERGE=true? Are you using KV_MODE=json? A sample event and your props/transforms would be most helpful.

spike021
Explorer

My props/transforms are default right now since it seemed like Splunk could already pull out the top-level fields, as mentioned in my other comment a moment ago.

Maybe that's the problem.

spike021
Explorer

So I actually added an example event that you might have missed.

{
     "timestamp": "2016-01-21T14:44:28", 
     "SOME_FIELD": "etc.",
     "ANOTHER_FIELD": "...", 
     "IMPORTANT": {
         "a_string": 3,
         "another_strong": 44,
         "maybe_another...":95
     }, 
     "test": [
         [
             "something", 
             1.0
         ]
     ]
 }

Originally, my idea for the absolute minimum (to at least show I'm able to retrieve that part of the JSON data) was to use | table "timestamp", IMPORTANT.key, IMPORTANT.values.

Maybe that isn't a good way to go about this?

Splunk already recognizes the rest of the fields at the top level, so if I do | table "timestamp", "ANOTHER_FIELD" then it works fine.
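
A rough sketch of the reshaping being asked for, in Python rather than SPL so it can be run standalone. It rests on the assumption that automatic JSON extraction names nested fields after each key (for example IMPORTANT.a_string), which would mean there is no field literally called IMPORTANT.key or IMPORTANT.values to put in a table. The function name important_rows is hypothetical; the event is the sample posted above.

```python
import json

# Sample event as posted in this thread.
SAMPLE_EVENT = """
{
    "timestamp": "2016-01-21T14:44:28",
    "SOME_FIELD": "etc.",
    "ANOTHER_FIELD": "...",
    "IMPORTANT": {
        "a_string": 3,
        "another_strong": 44,
        "maybe_another...": 95
    },
    "test": [["something", 1.0]]
}
"""


def important_rows(raw_event):
    """Return one (timestamp, key, value) row per entry in the IMPORTANT dictionary."""
    event = json.loads(raw_event)
    # Fall back to an empty dict when the event has no IMPORTANT section.
    important = event.get("IMPORTANT") or {}
    return [(event.get("timestamp"), key, value) for key, value in important.items()]


if __name__ == "__main__":
    for row in important_rows(SAMPLE_EVENT):
        print(row)
```

Each row keeps the key and its value together, which is the shape the C and D columns would need; the keys are read from the dictionary itself rather than recovered with a regex over the rendered table.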
