
Why is my regex not matching for a multivalue field?

spike021
Explorer

I looked through quite a few posts on here and couldn't find an appropriate answer, so please bear with me.

I have events coming into Splunk in JSON format. The top-level fields are extracted fine. However, a nested map/dictionary is giving me issues. When I run a search to get the values from that inner dictionary, it works in that I get a resulting table like:

 A       B
---     ---   
 x       y
         z
         y
         z

 s       m
         n

 u       -  (- means None)

So y and z both belong to x, and occasionally there are more than two items for an x. This can happen for any x in A.

Since the cell in the table makes the values in B look separated by a newline, I created a regular expression that I've verified to correctly grab the logical groups for each y and z, if, for instance, they were just in a text box like this:

y
z
y
z
y
z

So the regex would properly grab the two as many times as necessary, separately.

What I want to do is pull out each pair and separate the two items into two new fields, say C and D, and then later have a table where I have C and D grouped to field A.
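As a rough sketch of that pairing step, and assuming B really is a multivalue field whose values strictly alternate between the C-type and D-type items, the pairs could be split by position instead of by regex. The makeresults and split lines only fabricate a stand-in event for illustration, and idx is a hypothetical helper field.

```
| makeresults
| eval A="x", B=split("y,z,y,z", ",")
| eval idx=mvrange(0, mvcount(B), 2)
| mvexpand idx
| eval C=mvindex(B, idx), D=mvindex(B, idx + 1)
| table A, C, D
```

The idea is that mvrange generates the even positions, mvexpand creates one row per position, and mvindex picks the item at that position plus the one after it, so each row carries one C/D pair together with its A.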

The regex part of the command:

rex field="A{}{}"  "(?<C>[\da-z\.-]+\.[a-z\.]{2,6})\n(?<D>\d{1,3})"

Note: the A{}{} together makes up the multivalue field, B, and A is just A as in the earlier part of my example.

The issue I'm running into is that when I pipe what should be the output from that statement into the table command, I don't get anything. The regex is confirmed working on a site like http://regexr.com/, just as a sanity check.

So there must be something I'm missing. Maybe the initial table in my example only looks as if newlines separate the values into rows when they actually don't. In case that was the problem, I tried using \s as the separator rather than \n, and it still doesn't work.
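One way to test that hypothesis, sketched under the assumption that the values are separate entries of a multivalue field and that rex evaluates each entry on its own (in which case a pattern containing \n or \s could never span two entries): count the entries with mvcount, join them into a single string with an explicit delimiter, and run the regex against the joined string. The field names value_count and joined and the ";" delimiter are illustrative choices, not part of the original search.

```
index="myindex"
| eval value_count=mvcount('A{}{}')
| eval joined=mvjoin('A{}{}', ";")
| rex field=joined max_match=0 "(?<C>[\da-z\.-]+\.[a-z\.]{2,6});(?<D>\d{1,3})"
| table A, value_count, C, D
```

If value_count is greater than 1 for the rows in question, the values are separate entries and were never joined by newline characters in the first place.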

Or maybe there's a super simple explanation for an obvious mistake I'm making.

Regardless, I would very much appreciate some help.

Thanks in advance.

1 Solution

alemarzu
Motivator

Spike,

Are these results acceptable for you?

http://postimg.org/image/5ofc2b29v/


alemarzu
Motivator

I just realized that the regex you gave us has an invalid structure. Do you mind sharing some sample data so I can build the proper regex?

spike021
Explorer

I just added a comment with a sample event.


jkat54
SplunkTrust

It only turns invalid when he quotes like this versus

  like this<><\><><><><><><><

But yes, please provide a sample event.


spike021
Explorer

Odd formatting.

So a typical event looks something like this. Priority is to get the keys from the "IMPORTANT" dictionary, but values as well in their own field would be very useful if I could get this to work properly

{
    "timestamp": "2016-01-21T14:44:28", 
    "SOME_FIELD": "etc.",
    "ANOTHER_FIELD": "...", 
    "IMPORTANT": {
        "a_string": 3,
        "another_strong": 44,
        "maybe_another...":95
    }, 
    "test": [
        [
            "something", 
            1.0
        ]
    ]
}

alemarzu
Motivator

U were right, thx 😉


jkat54
SplunkTrust

Please provide a full search or at least the table command you are using.


spike021
Explorer

Mentioned it below, but it looks something like: index="myindex" | rex max_match=0 field="A" "(?[\da-z\.-]+\.[a-z\.]{2,6})\n(?\d{1,3})" | table "A", "C", "D"

So nothing particularly complicated, just to get data output, which isn't happening at all yet.


jkat54
SplunkTrust
   index="myindex"| rex max_match=0 field="A"  "(?<C>[\da-z\.-]+\.[a-z\.]{2,6})\n(?<D>\d{1,3})" | table "A", "C", "D"

jkat54
SplunkTrust

So you're looking for a newline in field A? What's the \n for? Are you turning the JSON into one large event using SHOULD_LINEMERGE=true? Are you using KV_MODE=json? A sample event and your props/transforms would be most helpful.


spike021
Explorer

My props/transforms are default right now since it seemed like Splunk could already pull out the top-level fields, as mentioned in my other comment a moment ago.

Maybe that's the problem.


spike021
Explorer

So I actually added an example event that you might have missed.

{
     "timestamp": "2016-01-21T14:44:28", 
     "SOME_FIELD": "etc.",
     "ANOTHER_FIELD": "...", 
     "IMPORTANT": {
         "a_string": 3,
         "another_strong": 44,
         "maybe_another...":95
     }, 
     "test": [
         [
             "something", 
             1.0
         ]
     ]
 }

Originally my idea just for the absolute minimum (to at least show I'm able to retrieve that part of the JSON data) was to use a | table "timestamp", IMPORTANT.key, IMPORTANT.values.

Maybe that isn't a good way to go about this?

Splunk already recognizes the rest of the fields at the top level, so if I do | table "timestamp", "ANOTHER_FIELD" then it works fine.
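With Splunk's automatic JSON extraction, nested keys normally surface as dotted field names such as IMPORTANT.a_string rather than as generic IMPORTANT.key and IMPORTANT.values fields, so | table timestamp, IMPORTANT.* is one thing to try first. To get the keys and the values into their own fields, a possible sketch (not necessarily the accepted solution, which was only linked as an image) is to pull the inner object out as text with spath and split it with rex. The makeresults and eval lines fabricate a trimmed copy of the sample event; important_json, C, D and pair are illustrative names.

```
| makeresults
| eval _raw="{\"timestamp\": \"2016-01-21T14:44:28\", \"IMPORTANT\": {\"a_string\": 3, \"another_strong\": 44}}"
| spath path=timestamp output=timestamp
| spath path=IMPORTANT output=important_json
| rex field=important_json max_match=0 "\"(?<C>[^\"]+)\"\s*:\s*(?<D>\d+)"
| eval pair=mvzip(C, D, "=")
| mvexpand pair
| table timestamp, pair
```

Here C is meant to collect the dictionary keys and D the numeric values; mvzip and mvexpand then turn them into one row per key/value pair. Against real data, the first two lines would be replaced by the actual base search.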
