Filter JSON events, sending only interesting fields to the indexer

djluke
Path Finder

Hello Splunkers,
I have a heavy forwarder that receives millions of events in JSON format. To save disk space and license volume, I'd like to send only some interesting fields to the indexer.
I tried different combinations of props and transforms, but without success. All I get is the match of the first regex group and nothing else.

Below are the configuration (I reduced the regex to only 3 fields and simplified the event structure to keep it easy to read), an example event, and the current output.

transforms.conf

[KeepJsonFields]
REGEX = ("Field001":".*?".)|("Field002":".*?".)||("Field003":".*?".)
FORMAT = {$1$2$3}
DEST_KEY = _raw

props.conf

[st_json_fields]
DATETIME_CONFIG =
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = 34
NO_BINARY_CHECK = true
TRANSFORMS-KeepJsonFields = KeepJsonFields
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y%m%d%H%M%S"
TIME_PREFIX = "FieldForTime":"
pulldown_type = true 
TZ = UTC

Event sample

{  
   "Block1":{  
      "Field001":"Value001",
      "Field002":"Value002   ",
      "Field003":1000,
      "Field004":"Value004",
      "Field005":1000,
      "Field00N":1000
   },
   "Block2":{  
      "Field-001":"Value-001",
      "Field-002":"Value-002",
      "Field-003":"Value-003",
      "Field-00N":"Value-004"
   },
   "Block3":{  
      "Field_001":"Value_001",
      "Field_002":"Value_002",
      "Field_003":"Value_003"
   }
  }

Event in splunk

{"Field001":"Value001",}

Here below main problems I've encountered:

  • The regex seems to stop at the first occurrence
  • How to handle commas so the output is still valid JSON (is it correct to include them in the capture groups?)

Thanks in advance for the help you can give me

woodcock
Esteemed Legend

If you really need to restructure your JSON to eliminate some of the payload, there is no practical way to do that with plain Splunk, but you can do it with Cribl:
https://www.cribl.io/

Tell them Gregg sent you!

djluke
Path Finder

Hi Gregg,
thanks for the idea. I've installed cribl in the test environment.
Could you give me some tips on how to get what I need?
Thanks

woodcock
Esteemed Legend

It is called JSON unpack I think. Maybe @clintsharp or @dritan can help point you in the right direction.

FrankVl
Ultra Champion

Your regex is effectively a single alternation: 3 different options with the | (OR) operator in between. A single match only ever satisfies one of those options, so only one capture group is filled (here the first) and the other two stay empty. That behavior is just as expected.
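
The single-match behavior can be reproduced outside Splunk. The following is only a rough sketch in Python, whose `re` module is assumed to behave like Splunk's regex engine for this simple pattern. It also assumes that the `||` in the posted REGEX is a typo for `|`, and `apply_format` is a hypothetical helper that mimics `FORMAT = {$1$2$3}`, not a Splunk function.

```python
import re

# Assumption: the "||" in the posted REGEX is a typo for "|".
PATTERN = re.compile(r'("Field001":".*?".)|("Field002":".*?".)|("Field003":".*?".)')

# Compact, single-line version of Block1 from the sample event.
RAW = '{"Block1":{"Field001":"Value001","Field002":"Value002   ","Field003":1000}}'


def apply_format(raw: str) -> str:
    """Mimic FORMAT = {$1$2$3} using only the first match; unset groups become empty."""
    match = PATTERN.search(raw)
    if match is None:
        return raw  # no match: the event is left unchanged
    return "{" + "".join(group or "" for group in match.groups()) + "}"


first = PATTERN.search(RAW)
print(first.groups())     # only group 1 is set; groups 2 and 3 are None
print(apply_format(RAW))  # {"Field001":"Value001",}
```

The rebuilt string is the same `{"Field001":"Value001",}` that was reported as the indexed event, trailing comma included.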

Can you perhaps share an example of the output you would like to get and explain the logic to get from input to output? Then we can help figure out a solution.

djluke
Path Finder

Hi FrankVl,
thanks for your reply.
The regex part is clear now, although at this point I don't know how to capture all 3 groups. The catch is that the desired groups are not always present in the event.

The expected output should be something like this

{
   "Block1":{
      "Field001":"Value001",
      "Field005":1000,
      "Field022":1000
   },
   "Block2":{
      "Field-001":"Value-001"
   },
   "Block3":{
      "Field_003":"Value_003"
   }
}

Ideally the output would also be JSON, but I can work with a simple key-value format too.

Thanks

FrankVl
Ultra Champion

Ok, from that example I can't really determine your required logic. Do you want to drop / keep specific fields and perhaps only in specific blocks?

For dropping a specific field, using a SEDCMD is a lot easier:

in props.conf:

[st_json_fields]
SEDCMD-stripfield003 = s/"Field003":\d+,?\s*//

https://regex101.com/r/IgrBzR/2

djluke
Path Finder

Exactly.
But the only thing I know for sure is what I want to keep ... and I also know there are cases where the needed fields are not present.

Is there a way to keep important fields instead of dropping others?

Thanks again for your time

FrankVl
Ultra Champion

That makes things a lot more difficult. You'd need to write a regex that matches your full event, with capture groups for all the bits you want to keep (making some of them optional), and then glue it all back together.

djluke
Path Finder

mmmm...yes.... 😞

FrankVl
Ultra Champion

I guess you could use a negative lookahead in the SEDCMD so that the fields you want to keep are excluded from the removal:

[st_json_fields]
 SEDCMD-stripfields = s/"(?!Block\d|Field001|Field002|Field_003)[-\w]+":[^,\r\n]+,?\s*//g

https://regex101.com/r/MXT4jm/1

Note: some serious tuning may be needed for this use case, depending on what your actual raw data looks like (e.g. does it include newlines and indentation?).
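
To illustrate why the layout of the raw data matters, here is a small sketch that applies the same pattern with Python's `re.sub` as a stand-in for `s/.../g`. The assumption is that Python's `re` treats this pattern the same way as the engine behind SEDCMD. The compact one-line event and the helper names `strip_fields` and `is_valid_json` are made up for the example.

```python
import json
import re

# The pattern from the SEDCMD above; re.sub replaces every match, like the trailing "g" flag.
STRIP = re.compile(r'"(?!Block\d|Field001|Field002|Field_003)[-\w]+":[^,\r\n]+,?\s*')


def strip_fields(raw: str) -> str:
    """Remove every key/value pair whose key is not protected by the lookahead."""
    return STRIP.sub("", raw)


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical compact event: no newlines or indentation between fields.
compact = (
    '{"Block1":{"Field001":"Value001","Field004":"Value004","Field005":1000},'
    '"Block3":{"Field_003":"Value_003"}}'
)

stripped = strip_fields(compact)
print(stripped)
print(is_valid_json(stripped))
```

In this compact layout nothing but a comma stops `[^,\r\n]+`, so when the last unwanted field of a block is removed, the block's closing brace is consumed along with it and the result no longer parses as JSON.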

djluke
Path Finder

Hi,
I tried the exclusions, but sometimes it works and sometimes it doesn't, depending on which fields arrive. In the end I would say that the JSON structure is too complex to be filtered with a regex. In some cases a comma is left before a closing brace, or a brace/square bracket has to be added or removed to get valid JSON again.
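
For comparison, a keep-list is straightforward once the event is handled by a real JSON parser rather than a regex. The sketch below is plain Python and not Splunk configuration: it assumes the events can be pre-processed somewhere before they are indexed (for example in a script in front of the forwarder), which is an assumption and not something established in this thread. The `KEEP` mapping and the `keep_fields` function are hypothetical names, with the field list taken from the expected output posted earlier.

```python
import json

# Hypothetical keep-list, taken from the expected output earlier in the thread.
KEEP = {
    "Block1": {"Field001", "Field005", "Field022"},
    "Block2": {"Field-001"},
    "Block3": {"Field_003"},
}


def keep_fields(raw: str, keep: dict = KEEP) -> str:
    """Return a compact JSON string containing only the wanted fields that are present."""
    event = json.loads(raw)
    slim = {}
    for block, wanted in keep.items():
        fields = event.get(block)
        if not isinstance(fields, dict):
            continue  # block missing from this event
        kept = {key: value for key, value in fields.items() if key in wanted}
        if kept:
            slim[block] = kept  # blocks with no wanted fields are dropped entirely
    return json.dumps(slim, separators=(",", ":"))


sample = (
    '{"Block1":{"Field001":"Value001","Field002":"Value002   ","Field005":1000},'
    '"Block2":{"Field-001":"Value-001","Field-002":"Value-002"},'
    '"Block3":{"Field_001":"Value_001"}}'
)
print(keep_fields(sample))
```

Because the output is re-serialized from a parsed structure, commas and braces are always consistent, and wanted fields that are missing from an event are simply skipped.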

Thanks again for your help
