Hello Splunkers,
I have a heavy forwarder that receives millions of events in JSON format. To save space and license, I'd like to send only a few fields of interest to the indexer.
I tried different combinations of props and transforms without success. All I get is the match of the first group of the regex and nothing else.
Below are the configuration (I reduced the regex to only 3 fields and simplified the event structure), a sample event, and the current output.
transforms.conf
[KeepJsonFields]
REGEX = ("Field001":".*?".)|("Field002":".*?".)|("Field003":".*?".)
FORMAT = {$1$2$3}
DEST_KEY = _raw
props.conf
[st_json_fields]
DATETIME_CONFIG =
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = 34
NO_BINARY_CHECK = true
TRANSFORMS-KeepJsonFields = KeepJsonFields
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y%m%d%H%M%S"
TIME_PREFIX = "FieldForTime":"
pulldown_type = true
TZ = UTC
Sample event
{
"Block1":{
"Field001":"Value001",
"Field002":"Value002 ",
"Field003":1000,
"Field004":"Value004",
"Field005":1000,
"Field00N":1000
},
"Block2":{
"Field-001":"Value-001",
"Field-002":"Value-002",
"Field-003":"Value-003",
"Field-00N":"Value-004"
},
"Block3":{
"Field_001":"Value_001",
"Field_002":"Value_002",
"Field_003":"Value_003"
}
}
Resulting event in Splunk
{"Field001":"Value001",}
Below are the main problems I've encountered:
Thanks in advance for any help you can give me.
If you really need to restructure your JSON to eliminate part of the payload, there is no practical way to do that with plain Splunk, but you can do it with Cribl:
https://www.cribl.io/
Tell them Gregg sent you!
Hi Gregg,
thanks for the idea. I've installed Cribl in the test environment.
Could you give me some tips on how to get what I need?
Thanks
I think it is called JSON unpack. Maybe @clintsharp or @dritan can help point you in the right direction.
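Your regex has three capture groups, but they sit in three alternatives joined by the | (OR) operator, so any single match can fill only one of them. The regex stops at the first match it finds in the event (here Field001) and the other two groups stay empty, so that behavior is exactly as expected.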
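To see why only the first field comes through, here is a minimal Python sketch that runs the REGEX from transforms.conf (with a single | between the three alternatives) against a trimmed copy of Block1. Python's re module stands in for Splunk's PCRE engine here, which is an assumption that holds for this simple pattern. apply_format is a hypothetical helper that mimics FORMAT = {$1$2$3} by substituting an empty string for groups that did not take part in the match, which is what the output above shows Splunk doing.

```python
import re

# REGEX from transforms.conf, read with a single "|" between the alternatives.
KEEP_REGEX = re.compile(r'("Field001":".*?".)|("Field002":".*?".)|("Field003":".*?".)')

# Trimmed copy of Block1 from the sample event.
BLOCK1 = """{
"Block1":{
"Field001":"Value001",
"Field002":"Value002 ",
"Field003":1000,
"Field004":"Value004"
}
}"""


def apply_format(match):
    """Hypothetical stand-in for FORMAT = {$1$2$3}: unmatched groups become ""."""
    return "{" + "".join(g or "" for g in match.groups()) + "}"


# search() stops at the first match, just like the single result seen in Splunk.
first = KEEP_REGEX.search(BLOCK1)
print(first.groups())       # only group 1 is set; groups 2 and 3 are None
print(apply_format(first))  # {"Field001":"Value001",}

# Scanning the whole event shows each match still fills exactly one group.
# The Field003 alternative never matches: its value is a number, not a quoted string.
for m in KEEP_REGEX.finditer(BLOCK1):
    print(m.groups())
```

So even with repeated matching, each match carries a single field, and the third alternative would also need to accept unquoted numeric values.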
Can you perhaps share an example of the output you would like to get and explain the logic to get from input to output? Then we can help figure out a solution.
Hi FrankVI,
thanks for your reply.
The regex part is clear now, although I still don't know how to capture all three groups. The problem is that the desired fields are not always present in the event.
The expected output would be something like this:
{
"Block1":{
"Field001":"Value001",
"Field005":1000,
"Field022":1000
},
"Block2":{
"Field-001":"Value-001"
},
"Block3":{
"Field_003":"Value_003"
}
}
Ideally the output would also be JSON, but a simple key-value format would work too.
Thanks
OK, from that example I can't really determine the logic you need. Do you want to drop or keep specific fields, perhaps only in specific blocks?
For dropping a specific field, using a SEDCMD is a lot easier.
In props.conf:
[st_json_fields]
SEDCMD-stripfield003 = s/"Field003":\d+,?\s*//
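A quick way to sanity-check a sed expression before deploying it is to run the same regex locally. This minimal Python sketch assumes Python's re behaves like Splunk's sed-style replacement for this simple pattern; since the SEDCMD has no trailing g flag, only the first match is replaced, hence count=1.

```python
import json
import re

# Same regex as SEDCMD-stripfield003; no "g" flag, so replace only the first hit.
STRIP_FIELD003 = re.compile(r'"Field003":\d+,?\s*')

BLOCK1 = """{
"Block1":{
"Field001":"Value001",
"Field002":"Value002 ",
"Field003":1000,
"Field004":"Value004",
"Field005":1000,
"Field00N":1000
}
}"""

stripped = STRIP_FIELD003.sub("", BLOCK1, count=1)
print(stripped)

# Field003 sits in the middle of the block, so the result is still valid JSON.
print(json.loads(stripped)["Block1"])
```

If the stripped field were the last one in its block (no comma after it), the comma on the preceding line would be left dangling and the result would no longer be valid JSON.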
Exactly.
But I only know for sure what I want to keep... and I also know there are cases where the needed fields are not present.
Is there a way to keep important fields instead of dropping others?
Thanks again for your time
That makes things a lot more difficult. You'd need to write a regex that matches your full event, with capture groups for all the bits you want to keep (making some of them optional) and then gluing it all back together.
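Here is a minimal Python sketch of that idea, under a few assumptions: KEEP_KEYS is a hypothetical allow-list loosely based on the expected output above, values are assumed to be either quoted strings without escaped quotes or plain numbers, and Python's re stands in for PCRE. Each key gets its own optional lookahead anchored at the start of the event, so a missing field leaves its group as None instead of failing the whole match, and the order of fields in the event does not matter. As far as I know a Splunk FORMAT string can only concatenate the groups, so the comma-joining done here in Python would have to be emulated by putting the separators inside the capture groups.

```python
import json
import re

# Hypothetical allow-list; Field022 is not in the sample event, to show a missing field.
KEEP_KEYS = ["Field001", "Field005", "Field-001", "Field_003", "Field022"]

# Simplistic value pattern: a quoted string (no escaped quotes) or a plain number.
VALUE = r'(?:"[^"]*"|-?\d+(?:\.\d+)?)'


def build_keep_regex(keys):
    """One optional lookahead per key, all anchored at the start of the event."""
    parts = [r'(?=(?:[\s\S]*?("' + re.escape(k) + r'":' + VALUE + r'))?)' for k in keys]
    return re.compile("^" + "".join(parts))


def keep_fields(event, keys):
    # match() always succeeds: every lookahead is optional and consumes nothing.
    m = build_keep_regex(keys).match(event)
    found = [g for g in m.groups() if g is not None]
    return "{" + ",".join(found) + "}"


SAMPLE_EVENT = """{
"Block1":{
"Field001":"Value001",
"Field002":"Value002 ",
"Field003":1000,
"Field004":"Value004",
"Field005":1000,
"Field00N":1000
},
"Block2":{
"Field-001":"Value-001",
"Field-002":"Value-002"
},
"Block3":{
"Field_001":"Value_001",
"Field_003":"Value_003"
}
}"""

result = keep_fields(SAMPLE_EVENT, KEEP_KEYS)
print(result)
print(json.loads(result))
```

The output is flat key-value JSON (the block structure is not preserved), and if the same key appeared in two blocks, only its first occurrence would be captured. Each lookahead also rescans the event from the start, which may matter at millions of events.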
mmmm...yes.... 😞
I guess you could use negative lookaheads in the SEDCMD so that the fields you want to keep are excluded from the removal:
[st_json_fields]
SEDCMD-stripfields = s/"(?!Block\d|Field001|Field002|Field_003)[-\w]+":[^,\r\n]+,?\s*//g
https://regex101.com/r/MXT4jm/1
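To see concretely what this does to the sample event, here is a minimal Python sketch that runs the same expression through re.sub, with Python's re standing in for Splunk's sed-style replacement and the g flag mapping to re.sub replacing every match. is_valid_json is a hypothetical helper. The unwanted fields are stripped as intended, but a comma is left after "Field002" and Block2 ends up empty, so the result is no longer valid JSON.

```python
import json
import re

# Same expression as SEDCMD-stripfields; re.sub replaces every match, like the "g" flag.
STRIP_ALL_BUT = re.compile(r'"(?!Block\d|Field001|Field002|Field_003)[-\w]+":[^,\r\n]+,?\s*')

SAMPLE_EVENT = """{
"Block1":{
"Field001":"Value001",
"Field002":"Value002 ",
"Field003":1000,
"Field004":"Value004",
"Field005":1000,
"Field00N":1000
},
"Block2":{
"Field-001":"Value-001",
"Field-002":"Value-002",
"Field-003":"Value-003",
"Field-00N":"Value-004"
},
"Block3":{
"Field_001":"Value_001",
"Field_002":"Value_002",
"Field_003":"Value_003"
}
}"""


def is_valid_json(text):
    """Hypothetical helper: True if the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


stripped = STRIP_ALL_BUT.sub("", SAMPLE_EVENT)
print(stripped)
print(is_valid_json(stripped))  # False: trailing comma after "Field002"
```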
Note: this use case may need some serious tuning depending on what your actual raw data looks like (e.g. does it include newlines and indentation?).
Hi,
I tried the exclusions, but sometimes it works and sometimes it doesn't, depending on which fields come in. In the end I'd say the JSON structure is too complex to filter with a regex: in some cases a comma is left before a closing brace, or a brace or square bracket has to be added or removed to get valid JSON.
Thanks again for your help
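For comparison, this is what the structural approach looks like when the event is parsed as JSON instead of being edited as text, which is the kind of restructuring suggested earlier in the thread with Cribl. This is a minimal Python sketch, not Cribl's configuration or a Splunk feature; the KEEP allow-list is hypothetical, based on the expected output above, and the sketch assumes each event is a single complete JSON object. Because the output is re-serialized by the JSON library, trailing commas and unbalanced braces cannot occur, and missing fields are simply skipped.

```python
import json

# Hypothetical allow-list per block, modelled on the expected output in the thread.
KEEP = {
    "Block1": {"Field001", "Field005", "Field022"},
    "Block2": {"Field-001"},
    "Block3": {"Field_003"},
}

SAMPLE_EVENT = """{
"Block1":{"Field001":"Value001","Field002":"Value002 ","Field003":1000,
"Field004":"Value004","Field005":1000,"Field00N":1000},
"Block2":{"Field-001":"Value-001","Field-002":"Value-002"},
"Block3":{"Field_001":"Value_001","Field_003":"Value_003"}
}"""


def filter_event(raw, keep):
    """Parse the event, keep only allow-listed fields per block, re-serialize."""
    event = json.loads(raw)
    filtered = {}
    for block, wanted in keep.items():
        fields = event.get(block)
        if not isinstance(fields, dict):
            continue  # block missing or not an object: skip it
        kept = {k: v for k, v in fields.items() if k in wanted}
        if kept:
            filtered[block] = kept
    # Compact separators: no spaces after "," or ":".
    return json.dumps(filtered, separators=(",", ":"))


print(filter_event(SAMPLE_EVENT, KEEP))
```

Field022 is absent from the sample, so it is silently dropped rather than producing an empty or broken entry.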