Dataframe row : {"_c0":{"0":"[","1":" {","2":" \"table_name\": \"pc_dwh_rdv.gdh_ls2lo_s99\"","3":" \"deleted_count\": 18","4":" \"redelivered_count\": 0","5":" \"load_date\": \"2023-07-27\"","6":" }","7":" {","8":" \"table_name\": \"pc_dwh_rdv.gdh_spar_s99\"","9":" \"deleted_count\": 8061","10":" \"redelivered_count\": 1","11":" \"load_date\": \"2023-07-27\"","12":" }","13":" {","14":" \"table_name\": \"pc_dwh_rdv.gdh_tf3tx_s99\"","15":" \"deleted_count\": 366619","16":" \"redelivered_count\": 0","17":" \"load_date\": \"2023-07-27\"","18":" }","19":" {","20":" \"table_name\": \"pc_dwh_rdv.gdh_wechsel_s99\"","21":" \"deleted_count\": 2","22":" \"redelivered_count\": 0","23":" \"load_date\": \"2023-07-27\"","24":" }","25":" {","26":" \"table_name\": \"pc_dwh_rdv.gdh_revolvingcreditcard_s99\"","27":" \"deleted_count\": 1285","28":" \"redelivered_count\": 0","29":" \"load_date\": \"2023-07-27\"","30":" }","31":" {","32":" \"table_name\": \"pc_dwh_rdv.gdh_phd_s99\"","33":" \"deleted_count\": 2484","34":" \"redelivered_count\": 204","35":" \"load_date\": \"2023-07-27\"","36":" }","37":" {","38":" \"table_name\": \"pc_dwh_rdv.gdh_npk_s99\"","39":" \"deleted_count\": 1705","40":" \"redelivered_count\": 0","41":" \"load_date\": \"2023-07-27\"","42":" }","43":" {","44":" \"table_name\": \"pc_dwh_rdv.gdh_npk_s98\"","45":" \"deleted_count\": 1517","46":" \"redelivered_count\": 0","47":" \"load_date\": \"2023-07-27\"","48":" }","49":" {","50":" \"table_name\": \"pc_dwh_rdv.gdh_kontokorrent_s99\"","51":" \"deleted_count\": 12998","52":" \"redelivered_count\": 0","53":" \"load_date\": \"2023-07-27\"","54":" }","55":" {","56":" \"table_name\": \"pc_dwh_rdv.gdh_gds_s99\"","57":" \"deleted_count\": 13","58":" \"redelivered_count\": 0","59":" \"load_date\": \"2023-07-27\"","60":" }","61":" {","62":" \"table_name\": \"pc_dwh_rdv.gdh_dszins_s99\"","63":" \"deleted_count\": 57","64":" \"redelivered_count\": 0","65":" \"load_date\": \"2023-07-27\"","66":" }","67":" {","68":" \"table_name\": \"pc_dwh_rdv_gdh_monat.gdh_phd_izr_monthly_s99\"","69":" \"deleted_count\": 1315","70":" \"redelivered_count\": 0","71":" \"load_date\": \"2023-07-27\"","72":" }","73":"]"}}
The above is the sample message of an event which we have in splunk we want to extract the deleted count values like "1315", "57", "13" etc and add those values as a separate fields using rex command .
Also from the above message we want to extract load_date value such as 2023-07-27 and add that value as a separate field. Please help us in this.
As @yuanliu already said - this is a very strange piece of data. Something "json-like" containing a string which is again json-like.
1. Even if it was a proper json structure, the string field values are just strings. So you'd have to do multiple spath passes to parse it all properly.
2. Don't fiddle with regexes around structured data. You'll end up accounting for so many border cases that will make your head spin. It's not worth it.
Hi @Renunaren ,
this seems to be a json format so ypu could extract fields using INDEXED_EXTRACTIONS=json in props.conf os the spath command.
Anyway, if you want to use a regex, You can use these two regexes:
| rex max_match=0 "deleted_count\\\":\s+(?<deleted_count>\d+)"
| rex max_match=0 "load_date\\\":\s+\\\"(?<load_date>[^\\]+)"
that you can test at https://regex101.com/r/w1XGFx/1 and https://regex101.com/r/w1XGFx/2
Ciao.
Giuseppe
Hi Giuseppe,
Thanks for your response. We have tried using the mentioned REGEX commands but it hasn't worked here. In our environment we are not authorized to modify props.conf configuration files. Below are some of the snippets that will mention the result.
Please look into this and help us on this.
hi @Renunaren,
did you try the "spath" command (https://docs.splunk.com/Documentation/Splunk/9.1.0/SearchReference/Spath) ?
The problem wit backslashes sometimes is solved adding a fourth backslash.
Ciao.
Giuseppe
Hi Giuseppe,
Thanks for your response. One of our team member has tried using the SPATH command. But he was unable to extract the required data, so we want to extract the data using rex command only. Please help us in this issue.
This data format is indeed very intriguing. Each of the JSON node represents a fragment of a quasi-JSON object. You will have to first extract the compliant JSON, then reassemble the fragments. But according to the sample data you posted, the reassembled string resembles JSON but is not compliant. So, you fix syntax errors in the assembly. Here is one way to achieve that. But note: each message contains many values. So I need to apply mvexpand to the reassembled JSON array.
| eval message = replace(message, "Dataframe row : ", "")
| spath input=message
| foreach c0.*
[eval assemble = mvappend(assemble, (printf("%04d", <<MATCHSTR>>) . "<>" . '<<FIELD>>'))]
| fields - _ c0.*
| eval assemble = mvsort(assemble)
| eval message = ""
| foreach assemble mode=multivalue
[eval message = message . "," . replace(<<ITEM>>, ".+<>", "")]
| rex field=message mode=sed "s/,\[,\s*/[/ s/{,/{/g s/,\s*}/}/g s/,]/]/"
| spath input=message path={}
| mvexpand {}
| spath input={}
As you can see, the reverse engineering is maddening. If you have influence on developers who rendered these logs, beg them to use JSON array for c0 instead of subnodes c0.*; if c0.* subnodes must be used, name them in lexicographic order; also to add commas in the right places inside escaped quotes.
Anyway, your sample data would give
deleted_count | load_date | redelivered_count | table_name |
18 | 2023-07-27 | 0 | pc_dwh_rdv.gdh_ls2lo_s99 |
8061 | 2023-07-27 | 1 | pc_dwh_rdv.gdh_spar_s99 |
366619 | 2023-07-27 | 0 | pc_dwh_rdv.gdh_tf3tx_s99 |
2 | 2023-07-27 | 0 | pc_dwh_rdv.gdh_wechsel_s99 |
1285 | 2023-07-27 | 0 | pc_dwh_rdv.gdh_revolvingcreditcard_s99 |
2484 | 2023-07-27 | 204 | pc_dwh_rdv.gdh_phd_s99 |
1705 | 2023-07-27 | 0 | pc_dwh_rdv.gdh_npk_s99 |
1517 | 2023-07-27 | 0 | pc_dwh_rdv.gdh_npk_s98 |
12998 | 2023-07-27 | 0 | pc_dwh_rdv.gdh_kontokorrent_s99 |
13 | 2023-07-27 | 0 | pc_dwh_rdv.gdh_gds_s99 |
57 | 2023-07-27 | 0 | pc_dwh_rdv.gdh_dszins_s99 |
1315 | 2023-07-27 | 0 | pc_dwh_rdv_gdh_monat.gdh_phd_izr_monthly_s99 |
Here is a data emulation you can play with and compare with real data
| makeresults
| eval message = "Dataframe row : {\"_c0\":{\"0\":\"[\",\"1\":\" {\",\"2\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_ls2lo_s99\\\"\",\"3\":\" \\\"deleted_count\\\": 18\",\"4\":\" \\\"redelivered_count\\\": 0\",\"5\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"6\":\" }\",\"7\":\" {\",\"8\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_spar_s99\\\"\",\"9\":\" \\\"deleted_count\\\": 8061\",\"10\":\" \\\"redelivered_count\\\": 1\",\"11\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"12\":\" }\",\"13\":\" {\",\"14\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_tf3tx_s99\\\"\",\"15\":\" \\\"deleted_count\\\": 366619\",\"16\":\" \\\"redelivered_count\\\": 0\",\"17\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"18\":\" }\",\"19\":\" {\",\"20\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_wechsel_s99\\\"\",\"21\":\" \\\"deleted_count\\\": 2\",\"22\":\" \\\"redelivered_count\\\": 0\",\"23\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"24\":\" }\",\"25\":\" {\",\"26\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_revolvingcreditcard_s99\\\"\",\"27\":\" \\\"deleted_count\\\": 1285\",\"28\":\" \\\"redelivered_count\\\": 0\",\"29\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"30\":\" }\",\"31\":\" {\",\"32\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_phd_s99\\\"\",\"33\":\" \\\"deleted_count\\\": 2484\",\"34\":\" \\\"redelivered_count\\\": 204\",\"35\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"36\":\" }\",\"37\":\" {\",\"38\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_npk_s99\\\"\",\"39\":\" \\\"deleted_count\\\": 1705\",\"40\":\" \\\"redelivered_count\\\": 0\",\"41\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"42\":\" }\",\"43\":\" {\",\"44\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_npk_s98\\\"\",\"45\":\" \\\"deleted_count\\\": 1517\",\"46\":\" \\\"redelivered_count\\\": 0\",\"47\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"48\":\" }\",\"49\":\" {\",\"50\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_kontokorrent_s99\\\"\",\"51\":\" \\\"deleted_count\\\": 12998\",\"52\":\" \\\"redelivered_count\\\": 0\",\"53\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"54\":\" }\",\"55\":\" {\",\"56\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_gds_s99\\\"\",\"57\":\" \\\"deleted_count\\\": 13\",\"58\":\" \\\"redelivered_count\\\": 0\",\"59\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"60\":\" }\",\"61\":\" {\",\"62\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_dszins_s99\\\"\",\"63\":\" \\\"deleted_count\\\": 57\",\"64\":\" \\\"redelivered_count\\\": 0\",\"65\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"66\":\" }\",\"67\":\" {\",\"68\":\" \\\"table_name\\\": \\\"pc_dwh_rdv_gdh_monat.gdh_phd_izr_monthly_s99\\\"\",\"69\":\" \\\"deleted_count\\\": 1315\",\"70\":\" \\\"redelivered_count\\\": 0\",\"71\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"72\":\" }\",\"73\":\"]\"}}"
``` data emulation above ```
Hi @Renunaren,
it's strange that the spath comand doesn't run and the use of INDEXED_EXTRACTIONS at source level is always the best solution, anyway sometimes there's an issue on regexes with backslash, so please try this:
| rex max_match=0 "deleted_count\\\\":\s+(?<deleted_count>\d+)"
| rex max_match=0 "load_date\\\\":\s+\\\\"(?<load_date>[^\\]+)"
Ciao.
Giuseppe
Let me interject here a bit.
Indexed extractions are _not_ always the best solution. Don't mistake indexed extractions with auto-kv. Indexed extractions are run during the indexing process and extract the fields permanently whereas auto-kv extract the fields in search-time.