Hi Team,
We have a raw event where the message field consists of multiple file names, we want to extract those and add them as a separate field. Please help us on this. Below is the sample event for reference.
{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\"😕"{\",\"1\"😕" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\"😕" \\\"status\\\": \\\"files arrived\\\"\",\"3\"😕" \\\"files\\\": [\",\"4\"😕" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\" \\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\" \\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\" \\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\" \\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\" \\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\" \\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" ]\",\"23\"😕"}\"}} ", "process": 32633, "processName": "MainProcess"}
Below is the sample SPL command used for this purpose.
index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"
Please help us on this.
Please repost your raw event in a code block </> so that it doesn't get corrupted by formatting
HI IT Whisperer,
Thanks for your response. As mentioned by you, below is the raw event.
{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\"😕"{\",\"1\"😕" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\"😕" \\\"status\\\": \\\"files arrived\\\"\",\"3\"😕" \\\"files\\\": [\",\"4\"😕" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\" \\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\" \\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\" \\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\" \\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\" \\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\" \\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" \\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\" \\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\" \\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\" \\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\" \\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\" \\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\"😕"}\"}} ", "process": 32633, "processName": "MainProcess"}
I tried to extract the file names like PAKS_FACT_DWH2_D20220221.ok, PAKS_UBER_DWH2_D20220221.ok, HHE_SIT_check_file1.txt.ok, HHE_SIT_check_file2.txt.ok, HHE_SIT_check_file3.txt.ok
separately and add them as a separate field using the below query
index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"
but this doesn't worked. Please help us on this issue.
By not putting your event in a code block </> as requested it gets corrupted
Please use this button
to insert your example event
Hi IT Whisperer,
Thanks for your response. Please look into the sample event below.
{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\":\"{\",\"1\":\" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",\"3\":\" \\\"files\\\": [\",\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\" \\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\" \\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\" \\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\" \\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\" \\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\" \\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" \\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\" \\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\" \\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\" \\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\" \\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\" \\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\":\"}\"}} ", "process": 32633, "processName": "MainProcess"}
Please look into the above code and kindly help us in extracting the file names like mentioned above using rex command.
First extract the list, then each file
| rex "(?:\"files[\\\\]+\": \[)(?<fileslist>[^\s:]+[^\]]+)"
| rex field=fileslist max_match=0 "(?:[^\s:]+[^\s]+\s[\"\\\]+)(?<files>[^\\\]+)"