Splunk Search

How can I extract multiple file names from an event and add them as a separate field using the rex command?

Renunaren
Loves-to-Learn Everything

Hi Team,

We have a raw event whose message field contains multiple file names. We want to extract those file names and add them as a separate field. Below is a sample event for reference.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\"😕"{\",\"1\"😕\\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\"😕\\\"status\\\": \\\"files arrived\\\"\",\"3\"😕\\\"files\\\": [\",\"4\"😕\\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\\\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\\\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\\\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\\\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\\\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\\\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\\\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\\\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" ]\",\"23\"😕"}\"}} ", "process": 32633, "processName": "MainProcess"}

Below is the SPL search we tried for this purpose.

index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"

Please help us on this.

 


ITWhisperer
SplunkTrust

Please repost your raw event in a code block </> so that it doesn't get corrupted by formatting.

Renunaren
Loves-to-Learn Everything

Hi ITWhisperer,

Thanks for your response. As requested, here is the raw event.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\"😕"{\",\"1\"😕\\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\"😕\\\"status\\\": \\\"files arrived\\\"\",\"3\"😕\\\"files\\\": [\",\"4\"😕\\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\\\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\\\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\\\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\\\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\\\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\\\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\\\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\\\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\\\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\\\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\\\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\\\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\\\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\\\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\\\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\\\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\\\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\"😕"}\"}} ", "process": 32633, "processName": "MainProcess"}

I tried to extract the individual file names, for example PAKS_FACT_DWH2_D20220221.ok, PAKS_UBER_DWH2_D20220221.ok, HHE_SIT_check_file1.txt.ok, HHE_SIT_check_file2.txt.ok, and HHE_SIT_check_file3.txt.ok, and add them as a separate field using the query below

index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"

but this did not work. Please help us with this issue.

ITWhisperer
SplunkTrust

Because the event was not put in a code block </> as requested, it got corrupted again. Please use the code block button (</>) in the editor toolbar to insert your example event.

Renunaren
Loves-to-Learn Everything

Hi ITWhisperer,

Thanks for your response. Please look into the sample event below.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\":\"{\",\"1\":\" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",\"3\":\" \\\"files\\\": [\",\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\" \\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\" \\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\" \\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\" \\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\" \\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\" \\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" \\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\" \\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\" \\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\" \\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\" \\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\" \\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\":\"}\"}} ", "process": 32633, "processName": "MainProcess"}

Please look at the event above and help us extract the file names, as described earlier, using the rex command.

ITWhisperer
SplunkTrust

First extract the list of files into its own field, then extract each file name from that list:

| rex "(?:\"files[\\\\]+\": \[)(?<fileslist>[^\s:]+[^\]]+)"
| rex field=fileslist max_match=0 "(?:[^\s:]+[^\s]+\s[\"\\\]+)(?<files>[^\\\]+)"
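
If you want to check the two-step logic outside Splunk, the following is a minimal Python sketch of the same idea. It rests on some assumptions: the patterns are my reading of the two rex expressions after Splunk has unescaped the search string, Python's re module stands in for Splunk's PCRE engine, and raw_event is an abbreviated copy of the message portion of the sample event. The names raw_event, LIST_PATTERN, FILE_PATTERN and extract_files are made up for this illustration.

```python
import re

# Abbreviated copy of the "message" part of the sample event. Raw strings are
# used so that every backslash stays exactly as it appears in the event text.
raw_event = (
    r'"message": "Dataframe row : {\"_c0\":{\"0\":\"{\",'
    r'\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",'
    r'\"3\":\" \\\"files\\\": [\",'
    r'\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",'
    r'\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",'
    r'\"19\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",'
    r'\"22\":\" ]\",\"23\":\"}\"}} "'
)

# Step 1 (assumed equivalent of the first rex): capture everything between
# the opening "[" of the files list and the closing "]".
LIST_PATTERN = re.compile(r'"files[\\]+": \[(?P<fileslist>[^\s:]+[^\]]+)')

# Step 2 (assumed equivalent of the second rex with max_match=0): skip the
# index key and the escaped quote, then capture up to the next backslash.
FILE_PATTERN = re.compile(r'(?:[^\s:]+[^\s]+\s["\\]+)(?P<files>[^\\]+)')


def extract_files(event: str) -> list[str]:
    """Return every file name found in the files list of one event."""
    list_match = LIST_PATTERN.search(event)
    if list_match is None:
        return []  # the event has no files list
    # findall returns all matches, like max_match=0 in rex
    return FILE_PATTERN.findall(list_match.group("fileslist"))


files = extract_files(raw_event)
for name in files:
    print(name)
```

Splitting the work in two keeps the second pattern from matching other quoted values in the event, such as the Timestamp or status entries, because it only ever sees the text between the brackets. In Splunk the result is a multivalue field, so a following mvexpand would be one way to get one file name per row if that is what you need.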