Splunk Search

How can I extract multiple file names from an event and add them as a separate field using the rex command?

Renunaren
Loves-to-Learn Everything

Hi Team,

We have a raw event whose message field contains multiple file names, and we want to extract them and add them as a separate field. Below is a sample event for reference.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\":\"{\",\"1\":\" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",\"3\":\" \\\"files\\\": [\",\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\" \\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\" \\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\" \\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\" \\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\" \\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\" \\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" ]\",\"23\":\"}\"}} ", "process": 32633, "processName": "MainProcess"}

Below is the sample SPL command used for this purpose.

index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"
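The long backslash runs come from the rex argument being escaped twice: SPL first unescapes the double-quoted string, and only the result is compiled as a regular expression. Below is a minimal Python sketch of that first step. It assumes only \\ and \" are rewritten and every other backslash sequence (such as \s, \[ or \]) is passed through unchanged; spl_unescape is a hypothetical helper, not a Splunk function.

```python
def spl_unescape(quoted: str) -> str:
    r"""Hypothetical helper approximating SPL's unescaping of a quoted rex argument.

    Assumption: only \\ -> \ and \" -> " are rewritten; any other backslash
    sequence (\s, \[, \] ...) is passed through unchanged to the regex engine.
    """
    out = []
    i = 0
    while i < len(quoted):
        ch = quoted[i]
        if ch == "\\" and i + 1 < len(quoted) and quoted[i + 1] in ("\\", '"'):
            out.append(quoted[i + 1])  # keep the escaped character, drop the escape
            i += 2
        else:
            out.append(ch)  # ordinary character or pass-through backslash
            i += 1
    return "".join(out)


# Example: 13 backslashes and a quote in the SPL string leave 6 backslashes and
# a quote, which the regex engine reads as three literal backslashes then ".
print(spl_unescape("\\" * 13 + '"files'))

# The first rex in the accepted answer, before and after unescaping.
print(spl_unescape(r'(?:\"files[\\\\]+\": \[)(?<fileslist>[^\s:]+[^\]]+)'))
```

Under that assumption, one literal backslash in the event costs four backslashes in the SPL string, which is why the counts grow so quickly.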

Please help us with this.

 


ITWhisperer
SplunkTrust

Please repost your raw event in a code block (</>) so that it doesn't get corrupted by formatting.


Renunaren
Loves-to-Learn Everything

Hi ITWhisperer,

Thanks for your response. As you requested, below is the raw event.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\":\"{\",\"1\":\" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",\"3\":\" \\\"files\\\": [\",\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\" \\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\" \\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\" \\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\" \\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\" \\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\" \\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" \\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\" \\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\" \\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\" \\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\" \\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\" \\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\":\"}\"}} ", "process": 32633, "processName": "MainProcess"}

I tried to extract file names such as

PAKS_FACT_DWH2_D20220221.ok
PAKS_UBER_DWH2_D20220221.ok
HHE_SIT_check_file1.txt.ok
HHE_SIT_check_file2.txt.ok
HHE_SIT_check_file3.txt.ok

individually and add them as a separate field using the query below:

index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"

but it doesn't work. Please help us with this issue.
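A likely reason, judging from the code-block copy of the event posted later in the thread: right after files\\\": the message contains "[", and the first file name only appears in the next array element (\"4\":\" \\\"PAKS_...). The pattern instead expects \\\" and a file name directly after files\\\": . And because "files" occurs only once in the event, a pattern that starts at "files" can return at most one name, even with max_match=0. The Python sketch below checks the first point on a fragment of that event. It uses the regex we assume rex receives after SPL unescapes the quoted string (each backslash run reducing to three literal backslashes, matching the three in the event). Python's re spells named groups (?P<name>...) rather than the (?<name>...) form used in the rex command.

```python
import re

# Fragment of the message from the code-block copy of the event (the rest omitted).
FRAGMENT = r'\"3\":\" \\\"files\\\": [\",\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\"'

# Assumed post-unescaping form of the question's pattern:
# \\\\\\ in a Python raw string is a regex for three literal backslashes.
QUESTION_RE = re.compile(r'\\\\\\"files\\\\\\":\s*\\\\\\"(?P<File_Arrived>[^\\]+)')

# No match: the pattern wants \\\" right after the key, but the event has "[".
print(QUESTION_RE.search(FRAGMENT))

# Show what really follows the "files" key.
start = FRAGMENT.index("files")
print(FRAGMENT[start:start + 12])
```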


ITWhisperer
SplunkTrust

Because you didn't put your event in a code block (</>) as requested, it got corrupted.


Please use the code block button (</>) to insert your example event.

Renunaren
Loves-to-Learn Everything

Hi ITWhisperer,

Thanks for your response. Please see the sample event below.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\":\"{\",\"1\":\" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",\"3\":\" \\\"files\\\": [\",\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\" \\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\" \\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\" \\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\" \\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\" \\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\" \\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" \\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\" \\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\" \\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\" \\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\" \\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\" \\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\":\"}\"}} ", "process": 32633, "processName": "MainProcess"}

Please take a look at the event above and help us extract the file names described earlier using the rex command.


ITWhisperer
SplunkTrust

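First extract the list of files into fileslist, then extract each file name from it. Because the second rex uses max_match=0, every match is kept, so files becomes a multivalue field with one value per file name: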

| rex "(?:\"files[\\\\]+\": \[)(?<fileslist>[^\s:]+[^\]]+)"
| rex field=fileslist max_match=0 "(?:[^\s:]+[^\s]+\s[\"\\\]+)(?<files>[^\\\]+)"
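To check the two rex calls outside Splunk, the sketch below replays them with Python's re on a trimmed copy of the code-block event (entries 7 to 18, 20 and 21 removed). The patterns are the ones we assume rex receives after SPL unescapes the quoted strings, with named groups rewritten as (?P<name>...) for Python. Splunk's PCRE and Python's re can differ in edge cases, so treat this as an approximation; extract_files is a hypothetical helper name.

```python
import re

# Trimmed copy of the code-block event: entries 7-18, 20 and 21 removed.
RAW = r'''{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\":\"{\",\"1\":\" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",\"3\":\" \\\"files\\\": [\",\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"19\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\":\"}\"}} ", "process": 32633, "processName": "MainProcess"}'''

# Regexes as we assume rex receives them after SPL unescaping
# (the [\\\\] in the first rex becomes [\\], i.e. "a backslash").
FILESLIST_RE = re.compile(r'(?:"files[\\]+": \[)(?P<fileslist>[^\s:]+[^\]]+)')
FILES_RE = re.compile(r'(?:[^\s:]+[^\s]+\s["\\]+)(?P<files>[^\\]+)')


def extract_files(raw: str) -> list[str]:
    """Hypothetical helper: replay the two rex calls on one event."""
    # Stage 1: everything after the "[" that follows the files key, up to "]".
    m = FILESLIST_RE.search(raw)
    if m is None:
        return []
    # Stage 2: every name that follows a key and a run of \ and " characters.
    return [f.group("files") for f in FILES_RE.finditer(m.group("fileslist"))]


for name in extract_files(RAW):
    print(name)
```

In the sketch, finditer plays the role of max_match=0, which in Splunk keeps every match as a value of the multivalue field files. Note that "files arrived" in the status entry is not picked up by stage 1, because it is not followed by a backslash.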