Splunk Search

Need to filter particular job name from the list of job names which were extracted using rex

Loves-to-Learn Everything

Hi Team,

Good day!

We have extracted the set of job names from the event using the below rex query.

index=app_events_dwh2_de_uat _raw=*jobname* | rex max_match=0 "\\\\\\\\\\\\\"jobname\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<Name>[^\\\]+).*?\\\\\\\\\\\\\"status\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<State>ENDED OK).*?Timestamp\\\\\\\\\\\\\": \\\\\\\\\\\\\"(?<TIME>\d+\s*\d+\:\d+\:\d+).*?execution_time_in_seconds\\\\\\\\\\\\\": \\\\\\\\\\\\\"(?<EXECUTION_TIME>[\d\.\-]+)"  | table  "TIME", "Name", "State", "EXECUTION_TIME" | mvexpand TIME | dedup TIME

After using the above query we have obtained the result in the table format like below.

20240417 21:13:23   CONTROL_M_REPORT                ENDED OK         73.14

                    DWHEAP_FW_BHW                   ENDED OK         80.66

                    DWHEAP_FW_TALANX                ENDED OK         80.18

                    DWHEAP_TALANX_LSP_FW_NODATA     ENDED OK         3.25

                    SALES_EVENT_TRANSACTION_RDV     ENDED OK         141.41


Is it possible to extract only the jobs with name consists of string NODATA from the above set of job names. 

Below is the sample event for the above one.

Dataframe row : {"_c0":{"0":"{","1":" \"0\": {","2":" \"jobname\": \"CONTROL_M_REPORT\"","3":" \"status\": \"ENDED OK\"","4":" \"execution_time_in_seconds\": \"46.39\"","5":" \"Timestamp\": \"20240418 12:13:23\"","6":" }","7":" \"1\": {","8":" \"jobname\": \"DWHEAP_FW_AIMA_001\"","9":" \"status\": \"ENDED OK\"","10":" \"execution_time_in_seconds\": \"73.14\"","11":" \"Timestamp\": \"20240418 12:13:23\"","12":" }","13":" \"2\": {","14":" \"jobname\": \"DWHEAP_FW_BHW\"","15":" \"status\": \"ENDED OK\"","16":" \"execution_time_in_seconds\": \"71.19\"","17":" \"Timestamp\": \"20240418 12:13:23\"","18":" }","19":" \"3\": {","20":" \"jobname\": \"DWHEAP_FW_NODATA\"","21":" \"status\": \"ENDED OK\"","22":" \"execution_time_in_seconds\": \"80.63\"","23":" \"Timestamp\": \"20240418 12:13:23\"","24":" }","25":" \"4\": {","26":" \"jobname\": \"DWHEAP_FW_TALANX\"","27":" \"status\": \"ENDED OK\"","28":" \"execution_time_in_seconds\": \"80.20\"","29":" \"Timestamp\": \"20240418 12:13:23\"","30":" }","31":" \"5\": {","32":" \"jobname\": \"DWHEAP_FW_UC4_001\"","33":" \"status\": \"ENDED OK\"","34":" \"execution_time_in_seconds\": \"80.13\"","35":" \"Timestamp\": \"20240418 12:13:23\"","36":" }","37":" \"6\": {","38":" \"jobname\": \"DWHEAP_TALANX_LSP_FW_NODATA\"","39":" \"status\": \"ENDED NOTOK\"","40":" \"execution_time_in_seconds\": \"120.12\"","41":" \"Timestamp\": \"20240418 12:13:23\"","42":" }","43":" \"7\": {","44":" \"jobname\": \"RDV_INFRASTRUCTURE_DETAILS\"","45":" \"status\": \"ENDED OK\"","46":" \"execution_time_in_seconds\": \"81.16\"","47":" \"Timestamp\": \"20240418 12:13:23\"","48":" }","49":" \"8\": {","50":" \"jobname\": \"VIPASNEU_STG\"","51":" \"status\": \"ENDED OK\"","52":" \"execution_time_in_seconds\": \"45.04\"","53":" \"Timestamp\": \"20240418 12:13:23\"","54":" }","55":"}"}}

Please look into this and kindly help us in extraction of the job which contains string NODATA from the above set of job names that has been extracted 



Labels (1)
0 Karma


OK, is the "Dataframe row :" part really a part of the event or just a header you posted before the actual event.

Anyway, it seems like it's a relatively well-formed (unless I'm missing something) json embedded (and escaped) within another json. Possibly prepended with that "Dataframe row :" header.

I'd say just cut the header if applicable, parse the outer json, extract the inner json, split if needed into multiple events, then spath the inner json(s).

And don't use regexes to manipulate structured data unless you really can't avoid it.

0 Karma


You could try extracting each job as a complete event, before extracting the individual fields. You can then filter out the jobs you don't want (btw, your regex seems to have way too many backslashes, but you may need them if your actual data is different to the example you shared)

| rex max_match=0 "(?<job>\\\\\"jobname\\\\\":\s*\\\\\"[^\\\]+.*?\\\\\"status\\\\\":\s*\\\\\"ENDED OK.*?Timestamp\\\\\": \\\\\"\d+\s*\d+\:\d+\:\d+.*?execution_time_in_seconds\\\\\": \\\\\"[\d\.\-]+)"
| mvexpand job
| rex field=job "\\\\\"jobname\\\\\":\s*\\\\\"(?<Name>[^\\\]+).*?\\\\\"status\\\\\":\s*\\\\\"(?<State>ENDED OK).*?Timestamp\\\\\": \\\\\"(?<TIME>\d+\s*\d+\:\d+\:\d+).*?execution_time_in_seconds\\\\\": \\\\\"(?<EXECUTION_TIME>[\d\.\-]+)"
| regex Name!="NODATA"
| table TIME Name State EXECUTION_TIME


0 Karma

Loves-to-Learn Everything

Hi ITWhisperer,

Thank you for your response.

But the query which you have provided is eliminating the job name that contains NODATA string, but we only need that job name that contains NODATA string, rest all jobs, we can eliminate. Kindly help us on this. Thank you

0 Karma


If you examine and try to understand the solution I posted, you will see there is a not equals condition on the regex. Perhaps you could have figured out for yourself that you could simply change not equals to equals!

| regex Name="NODATA"
0 Karma
Get Updates on the Splunk Community!

.conf24 | Day 0

Hello Splunk Community! My name is Chris, and I'm based in Canberra, Australia's capital, and I travelled for ...

Enhance Security Visibility with Splunk Enterprise Security 7.1 through Threat ...

(view in My Videos)Struggling with alert fatigue, lack of context, and prioritization around security ...

Troubleshooting the OpenTelemetry Collector

  In this tech talk, you’ll learn how to troubleshoot the OpenTelemetry collector - from checking the ...