Need to filter particular job name from the list o...

Renunaren · ‎04-18-2024

Hi Team,

Good day!

We have extracted the set of job names from the event using the below rex query.

index=app_events_dwh2_de_uat _raw=*jobname* | rex max_match=0 "\\\\\\\\\\\\\"jobname\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<Name>[^\\\]+).*?\\\\\\\\\\\\\"status\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<State>ENDED OK).*?Timestamp\\\\\\\\\\\\\": \\\\\\\\\\\\\"(?<TIME>\d+\s*\d+\:\d+\:\d+).*?execution_time_in_seconds\\\\\\\\\\\\\": \\\\\\\\\\\\\"(?<EXECUTION_TIME>[\d\.\-]+)"  | table  "TIME", "Name", "State", "EXECUTION_TIME" | mvexpand TIME | dedup TIME

After using the above query we have obtained the result in the table format like below.

20240417 21:13:23   CONTROL_M_REPORT                ENDED OK         73.14

                    DWHEAP_FW_BHW                   ENDED OK         80.66

                    DWHEAP_FW_TALANX                ENDED OK         80.18

                    DWHEAP_TALANX_LSP_FW_NODATA     ENDED OK         3.25

                    SALES_EVENT_TRANSACTION_RDV     ENDED OK         141.41

Is it possible to extract only the jobs with name consists of string NODATA from the above set of job names.

Below is the sample event for the above one.

Dataframe row : {"_c0":{"0":"{","1":" \"0\": {","2":" \"jobname\": \"CONTROL_M_REPORT\"","3":" \"status\": \"ENDED OK\"","4":" \"execution_time_in_seconds\": \"46.39\"","5":" \"Timestamp\": \"20240418 12:13:23\"","6":" }","7":" \"1\": {","8":" \"jobname\": \"DWHEAP_FW_AIMA_001\"","9":" \"status\": \"ENDED OK\"","10":" \"execution_time_in_seconds\": \"73.14\"","11":" \"Timestamp\": \"20240418 12:13:23\"","12":" }","13":" \"2\": {","14":" \"jobname\": \"DWHEAP_FW_BHW\"","15":" \"status\": \"ENDED OK\"","16":" \"execution_time_in_seconds\": \"71.19\"","17":" \"Timestamp\": \"20240418 12:13:23\"","18":" }","19":" \"3\": {","20":" \"jobname\": \"DWHEAP_FW_NODATA\"","21":" \"status\": \"ENDED OK\"","22":" \"execution_time_in_seconds\": \"80.63\"","23":" \"Timestamp\": \"20240418 12:13:23\"","24":" }","25":" \"4\": {","26":" \"jobname\": \"DWHEAP_FW_TALANX\"","27":" \"status\": \"ENDED OK\"","28":" \"execution_time_in_seconds\": \"80.20\"","29":" \"Timestamp\": \"20240418 12:13:23\"","30":" }","31":" \"5\": {","32":" \"jobname\": \"DWHEAP_FW_UC4_001\"","33":" \"status\": \"ENDED OK\"","34":" \"execution_time_in_seconds\": \"80.13\"","35":" \"Timestamp\": \"20240418 12:13:23\"","36":" }","37":" \"6\": {","38":" \"jobname\": \"DWHEAP_TALANX_LSP_FW_NODATA\"","39":" \"status\": \"ENDED NOTOK\"","40":" \"execution_time_in_seconds\": \"120.12\"","41":" \"Timestamp\": \"20240418 12:13:23\"","42":" }","43":" \"7\": {","44":" \"jobname\": \"RDV_INFRASTRUCTURE_DETAILS\"","45":" \"status\": \"ENDED OK\"","46":" \"execution_time_in_seconds\": \"81.16\"","47":" \"Timestamp\": \"20240418 12:13:23\"","48":" }","49":" \"8\": {","50":" \"jobname\": \"VIPASNEU_STG\"","51":" \"status\": \"ENDED OK\"","52":" \"execution_time_in_seconds\": \"45.04\"","53":" \"Timestamp\": \"20240418 12:13:23\"","54":" }","55":"}"}}

Please look into this and kindly help us in extraction of the job which contains string NODATA from the above set of job names that has been extracted

PickleRick · ‎04-18-2024

OK, is the "Dataframe row :" part really a part of the event or just a header you posted before the actual event.

Anyway, it seems like it's a relatively well-formed (unless I'm missing something) json embedded (and escaped) within another json. Possibly prepended with that "Dataframe row :" header.

I'd say just cut the header if applicable, parse the outer json, extract the inner json, split if needed into multiple events, then spath the inner json(s).

And don't use regexes to manipulate structured data unless you really can't avoid it.

ITWhisperer · ‎04-18-2024

You could try extracting each job as a complete event, before extracting the individual fields. You can then filter out the jobs you don't want (btw, your regex seems to have way too many backslashes, but you may need them if your actual data is different to the example you shared)

| rex max_match=0 "(?<job>\\\\\"jobname\\\\\":\s*\\\\\"[^\\\]+.*?\\\\\"status\\\\\":\s*\\\\\"ENDED OK.*?Timestamp\\\\\": \\\\\"\d+\s*\d+\:\d+\:\d+.*?execution_time_in_seconds\\\\\": \\\\\"[\d\.\-]+)"
| mvexpand job
| rex field=job "\\\\\"jobname\\\\\":\s*\\\\\"(?<Name>[^\\\]+).*?\\\\\"status\\\\\":\s*\\\\\"(?<State>ENDED OK).*?Timestamp\\\\\": \\\\\"(?<TIME>\d+\s*\d+\:\d+\:\d+).*?execution_time_in_seconds\\\\\": \\\\\"(?<EXECUTION_TIME>[\d\.\-]+)"
| regex Name!="NODATA"
| table TIME Name State EXECUTION_TIME

Renunaren · ‎04-18-2024

Hi ITWhisperer,

Thank you for your response.

But the query which you have provided is eliminating the job name that contains NODATA string, but we only need that job name that contains NODATA string, rest all jobs, we can eliminate. Kindly help us on this. Thank you

ITWhisperer · ‎04-18-2024

If you examine and try to understand the solution I posted, you will see there is a not equals condition on the regex. Perhaps you could have figured out for yourself that you could simply change not equals to equals!

| regex Name="NODATA"

Need to filter particular job name from the list of job names which were extracted using rex

rex

Index This | Why did the turkey cross the road?

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Feel the Splunk Love: Real Stories from Real Customers

Are you a member of the Splunk Community?

Need to filter particular job name from the list of job names which were extracted using rex

rex

Index This | Why did the turkey cross the road?

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Feel the Splunk Love: Real Stories from Real Customers