Hi Team,
Good day!
We have extracted the set of job names from the event using the below rex query.
index=app_events_dwh2_de_uat _raw=*jobname* | rex max_match=0 "\\\\\\\\\\\\\"jobname\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<Name>[^\\\]+).*?\\\\\\\\\\\\\"status\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<State>ENDED OK).*?Timestamp\\\\\\\\\\\\\": \\\\\\\\\\\\\"(?<TIME>\d+\s*\d+\:\d+\:\d+).*?execution_time_in_seconds\\\\\\\\\\\\\": \\\\\\\\\\\\\"(?<EXECUTION_TIME>[\d\.\-]+)" | table "TIME", "Name", "State", "EXECUTION_TIME" | mvexpand TIME | dedup TIME
After using the above query we have obtained the result in the table format like below.
20240417 21:13:23 CONTROL_M_REPORT ENDED OK 73.14
DWHEAP_FW_BHW ENDED OK 80.66
DWHEAP_FW_TALANX ENDED OK 80.18
DWHEAP_TALANX_LSP_FW_NODATA ENDED OK 3.25
SALES_EVENT_TRANSACTION_RDV ENDED OK 141.41
Is it possible to extract only the jobs with name consists of string NODATA from the above set of job names.
Below is the sample event for the above one.
Dataframe row : {"_c0":{"0":"{","1":" \"0\": {","2":" \"jobname\": \"CONTROL_M_REPORT\"","3":" \"status\": \"ENDED OK\"","4":" \"execution_time_in_seconds\": \"46.39\"","5":" \"Timestamp\": \"20240418 12:13:23\"","6":" }","7":" \"1\": {","8":" \"jobname\": \"DWHEAP_FW_AIMA_001\"","9":" \"status\": \"ENDED OK\"","10":" \"execution_time_in_seconds\": \"73.14\"","11":" \"Timestamp\": \"20240418 12:13:23\"","12":" }","13":" \"2\": {","14":" \"jobname\": \"DWHEAP_FW_BHW\"","15":" \"status\": \"ENDED OK\"","16":" \"execution_time_in_seconds\": \"71.19\"","17":" \"Timestamp\": \"20240418 12:13:23\"","18":" }","19":" \"3\": {","20":" \"jobname\": \"DWHEAP_FW_NODATA\"","21":" \"status\": \"ENDED OK\"","22":" \"execution_time_in_seconds\": \"80.63\"","23":" \"Timestamp\": \"20240418 12:13:23\"","24":" }","25":" \"4\": {","26":" \"jobname\": \"DWHEAP_FW_TALANX\"","27":" \"status\": \"ENDED OK\"","28":" \"execution_time_in_seconds\": \"80.20\"","29":" \"Timestamp\": \"20240418 12:13:23\"","30":" }","31":" \"5\": {","32":" \"jobname\": \"DWHEAP_FW_UC4_001\"","33":" \"status\": \"ENDED OK\"","34":" \"execution_time_in_seconds\": \"80.13\"","35":" \"Timestamp\": \"20240418 12:13:23\"","36":" }","37":" \"6\": {","38":" \"jobname\": \"DWHEAP_TALANX_LSP_FW_NODATA\"","39":" \"status\": \"ENDED NOTOK\"","40":" \"execution_time_in_seconds\": \"120.12\"","41":" \"Timestamp\": \"20240418 12:13:23\"","42":" }","43":" \"7\": {","44":" \"jobname\": \"RDV_INFRASTRUCTURE_DETAILS\"","45":" \"status\": \"ENDED OK\"","46":" \"execution_time_in_seconds\": \"81.16\"","47":" \"Timestamp\": \"20240418 12:13:23\"","48":" }","49":" \"8\": {","50":" \"jobname\": \"VIPASNEU_STG\"","51":" \"status\": \"ENDED OK\"","52":" \"execution_time_in_seconds\": \"45.04\"","53":" \"Timestamp\": \"20240418 12:13:23\"","54":" }","55":"}"}}
Please look into this and kindly help us in extraction of the job which contains string NODATA from the above set of job names that has been extracted
OK, is the "Dataframe row :" part really a part of the event or just a header you posted before the actual event.
Anyway, it seems like it's a relatively well-formed (unless I'm missing something) json embedded (and escaped) within another json. Possibly prepended with that "Dataframe row :" header.
I'd say just cut the header if applicable, parse the outer json, extract the inner json, split if needed into multiple events, then spath the inner json(s).
And don't use regexes to manipulate structured data unless you really can't avoid it.
You could try extracting each job as a complete event, before extracting the individual fields. You can then filter out the jobs you don't want (btw, your regex seems to have way too many backslashes, but you may need them if your actual data is different to the example you shared)
| rex max_match=0 "(?<job>\\\\\"jobname\\\\\":\s*\\\\\"[^\\\]+.*?\\\\\"status\\\\\":\s*\\\\\"ENDED OK.*?Timestamp\\\\\": \\\\\"\d+\s*\d+\:\d+\:\d+.*?execution_time_in_seconds\\\\\": \\\\\"[\d\.\-]+)"
| mvexpand job
| rex field=job "\\\\\"jobname\\\\\":\s*\\\\\"(?<Name>[^\\\]+).*?\\\\\"status\\\\\":\s*\\\\\"(?<State>ENDED OK).*?Timestamp\\\\\": \\\\\"(?<TIME>\d+\s*\d+\:\d+\:\d+).*?execution_time_in_seconds\\\\\": \\\\\"(?<EXECUTION_TIME>[\d\.\-]+)"
| regex Name!="NODATA"
| table TIME Name State EXECUTION_TIME
Hi ITWhisperer,
Thank you for your response.
But the query which you have provided is eliminating the job name that contains NODATA string, but we only need that job name that contains NODATA string, rest all jobs, we can eliminate. Kindly help us on this. Thank you
If you examine and try to understand the solution I posted, you will see there is a not equals condition on the regex. Perhaps you could have figured out for yourself that you could simply change not equals to equals!
| regex Name="NODATA"