topic rex for complex data in Other Usage

rex for complex data

Supriya — Mon, 04 Oct 2021 17:39:13 GMT

Hi Team,

Could someone help me with the field extraction for the below complex data(1000 lines of data I concised to 10 lines of data ) :

columns to be extracted are statement_text , cnt, total_reads, total_writes, db_name

statement_text="insert into #pt_queryhistory_time (
[sample_time],command_id,cnt,total_time,[db_name],sqlhandle,hash_char) select top 500
[sample_time] = convert(smalldatetime,'2021-09-27 18:55:00'),
total_time = qs.total_elapsed_time/1000,
avg_cpu = case when qs.execution_count = 0 then 0 else qs.total_worker_time/qs.execution_count/1000 end,
db_name = case convert(int, pa.value) when null then '--unknown--' when 0 then '--unknown--' when 32767 then 'Resource' else db_name(convert(int, pa.value)) end,
[db_id] = coalesce(convert(int, pa.value),0),
hash_char = '' from sys.db_stats (nolock) as qs cross apply sys.dm_exec_plan_attributes(qs.plan_handle)as pa
where pa.attribute = N'dbid' and isnull(convert(int,pa.value),0) = 8
order by qs.total_elapsed_time desc",
cnt="1", total_reads="1888", total_writes="29", avg_writes="29",db_name="db1"

I couldn't able to extract the statement_text column completely and the remaining columns are working fine

index="index" source="source1"| rex field=_raw "statement_text\=\"(?<statement_text>[@ ( ) $ . , \"A-Z ! ^ | \" - _ : { } A-Z a-z _ 0-9]+]+)\"" | rex field=_raw "cnt\=\"(?<cnt>[0-9]+)\"" | rex field=_raw "diff_reads\=\"(?<diff_reads>[0-9]+)\""| rex field=_raw "total_writes\=\"(?<total_writes>[0-9]+)\"" | rex field=_raw "db_name\=\"(?<db_name>[A-Z a-z _ 0-9]+)\""

Please provide me rex for statement_text column where the data can be extracted till the 2nd column "cnt"

Re: rex for complex data

ITWhisperer — Mon, 04 Oct 2021 18:34:48 GMT

Would this work for you?

| rex field=_raw "statement_text\=\"(?<statement_text>[^\"]+)\""

Re: rex for complex data

somesoni2 — Mon, 04 Oct 2021 18:41:24 GMT

Give this a try for statement_text field extraction

| rex "statement_text=\"(?m)(?<statement_text>(.+[\r\n])+[^\"]+)\","

Re: rex for complex data

cpetterborg — Mon, 04 Oct 2021 21:15:35 GMT

In the case that you end up with escaped double quotes embedded in your string (e.g. "this is \"just\" what the doctor ordered") you will need to have a slightly different regex. Something like:

| rex field=_raw "statement_text\=\"(?<statement_text>(\\"|[^\"])+)\""

This should work for any number of singly-escaped double quotes in your string. I only add this because of the chance of a double quote ending up in there somewhere with such a long, unstructured string.

There is also the strange possibility that there could be unescaped double quotes, which might require something like:

|rex field=_raw "statement_text\=\"(?<statement_text>[\s\S]+)\",\s*cnt=\""

Re: rex for complex data

Supriya — Tue, 05 Oct 2021 11:32:28 GMT

Thank you everyone for your inputs!

Out of all only the below rex is working for my data, thank you @cpetterborg

|rex field=_raw "statement_text\=\"(?<statement_text>[\s\S]+)\",\s*cnt=\""

Here one more issue I have is while downloading the results in CSV file, the column statement_text is splitting in different rows. could you help me with this issue

Re: rex for complex data

cpetterborg — Tue, 05 Oct 2021 22:53:01 GMT

What do you mean by "the column statement_text is splitting in different rows"? Do you mean that there are more rows because there are "unintended" newlines among your CSV file? Is that because there are newlines in the statement text? Or is there something else that I'm not understanding?

Re: rex for complex data

Supriya — Thu, 07 Oct 2021 17:50:36 GMT

Hi @cpetterborg ,

In splunk cloud I have 500+ events and each event contains 100+ lines of data. while exporting in CSV file single event is splitting in different rows which should not happen. I need the data same as the splunk results row wise without splitting

Is there an limitation per single row while exporting in csv file?

Here is the screenshot for reference, where 2nd and 3rd rows are single event(but splitted in 2 rows) and 5&6 single event and 8&9 single event, data from 4th and 7th row is fine

Re: rex for complex data

cpetterborg — Thu, 07 Oct 2021 19:04:21 GMT

Are your events so long that they are being split before the end of the event? Look at the raw events. If you have a raw event that this is happening on, please provide a "cleaned" version of the event here, both in the raw version that is the original, but also the _raw value in splunk, in case there are differences.