topic Re: Splunk Regex (mail csv data extraction) in Dashboards & Visualizations

Splunk Regex (mail csv data extraction)

Joannna — Wed, 30 Jun 2021 09:25:51 GMT

I have a field that's called file_content on an source type.
This has a CSV inside.

Meaning every event has a field (file_content) that has a csv inside it. Every event is an email Can't be field extraction as the "file_content" is really hard to find inside the data.

I used the regex query to extract the data, and I have 2 issues with it,

1. field TicketNumber has both, but my regex ignores the " and I get number2 on the next column
- ," number , number2 ",
- ,number,

2. It's very slow as I get 1 CSV per hour every day. so i wonder if there is any automation or a better way to do this

First i get all lines:

| makemv delim="
" file_content
| mvexpand file_content
| table file_content _time

Then I get the regex per line

| rex field=file_content "(?P<ContactId>[^\s,]+),(?P<Customernumber>[^\s,]+),(?P<AfterContactWorkDuration>[^\s,]+),(?P<AfterContactWorkEndTimestamp>[^\s,]+),(?P<AfterContactWorkStartTimestamp>[^\s,]+),(?P<AgentInteractionDuration>[^\s,]+),(?P<ConnectedToAgentTimestamp>[^\s,]+),(?P<CustomerHoldDuration>[^\s,]+),(?P<Hierarchygroups_Level1_GroupName>[^\s,]+),(?P<Hierarchygroups_Level2_GroupName>[^\s,]+),(?P<Hierarchygroups_Level3_GroupName>[^\s,]+),(?P<LongestHoldDuration>[^\s,]+),(?P<NumberOfHolds>[^\s,]+),(?P<Routingprofile>[^\s,]+),(?P<Agent>[^\s,]+),(?P<AgentConnectionAttempts>[^\s,]+),(?P<ConnectedToSystemTimestamp>[^\s,]+),(?P<DisconnectTimestamp>[^\s,]+),(?P<InitiationMethod>[^\s,]+),(?P<InitiationTimestamp>[^\s,]+),(?P<LastUpdateTimestamp>[^\s,]+),(?P<NextContactId>[^\s,]+),(?P<PreviousContactId>[^\s,]+),(?P<DequeueTimestamp>[^\s,]+),(?P<Duration>[^\s,]+),(?P<EnqueueTimestamp>[^\s,]+),(?P<Name>[^\s,]+),(?P<TransferCompletedTimestamp>[^\s,]+),(?P<HandleTime>[^\s,]+),(?P<TicketNumber>[^\s,]+),(?P<Account>[^\s,]+),(?P<AccountName>[^\s,]+),(?P<Country>[^\s,]+),(?P<Language>[^\s,]+),(?P<Site>[^\s,]+),(?P<WrapCode>[^\s,]+)"

And here is an example of how the data should look like (in csv):

ContactId,Customernumber,AfterContactWorkDuration,AfterContactWorkEndTimestamp,AfterContactWorkStartTimestamp,AgentInteractionDuration,ConnectedToAgentTimestamp,CustomerHoldDuration,Hierarchygroups_Level1_GroupName,Hierarchygroups_Level2_GroupName,Hierarchygroups_Level3_GroupName,LongestHoldDuration,NumberOfHolds,Routingprofile,Agent,AgentConnectionAttempts,ConnectedToSystemTimestamp,DisconnectTimestamp,InitiationMethod,InitiationTimestamp,LastUpdateTimestamp,NextContactId,PreviousContactId,DequeueTimestamp,Duration,EnqueueTimestamp,Name,TransferCompletedTimestamp,HandleTime,TicketNumber,Account,AccountName,Country,Language,Site,WrapCode
aaaa-xxxxxx,123456789,90,29/06/2021 01:00,29/06/2021 01:00,111,29/06/2021 01:00,0,country1,xx,yy,90,90,language,dummy,1,29/06/2021 01:00,29/06/2021 01:00,type_x,29/06/2021 01:00,29/06/2021 01:00,,,29/06/2021 01:00,11,29/06/2021 01:00,type_y,29/06/2021 01:00,201,A123,xxx,xxx,country_y,language,type_w,xxxx
bbbb-xxxxxx,987654321,90,29/06/2021 01:00,29/06/2021 01:00,111+P4,29/06/2021 01:00,0,country1,xx,yy,90,90,language,dummy,1,29/06/2021 01:00,29/06/2021 01:00,type_x,29/06/2021 01:00,29/06/2021 01:00,,,29/06/2021 01:00,11,29/06/2021 01:00,type_y,29/06/2021 01:00,201,"""A123,B123""",xxx,xxx,country_y,language,type_w,xxxx

Re: Splunk Regex (mail csv data extraction)

ITWhisperer — Wed, 30 Jun 2021 11:23:53 GMT

Your timestamps have spaces in (not accounted for in your expression) and the ticket number optionally has pairs of double quotes. Try something like this

| rex field=file_content "(?P<ContactId>[^\s,]*),(?P<Customernumber>[^\s,]*),(?P<AfterContactWorkDuration>[^\s,]*),(?P<AfterContactWorkEndTimestamp>[^,]*),(?P<AfterContactWorkStartTimestamp>[^,]*),(?P<AgentInteractionDuration>[^\s,]*),(?P<ConnectedToAgentTimestamp>[^,]*),(?P<CustomerHoldDuration>[^\s,]*),(?P<Hierarchygroups_Level1_GroupName>[^\s,]*),(?P<Hierarchygroups_Level2_GroupName>[^\s,]*),(?P<Hierarchygroups_Level3_GroupName>[^\s,]*),(?P<LongestHoldDuration>[^\s,]*),(?P<NumberOfHolds>[^\s,]*),(?P<Routingprofile>[^\s,]*),(?P<Agent>[^\s,]*),(?P<AgentConnectionAttempts>[^\s,]*),(?P<ConnectedToSystemTimestamp>[^,]*),(?P<DisconnectTimestamp>[^,]*),(?P<InitiationMethod>[^\s,]*),(?P<InitiationTimestamp>[^,]*),(?P<LastUpdateTimestamp>[^,]*),(?P<NextContactId>[^\s,]*),(?P<PreviousContactId>[^\s,]*),(?P<DequeueTimestamp>[^,]*),(?P<Duration>[^\s,]*),(?P<EnqueueTimestamp>[^,]*),(?P<Name>[^\s,]*),(?P<TransferCompletedTimestamp>[^,]*),(?P<HandleTime>[^\s,]*),(?P<TicketNumber>((\"[^\"]*\")+|[^\s,]*)),(?P<Account>[^\s,]*),(?P<AccountName>[^\s,]*),(?P<Country>[^\s,]*),(?P<Language>[^\s,]*),(?P<Site>[^\s,]*),(?P<WrapCode>[^\s,]*)"

Re: Splunk Regex (mail csv data extraction)

Joannna — Wed, 30 Jun 2021 12:21:09 GMT

wow thank you so much the issue is now , the 2second one that it's too much data and I get this error :

command.mvexpand: output will be truncated at 15700 results due to excessive memory usage. Memory threshold of 500MB as configured in limits.conf / [mvexpand] / max_mem_usage_mb has been reached.

Anything I can do here or any other way I can do this extraction?

For example you can run a report everyday and save the outcome to an lookup , but this wouldn't work as it would be too much data for a lookup , is there any other solution?

Re: Splunk Regex (mail csv data extraction)

ITWhisperer — Wed, 30 Jun 2021 12:28:24 GMT

Try moving the table command to before the mvexpand that way _raw doesn't get copied into every event before being eliminated. Other than that, you could try increasing the limit in limits.conf, or there is a longer way around it described in a post I made a while ago (although this doesn't really solve the memory issue, only the row count issue).

Re: Splunk Regex (mail csv data extraction)

kamlesh_vaghela — Wed, 30 Jun 2021 12:46:35 GMT

@Joannna

Instead of this

| makemv delim=" " file_content | mvexpand file_content | table file_content _time

I suggest below solutions. You can pick any of them

| rename file_content as _raw | multikv forceheader=1 | table _time _raw

| makemv delim=" " file_content | stats count by _time file_content

Thanks
KV
▄︻̷̿┻̿═━一

If any of my reply helps you to solve the problem Or gain knowledge, an upvote would be appreciated.

Re: Splunk Regex (mail csv data extraction)

Joannna — Wed, 30 Jun 2021 12:47:28 GMT

Well you fixed my issues, thank you, 🙂

I looked around and I found somepeople talk about summary index , do you think this would be a good option for me ?

Re: Splunk Regex (mail csv data extraction)

Joannna — Wed, 30 Jun 2021 12:53:49 GMT

Thank you, i tested both and checked the fastest one is the second approach takes 1.321 secs my original after moving the table to the top as sugested was 1.789 secs. 🙂 thanks