How to parse fields properly?

Charlie5
Loves-to-Learn

Hello,

I am trying to get a field extraction working, and have written regex accordingly that the field extractor seems to like. The raw logs are a list of quotes-encapsulated fields separated by commas:

"field1","field2","field3",...

Certain fields can have multiple values. In that case the values are separated only by commas, and a single pair of quotes encloses the entire list of values. For example:

"field1","field2","field3value1,field3value2,field3value3",...

To complicate matters, values that belong to a certain field can contain multiple words separated by other characters, such as "Software/Technology" or "Business and Industry" so that the entire field may look something like this:

"Software/Technology,Business Services,Application,Business and Industry,Computers and Internet"

That field needs to be extracted and displayed exactly as shown. The regexes I have attempted for this are as follows:

"(?<categories>[^\"]+|)
"(?<categories_again>[\w\s\/\,]+|)

The field extractor, the rex command, and regex101 all accept both of these extractions, and they work exactly as expected there. When I search, however, each word from within the field comes back as its own independent value, which is not what I need:

Software
Technology
Business
Services
Application
and
Industry

At this point I'm out of ideas as to regex modifications or other work-arounds that can be applied to fix this. Has anyone else encountered this problem and if so, were you able to fix it and how? Otherwise I think I have to bring this to Splunk support.

Thank you


ITWhisperer
SplunkTrust

It is not entirely clear what your expected results are. For example, are you looking for the extraction to produce a multi-value field like this

Software/Technology
Business Services
Application
Business and Industry
Computers and Internet

or a single field like this

Software/Technology,Business Services,Application,Business and Industry,Computers and Internet

or in the more generic case a multi-value field like this

field1
field2
field3value1,field3value2,field3value3

or is this three fields

field1
field2
field3value1,field3value2,field3value3

or, in the case of the last field

field3value1
field3value2
field3value3

 


richgalloway
SplunkTrust

Please share some sanitized example events for us to test with.  Are you trying to parse the fields at search time or index time?  If the former, please share the SPL you're using; otherwise, share the relevant props.conf stanza.

---
If this reply helps you, Karma would be appreciated.

Charlie5
Loves-to-Learn

Thanks for the responses thus far; they are much appreciated. Here are some sanitized examples of logs:

"2023-04-25 13:14:27","QZ-NewYork_DMZ","QZ-NewYork_DMZ","80.20.59.143","80.20.59.143","Allowed","28 (AAAA)","NOERROR","webdefence.global.whitespider.com","Software/Technology,Application,Computers and Internet","Networks","Networks",""

"2022-10-23 11:34:59","Charlie Five (cfive@workplace.com)","Charlie Five (cfive@workplace.com),QZ-NewYork_Verizon_VPN_NAT,QZ-845310891334","172.32.5.8","8.8.8.8","Allowed","1 (A)","NOERROR","outlook.office365.com","Software/Technology,Webmail,Business Services,Organizational Email,Application,Web-based Email,Online Document Sharing and Collaboration","AD Users","AD Users,Networks,Anyconnect Roaming Client",""

In the first example, I would want the values for the categories field to be as follows; each line represents one complete field value as it would display in a search:

Software/Technology
Application
Computers and Internet

Alternatively, this would also suffice, which is the entire string exactly as it displays in the log:

Software/Technology,Application,Computers and Internet

The same applies to the second example. Here I will display the values as if I had clicked on the field in the event drop-down and selected "view events"; this is what would be added to the search bar:

categories="Software/Technology,Webmail,Business Services,Organizational Email,Application,Web-based Email,Online Document Sharing and Collaboration"

Or (I'll only show 1 here for the sake of brevity):

categories="Online Document Sharing and Collaboration"

Hope this helps you more, and thank you again for your assistance.


ITWhisperer
SplunkTrust

Depending on whether the final field is important, you could do something like this

| rex max_match=0 "(?<field>\"[^\"]*\"),?"
| eval categories=split(trim(mvindex(field,9),"\""),",")
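
For anyone who wants to sanity-check the logic outside Splunk, here is a rough Python sketch of the same two steps, run against the first sample event from this thread. It is only an illustration: Python's `re` module stands in for rex, and the variable names are my own.

```python
import re

# First sanitized sample event from this thread.
event = (
    '"2023-04-25 13:14:27","QZ-NewYork_DMZ","QZ-NewYork_DMZ",'
    '"80.20.59.143","80.20.59.143","Allowed","28 (AAAA)","NOERROR",'
    '"webdefence.global.whitespider.com",'
    '"Software/Technology,Application,Computers and Internet",'
    '"Networks","Networks",""'
)

# Step 1: collect every quoted field, like rex with max_match=0.
fields = re.findall(r'("[^"]*"),?', event)

# Step 2: take the tenth field (index 9), trim the quotes, split on commas,
# like split(trim(mvindex(field,9),"\""),",").
categories = fields[9].strip('"').split(",")

for value in categories:
    print(value)
```

Because the split happens only on commas, values such as "Computers and Internet" stay intact as single values.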

richgalloway
SplunkTrust

Are you trying to parse the fields at search time or index time?  If the former, please share the SPL you're using; otherwise, share the relevant props.conf stanza.


Charlie5
Loves-to-Learn

@richgalloway Search time, here is the SPL for manual extraction:

index=my_index sourcetype=proxy_sourcetype
| rex field=_raw "^(\"([^\"]+)\",){9}\"(?<categories>[^\"]+)\""


richgalloway
SplunkTrust

I think you're most of the way there.  To separate the categories, use the split function.

index=my_index sourcetype=proxy_sourcetype
| rex field=_raw "^(\"([^\"]+)\",){9}\"(?<categories>[^\"]+)\""
| eval categories=split(categories,",")
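
As a quick check outside Splunk, the same pattern and split can be sketched in Python against the second sample event from this thread. This is an illustration only, with these assumptions: Python's `re` needs `(?P<name>...)` for named groups where rex accepts `(?<name>...)`, the quotes inside the pattern are written out explicitly, and the variable names are my own.

```python
import re

# Second sanitized sample event from this thread.
event = (
    '"2022-10-23 11:34:59","Charlie Five (cfive@workplace.com)",'
    '"Charlie Five (cfive@workplace.com),QZ-NewYork_Verizon_VPN_NAT,QZ-845310891334",'
    '"172.32.5.8","8.8.8.8","Allowed","1 (A)","NOERROR","outlook.office365.com",'
    '"Software/Technology,Webmail,Business Services,Organizational Email,'
    'Application,Web-based Email,Online Document Sharing and Collaboration",'
    '"AD Users","AD Users,Networks,Anyconnect Roaming Client",""'
)

# Skip nine quoted fields from the start of the event, then capture the tenth.
pattern = re.compile(r'^("([^"]+)",){9}"(?P<categories>[^"]+)"')

match = pattern.match(event)
# Split the captured string on commas to get one value per category.
categories = match.group("categories").split(",") if match else []

for value in categories:
    print(value)
```

Note that the skip group uses `+`, so an empty field ("") among the first nine would stop the match; none of the sample events has one.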

 
