Splunk Search

How to edit my regex to extract fields from an imported CSV file with extra commas in some rows?

banderson7
Communicator

So this is fun...
I need to import a CSV on a regular basis, and I have no control over the format or data in the csv. Contents include:

name1-name2-Uptime,N. California [RealBrowser],8419,100,0,23
name1-name2-Uptime,N. Virginia [RealBrowser],10062,100,0,25
name1-name2-Uptime,Chicago, IL [RealBrowser],6882,95.83300018,1,24
name1-name2-Uptime,Dallas, TX [RealBrowser],9558,100,0,23

Some of the second columns have commas, but not all, and my regex is getting thrown by them.

I've tried the following regex, but it combines the second, third and fourth columns.

^(?<check_name>[^,]+),(?<location>.*),(?<avg_response_time>[^,]+)
0 Karma
1 Solution

woodcock
Esteemed Legend

Like this:

^(?<check_name>[^,]+),(?<location>.*),(?<num1>[^,]*),(?<num2>[^,]*),(?<num3>[^,]*),(?<num4>[^,]*)$

View solution in original post

landen99
Motivator

From the data I am seeing, the fields are not separated by commas and need regex that looks for things other than commas:

 name1-name2-Uptime    N. California [RealBrowser]    8419    100    0    23
 name1-name2-Uptime    N. Virginia [RealBrowser]    10062    100    0    25
 name1-name2-Uptime    Chicago, IL [RealBrowser]    6882    95.83300018    1    24
 name1-name2-Uptime    Dallas, TX [RealBrowser]    9558    100    0    23

Based on the form of your regex for the value patterns for each field, I suggest this regex:

(?<check_name>[A-Za-z]\S+)\s+(?<location>[^\]]+\])\s+(?<avg_response_time>\d+)

But I recommend breaking the fields out a little more:

(?<check_name1>[A-Za-z]\w+)\-(?<check_name2>[A-Za-z]\w+)\-(?<state>\w+)\s+(?<location>[^\[]+) \[(?<browser>[^\]]+)\]\s+(?<avg_response_time>\d+)

https://regex101.com/r/zK5yT5/1

0 Karma

woodcock
Esteemed Legend

Like this:

^(?<check_name>[^,]+),(?<location>.*),(?<num1>[^,]*),(?<num2>[^,]*),(?<num3>[^,]*),(?<num4>[^,]*)$

banderson7
Communicator

That did it!
What's the difference between the first field and the rest? What's the * do that the + didn't?

0 Karma

landen99
Motivator

"*" is zero or more allowing the possibility of no matches for the pattern. But your data lacks the commas for those regex to work so I don't think it should work. As if you and Gregg are seeing different data than what myself and others are seeing. I'll post mine below, just in case.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Is your sample data before or after it's been indexed by Splunk? If the latter, please post the data in it's original format. I ask because your regex is expecting a comma between fields, but the data has white space between fields.

---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...