
Field Extraction with CSV File

SplunkDash
Motivator

Hello.

I have an issue with field parsing for a CSV file using my props configuration. I should be getting 11 fields for each event/row, but parsing is giving me 17 fields. Here are 3 sample events from that CSV file (the first row is the header row); the props.conf stanza is also provided below:

Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11

APIDEV,4xs54,000916,DEV,Update,Integrate,String\,Set\,Number\,ID,Standard,2024-07-10T23:10:45.001Z,Process_TIME\,URI\,Session_Key\,Services,Hourly

APITEST,4ys34,000916,TEST,Update,Integrate,String\,Set\,Number\,String,Typicall\,Response,2024-07-10T23:10:45.021Z,CPU_TIME\,URI\,Session_Key\,Type\,Request,Monthly

APITEST,4ys34,000916,DEV,Insert,Integrate,Char\,Set\,System\,ID,On_Demand,2024-07-10T23:10:45.051Z,CPU_TIME\,URI\,Session_Key\,Services,Hourly

 

*The backslash-escaped sequences in each of the events (e.g. String\,Set\,Number\,ID) should each count as a single field

 

props.conf

[mypropscon]

SHOULD_LINEMERGE=False

LINE_BREAKER=([\r\n]+)

INDEXED_EXTRACTIONS=CSV

KV_MODE=none

disabled=false

TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%QZ

HEADER_FIELD_LINE_NUMBER=1

 

Any recommendation to resolve this issue would be highly appreciated. Thank you so much for your support, as always.



PickleRick
SplunkTrust

That's an interesting problem, but I think the data is a bit malformed. If your field values contain commas, they should be enclosed in quotes.

If your column order is constant, you can define a regex-based search-time extraction that allows for escaped commas within field values.
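
For example, with standard CSV quoting the first sample row would look something like this, which the CSV parser should then handle as 11 fields without any extra configuration:

APIDEV,4xs54,000916,DEV,Update,Integrate,"String,Set,Number,ID",Standard,2024-07-10T23:10:45.001Z,"Process_TIME,URI,Session_Key,Services",Hourly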

gcusello
SplunkTrust

Hi @SplunkDash ,

Let me make sure I understand: you have 17 fields in your CSV but you want to extract only 11 of them, is that correct?

Do you want to delete the other fields, or do you simply not need them in the visualization?

If you want to delete the extra fields, you can use SEDCMD to remove them before indexing.

In the second case, you can leave everything as is and use only the 11 fields you need.
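
For the delete option, SEDCMD in props.conf applies a sed-style substitution at index time. As a purely illustrative sketch (the class name and the pattern below are hypothetical, only meant to show the syntax for stripping trailing values before indexing):

SEDCMD-drop_extra = s/,[^,]+,[^,]+$//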

Ciao.

Giuseppe

SplunkDash
Motivator

Hello @gcusello and @PickleRick 

Thank you so much for your responses; I truly appreciate it. All values should be part of this extraction, and the outcome of the extraction should look like this, for example:

APIDEV,4xs54,000916,DEV,Update,Integrate,String\,Set\,Number\,ID,Standard,2024-07-10T23:10:45.001Z,Process_TIME\,URI\,Session_Key\,Services,Hourly

Col1=APIDEV

Col2=4xs54

Col3=000916

Col4=DEV

Col5=Update

Col6=Integrate

Col7=String\,Set\,Number\,ID

Col8=Standard

Col9=2024-07-10T23:10:45.001Z

Col10=Process_TIME\,URI\,Session_Key\,Services

Col11=Hourly

In the case of regex, what would be the regex for this extraction within props.conf?


ITWhisperer
SplunkTrust

If it is only Col7 and Col10 that could have escaped commas, try something like this:

(?<Col1>[^,]+),(?<Col2>[^,]+),(?<Col3>[^,]+),(?<Col4>[^,]+),(?<Col5>[^,]+),(?<Col6>[^,]+),(?<Col7>.+?(?<!\\)),(?<Col8>[^,]+),(?<Col9>[^,]+),(?<Col10>.+?(?<!\\)),(?<Col11>[^,]+)$

You may have to double up the backslashes:

(?<Col1>[^,]+),(?<Col2>[^,]+),(?<Col3>[^,]+),(?<Col4>[^,]+),(?<Col5>[^,]+),(?<Col6>[^,]+),(?<Col7>.+?(?<!\\\\)),(?<Col8>[^,]+),(?<Col9>[^,]+),(?<Col10>.+?(?<!\\\\)),(?<Col11>[^,]+)$
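
As a quick way to check that pattern outside Splunk, here is a minimal Python sketch (purely illustrative; Python's re module wants named groups written as (?P<...>), and in a raw string a single escaped backslash, (?<!\\), already means "not preceded by a literal backslash"):

import re

# First sample row from the question; a raw string keeps the \, sequences literal.
row = (r"APIDEV,4xs54,000916,DEV,Update,Integrate,"
       r"String\,Set\,Number\,ID,Standard,2024-07-10T23:10:45.001Z,"
       r"Process_TIME\,URI\,Session_Key\,Services,Hourly")

# Plain [^,]+ columns, plus lazy .+? with a (?<!\\) lookbehind for Col7 and
# Col10 so those fields cannot end on a backslash and therefore absorb any
# escaped commas.
pattern = (r"(?P<Col1>[^,]+),(?P<Col2>[^,]+),(?P<Col3>[^,]+),(?P<Col4>[^,]+),"
           r"(?P<Col5>[^,]+),(?P<Col6>[^,]+),(?P<Col7>.+?(?<!\\)),"
           r"(?P<Col8>[^,]+),(?P<Col9>[^,]+),(?P<Col10>.+?(?<!\\)),"
           r"(?P<Col11>[^,]+)$")

m = re.match(pattern, row)
print(len(m.groupdict()))    # 11
print(m.group("Col7"))       # String\,Set\,Number\,ID
print(m.group("Col10"))      # Process_TIME\,URI\,Session_Key\,Services

The doubled form (?<!\\\\) typically only becomes necessary when the regex passes through an extra layer of string escaping, for example inside a double-quoted rex expression in SPL.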

SplunkDash
Motivator

Hello @ITWhisperer,

Thank you so much for sharing that with me. But it's not only Col7 and Col10 every time; the escaped commas might appear in any column, and the number of backslashes might not always be the same for a given column. In that case, how would I address the issue?

 


ITWhisperer
SplunkTrust

The number of backslashes in the data doesn't matter; it is the number of backslashes in the regex string that I was talking about. Backslashes normally need to be escaped (with a backslash); however, sometimes those escaping backslashes have to be escaped as well, hence the need for four backslashes to represent a single backslash. Try something like this (which allows for escaped (backslashed) commas in all of the columns):

(?<Col1>.+?(?<!\\\\)),(?<Col2>.+?(?<!\\\\)),(?<Col3>.+?(?<!\\\\)),(?<Col4>.+?(?<!\\\\)),(?<Col5>.+?(?<!\\\\)),(?<Col6>.+?(?<!\\\\)),(?<Col7>.+?(?<!\\\\)),(?<Col8>.+?(?<!\\\\)),(?<Col9>.+?(?<!\\\\)),(?<Col10>.+?(?<!\\\\)),(?<Col11>.+?(?<!\\\\))$

Again, it might be that you only need two backslashes each time instead of four.
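
If it helps, one hedged way to wire the answer above into props.conf as a search-time extraction could look like this (the extraction class name cols is arbitrary; INDEXED_EXTRACTIONS=CSV and HEADER_FIELD_LINE_NUMBER are dropped here because the indexed CSV parsing is what is splitting on the escaped commas in the first place, and the remaining settings are carried over from the original stanza):

[mypropscon]
SHOULD_LINEMERGE=False
LINE_BREAKER=([\r\n]+)
TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%QZ
KV_MODE=none
disabled=false
# Regex from the answer above, using the two-backslash lookbehind form;
# if it under-matches, try the four-backslash form (?<!\\\\) instead.
EXTRACT-cols = (?<Col1>.+?(?<!\\)),(?<Col2>.+?(?<!\\)),(?<Col3>.+?(?<!\\)),(?<Col4>.+?(?<!\\)),(?<Col5>.+?(?<!\\)),(?<Col6>.+?(?<!\\)),(?<Col7>.+?(?<!\\)),(?<Col8>.+?(?<!\\)),(?<Col9>.+?(?<!\\)),(?<Col10>.+?(?<!\\)),(?<Col11>.+?(?<!\\))$

With this approach the header row (Col1,Col2,...) would still be indexed as an ordinary event, so you may want to filter it out (for example with a TRANSFORMS rule routing it to nullQueue) or simply ignore it at search time.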
