Security

Field Extraction with CSV File

SplunkDash
Motivator

Hello.

I have some issues with field parsing for the CSV files using props configuration. I should be getting 11 fields for each of the events/rows, but parsing is giving me 17 fields. Here are the 3 sample events (First row is header row) from that CSV file and the props.conf file is also provided below:

Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11

APIDEV,4xs54,000916,DEV,Update,Integrate,String\,Set\,Number\,ID,Standard,2024-07-10T23:10:45.001Z,Process_TIME\,URI\,Session_Key\,Services,Hourly

APITEST,4ys34,000916,TEST,Update,Integrate,String\,Set\,Number\,String,Typicall\,Response,2024-07-10T23:10:45.021Z,CPU_TIME\,URI\,Session_Key\,Type\,Request,Monthly

APITEST,4ys34,000916,DEV,Insert,Integrate,Char\,Set\,System\,ID,On_Demand,2024-07-10T23:10:45.051Z,CPU_TIME\,URI\,Session_Key\,Services,Hourly

 

*Bold texts in each of the events should count one field 

 

props.conf

[mypropscon]

SHOULD_LINEMERGE=False

LINE_BREAKER=([\r\n]+)

INDEXED_EXTRACTIONS=CSV

KV_MODE=none

disabled=false

TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%QZ

HEARDER_FIELD_LINE_NUMBER=1

 

Any recommendation to resolve that issue will be highly appreciated. Thank you so much for your support as always.

 

 

  

Labels (1)
Tags (1)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

The number of backslashes in the data doesn't matter, it is the number of backslashes in the regex string I was talking about. Backslashes normally need to be escaped (with a backslash), however, sometimes these backslashes have to be escaped as well, hence the need for 4 backslashes to represent a single backslash. Try something like this (which allows for escaped (backslashed) commas in all the columns).

(?<Col1>.+?(?<!\\\\)),(?<Col2>.+?(?<!\\\\)),(?<Col3>.+?(?<!\\\\)),(?<Col4>.+?(?<!\\\\)),(?<Col5>.+?(?<!\\\\)),(?<Col6>.+?(?<!\\\\)),(?<Col7>.+?(?<!\\\\)),(?<Col8>.+?(?<!\\\\)),(?<Col9>.+?(?<!\\\\)),(?<Col10>.+?(?<!\\\\)),(?<Col11>.+?(?<!\\\\))$

Again, it might be that you only need two backslashes each time instead of four.

View solution in original post

PickleRick
SplunkTrust
SplunkTrust

That's an interesting problem but I think it's a bit of a malformed data. If your field values contain commas, they should be enclosed in quotes.

If your column order is constant, you can define a regex-based search-time extraction including escaped commas in field value.

gcusello
SplunkTrust
SplunkTrust

Hi @SplunkDash ,

let me understand: you have 17 fields in your csv but you want to extract only 11 of them, is it correct?

do you want to delete the other fields or only you don't need them in visualization?

if you want to delete the extra fields, you can use SEDCMD to delete the extra fields before indexing.

In the second case, you can leave all as is and take only the 11 fields.

Ciao.

Giuseppe

SplunkDash
Motivator

Hello @gcusello and @PickleRick 

Thank you so much for your response, truly appreciate it. All values should be part of this extraction and outcome of this extraction should be and as an example:

APIDEV,4xs54,000916,DEV,Update,Integrate,String\,Set\,Number\,ID,Standard,2024-07-10T23:10:45.001Z,Process_TIME\,URI\,Session_Key\,Services,Hourly

Col1=APIDEV

Col2=4xs54

Col3=000916

Col4=DEV

Col5=Update

Col6=Integrate

Col7=String\,Set\,Number\,ID

Col8=Standard

Col9=2024-07-10T23:10:45.001Z

Col10=Process_TIME\,URI\,Session_Key\,Services

Col11=Hourly

In the case of Regex, what would be the Regex for this extraction within the props.conf. 

 

 

 

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

If it is only Col7 and Col10 that could have escaped commas, try something like this

(?<Col1>[^,]+),(?<Col2>[^,]+),(?<Col3>[^,]+),(?<Col4>[^,]+),(?<Col5>[^,]+),(?<Col6>[^,]+),(?<Col7>.+?(?<!\\)),(?<Col8>[^,]+),(?<Col9>[^,]+),(?<Col10>.+?(?<!\\)),(?<Col11>[^,]+)$

You may have to double-up the backslashes

(?<Col1>[^,]+),(?<Col2>[^,]+),(?<Col3>[^,]+),(?<Col4>[^,]+),(?<Col5>[^,]+),(?<Col6>[^,]+),(?<Col7>.+?(?<!\\\\)),(?<Col8>[^,]+),(?<Col9>[^,]+),(?<Col10>.+?(?<!\\\\)),(?<Col11>[^,]+)$

SplunkDash
Motivator

Hello @ITWhisperer,

Thank you so much for sharing that with me. But it's not only for Col7 and Col10 all time. It might be within any Column and also # of Backslashes might not always be the same for the same Column. In that case how would I address that issue

 

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

The number of backslashes in the data doesn't matter, it is the number of backslashes in the regex string I was talking about. Backslashes normally need to be escaped (with a backslash), however, sometimes these backslashes have to be escaped as well, hence the need for 4 backslashes to represent a single backslash. Try something like this (which allows for escaped (backslashed) commas in all the columns).

(?<Col1>.+?(?<!\\\\)),(?<Col2>.+?(?<!\\\\)),(?<Col3>.+?(?<!\\\\)),(?<Col4>.+?(?<!\\\\)),(?<Col5>.+?(?<!\\\\)),(?<Col6>.+?(?<!\\\\)),(?<Col7>.+?(?<!\\\\)),(?<Col8>.+?(?<!\\\\)),(?<Col9>.+?(?<!\\\\)),(?<Col10>.+?(?<!\\\\)),(?<Col11>.+?(?<!\\\\))$

Again, it might be that you only need two backslashes each time instead of four.

Get Updates on the Splunk Community!

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...

Splunk Education Goes to Washington | Splunk GovSummit 2024

If you’re in the Washington, D.C. area, this is your opportunity to take your career and Splunk skills to the ...