Hello.
I have some issues with field parsing for the CSV files using props configuration. I should be getting 11 fields for each of the events/rows, but parsing is giving me 17 fields. Here are the 3 sample events (First row is header row) from that CSV file and the props.conf file is also provided below:
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11
APIDEV,4xs54,000916,DEV,Update,Integrate,String\,Set\,Number\,ID,Standard,2024-07-10T23:10:45.001Z,Process_TIME\,URI\,Session_Key\,Services,Hourly
APITEST,4ys34,000916,TEST,Update,Integrate,String\,Set\,Number\,String,Typicall\,Response,2024-07-10T23:10:45.021Z,CPU_TIME\,URI\,Session_Key\,Type\,Request,Monthly
APITEST,4ys34,000916,DEV,Insert,Integrate,Char\,Set\,System\,ID,On_Demand,2024-07-10T23:10:45.051Z,CPU_TIME\,URI\,Session_Key\,Services,Hourly
*Bold texts in each of the events should count one field
props.conf
[mypropscon]
SHOULD_LINEMERGE=False
LINE_BREAKER=([\r\n]+)
INDEXED_EXTRACTIONS=CSV
KV_MODE=none
disabled=false
TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%QZ
HEARDER_FIELD_LINE_NUMBER=1
Any recommendation to resolve that issue will be highly appreciated. Thank you so much for your support as always.
The number of backslashes in the data doesn't matter, it is the number of backslashes in the regex string I was talking about. Backslashes normally need to be escaped (with a backslash), however, sometimes these backslashes have to be escaped as well, hence the need for 4 backslashes to represent a single backslash. Try something like this (which allows for escaped (backslashed) commas in all the columns).
(?<Col1>.+?(?<!\\\\)),(?<Col2>.+?(?<!\\\\)),(?<Col3>.+?(?<!\\\\)),(?<Col4>.+?(?<!\\\\)),(?<Col5>.+?(?<!\\\\)),(?<Col6>.+?(?<!\\\\)),(?<Col7>.+?(?<!\\\\)),(?<Col8>.+?(?<!\\\\)),(?<Col9>.+?(?<!\\\\)),(?<Col10>.+?(?<!\\\\)),(?<Col11>.+?(?<!\\\\))$
Again, it might be that you only need two backslashes each time instead of four.
That's an interesting problem but I think it's a bit of a malformed data. If your field values contain commas, they should be enclosed in quotes.
If your column order is constant, you can define a regex-based search-time extraction including escaped commas in field value.
Hi @SplunkDash ,
let me understand: you have 17 fields in your csv but you want to extract only 11 of them, is it correct?
do you want to delete the other fields or only you don't need them in visualization?
if you want to delete the extra fields, you can use SEDCMD to delete the extra fields before indexing.
In the second case, you can leave all as is and take only the 11 fields.
Ciao.
Giuseppe
Hello @gcusello and @PickleRick
Thank you so much for your response, truly appreciate it. All values should be part of this extraction and outcome of this extraction should be and as an example:
APIDEV,4xs54,000916,DEV,Update,Integrate,String\,Set\,Number\,ID,Standard,2024-07-10T23:10:45.001Z,Process_TIME\,URI\,Session_Key\,Services,Hourly
Col1=APIDEV
Col2=4xs54
Col3=000916
Col4=DEV
Col5=Update
Col6=Integrate
Col7=String\,Set\,Number\,ID
Col8=Standard
Col9=2024-07-10T23:10:45.001Z
Col10=Process_TIME\,URI\,Session_Key\,Services
Col11=Hourly
In the case of Regex, what would be the Regex for this extraction within the props.conf.
If it is only Col7 and Col10 that could have escaped commas, try something like this
(?<Col1>[^,]+),(?<Col2>[^,]+),(?<Col3>[^,]+),(?<Col4>[^,]+),(?<Col5>[^,]+),(?<Col6>[^,]+),(?<Col7>.+?(?<!\\)),(?<Col8>[^,]+),(?<Col9>[^,]+),(?<Col10>.+?(?<!\\)),(?<Col11>[^,]+)$
You may have to double-up the backslashes
(?<Col1>[^,]+),(?<Col2>[^,]+),(?<Col3>[^,]+),(?<Col4>[^,]+),(?<Col5>[^,]+),(?<Col6>[^,]+),(?<Col7>.+?(?<!\\\\)),(?<Col8>[^,]+),(?<Col9>[^,]+),(?<Col10>.+?(?<!\\\\)),(?<Col11>[^,]+)$
Hello @ITWhisperer,
Thank you so much for sharing that with me. But it's not only for Col7 and Col10 all time. It might be within any Column and also # of Backslashes might not always be the same for the same Column. In that case how would I address that issue
The number of backslashes in the data doesn't matter, it is the number of backslashes in the regex string I was talking about. Backslashes normally need to be escaped (with a backslash), however, sometimes these backslashes have to be escaped as well, hence the need for 4 backslashes to represent a single backslash. Try something like this (which allows for escaped (backslashed) commas in all the columns).
(?<Col1>.+?(?<!\\\\)),(?<Col2>.+?(?<!\\\\)),(?<Col3>.+?(?<!\\\\)),(?<Col4>.+?(?<!\\\\)),(?<Col5>.+?(?<!\\\\)),(?<Col6>.+?(?<!\\\\)),(?<Col7>.+?(?<!\\\\)),(?<Col8>.+?(?<!\\\\)),(?<Col9>.+?(?<!\\\\)),(?<Col10>.+?(?<!\\\\)),(?<Col11>.+?(?<!\\\\))$
Again, it might be that you only need two backslashes each time instead of four.