topic CSV empty quoted field extraction problem in Splunk Search

CSV empty quoted field extraction problem

cajose3pepe — Wed, 30 Sep 2020 02:23:49 GMT

Hi there,

I have the next CSV file:

"CLM_TIMESTAMP","CLM_DATE","CLM_NUMBER"
"1569301200","24/09/2019 00:00:00","389721519283162"
"1569301400","24/09/2019 00:00:00",""
"1569301600","24/09/2019 00:00:00",""

When forwarded to index, the CLM_NUMBER "" then appears indexed as 'CLM_NUMBER = ' so if you look at statistics it says that 100% of events has the field, however, I only want to have the field in those cases where it really appears, and to consider empty double quotes to null.

Any idea to solve this problem?

I have tried to apply SEDCMD commands to the source and change the _raw but even if I change the whole _raw to "I have changed the raw", all of the fields of the csv still remains with the values that contains in the csv.

Re: CSV empty quoted field extraction problem

gcusello — Mon, 07 Oct 2019 12:56:54 GMT

Hi
try something like this:

| makeresults | eval CLM_TIMESTAMP="1569301200", CLM_DATE="24/09/2019 00:00:00", CLM_NUMBER="389721519283162"
| append [ | makeresults | eval CLM_TIMESTAMP="1569301400", CLM_DATE="24/09/2019 00:00:00", CLM_NUMBER="" ]
| append [ | makeresults | eval CLM_TIMESTAMP="1569301600", CLM_DATE="24/09/2019 00:00:00", CLM_NUMBER="" ]
| eval CLM_NUMBER=if(tonumber(CLM_NUMBER)>0,CLM_NUMBER,NULL)
| stats count BY CLM_NUMBER

Bye.
Giuseppe

Re: CSV empty quoted field extraction problem

cajose3pepe — Mon, 07 Oct 2019 15:35:16 GMT

Many thanks for your answer Giuseppe. I forgot to mention that I need it at index time.

Re: CSV empty quoted field extraction problem

cajose3pepe — Wed, 30 Sep 2020 02:24:05 GMT

I have found a "solution" that fits for me:

PROPS.CONF
[my_sourcetype]
CHARSET = ISO-8859-1
TZ = America/Sao_Paulo
TIME_PREFIX = \" # Line to get first field as timestamp
TIME_FORMAT=%s
MAX_TIMESTAMP_LOOKAHEAD=10
SHOULD_LINEMERGE = false
disabled = false
pulldown_type = true
KV_MODE = none
NO_BINARY_CHECK = true
PREAMBLE_REGEX = .CLM_NUMBER" #Line to avoid header indexing
SEDCMD-changeeventformat1 = s/(\"[^\"]\"),(\"[^\"]\"),(\"[^\"]\")/clm_timestamp=\1 clm_date=\2 clm_number=\3/g
SEDCMD-changeeventformat2 = s/ \w+=\"\"//g #This line deletes empty fields

Not a beautiful solution but after hours of tries is the only solution I have found.

Hope is helpful for others.