Splunk Enterprise Security

How to remove unwanted data while indexing a CSV file?

Path Finder

I have a CSV file that has some extra data at the start and at the end of the file.
For example:

----BEGIN_RESPONSE_BODY_CSV
"Date","Action","Module","Details","User Name","User Role","User IP"
"2019-12-30T05:41:34Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:40:55Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:40:12Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:39:53Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
----END_RESPONSE_BODY_CSV
----BEGIN_RESPONSE_FOOTER_CSV
WARNING
"CODE","TEXT","URL"
"1980","10 record limit exceeded. Use URL to get next batch of results.","/api/?action=list&truncation_limit=10&id_max=1111111"
----END_RESPONSE_FOOTER_CSV

I need to index only the CSV data, so I need to remove the first line

 ----BEGIN_RESPONSE_BODY_CSV

and the lines between (and including)

 ----END_RESPONSE_BODY_CSV
 ----END_RESPONSE_FOOTER_CSV

Please suggest what I should do in props.conf, or any other conf file, to remove the unwanted data.
My props.conf:

[csv]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
TZ=UTC
TIME_FORMAT=%Y-%m-%dT%H:%M:%SZ

Re: How to remove unwanted data while indexing a CSV file?

Ultra Champion
| makeresults
| eval _raw="----BEGIN_RESPONSE_BODY_CSV
\"Date\",\"Action\",\"Module\",\"Details\",\"User Name\",\"User Role\",\"User IP\"
\"2019-12-30T05:41:34Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:40:55Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:40:12Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:39:53Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
----END_RESPONSE_BODY_CSV
----BEGIN_RESPONSE_FOOTER_CSV
WARNING
\"CODE\",\"TEXT\",\"URL\"
\"1980\",\"10 record limit exceeded. Use URL to get next batch of results.\",\"/api/?action=list&truncation_limit=10&id_max=1111111\"
----END_RESPONSE_FOOTER_CSV"
| rex mode=sed "s/(?sm)^(\-.+?|WARNING)$/#/g"
| rex mode=sed "s/\"//g"
| makemv delim="#" _raw
| stats count by _raw
| multikv forceheader=1
| where match(_raw,".+")
| fields - *count _raw
| rex field=Date mode=sed "s/Z$/-0000/"
| eval Date=strptime(Date,"%FT%T%z")
| fieldformat Date=strftime(Date,"%F %T")

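This is a search-time sample; I understand that what you are hoping for is a setting in props.conf.
multikv doesn't work properly with the quoted values, so I removed the double quotes (").
The Date field is also modified so that it is recognized as UTC.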


Re: How to remove unwanted data while indexing a CSV file?

Path Finder

to4kawa, thanks for the input. I'm looking for a fix that is applied while the data is being indexed in Splunk, using a configuration file. Is that possible?


Re: How to remove unwanted data while indexing a CSV file?

Ultra Champion

I'm not an expert, so I don't know for sure.
But I think you may be able to use these regular expressions.

Configure event line breaking


Re: How to remove unwanted data while indexing a CSV file?

SplunkTrust

In your props.conf:

[yoursourcetype]
SEDCMD-beginAndend = s/^-{4}.*//g

The above removes lines that start with ----.

But then you'll have two CSVs stuck together, each with its own header line. Do you need the data in the footer response too?

You won't be able to use INDEXED_EXTRACTIONS. You'll need to create a REPORT/TRANSFORMS entry in props.conf and a matching stanza in transforms.conf to extract the fields instead.
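
One way to sketch the index-time filtering in configuration is the documented nullQueue routing pattern: send every event to the nullQueue, then route back only the rows that start with a quoted ISO 8601 timestamp. The stanza names drop_everything and keep_body_rows are hypothetical, [yoursourcetype] is the same placeholder as above, and the sketch assumes that INDEXED_EXTRACTIONS is not set and that the settings live on the instance that parses the data (an indexer or heavy forwarder, not a universal forwarder).

```ini
# props.conf (assumption: each line is broken into its own event)
[yoursourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TZ = UTC
TIME_PREFIX = ^"
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
# Transforms run left to right: drop everything, then keep the body rows.
TRANSFORMS-filter = drop_everything, keep_body_rows

# transforms.conf (stanza names are hypothetical)
[drop_everything]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keep_body_rows]
# Matches only rows that begin with a quoted timestamp,
# e.g. "2019-12-30T05:41:34Z",
REGEX = ^"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z",
DEST_KEY = queue
FORMAT = indexQueue
```

With this in place, the ---- marker lines, the WARNING line, both header rows and the footer row should never reach the index, so the fields then have to be extracted at search time.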


Re: How to remove unwanted data while indexing a CSV file?

Ultra Champion

props.conf:

    [csv]
    SHOULD_LINEMERGE=false
    LINE_BREAKER=([\r\n]+)
    NO_BINARY_CHECK=true
    CHARSET=UTF-8
    INDEXED_EXTRACTIONS=csv
    KV_MODE=none
    category=Structured
    description=Comma-separated value format. Set header and other settings in "Delimited Settings"
    disabled=false
    pulldown_type=true
    TZ=UTC
    TIME_FORMAT=%Y-%m-%dT%H:%M:%SZ
    REPORT-csv=response_body_csv, response_footer_csv

transforms.conf:

    [response_body_csv]
    REGEX=\"(?<Date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\",\"(?<Action>\w+)\",\"(?<Module>\w+)\",\"(?<Details>\w+: \w+)\",\"(?<User_Name>\w+)\",\"(?<User_Role>\w+)\",\"(?<User_IP>[\w.:]+)\"
    [response_footer_csv]
    REGEX=\"(?<CODE>\d{4})\",\"(?<TEXT>.+)\",\"(?<URL>.+)\"

This is what it looks like if the fields are extracted at search time.
