I have a CSV file that has some data at the start of the file and in end.
Like:
----BEGIN_RESPONSE_BODY_CSV
"Date","Action","Module","Details","User Name","User Role","User IP"
"2019-12-30T05:41:34Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:40:55Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:40:12Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:39:53Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
----END_RESPONSE_BODY_CSV
----BEGIN_RESPONSE_FOOTER_CSV
WARNING
"CODE","TEXT","URL"
"1980","10 record limit exceeded. Use URL to get next batch of results.","/api/?action=list&truncation_limit=10&id_max=1111111"
----END_RESPONSE_FOOTER_CSV
I need to index the CSV data only. Need to remove the first line
----BEGIN_RESPONSE_BODY_CSV
the lines between
----END_RESPONSE_BODY_CSV
----END_RESPONSE_FOOTER_CSV
Please suggest what should I do in the props.conf or any conf file, to remove the unwanted data.
My props.conf:
[ csv ]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
TZ=UTC
TIME_FORMAT=%Y-%m-%dT%H:%M:%SZ
In your props.conf:
[yoursourcetype]
SEDCMD-beginAndend = s/^-{4}.*//g
But then you'll have two csv stuck together with both header lines too. Do you need the data in the footer responses too?
You won't be able to use indexed_extractions. You'll need a to create a report/transforms and transforms.conf to extract the fields as such.
props.conf:
[ csv ]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
TZ=UTC
TIME_FORMAT=%Y-%m-%dT%H:%M:%SZ
TRANSFORMS-csv=response_body_csv, response_footer_csv
trensforms.conf:
[response_body_csv]
REGEX=\"(?<Date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\",\"(?<Action>\w+)\",\"(?<Module>\w+)\",\"(?<Details>\w+: \w+)\",\"(?<User_Name>\w+)\",\"(?<User_Role>\w+)\",\"(?<User_IP>[\w.:]+)\"
[response_footer_csv]
REGEX=\"(?<CODE>\d{4})\",\"(?<TEXT>.+)\",\"(?<URL>.+)\"
If it is a field extraction at the time of search, it looks like this.
| makeresults
| eval _raw="----BEGIN_RESPONSE_BODY_CSV
\"Date\",\"Action\",\"Module\",\"Details\",\"User Name\",\"User Role\",\"User IP\"
\"2019-12-30T05:41:34Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:40:55Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:40:12Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:39:53Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
----END_RESPONSE_BODY_CSV
----BEGIN_RESPONSE_FOOTER_CSV
WARNING
\"CODE\",\"TEXT\",\"URL\"
\"1980\",\"10 record limit exceeded. Use URL to get next batch of results.\",\"/api/?action=list&truncation_limit=10&id_max=1111111\"
----END_RESPONSE_FOOTER_CSV"
| rex mode=sed "s/(?sm)^(\-.+?|WARNING)$/#/g"
| rex mode=sed "s/\"//g"
| makemv delim="#" _raw
| stats count by _raw
| multikv forceheader=1
| where match(_raw,".+")
| fields - *count _raw
| rex field=Date mode=sed "s/Z$/-0000/"
| eval Date=strptime(Date,"%FT%T%z")
| fieldformat Date=strftime(Date,"%F %T")
This is sample as your hope is the setting in props.conf.
multikv
don't work properly, so I removed ".
This is also modified to recognize UTC.
to4kawa, thanks for the input. I'm looking for a fix while indexing the data in Splunk using configuration file. Is it possible?
I'm not an expert, so I don't know.
But I think you may use this regular expressions.