Splunk Enterprise Security

How to remove unwanted data while indexing a CSV file?

pbankar
Path Finder

I have a CSV file that has some extra data at the start and at the end of the file, like this:

----BEGIN_RESPONSE_BODY_CSV
"Date","Action","Module","Details","User Name","User Role","User IP"
"2019-12-30T05:41:34Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:40:55Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:40:12Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:39:53Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
----END_RESPONSE_BODY_CSV
----BEGIN_RESPONSE_FOOTER_CSV
WARNING
"CODE","TEXT","URL"
"1980","10 record limit exceeded. Use URL to get next batch of results.","/api/?action=list&truncation_limit=10&id_max=1111111"
----END_RESPONSE_FOOTER_CSV

I need to index only the CSV data, so I need to remove the first line

 ----BEGIN_RESPONSE_BODY_CSV

and everything between these two markers, including the markers themselves:

 ----END_RESPONSE_BODY_CSV
 ----END_RESPONSE_FOOTER_CSV

Please suggest what I should do in props.conf, or any other conf file, to remove the unwanted data.
My props.conf:

[ csv ]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
TZ=UTC
TIME_FORMAT=%Y-%m-%dT%H:%M:%SZ
0 Karma

jkat54
SplunkTrust

In your props.conf:

[yoursourcetype]
SEDCMD-beginAndend = s/^-{4}.*//g

The above removes lines that start with ----

But then you'll have two CSVs stuck together, each with its own header line. Do you need the data in the footer response too?

You won't be able to use INDEXED_EXTRACTIONS. Instead, you'll need to create a REPORT (or TRANSFORMS) entry in props.conf, with a matching stanza in transforms.conf, to extract the fields.
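
As a rough illustration of that approach, the sketch below drops the unwanted lines at index time by routing them to nullQueue, then extracts the fields at search time with a REPORT. It assumes each line is broken into its own event and that INDEXED_EXTRACTIONS is not set on the sourcetype. The stanza names drop_wrapper_lines and csv_body_fields, and the class names after TRANSFORMS- and REPORT-, are hypothetical, and the regexes are written only against the sample in the question, so test them on real data before relying on them.

```ini
# props.conf
[yoursourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
TZ = UTC
# Index time: discard events matched by the transform below.
TRANSFORMS-dropwrappers = drop_wrapper_lines
# Search time: extract fields from the remaining data rows.
REPORT-csvfields = csv_body_fields

# transforms.conf
[drop_wrapper_lines]
# Matches the ---- marker lines, the WARNING line, both header rows,
# and footer rows of the form "1980","...","/api/...".
REGEX = ^(?:----|WARNING$|"Date","Action",|"CODE","TEXT","URL"$|"\d+","[^"]*","/api/)
DEST_KEY = queue
FORMAT = nullQueue

[csv_body_fields]
# Named capture groups become field names at search time.
REGEX = ^"(?<Date>[^"]*)","(?<Action>[^"]*)","(?<Module>[^"]*)","(?<Details>[^"]*)","(?<User_Name>[^"]*)","(?<User_Role>[^"]*)","(?<User_IP>[^"]*)"$
```

The body header row is dropped as well because, without INDEXED_EXTRACTIONS, it would otherwise be indexed as an ordinary event. If the footer data is needed, remove its alternatives from the drop regex and add a second REPORT stanza for it.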

0 Karma

to4kawa
Ultra Champion

props.conf:

    [ csv ]
    SHOULD_LINEMERGE=false
    LINE_BREAKER=([\r\n]+)
    NO_BINARY_CHECK=true
    CHARSET=UTF-8
    INDEXED_EXTRACTIONS=csv
    KV_MODE=none
    category=Structured
    description=Comma-separated value format. Set header and other settings in "Delimited Settings"
    disabled=false
    pulldown_type=true
    TZ=UTC
    TIME_FORMAT=%Y-%m-%dT%H:%M:%SZ
    TRANSFORMS-csv=response_body_csv, response_footer_csv

transforms.conf:

    [response_body_csv]
    REGEX=\"(?<Date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\",\"(?<Action>\w+)\",\"(?<Module>\w+)\",\"(?<Details>\w+: \w+)\",\"(?<User_Name>\w+)\",\"(?<User_Role>\w+)\",\"(?<User_IP>[\w.:]+)\"
    [response_footer_csv]
    REGEX=\"(?<CODE>\d{4})\",\"(?<TEXT>.+)\",\"(?<URL>.+)\"

For field extraction at search time instead, see the search in my next post.

0 Karma

to4kawa
Ultra Champion
| makeresults
| eval _raw="----BEGIN_RESPONSE_BODY_CSV
\"Date\",\"Action\",\"Module\",\"Details\",\"User Name\",\"User Role\",\"User IP\"
\"2019-12-30T05:41:34Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:40:55Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:40:12Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:39:53Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
----END_RESPONSE_BODY_CSV
----BEGIN_RESPONSE_FOOTER_CSV
WARNING
\"CODE\",\"TEXT\",\"URL\"
\"1980\",\"10 record limit exceeded. Use URL to get next batch of results.\",\"/api/?action=list&truncation_limit=10&id_max=1111111\"
----END_RESPONSE_FOOTER_CSV"
| rex mode=sed "s/(?sm)^(\-.+?|WARNING)$/#/g"
| rex mode=sed "s/\"//g"
| makemv delim="#" _raw
| stats count by _raw
| multikv forceheader=1
| where match(_raw,".+")
| fields - *count _raw
| rex field=Date mode=sed "s/Z$/-0000/"
| eval Date=strptime(Date,"%FT%T%z")
| fieldformat Date=strftime(Date,"%F %T")

This is only a search-time sample, since what you actually want is a setting in props.conf.
multikv doesn't work properly with the double quotes, so I removed them.
The Date field is also modified so that it is recognized as UTC.

0 Karma

pbankar
Path Finder

to4kawa, thanks for the input. I'm looking for a fix that is applied while the data is being indexed in Splunk, using a configuration file. Is that possible?

0 Karma

to4kawa
Ultra Champion

I'm not an expert, so I don't know.
But I think you may be able to use these regular expressions.

Configure event line breaking
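
For the structured-data route, props.conf also has header-related settings that may help with the leading marker line. The fragment below is only a sketch, assuming the sourcetype name yoursourcetype (a placeholder) and that the file is read where this props.conf is deployed; with INDEXED_EXTRACTIONS, that usually means the forwarder that monitors the file. PREAMBLE_REGEX is documented as ignoring matching preamble lines, but whether it also suppresses the marker and footer lines that come after the data rows is not something this thread confirms, so those may still need separate handling.

```ini
# props.conf
[yoursourcetype]
INDEXED_EXTRACTIONS = csv
# Ignore the ----BEGIN_RESPONSE_BODY_CSV line that precedes the header row.
PREAMBLE_REGEX = ^----
# Take the event timestamp from the Date column.
TIMESTAMP_FIELDS = Date
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
TZ = UTC
```

If the footer rows still get indexed with this setup, they would have to be removed another way, for example by cleaning the file before Splunk reads it.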

0 Karma