I have:
1 Search Head
1 Deployment Server
4 Indexers (non-clustered)
This is the raw CSV file:
date,name,capacity,free_capacity,virtual_capacity,used_capacity,real_capacity,overallocation,compression_virtual_capacity,compression_compressed_capacity,compression_uncompressed_capacity
1470207600,myserver,62.00TB,16.67TB,163.02TB,41.80TB,45.24TB,262,86.72TB,34.97TB,69.88TB
1470207600,MigrationPool_8192,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB
1470207600,MigrationPool_512,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB
1470294000,myserver,62.00TB,16.67TB,163.02TB,41.81TB,45.25TB,262,86.72TB,34.99TB,69.88TB
1470294000,MigrationPool_8192,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB
The top line is the header.
I have the props.conf and transforms.conf on my UF alongside my inputs.conf.
/opt/splunkforwarder/etc/apps/myapp/local/inputs.conf:
[monitor:///usr/local/bin/reports/storage/emc_capacity.out]
disabled = false
index = zz_test
sourcetype = VMAX_capacity
[monitor:///usr/local/bin/reports/storage/tustin_svc_capacity_rpts.out]
disabled = false
index = zz_test
sourcetype = SVC_capacity
[monitor:///usr/local/bin/reports/storage/idc_svc_capacity_rpts.out]
disabled = false
index = zz_test
sourcetype = SVC_capacity
/opt/splunkforwarder/etc/apps/myapp/local/props.conf:
[VMAX_capacity]
REPORT-VMAX_capacity = VMAX_storage_csv
[SVC_capacity]
REPORT-SVC_capacity = SVC_storage_csv
/opt/splunkforwarder/etc/apps/myapp/local/transforms.conf
[SVC_storage_csv]
DELIMS = ","
FIELDS = "date","name","capacity","free_capacity","virtual_capacity","used_capacity","real_capacity","overallocation","compression_virtual_capacity","compression_compressed_capacity","compression_uncompressed_capacity"
[VMAX_storage_csv]
DELIMS = ","
FIELDS = "Date","Array","Useable","Used","UsedPercent","UsedGrowth","Free","Subscribed","SubscribedMax","SubscribedPercent","SubscribedGrowth","Snapshot","compression","ExpansionNeeded"
When I run this search on my search head: index=zz_test sourcetype=SVC_capacity
the data is not parsed. My questions: do props.conf and transforms.conf need to be on my indexers, or on the UF? And do my props.conf and transforms.conf look correct?
Any assistance is much appreciated.
VMAX_storage_csv, SVC_storage_csv, props.conf, and transforms.conf should be on the indexers, not on the forwarders. Put your lookup files under the $SPLUNK_HOME/etc/apps/your_app/lookups/ folder and props.conf and transforms.conf under the $SPLUNK_HOME/etc/apps/your_app/local/ folder. All these files should be on all the indexers. Once done, restart Splunk, one indexer at a time.
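For reference, the app layout being described would look like this (your_app is a placeholder name):
$SPLUNK_HOME/etc/apps/your_app/
    local/
        props.conf
        transforms.conf
    lookups/
        (lookup CSV files, if any)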
I think it's a bit more nuanced than putting the props and transforms files on all the indexers.
The first question you really want to ask yourself before you do this is when you want your extractions to take place. In the most general sense, you can put both files on almost all Splunk server instances, but not all the settings will take effect or make sense.
Do you want INDEX time extractions or SEARCH time extractions?
INDEX time extractions:
If you want indexed extractions:
Add something like this to your props and deploy it to the HF/UF (the initial index-time processing node), depending on how your architecture is set up.
Props:
INDEXED_EXTRACTIONS = CSV
Important caveat: forwarded structured data is not parsed at the indexer. This needs to be done at the forwarding level: https://docs.splunk.com/Documentation/Splunk/8.1.0/Forwarding/Routeandfilterdatad#Caveats_for_routin...
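As a concrete illustration using the SVC_capacity sourcetype from the question, a minimal index-time props stanza might look like this (a sketch; TIME_FORMAT = %s is an assumption based on the epoch values such as 1470207600 in the sample's date column):
[SVC_capacity]
# parse the file as structured CSV, using the first line as the header
INDEXED_EXTRACTIONS = csv
# one event per line
SHOULD_LINEMERGE = false
# fields are already indexed; skip automatic search-time KV extraction
KV_MODE = none
# take the timestamp from the date column, which holds epoch seconds (assumption)
TIMESTAMP_FIELDS = date
TIME_FORMAT = %s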
SEARCH time extractions:
If you want search-time extractions:
Add something like this to your props and transforms, and deploy to your processing node AND your search head. (You could split the configs and deploy only the required parts to each server, but for simplicity just deploy the same package everywhere. TRANSFORMS-* is applied at index time, while REPORT-* is applied at search time.)
Assume the simplified source file is like this, and your values don't have commas within them:
name,number,colour
bob,34,red
sam,23,blue
gary,4,cyan
Props:
[yourSourcetype]
... All your other settings ...
KV_MODE=none
TRANSFORMS-deleteHeader = deleteHeader
REPORT-searchTimeExtractions = searchTimeExtractions
Transforms:
[deleteHeader]
REGEX=name,number,colour
DEST_KEY = queue
FORMAT = nullQueue
[searchTimeExtractions]
REGEX=^(?<name>[^,]*),(?<number>[^,]*),(?<colour>[^,]*)$
Link to props docs, explaining the difference between REPORT and TRANSFORMS: https://docs.splunk.com/Documentation/Splunk/8.1.0/Admin/Propsconf
Link to the sequence of search time operations in Splunk: https://docs.splunk.com/Documentation/Splunk/8.1.0/Knowledge/Searchtimeoperationssequence
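Once deployed, a quick sanity check against the simplified sample (using the placeholder sourcetype from the example) would be a search like:
sourcetype=yourSourcetype | table name, number, colour
If the extractions work, each event should show its name, number, and colour values as fields.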
Hi @_internal
For removing the CSV header, how about SEDCMD?
Your SEARCH time extractions solution doesn't work without SHOULD_LINEMERGE=false and LINE_BREAKER=([\r\n]+), I guess.
In general I use at bare minimum these six props settings. I almost always set SHOULD_LINEMERGE to false, so I'm not really sure how that specific setting applies to the pipeline, or how it affects the searchTimeExtractions transform. Feel free to knowledge transfer 😁 I try to avoid line merging for performance reasons, and instead write more complex line breakers to account for multi-line events.
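For reference, folding those line-breaking settings into the earlier search-time example would give a props stanza along these lines (a sketch):
[yourSourcetype]
# break events on newlines and do not merge lines back together
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# rely on the REPORT extraction instead of automatic KV extraction
KV_MODE = none
# index time: route the header line to the nullQueue
TRANSFORMS-deleteHeader = deleteHeader
# search time: extract name, number, colour
REPORT-searchTimeExtractions = searchTimeExtractions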
Thank you @sbbadri.
So the VMAX_storage_csv & SVC_storage_csv lookup files go on the indexers. I'm trying to find an example of what the file would look like in my case, with the headers.
Also, are my props and transforms stanzas correct?
[monitor:///usr/local/bin/reports/storage/emc_capacity.out]
disabled = false
index = zz_test
sourcetype = VMAX_capacity
props.conf
[VMAX_capacity]
FIELD_DELIMITER = ","
CHECK_FOR_HEADER = true
HEADER_MODE = firstline
Do the same thing for the other sourcetypes.
Below is the link:
http://docs.splunk.com/Documentation/SplunkCloud/6.6.1/Data/Extractfieldsfromfileswithstructureddata
Thank you again. I see the props is this:
[VMAX_capacity]
FIELD_DELIMITER = ","
CHECK_FOR_HEADER = true
HEADER_MODE = firstline
So the transforms is this:
[VMAX_storage_csv]
DELIMS = ","
FIELDS = "Date","Array","Useable","Used","UsedPercent","UsedGrowth","Free","Subscribed","SubscribedMax","SubscribedPercent","SubscribedGrowth","Snapshot","compression","ExpansionNeeded"
and then place both files on my indexers.
So I don't need a transforms.conf, right?
All I need is these two, right?
On the UF: inputs.conf
On the indexers: props.conf
inputs.conf:
[monitor:///usr/local/bin/reports/storage/emc_capacity.out]
disabled = false
index = zz_test
sourcetype = VMAX_capacity
props.conf:
[VMAX_capacity]
FIELD_DELIMITER = ","
CHECK_FOR_HEADER = true
HEADER_MODE = firstline
Yeah, you are correct. No need for transforms.conf, because you are not importing any lookup file.
So I added the props to all four indexers and restarted them:
[SVC_capacity]
FIELD_DELIMITER = ","
CHECK_FOR_HEADER = true
HEADER_MODE = firstline
I run the search: sourcetype=SVC_capacity index=zz_test
and the indexed data is only tailing the newer data, without checking the headers:
date,name,capacity,free_capacity,virtual_capacity,used_capacity,real_capacity,overallocation,compression_virtual_capacity,compression_compressed_capacity,compression_uncompressed_capacity
So my indexed data looks like this:
1470207600,myserver,62.00TB,16.67TB,163.02TB,41.80TB,45.24TB,262,86.72TB,34.97TB,69.88TB
1470207600,MigrationPool_8192,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB
1470207600,MigrationPool_512,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB
I want the fields extracted as per the header line, with the headers showing up in the interesting fields, like so:
date - 1470207600
name - myserver
capacity - 62.00TB
and so on (see the attached example).
Try this props.conf:
[testCSVSourcetype]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
disabled = false
pulldown_type = true
TIMESTAMP_FIELDS = date
Replace with the proper sourcetype. I have tested the above config in my local environment.
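Once it is in place and new data has been indexed, a quick way to confirm the extraction (using the index and sourcetype from the question) is a search like:
index=zz_test sourcetype=SVC_capacity | table date, name, capacity, free_capacity, used_capacity
Each header name should then appear as an extracted field on the new events.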
Thanks @sbbadri.
I will add this to my four indexers and restart. I will let you know the outcome.
@sbbadri
It worked! I placed this on my four indexers and restarted the instances, and the next time the file generated new data, the interesting fields (headers) were parsed out.
Thank you!