
How can I import this CSV file into Splunk?

dperry
Communicator

I have:
1 Search head
1 Deployment Server
4 Indexers (non-clustered)

This is the raw CSV file:
date,name,capacity,free_capacity,virtual_capacity,used_capacity,real_capacity,overallocation,compression_virtual_capacity,compression_compressed_capacity,compression_uncompressed_capacity
1470207600,myserver,62.00TB,16.67TB,163.02TB,41.80TB,45.24TB,262,86.72TB,34.97TB,69.88TB
1470207600,MigrationPool_8192,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB
1470207600,MigrationPool_512,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB
1470294000,myserver,62.00TB,16.67TB,163.02TB,41.81TB,45.25TB,262,86.72TB,34.99TB,69.88TB
1470294000,MigrationPool_8192,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB

The top line is the header.

I have props.conf and transforms.conf on my UF alongside my inputs.conf:

/opt/splunkforwarder/etc/apps/myapp/local/inputs.conf:

[monitor:///usr/local/bin/reports/storage/emc_capacity.out]
disabled = false
index = zz_test
sourcetype = VMAX_capacity

[monitor:///usr/local/bin/reports/storage/tustin_svc_capacity_rpts.out]
disabled = false
index = zz_test
sourcetype = SVC_capacity

[monitor:///usr/local/bin/reports/storage/idc_svc_capacity_rpts.out]
disabled = false
index = zz_test
sourcetype = SVC_capacity

/opt/splunkforwarder/etc/apps/myapp/local/props.conf:
[VMAX_capacity]
REPORT-VMAX_capacity = VMAX_storage_csv

[SVC_capacity]
REPORT-SVC_capacity = SVC_storage_csv

/opt/splunkforwarder/etc/apps/myapp/local/transforms.conf
[SVC_storage_csv]
DELIMS = ","
FIELDS = "date","name","capacity","free_capacity","virtual_capacity","used_capacity","real_capacity","overallocation","compression_virtual_capacity","compression_compressed_capacity","compression_uncompressed_capacity"

[VMAX_storage_csv]
DELIMS = ","
FIELDS = "Date","Array","Useable","Used","UsedPercent","UsedGrowth","Free","Subscribed","SubscribedMax","SubscribedPercent","SubscribedGrowth","Snapshot","compression","ExpansionNeeded"
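With DELIMS and FIELDS, a REPORT transform names the delimited values by position. The Python sketch below models only that positional pairing for one SVC row, assuming no quoted or embedded commas; `delims_extract` and `SVC_FIELDS` are hypothetical names, not Splunk APIs. Using `zip()` means surplus values or names are silently dropped, which is a choice of this sketch rather than documented Splunk behavior.

```python
# Hypothetical model of a DELIMS/FIELDS transform: split one event on the
# delimiter and pair the values with the configured field names by position.
SVC_FIELDS = [
    "date", "name", "capacity", "free_capacity", "virtual_capacity",
    "used_capacity", "real_capacity", "overallocation",
    "compression_virtual_capacity", "compression_compressed_capacity",
    "compression_uncompressed_capacity",
]


def delims_extract(event, fields, delim=","):
    """Return a dict mapping each field name to the value at the same position."""
    values = event.split(delim)
    # zip() stops at the shorter sequence, so extra values or names are ignored.
    return dict(zip(fields, values))


if __name__ == "__main__":
    row = ("1470207600,myserver,62.00TB,16.67TB,163.02TB,41.80TB,"
           "45.24TB,262,86.72TB,34.97TB,69.88TB")
    for key, value in delims_extract(row, SVC_FIELDS).items():
        print(f"{key} - {value}")
```

Running it prints one "field - value" line per column, the same shape as the fields dperry asks for later in the thread.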

When I run this search on my search head: index=zz_test sourcetype=SVC_capacity

The data is not parsed. My questions: do props.conf and transforms.conf need to be on my indexers or on the UF? And do my props.conf and transforms.conf look correct?

Any assistance much appreciated.

sbbadri
Motivator

@dperry

VMAX_storage_csv, SVC_storage_csv, props.conf and transforms.conf should be on the indexers, not on the forwarders. Put your lookup files under $SPLUNK_HOME/etc/apps/your_app/lookups/ and props.conf and transforms.conf under $SPLUNK_HOME/etc/apps/your_app/local/. All these files should be on all the indexers. Once done, restart Splunk, one indexer at a time.

_internal
Loves-to-Learn Lots

@dperry 

I think it's a bit more nuanced than putting the props and transforms files on all the indexers. 

The first question to ask yourself before you do this is when you want your extractions to take place. In the most general sense, you can put both files on almost all Splunk server instances, but not all the settings will take effect or make sense on every instance.

Do you want INDEX time extractions OR SEARCH time extractions?

 

INDEX time extractions:

  • Are done prior to indexing and will increase license cost. 
  • Move the processing load to the indexer side (when data comes in).

If you want indexed extractions:

Add something like this to your props.conf and deploy it to the HF/UF (the initial index-time processing node), depending on how your architecture is set up.

Props:

INDEXED_EXTRACTIONS = CSV

*************************************************************************************************
Important caveat: forwarded structured data is not parsed at the indexer. This needs to be done at the forwarding level: https://docs.splunk.com/Documentation/Splunk/8.1.0/Forwarding/Routeandfilterdatad#Caveats_for_routin...

*************************************************************************************************
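The idea behind INDEXED_EXTRACTIONS = CSV is that the header line supplies the field names, instead of a hand-written FIELDS list. A rough Python model of that behavior using the standard `csv` module, assuming the first line is always the header; the sample is the SVC report trimmed to four columns, and `header_extract` is a hypothetical helper, not how Splunk implements it:

```python
import csv
import io

# SVC report sample, trimmed to four columns for brevity.
RAW = (
    "date,name,capacity,free_capacity\n"
    "1470207600,myserver,62.00TB,16.67TB\n"
    "1470207600,MigrationPool_8192,0,0\n"
)


def header_extract(raw):
    """Treat the first line as field names; return one dict per data row."""
    # DictReader consumes the header row itself, so it never becomes a record.
    return [dict(row) for row in csv.DictReader(io.StringIO(raw))]


if __name__ == "__main__":
    for event in header_extract(RAW):
        print(event)
```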

 

SEARCH time extractions:

  • Are done at search time on the search heads.
  • Move the processing load to search time, which may affect search performance if many users are using the search heads.
  • No additional license cost.

If you want search-time extractions:

Add something like this to your props.conf and transforms.conf, and deploy them to your processing node AND your search head. (You could split up the configs and deploy only the parts each server needs, but for simplicity just deploy the same package everywhere. REPORT is used for search-time extractions and TRANSFORMS for index-time transforms.)

Assume the simplified source file is like this, and your values don't have commas within them:

name,number,colour
bob,34,red
sam,23,blue
gary,4,cyan

Props:

[yourSourcetype]
... All your other settings ...
KV_MODE=none
TRANSFORMS-deleteHeader = deleteHeader
REPORT-searchTimeExtractions = searchTimeExtractions

Transforms:

[deleteHeader]
REGEX=name,number,colour
DEST_KEY = queue
FORMAT = nullQueue

[searchTimeExtractions]
REGEX=^(?<name>[^,]*?),(?<number>[^,]*?),(?<colour>[^,]*?)$
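These two transforms split the work between index time (TRANSFORMS-deleteHeader routes the header row to the nullQueue) and search time (REPORT-searchTimeExtractions applies the named-group regex). A Python sketch of that two-stage flow on the sample rows, assuming every event is already a single line; Python's `re` needs the `(?P<name>...)` spelling for named groups, and the pattern is anchored with `$` because a single-line event carries no trailing newline. `index_time` and `search_time` are hypothetical names.

```python
import re

# Python spells named groups (?P<name>...); PCRE also accepts (?<name>...).
HEADER_RE = re.compile(r"name,number,colour")
FIELDS_RE = re.compile(
    r"^(?P<name>[^,]*?),(?P<number>[^,]*?),(?P<colour>[^,]*?)$"
)


def index_time(events):
    """Drop events that match the header regex, like routing them to nullQueue."""
    return [e for e in events if not HEADER_RE.search(e)]


def search_time(events):
    """Apply the extraction regex to each stored event and collect named groups."""
    results = []
    for event in events:
        match = FIELDS_RE.search(event)
        if match:
            results.append(match.groupdict())
    return results


if __name__ == "__main__":
    raw_events = ["name,number,colour", "bob,34,red", "sam,23,blue", "gary,4,cyan"]
    for fields in search_time(index_time(raw_events)):
        print(fields)
```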

 

Link to props docs, explaining the difference between REPORT and TRANSFORMS: https://docs.splunk.com/Documentation/Splunk/8.1.0/Admin/Propsconf

Link to the sequence of search time operations in Splunk: https://docs.splunk.com/Documentation/Splunk/8.1.0/Knowledge/Searchtimeoperationssequence

 

to4kawa
Ultra Champion

Hi @_internal 

 

For removing the CSV header, how about SEDCMD?

Your search-time extraction solution doesn't work without SHOULD_LINEMERGE=false and LINE_BREAKER=([\r\n]+), I guess.

 

 

_internal
Loves-to-Learn Lots

In general I use these six props settings at a bare minimum. I almost always set SHOULD_LINEMERGE to false, so I'm not really sure how this specific setting applies in the pipeline or how it affects the searchTimeExtractions transform. Feel free to share what you know 😁 I try to avoid line merging for performance reasons, and instead write more complex line breakers to account for multi-line events.

SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
TRUNCATE=
MAX_TIMESTAMP_LOOKAHEAD=
TIME_FORMAT=
TIME_PREFIX=
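With SHOULD_LINEMERGE=false, LINE_BREAKER decides where events end: the text matched by its first capture group marks the boundary and is discarded, so no event keeps its line ending. A minimal Python sketch of that splitting rule, assuming LINE_BREAKER=([\r\n]+) as above; `break_events` is a hypothetical helper. It also illustrates why a pattern that expects a trailing newline will not match events produced this way.

```python
import re

# Assumed to mirror LINE_BREAKER=([\r\n]+); group 1 is the discarded boundary.
LINE_BREAKER = re.compile(r"([\r\n]+)")


def break_events(raw):
    """Split raw text into events, dropping whatever capture group 1 matched."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        # Everything before the captured newlines closes the current event.
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])  # trailing text with no line break after it
    return [event for event in events if event]  # skip empty fragments


if __name__ == "__main__":
    sample = "name,number,colour\r\nbob,34,red\n\nsam,23,blue\n"
    print(break_events(sample))
```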
 
 
To delete the header, you could do something like this in props.conf instead. Again, I am not too familiar with the deeper differences between nullQueue and SEDCMD, so feel free to show me some pros and cons of each:
 
SEDCMD-removeHeaders = s/name,number,colour//g
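On the nullQueue vs SEDCMD question, the two act on different things: a nullQueue transform discards the whole matching event, while SEDCMD applies a sed-style substitution to the raw text of each event and keeps the event. A Python sketch of that contrast, assuming one CSV row per event; `null_queue` and `sedcmd` are hypothetical names, and the sketch does not model what Splunk does with an event whose text becomes empty.

```python
import re

HEADER = r"name,number,colour"


def null_queue(events, pattern=HEADER):
    """nullQueue-style: remove any event whose text matches the pattern."""
    return [e for e in events if not re.search(pattern, e)]


def sedcmd(events, pattern=HEADER, replacement=""):
    """SEDCMD-style s/pattern/replacement/g: edit the text, keep every event."""
    return [re.sub(pattern, replacement, e) for e in events]


if __name__ == "__main__":
    events = ["name,number,colour", "bob,34,red"]
    print(null_queue(events))
    print(sedcmd(events))
```

Because s/name,number,colour//g is unanchored, it would also edit any data row that happened to contain that text; anchoring the pattern (for example ^name,number,colour$) narrows it to the header row in either approach.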
 
 
dperry
Communicator

thank you @sbbadri

So the VMAX_storage_csv and SVC_storage_csv lookup files go on the indexers. I'm trying to find an example of what the file would look like in my case, with the headers.

Also, are my props and transforms stanzas correct?

sbbadri
Motivator

inputs.conf

[monitor:///usr/local/bin/reports/storage/emc_capacity.out]
disabled = false
index = zz_test
sourcetype = VMAX_capacity

props.conf

[VMAX_capacity]
FIELD_DELIMITER = ","
CHECK_FOR_HEADER = true
HEADER_MODE = firstline

Do the same for the other sourcetypes.

Here is the link:
http://docs.splunk.com/Documentation/SplunkCloud/6.6.1/Data/Extractfieldsfromfileswithstructureddata

dperry
Communicator

Thank you again. So props.conf is this:

[VMAX_capacity]
FIELD_DELIMITER = ","
CHECK_FOR_HEADER = true
HEADER_MODE = firstline

And transforms.conf is this:

[VMAX_storage_csv]
DELIMS = ","
FIELDS = "Date","Array","Useable","Used","UsedPercent","UsedGrowth","Free","Subscribed","SubscribedMax","SubscribedPercent","SubscribedGrowth","Snapshot","compression","ExpansionNeeded"

and then I place both files on my indexers.

dperry
Communicator

So I don't need a transforms.conf, right?

All I need are these two, right?

On the UF: inputs.conf
On the indexers: props.conf

inputs
[monitor:///usr/local/bin/reports/storage/emc_capacity.out]
disabled = false
index = zz_test
sourcetype = VMAX_capacity

props
[VMAX_capacity]
FIELD_DELIMITER = ","
CHECK_FOR_HEADER = true
HEADER_MODE = firstline

sbbadri
Motivator

Yes, you are correct. There is no need for transforms.conf, because you are not importing any lookup file.

dperry
Communicator

So I added this props.conf stanza to all four indexers and restarted them:
[SVC_capacity]
FIELD_DELIMITER = ","
CHECK_FOR_HEADER = true
HEADER_MODE = firstline

I ran the search sourcetype=SVC_capacity index=zz_test

and the indexed data only tails the newer data, without using the header line:

date,name,capacity,free_capacity,virtual_capacity,used_capacity,real_capacity,overallocation,compression_virtual_capacity,compression_compressed_capacity,compression_uncompressed_capacity

So my indexed data looks like this:

1470207600,myserver,62.00TB,16.67TB,163.02TB,41.80TB,45.24TB,262,86.72TB,34.97TB,69.88TB
1470207600,MigrationPool_8192,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB
1470207600,MigrationPool_512,0,0,0.00MB,0.00MB,0.00MB,0,0.00MB,0.00MB,0.00MB

sbbadri
Motivator

So were the fields extracted as per the header line?

dperry
Communicator

I want the headers to be in the interesting fields, like so:

date - 1470207600
name - myserver
capacity - 62.00TB

and so on (example attached).

sbbadri
Motivator

Try this props.conf:

[testCSVSourcetype]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
disabled = false
pulldown_type = true
TIMESTAMP_FIELDS = date
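In this config, INDEXED_EXTRACTIONS = csv takes the field names from the header line, and TIMESTAMP_FIELDS = date tells Splunk which of those fields holds the event time. In the SVC report the date column is Unix epoch seconds (1470207600 is 2016-08-03 07:00:00 UTC). A small Python sketch of that timestamp step, assuming the fields have already been extracted into a dict; `add_event_time` is a hypothetical helper, not Splunk code:

```python
from datetime import datetime, timezone


def add_event_time(event, time_field="date"):
    """Return a copy of an extracted event with _time taken from time_field."""
    stamped = dict(event)
    # The date column holds Unix epoch seconds, which are UTC by definition.
    stamped["_time"] = datetime.fromtimestamp(int(event[time_field]), tz=timezone.utc)
    return stamped


if __name__ == "__main__":
    row = {"date": "1470207600", "name": "myserver", "capacity": "62.00TB"}
    print(add_event_time(row)["_time"].isoformat())  # 2016-08-03T07:00:00+00:00
```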

Replace testCSVSourcetype with the proper sourcetype. I have tested the above config on my local instance.

dperry
Communicator

thx @sbbadri

I will add this to my four indexers and restart. I will let you know the outcome.

dperry
Communicator

@sbbadri

It worked. I placed this on my four indexers, restarted the instances, and the next time the file generated new data, the interesting fields (headers) were parsed out.

Thank you!
