Getting Data In

best practices for search against CSV data with a fixed header?

New Member

I've defined a sourcetype for CSV data with a fixed header
and data that looks like:

Date,Color,Data1,Data2
2015-01-30 10:11:12,Red,1.1,1.01
2015-01-30 10:11:12,Green,0,0
2015-01-30 10:11:13,Red,2.2,2.02
2015-01-30 10:11:14,Red,3.3,3.03
...

so the header contains the field names of the sourcetype.
What is the best way to search, using something like
this pseudo-SQL query:

SELECT Color1 WHERE Color=Red

Splunk looks like it can do much more than this but
I'd like to start out simple. I tried queries that I thought
included the clause

...WHERE Color=Red

in Splunk-speak but I couldn't figure out how to reference
the pre-defined columns, because there's no sense looking
for 'Red' in the Date or Data fields.

Thank you.

Tags (2)
0 Karma

Splunk Employee
Splunk Employee

For your sourcetype, you should be setting you delimiter in props for CSV..

props.conf
[mycsvsourcetype]
HEADERFIELDLINENUMBER=1
TIMESTAMP
FIELDS=date
FIELD_DELIMITER=,

Your sources need to be consistent in terms of the header existing.

Once indexed correctly, you can search for fields as desired:

sourcetype=mycsvsourcetype Color=red | table Color, Data1,Data2

That will return all events with the field named Color, that has the value of red. Do note, field names are case sensitive. So if your header is "Color" and "Data1", you have to use those fields names.

0 Karma

New Member

Thanks for the response. I tried what you mentioned with no luck.

  • when you say '... props for CSV..' which file do you mean? In my installation I have eight different 'props.conf' files. From your instructions I updated these files

$SPLUNKHOME/etc/apps/MyApp/local/props.conf
$SPLUNK
HOME/etc/apps/search/local/props.conf
$SPLUNK_HOME/etc/apps/local/props.conf

with a stanza that looks something like:

....
[CST]
INDEXEDEXTRACTIONS = csv
KV
MODE = none
NOBINARYCHECK = true
PREAMBLEREGEX = ^Date
SHOULD
LINEMERGE = false
category = Custom
description = my log files
disabled = false
pulldowntype = true
FIELD
NAMES = Date,Color,Data1,Data2
TIMESTAMPFIELDS = Date
TIME
FORMAT = %Y/%m/%d %H:%M:%S
FIELDDELIMITER=,
HEADER
FIELD_DELIMITER=,
...

Notice I did not include

HEADERFIELDLINE_NUMBER=1

because the data will always be coming in - via file or TCP - without
the header.

0 Karma

New Member

To restate the situation, it looks like the data coming in on the TCP
port is not being parsed using the custom CST sourcetype I made,
even though I referenced that sourcetype when I created the TCP port input.

0 Karma

Splunk Employee
Splunk Employee

CSV's are not TCP inputs. CSV's are flat files that are read into Splunk from the file system and parsed differently because the header applies to the whole file, where as TCP inputs are sent over the network and processed per event and headers are not maintained.

So this changes the whole process. If you try to read in a CSV file, the sourcetype will be different then a TCP input due to the nature of how the file and network flow look.

I recommend creating a sourcetype for the CSV on disk first, and validate that based on the above recommendations. Once that is done and validated, move on to the network input based sourcetype. For that, share how the events look coming over the wire and we can help more.

0 Karma

New Member

Ahhh. Makes sense. OK, the sourcetype I created for the
flat file works. The TCP input process essentially is that a
remote host periodically sends lines from a CSV file that is
being tailed to the TCP port. So what is being sent to
the TCP port is a string that looks like it was taken from a CSV file, because the original source is in fact a CSV file. The headers are still fixed, because all lines have the same data layout. So I assumed the sourcetype for the CSV file would work for the TCP port as well.

0 Karma

New Member

As a special case of this, I would also like to be able
to load an entire CSV file into Splunk using the TCP port,
rather than reading a local file found on the Splunk server,
and have it automatically parsed using the CST sourcetype
I created.

0 Karma

New Member

Thanks for the response. The query works for me some of the time. The situation I seem
to have is that my custom sourcetype, call it CST, works on header-less data that
is read in from a file, but is ignored if the exact same header-less data is read in through
a TCP port. I could understand if it worked for neither or both, but I can't understand
how it can work on only one. By 'work', I mean the column headers, such as 'Color',
appear in the "Interesting Fields" seen from the GUI's search page and this query
returns results:

sourcetype=CST Color=Red

And I assume it doesn't work because the literal string 'Color=Red' does not exist in the data.

0 Karma

Motivator

Could you please post your search?

sourcetype=csv Color=Red | table Date,color,data2,data2 would give you results where color field is red.

Post what you have and the desired output.

Thanks,
Raghav

0 Karma