<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic best practices for search against CSV data with a fixed header? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123857#M25542</link>
    <description>&lt;P&gt;I've defined a sourcetype for CSV data with a fixed header&lt;BR /&gt;
and data that looks like:&lt;/P&gt;

&lt;P&gt;Date,Color,Data1,Data2&lt;BR /&gt;
2015-01-30 10:11:12,Red,1.1,1.01&lt;BR /&gt;
2015-01-30 10:11:12,Green,0,0&lt;BR /&gt;
2015-01-30 10:11:13,Red,2.2,2.02&lt;BR /&gt;
2015-01-30 10:11:14,Red,3.3,3.03&lt;BR /&gt;
...&lt;/P&gt;

&lt;P&gt;so the header contains the field names of the sourcetype.&lt;BR /&gt;
What is the best way to search, using something like&lt;BR /&gt;
this pseudo-SQL query:&lt;/P&gt;

&lt;P&gt;SELECT Data1 WHERE Color=Red&lt;/P&gt;

&lt;P&gt;Splunk looks like it can do much more than this but&lt;BR /&gt;
I'd like to start out simple. I tried queries that I thought&lt;BR /&gt;
included the clause&lt;/P&gt;

&lt;P&gt;...WHERE Color=Red&lt;/P&gt;

&lt;P&gt;in Splunk-speak but I couldn't figure out how to reference&lt;BR /&gt;
the pre-defined columns, because there's no sense looking&lt;BR /&gt;
for 'Red' in the Date or Data fields.&lt;/P&gt;

&lt;P&gt;Thank you.&lt;/P&gt;</description>
    <pubDate>Sat, 31 Jan 2015 02:17:31 GMT</pubDate>
    <dc:creator>drmark</dc:creator>
    <dc:date>2015-01-31T02:17:31Z</dc:date>
    <item>
      <title>best practices for search against CSV data with a fixed header?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123857#M25542</link>
      <description>&lt;P&gt;I've defined a sourcetype for CSV data with a fixed header&lt;BR /&gt;
and data that looks like:&lt;/P&gt;

&lt;P&gt;Date,Color,Data1,Data2&lt;BR /&gt;
2015-01-30 10:11:12,Red,1.1,1.01&lt;BR /&gt;
2015-01-30 10:11:12,Green,0,0&lt;BR /&gt;
2015-01-30 10:11:13,Red,2.2,2.02&lt;BR /&gt;
2015-01-30 10:11:14,Red,3.3,3.03&lt;BR /&gt;
...&lt;/P&gt;

&lt;P&gt;so the header contains the field names of the sourcetype.&lt;BR /&gt;
What is the best way to search, using something like&lt;BR /&gt;
this pseudo-SQL query:&lt;/P&gt;

&lt;P&gt;SELECT Data1 WHERE Color=Red&lt;/P&gt;

&lt;P&gt;Splunk looks like it can do much more than this but&lt;BR /&gt;
I'd like to start out simple. I tried queries that I thought&lt;BR /&gt;
included the clause&lt;/P&gt;

&lt;P&gt;...WHERE Color=Red&lt;/P&gt;

&lt;P&gt;in Splunk-speak but I couldn't figure out how to reference&lt;BR /&gt;
the pre-defined columns, because there's no sense looking&lt;BR /&gt;
for 'Red' in the Date or Data fields.&lt;/P&gt;

&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Sat, 31 Jan 2015 02:17:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123857#M25542</guid>
      <dc:creator>drmark</dc:creator>
      <dc:date>2015-01-31T02:17:31Z</dc:date>
    </item>
    <item>
      <title>Re: best practices for search against CSV data with a fixed header?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123858#M25543</link>
      <description>&lt;P&gt;Could you please post your search?&lt;/P&gt;

&lt;P&gt;sourcetype=csv Color=Red | table Date,Color,Data1,Data2 would give you results where the Color field is Red.&lt;/P&gt;

&lt;P&gt;Post what you have and the desired output.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
Raghav&lt;/P&gt;</description>
      <pubDate>Sat, 31 Jan 2015 03:12:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123858#M25543</guid>
      <dc:creator>Raghav2384</dc:creator>
      <dc:date>2015-01-31T03:12:20Z</dc:date>
    </item>
    <item>
      <title>Re: best practices for search against CSV data with a fixed header?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123859#M25544</link>
      <description>&lt;P&gt;Thanks for the response. The query works for me some of the time. The situation I seem&lt;BR /&gt;
to have is that my custom sourcetype, call it CST, works on header-less data that&lt;BR /&gt;
is read in from a file, but is ignored if the exact same header-less data is read in through&lt;BR /&gt;
a TCP port. I could understand if it worked for neither or both, but I can't understand&lt;BR /&gt;
how it can work on only one. By 'work', I mean the column headers, such as 'Color',&lt;BR /&gt;
appear in the "Interesting Fields" seen from the GUI's search page and this query&lt;BR /&gt;
returns results:&lt;/P&gt;

&lt;P&gt;sourcetype=CST Color=Red&lt;/P&gt;

&lt;P&gt;And I assume it doesn't work for the TCP data because the literal string 'Color=Red' does not exist in the data.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2015 00:46:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123859#M25544</guid>
      <dc:creator>drmark</dc:creator>
      <dc:date>2015-02-03T00:46:12Z</dc:date>
    </item>
    <item>
      <title>Re: best practices for search against CSV data with a fixed header?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123860#M25545</link>
      <description>&lt;P&gt;For your sourcetype, you should be setting your delimiter in props.conf for CSV.&lt;/P&gt;

&lt;P&gt;props.conf&lt;BR /&gt;
[mycsvsourcetype]&lt;BR /&gt;
HEADER_FIELD_LINE_NUMBER=1&lt;BR /&gt;
TIMESTAMP_FIELDS=Date&lt;BR /&gt;
FIELD_DELIMITER=,&lt;/P&gt;

&lt;P&gt;Your sources need to be consistent about whether the header is present.&lt;/P&gt;
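
&lt;P&gt;As a minimal sketch of how a file on disk might be tied to this sourcetype, a monitor stanza could look like the following. The path /var/log/myapp/data.csv is only a placeholder and does not come from this thread:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# inputs.conf (the file path is a hypothetical placeholder)
[monitor:///var/log/myapp/data.csv]
sourcetype = mycsvsourcetype
&lt;/CODE&gt;&lt;/PRE&gt;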

&lt;P&gt;Once indexed correctly, you can search for fields as desired:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=mycsvsourcetype Color=red | table Color, Data1,Data2
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That will return all events where the field named Color has the value red. Note that field names are &lt;STRONG&gt;case sensitive&lt;/STRONG&gt; (field values in a basic search are not), so if your header is "Color" and "Data1", you have to use exactly those field names.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 18:46:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123860#M25545</guid>
      <dc:creator>esix_splunk</dc:creator>
      <dc:date>2020-09-28T18:46:18Z</dc:date>
    </item>
    <item>
      <title>Re: best practices for search against CSV data with a fixed header?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123861#M25546</link>
      <description>&lt;P&gt;Thanks for the response. I tried what you mentioned with no luck.&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;When you say '... props for CSV..', which file do you mean? In my installation
I have eight different 'props.conf' files. Following your instructions, I updated
these files:&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;$SPLUNK_HOME/etc/apps/MyApp/local/props.conf&lt;BR /&gt;
  $SPLUNK_HOME/etc/apps/search/local/props.conf&lt;BR /&gt;
  $SPLUNK_HOME/etc/apps/local/props.conf&lt;/P&gt;

&lt;P&gt;with a stanza that looks something like:&lt;/P&gt;

&lt;P&gt;....&lt;BR /&gt;
[CST]&lt;BR /&gt;
INDEXED_EXTRACTIONS = csv&lt;BR /&gt;
KV_MODE = none&lt;BR /&gt;
NO_BINARY_CHECK = true&lt;BR /&gt;
PREAMBLE_REGEX = ^Date&lt;BR /&gt;
SHOULD_LINEMERGE = false&lt;BR /&gt;
category = Custom&lt;BR /&gt;
description = my log files&lt;BR /&gt;
disabled = false&lt;BR /&gt;
pulldown_type = true&lt;BR /&gt;
FIELD_NAMES = Date,Color,Data1,Data2&lt;BR /&gt;
TIMESTAMP_FIELDS = Date&lt;BR /&gt;
TIME_FORMAT = %Y/%m/%d %H:%M:%S&lt;BR /&gt;
FIELD_DELIMITER=,&lt;BR /&gt;
HEADER_FIELD_DELIMITER=,&lt;BR /&gt;
...&lt;/P&gt;

&lt;P&gt;Notice I did not include&lt;/P&gt;

&lt;P&gt;HEADER_FIELD_LINE_NUMBER=1&lt;/P&gt;

&lt;P&gt;because the data will always be coming in - via file or TCP - without&lt;BR /&gt;
the header.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 18:49:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123861#M25546</guid>
      <dc:creator>drmark</dc:creator>
      <dc:date>2020-09-28T18:49:24Z</dc:date>
    </item>
    <item>
      <title>Re: best practices for search against CSV data with a fixed header?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123862#M25547</link>
      <description>&lt;P&gt;To restate the situation, it looks like the data coming in on the TCP&lt;BR /&gt;
port is not being parsed using the custom CST sourcetype I made,&lt;BR /&gt;
even though I referenced that sourcetype when I created the TCP port input.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2015 02:27:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123862#M25547</guid>
      <dc:creator>drmark</dc:creator>
      <dc:date>2015-02-03T02:27:39Z</dc:date>
    </item>
    <item>
      <title>Re: best practices for search against CSV data with a fixed header?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123863#M25548</link>
      <description>&lt;P&gt;CSVs are not TCP inputs. CSVs are flat files that are read into Splunk from the file system and parsed differently because the header applies to the whole file, whereas TCP inputs are sent over the network and processed per event, and headers are not maintained.&lt;/P&gt;

&lt;P&gt;So this changes the whole process. If you read in a CSV file, its sourcetype will be different from that of a TCP input because of how the file and the network flow look.&lt;/P&gt;
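
&lt;P&gt;As a rough sketch of what a separate network sourcetype might look like, assuming header-less comma-separated lines arrive one event per line: the sourcetype name CST_tcp, the transform name cst_csv_fields and port 5514 are placeholders, and the field list is taken from the header shown earlier in this thread. The fields here are extracted at search time by delimiter rather than from a file header, so treat this as a starting point to adapt, not a verified configuration:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# inputs.conf (port number is a hypothetical placeholder)
[tcp://:5514]
sourcetype = CST_tcp

# props.conf (one event per line, timestamp at the start of each line)
[CST_tcp]
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
REPORT-cst_fields = cst_csv_fields

# transforms.conf (search-time, delimiter-based field extraction)
[cst_csv_fields]
DELIMS = ","
FIELDS = "Date","Color","Data1","Data2"
&lt;/CODE&gt;&lt;/PRE&gt;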

&lt;P&gt;I recommend creating a sourcetype for the CSV on disk first and validating it against the above recommendations. Once that is done and validated, move on to the sourcetype for the network input. For that, share how the events look coming over the wire and we can help more.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2015 02:33:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123863#M25548</guid>
      <dc:creator>esix_splunk</dc:creator>
      <dc:date>2015-02-03T02:33:52Z</dc:date>
    </item>
    <item>
      <title>Re: best practices for search against CSV data with a fixed header?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123864#M25549</link>
      <description>&lt;P&gt;Ahhh. Makes sense. OK, the sourcetype I created for the&lt;BR /&gt;
flat file works. The TCP input process essentially is that a&lt;BR /&gt;
remote host periodically sends lines from a CSV file that is&lt;BR /&gt;
being tailed to the TCP port. So what is being sent to&lt;BR /&gt;
the TCP port is a string that looks like it was taken from a CSV file, because the original source is in fact a CSV file. The headers are still fixed, because all lines have the same data layout. So I assumed the sourcetype for the CSV file would work for the TCP port as well.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2015 02:42:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123864#M25549</guid>
      <dc:creator>drmark</dc:creator>
      <dc:date>2015-02-03T02:42:40Z</dc:date>
    </item>
    <item>
      <title>Re: best practices for search against CSV data with a fixed header?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123865#M25550</link>
      <description>&lt;P&gt;As a special case of this, I would also like to be able&lt;BR /&gt;
to load an entire CSV file into Splunk using the TCP port,&lt;BR /&gt;
rather than reading a local file found on the Splunk server,&lt;BR /&gt;
and have it automatically parsed using the CST sourcetype&lt;BR /&gt;
I created.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2015 19:42:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/best-practices-for-search-against-CSV-data-with-a-fixed-header/m-p/123865#M25550</guid>
      <dc:creator>drmark</dc:creator>
      <dc:date>2015-02-03T19:42:53Z</dc:date>
    </item>
  </channel>
</rss>

