Solved: How do I extract fields from my CSV file to graph ...

fisuser1 · ‎02-11-2016

I'm trying to transform the current CSV file output we are getting from an APM into Splunk to graph response times. We have a forwarder set up to extract the CSV file data into Splunk. Currently, I am having an issue creating fields due to the poor CSV format with unnecessary data and numerous servers in one CSV. I am trying to key on the fields mean and count, and not concerned about any other fields. I've tried multiple suggestions to modify my props.conf file, with no luck.

I've attached an example of the CSVs, my inputs.conf, and props.conf. Any help would be greatly appreciated. I'm starting to fear the CSV format being passed from the APM is not workable. I've reviewed the current Splunk section at: http://docs.splunk.com/Documentation/Splunk/6.2.8/Data/Extractfieldsfromfileheadersatindextime

props.conf

[wily_reports]
#FIELD_HEADER_REGEX=Ignore_This_Stuff:\s(.*)
#FIELD_DELIMITER=,
FIELD_NAMES = "Mean" , "Count"

inputs.conf

[monitor://D:\ReportOutput\Client2_Weekly_Phil_Report.CSV]
disabled = false
host = Collector
index = wily
sourcetype = wily_reports

lguinn2 · ‎02-11-2016

It may be comma-separated, but your data is not true CSV format. In a true CSV-format file, every line of the file contains a list of the same values. I would avoid extracting fields at index time anyway. And yes, the format of the file is pretty sucky. And finally, only creating fields for mean and count - without a field like "process" to say what the mean and count apply to - seems like a bad idea.

@richGalloway is correct - you can treat this data as a normal text input.
You do need to make some decisions though, about what constitutes an "event" for this data source. Is the whole file a single event? Is each line of the file a single event? If each line is an event, how is the timestamp determined?

Here is an example of one way to do it:
props.conf

[wily_reports]
SHOULD_LINEMERGE=false
EXTRACT-fe1=(?<process>.*?)Average Response Time \(ms\)\s*\,\,\,\,(?<mean>\d+)\,(?<count>.*?)
DATETIME_CONFIG = CURRENT

inputs.conf

[monitor://D:\ReportOutput\Client2_Weekly_Phil_Report.CSV]
disabled = false
host = Collector
index = wily
sourcetype = wily_reports

Now you could write this search

index=wily process=*| table process mean count

Note that my example sets each line of the data source as an event. Since there is no timestamp on each line, props.conf tells Splunk to use the current time (when this data is indexed) as the time stamp. This approach is fine, but you may lose some other correlations; you have to decide which you prefer.

View solution in original post

lguinn2 · ‎02-11-2016

It may be comma-separated, but your data is not true CSV format. In a true CSV-format file, every line of the file contains a list of the same values. I would avoid extracting fields at index time anyway. And yes, the format of the file is pretty sucky. And finally, only creating fields for mean and count - without a field like "process" to say what the mean and count apply to - seems like a bad idea.

@richGalloway is correct - you can treat this data as a normal text input.
You do need to make some decisions though, about what constitutes an "event" for this data source. Is the whole file a single event? Is each line of the file a single event? If each line is an event, how is the timestamp determined?

Here is an example of one way to do it:
props.conf

[wily_reports]
SHOULD_LINEMERGE=false
EXTRACT-fe1=(?<process>.*?)Average Response Time \(ms\)\s*\,\,\,\,(?<mean>\d+)\,(?<count>.*?)
DATETIME_CONFIG = CURRENT

inputs.conf

[monitor://D:\ReportOutput\Client2_Weekly_Phil_Report.CSV]
disabled = false
host = Collector
index = wily
sourcetype = wily_reports

Now you could write this search

index=wily process=*| table process mean count

Note that my example sets each line of the data source as an event. Since there is no timestamp on each line, props.conf tells Splunk to use the current time (when this data is indexed) as the time stamp. This approach is fine, but you may lose some other correlations; you have to decide which you prefer.

fisuser1 · ‎02-11-2016

I added the following to the props.conf and do not see any difference in Splunk, (no defined fields) nor does the search provided produce any results.

 [wily_reports]
 SHOULD_LINEMERGE=false
 EXTRACT-fe1=(?.*?)Average Response Time \(ms\)\s*\,(?\d+)\,(?.*?)\,
 DATETIME_CONFIG = CURRENT

Attached is a copy and paste of an event displayed currently.

Again, if I try to extract a field (mean) for more than two values above, I get a "Field names must be unique." Is there a way I can break up these results into individual lines (events) so I can create the field without the conflict since they are all in the same event?

richgalloway · ‎02-12-2016

I trust you restarted Splunk after changing props.conf.
The regex in the EXTRACT-fe1 line does not match the sample events - the number of commas after "(ms)" is different and the data does not end with a comma.

---
If this reply helps you, Karma would be appreciated.

fisuser1 · ‎02-12-2016

Fixing the regex and adding MV_ADD did exactly what I was looking for... Thank you so much for your help!

richgalloway · ‎02-15-2016

Please accept the answer.

---
If this reply helps you, Karma would be appreciated.

richgalloway · ‎02-12-2016

I trust you restarted Splunk after changing props.conf.
The regex in the EXTRACT-fe1 line does not match the sample data - there is a different number of commas after "(ms)" and the data does not end with a comma.
Field names can only be specified once per regex. If multiple instances of a field are expected, add `MV_ADD=true

---
If this reply helps you, Karma would be appreciated.

lguinn2 · ‎02-12-2016

@richgalloway - thanks for catching the typos. I tried to fix a little - but it would help if we actually saw the data in its raw format, not the Excel view.

fisuser1 · ‎02-11-2016

Thank you @lguinn and @richGalloway for the responses. To answer your question, "Is the whole file a single event? ", yes... That is what was throwing me off for the most part. Each CSV is represented in Splunk as a Single event, so defining fields with multiple values in one event has been the challenge. I am not too concerned with the timestamps, only the mean and count for reporting.

richgalloway · ‎02-11-2016

Your sample data is not in CSV form. I suggest you treat it as a text file containing a sequence of events and then search for events containing "Average Response Time (ms) ". For example:

index=wily "Average Response Time (ms)" | rex "Time \(ms\) (?<Mean>\d+) (?<Count>.+$)" | table Mean Count

---
If this reply helps you, Karma would be appreciated.

fisuser1 · ‎02-11-2016

Unfortunately I could not attach the CSV file, so I had to paste the output above from the CSV, but can assure you the data is being pulled from CSV files.

How do I extract fields from my CSV file to graph response times?

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits

Join the Conversation

How do I extract fields from my CSV file to graph response times?

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits