I want to monitor a CSV file that is generated by a script and produces output like the below.
It has four columns: an id, a date, a description, and an explanation containing some kind of XML data.
123,2016-07-07 05:00:00,gooddata,somexmldata
123,2016-07-07 06:00:00,baddata,somexmldata
123,2016-07-07 07:00:00,gooddata,somexmldata
123,2016-07-07 08:00:00,baddata,somexmldata
Any help appreciated !! Thanks in advance
http://docs.splunk.com/Documentation/Splunk/6.4.2/Data/Howdoyouwanttoadddata
The oft-overlooked Add Data wizard is a great tool for creating inputs and props via the GUI. It lets you play with the many props settings and see the result as you work, to make sure you get the desired outcome. The Getting Data In manual also covers much of what I tried below in more depth.
To start, I put your CSV rows into a text file and uploaded it to my Splunk instance.
I began by selecting the default csv sourcetype to build off of, then defined the CSV header schema explicitly; you may or may not have to do that, depending on whether a header row exists in the file.
Then I went to work on explicitly teaching Splunk how to read the timestamp field.
You can then review your creation and export it to your clipboard for easy pasting to the CLI, or just save it to the local instance. A rough example of what the exported props.conf could look like is below.
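This is only a sketch, assuming there is no header row in the file; the sourcetype name my_script_csv and the field names id, date, description, explanation are placeholders you would rename to suit:

[my_script_csv]
# parse the file as structured CSV data
INDEXED_EXTRACTIONS = csv
# no header row in the file, so name the columns here
FIELD_NAMES = id,date,description,explanation
FIELD_DELIMITER = ,
# the second column holds the timestamp, e.g. 2016-07-07 05:00:00
TIMESTAMP_FIELDS = date
TIME_FORMAT = %Y-%m-%d %H:%M:%S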
That would get the data coming in and getting indexed. A potential "gotcha" will be the characters in your XML string. If it contains commas (XML text and attribute values certainly can), you will need to quote that field, choose a different delimiter, or insert one in pre-parsing.
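For illustration only, building on the sketch above: if the script wraps the XML column in double quotes you could add FIELD_QUOTE to the same props.conf stanza, or if the script can be changed to emit a delimiter that never appears in the XML (a pipe, say), change FIELD_DELIMITER instead:

# if the xml column arrives wrapped in double quotes
FIELD_QUOTE = "
# or, if the script is changed to emit pipe-delimited output
FIELD_DELIMITER = |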
http://docs.splunk.com/Documentation/Splunk/6.4.2/Forwarding/Routeandfilterdatad
If you are in a distributed environment (using forwarders), be sure to put the props.conf on your forwarder along with your inputs.conf (again, something you can use the GUI to create a template for; see the Getting Data In link above for file monitor options). That ensures you can nullQueue the events you don't want. A sketch of the forwarder-side files is below.
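As a rough illustration, the forwarder would carry both the inputs.conf monitor stanza and the same props.conf stanza shown earlier; the file path, sourcetype, and index here are placeholders:

# inputs.conf on the forwarder
[monitor:///opt/scripts/output/mydata.csv]
sourcetype = my_script_csv
index = main
disabled = false

# props.conf on the forwarder: same [my_script_csv] stanza as shown above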
From that page:
Caveats for routing and filtering structured data
Splunk Enterprise does not parse structured data that has been forwarded to an indexer
When you forward structured data to an indexer, Splunk Enterprise does not parse this data once it arrives at the indexer, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS and its associated attributes. Forwarded data skips the following queues on the indexer, which precludes any parsing of that data on the indexer:
parsing
aggregation
typing
The forwarded data must arrive at the indexer already parsed. To achieve this, you must also set up props.conf on the forwarder that sends the data. This includes configuration of INDEXED_EXTRACTIONS and any other parsing, filtering, anonymizing, and routing rules. Universal forwarders are capable of performing these tasks solely for structured data. See "Forward data extracted from header files".
There is a good example in the docs that should cover what you need:
Discard specific events and keep the rest
This example discards all sshd events in /var/log/messages by sending them to nullQueue:
1. In props.conf, set the TRANSFORMS-null attribute:
[source::/var/log/messages]
TRANSFORMS-null= setnull
2. Create a corresponding stanza in transforms.conf. Set DEST_KEY to "queue" and FORMAT to "nullQueue":
[setnull]
REGEX = \[sshd\]
DEST_KEY = queue
FORMAT = nullQueue
I haven't tried it yet, but I am assuming a regex like the one in the documentation above would work for your scenario by matching whatever baddata looks like; there may also be an option to reference the field name from your header in the regex to tighten it up. I would have to test it, or have one of our other Splunk community superstars chime in. A rough, untested sketch is below.
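Untested, and only a sketch: it assumes the literal string baddata in the third column is what marks the unwanted events, and it reuses the placeholder sourcetype from earlier. Per the caveat quoted above, these would go on the forwarder for structured data:

# props.conf
[my_script_csv]
TRANSFORMS-null = setnull

# transforms.conf
[setnull]
# match events whose third comma-separated field is exactly "baddata";
# tighten this further if baddata could also appear inside the xml column
REGEX = ^[^,]*,[^,]*,baddata,
DEST_KEY = queue
FORMAT = nullQueue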
Hope this helps get you started! I will update if/when I have a chance to test the filtering in the lab.