Hi All,
I have written a Python HTTP downloader that pulls down multiple zip files, extracts the contents, and feeds them to a TCP port on Splunk.
Inside each zip is a set of CSV files in this format:
header1, header2, header3, aTimestampIwant1
data1, data2, data3, dateData1
data1, data2, data3, dateData1
etc....
The Python script unzips each file and creates a sourcetype based on the filename. It also builds a Splunk-friendly string for Splunk to consume. This entire string is then sent to a TCP port that Splunk is listening on.
Splunk receives something like this over a single connection:
***SPLUNK*** host=myhost, source=theOriginalFilenameFromTheZip, sourcetype=extractedFromFilename\r\n
header1=data1, header2=data2, header3=data3, aTimestampIwant1=dateData1\r\n
header1=data1, header2=data2, header3=data3, aTimestampIwant1=dateData1\r\n
etc.....
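
For anyone trying to reproduce this, here is a rough sketch of how a string like the one above could be built from one CSV file and sent to Splunk. This is not the actual script; the function names build_payload and send_payload, the default host, and port 9999 are placeholders I made up for illustration, and the header line simply copies the layout shown above.

```python
import csv
import io
import socket


def build_payload(csv_text, host, source, sourcetype):
    """Turn one CSV file's text into the header line plus key=value events."""
    # skipinitialspace drops the blank that follows each comma in the sample data
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if row]  # ignore empty lines
    if not rows:
        return ""
    headers = [name.strip() for name in rows[0]]
    # Header line copied from the layout shown in the post
    lines = ["***SPLUNK*** host=%s, source=%s, sourcetype=%s"
             % (host, source, sourcetype)]
    for row in rows[1:]:
        pairs = ["%s=%s" % (name, value.strip())
                 for name, value in zip(headers, row)]
        lines.append(", ".join(pairs))
    return "\r\n".join(lines) + "\r\n"


def send_payload(payload, splunk_host="localhost", splunk_port=9999):
    """Send one file's payload over its own TCP connection (placeholder port)."""
    with socket.create_connection((splunk_host, splunk_port)) as conn:
        conn.sendall(payload.encode("utf-8"))
```

Calling build_payload once per extracted CSV and passing the result to send_payload would give one connection per file, which matches the "in the one connection" behaviour described above.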
When I look at the data in Splunk, it has the correct source and sourcetype, but...
There are a few things that I need to resolve.
I know some will suggest creating a stanza in props.conf for each type of file. I am trying to avoid this, as there are over 30 different types of files.
If I can get this to work, the script will also be able to handle new file (source) types in the future, should they start getting fed into the stream.
Any help would be greatly appreciated.
Did you manage to get a solution for this?
Here is a screenshot of the data if that clears things up.