
TCP input - tcp-raw not splitting rows by newline

phoenixdigital
Builder

Hi All,

I have written a Python HTTP downloader that pulls down multiple zip files, extracts their contents, and feeds them to a TCP port on Splunk.

Inside each zip are a number of CSV files in the following format:

header1, header2, header3, aTimestampIwant1
data1, data2, data3, dateData1
data1, data2, data3, dateData1
etc....
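
For illustration only, a minimal sketch of reading one of these CSV files into per-row dictionaries with Python's standard csv module. The function name parse_csv_rows is hypothetical and not taken from the actual script; skipinitialspace=True is assumed because the sample has a space after each comma.

```python
import csv
import io


def parse_csv_rows(csv_text):
    """Return one dict per data row, keyed by the names in the header row."""
    # skipinitialspace drops the blank that follows each comma in the sample,
    # for the header names as well as for the data values.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return [dict(row) for row in reader]


# Sample text shaped like the files described above.
sample = (
    "header1, header2, header3, aTimestampIwant1\n"
    "data1, data2, data3, dateData1\n"
    "data1, data2, data3, dateData1\n"
)
rows = parse_csv_rows(sample)
```

Each element of rows maps a header name to the value in that column, which is the shape needed to emit header=value pairs later.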

The Python script unzips each archive and derives a sourcetype from each filename. It also converts the rows into a Splunk-friendly key=value format. The entire string is then sent to a TCP port that Splunk is listening on.
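
A rough sketch of the unzip step, assuming the archive is already in memory as bytes and the CSV members are UTF-8 text. The names iter_csv_members and sourcetype_from_filename are hypothetical, and the naming rule (file name without directory or extension, lower-cased) is only an assumption, since the real rule is not described here.

```python
import io
import os
import zipfile


def sourcetype_from_filename(name):
    """Hypothetical rule: base file name without extension, lower-cased."""
    base = os.path.basename(name)
    return os.path.splitext(base)[0].lower()


def iter_csv_members(zip_bytes):
    """Yield (filename, sourcetype, text) for every .csv member of the zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            # Assumes UTF-8 content; adjust if the files use another encoding.
            text = archive.read(name).decode("utf-8")
            yield name, sourcetype_from_filename(name), text
```

Because the sourcetype comes from the filename alone, a new kind of file appearing in a zip would get its own sourcetype without any change to the script.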

Splunk receives something like this over a single connection:

***SPLUNK*** host=myhost, source=theOriginalFilenameFromTheZip, sourcetype=extractedFromFilename\r\n
header1=data1, header2=data2, header3=data3, aTimestampIwant1=dateData1\r\n
header1=data1, header2=data2, header3=data3, aTimestampIwant1=dateData1\r\n
etc.....
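
A minimal sketch of how a payload shaped like the one above could be built and sent, assuming the rows are already available as dictionaries. The names build_payload and send_payload are hypothetical; the header line simply mirrors the text shown above and is not a statement of what Splunk requires.

```python
import socket


def build_payload(host, source, sourcetype, rows):
    """Build the header line plus one key=value line per row, CRLF-terminated."""
    header = "***SPLUNK*** host={}, source={}, sourcetype={}".format(
        host, source, sourcetype
    )
    lines = [header]
    for row in rows:
        # Dicts keep insertion order, so fields appear in column order.
        lines.append(", ".join("{}={}".format(k, v) for k, v in row.items()))
    # Every line, including the last, ends with \r\n as in the sample above.
    return "\r\n".join(lines) + "\r\n"


def send_payload(payload, address):
    """Open one TCP connection, send the whole payload, then close it."""
    with socket.create_connection(address) as conn:
        conn.sendall(payload.encode("utf-8"))
```

Usage would look like send_payload(build_payload("myhost", "file.csv", "file", rows), ("splunk.example.com", 9999)), where the host name and port are placeholders for wherever the Splunk TCP input is listening.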

When I look at the data in Splunk, it has the correct source and sourcetype, but there are a few things I need to resolve.

  1. Each 'event' in Splunk contains the entire message, with all rows of data. The rows are not split on the newlines I am passing through the stream.
  2. Dates and times do not appear to be parsed, even when a column uses a common date format.
  3. More importantly, the timestamp (_time) is not being extracted from the data, so Splunk uses the time the data was received instead.

I know some will suggest creating a props.conf entry for each type of file. I am trying to avoid that, as there are over 30 different file types.

If I can get this to work, the script will be able to handle new file (source) types in the future, should they start being fed into the stream.

Any help would be greatly appreciated.

isaacvb
Explorer

Did you manage to get a solution for this?


phoenixdigital
Builder

Here is a screenshot of the data if that clears things up.

http://i56.tinypic.com/xmmyh1.gif
