I have this TCP input that is coming into my system and the fields are fixed width and sometimes the fields are blank.
I need help with my props and transforms so I can read the data correctly and the have fields properly extracted
Has anyone had any experience with this. Also from the stream not all the fields have data they are just blank but the spacing remains the same.
Check this I just wrote on LinkedIn:
Hi!
I don't know how many of you had to manage data coming from logs with fixed-lenght records. This is the old cobol/mainframe like method of managing datafiles.
With Splunk there's no problem in defining field extraction with regex. If, for instance we have a schema like this:
field1 - 4chars
field2 - 8chars
and so on, we could write a simple EXTRACT like this:
[mysourcetype]
EXTRACT = ^(?
This works fine in a search. So you can do something like:
* | stats count by field1
But, what happens if you try to filter by the field value? If you try something like
* field1=abcd
this won't work!!! To make it work, you have to write something like
* field1=abcd
So, all the power of fields become useless with this kind of data! Transactions are gone, subsearches are gone, most of the power of Splunk Analytics is gone!
Gone forever? Thanks god not! I found a very easy workaround to this. I know it's not a real solution to the problem, but, if you don't have a strong requirement on file integrity, this will save your life!!! The answer is: "Add fields delimiters!!!"
Splunk indexing works by indexing every single "string", which means every sequence of letters and numbers divided by some non alphanumeric character, like ".,!/!?:" and so on.
So, the solution is to modify the raw data written in the Index using such a separator.
Here is how:
1. props.conf
[mysourcetype]
TRANSFORMS = add_separators
EXTRACT = ^(?
Now, finally, everything works back as usual!!
and for the empty fields you can use the "fillnull" command.
I've had to index data like this in the past and I've always just used a very specific regex to do it. Shoretel CDR data for instance is fixed-width and many of the fields are often blank. Yet that didn't stop us from creating a pretty sweet app for that data - http://sideviewapps.com/apps/splunk-for-shoretel/
Getting the regex to the point where it is reliable even when one or more fields are absent in the given row can be done and tools such as RegexBuddy is utterly instrumental in testing out the representative cases in your data.