Hi!
I don't know how many of you have had to manage data coming from logs with fixed-length records. This is the old COBOL/mainframe-style method of managing data files.
With Splunk there's no problem defining field extractions with regex. If, for instance, we have a schema like this:
field1 - 4 chars
field2 - 8 chars
and so on, we could write a simple EXTRACT like this:
[mysourcetype]
EXTRACT = ^(?<field1>.{4})(?<field2>.{8}) and so on....
This works fine in a search. So you can do something like:
* | stats count by field1
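Outside Splunk, the same fixed-width extraction can be sketched in a few lines of Python, just to show how the named capture groups pull the fields apart (the sample record is invented for illustration):

```python
import re

# Hypothetical fixed-width record: 4 chars for field1, 8 chars for field2
raw = "ABCD12345678the rest of the event"

# Same pattern shape as the EXTRACT above, with named capture groups
pattern = re.compile(r"^(?P<field1>.{4})(?P<field2>.{8})")

m = pattern.match(raw)
print(m.group("field1"))  # ABCD
print(m.group("field2"))  # 12345678
```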
But what happens if you try to filter by a field value? If you try something like
* field1=abcd
this won't work!!! To make it work, you have to write something like
* | search field1=abcd
So all the power of fields becomes useless with this kind of data! Transactions are gone, subsearches are gone, most of the power of Splunk analytics is gone!
Gone forever? Thank god, no! I found a very easy workaround. I know it's not a real solution to the problem, but if you don't have a strong requirement on file integrity, this will save your life!!! The answer is: "Add field delimiters!!!"
Splunk indexing works by indexing every single "string", which means every sequence of letters and numbers separated by some non-alphanumeric character, like ".,!/?:" and so on.
So, the solution is to modify the raw data written in the Index using such a separator.
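This tokenization behavior can be sketched in Python (a big simplification of Splunk's actual segmenters, just to show the idea):

```python
import re

def tokens(raw):
    # Split on any run of non-alphanumeric characters (the "breakers")
    return [t for t in re.split(r"[^A-Za-z0-9]+", raw) if t]

# Without separators, the whole record is one big token,
# so a search for "abcd" alone finds nothing in the index:
print(tokens("abcd12345678rest"))    # ['abcd12345678rest']

# With separators, each field becomes its own searchable token:
print(tokens("abcd.12345678.rest"))  # ['abcd', '12345678', 'rest']
```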
Here is how:
1. props.conf
[mysourcetype]
TRANSFORMS = add_separators
EXTRACT = ^(?<field1>.{4})\.(?<field2>.{8})\. and so on....
2. transforms.conf
[add_separators]
DEST_KEY = _raw
SOURCE_KEY = _raw
REGEX = ^(.{4})(.{8})(.*)
FORMAT = $1.$2.$3
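The effect of this REGEX/FORMAT pair can be simulated in Python with a single substitution (sample record invented for illustration):

```python
import re

raw = "ABCD12345678payload of the event"

# Same idea as the transforms.conf stanza: capture the fixed-width
# fields and re-emit _raw with "." separators between them
separated = re.sub(r"^(.{4})(.{8})(.*)", r"\1.\2.\3", raw)
print(separated)  # ABCD.12345678.payload of the event
```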
Now, finally, everything works back as usual!!