Due to the nature of the data, we can't use any delimiters.
The data layout, by character position, is as follows:
Name = 1-8
Department = 9-12
Location = 13-24
New Department = 25-28
Status = 29-30
Is there a way to specify the lookup definition based on these character positions?
The field names are not an issue. Knowing that the data is abstracted and/or encrypted is enough.
Splunk CAN bring in and process binary files...
Assuming the data is all in one 30-byte field, then this would extract the binary-valued fields...
| rex "(?s)^(?<Name>.{8})(?<Department>.{4})(?<Location>.{12})(?<New_Department>.{4})(?<Status>.{2})$"
(Note the underscore in New_Department: regex named capture groups can't contain spaces. The (?s) flag lets . match any byte, including embedded newlines.)
...but I'm just not sure what other gotchas there might be involved with just slapping that data into a lookup and trying to use it as is.
I am TEMPTED to convert each of those fields except Status into one to three 4-byte numbers apiece. I know that would perform the function without issue, but I don't know whether I'm introducing unneeded complexity that the vanilla system would handle straight out of the box.
I SUSPECT, based on other questions and answers about binary data, that Splunk just isn't architected to handle it very well.
The best option that I can suggest is to convert the binary data into display-hex. It takes up twice as much space, but it consists only of the characters [0-9A-F], so it can then be treated as character data.
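To illustrate that round trip in Python (a sketch; the sample record bytes are invented), hex encoding turns any binary field, including embedded x'00' bytes, into plain ASCII that character-oriented tooling can handle, and the original bytes remain recoverable:

```python
# Round-trip demo: arbitrary binary bytes (including x'00') become
# plain ASCII hex, which CSV/lookup handling treats as ordinary text.
record = b"AB\x00CD"            # invented sample bytes with an embedded NUL
encoded = record.hex().upper()   # display-hex, characters [0-9A-F] only
restored = bytes.fromhex(encoded)
assert restored == record        # lossless round trip
```

The doubling in size (2 hex characters per byte) is the cost of making the data safe for delimiters, line breaks, and text processing.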
Much appreciated @DalJeanis
Yes, but no.
First, there is no reason your delimiter can't be a string that cannot appear in the data, such as "!!!!".
Second, unless the data is encrypted, those fields don't look like data types that would necessarily include ALL of the special characters: semicolons, exclamation points, commas, @ # $ ^ &, and so on.
So, what's up here?
Great - thank you @DalJeanis
Instead of the field names mentioned before please consider the following -
Field1 = positions 01-08
Field2 = positions 09-12
Field3 = positions 13-24
Field4 = positions 25-28
Field5 = positions 29-30
These fields may contain any combination of characters (displayable and non-displayable), including special characters. So there is no sequence of characters that could reliably be used as a delimiter.
So the question is - Can a lookup table be built from a structured file where the records are fixed length as defined before, and how?
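Assuming a preprocessing step outside Splunk is acceptable, one way to build such a lookup could be a small converter that slices each fixed-length record at the positions defined above and hex-encodes each field, producing a plain CSV usable as a lookup table file. This is a sketch: the file paths are hypothetical, and the field names and widths are taken from the layout in this thread.

```python
# Sketch: convert a file of fixed-length 30-byte binary records into a
# hex-encoded CSV lookup table. Field names/widths follow the layout
# described above; paths are hypothetical.
import csv

WIDTHS = [("Field1", 8), ("Field2", 4), ("Field3", 12),
          ("Field4", 4), ("Field5", 2)]
RECORD_LEN = sum(w for _, w in WIDTHS)  # 30 bytes per record

def record_to_hex_fields(record: bytes) -> dict:
    """Slice one fixed-length record and hex-encode each field."""
    assert len(record) == RECORD_LEN
    out, pos = {}, 0
    for name, width in WIDTHS:
        out[name] = record[pos:pos + width].hex().upper()
        pos += width
    return out

def convert(binary_path: str, csv_path: str) -> None:
    """Read fixed-length records and write a hex-encoded CSV lookup."""
    with open(binary_path, "rb") as src, \
         open(csv_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=[n for n, _ in WIDTHS])
        writer.writeheader()
        while chunk := src.read(RECORD_LEN):
            if len(chunk) < RECORD_LEN:
                break  # ignore a trailing partial record
            writer.writerow(record_to_hex_fields(chunk))
```

Because every field value is hex-encoded, embedded x'00' bytes, commas, and line breaks in the raw data cannot break the CSV; searches would then match against the hex form of the lookup keys.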
Are you indexing this data? Do you want to use the data as-is as a lookup table file?
We would like to use the data as-is...
Will the system have to deal with any binary zeroes x'00' in the data?