
Dynamic field extraction based on other field data

davidatpinger
Path Finder

My apologies if this is easy - I couldn't find a good example.

I've got some log data that is mostly nicely formatted. It's all comma-separated, with fixed fields at the beginning of each line. So far so good. The last standard field is special: it contains a description of the remaining fields in the line, and the number of remaining fields is variable. A sample log line might look like this (generically):

field1,field2,field3,(arg1Name;arg2Name;arg3Name),arg1,arg2,arg3

What I'd like is to have fields extracted such that the field name for arg1 is arg1Name, the field name for arg2 is arg2Name, and so on. Note that there may be zero or more arguments (although there should always be a matching number of names for them in the earlier descriptive field).
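For reference, the intended result can be sketched in plain Python, outside Splunk. This only illustrates what the extraction should produce, assuming values never contain commas and that the descriptor is the first comma-separated field wrapped in parentheses; `extract_dynamic_fields` is a hypothetical helper, not a Splunk API.

```python
def extract_dynamic_fields(line: str) -> dict:
    """Pair the names in the "(a;b;c)" descriptor with the values that follow it."""
    parts = line.split(",")
    # Locate the descriptor: the first field wrapped in parentheses.
    for i, part in enumerate(parts):
        if part.startswith("(") and part.endswith(")"):
            break
    else:
        return {}  # no descriptor field found
    inner = part[1:-1]
    names = inner.split(";") if inner else []  # "()" means zero arguments
    values = parts[i + 1:]
    if len(names) != len(values):
        raise ValueError(f"{len(names)} names but {len(values)} values")
    return dict(zip(names, values))


print(extract_dynamic_fields("field1,field2,field3,(arg1Name;arg2Name;arg3Name),arg1,arg2,arg3"))
# {'arg1Name': 'arg1', 'arg2Name': 'arg2', 'arg3Name': 'arg3'}
```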

I'm pretty sure there's a clever way to do this with transforms, but I haven't figured it out. I'm pretty new to Splunk, so please be gentle.

Thanks much!


martin_mueller
SplunkTrust

Splunk> see the forest, and the trees
Splunk> extract the values, and the keys

You can tell Splunk to extract both the values and their keys in transforms.conf, but since regular expressions can't count well, this gets a bit ugly. There's no simple way of telling a regex to pair up part X of one list with part X of the other list without enumerating the N parts in between... so here's a workaround:

props.conf
[your_sourcetype]
# Run every transform; each one only matches events with exactly that many names.
REPORT-kvs = kv_one,kv_two,kv_three,...

transforms.conf
# _KEY_n and _VAL_n named groups with the same suffix become one field name/value pair.
[kv_one]
REGEX = ,\((?<_KEY_1>\w+)\),(?<_VAL_1>[^,]+)

[kv_two]
REGEX = ,\((?<_KEY_1>\w+);(?<_KEY_2>\w+)\),(?<_VAL_1>[^,]+),(?<_VAL_2>[^,]+)

[kv_three]
REGEX = ,\((?<_KEY_1>\w+);(?<_KEY_2>\w+);(?<_KEY_3>\w+)\),(?<_VAL_1>[^,]+),(?<_VAL_2>[^,]+),(?<_VAL_3>[^,]+)

...
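One way to sanity-check these patterns before deploying them is a rough Python emulation. Python's `re` module spells named groups `(?P<name>...)` instead of `(?<name>...)`, so the patterns are transcribed; pairing each `_KEY_n` group with the `_VAL_n` group of the same suffix mirrors how these stanzas are meant to work, but `apply_transforms` is a hypothetical helper and not Splunk's implementation.

```python
import re

# kv_one..kv_three from above, transcribed to Python's (?P<name>...) syntax.
TRANSFORMS = {
    "kv_one": r",\((?P<_KEY_1>\w+)\),(?P<_VAL_1>[^,]+)",
    "kv_two": r",\((?P<_KEY_1>\w+);(?P<_KEY_2>\w+)\),(?P<_VAL_1>[^,]+),(?P<_VAL_2>[^,]+)",
    "kv_three": r",\((?P<_KEY_1>\w+);(?P<_KEY_2>\w+);(?P<_KEY_3>\w+)\),"
                r"(?P<_VAL_1>[^,]+),(?P<_VAL_2>[^,]+),(?P<_VAL_3>[^,]+)",
}


def apply_transforms(line):
    """Run every pattern; pair each _KEY_n capture with the _VAL_n capture."""
    fields = {}
    for pattern in TRANSFORMS.values():
        m = re.search(pattern, line)
        if not m:
            continue  # this stanza is for a different number of names
        groups = m.groupdict()
        for group_name, key in groups.items():
            if group_name.startswith("_KEY_"):
                suffix = group_name[len("_KEY_"):]
                fields[key] = groups["_VAL_" + suffix]
    return fields


print(apply_transforms("f1,f2,f3,(a;b),1,2"))  # {'a': '1', 'b': '2'}
```

Note that `[^,]+` requires a non-empty value, so a line such as `f1,f2,f3,(a;b),1,` matches none of the patterns and yields no fields at all.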

You may be able to make the latter parts optional for a bit of laziness, but you're not going to get around listing the maximum number of dynamic arguments at least once - and thereby figuring out and limiting yourself to this number.
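Since the maximum has to be spelled out anyway, the stanzas can be generated rather than typed by hand. A minimal Python sketch, assuming stanza names `kv_1` through `kv_N` (hypothetical; any names work as long as props.conf lists them) and the same `[^,]+` value pattern as above. `make_stanzas` is a hypothetical helper that only prints text for you to paste into the two files.

```python
def make_stanzas(max_args: int) -> str:
    """Build props.conf/transforms.conf text for 1..max_args dynamic arguments."""
    names = []
    stanzas = []
    for n in range(1, max_args + 1):
        keys = ";".join(rf"(?<_KEY_{i}>\w+)" for i in range(1, n + 1))
        vals = ",".join(rf"(?<_VAL_{i}>[^,]+)" for i in range(1, n + 1))
        names.append(f"kv_{n}")
        stanzas.append(f"[kv_{n}]")
        stanzas.append(rf"REGEX = ,\({keys}\),{vals}")
        stanzas.append("")
    header = [
        "# props.conf",
        "[your_sourcetype]",
        "REPORT-kvs = " + ",".join(names),
        "",
        "# transforms.conf",
    ]
    return "\n".join(header + stanzas)


print(make_stanzas(3))
```

Calling `make_stanzas(12)` would produce a 12-argument version; the two `#` header lines only mark which part belongs in props.conf and which in transforms.conf.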

davidatpinger
Path Finder

I fooled around with this quite a bit, and found this form to work pretty well too:

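# Each transform extracts only the Nth name/value pair.
# [^)]* skips any remaining names up to the closing parenthesis (and may match nothing).
[kv_1]
REGEX=\((\w+)[^)]*\),([^,]*)
FORMAT=$1::$2

[kv_2]
REGEX=\((\w+);(\w+)[^)]*\),([^,]*),([^,]*)
FORMAT=$2::$4

[kv_3]
REGEX=\((\w+);(\w+);(\w+)[^)]*\),([^,]*),([^,]*),([^,]*)
FORMAT=$3::$6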

Iterate as necessary, and then list these out in props.conf. It seems to handle nulls in the data a bit more easily, and it's somewhat easier to read.
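The per-pair idea can be checked in Python as well. This sketch builds the Nth-pair pattern for each N, with `[^)]*` after the names so that it can match zero characters when the list has exactly N names, and emulates `FORMAT=$N::$2N` by reading groups N and 2N. `nth_pair_regex` and `extract` are hypothetical helpers, and the default `max_args=12` is an assumption.

```python
import re


def nth_pair_regex(n):
    """Pattern whose group n is the Nth name and group 2n is the Nth value."""
    names = ";".join([r"(\w+)"] * n)
    values = ",".join([r"([^,]*)"] * n)  # [^,]* lets empty values through
    return re.compile(r"\(" + names + r"[^)]*\)," + values)


def extract(line, max_args=12):
    fields = {}
    for n in range(1, max_args + 1):
        m = nth_pair_regex(n).search(line)
        if m:
            fields[m.group(n)] = m.group(2 * n)  # emulates FORMAT=$n::$2n
    return fields


print(extract("f1,f2,f3,(a;b;c),1,,3"))  # {'a': '1', 'b': '', 'c': '3'}
```

The empty second value survives as an empty field instead of causing the whole event to be skipped.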

martin_mueller
SplunkTrust

It'd be a lot easier and more generic if you had a list of pairs rather than a pair of lists for your keys and values in the event.
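To illustrate the difference: with a hypothetical list-of-pairs layout such as `arg1Name=arg1,arg2Name=arg2`, a single pattern covers any number of arguments. A Python sketch; the `name=value` format is an assumption for illustration, not the log format in this thread.

```python
import re

# One pattern per name=value pair; findall returns every pair on the line.
PAIR = re.compile(r"(\w+)=([^,]*)")

line = "field1,field2,field3,arg1Name=arg1,arg2Name=arg2,arg3Name=arg3"
fields = dict(PAIR.findall(line))
print(fields)  # {'arg1Name': 'arg1', 'arg2Name': 'arg2', 'arg3Name': 'arg3'}
```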


davidatpinger
Path Finder

Yes it would, but the log format and content is outside of my control. Shucks.


martin_mueller
SplunkTrust

You can include any regex-based condition in the REGEX itself. As it is now, I'm only looking for a comma followed by parentheses. If, for example, the number you mention appears before that comma, you could match on it in the regex and include only the correct number of key-value pairs after it. That should also make executing the regexes a lot faster.


davidatpinger
Path Finder

Yeah, it occurred to me that all of the event type codes with (for example) three k-v pairs could go into one regex. It should be pretty easy to put together. I still have to enumerate a bunch of cases, but not as many as I originally thought. This seems like an easy and extensible (if verbose) way to cover all of the possible cases.
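Grouping type codes by pair count could look roughly like this in Python. The mapping in `PAIRS_BY_TYPE` is hypothetical, and the layout (type ID in field2, descriptor in the fourth field) follows the generic sample from the question. Each generated pattern only matches when field2 is one of the type codes for that count and the descriptor holds exactly that many names. The syntax used (`(?:...)`, `\w`, `[^,]`) should also be valid PCRE, but these strings are untested inside Splunk.

```python
import re
from collections import defaultdict

# Hypothetical mapping: type ID in field2 -> number of k-v pairs.
PAIRS_BY_TYPE = {"3": 5, "4": 3, "7": 3}


def grouped_regexes(pairs_by_type):
    """One pattern per pair count, gated on the type IDs that use that count."""
    by_count = defaultdict(list)
    for type_id, count in pairs_by_type.items():
        by_count[count].append(re.escape(type_id))
    regexes = {}
    for count, type_ids in sorted(by_count.items()):
        prefix = r"^[^,]*,(?:" + "|".join(type_ids) + r"),[^,]*,"  # field1,TYPE,field3,
        names = ";".join([r"(\w+)"] * count)
        values = ",".join([r"([^,]*)"] * count)
        regexes[count] = re.compile(prefix + r"\(" + names + r"\)," + values)
    return regexes


def extract_by_type(line, regexes):
    for count, rx in regexes.items():
        m = rx.match(line)
        if m:
            groups = m.groups()
            return dict(zip(groups[:count], groups[count:]))
    return {}  # type ID and name count did not agree with any pattern


REGEXES = grouped_regexes(PAIRS_BY_TYPE)
print(extract_by_type("x,4,z,(a;b;c),1,2,3", REGEXES))  # {'a': '1', 'b': '2', 'c': '3'}
```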


davidatpinger
Path Finder

There is a fixed upper size limit (I think it may be about 12), so this should work great. Thanks!

It's also the case that the number of key/value pairs in this form can be deduced from another field. (It's basically a type ID.) For example, if field2 is '3', then there are 5 k-v pairs, but if it's '4', then there are only 3...and so on. Can the transform be made conditional on the value of another field? I suppose I can enumerate a regex that calls out the correct number of k-v pairs for each of the typeID values. That's pretty ugly but should work well. It just makes the props.conf entry really long and ugly.

Well, this should work, even if it's a bit ungainly in the files. Thanks very much!
