Transforming an IP

Contributor

I have some custom app logs that contain an IP address in the filename, and I am trying to extract it. Any ideas what I'm missing?

When I query, ip literally equals "$2.$3.$4.$5", with the $ signs, not the matched field values.

props.conf

[mysource]
REPORTS-ipreplacer = ipreplacer

transforms.conf

[ipreplacer]
REGEX = ^(.*)?<orig_ip>(\d{1,3})\_(\d{1,3})\_(\d{1,3})\_(\d{1,3})(.*)
FORMAT = ip::$2.$3.$4.$5
SOURCE_KEY = source
1 Solution

Super Champion

Try using something like this:

[ipreplacer]
REGEX = _(\d{1,3})_(\d{1,3})_(\d{1,3})_(\d{1,3})_
FORMAT = ip::$1.$2.$3.$4
SOURCE_KEY = source

Note that you will NOT be able to search on your IP address directly, because the value of your field is not directly in the index. If you are only reporting against this field, you should be fine. But if you want to search for a specific value of ip, be aware that the search sourcetype=xyz ip=172.0.0.1 will not work. You can work around this limitation with a secondary search command, like so: sourcetype=xyz | search ip=172.0.0.1

If this doesn't work, you may need to actually make this an indexed field. I think I've run into issues like this before, and indexing (rather than extracting) was my only option. That may have changed in newer versions; I'm not sure.
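
For reference, here is a rough sketch of what the indexed-field route might look like. It is untested and rests on a few assumptions: that [mysource] is the right props.conf stanza for these events, that the stanza name ipreplacer_indexed (made up here) is not already in use, and that the filename really has the _a_b_c_d_ shape the regex expects. As far as I know, the transforms.conf spec allows concatenating capture groups in FORMAT at index time, which is why ip::$1.$2.$3.$4 should be accepted here.

```ini
# props.conf
# TRANSFORMS- (not REPORT-) runs the transform at index time.
[mysource]
TRANSFORMS-ipreplacer = ipreplacer_indexed

# transforms.conf
# ipreplacer_indexed is a hypothetical stanza name.
[ipreplacer_indexed]
# At index time the source is read from the MetaData:Source key.
SOURCE_KEY = MetaData:Source
REGEX = _(\d{1,3})_(\d{1,3})_(\d{1,3})_(\d{1,3})_
FORMAT = ip::$1.$2.$3.$4
# Write the result into the index as an indexed field.
WRITE_META = true

# fields.conf
# Tell search that ip is an indexed field.
[ip]
INDEXED = true
```

This only affects events indexed after the change, and it needs to be deployed wherever parsing happens for this data.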

If you must do a search-time extraction of this field, another (very ugly) approach is the following:

[ipreplacer]
REGEX = _(\d{1,3})_(\d{1,3})_(\d{1,3})_(\d{1,3})_
FORMAT = _ip1::$1 _ip2::$2 _ip3::$3 _ip4::$4
SOURCE_KEY = source

Then, in your search you will have to add | eval ip=_ip1."."._ip2."."._ip3."."._ip4 to combine the parts into a whole ip address.

Another option: if you are using Splunk 4.0 or higher, you can use the "sed" mode of rex to do all of this within a search command, with something like:

| rex field=source "_(?<ip>\d{1,3}_\d{1,3}_\d{1,3}\_\d{1,3})_" | rex field=ip mode=sed "s/_/./g"

If you end up needing to use either of these last two options, I would recommend hiding all of these expressions within a macro. This would make things look a lot cleaner.
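
A minimal sketch of such macros, assuming a macros.conf in the same app. The macro names combine_ip and rex_ip are made up for illustration, and the definitions simply wrap the two search snippets from this answer, so treat this as untested.

```ini
# macros.conf
# combine_ip and rex_ip are hypothetical macro names.

# Joins the four _ipN fields produced by the multi-field FORMAT transform.
[combine_ip]
definition = eval ip=_ip1."."._ip2."."._ip3."."._ip4

# Pulls the underscore-separated address out of source, then swaps _ for .
[rex_ip]
definition = rex field=source "_(?<ip>\d{1,3}_\d{1,3}_\d{1,3}_\d{1,3})_" | rex field=ip mode=sed "s/_/./g"
```

A search would then look something like sourcetype=xyz | `combine_ip` | search ip=172.0.0.1, with the macro name wrapped in backticks.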

You may find some of the discussion here relevant to your situation:

Super Champion

Well, you still have two options: (1) create a custom drilldown that takes your extra eval logic into consideration, or (2) make this an indexed field, as you suggested. Neither option is ideal. If you set up an indexed field, it will take effect only for new events, so for a time you may make things more complicated by needing two different ways to get to this field. But if you don't set up an indexed field, you may regret it every time you try to use the field and have to work around it.

Contributor

The problem with using the eval solution at search time is that I want to dashboard on that value and drill back to the original event, and x.x.x.x != xxx_x 😞 This needs to be a transformation at index time. Thoughts?

Super Champion

Nick, if you know of a better solution, please post it as an answer. I'd love a better way to do this. But from everything I've seen, Splunk (up through 4.1.x) does not allow field extractions where the field value is composed of multiple regex extraction groups. Yes, I fully agree that using a second search will be slower, but it doesn't work without it, which is why I referred to it as a "workaround"; I feel that is the most accurate description. (This does work when using an indexed field, but using indexed fields is generally discouraged.)

SplunkTrust

Things are no longer as absolute as they were in the 3.x days as to whether "10.10.2.35" is "in the index". Also, there's no need for a second search command: ip="10.10.2.35" will work fine even if it's just an extracted field. (Adding a second search command will definitely make things slower, though.)

Contributor

In conclusion, I think the short answer is that 'FORMAT = ip::$1.$2.$3.$4' doesn't work,

but 'FORMAT = _ip1::$1 _ip2::$2 _ip3::$3 _ip4::$4' will work.

Contributor

The third suggestion only returned the first octet for me. Thanks for everything; this was very, very helpful.

Contributor

The second suggestion worked like a charm.

Contributor

The first suggestion gave me one occurrence of '$1.$2.$3.$4', so no go on that one.

Contributor

thisfilelogfrom127001_20100501.txt

Contributor

Can you give a sample original format of what you're extracting?
