So I have some custom app logs that contain an IP address in the filename, and I am attempting to extract it. Any ideas what I'm missing?
When I query, ip literally equals "$2.$3.$4.$5" (literally with the $ signs), not the values of the matched groups.
props.conf
[mysource]
REPORTS-ipreplacer = ipreplacer
transforms.conf
[ipreplacer]
REGEX = ^(.*)?<orig_ip>(\d{1,3})\_(\d{1,3})\_(\d{1,3})\_(\d{1,3})(.*)
FORMAT = ip::$2.$3.$4.$5
SOURCE_KEY = source
Try using something like this:
[ipreplacer]
REGEX = _(\d{1,3})_(\d{1,3})_(\d{1,3})_(\d{1,3})_
FORMAT = ip::$1.$2.$3.$4
SOURCE_KEY = source
Note that you will NOT be able to search on your IP address directly, because the value of your field does not appear literally in the index.
So if you are only reporting against this field, you should be fine. But if you want to search for a specific value of ip, be aware that the search sourcetype=xyz ip=172.0.0.1 will not work. You can work around this limitation by adding a secondary search command, like so: sourcetype=xyz | search ip=172.0.0.1
If this doesn't work, you may need to actually make this an indexed field. I think I've run into issues like this before, and indexing (rather than extracting) was my only option. But that may have changed in newer versions. I'm not sure.
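For reference, here is a rough sketch of what the indexed-field route might look like. Treat it as an untested outline rather than a known-good config: the transform name ipreplacer_indexed is made up for illustration, [mysource] is taken from the question, and the details should be checked against the transforms.conf and fields.conf specs for your Splunk version. It assumes the usual index-time conventions: a TRANSFORMS- (not REPORT-) entry in props.conf, MetaData:Source as the source key, and WRITE_META = true so the field is written into the index.

```ini
# props.conf (on the indexer or heavy forwarder that parses the data)
[mysource]
TRANSFORMS-ipreplacer = ipreplacer_indexed

# transforms.conf
# "ipreplacer_indexed" is a hypothetical name chosen for this sketch.
[ipreplacer_indexed]
# At index time the source is read from MetaData:Source,
# whose value carries a "source::" prefix; the regex is unanchored, so that is harmless.
SOURCE_KEY = MetaData:Source
REGEX = _(\d{1,3})_(\d{1,3})_(\d{1,3})_(\d{1,3})_
FORMAT = ip::$1.$2.$3.$4
WRITE_META = true

# fields.conf (on the search head)
# Tells search that "ip" is an indexed field.
[ip]
INDEXED = true
```

Keep in mind that an index-time change like this only applies to events indexed after it is deployed; existing events would still need a search-time approach.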
If you must do a search-time extraction of this field, another (very ugly) approach is the following:
[ipreplacer]
REGEX = _(\d{1,3})_(\d{1,3})_(\d{1,3})_(\d{1,3})_
FORMAT = _ip1::$1 _ip2::$2 _ip3::$3 _ip4::$4
SOURCE_KEY = source
Then, in your search you will have to add | eval ip=_ip1."."._ip2."."._ip3."."._ip4
to combine the parts into a whole ip address.
Another option: if you are using Splunk 4.0 or higher, you can use the "sed" mode of rex to do all of this within a search command, with something like:
| rex field=source "_(?<ip>\d{1,3}_\d{1,3}_\d{1,3}\_\d{1,3})_" | rex field=ip mode=sed "s/_/./g"
If you end up needing to use any of these last couple of options, I would recommend hiding all these expressions within a macro. This would make things look a lot cleaner.
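As a minimal sketch of the macro idea, the eval from the four-field approach could be wrapped like this. The macro name build_ip is hypothetical, and the definition assumes the _ip1 through _ip4 fields are already being extracted by the transform shown earlier.

```ini
# macros.conf
# "build_ip" is a made-up name for this sketch; pick whatever fits your app.
[build_ip]
definition = eval ip=_ip1."."._ip2."."._ip3."."._ip4

# Usage in a search (macro names are wrapped in backticks):
#   sourcetype=xyz | `build_ip` | search ip=127.0.0.1
```

The searches then only carry the macro name, and the concatenation logic lives in one place if the field names ever change.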
You may find some of the discussion here relevant to your situation:
Well, you still have two options: (1) create a custom drilldown that takes your extra eval logic into consideration, or (2) create this as an indexed field, as you suggested. (Neither option seems ideal. If you set up an indexed field, it will obviously take effect only for new events, so you may make things more complicated for a time by needing two different ways to get to this field. But if you don't set up an indexed field, you may regret it every time you try to use the field and constantly have to work around it.)
The problem with using the EVAL solution at search time is what happens if I want to dashboard on that value and drill back to the original event: x.x.x.x != x_x_x_x 😞 This needs to be a transformation at index time. Thoughts?
Nick, if you know of a better solution, please post it as an answer. I'd love a better way to do this. But from everything I've seen, Splunk (up through 4.1.x) does not allow field extractions where the field value is composed of multiple regex extraction groups. Yes, I fully agree that using a second search will be slower, but it doesn't work without it. (Which is why I referred to it as a "workaround", which I feel is the most accurate description.) (This does work when using an indexed field, but using indexed fields is generally discouraged.)
Things are no longer as absolute as they were in the 3.x days as to whether "10.10.2.35" is "in the index". Also, there's no need for a second search command: ip="10.10.2.35" will work fine even if it's just an extracted field. (Although adding a second search command will definitely make things slower.)
So, in conclusion, the short answer is: 'FORMAT = ip::$1.$2.$3.$4' doesn't work,
but 'FORMAT = _ip1::$1 _ip2::$2 _ip3::$3 _ip4::$4' will work.
The third suggestion only returned the first octet for me. Thanks for everything, this was very, very helpful.
The second suggestion worked like a charm.
The first suggestion gave me one occurrence of '$1.$2.$3.$4', so no go on that one.
thisfilelogfrom_127_0_0_1_20100501.txt
Can you give a sample original format of what you're extracting?