I want to run a regex search for a number that is, for example, 50 digits long. However, I know that some fields also contain numbers of that length, and I am not interested in them (the numbers I need appear in other fields and in free-form XML). Those fields hold the wrong information, so I want the regex to skip them. How do I do that?
Searching Splunk Answers has not turned up a working solution. "fields - something" does not work; it seems to serve a different purpose.
Simple example for more clarity:
Let's take this JSON log record:
{
"freetext": "this is freetext",
"some-nums": 1513513671,
"needed-nums": 5156716983
}
Suppose there are thousands of such log records and I am searching for particular ten-digit numbers, but I know for sure that "some-nums" field does not contain my numbers at all times, so I want to exclude it from searching. In Splunk pseudo-code this could look like this:
index=myindex | excludefields some-nums | regex "\d{10}"
Ideally, this search should show me only log records where there is a "needed-nums" field and it contains a ten-digit number, but NOT those logs where there is no "needed-nums" field and only "some-nums", since the latter is irrelevant.
I do not know whether Splunk has a ready-made solution for this, but I am looking for something fairly simple.
Thanks!
I'm going to make some assumptions (explained below), but I think I understand what you want to do. There are some simple solutions you might be able to use. Sorry for the length, but I thought it best to explain how I understood the problem.
Assumption #1
You stated the following:
but I know for sure that "some-nums" field does not contain my numbers at all times
I'm going to assume that you meant something like:
but I know for sure that the "some-nums" field never contains numbers I care about
Assumption #2
I think that you also want to look for any field that might contain a large number, but you want to exclude those fields that you don't care about (see Assumption #1).
Assumption #3
(This is relevant for Solution #1 only) You just want a list of the numbers in the data, not caring what the field name actually is. I don't know why I'm making this assumption other than that you haven't stated anything to the contrary and it seemed like a logical assumption given what you described.
Assumption #4
(This is relevant for Solution #2 only) You only want to get the events that contain numbers you care about, because that is what the pseudo-search-code you wrote appears to describe.
Solution #1
Let's get rid of the values in the fields you don't care about with:
| makeresults | eval _raw="{ \"freetext\": \"this is freetext\", \"some-nums\": 1513513671, \"needed-nums\": 5156716983, \"other-nums\": 12345678901234567890 }" | rex mode=sed "s/(some-nums\": )\d*/\\1/" | rex max_match=0 "(?P<need_nums>\d{10,50})"
This results in a field called need_nums that is multi-valued with all the numbers between 10 and 50 digits within the data, excluding the field called some-nums (and you could add others you wish to exclude with more rex mode=sed commands).
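For readers who want to check the mask-then-extract logic outside Splunk, here is a minimal Python sketch of the same idea. It is not SPL and is not part of the search above; the `excluded` list and the `mask_excluded` helper are hypothetical names introduced only for illustration, and the sample event is the one from the makeresults example.

```python
import re

# Sample event, copied from the makeresults example above.
raw = ('{ "freetext": "this is freetext", "some-nums": 1513513671, '
       '"needed-nums": 5156716983, "other-nums": 12345678901234567890 }')

# Hypothetical list of field names whose values should be ignored.
excluded = ["some-nums"]


def mask_excluded(text, fields):
    """Remove the digits that follow each excluded key, keeping the key itself."""
    for name in fields:
        pattern = r'("' + re.escape(name) + r'": )\d*'
        text = re.sub(pattern, r'\1', text)
    return text


masked = mask_excluded(raw, excluded)

# Equivalent in spirit to: rex max_match=0 "(?P<need_nums>\d{10,50})"
need_nums = re.findall(r'\d{10,50}', masked)
print(need_nums)  # ['5156716983', '12345678901234567890']
```

Adding another name to `excluded` corresponds to adding another rex mode=sed command in the search.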
Solution #2
The following still returns the event, because needed-nums and other-nums are long enough to match:
| makeresults | eval _raw="{ \"freetext\": \"this is freetext\", \"some-nums\": 1513513671, \"needed-nums\": 5156716983, \"other-nums\": 12345678901234567890 }" | rex mode=sed "s/(some-nums\": )\d*/\\1/" | regex "\d{10,50}"
But the following does not return the event. The some-nums value is long enough, but rex mode=sed has already removed it, and the other two numbers are only nine digits long, so nothing matches:
| makeresults | eval _raw="{ \"freetext\": \"this is freetext\", \"some-nums\": 1513513671, \"needed-nums\": 515671698, \"other-nums\": 123456789 }" | rex mode=sed "s/(some-nums\": )\d*/\\1/" | regex "\d{10,50}"
The same approach works across the thousands of events your search returns; these examples just demonstrate the behavior on a single event.
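The filtering behavior of Solution #2 can be reproduced in a short Python sketch, assuming the same two sample events. This is only an illustration of the logic, not SPL; `keep_event`, `MASK`, and `WANTED` are hypothetical names.

```python
import re

# Mirrors: rex mode=sed "s/(some-nums\": )\d*/\\1/"
MASK = re.compile(r'("some-nums": )\d*')
# Mirrors: regex "\d{10,50}"
WANTED = re.compile(r'\d{10,50}')


def keep_event(raw):
    """Return True if the event still holds a 10-50 digit number after masking."""
    masked = MASK.sub(r'\1', raw)
    return WANTED.search(masked) is not None


events = [
    # needed-nums and other-nums are long enough, so this event is kept.
    '{ "freetext": "this is freetext", "some-nums": 1513513671, '
    '"needed-nums": 5156716983, "other-nums": 12345678901234567890 }',
    # Only some-nums is long enough, and it is masked, so this event is dropped.
    '{ "freetext": "this is freetext", "some-nums": 1513513671, '
    '"needed-nums": 515671698, "other-nums": 123456789 }',
]

kept = [e for e in events if keep_event(e)]
print(len(kept))  # 1
```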
Conclusion
Hopefully one of these two approaches will at least help you out in creating a solution.
Thank you for the tips, I have managed to achieve my goal by excluding the irrelevant fields by using rex mode=sed and replacing numbers with some short message like "irrelevant". That is what I was looking for 🙂
I thought that was what your eventual goal was. I'm glad I waited to get the clarification in your original question. Glad it helped.
I am not sure I understand what you mean, but maybe this:
index=myindex 'needed-nums'="*" | regex 'needed-nums'="\d{10}"
Let's suppose that your desired number is, say, between 50 and 65 digits long, somewhere in the _raw data.
Let's say that there is an already extracted field, foo, that may also contain numbers that large. Additionally, there is a field, bar, that may also contain numbers that large, and you want to ignore both foo and bar.
Here is some run-anywhere code that shows how to make it happen:
| makeresults
| eval _raw = "this one is too short 1234567890123456789012345678901234567890 this one is too long 1234567890123456789012345678901234567890123456789012345678901234567890 this one is foo: 123456123456123456123456123456123456123456123456123456123456 this one is bar: 12345671234567123456712345671234567123456712345671234567 and the one we want is here -> 987654321987654321987654321987654321987654321987654321"
| eval foo="123456123456123456123456123456123456123456123456123456123456"
| rename COMMENT as "The below will capture all words made up entirely of 50 to 65 digits."
| rex field=_raw "\b(?<maybenumbers>\d{50,65})\b" max_match=0
| rename COMMENT as "The below will extract bar."
| rex field=_raw "bar:\s+\b(?<bar>\d+)\b"
| rename COMMENT as "The below will spread out all the various values that were captured, eliminate foo and bar, then put the remaining values back together."
| streamstats count as recno
| mvexpand maybenumbers
| where maybenumbers!=foo AND maybenumbers!=bar
| stats values(*) as * by recno
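As a cross-check of the logic outside Splunk, the capture-everything-then-drop-known-values idea can be sketched in Python. This is an assumption-laden illustration rather than SPL: the sample values are rebuilt with string repetition to approximate the lengths in the example (40, 70, 60, 56, and 54 digits), and the variable names are hypothetical.

```python
import re

too_short = "1234567890" * 4   # 40 digits, below the 50-65 range
too_long = "1234567890" * 7    # 70 digits, above the 50-65 range
foo = "123456" * 10            # 60 digits, already extracted field to ignore
bar_value = "1234567" * 8      # 56 digits, appears after "bar:" in the raw text
wanted = "987654321" * 6       # 54 digits, the value we actually want

raw = (f"this one is too short {too_short} this one is too long {too_long} "
       f"this one is foo: {foo} this one is bar: {bar_value} "
       f"and the one we want is here -> {wanted}")

# Capture every whole word made of 50 to 65 digits (like max_match=0).
maybenumbers = re.findall(r'\b(\d{50,65})\b', raw)

# Extract bar from the raw text, as the second rex does.
bar_match = re.search(r'bar:\s+\b(\d+)\b', raw)
bar = bar_match.group(1) if bar_match else None

# Drop the values that belong to foo and bar, keep the rest.
result = [n for n in maybenumbers if n != foo and n != bar]
print(result == [wanted])  # True
```

The word boundaries matter here: without them, a 50 to 65 digit slice of the 70-digit number would also be captured.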
Please supply sample data. Without it, suggestions are stabs in the dark, and the solution may turn out to be very easy once we know more about the data.