How to make regex not search specific fields?

funghorn
Explorer

Basically, I want to run a regex search for a number that is, for example, 50 digits long. However, I know for sure that some fields contain similar numbers (separate from the fields and the free XML content that hold the 50-digit numbers I need), and I am not interested in them. They don't contain the information I need, so I want the regex to skip these fields. How do I do that?
Searching Splunk Answers has not turned up a working answer. "fields - something" does not work; it seems to serve a different purpose.

Simple example for more clarity:

Let's take this JSON log record:

{
    "freetext": "this is freetext",
    "some-nums": 1513513671,
    "needed-nums": 5156716983
}

Suppose there are thousands of such log records and I am searching for particular ten-digit numbers, but I know for sure that "some-nums" field does not contain my numbers at all times, so I want to exclude it from searching. In Splunk pseudo-code this could look like this:

index=myindex | excludefields some-nums | regex "\d{10}"

Ideally, this search should show me only log records where there is a "needed-nums" field and it contains a ten-digit number, but NOT those logs where there is no "needed-nums" field and only "some-nums", since the latter is irrelevant.
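The behavior the hypothetical excludefields pseudo-command describes can be sketched outside Splunk in Python. This only illustrates the intended filter; it is not a Splunk feature. The records are assumed to be already parsed into fields, the second record and the names EXCLUDED and matches are made up for the example, and \d{10} is kept exactly as in the pseudo-code, so it would also match inside longer numbers.

```python
import re

# Hypothetical parsed records modeled on the JSON example above.
records = [
    {"freetext": "this is freetext", "some-nums": 1513513671, "needed-nums": 5156716983},
    {"freetext": "this is freetext", "some-nums": 1513513671},
]

# Fields whose values should never be searched (assumption: only "some-nums").
EXCLUDED = {"some-nums"}
TEN_DIGITS = re.compile(r"\d{10}")  # same pattern as the pseudo-code


def matches(record):
    """Return True if any non-excluded field value contains ten digits in a row."""
    return any(
        TEN_DIGITS.search(str(value))
        for field, value in record.items()
        if field not in EXCLUDED
    )


kept = [r for r in records if matches(r)]
print(len(kept))  # 1
```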
I do not know if there is a ready-made solution for this in Splunk, but I am looking for something pretty simple here.

Thanks!


cpetterborg
SplunkTrust

So, I'm going to make some assumptions (which I will explain in this post), but I think I know roughly what you want to do. There are some simple solutions to your problem that you might be able to use. Sorry this is so long, but I thought it worth explaining things as I understood them.

Assumption #1
You stated the following:

but I know for sure that "some-nums" field does not contain my numbers at all times

I'm going to assume that you meant something like:

but I know for sure that the "some-nums" field never contains numbers I care about

Assumption #2
I think that you also want to look for any field that might contain a large number, but you want to exclude those fields that you don't care about (see Assumption #1).

Assumption #3
(This is relevant for Solution #1 only) You just want a list of the numbers in the data and don't care what the field name is. I'm making this assumption only because you haven't said anything to the contrary and it seemed logical given what you described.

Assumption #4
(This is relevant for Solution #2 only) You only want to get the events that contain numbers you care about, because that is what the pseudo-search-code you wrote appears to describe.

Solution #1
Let's get rid of the values in the fields you don't care about with:

| makeresults | eval _raw="{ \"freetext\": \"this is freetext\", \"some-nums\": 1513513671, \"needed-nums\": 5156716983, \"other-nums\": 12345678901234567890 }" | rex mode=sed "s/(some-nums\": )\d*/\\1/" | rex max_match=0 "(?P<need_nums>\d{10,50})"

This produces a multivalued field called need_nums containing every run of 10 to 50 digits in the event, excluding the value of the some-nums field. You can exclude other fields by adding more rex mode=sed commands.
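To check what the two rex commands do to the sample event, here is a rough Python equivalent. It assumes Python's re module behaves like Splunk's PCRE engine for these simple patterns; count=1 mirrors sed's default of replacing only the first match (no g flag), and re.findall stands in for max_match=0.

```python
import re

# The same sample event used in the makeresults example above.
raw = ('{ "freetext": "this is freetext", "some-nums": 1513513671, '
       '"needed-nums": 5156716983, "other-nums": 12345678901234567890 }')

# Rough equivalent of: rex mode=sed "s/(some-nums\": )\d*/\\1/"
# Keep the field name and separator (group 1) and drop the digits after it.
blanked = re.sub(r'(some-nums": )\d*', r"\1", raw, count=1)

# Rough equivalent of: rex max_match=0 "(?P<need_nums>\d{10,50})"
# max_match=0 means "return every match", which findall does.
need_nums = re.findall(r"\d{10,50}", blanked)
print(need_nums)  # ['5156716983', '12345678901234567890']
```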

Solution #2
The following still returns the event, because needed-nums and other-nums still contain numbers of 10 or more digits after some-nums has been blanked:

| makeresults | eval _raw="{ \"freetext\": \"this is freetext\", \"some-nums\": 1513513671, \"needed-nums\": 5156716983, \"other-nums\": 12345678901234567890 }" | rex mode=sed "s/(some-nums\": )\d*/\\1/" | regex "\d{10,50}"

But the following drops the event: some-nums is the only value long enough to match, and the rex mode=sed command has already blanked it, while needed-nums (515671698) and other-nums (123456789) have only nine digits each and are too short to match \d{10,50}:

| makeresults | eval _raw="{ \"freetext\": \"this is freetext\", \"some-nums\": 1513513671, \"needed-nums\": 515671698, \"other-nums\": 123456789 }" | rex mode=sed "s/(some-nums\": )\d*/\\1/" | regex "\d{10,50}"

Against your real data, run the same rex and regex commands after your base search (for example, index=myindex) instead of makeresults; the example above only demonstrates the behavior.
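The event-filtering logic of Solution #2 can be sketched the same way in Python. This is an approximation rather than Splunk code, and keep_event is a hypothetical helper name: blank some-nums first, then keep the event only if a run of 10 to 50 digits remains anywhere in it.

```python
import re

SOME_NUMS = re.compile(r'(some-nums": )\d*')  # rex mode=sed target
LONG_NUMBER = re.compile(r"\d{10,50}")         # regex command pattern


def keep_event(raw):
    """Blank some-nums (like rex mode=sed), then test the rest (like regex)."""
    blanked = SOME_NUMS.sub(r"\1", raw, count=1)
    return LONG_NUMBER.search(blanked) is not None


# The two sample events from the makeresults examples above.
matching = ('{ "freetext": "this is freetext", "some-nums": 1513513671, '
            '"needed-nums": 5156716983, "other-nums": 12345678901234567890 }')
non_matching = ('{ "freetext": "this is freetext", "some-nums": 1513513671, '
                '"needed-nums": 515671698, "other-nums": 123456789 }')

print(keep_event(matching))      # True
print(keep_event(non_matching))  # False
```

Without the blanking step, the second event would match because of some-nums, which is exactly the case the sed command is there to suppress.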

Conclusion
Hopefully one of these two approaches will at least help you out in creating a solution.

funghorn
Explorer

Thank you for the tips. I managed to achieve my goal by excluding the irrelevant fields with rex mode=sed, replacing their numbers with a short marker such as "irrelevant". That is what I was looking for 🙂
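A sketch of this approach in Python, assuming a hypothetical tuple of fields to neutralize (some-nums plus other-nums from the earlier example) and a hypothetical neutralize helper. Each loop iteration plays the role of one rex mode=sed command. Wrapping the marker in double quotes is an extra choice made here so the event stays valid JSON; a bare word would work just as well for the regex step.

```python
import json
import re

# Hypothetical list of fields whose numbers should never be searched.
IRRELEVANT_FIELDS = ("some-nums", "other-nums")


def neutralize(raw, fields=IRRELEVANT_FIELDS, marker='"irrelevant"'):
    """Replace the numeric value of each listed field with a short marker.

    One re.sub per field mirrors one rex mode=sed command per field;
    re.escape keeps the hyphen in the field name literal.
    """
    for field in fields:
        pattern = r'("' + re.escape(field) + r'": )\d+'
        raw = re.sub(pattern, r"\g<1>" + marker, raw)
    return raw


sample = ('{ "freetext": "this is freetext", "some-nums": 1513513671, '
          '"needed-nums": 5156716983, "other-nums": 12345678901234567890 }')
cleaned = neutralize(sample)
print(re.findall(r"\d{10,50}", cleaned))  # ['5156716983']
print(json.loads(cleaned)["some-nums"])   # irrelevant
```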


cpetterborg
SplunkTrust

I thought that was your eventual goal. I'm glad I waited for clarification of your original question, and glad it helped.


inventsekar
SplunkTrust
  1. May we know why you want to exclude fields and search only a particular field? Is it for performance reasons? Is your search taking a long time to finish?
  2. As I understand it, you need not worry about excluding fields: just search for the fields you need and run rex on them. As Splunk training says, "inclusion search is better than exclusion search" (to find the students who passed, you don't search for "all students who did not fail"; you search for the students who passed).
  3. Could you give clearer details, with some sample data?

woodcock
Esteemed Legend

I am not at all sure that I understand what you mean, but maybe this:

index=myindex 'needed-nums'="*" | regex 'needed-nums'="\d{10}"
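The idea behind this search, keeping only events that have the needed-nums field and applying the regex to that field's value alone, looks like this in Python. It assumes the JSON fields have already been extracted (here with json.loads); the third event and the field_matches helper are hypothetical.

```python
import json
import re

events = [
    '{"freetext": "this is freetext", "some-nums": 1513513671, "needed-nums": 5156716983}',
    '{"freetext": "this is freetext", "some-nums": 1513513671}',
    '{"freetext": "this is freetext", "some-nums": 1513513671, "needed-nums": 515671698}',
]


def field_matches(raw, field="needed-nums", pattern=r"\d{10}"):
    """Keep the event only if `field` exists and its value matches `pattern`."""
    record = json.loads(raw)
    if field not in record:  # roughly: needed-nums="*"
        return False
    # roughly: regex needed-nums="\d{10}", applied to this one field only
    return re.search(pattern, str(record[field])) is not None


kept = [e for e in events if field_matches(e)]
print(len(kept))  # 1
```

some-nums is never examined, so it cannot produce a false match.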

DalJeanis
Legend

Let's suppose that your desired number is, say, between 50 and 65 digits long, somewhere in the _raw data.

Let's say there is an already-extracted field, foo, that may also contain numbers that large. Additionally, there is a field, bar, that also contains numbers that large, and you want to ignore both foo and bar.

Here is some run-anywhere code that shows how to make it happen...

| makeresults 
| eval _raw = "this one is too short 1234567890123456789012345678901234567890 this one is too long 1234567890123456789012345678901234567890123456789012345678901234567890 this one is foo:  123456123456123456123456123456123456123456123456123456123456 this one is bar: 12345671234567123456712345671234567123456712345671234567  and the one we want is here ->  987654321987654321987654321987654321987654321987654321"
| eval foo="123456123456123456123456123456123456123456123456123456123456"

| rename COMMENT as "The below will capture all words made up entirely of 50 to 65 digits."  
| rex field=_raw "\b(?<maybenumbers>\d{50,65})\b" max_match=0

| rename COMMENT as "The below will extract bar."  
| rex field=_raw "bar:\s+\b(?<bar>\d+)\b" 

| rename COMMENT as "The below will spread out all the various values that were captured, eliminate foo and bar, then put the remaining values back together."  
| streamstats count as recno
| mvexpand maybenumbers
| where maybenumbers!=foo AND maybenumbers!=bar
| stats values(*) as * by recno
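For readers who want to check the logic without a Splunk instance, the same capture, filter and regroup steps can be reproduced in Python. The digit strings are rebuilt from the run-anywhere _raw above; the comparison here is exact string equality, and stats values() is approximated with a sorted set of distinct values.

```python
import re

# Rebuild the run-anywhere _raw from the SPL example above.
too_short = "1234567890" * 4   # 40 digits
too_long = "1234567890" * 7    # 70 digits
foo = "123456" * 10            # 60 digits
bar_value = "1234567" * 8      # 56 digits
wanted = "987654321" * 6       # 54 digits
raw = (f"this one is too short {too_short} this one is too long {too_long} "
       f"this one is foo:  {foo} this one is bar: {bar_value}  "
       f"and the one we want is here ->  {wanted}")

# rex max_match=0 "\b(?<maybenumbers>\d{50,65})\b": every word of 50-65 digits.
maybenumbers = re.findall(r"\b(\d{50,65})\b", raw)

# rex "bar:\s+\b(?<bar>\d+)\b": pull bar out of the raw text.
bar_match = re.search(r"bar:\s+\b(\d+)\b", raw)
bar = bar_match.group(1) if bar_match else None

# mvexpand + where + stats values(): drop foo and bar, keep distinct leftovers.
remaining = sorted({n for n in maybenumbers if n != foo and n != bar})
print(remaining == [wanted])  # True
```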

cpetterborg
SplunkTrust

Please supply sample data. Without it, suggestions are stabs in the dark, and the solution may be very easy once we know more about the data.
