Splunk Search

How do underscores affect searches?

mihelic
Path Finder

While searching for log messages that contain the string "URIBL_", I got far fewer hits than when grepping the same log. I used the same daily log for both grep and Splunk.

Using grep:

grep -c URIBL_ /logs/an_input.log

18686

This was the search in Splunk:

index=mail source="/logs/an_input.log" uribl_

Instead of the 18,686 events it only returned 15.

I got the desired results by using the following search:

index=mail source="/logs/an_input.log" uribl_*

It correctly returns 18,686 events.

The search

index=mail source="/logs/an_input.log" uribl

returns the wanted results plus a few more, since the underscore no longer needs to be matched.

Among the fields in the log message are: URIBL_BLACK=1.5, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.5

I tried the search with the Field discovery turned on and off. The results were the same.
It looks like there is a difference in how Splunk reacts to the underscore in searches.
I had treated it as a regular letter or number and got the wrong results.

What does the underscore actually mean when it is used in a search, and how does it affect the search process?

1 Solution

Ayn
Legend

The reason you get hits for "uribl" but not "uribl_" is that "_" is one of the characters Splunk treats as a delimiter when dividing incoming data into individual segments to index. Basically, if you have an event that contains, say, the string "my_string_with_underscores", Splunk will create 5 segments out of this: "my", "string", "with", "underscores" and finally the whole string as well, "my_string_with_underscores". This way, if you search for "with", Splunk won't first have to retrieve ALL events and then do the equivalent of a grep to see which ones have the string "with" in them. Instead it can just check which events have the segment "with" in them. This is explained much better in the docs: http://docs.splunk.com/Documentation/Splunk/5.0/Data/Abouteventsegmentation

Also, the documentation for segmenters.conf shows you which default values are used: http://docs.splunk.com/Documentation/Splunk/5.0/Admin/Segmentersconf

So if you search for "URIBL_BLACK" you will get results, because that is a major segment. If you search for "URIBL" you will get results as well, because it's a minor segment in that string. If you search for "URIBL_" you will not get results, because it's neither a major nor a minor segment: the delimiter itself is never included in a minor segment. I hope that clears things up at least a bit rather than adding to the confusion 🙂
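To make the explanation above concrete, here is a rough Python sketch of the idea. It is only a toy model of what is described in this answer, not Splunk's actual indexer or search logic: the breaker sets are abbreviated assumptions (the real defaults are in segmenters.conf), the function names `index_terms` and `matches` are made up for illustration, and the sample event is a hypothetical line built from the field names in the question with the values left out.

```python
import re
from fnmatch import fnmatchcase

# Assumed, abbreviated breaker sets for illustration only;
# the real defaults are listed in segmenters.conf.
MAJOR_BREAKERS = re.compile(r"""[\s,;|!\[\]<>(){}'"&?+]+""")
MINOR_BREAKERS = re.compile(r"[/:=@.$#%\\_-]+")


def index_terms(event):
    """Return the lowercased major and minor segments of an event."""
    terms = set()
    for major in MAJOR_BREAKERS.split(event.lower()):
        if not major:
            continue
        terms.add(major)  # the whole token between major breakers
        for minor in MINOR_BREAKERS.split(major):
            if minor:
                terms.add(minor)  # the pieces between minor breakers
    return terms


def matches(event, search_term):
    """True if the term (optionally with * wildcards) equals an indexed segment."""
    pattern = search_term.lower()
    return any(fnmatchcase(term, pattern) for term in index_terms(event))


print(sorted(index_terms("my_string_with_underscores")))
# ['my', 'my_string_with_underscores', 'string', 'underscores', 'with']

# Hypothetical event using field names from the question, values omitted.
event = "URIBL_BLACK URIBL_DBL_SPAM URIBL_JP_SURBL"
print(matches(event, "URIBL_BLACK"))  # True: a major segment
print(matches(event, "uribl"))        # True: a minor segment
print(matches(event, "uribl_"))       # False: no segment ends with the delimiter
print(matches(event, "uribl_*"))      # True: the wildcard matches whole major segments
```

In this model, "uribl_" fails simply because no stored segment is equal to it, while "uribl_*" succeeds because the wildcard is compared against the complete major segments such as "uribl_black".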


mihelic
Path Finder

Your answer filled in the missing pieces of the puzzle. No confusion added 🙂
