I am trying to do a search for a number of strings that are hex encoded. For example, http would be stored as 68747470.
However, I am having an issue in that nothing is pulled up. For example, if I search:
wscript.exe 68747470
nothing pulls up. However, if I search:
wscript.exe 68747470*
it works. I suspect that, with the wildcard, the hex characters are being treated as a string. But since I have multiple strings to search for, I don't think wildcards would be very efficient.
Any suggestions for how to search for just the hex encoded string as stated?
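For anyone who needs to cover several strings at once, here is a minimal Python sketch that hex-encodes a list of keywords and joins them into one search string using the trailing-wildcard form that works above. The helper name hex_term and the keywords other than http are made up for illustration. Note that a trailing wildcard would only be expected to match where the hex run begins with the encoded keyword.

```python
def hex_term(text, wildcard=True):
    """Hex-encode text the way the events store it, optionally adding a trailing *."""
    encoded = text.encode("utf-8").hex()
    return encoded + "*" if wildcard else encoded

# Hypothetical keyword list; only "http" comes from the question.
keywords = ["http", "ftp", "cmd"]

# Combine the encoded terms with OR so a single search covers all of them.
terms = [hex_term(k) for k in keywords]
query = "wscript.exe (" + " OR ".join(terms) + ")"

print(query)  # wscript.exe (68747470* OR 667470* OR 636d64*)
```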
What do your raw events look like? (Paste a sample, and remember to mask any sensitive information.) My guess is that the issue is related to segmentation. If your raw event contains, say, "http://localhost.my.domain.com.", then after event segmentation Splunk creates many searchable segments like "http", "localhost", "my", "domain", etc. (the period being a segment delimiter). If your raw data contains the hex equivalent of "http://localhost.my.domain.com." as one continuous run of hex digits, the segmentation process may be treating it as one big segment, and thus you need to use a regex or the asterisk wildcard character.
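To illustrate the guess above, here is a rough Python sketch of the idea. It is not Splunk's actual segmentation logic: the breaker set, the function names, and the sample events are all assumptions made for the example. It only shows why an exact term can miss a long unbroken hex run while a trailing wildcard still hits it.

```python
import re

# Assumed, simplified breaker set; the real lists are configured in segmenters.conf.
BREAKERS = r"[\s./:=@#&?,;()\[\]-]+"

def segments(raw):
    """Split a raw event into lower-cased segments on the assumed breakers."""
    return {s.lower() for s in re.split(BREAKERS, raw) if s}

def term_matches(term, raw):
    """Exact segment match, or prefix match when the term ends with *."""
    segs = segments(raw)
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return any(s.startswith(prefix) for s in segs)
    return term.lower() in segs

# Made-up sample events: one plain, one with the URL hex-encoded.
plain = "wscript.exe http://localhost.my.domain.com"
hexed = "wscript.exe " + "http://localhost.my.domain.com".encode("ascii").hex()

print(term_matches("http", plain))       # True: "http" is its own segment
print(term_matches("68747470", hexed))   # False: the hex run is one long segment
print(term_matches("68747470*", hexed))  # True: the long segment starts with it
```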
Update: I found that if I use regex for this, it works. However, I would still love to hear any other suggestions.
@trevlix -
The way the indexes work, they store information about which directory pages each "word" is found on. In Rexx, each of these is called a "token": any set of letters surrounded by white space. In Splunk, my observation is that it seems to be any set of word characters surrounded by word boundaries, so as a regular expression it might be \b\w+\b.
(Update - @somesoni2 has kindly pointed out that the term used here in Splunk is "segment" and that this page https://docs.splunk.com/Documentation/SplunkCloud/6.6.1/Data/Abouteventsegmentation is an entree into the subject. The lists of major and minor breaking characters that demarcate segment boundaries are specified in segmenters.conf.)
When you search for http without an asterisk, you will get only those events where http is an entire token by itself, not the ones with https.
Is this the effect you are experiencing?
Overall, that is actually the opposite of what I experience. If I search for "http", then I get all strings that contain http, including https.
However, my issue here is that Splunk appears to be interpreting the hex string as a number in hex format, and not as a string of characters. I'm trying to figure out how to force Splunk to interpret it as a string. I have also surrounded the string with double and single quotes, to no effect.
Hmmmm. I just tested a general search on index=foo for records containing the first 6 letters of my userid (foobar) anywhere in the record, and there were none, whereas the entire userid (foobarz) or the first 6 letters plus * (foobar*) both yielded results.
That doesn't change your underlying issue, though, so I'll defer to wiser heads.