We are using a CSV input with indexed extractions, and some of the field values contain spaces.
Here is some walklex output showing the values captured in the .tsidx:
1887 2 product_categorization_tier_2::security systems
1888 3 product_categorization_tier_2::server systems
1889 1 product_categorization_tier_2::system software
However, all of the following search terms fail:
product_categorization_tier_2::security systems
product_categorization_tier_2::"security systems"
"product_categorization_tier_2::security systems"
but product_categorization_tier_2::security* works.
Where :: searches do work (on fields whose values contain no spaces), there is a noticeable improvement in search performance.
Looking at the Job Inspector, something seems to go wrong when the search term is translated into the remote search:
The search: search index=xxxxxx product_categorization_tier_2::"security systems"
Becomes: litsearch index=xxxxxx product_categorization_tier_2:: "security systems" |
Note the extra space after the double colon.
So, am I using the wrong syntax for values that contain spaces, or is this a bug? (Splunk 6.3.0)
Indexed terms are just that: terms. They may look like fields with a :: operator (name::value vs name=value), but they are not fields. They are simply explicitly formatted terms in the TSIDX, and within a given term a space is just a character like any other.
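When processing SPL, however, unescaped spaces separate terms, an unescaped quote begins a new term (or list of terms) even when no space precedes it, and the quotes themselves are ignored when searching TSIDX.
So Splunk interprets each of your non-working SPL examples as a search for either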
product_categorization_tier_2::security AND systems
or product_categorization_tier_2:: AND security AND systems
as follows:
product_categorization_tier_2::security systems
(2 terms) because spaces separate terms.
product_categorization_tier_2::"security systems"
(3 terms) because when Splunk sees unescaped quotes it assumes you intend to begin a new term (or list of terms), even without a space, and the quotes are ignored when searching TSIDX.
"product_categorization_tier_2::security systems"
(2 terms) because quotes are ignored when searching TSIDX.
product_categorization_tier_2::security*, on the other hand, matches any term beginning with product_categorization_tier_2::security, whether or not the remainder of the term contains spaces.
To do what you are after, use TERM(product_categorization_tier_2::security systems), which causes everything in the parentheses, including the space, to be treated as a single term.
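The parsing rules above can be illustrated with a toy model in Python. This is a hypothetical sketch, not Splunk's actual search parser: `toy_split_terms`, `toy_search` and the small `LEXICON` (the three walklex terms from the question plus two raw-text tokens assumed to exist) are invented for illustration, all terms are ANDed, and `*` is only modelled as a trailing wildcard.

```python
import re

# Hypothetical toy lexicon: the three indexed terms from the walklex output,
# plus "security" and "systems", assumed to exist as ordinary raw-text tokens.
LEXICON = {
    "product_categorization_tier_2::security systems",
    "product_categorization_tier_2::server systems",
    "product_categorization_tier_2::system software",
    "security",
    "systems",
}


def toy_split_terms(spl: str) -> list[str]:
    """Split SPL text into terms using only the simplified rules described
    above (toy model, not Splunk's real parser)."""
    terms = []
    # An unescaped quote starts a new term and the quote itself is dropped...
    for piece in spl.split('"'):
        # ...and whitespace separates terms within each piece.
        terms.extend(piece.split())
    return terms


def toy_terms(spl: str) -> list[str]:
    # TERM(...) keeps everything inside the parentheses, spaces included, as one term.
    match = re.fullmatch(r"TERM\((.*)\)", spl)
    if match:
        return [match.group(1)]
    return toy_split_terms(spl)


def term_in_lexicon(term: str, lexicon: set[str]) -> bool:
    # Only a trailing "*" is modelled: it matches any term with that prefix.
    if term.endswith("*"):
        prefix = term[:-1]
        return any(t.startswith(prefix) for t in lexicon)
    return term in lexicon


def toy_search(spl: str, lexicon: set[str] = LEXICON) -> bool:
    # All terms are ANDed together, so one missing term means no match.
    return all(term_in_lexicon(t, lexicon) for t in toy_terms(spl))


if __name__ == "__main__":
    for query in [
        'product_categorization_tier_2::security systems',
        'product_categorization_tier_2::"security systems"',
        '"product_categorization_tier_2::security systems"',
        'product_categorization_tier_2::security*',
        'TERM(product_categorization_tier_2::security systems)',
    ]:
        # Prints False for the first three queries and True for the last two.
        print(toy_search(query), toy_terms(query))
```

In this model the wildcard succeeds only because a prefix comparison ignores whatever follows, spaces included, which mirrors the explanation given for security*.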
In Splunk 7.0.3.1, using TERM() didn't return results, but escaping the space did:
product_categorization_tier_2::security\ systems
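If these searches are generated programmatically, a small helper can emit both forms reported in this thread. This is a hypothetical sketch: `indexed_term_searches` is not part of any Splunk SDK, and it assumes spaces are the only special characters in the field name and value, so it rejects quotes, backslashes and parentheses rather than guessing how Splunk would need them escaped.

```python
def indexed_term_searches(field: str, value: str) -> dict[str, str]:
    """Build two candidate SPL search terms for an indexed field::value pair
    (hypothetical helper; only spaces are handled)."""
    # Assumption: escaping rules for these characters are not covered here.
    for ch in '"\\()':
        if ch in field or ch in value:
            raise ValueError(f"unsupported character {ch!r}")
    term = f"{field}::{value}"
    return {
        # TERM() form from the earlier answer: keeps the space inside one term.
        "term": f"TERM({term})",
        # Form reported to work in Splunk 7.0.3.1: backslash-escape each space.
        "escaped": term.replace(" ", "\\ "),
    }


if __name__ == "__main__":
    searches = indexed_term_searches("product_categorization_tier_2", "security systems")
    print(searches["term"])     # TERM(product_categorization_tier_2::security systems)
    print(searches["escaped"])  # product_categorization_tier_2::security\ systems
```

Since the two reports disagree on which form works, it is worth trying both against your own Splunk version.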
Have you tried searching for product_categorization_tier_2="security systems"? That is field=value, as you already use for index=xxxxxx.
Yes, that works, but as noted in the question, where :: works (values with no spaces) the searches are significantly quicker than the equivalent search using =.
Interesting, I didn't know you could do that (other readers: see http://docs.splunk.com/Documentation/Splunk/6.3.3/Search/Usefieldstoretrieveevents and look for "double colon").
I can't get it working either. I tried escaping with
\" or "
I'm only getting very minor performance improvements when searching for fields like key::no_space: differences of milliseconds compared with key=value over millions of results. Are you seeing more dramatic improvements?
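Yes, and I think it comes down to the order of filtering relative to the application of calculated fields and lookups.
For a search that ends up returning 48,532 events (historical, so the data does not change between runs), the :: search does the following (Job Inspector columns: duration in seconds, component, invocations, input count, output count):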
1.99 command.search.calcfields 13 48,532 48,532
0.77 command.search.lookups 13 48,532 48,532
compared with the = search, which does:
2.58 command.search.calcfields 12 58,963 58,963
1.05 command.search.lookups 12 58,963 58,963
The more events the :: filter discards, the bigger the performance improvement. (I do appreciate that moving the lookups and field calculations into the search would probably yield the same results, but the value of that enrichment always being available is pretty high.)
It was more of a "does anyone know of an existing bug before I keep digging" kind of question. The next step is to copy some of the data and test on 6.3.2 before bothering support.
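As a back-of-the-envelope check on the figures above, a few lines of Python compare the two runs. The dictionary and function names are illustrative, and only the calcfields and lookups rows quoted here are included, so this says nothing about total search time.

```python
# Execution-cost figures copied from the Job Inspector rows above:
# seconds spent in each component, and the number of events it processed.
DOUBLE_COLON = {"calcfields": 1.99, "lookups": 0.77, "events": 48_532}
FIELD_EQUALS = {"calcfields": 2.58, "lookups": 1.05, "events": 58_963}


def enrichment_seconds(run: dict) -> float:
    # Combined time of the two enrichment components only, not the whole search.
    return run["calcfields"] + run["lookups"]


def relative_reduction(new: float, old: float) -> float:
    # Fraction by which `new` is smaller than `old`.
    return 1 - new / old


event_reduction = relative_reduction(DOUBLE_COLON["events"], FIELD_EQUALS["events"])
time_reduction = relative_reduction(
    enrichment_seconds(DOUBLE_COLON), enrichment_seconds(FIELD_EQUALS)
)

print(f"events reaching calcfields/lookups: {event_reduction:.1%} fewer")  # 17.7% fewer
print(f"calcfields + lookups time: {time_reduction:.1%} less")             # 24.0% less
```

With :: about 17.7% fewer events reach those components, and their combined time falls from 3.63 s to 2.76 s, roughly 24% less.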
Sometimes, yes: on some queries I have seen a 20% difference on 30-60 seconds of execution time. Looking at the Job Inspector, the key difference seems to be that the :: filter is applied before lookups and field calculations, so less work gets done overall. If you have no automatic lookups or calculated fields, that may be why we see such different results.
The more events the :: operation filters out, the bigger the performance gain.
For a historical search (so the same events are involved each time: 58,963 in scope, 48,532 matching), I see the :: search do:
1.99 command.search.calcfields 13 48,532 48,532
and
0.77 command.search.lookups 13 48,532 48,532
While the = search does:
2.58 command.search.calcfields 12 58,963 58,963
and
1.05 command.search.lookups 12 58,963 58,963
I need to try on 6.3.2 before raising a support ticket.