Why does my search fail when searching indexed extractions with double colons where the value contains spaces?

mevans292
New Member

We are using a CSV input, which generates indexed extractions, and some of the field values contain spaces.

Here is some walklex output showing the values captured in the .tsidx:

1887 2 product_categorization_tier_2::security systems
1888 3 product_categorization_tier_2::server systems
1889 1 product_categorization_tier_2::system software

However, all of the following searches fail:

product_categorization_tier_2::security systems
product_categorization_tier_2::"security systems"
"product_categorization_tier_2::security systems"

but product_categorization_tier_2::security* works.

Where the :: searches do work (in fields whose values have no spaces), there is a noticeable improvement in search performance.

Looking at the job inspector, something seems to go wrong when the search term is translated into the remote search:

The search: search index=xxxxxx product_categorization_tier_2::"security systems"

Becomes: litsearch index=xxxxxx product_categorization_tier_2:: "security systems" |

Note the extra space after the double colon.

So, am I using the wrong format for values that contain spaces, or is this a bug? (Splunk 6.3.0)

ybongart_splunk
Splunk Employee

Indexed terms are just that: terms. They may look like fields with a :: operator (name::value vs name=value), but they are not fields. They are just explicitly formatted terms in the TSIDX, and within a given term a space is a character like any other.

When processing SPL, however:

  • spaces separate terms
  • the quotes around a list of terms are ignored when searching TSIDX

So Splunk is interpreting each of your "non-working" SPL examples as a search for either product_categorization_tier_2::security AND systems or product_categorization_tier_2:: AND security AND systems, as follows:

  • product_categorization_tier_2::security systems (2 terms), because spaces separate terms
  • product_categorization_tier_2::"security systems" (3 terms), because when Splunk sees an unescaped quote it assumes you intend to begin a new (list of) term(s), even if there is no space before the quote, and the quotes themselves are ignored when searching TSIDX
  • "product_categorization_tier_2::security systems" (2 terms), because quotes are ignored when searching TSIDX
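A minimal Python sketch of the two rules above, showing where the 2/3/2 term counts come from. It is a simplified model written for illustration (split_terms is a hypothetical name), not Splunk's actual SPL lexer: a space ends the current term, and an unescaped quote also ends the current term and is then dropped.

```python
def split_terms(search: str) -> list[str]:
    """Simplified model of the rules above (NOT Splunk's real lexer):
    spaces separate terms, and a quote ends the current term and is
    itself dropped, because quotes are ignored when searching TSIDX."""
    terms: list[str] = []
    current: list[str] = []
    for ch in search:
        if ch in (" ", '"'):
            # Either character closes the term being built.
            if current:
                terms.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        terms.append("".join(current))
    return terms


examples = [
    'product_categorization_tier_2::security systems',
    'product_categorization_tier_2::"security systems"',
    '"product_categorization_tier_2::security systems"',
]
for search in examples:
    terms = split_terms(search)
    print(len(terms), terms)  # prints 2, 3 and 2 terms respectively
```

Note that the second example yields the bare term product_categorization_tier_2::, which matches the extra space the question spotted in the litsearch output.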

product_categorization_tier_2::security*, on the other hand, will match any term beginning with product_categorization_tier_2::security, whether or not the remainder of the term contains spaces.

To achieve what you are trying to do, use TERM(product_categorization_tier_2::security systems), which causes everything inside the parentheses to be treated as a single term, including the space.
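The same reasoning applied to the lexicon, as a sketch: the set below holds the terms from the walklex output in the question, an exact-membership check stands in for ANDing the search terms, and str.startswith stands in for the trailing wildcard. The helper names are hypothetical, and this models the behavior described in the answer rather than reproducing Splunk code.

```python
# Indexed terms copied from the walklex output in the question.
LEXICON = {
    "product_categorization_tier_2::security systems",
    "product_categorization_tier_2::server systems",
    "product_categorization_tier_2::system software",
}


def matches_all(terms: list[str], lexicon: set[str]) -> bool:
    """AND semantics: every search term must exist verbatim in the lexicon."""
    return all(term in lexicon for term in terms)


def prefix_matches(prefix: str, lexicon: set[str]) -> list[str]:
    """Wildcard semantics: 'prefix*' matches any term starting with prefix,
    including terms whose remainder contains spaces."""
    return sorted(t for t in lexicon if t.startswith(prefix))


# Term lists the answer derives for the failing searches
# (the first and third searches both reduce to the 2-term list).
failing = [
    ["product_categorization_tier_2::security", "systems"],
    ["product_categorization_tier_2::", "security", "systems"],
]
for terms in failing:
    print(terms, matches_all(terms, LEXICON))  # False for both

print(prefix_matches("product_categorization_tier_2::security", LEXICON))
# ['product_categorization_tier_2::security systems']

# TERM(...) keeps the space, producing one term that is in the lexicon.
print(matches_all(["product_categorization_tier_2::security systems"], LEXICON))  # True
```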

mschaaf
Path Finder

In Splunk 7.0.3.1, using TERM() didn't return results, but escaping the space with a backslash did:
product_categorization_tier_2::security\ systems
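A sketch of why the escaped space helps, assuming (as reported above for 7.0.3.1) that a backslash makes the following space part of the term instead of a separator. split_unescaped is a hypothetical helper modeling that reported behavior; it is not Splunk's lexer.

```python
def split_unescaped(search: str) -> list[str]:
    """Split on spaces, but keep a backslash-escaped character (such as an
    escaped space) inside the current term. Simplified model only."""
    terms: list[str] = []
    current: list[str] = []
    chars = iter(search)
    for ch in chars:
        if ch == "\\":
            # Keep the next character literally, even if it is a space.
            current.append(next(chars, ""))
        elif ch == " ":
            if current:
                terms.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        terms.append("".join(current))
    return terms


print(split_unescaped(r"product_categorization_tier_2::security\ systems"))
# ['product_categorization_tier_2::security systems']
```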

jplumsdaine22
Influencer

Have you tried searching for product_categorization_tier_2="security systems"? That is field=value, like you have for index=xxxxxx.

mevans292
New Member

Yes, that works, but as noted in the question, where :: works (no spaces in the values) the searches are significantly quicker than the equivalent search using =.

jplumsdaine22
Influencer

Interesting, I didn't know you could do that (other readers: see http://docs.splunk.com/Documentation/Splunk/6.3.3/Search/Usefieldstoretrieveevents and look for "double colon").

I can't get it working either. I tried escaping with \" or ", but no joy.

I'm only seeing very minor performance improvements when searching for fields like key::no_space: differences of milliseconds compared to key=value over millions of results. Are you seeing more dramatic improvements?

mevans292
New Member

Yes, and I think it comes down to the order of filtering relative to the application of calculated fields and lookups.

For a search that ends up returning 48,532 events (historical, so the data doesn't change between runs), the :: search shows:

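Duration (s)  Component                   Invocations  Input count  Output count
1.99          command.search.calcfields   13           48,532       48,532
0.77          command.search.lookups      13           48,532       48,532

Compared with the = search:

Duration (s)  Component                   Invocations  Input count  Output count
2.58          command.search.calcfields   12           58,963       58,963
1.05          command.search.lookups      12           58,963       58,963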

The more events the :: search discards, the bigger the performance improvement. (I do appreciate that moving the lookups and field calculations into the search itself would probably give the same result, but having that enrichment always available is very valuable.)
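To put the job inspector figures side by side, here is a small calculation of the relative reductions. The numbers are copied from the output quoted in this post; the variable and function names are only for illustration.

```python
# Figures copied from the job inspector output quoted in this post.
EVENTS_EQUALS, EVENTS_COLONS = 58_963, 48_532
CALCFIELDS = {"=": 2.58, "::": 1.99}  # seconds in command.search.calcfields
LOOKUPS = {"=": 1.05, "::": 0.77}     # seconds in command.search.lookups


def pct_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


events_saved = EVENTS_EQUALS - EVENTS_COLONS
print(events_saved, round(pct_reduction(EVENTS_EQUALS, EVENTS_COLONS), 1))  # 10431 17.7
print(round(pct_reduction(CALCFIELDS["="], CALCFIELDS["::"]), 1))  # 22.9
print(round(pct_reduction(LOOKUPS["="], LOOKUPS["::"]), 1))  # 26.7
```

With these figures, the :: search passes about 17.7% fewer events into calcfields and lookups, and those two components take about 22.9% and 26.7% less time respectively.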

It was more of a "does anyone know of an existing bug before I keep digging?" kind of question. The next step is to copy some of the data and test on 6.3.2 before contacting support.

mevans292
New Member

Sometimes. On some queries I have seen a 20% difference on 30-60 seconds of execution time. In the job inspector, the key difference seems to be that the :: filter is applied before lookups and field calculations, so less "work" gets done overall. If you have no automatic lookups or calculated fields, that may be why we see such different results.

The more events that the :: operation filters out, the bigger the performance gain.

For a historical search, so the same events are involved on every run (58,963 in scope, 48,532 matching), I see the :: search do:

Duration (s)  Component                   Invocations  Input count  Output count
1.99          command.search.calcfields   13           48,532       48,532
0.77          command.search.lookups      13           48,532       48,532

While the = search does:

Duration (s)  Component                   Invocations  Input count  Output count
2.58          command.search.calcfields   12           58,963       58,963
1.05          command.search.lookups      12           58,963       58,963

I need to try on 6.3.2 before raising a support ticket.
