Splunk Search

Bizarre bug with "head"

nick405060
Motivator

1) My boss goes to upload a small .csv to my indexer
2) My boss goes to search the .csv from my search head. Results are returned
3) My boss adds head 250 to the query and nothing else. Zero results are returned

This behavior does not occur when the exact same searches are copied and pasted onto the indexer. I can reproduce the same behavior and have restarted my search head without any luck. Obviously this is a bug that needs to be fixed, but I would also like to know why head would produce this behavior.

woodcock
Esteemed Legend

Try running your search with this at the end; is it still broken?

... |noop search_optimization=false
0 Karma

bojanjanisch
New Member

To clarify the issue a bit:

@nick405060 is running a distributed Splunk environment. He indexed a CSV file with no _time field and wanted to output the first 250 rows, but he always got 0 results. His search looked something like this:

search index=abc source=test.csv | head 250 | table a, b, c, d

We dug through search.log a bit and found an interesting entry like "SimpleResultsCombiner - 236 events were discarded due to a missing or invalid _time field".

However, when he changed the search to "index=abc source=test.csv | table a,b,c,d | head 250" he got the results. He also got results if the first search was executed on an indexer. But if he executed it on a search head, he never got results. So the issue must've originated somewhere in the distributed search mode.

So my guess is that if "head" is applied directly after a base search (with no reporting commands in between), it runs in centralized streaming mode (https://docs.splunk.com/Documentation/Splunk/7.3.2/SearchReference/Commandsbytype), which first calls the SimpleResultsCombiner to merge/sort events by the _time field. However, there is no _time field in the CSV, so the events get skipped. If we apply it on an indexer or after a reporting command, it'll use distributed streaming mode, which does not seem to call SimpleResultsCombiner. Or maybe it does, but without trying to merge/sort the results as events by _time, and therefore does not skip them.
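
To make the guess above concrete, here is a toy model in Python (not SPL). It is only a sketch of the hypothesis, not Splunk's actual implementation: combine_by_time, head, and table are hypothetical stand-ins I made up for illustration, and the assumption that the merge step drops rows lacking _time comes from the search.log entry, not from any Splunk source code.

```python
from typing import Dict, List

Event = Dict[str, str]


def combine_by_time(events: List[Event]) -> List[Event]:
    """Hypothetical stand-in for the merge step: keep only events that
    carry a _time value and sort them newest first."""
    kept = [e for e in events if e.get("_time")]
    return sorted(kept, key=lambda e: float(e["_time"]), reverse=True)


def head(events: List[Event], n: int) -> List[Event]:
    """Return the first n results."""
    return events[:n]


def table(events: List[Event], fields: List[str]) -> List[Event]:
    """Keep only the listed fields, like a plain results table."""
    return [{f: e.get(f, "") for f in fields} for e in events]


# 236 CSV rows with no _time field (count taken from the log message).
rows = [{"a": str(i), "b": "x"} for i in range(236)]

# Assumed search-head path: head right after the base search, so the
# rows are merged as events by _time first and all of them are dropped.
failing = table(head(combine_by_time(rows), 250), ["a", "b"])

# Assumed other path: rows are treated as plain results, no _time merge.
working = head(table(rows, ["a", "b"]), 250)

print(len(failing), len(working))  # prints "0 236"
```

If the guess is right, this is why only the ordering of head and table seemed to matter: in the model, the rows vanish before head ever sees them.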

Maybe there's some explanation from the Splunk devs regarding this issue and how we could avoid these in the future?

0 Karma

nick405060
Motivator

I think you may have totally missed the doubly tabled field mentioned in our conversation.

Clearly that was the root cause of the issue here, but

1) Why should a doubly tabled field break a head 250 search but not a non-headed search?
2) Why should a doubly tabled field break a SH head 250 search but not an IN head 250 search?

What you're saying here is likely related... but I do not understand how what you table affects what you are talking about

0 Karma

bojanjanisch
New Member

Can you explain the doubly tabled field more precisely? What does it look like?

0 Karma

nick405060
Motivator

| table a b c d d e f g
0 Karma

bojanjanisch
New Member

Ah okay... so when you fixed this, your search returned results?

0 Karma

nick405060
Motivator

Yep. It never broke the indexer though, and it didn't break a non-headed search

0 Karma

nick405060
Motivator

Nick 10:53 AM
it's because he tabled the same value twice (but I still don't know why head would be affected when the non-head results aren't)

Bojan Janisch 10:54 AM
To me it looks like a bug in SimpleResultsCombiner
Due to the fact that head is a centralized streaming command, the SimpleResultsCombiner is executed in order to merge event results... however although you are using index=... you are not getting events, but csv or tabled results...
If you apply head on an indexer, it runs in distributed streaming mode... meaning that SimpleResultsCombiner is not executed

Nick 10:58 AM
but you should be able to run head on a search head for an indexed csv, correct? i mean you're intended to upload csvs into an index.

Bojan Janisch 11:00 AM
Yes... even though... each event in an index should always have 4 fields... _time, sourcetype, source and host...
you need to make sure that your indexed csvs rows become events
Make sure that there is a _time column in your csv

Nick 11:04 AM
i still think this is a very unexpected behavior. it's a very simple thing that I did (upload csv and search) and even as a 2yr full-time splunk developer I didn't know i had to make my csvs events in order to use head
so people e.g. my boss would never know to do this

Bojan Janisch 11:06 AM
Yes, they could point out missing _time fields in your csv during the indexing process
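
A minimal sketch of the "make sure there is a _time column" advice from this chat, done as a preprocessing step in Python before the upload. The helper name add_time_column and the choice of a single epoch value for every row are assumptions for illustration; whether Splunk actually uses that column as the event timestamp depends on the sourcetype configuration, which is not shown anywhere in this thread.

```python
import csv
import io


def add_time_column(csv_text: str, epoch: float) -> str:
    """Return csv_text with a leading _time column.

    Rows that already have a non-empty _time keep it; every other row
    gets the given epoch (seconds since 1970). Hypothetical helper.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    original = reader.fieldnames or []
    fieldnames = ["_time"] + [f for f in original if f != "_time"]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if not row.get("_time"):
            row["_time"] = str(int(epoch))
        writer.writerow(row)
    return out.getvalue()


sample = "a,b\n1,2\n3,4\n"
print(add_time_column(sample, 1700000000), end="")
# prints:
# _time,a,b
# 1700000000,1,2
# 1700000000,3,4
```

Stamping every row with the same value is the simplest choice; if the rows have a real date column, pointing the timestamp at that would probably be better.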

@bojanjanisch

0 Karma

richgalloway
SplunkTrust

Please share the working and non-working queries.

---
If this reply helps you, Karma would be appreciated.
0 Karma

woodcock
Esteemed Legend

We need to see the EXACT search. I am suspicious of this claim. Perhaps he forgot the | before head 250?

0 Karma

nick405060
Motivator

Read the post. You shouldn't be suspicious of this claim. It's because tabling the same field twice in Splunk makes the world burn

0 Karma

bojanjanisch
New Member

I posted both searches, working and failing... I only changed the labels / field values, not the commands... If you wish to continue the research I can see if I can reproduce the bug... We simply avoid the non-working version

0 Karma