1) My boss goes to upload a small .csv to my indexer
2) My boss goes to search the .csv from my search head. Results are returned
3) My boss adds head 250 to the query and nothing else. Zero results are returned
This behavior does not occur when I copy and paste the exact same searches on the indexer. I can reproduce the behavior myself, and restarting my search head did not help. Obviously this is a bug that needs to be fixed, but I would also like to know why head would produce this behavior.
Try running your search with this at the end; is it still broken?
... |noop search_optimization=false
To clarify the issue a bit:
@nick405060 is running a distributed Splunk environment. He indexed a CSV file with no _time field and wanted to output the first 250 rows, but he always got 0 results. His search looked something like: search index=abc source=test.csv | head 250 | table a, b, c, d
We dug through search.log a bit and found an interesting entry like "SimpleResultsCombiner - 236 events were discarded due to a missing or invalid _time field".
However, when he changed the search to "index=abc source=test.csv | table a,b,c,d | head 250", he got results. He also got results when the first search was executed on an indexer, but never when it was executed on a search head. So the issue must have originated somewhere in distributed search.
So my guess is that if "head" is applied directly after a base search (with no reporting command in between), it runs as a centralized streaming command (https://docs.splunk.com/Documentation/Splunk/7.3.2/SearchReference/Commandsbytype), and the search head first calls SimpleResultsCombiner to merge/sort events by the _time field. There is no _time field in the CSV, so the events get skipped. If the search runs on an indexer, or head comes after a reporting command, it seems to use distributed streaming mode, which does not appear to call SimpleResultsCombiner. Or maybe it does, but without trying to merge/sort the results as events by _time, so nothing gets skipped.
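To make the guess above concrete, here is a toy model in Python. It is only a sketch of the hypothesis, not Splunk's actual implementation: combine_by_time, has_valid_time and head are hypothetical names, and events are modelled as plain dicts. The assumption is that a time-ordered merge on the search head drops anything without a usable _time before head ever sees it.

```python
import heapq

Event = dict  # hypothetical event representation: field name -> value


def has_valid_time(event: Event) -> bool:
    """Return True when the event carries a numeric _time field."""
    return isinstance(event.get("_time"), (int, float))


def combine_by_time(streams: list[list[Event]]) -> tuple[list[Event], int]:
    """Toy time-ordered merge (assumed behavior, not Splunk source code).

    Merges the per-indexer streams newest first and discards every event
    that has no valid _time. Returns (merged events, discarded count).
    """
    sorted_streams = []
    discarded = 0
    for stream in streams:
        valid = [e for e in stream if has_valid_time(e)]
        discarded += len(stream) - len(valid)
        # heapq.merge needs each input already ordered newest first.
        sorted_streams.append(
            sorted(valid, key=lambda e: e["_time"], reverse=True)
        )
    merged = list(
        heapq.merge(*sorted_streams, key=lambda e: e["_time"], reverse=True)
    )
    return merged, discarded


def head(events: list[Event], n: int) -> list[Event]:
    """Keep the first n results, like a head command with a limit of n."""
    return events[:n]


# 236 CSV rows without _time, as in the search.log entry quoted above.
rows = [{"a": i, "b": i * 2} for i in range(236)]

merged, discarded = combine_by_time([rows])
print(len(head(merged, 250)), discarded)  # 0 236
print(len(head(rows, 250)))               # 236 when no time merge happens
```

In this model the rows are never lost by head itself. They are lost by the merge step in front of it, which would match zero results on the search head and full results wherever that merge is skipped.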
Maybe there's some explanation from the Splunk devs regarding this issue and how we could avoid these in the future?
I think you may have totally missed the doubly tabled field mentioned in our conversation.
Clearly that was the root cause of the issue here, but
1) Why should a doubly tabled field break a head 250 search but not a non-headed search?
2) Why should a doubly tabled field break a SH head 250 search but not an IN head 250 search?
What you're saying here is likely related... but I do not understand how what you table affects what you are talking about
Can you explain the doubly tabled field more precisely? What does this look like?
| table a b c d d e f g
Ah okay... so when you fixed this, your search returned results?
Yep. It never broke the indexer though, and it didn't break a non-headed search
Nick 10:53 AM
it's because he tabled the same value twice. but I still don't know why head would affect it if the non-head results don't
Bojan Janisch 10:54 AM
To me it looks like a bug in SimpleResultsCombiner
Due to the fact that head is a centralized streaming command, the SimpleResultsCombiner is executed in order to merge event results... however although you are using index=... you are not getting events, but csv or tabled results...
If you apply head on an indexer, it runs in distributed streaming mode... meaning that SimpleResultsCombiner is not executed
Nick 10:58 AM
but you should be able to run head on a search head for an indexed csv, correct? i mean you're intended to upload csvs into an index.
Bojan Janisch 11:00 AM
Yes... even though... each event in an index should always have 4 fields... _time, sourcetype, source and host...
you need to make sure that your indexed csvs rows become events
Make sure that there is a _time column in your csv
Nick 11:04 AM
i still think this is a very unexpected behavior. it's a very simple thing that I did (upload csv and search) and even as a 2yr full-time splunk developer I didn't know i had to make my csvs events in order to use head
so people e.g. my boss would never know to do this
Bojan Janisch 11:06 AM
Yes they could point to missing _time fields in your csv during the indexing process
@bojanjanisch
Please share the working and non-working queries.
We need to see the EXACT search. I am suspicious of this claim. Perhaps he forgot the | before head 250?
Read the post. You shouldn't be suspicious of this claim. It's because tabling the same field twice in Splunk makes the world burn
I posted both searches, working and failing... I only changed the labels / field values, not the commands... If you wish to continue the research I can see if I can reproduce the bug... We simply avoid the non working version