Inconsistency between Splunk API vs GUI search results.
I am using the REST API. When I run a search language string through the REST API, once the job reports isDone the search endpoint shows the number of results and matching events (resultCount, eventCount). But when I run the exact same search string manually in the GUI, I get a different number of matching events.
Example: "search earliest=xxx latest=yyy sourcetype=zzz" via the REST API returns 100,000 matching events, but the same search string (without the 'search' keyword) in the GUI returns 300,000 matching events. The difference is big. I am not specifying any other search options; I am 100% sure of that.
Does anybody know why there is such a difference for the same search in REST and the GUI?
Thanks
UPDATE:
In the end it's both quite simple and confusing. When you're using the REST API and all you want is the count of events, you have to tack " | stats count" onto the end of your search. When the job is done, hit the /results endpoint and retrieve the value of the count field. Although the 'eventCount' property on the job looks like what you want, it will actually NOT BE ACCURATE. Once the job passes 100,000 events, and the search was submitted with the default of status_buckets=0, Splunk knows there is no point in continuing to run the search, so it 'finalizes' the search. Yes, you might argue that eventCount proceeding towards an accurate number amounts to meaningful progress, so why not continue the search anyway? I guess the official answer is that the properties on the job are really just meant as internal debugging information, and for canonical answers you should use the appropriate search language and get field values from the /results endpoint.
Anyway, when you run the same search in the flashtimeline view, the reason the search does not quietly auto-finalize when it passes 100,000 events is that the UI submits the search with status_buckets=300. Whenever status_buckets is greater than 0, Splunk has to summarize the field results (into at least one bucket), so it doesn't let the search self-finalize; instead it runs to completion so that the summaries it's building will be accurate.
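A minimal Python sketch of the " | stats count" approach described above: dispatch the job, poll until it is done, then read `count` from the first row of /results instead of trusting `eventCount`. It assumes a Splunk version whose REST API accepts `output_mode=json` (the 4.1 server in this thread answers in Atom/XML), the usual management port 8089, and basic authentication; `BASE_URL`, `count_events`, and the helper names are hypothetical. Only the standard library is used.

```python
import base64
import json
import ssl
import time
import urllib.parse
import urllib.request

# Hypothetical host; 8089 is Splunk's usual management port.
BASE_URL = "https://localhost:8089"


def build_count_search(search: str) -> str:
    """Make sure the string starts with 'search', then append '| stats count'."""
    search = search.strip()
    if not (search.startswith("search ") or search.startswith("|")):
        search = "search " + search
    return search + " | stats count"


def parse_count(results_json: dict) -> int:
    """Read the 'count' field from the first row returned by /results."""
    rows = results_json.get("results", [])
    return int(rows[0]["count"]) if rows else 0


def _get_json(url: str, user: str, password: str, data=None) -> dict:
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=data, headers={"Authorization": "Basic " + token})
    # Assumption: self-signed certificate on the management port, so verification
    # is disabled for this sketch only; don't do this against untrusted hosts.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)


def count_events(search: str, user: str, password: str, poll_secs: float = 1.0) -> int:
    # 1. Dispatch the job (form-encoded POST); the response carries the sid.
    body = urllib.parse.urlencode(
        {"search": build_count_search(search), "output_mode": "json"}
    ).encode()
    sid = _get_json(f"{BASE_URL}/services/search/jobs", user, password, data=body)["sid"]
    job_url = f"{BASE_URL}/services/search/jobs/{urllib.parse.quote(sid, safe='')}"
    # 2. Poll until the job is done; eventCount on the job is deliberately ignored.
    while True:
        job = _get_json(job_url + "?output_mode=json", user, password)
        if job["entry"][0]["content"].get("isDone") in (True, 1, "1", "true"):
            break
        time.sleep(poll_secs)
    # 3. Read the canonical answer from the /results endpoint.
    return parse_count(_get_json(job_url + "/results?output_mode=json", user, password))
```

Usage would be along the lines of `count_events('sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00', 'user1', 'secret')`. Keeping the search-string building and result parsing in separate functions makes them easy to check without a live server.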
ORIGINAL ANSWER:
There definitely shouldn't be a difference in the results. But since the UI itself uses the REST API to dispatch its searches, there must be a difference somewhere in the arguments being used.
Unfortunately, it's a POST that kicks off the jobs; otherwise the troubleshooting would be very simple, because we could just look in your splunkd_access log and read the arguments for ourselves.
I don't have any answers, but I have more questions. 😃
Are either the first events or the last events the same in both search results?
Maybe the time range is somehow being interpreted differently. If you go to 'inspect search job' in the UI or hit the jobs endpoint in the REST API, both jobs will have properties called earliestTime and latestTime. These represent the absolute-time equivalents of the time arguments you specified. Check that they are the same. Incidentally, it's not best practice to set earliest and latest in the search string when you're using the REST API; you can use the earliest_time and latest_time API args instead.
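A sketch of passing the time range as dispatch arguments instead of inline `earliest=`/`latest=`, plus a check that two jobs resolved to the same absolute range. The `earliest_time` and `latest_time` names match the `request` dictionary in the GUI job's inspector output; sending epoch seconds is an assumption worth verifying on your Splunk version, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def dispatch_params(search: str, earliest: datetime, latest: datetime, **extra: str) -> dict:
    """Form fields for POST /services/search/jobs with the time range as API args."""
    params = {
        "search": search,  # no earliest=/latest= inside the search string itself
        "earliest_time": str(int(earliest.timestamp())),  # epoch seconds (assumed accepted)
        "latest_time": str(int(latest.timestamp())),
    }
    params.update(extra)  # e.g. status_buckets="300"
    return params


def same_time_range(job_a: dict, job_b: dict) -> bool:
    """Compare the absolute bounds each job reports as earliestTime/latestTime."""
    return all(job_a.get(k) == job_b.get(k) for k in ("earliestTime", "latestTime"))


params = dispatch_params(
    'search sourcetype="bankapp"',
    datetime(2011, 5, 30, 2, 30, tzinfo=timezone.utc),
    datetime(2011, 5, 30, 6, 0, tzinfo=timezone.utc),
)
```

With these UTC times the two values come out as 1306722600 and 1306735200, the same numbers the jobs in this thread report as searchEarliestTime and searchLatestTime.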
How long do the searches take to complete? It's possible that a lower default threshold for auto-finalizing the search is somehow being applied in the API.
Is there anything special about that sourcetype? Was this sourcetype ever renamed? Does it happen with other sourcetypes as well?
Incidentally how are you determining the eventCount for both searches?
THANKS A LOT!!
Just adding status_buckets=<integer> to my API search's POST parameters solved the problem! 🙂
This just summarizes my answer: add " | stats count" to the end of your REST search, and when the job has finished, make a separate request to the /results endpoint and retrieve the value of the 'count' field from the first row. I know it seems complicated. The other way is to submit your search with status_buckets set to 1, but then the search will run MUCH slower and do tons of work that you don't need. It's FAR better to take the step into the world of the search language and start with " | stats count".
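For the status_buckets route (the one the original poster used, and the one the UI uses with a value of 300), a sketch of the dispatch fields and of reading eventCount back from the job entry. The `entry[0]["content"]` shape assumes `output_mode=json` on a newer Splunk; the helper names are hypothetical. This is the slower of the two options.

```python
def status_bucket_params(search: str, buckets: int = 1) -> dict:
    """Dispatch fields that stop the job from auto-finalizing early.

    Per the explanation in this thread, status_buckets > 0 makes Splunk build
    field summaries, so the job runs to completion and eventCount is complete.
    """
    if buckets < 1:
        raise ValueError("status_buckets must be greater than 0 for this approach")
    return {"search": search, "status_buckets": str(buckets), "output_mode": "json"}


def job_event_count(job_json: dict) -> int:
    """Pull eventCount out of GET /services/search/jobs/<sid>?output_mode=json."""
    return int(job_json["entry"][0]["content"]["eventCount"])
```

The trade-off is the one described above: you get a usable eventCount on the job, but Splunk does the summarization work for every bucket even though you only want a single number.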
Hello Nick, thanks for the reply.
I am adding the inspector output for both searches in case it gives us any clues: one from the API and the other from the GUI. I don't see any difference in the search string; the only difference is in the search providers, and I don't understand why it would use different providers when the search is run on a single platform. Anyway, the REST API job has more providers and fewer results (110k), while the GUI job has fewer providers and still more results (375k). The username doesn't matter: one user or different users, all get the same result. And no, the sourcetype was never changed, and the time zones are the same. The API search finishes relatively quickly (less than 30 seconds) compared to the GUI (about a minute).
Thanks!
GUI Search -
```
Search job properties
createTime 2011-06-01T07:01:16.000+00:00
cursorTime 2011-05-30T02:30:00.000+00:00
delegate None
diskUsage 0
doneProgress 1.0
dropCount 0
eai:acl {'sharing': 'global', 'perms': {'read': ['user1'], 'write': ['user1']}, 'app': 'search', 'modifiable': 'true', 'can_write': 'true', 'owner': 'user1'}
earliestTime 2011-05-30T02:30:00.000+00:00
eventAvailableCount 10000
eventCount 375218
eventFieldCount 26
eventIsStreaming True
eventIsTruncated False
eventSearch search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
eventSorting desc
isDone True
isFailed False
isFinalized False
isPaused False
isPreviewEnabled 1
isRealTimeSearch False
isSaved False
isSavedSearch False
isZombie False
keywords earliest::05/30/2011:02:30:00 latest::05/30/2011:06:00:00 sourcetype::bankapp
label None
latestTime 2011-05-30T06:00:00.000+00:00
messages {'info': ['Your timerange was substituted based on your search string', '[splunk-tx-a1p] Your timerange was substituted based on your search string', '[splunk-tx-a2p] Your timerange was substituted based on your search string', '[splunk-tx-a3p] Your timerange was substituted based on your search string', '[splunk-nc-a2p] Your timerange was substituted based on your search string', '[splunk-nc-a3p] Your timerange was substituted based on your search string'], 'warn': ['Unable to distribute to peer named splunk-nc-a1p:8089 at uri https://splunk-nc-a1p:8089 because peer has status = "Down".']}
modifiedTime 2011-06-01T07:18:56.000+00:00
performance {'dispatch.fetch': {'duration_secs': '20.058', 'invocations': '102'}, 'command.search.typer': {'duration_secs': '0.001', 'output_count': '0', 'input_count': '0', 'invocations': '1'}, 'dispatch.timeline': {'duration_secs': '47.979', 'invocations': '102'}, 'command.search.index': {'duration_secs': '0.001', 'invocations': '1'}, 'dispatch.preview': {'duration_secs': '0.101', 'invocations': '101'}, 'command.search.tags': {'duration_secs': '0.001', 'output_count': '0', 'input_count': '0', 'invocations': '1'}, 'command.search.filter': {'duration_secs': '0.001', 'invocations': '1'}, 'command.fields': {'duration_secs': '0.001', 'output_count': '0', 'input_count': '0', 'invocations': '1'}, 'command.search': {'duration_secs': '0.002', 'output_count': '0', 'input_count': '0', 'invocations': '2'}}
priority 5
remoteSearch litsearch ( "sourcetype::bankapp" ) _time>=1306722600.000 _time<1306735200.000 | litsearch sourcetype="bankapp" _time>=1306722600.000 _time<1306735200.000 | fields keepcolorder=t * "*" "host" "index" "source" "sourcetype" "splunk_server"
reportSearch None
request {'time_format': '%s.%Q', 'search': 'search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00', 'required_field_list': '*', 'max_count': '10000', 'ui_dispatch_app': 'search', 'latest_time': None, 'status_buckets': '300', 'ui_dispatch_view': 'flashtimeline', 'earliest_time': None, 'auto_cancel': '100'}
resultCount 10000
resultIsStreaming True
resultPreviewCount 10000
runDuration 73.526
scanCount 375218
search search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
searchEarliestTime 1306722600.000000000
searchLatestTime 1306735200.000000000
searchProviders ['splunk-tx-a1p', 'splunk-tx-a2p', 'splunk-tx-a3p', 'splunk-nc-a2p', 'splunk-nc-a3p', 'splunkn-nc-a1p']
sid 1306911674.727
statusBuckets 300
ttl 555
Server info: Splunk 4.1.3, splunksearch, Wed Jun 1 07:19:41 2011; User: user1
```
REST API search -
```
Splunk Atom Feed: search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
Updated: 2011-06-01T06:49:28.000+00:00 Splunk build:
search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
cursorTime 1970-01-01T00:00:00.000+00:00
delegate
diskUsage 0
doneProgress 1.00000
dropCount 0
eai:acl
    app search
    can_write true
    modifiable true
    owner user3
    perms
        read
            user3
        write
            user3
    sharing global
earliestTime 2011-05-30T02:30:00.000+00:00
eventAvailableCount 110902
eventCount 110902
eventFieldCount 0
eventIsStreaming 1
eventIsTruncated 0
eventSearch search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
eventSorting desc
isDone 1
isFailed 0
isFinalized 0
isPaused 0
isPreviewEnabled 0
isRealTimeSearch 0
isSaved 0
isSavedSearch 0
isZombie 0
keywords earliest::05/30/2011:02:30:00 latest::05/30/2011:06:00:00 sourcetype::bankapp
label
latestTime 2011-05-30T06:00:00.000+00:00
messages
    info
        Your timerange was substituted based on your search string
        [splunk-nc-a1p] Your timerange was substituted based on your search string
        [splunk-nc-a2p] Your timerange was substituted based on your search string
        [splunk-nc-a3p] Your timerange was substituted based on your search string
        [splunk-tx-a1p] Your timerange was substituted based on your search string
        [splunk-tx-a2p] Your timerange was substituted based on your search string
        [splunk-tx-a3p] Your timerange was substituted based on your search string
performance
    command.fields
        duration_secs 0.001
        input_count 0
        invocations 1
        output_count 0
    command.search
        duration_secs 0.002
        input_count 0
        invocations 2
        output_count 0
    command.search.filter
        duration_secs 0.001
        invocations 1
    command.search.index
        duration_secs 0.001
        invocations 1
    command.search.tags
        duration_secs 0.001
        input_count 0
        invocations 1
        output_count 0
    command.search.typer
        duration_secs 0.001
        input_count 0
        invocations 1
        output_count 0
    dispatch.fetch
        duration_secs 5.373
        invocations 71
    dispatch.timeline
        duration_secs 3.267
        invocations 71
priority 5
remoteSearch litsearch ( "sourcetype::bankapp" ) _time>=1306722600.000 _time<1306735200.000 | litsearch sourcetype="bankapp" _time>=1306722600.000 _time<1306735200.000 | fields keepcolorder=t "host" "index" "source" "sourcetype" "splunk_server"
reportSearch
request
    search search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
resultCount 110902
resultIsStreaming 1
resultPreviewCount 110902
runDuration 17.015000
scanCount 110902
searchEarliestTime 1306722600.000000000
searchLatestTime 1306735200.000000000
searchProviders
    splunk-nc-a1p
    splunk-nc-a2p
    splunk-nc-a3p
    splunk-tx-a1p
    splunk-tx-a2p
    splunk-tx-a3p
    splunkn-tx-a1p
sid 1306910951.708
statusBuckets 0
ttl 574
events - results - results_preview - timeline - summary - control:
2011-06-01T06:49:28.000+00:00 | user3
```
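To make differences like the ones between these two inspector dumps easier to spot, here is a small sketch that diffs job properties. The dictionaries hold a few values transcribed from the output above; the GUI reports booleans as True/False while the Atom feed uses 1/0, so they were normalized to integers by hand. The function name is hypothetical.

```python
def diff_job_properties(a: dict, b: dict) -> dict:
    """Return {property: (value_in_a, value_in_b)} for every property that differs."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


gui_job = {
    "earliestTime": "2011-05-30T02:30:00.000+00:00",
    "latestTime": "2011-05-30T06:00:00.000+00:00",
    "eventCount": 375218,
    "isPreviewEnabled": 1,
    "statusBuckets": 300,
}
rest_job = {
    "earliestTime": "2011-05-30T02:30:00.000+00:00",
    "latestTime": "2011-05-30T06:00:00.000+00:00",
    "eventCount": 110902,
    "isPreviewEnabled": 0,
    "statusBuckets": 0,
}

for prop, (gui_val, rest_val) in diff_job_properties(gui_job, rest_job).items():
    print(f"{prop}: GUI={gui_val} REST={rest_val}")
```

The time bounds drop out as identical, which leaves statusBuckets (300 vs 0) standing out next to the eventCount gap.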
Got it. I found out what was going on and updated my answer. See above.
I am also using the Splunk REST API to do summary indexing (adding a " | collect index=