Inconsistency between Splunk api vs GUI search results.

user121
Explorer

I am using the REST API. When I run a search string through the REST API, once isDone is true the search endpoint reports the number of results and matching events (resultCount, eventCount). But when I run the exact same search string manually in the GUI, I get a different number of matching events.
For example, "search earliest=xxx latest=yyy sourcetype=zzz" returns 100,000 matching events through the REST API, but the same search string (without the 'search' keyword) returns 300,000 matching events in the GUI. The difference is big. I am not specifying any other search options; I am 100% sure of that.

Does anybody know why there is such a difference for the same search on REST and GUI?

Thanks


sideview
SplunkTrust

UPDATE:

In the end it's both quite simple and confusing. When you're using the REST API and you're interested in the count of events and nothing more, you have to tack " | stats count" onto the end of your search. Then, when the job is done, hit the /results endpoint and retrieve the value of the count field.

Although the 'eventCount' property on the job looks like what you want, it will actually NOT BE ACCURATE. Once the job passes 100,000 events, if the search was submitted with the default of status_buckets=0, Splunk knows there is no point in continuing to run the search, so it 'finalizes' the search. Yes, you might argue that eventCount proceeding towards an accurate number amounts to meaningful progress, so why not continue the search anyway? I guess the official answer is that properties on the job are really just meant as internal debugging information, and for canonical answers you should use the appropriate search language and get field values from the /results endpoint.
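A minimal sketch of reading the count field out of a /results response for a search that ends in " | stats count". It assumes the results were requested with output_mode=json (a REST option that may not exist in a release as old as the 4.1.3 seen in this thread) and that the body has a top-level "results" list whose rows hold field values as strings; `count_from_results` is a hypothetical helper name.

```python
import json


def count_from_results(results_body: str) -> int:
    """Return the 'count' field of the first row of a /results response.

    Assumes a JSON body shaped like {"results": [{"count": "..."}], ...}.
    Field values are assumed to arrive as strings, so the value is converted to int.
    """
    payload = json.loads(results_body)
    rows = payload.get("results", [])
    if not rows:
        # No rows to read, for example because the job may not be done yet.
        raise ValueError("no result rows in /results response")
    return int(rows[0]["count"])


# Illustrative body only, not real Splunk output.
print(count_from_results('{"results": [{"count": "42"}]}'))  # 42
```

Reading the count from the search results, rather than from the job's eventCount property, is the point the answer makes: the results row is the canonical value.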

Anyway, when you run the same search in the flashtimeline view, the reason the search does not quietly auto-finalize when it passes 100,000 events is that the UI submits the search with status_buckets=300. Whenever status_buckets is greater than 0, Splunk has to summarize the field results (into at least one bucket), so it doesn't let the search self-finalize; instead it runs to completion so that the summaries it's building will be accurate.
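The difference between the two dispatches comes down to one POST argument. The sketch below builds the form-encoded body for a POST to /services/search/jobs with and without status_buckets; the endpoint path is the standard one but is not shown in this thread, and `dispatch_body` is a hypothetical helper. Treat it as an illustration of the argument, not a complete client.

```python
from urllib.parse import parse_qs, urlencode


def dispatch_body(search: str, status_buckets: int = 0) -> str:
    """Build the form-encoded body for a POST to /services/search/jobs.

    Leaving status_buckets out keeps the REST default (0); the flashtimeline
    view sent status_buckets=300 in the GUI job inspected in this thread.
    """
    params = {"search": search}
    if status_buckets > 0:
        params["status_buckets"] = str(status_buckets)
    return urlencode(params)


rest_style = parse_qs(dispatch_body('search sourcetype="bankapp"'))
ui_style = parse_qs(dispatch_body('search sourcetype="bankapp"', 300))
print(sorted(rest_style))          # ['search']
print(ui_style["status_buckets"])  # ['300']
```

As the follow-up below notes, setting status_buckets makes the search do much more work, so appending " | stats count" is the cheaper way to get an exact count.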

ORIGINAL ANSWER:

There definitely shouldn't be a difference in the results. But there definitely is a difference in the arguments being used at some level, simply because the UI itself uses the REST API to dispatch its searches.

Unfortunately it's a POST that kicks off the jobs, and POST arguments travel in the request body rather than the URL; otherwise the troubleshooting would be very simple, because we could just look in your splunkd_access log and read the arguments for ourselves.

I don't have any answers, but I have more questions. 😃

Are either the first events or the last events the same in both search results?

Maybe the time range is somehow being interpreted differently. If you go to 'Inspect search job' in the UI or hit the jobs endpoint in the REST API, both jobs will have properties called earliestTime and latestTime. These represent the absolute-time equivalents of the time arguments you specified. Check that they are the same. Incidentally, it's not best practice to set earliest and latest in the search string when you're using the REST API. You can use the earliest_time and latest_time API arguments instead.
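Passing the time bounds as dispatch arguments rather than in the search string could look like the sketch below. The argument names earliest_time and latest_time match the keys in the GUI job's request property later in this thread; sending them as epoch seconds is an assumption (that GUI job used time_format '%s.%Q'), and the helper names are hypothetical. The sample times are the ones used in this thread, which the job inspections report in UTC.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode


def epoch_string(moment: datetime) -> str:
    """Epoch seconds with millisecond precision, e.g. '1306722600.000'."""
    if moment.tzinfo is None:
        # A naive datetime would be interpreted in local time.
        raise ValueError("use a timezone-aware datetime")
    return f"{moment.timestamp():.3f}"


def dispatch_body_with_times(search: str, earliest: datetime, latest: datetime) -> str:
    """Form body that passes the time range as arguments, not in the search string."""
    return urlencode({
        "search": search,
        "earliest_time": epoch_string(earliest),  # assumed format: epoch seconds
        "latest_time": epoch_string(latest),
    })


body = dispatch_body_with_times(
    'search sourcetype="bankapp"',
    datetime(2011, 5, 30, 2, 30, tzinfo=timezone.utc),
    datetime(2011, 5, 30, 6, 0, tzinfo=timezone.utc),
)
print(parse_qs(body)["earliest_time"])  # ['1306722600.000']
```

The resulting values agree with the searchEarliestTime and searchLatestTime properties in both job inspections, which is a quick way to confirm the range was interpreted as intended.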

How long do the searches take to complete? It's possible that a lower default threshold for auto-finalizing the search is somehow being set in the API.

Is there anything special about that sourcetype? Was this sourcetype ever renamed? Does it happen with other sourcetypes as well?

Incidentally, how are you determining the eventCount for both searches?


user121
Explorer

THANKS A LOT!!
Just adding status_buckets=<integer> to my API search's POST parameters solved the problem! 🙂


sideview
SplunkTrust

This is just summarizing my answer, but: add "| stats count" onto the end of your REST search, and then, when the job has finished, make a separate request to the /results endpoint and retrieve the value of the 'count' field from the first row. I know it seems complicated. The other way is to submit your search with status_buckets set to 1, but then the search will run MUCH slower and do tons of work that you don't need. It's FAR better to take the step into the world of the search language and start with " | stats count".
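The two-request flow described in the previous post can be sketched with the standard library's urllib.request.Request objects, built here but not sent. The host is hypothetical; authentication, reading the sid from the POST response, and polling the job until isDone are left out; and the /services/search/jobs paths and the output_mode/count arguments are assumptions based on the usual Splunk REST layout rather than anything shown in this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; 8089 is the management port seen in the job messages in this thread.
BASE_URL = "https://splunk.example.com:8089"


def create_job_request(base_search: str) -> Request:
    """POST that dispatches the search with '| stats count' appended."""
    body = urlencode({"search": base_search + " | stats count"}).encode("utf-8")
    return Request(f"{BASE_URL}/services/search/jobs", data=body, method="POST")


def results_request(sid: str) -> Request:
    """GET for the first row of a finished job's /results."""
    query = urlencode({"output_mode": "json", "count": 1})
    return Request(f"{BASE_URL}/services/search/jobs/{sid}/results?{query}", method="GET")


dispatch = create_job_request('search sourcetype="bankapp"')
print(dispatch.get_method(), dispatch.full_url)
# POST https://splunk.example.com:8089/services/search/jobs
results = results_request("1306910951.708")  # sid taken from the REST job inspection
print(results.full_url)
```

Keeping the request construction separate from sending makes it easy to check exactly which arguments reach splunkd, which is the same question the original answer was trying to settle.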


user121
Explorer

Hello Nick, thanks for the reply.
I am adding the inspector output of both searches in case that gives us any clues: one from the API and the other from the GUI. I don't see any differences in the search string; the only difference is in the providers, and I don't understand why it would use different sources if the search is run on a single platform. Anyway, the REST API search has more sources and fewer results (110k), and the GUI search has fewer sources but more results (375k). The username doesn't matter; one user or different users all get the same result. And no, the sourcetype was never changed, and the timezones are also the same. The API search finishes relatively quickly (less than 30 seconds) compared to the GUI (about a minute).

Thanks!

GUI Search -

```
Search job properties

createTime  2011-06-01T07:01:16.000+00:00
cursorTime  2011-05-30T02:30:00.000+00:00
delegate    None
diskUsage   0
doneProgress    1.0
dropCount   0
eai:acl {'sharing': 'global', 'perms': {'read': ['user1'], 'write': ['user1']}, 'app': 'search', 'modifiable': 'true', 'can_write': 'true', 'owner': 'user1'}
earliestTime    2011-05-30T02:30:00.000+00:00
eventAvailableCount 10000
eventCount  375218
eventFieldCount 26
eventIsStreaming    True
eventIsTruncated    False
eventSearch search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
eventSorting    desc
isDone  True
isFailed    False
isFinalized False
isPaused    False
isPreviewEnabled    1
isRealTimeSearch    False
isSaved False
isSavedSearch   False
isZombie    False
keywords    earliest::05/30/2011:02:30:00 latest::05/30/2011:06:00:00 sourcetype::bankapp
label   None
latestTime  2011-05-30T06:00:00.000+00:00
messages    {'info': ['Your timerange was substituted based on your search string', '[splunk-tx-a1p] Your timerange was substituted based on your search string', '[splunk-tx-a2p] Your timerange was substituted based on your search string', '[splunk-tx-a3p] Your timerange was substituted based on your search string', '[splunk-nc-a2p] Your timerange was substituted based on your search string', '[splunk-nc-a3p] Your timerange was substituted based on your search string'], 'warn': ['Unable to distribute to peer named splunk-nc-a1p:8089 at uri https://splunk-nc-a1p:8089 because peer has status = "Down".']}
modifiedTime    2011-06-01T07:18:56.000+00:00
performance {'dispatch.fetch': {'duration_secs': '20.058', 'invocations': '102'}, 'command.search.typer': {'duration_secs': '0.001', 'output_count': '0', 'input_count': '0', 'invocations': '1'}, 'dispatch.timeline': {'duration_secs': '47.979', 'invocations': '102'}, 'command.search.index': {'duration_secs': '0.001', 'invocations': '1'}, 'dispatch.preview': {'duration_secs': '0.101', 'invocations': '101'}, 'command.search.tags': {'duration_secs': '0.001', 'output_count': '0', 'input_count': '0', 'invocations': '1'}, 'command.search.filter': {'duration_secs': '0.001', 'invocations': '1'}, 'command.fields': {'duration_secs': '0.001', 'output_count': '0', 'input_count': '0', 'invocations': '1'}, 'command.search': {'duration_secs': '0.002', 'output_count': '0', 'input_count': '0', 'invocations': '2'}}
priority    5
remoteSearch    litsearch ( "sourcetype::bankapp" ) _time>=1306722600.000 _time<1306735200.000 | litsearch sourcetype="bankapp" _time>=1306722600.000 _time<1306735200.000 | fields keepcolorder=t * "*" "host" "index" "source" "sourcetype" "splunk_server"
reportSearch    None
request {'time_format': '%s.%Q', 'search': 'search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00', 'required_field_list': '*', 'max_count': '10000', 'ui_dispatch_app': 'search', 'latest_time': None, 'status_buckets': '300', 'ui_dispatch_view': 'flashtimeline', 'earliest_time': None, 'auto_cancel': '100'}
resultCount 10000
resultIsStreaming   True
resultPreviewCount  10000
runDuration 73.526
scanCount   375218
search  search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
searchEarliestTime  1306722600.000000000
searchLatestTime    1306735200.000000000
searchProviders ['splunk-tx-a1p', 'splunk-tx-a2p', 'splunk-tx-a3p', 'splunk-nc-a2p', 'splunk-nc-a3p', 'splunkn-nc-a1p']
sid 1306911674.727
statusBuckets   300
ttl 555
Server info: Splunk 4.1.3, splunksearch, Wed Jun 1 07:19:41 2011; User: user1
```

REST API search -

```
Splunk Atom Feed: search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
Updated: 2011-06-01T06:49:28.000+00:00 Splunk build:
search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
cursorTime 1970-01-01T00:00:00.000+00:00
delegate
diskUsage 0
doneProgress 1.00000
dropCount 0
eai:acl
    app search
    can_write true
    modifiable true
    owner user3
    perms
        read
            user3
        write
            user3
    sharing global
earliestTime 2011-05-30T02:30:00.000+00:00
eventAvailableCount 110902
eventCount 110902
eventFieldCount 0
eventIsStreaming 1
eventIsTruncated 0
eventSearch search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
eventSorting desc
isDone 1
isFailed 0
isFinalized 0
isPaused 0
isPreviewEnabled 0
isRealTimeSearch 0
isSaved 0
isSavedSearch 0
isZombie 0
keywords earliest::05/30/2011:02:30:00 latest::05/30/2011:06:00:00 sourcetype::bankapp
label
latestTime 2011-05-30T06:00:00.000+00:00
messages
    info
        Your timerange was substituted based on your search string
        [splunk-nc-a1p] Your timerange was substituted based on your search string
        [splunk-nc-a2p] Your timerange was substituted based on your search string
        [splunk-nc-a3p] Your timerange was substituted based on your search string
        [splunk-tx-a1p] Your timerange was substituted based on your search string
        [splunk-tx-a2p] Your timerange was substituted based on your search string
        [splunk-tx-a3p] Your timerange was substituted based on your search string
performance
    command.fields
        duration_secs 0.001
        input_count 0
        invocations 1
        output_count 0
    command.search
        duration_secs 0.002
        input_count 0
        invocations 2
        output_count 0
    command.search.filter
        duration_secs 0.001
        invocations 1
    command.search.index
        duration_secs 0.001
        invocations 1
    command.search.tags
        duration_secs 0.001
        input_count 0
        invocations 1
        output_count 0
    command.search.typer
        duration_secs 0.001
        input_count 0
        invocations 1
        output_count 0
    dispatch.fetch
        duration_secs 5.373
        invocations 71
    dispatch.timeline
        duration_secs 3.267
        invocations 71
priority 5
remoteSearch litsearch ( "sourcetype::bankapp" ) _time>=1306722600.000 _time<1306735200.000 | litsearch sourcetype="bankapp" _time>=1306722600.000 _time<1306735200.000 | fields keepcolorder=t "host" "index" "source" "sourcetype" "splunk_server"
reportSearch
request
    search search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00
resultCount 110902
resultIsStreaming 1
resultPreviewCount 110902
runDuration 17.015000
scanCount 110902
searchEarliestTime 1306722600.000000000
searchLatestTime 1306735200.000000000
searchProviders
    splunk-nc-a1p
    splunk-nc-a2p
    splunk-nc-a3p
    splunk-tx-a1p
    splunk-tx-a2p
    splunk-tx-a3p
    splunkn-tx-a1p
sid 1306910951.708
statusBuckets 0
ttl 574
2011-06-01T06:49:28.000+00:00 | user3
```
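One way to compare the two inspections is to put the properties that matter side by side and keep only the ones that differ. The values below are copied from the two dumps above; `diff_props` is a hypothetical helper. With the search string and time bounds identical, what remains is statusBuckets (300 vs 0) and eventCount, which is the lead the accepted answer follows.

```python
# A handful of properties copied from the GUI and REST job inspections above.
SEARCH = 'search sourcetype="bankapp" earliest=05/30/2011:02:30:00 latest=05/30/2011:06:00:00'

gui_job = {
    "search": SEARCH,
    "searchEarliestTime": 1306722600.0,
    "searchLatestTime": 1306735200.0,
    "statusBuckets": 300,
    "eventCount": 375218,
}
rest_job = {
    "search": SEARCH,
    "searchEarliestTime": 1306722600.0,
    "searchLatestTime": 1306735200.0,
    "statusBuckets": 0,
    "eventCount": 110902,
}


def diff_props(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


print(diff_props(gui_job, rest_job))
# {'eventCount': (375218, 110902), 'statusBuckets': (300, 0)}
```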


sideview
SplunkTrust

Got it. I found out what was going on and updated my answer. See above.


jdunlea_splunk
Splunk Employee

I am also using the Splunk REST API to do summary indexing (adding " | collect index=" at the end of my search). Does what you're saying mean that when my search runs and encounters more than 100,000 rows, which it is then supposed to summarize and populate the summary index with, it will stop searching for events after 100,000 rows and only summarize the first 100,000 into the summary index?
