
is there a limit when using |loadjob?

ng87
Path Finder

I'm coming across something I find unusual. When using |loadjob sid events=true, I find that it does not load all the events from the previous search. For example, I had a search with around 2 million results. When I use |loadjob sid events=true, it only loads around 25K instead of the full result list. In other tests I got varying numbers of results loaded, from 10K up to 100K, but never the full result list.

Am I doing something wrong, or could this limit have been set by the Splunk admins?


inventsekar
Super Champion

For any given search, Splunk will only retain a limited number of raw events, i.e., the actual event data pulled out of the index by the search command. The results of a search -- what is produced by evaluating the entire search string -- are completely preserved, alongside field summary and timeline information. The amount of data retained in the job's *.csv.gz files is capped by settings in limits.conf.
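If you want to check what your cap is, the relevant knob typically lives in the [search] stanza of limits.conf. The sketch below shows the setting as I understand it; the exact name and default value may differ between Splunk versions, so verify against the spec file shipped with your installation (or ask your admin):

```
# Sketch of a limits.conf fragment -- verify names/defaults against
# $SPLUNK_HOME/etc/system/default/limits.conf on your version.
[search]
# Caps the number of events a search job keeps accessible after it
# finishes. Events beyond this cap are not returned by
# | loadjob <sid> events=true, even if the original search matched more.
max_count = 500000
```

Note that an admin may have lowered this (or related dispatch limits) from the default, which would explain seeing only ~25K events back from loadjob.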

To illustrate the necessity for this, consider this simple example:

search source="*apache_access.log" 200

If your index contains 2 billion events that match the search, storing all 2 billion events every time you run that search would consume the entire storage system in no time. Generally speaking, the 2 billion row data set is not what you're after -- it's the summarized or transformed version that is of interest.

Note that the limitation described here does not mean that Splunk cannot handle lots of events. The search language will process all events asked of it, but will abide by these practical safety controls and not cache all of the raw data.

For perspective, search for a word like computer in Google. Even though Google reports that there are 705,000,000 hits, you will only ever be able to access up to 1,000 results, because it is implausible to store that result set in its entirety (aside from the fact that nobody is actually going to look at 705M links).

------ from a similar post



ng87
Path Finder

That makes sense, thanks a lot.
