I'll admit I don't know much about your environment or your search, nor the exact particulars of these inner workings, so I have to hedge and note that this is a bit of an educated guess:
When I go to a URL such as https://mysplunk/en-US/app/myapp/search?sid=searchId, Splunk simply has to go to disk, pull the set of cached results, and display them to me. I suspect this is probably even optimized to pull back only the currently displayed page of results. That's a relatively simple operation, with all the results already sitting on disk.
Comparatively, if I execute a search such as | loadjob searchId, Splunk has to set up a search pipeline, copy the millions of results you said you had into that pipeline, write those millions of results out as the pipeline's output, tear the pipeline down, and then display the results back to me. That seems like it would be very dependent on the read/write disk speed and the memory of your search head. The more results, and the more fields per result, the original search has, the more the memory and disk requirements grow. Ideally you are doing some pre-statistics in your original search so that you save time overall, but of course that's a function of what your original and add-on searches are. (Specifying by saved search name just means there's an extra lookup to find the results of the latest execution of that search, which shouldn't add much time.)
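To make that concrete, here's roughly what the two ways of invoking loadjob look like; the job id and the saved-search name below are made up for illustration:

```
| loadjob 1607524509.1234

| loadjob savedsearch="admin:myapp:my_scheduled_search"
```

The first form loads the cached results of one specific job by its search id; the second pulls the results of the most recent run of the named scheduled search, which is the extra-lookup case I mentioned above.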
Ideally the cost of running | loadjob is cheaper than going back to the indexers and re-pulling the search results, but that depends on how you're splitting your original and add-on searches, and on the size and dimensions of the data set you're pulling with loadjob compared to the original. Does that seem to make sense? (And does anyone have more specifics than the hand-waving I'm doing here?)
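As a sketch of what I mean by pre-statistics (all index, field, and search names here are hypothetical): if the scheduled search already aggregates the raw events down, the add-on search only has to push a small aggregated result set through the loadjob pipeline instead of millions of raw events:

```
index=web sourcetype=access_combined
| stats count avg(response_time) as avg_rt by host
```

Then the add-on search loads and post-processes that much smaller set:

```
| loadjob savedsearch="admin:myapp:web_summary_by_host"
| where count > 1000
| sort - avg_rt
```

The aggregation happens once, on the indexers, at schedule time; every later loadjob only moves one row per host through the pipeline.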