Sub-Search in a join is truncating returned events at 50000

justinfranks
Path Finder

Hello all,

I have an issue with a search. Here is the search I am running:

source="*playbackinit.log" s=139 | join type=inner pid [search source="*playbackevent.log" s=139 e="start"]

I get 49,614 events returned and I would expect to see more. The job shows this message:

 [subsearch]: Subsearch produced 50000 results, truncating to maxout 50000

I have tried to change this behaviour in limits.conf. Here is an excerpt from my limits.conf file:

[join]
subsearch_maxout = 500000
subsearch_maxtime = 120
subsearch_timeout = 180

Nothing I change seems to affect this limit. Are there any suggestions I could try?

Regards,

Justin.
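To see why the truncation message costs results: an inner join can only match main-search events against the subsearch rows that survive the cap, so any pid whose start event falls past the first 50,000 rows silently drops out. The Python sketch below is a toy model of that behaviour, not Splunk code; the `inner_join` helper, its `maxout` parameter (standing in for `subsearch_maxout`) and the sample data are all hypothetical.

```python
# Toy model (not Splunk code) of an inner join whose subsearch is capped.
# All names and data here are hypothetical illustrations.


def inner_join(main_events, sub_events, key, maxout):
    """Join main_events to at most the first `maxout` sub_events on `key`."""
    kept = sub_events[:maxout]  # rows past the cap never reach the join
    lookup = {}
    for row in kept:
        lookup.setdefault(row[key], row)  # toy choice: first row per key wins
    joined = []
    for event in main_events:
        match = lookup.get(event[key])
        if match is not None:  # inner join: unmatched events are dropped
            joined.append({**event, **match})
    return joined


# Ten playback-init events and ten start events sharing pids 0..9.
main = [{"pid": i, "customer": f"c{i}"} for i in range(10)]
sub = [{"pid": i, "e": "start"} for i in range(10)]

print(len(inner_join(main, sub, "pid", maxout=10)))  # nothing truncated
print(len(inner_join(main, sub, "pid", maxout=6)))   # pids 6..9 are lost
```

Raising the cap only helps if the setting actually takes effect; removing the subsearch altogether, as the replies below suggest, sidesteps the cap entirely.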


cmerriman
Super Champion

I agree with @lguinn and @gvmorley that joins aren't the best method if you can avoid them:

 source="*playbackinit.log" s=139 OR (source="*playbackevent.log" e="start")|stats values(*) as * by pid

Something like this might get you what you want.
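Roughly what the stats version does: instead of joining two result sets, it pulls events from both logs in one search and collapses them into one row per pid, where each field holds that pid's distinct values. Because there is no subsearch, the 50,000-row subsearch cap never comes into play. A Python sketch of the grouping step, assuming events are plain dicts; the `customer` field and the sample data are hypothetical, and the sorting mimics values(), which returns distinct values in lexicographical order.

```python
from collections import defaultdict


def stats_values_by(events, by):
    """Toy model of `stats values(*) as * by <by>` (not Splunk code).

    Returns one row per value of `by`; every other field maps to the sorted
    list of its distinct values across that group's events.
    """
    groups = defaultdict(lambda: defaultdict(set))
    for event in events:
        key = event.get(by)
        if key is None:  # events without the by-field form no group
            continue
        for field, value in event.items():
            if field != by:
                groups[key][field].add(str(value))
    return {
        key: {field: sorted(values) for field, values in fields.items()}
        for key, fields in groups.items()
    }


# Hypothetical events from both logs; "customer" is an illustrative field name.
events = [
    {"pid": "p1", "source": "playbackinit.log", "customer": "alice"},
    {"pid": "p1", "source": "playbackevent.log", "e": "start"},
    {"pid": "p2", "source": "playbackinit.log", "customer": "bob"},
]

for pid, row in stats_values_by(events, "pid").items():
    print(pid, row)
```

Note that p2, which never started playback, still gets a row here; filtering those out is what the dc(source) step later in the thread is for.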

justinfranks
Path Finder

Hi cmerriman,

Thanks for the reply. I don't think that will give me what I want.

The logs are coming from streaming servers. The playbackinit log registers when a customer clicks on a video and a player ID (pid) is generated. If they actually click play, it is logged in the playbackevent log.

The catch is that the customer info is only logged in the playbackinit log, so to see who has clicked play on a video, I have done an inner join.

If there is a better way to do it, I'd love to hear it.

Cheers,

Justin

cmerriman
Super Champion

I might not be fully understanding, so sorry if I'm not. Both playbackinit.log and playbackevent.log have a pid that connects the two? The first log basically stores initialization information and the second stores playback information, and you only want to show the customers that have actually started content. My stats command above should give you everything: every field from both sources for every pid. All you would need to do is add dc(source) as sources into the stats command and then | search sources=2, which should leave only the pids that appear in both sources.

source="*playbackinit.log" s=139 OR (source="*playbackevent.log" e="start")|stats dc(source) as sources values(*) as * by pid|search sources=2
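The dc(source) trick is what turns the grouped rows into an inner join: a pid that only has a playbackinit.log event has one distinct source, while a pid that also has a start event has two. A self-contained Python model of `stats dc(source) as sources values(*) as * by pid | search sources=2`, not Splunk code; the `in_both_sources` helper, the `customer` field and the sample events are hypothetical.

```python
from collections import defaultdict


def in_both_sources(events, by="pid", required=2):
    """Toy model (not Splunk code) of
    stats dc(source) as sources values(*) as * by pid | search sources=2
    """
    groups = defaultdict(lambda: defaultdict(set))
    for event in events:
        key = event.get(by)
        if key is None:
            continue
        for field, value in event.items():
            if field != by:
                groups[key][field].add(value)

    result = {}
    for key, fields in groups.items():
        sources = len(fields.get("source", set()))  # dc(source) as sources
        if sources == required:                      # | search sources=2
            row = {field: sorted(values) for field, values in fields.items()}
            row["sources"] = sources
            result[key] = row
    return result


# Hypothetical sample: only p1 has both an init event and a start event.
events = [
    {"pid": "p1", "source": "playbackinit.log", "customer": "alice"},
    {"pid": "p1", "source": "playbackevent.log", "e": "start"},
    {"pid": "p2", "source": "playbackinit.log", "customer": "bob"},
]

started = in_both_sources(events)
print(sorted(started))             # pids that clicked play
print(started["p1"]["customer"])   # customer info carried over from the init log
```

One caveat the model makes visible: dc(source) counts distinct source strings, so if the same pid were ever logged in two playbackinit.log files with different full paths, that pid would also reach a count of 2 without any start event.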

justinfranks
Path Finder

Thanks for this! You are a lifesaver; this appears to have worked perfectly.

justinfranks
Path Finder

Thanks for following up!

I am having trouble getting this to work. Splunk does not seem to be including the results from the OR clause.

Running just the initial search yields 13,748 events over the last 30 days:

source="*playbackinit.log" s=139 OR (source="*playbackevent.log" s=139 e="start")

If I run the two halves separately, I get 13,748 and 8,156 events respectively.

Any idea where I am going wrong?

Cheers,

Justin

starcher
SplunkTrust
SplunkTrust

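Try this? I think your logical grouping may be off: (A AND B) OR (C AND D) is not the same as A AND B OR (C AND D). The search command evaluates OR before the implied AND, so without the outer parentheses your search is read as A AND (B OR (C AND D)), which can only ever match playbackinit.log events.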

(source="*playbackinit.log" s=139) OR (source="*playbackevent.log" s=139 e="start")
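A small model of the precedence problem, assuming the evaluation order the Splunk Search Manual documents for the search command (parentheses, then NOT, then OR, then the implied AND). The Python predicates are hypothetical stand-ins for the four search terms, and the sample events are made up.

```python
from fnmatch import fnmatchcase


def is_init(ev):   # A: source="*playbackinit.log"
    return fnmatchcase(ev["source"], "*playbackinit.log")


def is_s139(ev):   # B: s=139
    return ev.get("s") == "139"


def is_event(ev):  # C: source="*playbackevent.log"
    return fnmatchcase(ev["source"], "*playbackevent.log")


def is_start(ev):  # D: e="start"
    return ev.get("e") == "start"


def unparenthesized(ev):
    # A B OR (C B D) groups as A AND (B OR (C AND B AND D)), since OR binds first
    return is_init(ev) and (is_s139(ev) or (is_event(ev) and is_s139(ev) and is_start(ev)))


def parenthesized(ev):
    # (A B) OR (C B D) groups as (A AND B) OR (C AND B AND D)
    return (is_init(ev) and is_s139(ev)) or (is_event(ev) and is_s139(ev) and is_start(ev))


events = [  # made-up sample events
    {"source": "/logs/playbackinit.log", "s": "139", "pid": "p1"},
    {"source": "/logs/playbackevent.log", "s": "139", "e": "start", "pid": "p1"},
    {"source": "/logs/playbackinit.log", "s": "139", "pid": "p2"},
]

print(sum(unparenthesized(ev) for ev in events))  # only init events match
print(sum(parenthesized(ev) for ev in events))    # events from both logs match
```

This matches the symptom reported above: the unparenthesized search returned exactly the playbackinit.log count. With the grouping fixed, and over the same time range, the combined search should return the sum of the two separate counts, 13,748 + 8,156 = 21,904 events, since no event can come from both sources.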

justinfranks
Path Finder

Ahh, you are correct. That fixed the operator issue (i.e. me) 😛

lguinn2
Legend

As @gvmorley said, what exactly are you trying to see?

Will this search give you what you want?

(source="*playbackinit.log") OR (source="*playbackevent.log" e="start") s=139
| sort 0 pid _time

justinfranks
Path Finder

Hi lguinn,

I don't think it will; let me explain.

The logs are coming from streaming servers. The playbackinit log registers when a customer clicks on a video and a player ID (pid) is generated. If they actually click play, it is logged in the playbackevent log.

The catch is that the customer info is only logged in the playbackinit log, so to see who has clicked play on a video, I have done an inner join.

If there is a better way to do it, I'd love to hear it.

Cheers,

Justin

gvmorley
Contributor

Hi Justin,

Not an answer; I'm just curious what you're trying to achieve.

On the face of it, the search in your join would appear to be a subset of your main search. Adding "e=start" and using an inner join feels like a refinement of the first search.

Could you simply filter your main search in some way to get the results that you're looking for?

If you've got some example data and what results you're trying to get to, maybe there's a different way to resolve this one without the join at all?

justinfranks
Path Finder

Hi gvmorley,

I'd love to be able to simplify the search but I don't think I can.

The logs are coming from streaming servers. The playbackinit log registers when a customer clicks on a video and a player ID (pid) is generated. If they actually click play, it is logged in the playbackevent log.

The catch is that the customer info is only logged in the playbackinit log, so to see who has clicked play on a video, I have done an inner join.

If there is a better way to do it, I'd love to hear it.

Cheers,

Justin
