Am I bumping into limits issue with subsearch results?

castle1126
Communicator

Hi,

I have come across an issue similar to this one on Answers: (http://answers.splunk.com/questions/3092/cant-get-past-subsearch-limit). If I run a simple search (sourcetype=smtp | fields messageid) over a set amount of time, I get over 15,000 results. When I change this search to make it a subsearch:

index=smtp sourcetype=smtp [search index=smtp sourcetype=smtp rule=x | fields + messageid] | transaction messageid | search NOT ip=127.0.0.1 score<50 | table messageid ip envfrom envto subject

I end up getting the error "Error in 'UnifiedSearch': Unable to parse the 'The specified search is too large. Please try to simplify your search.' search."

The contents of limits.conf are:

[subsearch]
maxout = 100000

[format]
maxresults = 100000

After reading the link above, I tried appending "| format maxresults=20000" to the subsearch, but I continue to get this error.

The server is running Splunk 4.1.5 (64 bit on RHEL 5.4). What do you think I'm missing?

Thanks!

sideview
SplunkTrust

Well, what's happening is that the 15,000 rows returned by the subsearch are being turned into one gigantic OR clause:

( messageid=A OR messageid=B OR messageid=C OR ....)

and there are 15,000 terms in it. Since I don't see a dedup or a stats command in your search, you might just be specifying the same messageids over and over again. At any rate, past some rather large length, splunkd's search language parser gets a bit unhappy.

Tacking a | dedup messageid on the end of the subsearch might help a lot. Or it might help only a little. Or, if they're already unique, it won't help at all.
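
To make the effect of dedup concrete, here is a minimal Python sketch of the expansion described above. It is a simplified model, not Splunk's actual implementation: the function names `expand_subsearch` and `dedup` and the sample message IDs are hypothetical, and the real clause Splunk generates may be formatted differently.

```python
# Simplified model only: this is not how splunkd is actually implemented.

def expand_subsearch(rows, field="messageid"):
    """Render subsearch rows as one OR clause, one term per row."""
    terms = [f'{field}="{row[field]}"' for row in rows]
    return "( " + " OR ".join(terms) + " )"


def dedup(rows, field="messageid"):
    """Keep the first row for each distinct value, like `| dedup messageid`."""
    seen = set()
    unique = []
    for row in rows:
        if row[field] not in seen:
            seen.add(row[field])
            unique.append(row)
    return unique


# Made-up data: 15,000 rows that repeat only three distinct message IDs.
repeated = [{"messageid": f"msg{i % 3}"} for i in range(15000)]
# Made-up data: 15,000 rows that are already unique.
already_unique = [{"messageid": f"msg{i}"} for i in range(15000)]

print(len(expand_subsearch(repeated)))              # clause length before dedup
print(expand_subsearch(dedup(repeated)))            # three terms after dedup
print(len(dedup(already_unique)))                   # dedup removes nothing here
```

In this model the clause shrinks only when the rows repeat values; when every messageid is already distinct, dedup leaves the clause just as long.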

UPDATE: well from our comments below, you actually do have a ton of messageids, so you are indeed coming up against the subsearch limits. An alternative is to have a scheduled search running all the time that puts the rule="x"

My apologies for my previous suggestions, which were off base (and which I've deleted from this answer). This one may be equally unhelpful, but it seems you could use stats to create transactions for everything, and then do the rule="x" filtering afterwards.

index=smtp sourcetype=smtp | stats values(ip) as ip values(envfrom) as envfrom values(envto) as envto values(rule) as rule values(subject) as subject values(score) as score by messageid | search rule="x" NOT ip=127.0.0.1 score<50 | table messageid ip envfrom envto subject
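
The logic of that search can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample events are made up, the helper names `stats_values_by_messageid` and `keep` are hypothetical, and the filter assumes that a multivalue field matches when any of its values matches and that NOT excludes a row when any value matches.

```python
from collections import defaultdict

FIELDS = ("ip", "envfrom", "envto", "rule", "subject", "score")


def stats_values_by_messageid(events):
    """Collect the distinct values of each field per messageid."""
    grouped = defaultdict(lambda: {name: set() for name in FIELDS})
    for event in events:
        if "messageid" not in event:
            continue  # rows without the by-field are left out of the grouping
        for name in FIELDS:
            if name in event:
                grouped[event["messageid"]][name].add(event[name])
    return grouped


def keep(row):
    """Assumed reading of: rule="x" NOT ip=127.0.0.1 score<50."""
    return (
        "x" in row["rule"]
        and "127.0.0.1" not in row["ip"]
        and any(float(score) < 50 for score in row["score"])
    )


# Made-up events: each log line carries only some of the fields.
events = [
    {"messageid": "A", "ip": "10.0.0.5"},
    {"messageid": "A", "rule": "x"},
    {"messageid": "A", "score": "12", "subject": "hello"},
    {"messageid": "B", "ip": "127.0.0.1"},
    {"messageid": "B", "rule": "x"},
    {"messageid": "B", "score": "3"},
    {"messageid": "C", "ip": "10.0.0.9", "score": "7"},
]

grouped = stats_values_by_messageid(events)
matches = sorted(mid for mid, row in grouped.items() if keep(row))
print(matches)
```

The point of the design is that grouping happens first, so the rule line and the ip, subject and score lines end up in the same row before any filtering, and no list of messageids ever has to be passed through a subsearch.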

castle1126
Communicator

Nick, I've tried using the search that you added to your updated answer, but still no luck. I do not get the information required. Any other ideas?

castle1126
Communicator

Nick, unfortunately neither search works for me. The mail gateway we are using does not put all values in one record/line. Each line has a different value I need (from, to, ip, subject, etc.). The line that has the rule is a line unto itself, with none of the other fields I'm looking for. So the first search, "rule=x", never returns an IP, subject, etc. This is why I run the initial search as a subsearch and then feed the messageid field back to the outer search, where the transaction gives me the fields I'm looking for: from, to, etc.

sideview
SplunkTrust

OK. Well, backing up even further, I realized I can't even see why you need a subsearch at all. 😃 Check out my updated answer, which may help.

castle1126
Communicator

Hey Nick, what actually is returned from the subsearch is tens of thousands of rows - each row different from the others, with a unique messageid, different timestamps, etc. So running DEDUP won't help in this situation, unfortunately.

I'm wondering if I'm relegated to running the search over a smaller timeframe: in my case, not one search over a month-long window, but one search for each day of the month. Not the most efficient way of getting the output, but I'm not sure what else to do.
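
As a rough illustration of that workaround, the Python sketch below splits a 30-day range into one-day windows that could each drive a separate run of the search. The start date and the helper name `daily_windows` are hypothetical, and the sketch only computes the time ranges; it does not call Splunk. One caveat under this approach: a message whose log lines straddle a window boundary may be split between two runs.

```python
from datetime import datetime, timedelta


def daily_windows(start, days):
    """Yield (earliest, latest) pairs covering `days` consecutive days."""
    for offset in range(days):
        earliest = start + timedelta(days=offset)
        yield earliest, earliest + timedelta(days=1)


# Hypothetical start date, chosen only for the example.
windows = list(daily_windows(datetime(2010, 11, 1), 30))

# Show the first two windows; each one ends where the next begins.
for earliest, latest in windows[:2]:
    print(earliest.isoformat(), "to", latest.isoformat())
```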

sideview
SplunkTrust

No no, if there's literally only one messageid coming out, but tens of thousands of ROWS with that messageid, then the dedup will help IMMENSELY. You're asking splunkd to run a search that's just (messageid=A OR messageid=A OR messageid=A OR messageid=A OR ....). The dedup should solve your problem completely.

castle1126
Communicator

As you can see from the inner search, I'm only grabbing the messageid where the rule=x. So the messageid field coming out of the subsearch is unique, so a dedup won't help much.

One thing I've noticed is that running this search over a 30-day window (which is what the users here are looking for) causes the error. But if I break the search down to 4 hours or a day, things run OK.

castle1126
Communicator

Hey Nick, here's the search I was running:

index=smtp sourcetype=smtp [search index=smtp sourcetype=smtp rule=x | fields + messageid] | transaction messageid | search NOT ip=127.0.0.1 score<50 | table messageid ip envfrom envto subject
