Hi,
I have come across an issue similar to this one on Splunk Answers: http://answers.splunk.com/questions/3092/cant-get-past-subsearch-limit. If I run a simple search (sourcetype=smtp | fields messageid) over a set amount of time, I get over 15,000 results. When I change this search to use a subsearch:
index=smtp sourcetype=smtp [search index=smtp sourcetype=smtp rule=x | fields + messageid] | transaction messageid | search NOT ip=127.0.0.1 score<50 | table messageid ip envfrom envto subject
I end up getting the error "Error in 'UnifiedSearch': Unable to parse the 'The specified search is too large. Please try to simplify your search.' search."
The contents of limits.conf are:
[subsearch]
maxout = 100000

[format]
maxresults = 100000
After reading the link above I've tried appending | format maxresults=20000 to the subsearch, but I continue to get this error.
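For reference, here is what that attempted change looks like (the same search as above, with the format override placed at the end of the subsearch, inside the brackets):

index=smtp sourcetype=smtp [search index=smtp sourcetype=smtp rule=x | fields + messageid | format maxresults=20000] | transaction messageid | search NOT ip=127.0.0.1 score<50 | table messageid ip envfrom envto subject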
The server is running Splunk 4.1.5 (64 bit on RHEL 5.4). What do you think I'm missing?
Thanks!
Well, what's happening is that the 15,000 rows are being turned into a gigantic OR clause:
( messageid=A OR messageid=B OR messageid=C OR .... )
and there are 15,000 of them. Since I don't see a dedup or a stats command in your search, you might just be specifying the same messageids over and over again. At any rate, at some rather large length splunkd's search language parser gets a bit unhappy.
Tacking a | dedup messageid
onto the end of the subsearch might help a lot. Or it might help only a little. Or, if the messageids are already unique, it won't help at all.
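To illustrate, the dedup goes inside the subsearch brackets, so the OR clause is built only from unique messageids:

index=smtp sourcetype=smtp [search index=smtp sourcetype=smtp rule=x | fields + messageid | dedup messageid] | transaction messageid | search NOT ip=127.0.0.1 score<50 | table messageid ip envfrom envto subject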
UPDATE: From our comments below, you actually do have a ton of messageids, so you are indeed coming up against the subsearch limits. An alternative is to have a scheduled search running all the time that puts the rule="x"
My apologies for my previous suggestions, which were off base (and which I've deleted from this answer), and this one may be equally unhelpful, but it seems you could use stats to create the transactions for everything, and then do the rule="x" filtering afterwards.
index=smtp sourcetype=smtp | stats values(ip) as ip values(envfrom) as envfrom values(envto) as envto values(rule) as rule values(subject) as subject values(score) as score by messageid | search rule="x" NOT ip=127.0.0.1 score<50 | table messageid ip envfrom envto subject
Nick, I've tried using the search that you added to your updated answer, but still no luck. I do not get the information required. Any other ideas?
Nick, unfortunately neither search works for me. The mail gateway we are using does not put all the values in one record/line. Each line has a different value I need (from, to, ip, subject, etc.), and the line that has the rule is a line unto itself, with none of the other fields I'm looking for. So the initial search for rule=x never returns an IP, subject, etc. on its own. This is why I run the initial search as a subsearch and feed the messageid field back to do the transaction, which then gives me the fields I'm looking for: from, to, etc.
OK. Well, backing up even further, I realize I can't even see why you need a subsearch at all. 😃 Check out my updated answer, which may help.
Hey Nick, what's actually returned from the subsearch is tens of thousands of rows, each row different from the others, with a unique messageid, different timestamps, etc. So running dedup won't help in this situation, unfortunately.
I'm wondering if I'm relegated to running the search over a smaller timeframe: in my case, not running the search over a month window, but running one search for each day of the month. Not the most efficient way of getting the output, but I'm not sure what else to do.
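For what it's worth, a per-day version of the search would just constrain the time range with time modifiers (the modifiers shown here are an illustration, not something from this thread; the subsearch inherits the outer time range by default):

index=smtp sourcetype=smtp earliest=-1d@d latest=@d [search index=smtp sourcetype=smtp rule=x | fields + messageid] | transaction messageid | search NOT ip=127.0.0.1 score<50 | table messageid ip envfrom envto subject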
No no, if there's literally only one messageid coming out, but tens of thousands of ROWS with that messageid, then the dedup will help IMMENSELY. You're asking splunkd to run a search that's just (messageid=A OR messageid=A OR messageid=A OR messageid=A OR ....). The dedup should solve your problem completely.
As you can see from the inner search, I'm only grabbing the messageid where rule=x. So the messageid field coming out of the subsearch is already unique, and a dedup won't help much.
One thing I've noticed is that running this search over a 30-day window (which is what the users here are looking for) causes the error, but if I break the search down to 4 hours or a day, things run OK.
Hey Nick, here's the search I was running:
index=smtp sourcetype=smtp [search index=smtp sourcetype=smtp rule=x | fields + messageid] | transaction messageid | search NOT ip=127.0.0.1 score<50 | table messageid ip envfrom envto subject