Splunk Search

Enriching Proxy Log data containing userid and IP address with DNS & Active Directory attributes, how do we handle subsearch result limits?

ltrand
Contributor

So, fun problem:

We want to enrich our data so that we can build better reports. The plan is to take proxy logs, which contain a userid and an IP address, resolve the userid against AD to get the business group, and resolve the IP against DNS to get the proper hostname.

Issues:

1: Millions of proxy events per hour
2: Over 300k accounts
3: Over 200k endpoints

Because of this, subsearches fail (limits reached), an inline ldapsearch fails, and an inline dnslookup fails because of max events. I've thought of chunking it, but I still come back to the fact that the subsearch data is too large for the limits. I would almost prefer to just append this data to the events in the proxy logs index, but I don't know if that is possible.

Current search logic:

sourcetype=proxy_logs user!=""
| fields user, category, src, dest, http_referrer, url
| join user [search sourcetype="ActiveDirectory" | fields sAMAccountName, displayName, company, department | rename sAMAccountName AS user ]
| stats values(displayName) AS "Display Name" values(company) AS Company values(department) AS Dept values(category) AS Category values(src) AS Source values(dest) AS DEST count(_raw) AS "URL Count" values(http_referrer) AS Referrer values(url) AS URL by user

I've thought about doing the proxy search as the subsearch, but the logs can get just as large and fail on max returns. I figure if I solve the AD problem then the DNS enrichment will use the same logic.

So, does anyone have thoughts on how to solve this Gordian knot? Do I just need to live with increasing the limits? Or am I thinking about the problem wrong? I figure that whatever the answer is, I'll have to chunk it and write it to a summary index to do any kind of reporting, but I need to get the fields in there first.
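
For illustration, the chunk-and-summarize step could be sketched roughly as below. This assumes a scheduled hourly search and a summary index named summary_proxy; both are hypothetical, and the enrichment fields would still need to be added before the collect.

```
sourcetype=proxy_logs user!="" earliest=-1h@h latest=@h
| stats count AS url_count values(category) AS category values(dest) AS dest by user, src
| collect index=summary_proxy
```

Reports would then run against index=summary_proxy instead of the raw proxy events.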

Thanks everyone!


woodcock
Esteemed Legend

Try this:

(sourcetype=proxy_logs user!="") OR (sourcetype="ActiveDirectory")
| eval user=coalesce(user, sAMAccountName)
| fields user, category, src, dest, http_referrer, url, displayName, company, department
| stats values(displayName) AS "Display Name" values(company) AS Company values(department) AS Dept values(category) AS Category values(src) AS Source values(dest) AS DEST count(_raw) AS "URL Count" values(http_referrer) AS Referrer values(url) AS URL by user
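
The same pattern should carry over to the DNS side: pull both datasets in one base search, normalize the key with coalesce, and let stats merge them, so no subsearch limit is involved. A minimal sketch, assuming the DNS records are indexed under a sourcetype named dns_records with fields ip and hostname (all three names are hypothetical; substitute whatever your DNS data actually uses):

```
(sourcetype=proxy_logs user!="") OR (sourcetype="dns_records")
| eval src=coalesce(src, ip)
| stats values(hostname) AS Hostname values(user) AS User count(url) AS "URL Count" by src
| where 'URL Count' > 0
```

Here count(url) only counts events that have a url field, so the final where drops addresses that appear in DNS but never in the proxy logs. Note that with this approach the AD or DNS events have to fall inside the search time range, since they come from the same base search as the proxy events.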

ltrand
Contributor

That works awesome! Thanks!
