We use SA-ldapsearch to pull Active Directory data into the ES Assets & Identity framework. We do not currently ingest DHCP logs, but the IP address last seen for an AD computer is pulled in as part of the ldapsearch lookup gen search (below). Having recently updated to ES 6 and Splunk 8, I'm noticing that workstations are being combined in the Asset KV stores (assets_by_str) if they share an IP address. Since IP addresses change at different times and many of our users work from home with or without VPN, this is a common occurrence. It leads to ridiculous results during investigations: the "source_hostname" ends up being mapped from the source (DHCP) IP address in the search result to an MV field of 50-60 hostnames, all of which had that IP address at some point in the past.
I know that I can turn Asset correlation OFF in the ES configuration for Data Enrichment, but I don't want that, since hostnames are accurately resolved to user identities in many cases; also, old data is better than no data. I have considered dropping the IP address from the lookup gen search (below) for any asset in one of our DHCP ranges, but what I'm really looking for is a best practice. Is Splunk ES 6 designed to handle DHCP in some other way I'm not seeing? If not, this change seems asinine. No one could ever want the asset data for DHCP endpoints to be handled in this way.
| ldapsearch domain=default search="(&(objectClass=computer))" | eval city="" | eval country="US" | eval priority="medium" | eval category="normal" | eval dns=dNSHostName | eval owner=description | rex field=sAMAccountName mode=sed "s/\$//g" | eval nt_host=sAMAccountName | makemv delim="," dn | rex field=dn "(OU|CN)\=(?<bunit>.+)" | eval requires_av="true" | eval should_update="true" | lookup dnslookup clienthost as dns OUTPUT clientip as ip | join managedBy [| ldapsearch search="(&(objectClass=user))" | rename distinguishedName AS managedBy, sAMAccountName AS managed_by_user | table managedBy managed_by_user] | table ip,mac,nt_host,dns,owner,managed_by_user,priority,lat,long,city,country,bunit,category,pci_domain,is_expected,should_timesync,should_update,requires_av | outputlookup ad_assets.csv
Hi. Sorry for the double post. Did you find any good solution to this problem? If I understand correctly from your example search above, you still include ip in the result table, but the intention was to exclude IP?
I think that in our case, the problem is that the IP field is populated from AD, not an actual DHCP server. It's just the AD server's internal DNS record for the AD hostname, and since these are DHCP IPs managed by another service, there are often duplicates. Since ip is a key field, the new behavior is for ES to indiscriminately combine records with the same key fields. My monster correlated host records came from this process repeating over and over for several months.
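To see how many source rows would collide on ip before ES merges them, something like the following could be run against the lookup the generator search writes. This is only a sketch: it assumes the ad_assets.csv file from the original search and that ip holds a single value per row.

```
| inputlookup ad_assets.csv
| where isnotnull(ip) AND ip!=""
| stats dc(nt_host) AS host_count values(nt_host) AS hosts BY ip
| where host_count > 1
| sort - host_count
```

Any ip that comes back with host_count greater than 1 is a candidate for the merge behavior described above.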
I got a tip from someone at Splunk out of band that this might get cleaned up in 6.1.1, so I have largely set the issue aside until we can upgrade.
If it turns out this is just the new behavior for asset correlation, I'm just going to add some logic to the lookup generator search which uses ldapsearch to drop the ip field from any asset in a DHCP subnet.
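A minimal sketch of that logic, assuming two hypothetical DHCP ranges (10.20.0.0/16 and 192.168.50.0/24, placeholders to be replaced with your own subnets). It would go right after the dnslookup step in the generator search, before the final table and outputlookup:

```
| lookup dnslookup clienthost AS dns OUTPUT clientip AS ip
| eval ip=if(cidrmatch("10.20.0.0/16", ip) OR cidrmatch("192.168.50.0/24", ip), "", ip)
```

The asset row itself is kept, so nt_host and dns can still be used for correlation; only the ip key is blanked for hosts in the DHCP ranges.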
This sounds very much like the issue we are experiencing as well. Thanks for the tip. I'll see if the asset lists improve after upgrading to 6.1.1. Fingers crossed!
I would assume this is a problem a lot of people are having, especially now when so many work from home. Hopefully it will be addressed by Splunk soon.
Actually, upgrading to Splunk ES 6.1.1 seems to have solved the problem for us, at least partially. The huge multivalue asset rows in asset_lookup_by_str are gone, but there are still some smaller multivalue rows. However, I suspect that these remaining rows are caused by something other than DHCP, perhaps issues in the CMDB.
Did upgrading to 6.1.1 work for you as well?
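For anyone who wants to compare, a rough way to list the rows that are still merged is sketched below. It assumes the merged values show up as multivalue fields in asset_lookup_by_str, and the choice of columns in the final table is just for illustration.

```
| inputlookup asset_lookup_by_str
| eval host_count=mvcount(nt_host)
| where host_count > 1
| sort - host_count
| table nt_host ip mac dns host_count
```

Comparing the top of this list before and after an upgrade should show whether the very large merged rows have gone away.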
@stroud_bc - depending on the version of ES you're running, we optimized some of the merge code recently, but if any of the "key" fields match across rows, we squash those records into one by default. That said, we noticed some source lookup files were filling in "null" values with strings, which caused inadvertent merges. Double-check that the key fields (nt_host, ip, mac and dns) really do contain empty strings rather than placeholder values like "n/a", "none", etc.
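As a sketch of that cleanup, the following could be added to a lookup generator search before the outputlookup step. The list of placeholder strings (n/a, none, null, unknown) is an assumption and should be adjusted to whatever your sources actually emit:

```
| foreach nt_host ip mac dns
    [ eval <<FIELD>>=if(match('<<FIELD>>', "(?i)^(n/a|none|null|unknown)$"), "", '<<FIELD>>') ]
```

Each key field is compared case-insensitively against the placeholder list and blanked on a match, so two unrelated assets no longer share a value such as "n/a" in a key field.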
We seem to be having the same problem as OP. At first we thought that the assets were merged due to the mac field being an empty string and not actually NULL, but as far as I can tell from your post this is intentional. Fields with no value in the asset list are supposed to have an empty string, and not NULL?
If so, probably our problem is due to DHCP, and not empty strings.