All Apps and Add-ons

How do you use a wildcard source and the join command to consolidate fields with common values?

versi
New Member

I'm new to Splunk and I'm trying to figure out how to merge five different fields, containing an IP address, as the only value together. I want it to overwrite the duplicate data but retain any unique data when consolidating the rows.

My source data is using a wildcard, I've looked at the join function, but I cant seem to quite get my head around how to use it with a wildcard as the source.

This is what I have so far, it dedups my IP addresses but it also needs to merge the data between the different IP address fields together. Otherwise I get too many matches.

source="dataset:*" | dedup device_ip_address keepempty=true | dedup end_host_ip_address keepempty=true | dedup switch_ip_address keepempty=true | dedup ip_address keepempty=true | dedup IPADDR keepempty=true | table device_ip_address device_model end_host_ip_address end_host_mac_address end_host_name ip_address interface_admin_status interface_description IPADDR  switch_interface switch_ip_address switch_model switch_name

I want to just consolidate the data using the ip address fields as the common denominator. Also there may be more ip addresses in one field than another, i want to retain that data too.

Hope I was clear enough!

0 Karma

versi
New Member

Here are two solutions I got with Daljeanis and maciep help.

First solution

source="devices:*"
| fields IP_field1, IP_field2, IP_field3 switch_name interface_status
| eval myIPlist = mvdedup(mvappend(IP_field1, IP_field2, IP_field3))
| eval myMAC = mvdedup(mvappend(MAC1, MAC2))
| fields myIPlist switch_name interface_status
| stats values(myMAC) as myMAC, latest(*) as * by myIPlist
| table myIPlist myMAC switch_name interface_status

And here is the second solution

source="devices:*"
| eval myIPlist = coalesce(IP_field1, IP_field2, IP_field3)
| eval myMAC = coalesce(MAC1, MAC2)
| stats values(myMAC) as myMAC, latest(*) as * by myIPlist
| table myIPlist myMAC switch_name interface_status

Chose the one that works best for you

0 Karma

DalJeanis
Legend

You can't necessarily do the dedup on multiple fields without losing data. Try something like this...

your search that gets all the records
| fields ip1 ip2 ip3 ip4... data1 data2 data3 data4 ...
| eval myIPlist = mvdedup(mvappend(ip1,ip2,ip3,ip4))
| fields myIPlist ... whatever other non-IP fields you need to keep ...
| stats count values(*) as * by myIPlist

In the above, ip1...ip4 represent any field you might have an IP address in, whatever their names might be, and data1... data4 represent any other data you want to keep, whatever their names might be.

0 Karma

versi
New Member

This seems to be the same as doing what maciep suggested
| eval myIPlist = coalesce(ip1 ip2 ip3 ip4)

0 Karma

maciep
Champion

it is similar to what i was doing. however, if you have more than one type of IP field in a single event, then i think this will handle it better than coalesce does.

0 Karma

maciep
Champion

it would be helpful if you could specify exactly what you'd like the end result to be. Do you only want to see a single ip field and then other fields associated with the ip? Or do you still need to know what "type" of ip it is? Also, could one event have more than one type of IP address in it? And if so, how would you expect that to show up in the end result?

In general too, be careful with the dedup command. It works, but you may end up removing data that you want. Maybe try stats instead. A very simple, but maybe not useful example, would be to "normalize" the ip to a single field and the do stats on that field. The coalesce command will return the first non-null value in the list.

source = dataset:*
| eval my_ip = coalesce(device_ip_address, end_host_ip_address, switch_ip_address, ip_address, IPADDR,"n/a")
| stats values(*) as * by my_ip

versi
New Member

Yes end result should see only one IP field with all associated fields attached to it, I am trying to export the entire database presenting all fields of data while removing duplicates and consolidating the fields together.

I am not at the office right now, but I will try the coalesce option and see if it gets me any closer to my end goal. Thanks for helping!

Edited*
That command worked, however now I have an issue where I get multiple values in a signal field. It's a last discovered date field and it is pulling multiple dates into the field, I only want to see the earliest date. Any idea how to do that? I searched around and tried

convert mktime(end_host_last_discovered) as date timeformat="%Y %m %d %H:%M:%S" 
| eval Createdate=now()+(60*60*24*365) 
| foreach date[eval Createdate= if (typeof('<<FIELD>>')=="Invalid", CreatedateEpoch, min(CreatedateEpoch, '<<FIELD>>'))] 
| eval Createdate = strftime(CreatedateEpoch, "%Y-%m-%d %H:%M:%S") 

But It effectively did nothing so I know I'm doing something wrong.

0 Karma

versi
New Member

This is my current output for the "end_host_last_discovered" field,

end_host_last_discovered
2018-10-09 08:28:59 
2018-10-09 09:20:55 
2018-10-09 09:21:34 
2018-10-09 09:31:38
etc...

This needs to just show the latest date, how do I do that?
Side note, this field will grow in size as the search runs.

0 Karma

maciep
Champion

you should be able to get the latest value of the field with the stats command as well...something like this maybe

| stats latest(discovered_date) as discovered_date, values(*) as * by my_ip
0 Karma

versi
New Member

Doesn't seem to work

source="devices:*" 
| eval my_ip = coalesce(device_ip_address, end_host_ip_address, switch_ip_address, ip_address, IPADDR,"n/a")
| stats values(*) as * by my_ip
| stats latest(end_host_last_discovered) as end_host_last_discovered, values(*) as * by my_ip
| table my_ip end_host_last_discovered

Will still return multiple dates in the end_host_last_discovered field, any other ideas?

0 Karma

maciep
Champion

ah...nope just update the existing stats command you were using, don't pipe to it again. i should have been more specific.

 source="devices:*" 
 | eval my_ip = coalesce(device_ip_address, end_host_ip_address, switch_ip_address, ip_address, IPADDR,"n/a")
 | stats latest(end_host_last_discovered) as end_host_last_discovered, values(*) as * by my_ip
 | table my_ip end_host_last_discovered
0 Karma

versi
New Member

That did not work either, also i think i might have to backtrack what i said about it working earlier. As I ran the scan I started to notice I was getting a lot of additional values in no particular order to almost all of my fields not just the end_host_last_discovered field. Making it difficult to know what I am looking at if I see some 20 mac addresses associated to one ip address 😕

Sorry for making the problem difficult to figure out, but I do really appreciate you helping.
I feel like I'm bashing my head against a wall trying to figure out the best way to export meaningful data in a readable format.

0 Karma

maciep
Champion

Well, that's what the values() aggregate does - gives you a list of unique values for a field. If that's not what you want, how would you like to consolidate multiple field values associated to one IP address? Or would you? Would just want the latest? Does matter if it's a device_ip or a switch_ip, etc? Would that be the same case for all of the other fields?

It would probably be helpful to figure out how you would tackle this manually w/o splunk - which fields/values to keep and what that end result would look like...so that then you would have a goal to work toward with Splunk.

0 Karma

versi
New Member

Thought I did, but I see your point. I will look over this more and see if I can come up with a better plan to organize the data.

0 Karma

versi
New Member

After sleeping on it, it now makes sense how stats works. I was able to get the latest data for all of my fields using

| stats latest(*) as * by my_ip

But there are a few fields where I would like to actually see all the values, like the MAC address field ( can help in finding duplicate IP's ). Is there a way to have a stats command do something like this?

| stats latest(*) values(MAC, hostname) as * by my_IP

Where it excludes "MAC" and "hostname" from the wildcard and instead does the values function, for those two fields?

0 Karma

maciep
Champion

yep, pretty sure something like this should do it...i think...hopefully.

| stats values(MAC) as MACs, values(hostname) as hosts, latest(*) as * by my_IP
0 Karma

versi
New Member

That worked! Thanks a million maciep!

0 Karma

maciep
Champion

good to hear!

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...