Good day,
I'm hoping someone smarter than me can help me figure this out. In the search below, I'm trying to correlate a set of events from two different indexes. IndexA has network switch connection logs, and IndexB has dhcp hostname mappings. I want to combine the information from both.
IndexA has a unique SessionID value that I'm using to differentiate individual connection attempts, and I want my stats table to summarize by this field only so I can see information per connection attempt. IndexB does not have this field, however.
For reference, in the narrow time range I'm working within, there are only two SessionID's for the same MAC/IP address pair. Likewise, there's only a single set of unique DHCP logs. This was done deliberately by unplugging/plugging a computer into the network twice to generate two connection attempts while keeping the same DHCP information.
Because the SessionID field is not in the second index, I ran the first stats command to summarize my events on the two fields the two indexes did share: the MAC and IP address.
With the individual events now gone and everything summarized, I ran the second stats command to then summarize by the Session_ID. This does work except for one flaw. As stated above, there are only two Session_ID's contained within two events, each with their own _time field. Because I use the values() function of stats, both timestamps are printed as a multi-value field in each row of the stats table, and you're not able to tell which timestamp belongs to which Session_ID.
I've tried different permutations of things, such as using mvexpand between the stats commands to split the time_indexA field (I'm not interested in charting the time for events from indexB) back into individual events. I've also tried summarizing by time in the first stats command alongside the MAC/IP address. I attempted using eventstats as well, but it's not a command I'm very familiar with, so that didn't work either.
And finally, when I do manage to make some progress with correlating each timestamp to its own event, so far, I've always lost the hostname field from indexB as a byproduct.
I've attached a picture of the table when run in case my explanation is subpar.
(index=indexA) OR (index=indexB)
| rex field=text "AuditSessionID (?<SessionID>\w+)"
| rex field=pair "session-id=(?<SessionID>\w+)"
| eval time_{index}=strftime(_time,"%F %T")
| eval ip_add=coalesce(IP_Address, assigned_ip), mac_add=coalesce(upper(src_mac), upper(mac))
| eval auth=case(CODE==45040, "True", true(), "False")
| stats values(host_name) as hostname values(networkSwitch) as Switch values(switchPort) as Port values(auth) as Auth values(SessionID) as Session_ID values(time_indexA) as time by mac_add, ip_add
| stats values(time) as time values(hostname) as hostname values(Switch) as Switch values(Port) as Port values(Auth) as Auth values(ip_add) as IP_Address values(mac_add) as MAC_Address by Session_ID
When trying to collect more than one field into a MV field, the problem of correlating one entry against the entries in another field can be solved in a number of ways. stats values() will always sort/dedup the values, hence the loss of order, so using stats list() CAN be a solution if you have knowledge of your data - it will collect up to 100 max items in the list but in event sequence order, so will retain correlation between each MV field.
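For example, assuming two fields A and B that exist together on each event, something like this sketch keeps the two lists aligned by event order (some_key is just a placeholder grouping field):

| stats list(A) as A list(B) as B by some_key

Because list() preserves event sequence (up to its 100-item cap) rather than sorting/deduping like values(), the Nth value of A and the Nth value of B came from the same event.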
Making composite fields is another way, as you have done with mvzip. You can make this work with any number of fields getting as complex as needed, e.g.
| eval c=mvzip(mvzip(mvzip(mvzip(A, B, "##"), C, "##"), D, "##"), E, "##")
In your case, I would suggest the practice of PRE-pending time, not POST-pending, as there is an immediate benefit to the output from stats values(): the results will be sorted in ascending time order.
It has a useful benefit in that you can use mvmap to iterate results in a known order.
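Applied to the search above, that would just mean swapping the mvzip arguments, e.g.

| eval session_time=mvzip(time, Session_ID)

so that when values(session_time) sorts and dedups the composite values, they come back in ascending time order (assuming the "%F %T" timestamp format used earlier, which sorts lexically).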
Also, always a good idea to remove fields BEFORE mvexpand. If you don't need a field, remove it before incurring the memory cost of mvexpand.
Another improvement would be to
| eval session_time=mvdedup(session_time)
before you mvexpand - there's no point in expanding stuff you will discard.
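Putting both suggestions together, the expansion step in the search above might look something like this sketch:

| eval session_time=mvdedup(session_time)
| fields - time
| mvexpand session_time

dropping the multi-value time field before mvexpand so it isn't copied into every expanded row.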
If I am understanding this, the common field between the two sets of events is IP Address?
Try | stats values(*) AS * by ip_add.
Then after the stats command you can do your renaming.
Sorry I wasn't clear enough.
There are two shared fields: mac_add and ip_add. However, I need to be able to summarize by the Session_ID field. Because the field isn't shared, I first summarize by mac_add and ip_add in the first stats command. Then in the second, I summarize by Session_ID. The issue is that the time field becomes a multi-value field with the timestamps for each of the events summarized rather than a unique timestamp for each Session_ID.
Mhmm....maybe I can mvzip a Session_ID and its time field together to keep the pair together between stats and split them apart further down the pipeline....
---------------------
The answer to that question is YES! I can do exactly that, and it fixes the problem. What I did was use mvzip to combine the Session_ID and time into a new field, session_time, after the first stats command. Then, after the second stats summarizing by the Session_ID field, I split the session_time field apart with mvexpand to get individual events, each pairing a Session_ID with its time. I then used rex to split that pair into two new fields, a session and a time field. Finally, a dedup to clean out the duplicates, and it was done! This is the command now.
(index=indexA) OR (index=indexB)
| rex field=text "AuditSessionID (?<SessionID>\w+)"
| rex field=pair "session-id=(?<SessionID>\w+)"
| eval time_{index}=strftime(_time,"%F %T")
| eval ip_add=coalesce(IP_Address, assigned_ip), mac_add=coalesce(upper(src_mac), upper(mac))
| eval auth=case(CODE==45040, "True", true(), "False")
| stats values(host_name) as hostname values(networkSwitch) as Switch values(switchPort) as Port values(auth) as Auth values(SessionID) as Session_ID values(time_indexA) as time by mac_add, ip_add
| eval session_time=mvzip(Session_ID, time)
| stats values(time) as time values(hostname) as hostname values(Switch) as Switch values(Port) as Port values(Auth) as Auth values(ip_add) as IP_Address values(mac_add) as MAC_Address values(session_time) as session_time by Session_ID
| mvexpand session_time
| fields - time Session_ID
| rex field=session_time "(?<Session_ID>\w+),(?<Time>.+)" | fields - session_time
| dedup Session_ID Time
| rename hostname as Hostname
| table Time Hostname MAC_Address IP_Address Switch Port Auth Session_ID
Okay, reporting back. Your advice was sound. I managed to drop the mvexpand altogether by substituting in this bit of logic at the end.
| stats values(dhcp_host_name) as Hostname values(Switch) as Switch values(Port) as Port values(ip_add) as IP_Address values(mac_add) as MAC_Address list(session_time) as session_time by Session_ID, Auth
| eval Time=mvmap('session_time', if(substr('session_time', 1, len('Session_ID'))=='Session_ID', substr(session_time, len('Session_ID')+2), null()))
| table Time Hostname IP_Address MAC_Address Switch Port Auth
| sort -Time
Just so you have an example of what my data might look like for one event.
Session_ID= "7D5A007C1B294E"
session_time= "7D5A007C1B294E,2025-02-11 12:56:51"
"9DE81CAB15DD46,2025-02-06 15:22:13"
By using mvmap, I iterate through session_time and check each value to find which one matches that event's Session_ID. Then I extract the Time field from that value.
Excellent - mvmap is very powerful when handling MV fields.
One small point I noticed on your use of single quotes to wrap field names. Using single quotes is essential where field names start with numbers or contain spaces or other special characters (some characters are OK, e.g. underscore); otherwise the search just won't work.
Here, you have one use of substr(session_time, ...) where you are not wrapping the field name in single quotes. It works because quoting isn't needed for either of the two field names here, but it's good to be aware of when it's necessary on the right-hand side of an eval.
When you know the field name, you can make the call, but foreach is a very useful command, and when using its templated fields (<<FIELD>>), it's generally good practice to always wrap that usage in single quotes.
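A minimal sketch of that pattern (the field list here is just an example):

| foreach Switch Port [ eval <<FIELD>>=upper('<<FIELD>>') ]

This runs the same eval against each named field, with '<<FIELD>>' quoted so the template still works if a field name ever contains special characters.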
Thanks! I must have missed that one field by not wrapping it. Glad to know it didn't matter in this case, though lol. I'll make sure to look into how I can use FOREACH going forward.
You've been a great help!
Thank you! There's a lot of good stuff here I'll try and incorporate into the search. I will say, however, that session_time doesn't actually have duplicates. It contains each unique timestamp for each Session_ID.....however, as every event has its own copy of session_time, when I do mvexpand, it leads to multiple duplicates that I need to take out with dedup.
Ideally, it'd be nice if each entry of session_time within each event only had the time and session_id for that specific event and not all events....
I'm going to see if maybe I can use mvmap to check each event's Session_ID against the session_id part of session_time.
Sorry, I know it's convoluted. Still, I'm making progress thanks to your suggestions~
I'll keep this question open another day or so. I'm thrilled I managed to solve the issue, but I'll admit......the solution isn't exactly as clean and efficient as I'd like it. If anyone smarter than me wants to propose a better solution, I'm happy to hear it.
It is often much easier for volunteers to provide answers (particularly to search/SPL questions) if you post sample events in their raw format so that we can attempt to simulate your situation and design solutions to meet your needs. We do not have the benefit of access to your data so you need to give us something to work with.