As @PickleRick notes, if timing is critical for correlation, bin is risky. This is one of the use cases where transaction is appropriate, but you cannot rely on the time span alone, and using index together with _time alone is also unsafe. From the context implied by the mock data, you want a transaction keyed by user. It is very important that you describe this critical logic clearly and explicitly, in plain words, without leaning on SPL to express it for you.
The only obstacle is the differing field names "username" and "Users_Name"; this is easily overcome with coalesce. (It is never a good idea to illustrate mock data inaccurately: if the field name is Users_Name, you should consistently illustrate it as Users_Name and not usersname.) One element that distracted people is the servtime and logtime conversion in the initially illustrated SPL. Those fields add no value to this use case.
This is the code that should get you started:
index=printserver OR index=printlogs
| eval username = coalesce(username, Users_Name)
| fields - Users_Name
| transaction username maxspan=3s
| table _time prnt_name username location directory file
Using your corrected mock data, the above gives
| _time | prnt_name | username | location | directory | file |
|---|---|---|---|---|---|
| 2024-11-04 12:20:56 | Printer2 | tim.allen | FrontDesk | c:/documents/files/ | document2.xlsx |
| 2024-11-04 11:05:32 | Printer1 | jon.doe | Office | c:/desktop/prints/ | document1.doc |
Here is an emulation of your mock data. Play with it and compare with real data.
| makeresults format=csv data="_time, prnt_name, username, location, _raw
2024-11-04 11:05:32, Printer1, jon.doe, Office, server event 1
2024-11-04 12:20:56, Printer2, tim.allen, FrontDesk, server event 2"
| eval index = "printserver"
| append
[makeresults format=csv data="_time, Users_Name, directory, file, _raw
2024-11-04 11:05:33, jon.doe, c:/desktop/prints/, document1.doc, log event 1
2024-11-04 12:20:58, tim.allen, c:/documents/files/, document2.xlsx, log event 2"
| eval index = "printlogs"]
| eval _time = strptime(_time, "%F %T")
| sort - _time
The above emulates

```
index=printserver OR index=printlogs
```
Hope this helps
Hi there,
like in most cases, a simple 'stats' will do the trick.
Try something like this:
index=printserver OR index=printlogs
| bin _time span=3s
| stats values(*) AS * by _time username
| table _time prnt_name username location directory file
Hope this helps ...
cheers, MuS
I am going to rex the User_Name field from my print logs to match it with my printserver logs. This is what I am going to work with, and I will see if I get the results I need. Thank you for your insight.
index=printserver OR index=printlogs
| rex field="User_Name" "(?<domain>\S+)\\\\(?<username>\S+)"
| bin _time span=3s | stats values(*) AS * by _time username
| table _time prnt_name username location directory file
That was my initial reaction, but as usual, binning has issues when your events cross a bin boundary. For example, with span=3s, one event at :11 and another at :14 will be binned into separate buckets and won't match.
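A minimal sketch of that failure mode, using made-up timestamps and a made-up source field purely for illustration:

| makeresults format=csv data="_time, username, source
2024-11-04 11:05:11, jon.doe, printserver
2024-11-04 11:05:14, jon.doe, printlogs"
| eval _time = strptime(_time, "%F %T")
| bin _time span=3s
| stats values(source) AS source by _time username

This returns two rows instead of one: :11 falls into the :09 bucket and :14 into the :12 bucket, so the stats-by-_time correlation never sees the two events together.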
Haha, I was answering based on the provided information, not trying to work around every single possible outlier 😄
Actually, it's not that much of an outlier. Assuming a 2-second gap between the first and second event, you have a 2/3 chance of splitting them into two separate bins: the pair lands in the same 3-second bucket only when the first event falls within the first second of its bucket.
Join on _time doesn't make sense if the time is off in one of the data sets. You noticed that yourself.
This transaction doesn't make sense either, since you don't have a field called src which could contain one of those strings.
If this is indeed all the data you have, it's a very tricky problem in general, because what if the same user requests two print jobs within a second? How can you tell which one went to which printer if the only common field is the username?
If you can make some assumptions about the data, the problem can be solved one way or another. You can use transaction on the username field, and indeed maxspan=3s or something like that (maybe adding startswith="index=printserver" endswith="index=printlogs"); see the sketch below. But transaction is a relatively resource-intensive command and is best avoided if possible, so if you can make some other assumptions, maybe there is a better solution.
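A minimal sketch of that idea, assuming the user fields are first normalized with coalesce as in the other answer, and writing the start/end conditions as eval expressions against the index field:

index=printserver OR index=printlogs
| eval username = coalesce(username, Users_Name)
| transaction username maxspan=3s startswith=eval(index=="printserver") endswith=eval(index=="printlogs")
| table _time prnt_name username location directory file

Treat this as a starting point to validate against real data; as noted, transaction is resource-intensive, so a stats-based approach is preferable if you can find a safe way to bucket the events.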
Hey,
Thanks again for giving me your insight on this one. I did come across the bin command but thought transaction might be better to try in this situation. As I am still learning the power and uses of many of the commands available in Splunk, this does help me get a better understanding of how and when to use the transaction command.
As you pointed out, and what is my true problem in this case, there are only two common (or semi-common) fields between my two indexes: "_time" and "username".
I have compared the raw logs from both indexes and it appears that, at most, the print jobs are separated by 2 seconds, and I haven't seen any print jobs by the same user closer than 10 seconds apart. But to your point, I will make a note that there could be some issue with my output if a user prints two jobs within seconds of each other.
As always, appreciate your input and clarification on my questions.