I'm wondering if anyone could advise on how best to standardize a log of events with different fields. Basically, I have a log with about 50 transaction types (same source and sourcetype), and each event can have up to 20 different fields depending on the value of a specific field, ActionType.
Here are a few sample events with some sample/generated data:
2025-02-10 01:09:00, EventId="6", SessionId="123abc", ActionType="Logout"
2025-02-10 01:08:00, EventId="5", SessionId="123abc", ActionType="ItemPurchase", ItemName="Item2", Amount="200.00", Status="Failure"
2025-02-10 01:07:00, EventId="4", SessionId="123abc", ActionType="ItemPurchase", ItemName="Item1", Amount="500.00", Status="Success", FailureReason="Not enough funds"
2025-02-10 01:06:00, EventId="3", SessionId="123abc" ActionType="ProfileUpdate", ElementUpdated="Password", NewValue="*******", OldValue="***********", Status="Failure", FailureReason="Password too short"
2025-02-10 01:05:00, EventId="2", SessionId="123abc" ActionType="ProfileUpdate", ElementUpdated="Email", NewValue="NewEmail@somenewdomain.com", OldValue="OldEmail@someolddomain.com", Status="Success"
2025-02-10 01:04:00, EventId="1", SessionId="123abc", ActionType="Login", IPAddress="10.99.99.99", Location="California", Status="Success"
I'd like to put together a table with a user-friendly EventDescription, like the one below:
Time | SessionId | Action | EventDescription |
2025-02-10 01:04:00 | 123abc | LogIn | User successfully logged in from IP 10.99.99.99 (California). |
2025-02-10 01:05:00 | 123abc | ProfileUpdate | User successfully updated email from OldEmail@someolddomain.com to NewEmail@somenewdomain.com |
2025-02-10 01:06:00 | 123abc | ProfileUpdate | User failed to update password (Password too short) |
2025-02-10 01:07:00 | 123abc | ItemPurchase | User successfully purchased item1 for $500.00 |
2025-02-10 01:08:00 | 123abc | ItemPurchase | User failed to purchase item2 for $200.00 (insufficient funds) |
2025-02-10 01:09:00 | 123abc | LogOut | User logged out successfully |
Given that each action has different fields, and there could be about 50 different events (possibly more in the future), what's the best way to approach this? I was initially thinking of using a series of case statements, like the one below. However, this approach doesn't seem very scalable or maintainable given the number of events and the possible fields for each one:
| eval EventDescription=case(ActionType="Login", case(Status="Success", "User successfully logged in from IP ".IPAddress." (".Location.")", 1=1, "User failed to login"), ActionType="Logout......etc
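For comparison, the nested case() approach spelled out for just the four sample ActionTypes could look roughly like the sketch below. It uses the field names from the sample events and has not been tested against real data; the fallback text is made up for illustration. Every additional ActionType or Status value adds branches to the same expression, which is where the maintenance concern comes from.

````spl
| fillnull value=Success Status ``` Logout events carry no Status field ```
``` one case() branch per ActionType/Status pair; true() catches anything unmapped ```
| eval EventDescription = case(
    ActionType="Login" AND Status="Success", "User successfully logged in from IP " . IPAddress . " (" . Location . ").",
    ActionType="Login", "User failed to log in.",
    ActionType="Logout", "User logged out successfully.",
    ActionType="ProfileUpdate" AND Status="Success", "User successfully updated " . lower(ElementUpdated) . " from " . OldValue . " to " . NewValue . ".",
    ActionType="ProfileUpdate", "User failed to update " . lower(ElementUpdated) . " (" . coalesce(FailureReason, "no reason given") . ").",
    ActionType="ItemPurchase" AND Status="Success", "User successfully purchased " . ItemName . " for " . Amount . ".",
    ActionType="ItemPurchase", "User failed to purchase " . ItemName . " for " . Amount . " (" . coalesce(FailureReason, "no reason given") . ").",
    true(), "Unmapped action: " . ActionType)
| table _time SessionId ActionType EventDescription
````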
I was also thinking of using a macro per action to extract the fields and compose an EventDescription, which would be easier to maintain since the code for each action would be isolated, but I don't think executing 50 macros in one search is the best way to go. Is there a better way to do this? Thanks!
First, thank you for illustrating sample events and clearly stating the desired output and the logic. Before I dive in, I'm deeply curious: who is asking for this transformation in Splunk? Your boss? Are you your own boss? Homework? If it's your boss, ask for a raise, because semantic transformation is best done with real language transformers such as DeepSeek.😉 If it's homework, tell them they are insane.
That said, I have done a lot of limited-vocabulary, limited-grammar transformations to satisfy myself. The key to the solution is to study the elements (both vocabulary and concepts) and the linguistic constraints. Most limited-vocabulary, limited-grammar problems can be solved with lookups. In my code below, I use JSON structures for this purpose, but lookups are easier to maintain and result in more readable code. (Using inline JSON has the advantage of reducing the number of lookups, as you will see.)
| fillnull value=Success Status ``` deal with lack of Status in Logout; this can be refined if blanket success is unwarranted ```
| eval status_adverb = json_object("Success", "succeeded to ", "Failure", "failed to ")
| eval action_verb = json_object("Login", "login from " . IPAddress . " (" . Location . ")", "Logout", "logout",
"ProfileUpdate", "update " . lower(ElementUpdated),
"ItemPurchase", "buy " . ItemName . " for " . Amount)
| eval EventDescription = "User " . json_extract(status_adverb, Status) . json_extract(action_verb, ActionType)
. if(isnull(FailureReason), "", " (" . FailureReason . ")") ``` append the reason, if any, to the same string ```
| table _time SessionId ActionType EventDescription
Output from your sample data is
_time | SessionId | ActionType | EventDescription |
2025-02-10 01:09:00 | 123abc | Logout | User succeeded to logout |
2025-02-10 01:08:00 | 123abc | ItemPurchase | User failed to buy Item2 for 200.00 (Not enough funds) |
2025-02-10 01:07:00 | 123abc | ItemPurchase | User succeeded to buy Item1 for 500.00 |
2025-02-10 01:06:00 | 123abc | ProfileUpdate | User failed to update password (Password too short) |
2025-02-10 01:05:00 | 123abc | ProfileUpdate | User succeeded to update email |
2025-02-10 01:04:00 | 123abc | Login | User succeeded to login from 10.99.99.99 (California) |
Here, instead of switching between infinitive and adverb forms (e.g., "failed to log in" vs. "successfully logged in"), I stick to the infinitive for both success and failure.
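Since lookups are the more maintainable option, here is a minimal sketch of that variant, not tested against real data. It assumes a hypothetical CSV lookup named action_templates.csv with columns ActionType, Status and template, where each template marks substitutions with %FieldName% placeholders; the file name, column names and placeholder syntax are all made up for illustration. The first search seeds that lookup in the same makeresults/mvexpand style as the emulation; in practice you would more likely maintain the CSV directly (e.g., through Settings > Lookups).

````spl
| makeresults ``` seed the hypothetical action_templates.csv; one "ActionType|Status|template" string per row ```
| eval row = mvappend("Login|Success|User successfully logged in from IP %IPAddress% (%Location%).",
"Login|Failure|User failed to log in from IP %IPAddress% (%Location%).",
"Logout|Success|User logged out successfully.",
"ProfileUpdate|Success|User successfully updated %ElementUpdated% from %OldValue% to %NewValue%.",
"ProfileUpdate|Failure|User failed to update %ElementUpdated% (%FailureReason%).",
"ItemPurchase|Success|User successfully purchased %ItemName% for %Amount%.",
"ItemPurchase|Failure|User failed to purchase %ItemName% for %Amount% (%FailureReason%).")
| mvexpand row
| eval ActionType = mvindex(split(row, "|"), 0), Status = mvindex(split(row, "|"), 1), template = mvindex(split(row, "|"), 2)
| table ActionType Status template
| outputlookup action_templates.csv
````

The search over the events then fetches the template for each ActionType/Status pair and fills in the placeholders:

````spl
| fillnull value=Success Status ``` Logout has no Status, as in the JSON version ```
| lookup action_templates.csv ActionType Status OUTPUT template AS EventDescription
``` swap each listed %Field% placeholder for that field's value; absent fields become empty strings ```
| foreach IPAddress Location ElementUpdated OldValue NewValue ItemName Amount FailureReason
    [ eval EventDescription = replace(EventDescription, "%<<FIELD>>%", coalesce('<<FIELD>>', "")) ]
| eval EventDescription = coalesce(EventDescription, "Unmapped action: " . ActionType) ``` no matching lookup row ```
| table _time SessionId ActionType EventDescription
````

With this layout, a new ActionType usually means new lookup rows rather than search changes; only the foreach field list grows when a template needs a field that is not already listed. One caveat: replace() treats its third argument as a regex replacement string, so field values containing backslashes may not come through verbatim.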
Note: If the sample events look the way you have shown, you shouldn't need to extract any more fields; Splunk should have already extracted everything I referred to in the code. Here is an emulation of the samples. Play with it and compare with real data. (Also note that your samples put the purchase FailureReason on the success event; the emulation below corrects that.)
| makeresults
| fields - _time
| eval data = mvappend("2025-02-10 01:09:00, EventId=\"6\", SessionId=\"123abc\", ActionType=\"Logout\"",
"2025-02-10 01:08:00, EventId=\"5\", SessionId=\"123abc\", ActionType=\"ItemPurchase\", ItemName=\"Item2\", Amount=\"200.00\", Status=\"Failure\", FailureReason=\"Not enough funds\"",
"2025-02-10 01:07:00, EventId=\"4\", SessionId=\"123abc\", ActionType=\"ItemPurchase\", ItemName=\"Item1\", Amount=\"500.00\", Status=\"Success\"",
"2025-02-10 01:06:00, EventId=\"3\", SessionId=\"123abc\" ActionType=\"ProfileUpdate\", ElementUpdated=\"Password\", NewValue=\"*******\", OldValue=\"***********\", Status=\"Failure\", FailureReason=\"Password too short\"",
"2025-02-10 01:05:00, EventId=\"2\", SessionId=\"123abc\" ActionType=\"ProfileUpdate\", ElementUpdated=\"Email\", NewValue=\"NewEmail@somenewdomain.com\", OldValue=\"OldEmail@someolddomain.com\", Status=\"Success\"",
"2025-02-10 01:04:00, EventId=\"1\", SessionId=\"123abc\", ActionType=\"Login\", IPAddress=\"10.99.99.99\", Location=\"California\", Status=\"Success\"")
| mvexpand data
| rename data as _raw
| extract
| rex "^(?<_time>[^,]+)"
``` data emulation above ```
Thanks for the detailed response. To clarify, this is meant as an audit trail for a few users with very limited technical expertise, and I agree with your sentiments. I'm doing this as an exploratory exercise, although I'm leaning towards this being a maintenance nightmare and am exploring other ways of providing the data. I'll play around with the JSON strings and/or lookups as in your examples. Thanks!