Splunk Search

Combining multiple text strings and fields into one common field

apiprek2
Explorer

I'm wondering if anyone could advise on how to best standardize a log of events with different fields. Basically, I have a log with about 50 transaction types (same source and sourcetype), and each event can have up to 20 different fields based on a specific field, ActionType.

Here are a few sample events with some sample/generated data:

2025-02-10 01:09:00, EventId="6", SessionId="123abc",  ActionType="Logout"

2025-02-10 01:08:00, EventId="5", SessionId="123abc", ActionType="ItemPurchase", ItemName="Item2",  Amount="200.00", Status="Failure"

2025-02-10 01:07:00, EventId="4", SessionId="123abc", ActionType="ItemPurchase", ItemName="Item1", Amount="500.00", Status="Success", FailureReason="Not enough funds"

2025-02-10 01:06:00, EventId="3", SessionId="123abc" ActionType="ProfileUpdate", ElementUpdated="Password", NewValue="*******", OldValue="***********", Status="Failure", FailureReason="Password too short"

2025-02-10 01:05:00, EventId="2", SessionId="123abc" ActionType="ProfileUpdate", ElementUpdated="Email", NewValue="NewEmail@somenewdomain.com", OldValue="OldEmail@someolddomain.com", Status="Success"

2025-02-10 01:04:00, EventId="1", SessionId="123abc", ActionType="Login", IPAddress="10.99.99.99", Location="California", Status="Success"

I'd like to put together a table with user-friendly EventDescription, like below:

Time | SessionId | Action | EventDescription
2025-02-10 01:04:00 | 123abc | LogIn | User successfully logged in from IP 10.99.99.99 (California).
2025-02-10 01:05:00 | 123abc | ProfileUpdate | User failed to update password (Password too short)
2025-02-10 01:06:00 | 123abc | ProfileUpdate | User successfully updated email from NewEmail@somenewdomain.com to OldEmail@someolddomain.com
2025-02-10 01:07:00 | 123abc | ItemPurchase | User successfully purchased item1 for $500.00
2025-02-10 01:08:00 | 123abc | ItemPurchase | User failed to purchase item2 for $200.00 (insufficient funds)
2025-02-10 01:09:00 | 123abc | LogOut | User logged out successfully

 

Given that each action will have different fields, what's the best way to approach this when there could be about 50 different events (possibly more in the future)?  I was initially thinking this could be done using a series of case statements, like the one below.  However, this approach doesn't seem very scalable or maintainable given the number of events and possible fields for each one:

eval EventDescription=case(ActionType="Login", case(Status="Success", "User successfully logged in from IP ".IPAddress." (".Location.")", 1=1, "User failed to login"), ActionType="Logout"......etc

I was also thinking of using a macro to extract the fields and compose an EventDescription, which would be easier to maintain since the code for each Action would be isolated, but I don't think executing 50 macros in one search is the best way to go.  Is there a better way to do this?  Thanks!


yuanliu
SplunkTrust

First, thank you for illustrating sample events and clearly stating the desired output and the logic.  Before I foray into action, I'm deeply curious: Who is asking for this transformation in Splunk?  Your boss?  Are you your own boss?  Homework?  If it's your boss, ask for a raise, because semantic transformation is best done with real language transformers such as DeepSeek.😉  If it's homework, tell them they are insane.

This said, I have done a lot of limited-vocabulary, limited-grammar transformations to satisfy myself.  The key to the solution is to study the elements (both vocabulary and concepts) and the linguistic constraints.  Most limited-vocabulary, limited-grammar problems can be solved with lookups.  In my code below, I use a JSON structure for this purpose, but lookups are easier to maintain and result in more readable code. (Using inline JSON has the advantage of reducing the number of lookups, as you will see.)

 

| fillnull Status value=Success ``` deal with lack of Status in Logout; this can be refined if blanket success is unwarranted ```
| eval status_adverb = json_object("Success", "succeeded to ", "Failure", "failed to ")
| eval action_verb = json_object("Login", "login from " . IPAddress . " (" . Location . ")", "Logout", "logout",
  "ProfileUpdate", "update " . lower(ElementUpdated),
  "ItemPurchase", "buy " . ItemName . " for " . Amount)
| eval EventDescription = mvappend("User " . json_extract(status_adverb, Status) . json_extract(action_verb, ActionType),
  if(isnull(FailureReason), null(), "(" . FailureReason . ")"))
| table _time SessionId ActionType EventDescription

 

Output from your sample data is

_time | SessionId | ActionType | EventDescription
2025-02-10 01:09:00 | 123abc | Logout | User succeeded to logout
2025-02-10 01:08:00 | 123abc | ItemPurchase | User failed to buy Item2 for 200.00 (Not enough funds)
2025-02-10 01:07:00 | 123abc | ItemPurchase | User succeeded to buy Item1 for 500.00
2025-02-10 01:06:00 | 123abc | ProfileUpdate | User failed to update password (Password too short)
2025-02-10 01:05:00 | 123abc | ProfileUpdate | User succeeded to update email
2025-02-10 01:04:00 | 123abc | Login | User succeeded to login from 10.99.99.99 (California)

Here, instead of jumping between infinitive and adverb forms, I stick to the infinitive for both success and failure.

Note: If the sample events are as you have shown, you shouldn't need to extract any more fields.  Splunk should have extracted everything I referred to in the code.  Here is an emulation of the samples.  Play with it and compare with real data. (Also note that you attached the purchase failure reason to the success event.  The emulation below corrects that.)

 

| makeresults
| fields - _time
| eval data = mvappend("2025-02-10 01:09:00, EventId=\"6\", SessionId=\"123abc\",  ActionType=\"Logout\"",
"2025-02-10 01:08:00, EventId=\"5\", SessionId=\"123abc\", ActionType=\"ItemPurchase\", ItemName=\"Item2\",  Amount=\"200.00\", Status=\"Failure\", FailureReason=\"Not enough funds\"",
"2025-02-10 01:07:00, EventId=\"4\", SessionId=\"123abc\", ActionType=\"ItemPurchase\", ItemName=\"Item1\", Amount=\"500.00\", Status=\"Success\"",
"2025-02-10 01:06:00, EventId=\"3\", SessionId=\"123abc\" ActionType=\"ProfileUpdate\", ElementUpdated=\"Password\", NewValue=\"*******\", OldValue=\"***********\", Status=\"Failure\", FailureReason=\"Password too short\"",
"2025-02-10 01:05:00, EventId=\"2\", SessionId=\"123abc\" ActionType=\"ProfileUpdate\", ElementUpdated=\"Email\", NewValue=\"NewEmail@somenewdomain.com\", OldValue=\"OldEmail@someolddomain.com\", Status=\"Success\"",
"2025-02-10 01:04:00, EventId=\"1\", SessionId=\"123abc\", ActionType=\"Login\", IPAddress=\"10.99.99.99\", Location=\"California\", Status=\"Success\"")
| mvexpand data
| rename data as _raw
| extract
| rex "^(?<_time>[^,]+)"
``` data emulation above ```
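
For comparison, the lookup route mentioned earlier might look roughly like the sketch below. Everything specific in it is an assumption rather than something from this thread: it presumes a hypothetical CSV lookup named event_description_templates.csv with columns ActionType, Status and Template, where each Template carries {FieldName} placeholders, for example:

ActionType,Status,Template
Login,Success,"User successfully logged in from IP {IPAddress} ({Location})"
ProfileUpdate,Failure,"User failed to update {ElementUpdated} ({FailureReason})"
ItemPurchase,Success,"User successfully purchased {ItemName} for {Amount}"
Logout,Success,"User logged out"

The search fetches the template for each event and substitutes whichever of the listed fields the event actually has:

| fillnull Status value=Success ``` same blanket default for Logout as in the JSON version ```
| lookup event_description_templates.csv ActionType Status OUTPUT Template AS EventDescription ``` hypothetical lookup file and column names ```
| foreach IPAddress Location ElementUpdated ItemName Amount FailureReason
  [ eval EventDescription = if(isnull('<<FIELD>>'), EventDescription,
      replace(EventDescription, "[{]<<FIELD>>[}]", tostring('<<FIELD>>'))) ] ``` fill a {FieldName} placeholder only when that field exists in the event ```
| table _time SessionId ActionType EventDescription

With this layout, a new action type means new rows in the CSV rather than changes to the search, although the foreach field list still has to name every field a template may reference. A placeholder whose field is missing from the event is left in the text unchanged, which makes gaps in the data easy to spot. Treat this as a starting point to test against real data, not a finished solution.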

 

 

apiprek2
Explorer

Thanks for the detailed response.  To clarify, this is meant as an audit trail for a few users with very limited technical expertise, and I agree with your sentiments.  I'm doing this as an exploratory exercise, although I'm leaning towards this being a maintenance nightmare and am exploring other solutions for providing the data.  I'll play around with the JSON string and/or lookups as in your examples.  Thanks!
