Hi Splunkers,
I'm facing another problem: my logs contain only three fields, Start_Loading_Time, Event_Reference and Event_Name.
An example of this log is shown below (dummy data):
11:00:31:800,3200,ABCDeposit;11:00:33:940,3201,ABCSelectAmount;11:00:35:320,3202,ABCSelectAccount;11:00:42:670,3203,ABCConfirm;11:00:50:350,3204,ACBSuccessfulEnd
.......
.......
.......
I used the split function to split the record above by ";", which gives me the following:
11:00:31:800,3200,ABCDeposit
11:00:33:940,3201,ABCSelectAmount
11:00:35:320,3202,ABCSelectAccount
11:00:42:670,3203,ABCConfirm
11:00:50:350,3204,ACBSuccessfulEnd
I then used the regex below to capture the two fields I'm after:
(?<Start_Loading_Time>[^\,]+)\,\d*\,(?<Event_Name>\w+[^\n]+)
What I'm trying to do is take "11:00:33:940" minus 1 millisecond as the End_Loading_Time for ABCDeposit, and use "11:00:33:940" as the Start_Loading_Time for ABCSelectAmount. Similarly, I want "11:00:35:320" minus 1 millisecond as the End_Loading_Time for ABCSelectAmount and "11:00:35:320" as the Start_Loading_Time for ABCSelectAccount, and so on.
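To make the requested mapping concrete, here is a rough Python sketch (not Splunk code) that pairs each step with the one after it. It assumes that a record never crosses midnight, so no day rollover is handled. The helper names `to_ms`, `from_ms` and `step_times` are hypothetical. Timestamps are converted to integer milliseconds so that subtracting 1 ms involves no floating-point rounding.

```python
# Dummy record from the question: steps are separated by ";" and fields by ",".
RECORD = ("11:00:31:800,3200,ABCDeposit;11:00:33:940,3201,ABCSelectAmount;"
          "11:00:35:320,3202,ABCSelectAccount;11:00:42:670,3203,ABCConfirm;"
          "11:00:50:350,3204,ACBSuccessfulEnd")


def to_ms(ts):
    """HH:MM:SS:mmm -> integer milliseconds since midnight."""
    h, m, s, ms = (int(part) for part in ts.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def from_ms(total):
    """Integer milliseconds since midnight -> HH:MM:SS:mmm (no day rollover)."""
    secs, ms = divmod(total, 1000)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return f"{hours:02d}:{mins:02d}:{secs:02d}:{ms:03d}"


def step_times(record):
    """Return (Event_Name, Start_Loading_Time, End_Loading_Time) for each step."""
    # Keep only the first three fields; any extra fields are ignored here.
    steps = [entry.split(",")[:3] for entry in record.split(";")]
    rows = []
    for i, (start, _event_reference, name) in enumerate(steps):
        # End = next step's start minus 1 ms; the final step has no next step.
        end = from_ms(to_ms(steps[i + 1][0]) - 1) if i + 1 < len(steps) else None
        rows.append((name, start, end))
    return rows


for row in step_times(RECORD):
    print(row)
```

For this record, ABCDeposit should come out with End_Loading_Time 11:00:33:939, and ACBSuccessfulEnd gets None because no step follows it.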
Any suggestion or help would be much appreciated.
Many thanks in advance!
So basically, you want to take the start time of the next step (minus 1 ms) as the end time of the current step? One thing you could do is duplicate the timestamp before splitting. For example (the first two lines just generate a sample event):
| makeresults
| eval event = "11:00:31:800,3200,ABCDeposit;11:00:33:940,3201,ABCSelectAmount;11:00:35:320,3202,ABCSelectAccount;11:00:42:670,3203,ABCConfirm;11:00:50:350,3204,ACBSuccessfulEnd"
| rex field=event mode=sed "s/;([^,]+)/,\1;\1/g"
| eval event = split(event,";")
| mvexpand event
| rex field=event "(?<Start_Loading_Time>[^,]+),\d*,(?<Event_Name>[^,]+),?(?<End_Loading_Time>.+)?"
| eval End_Loading_Time = strftime(strptime(End_Loading_Time,"%H:%M:%S:%3N")-0.001,"%H:%M:%S:%3N")
The rex sed command on line 3 changes your data into:
11:00:31:800,3200,ABCDeposit,11:00:33:940;11:00:33:940,3201,ABCSelectAmount,11:00:35:320;11:00:35:320,3202,ABCSelectAccount,11:00:42:670;11:00:42:670,3203,ABCConfirm,11:00:50:350;11:00:50:350,3204,ACBSuccessfulEnd
This duplicates the timestamp of the next step as an extra field on the previous step.
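If you want to check the logic outside Splunk, the SPL pipeline above can be mirrored step by step in Python. This is only an approximation under a few assumptions:

* Python's `re.sub` stands in for `rex mode=sed`.
* `str.split` stands in for `split`/`mvexpand`.
* `datetime.strptime`/`strftime` with `%f` stand in for Splunk's `strptime`/`strftime` with `%3N`.

The names `parse` and `minus_one_ms` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Same sample event that the makeresults/eval lines build.
EVENT = ("11:00:31:800,3200,ABCDeposit;11:00:33:940,3201,ABCSelectAmount;"
         "11:00:35:320,3202,ABCSelectAccount;11:00:42:670,3203,ABCConfirm;"
         "11:00:50:350,3204,ACBSuccessfulEnd")

# Python writes named groups as (?P<name>...) where Splunk's rex uses (?<name>...).
STEP_RE = re.compile(
    r"(?P<Start_Loading_Time>[^,]+),\d*,(?P<Event_Name>[^,]+),?(?P<End_Loading_Time>.+)?"
)
TS_FORMAT = "%H:%M:%S:%f"  # on parsing, %f accepts the 3-digit millisecond part


def minus_one_ms(ts):
    """Mirror of strftime(strptime(ts, ...) - 0.001, ...) for HH:MM:SS:mmm."""
    shifted = datetime.strptime(ts, TS_FORMAT) - timedelta(milliseconds=1)
    return shifted.strftime(TS_FORMAT)[:-3]  # %f prints 6 digits; keep 3


def parse(event):
    # rex mode=sed: copy each next-step timestamp in front of its ";".
    duplicated = re.sub(r";([^,]+)", r",\1;\1", event)
    rows = []
    # split + mvexpand: one entry per step.
    for step in duplicated.split(";"):
        m = STEP_RE.match(step)
        end = m["End_Loading_Time"]
        # eval: subtract 1 ms; the last step has no End_Loading_Time.
        rows.append((m["Event_Name"], m["Start_Loading_Time"],
                     minus_one_ms(end) if end else None))
    return rows


for row in parse(EVENT):
    print(row)
```

The first row should be ("ABCDeposit", "11:00:31:800", "11:00:33:939"). The last row carries None, which corresponds to ACBSuccessfulEnd having no End_Loading_Time.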
Thank you very much for the quick response. I should have mentioned that this log also contains another pair of record types, shown below. (Please bear in mind that the data below is dummy; the times and action names vary.)
"11:00:31:800,3200,ABCDeposit, Selected_Action;11:00:33:940,3201,ABCSelectAmount,Selected_Amount;11:00:35:320,3202,ABCSelectAccount,Selected_Account,;11:00:42:670,3203,ABCConfirm,Selected_Button;11:00:50:350,3204,ACBSuccessfulEnd,Confirmed"
And another one:
"11:00:31:800,3200,ABCDeposit, Selected_Action;11:00:33:940,3201,ABCSelectAmount,0;11:00:35:320,3202,ABCSelectAccount,0;11:00:42:670,3203,ABCConfirm,0;11:00:50:350,3204,ACBSuccessfulEnd,0"
How do I adapt the | rex field=event mode=sed command for the logs above?
I tried to analyse your code but failed. 😞
Thanks a million in advance!
The code I gave should apply just fine to those other logs as well, right? All it does is find each ; and capture the text that follows it, up to the first , (i.e. the timestamp). It then replaces that with a ,, followed by a copy of the timestamp, followed by the ;, followed by the captured timestamp again. So it just duplicates the timestamp to the left side of the ;.
As an example, it replaces ;11:00:33:940 with ,11:00:33:940;11:00:33:940. That way, when you then split the data by ;, the timestamp of the next item also appears as an extra field at the end of the previous item.
It basically (after splitting) changes this:
11:00:31:800,3200,ABCDeposit
11:00:33:940,3201,ABCSelectAmount
11:00:35:320,3202,ABCSelectAccount
11:00:42:670,3203,ABCConfirm
11:00:50:350,3204,ACBSuccessfulEnd
Into this:
11:00:31:800,3200,ABCDeposit,11:00:33:940
11:00:33:940,3201,ABCSelectAmount,11:00:35:320
11:00:35:320,3202,ABCSelectAccount,11:00:42:670
11:00:42:670,3203,ABCConfirm,11:00:50:350
11:00:50:350,3204,ACBSuccessfulEnd
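Here is a rough Python sketch of the follow-up case, using the second dummy variant. The sed substitution is left exactly as it was; only the extraction regex changes.

The pattern below is a hypothetical adjustment, not the regex the poster actually ended up with. It assumes End_Loading_Time is always in HH:MM:SS:mmm form. Whatever sits between Event_Name and that trailing timestamp lands in a hypothetical Extra field.

```python
import re

# Second dummy variant from the follow-up (a fourth field per step).
EVENT = ("11:00:31:800,3200,ABCDeposit, Selected_Action;11:00:33:940,3201,ABCSelectAmount,0;"
         "11:00:35:320,3202,ABCSelectAccount,0;11:00:42:670,3203,ABCConfirm,0;"
         "11:00:50:350,3204,ACBSuccessfulEnd,0")

# Hypothetical extraction regex: the lazy Extra group takes everything between
# Event_Name and an optional trailing HH:MM:SS:mmm timestamp.
STEP_RE = re.compile(
    r"^(?P<Start_Loading_Time>[^,]+),\d*,(?P<Event_Name>[^,]+)"
    r"(?P<Extra>.*?)(?:,(?P<End_Loading_Time>\d{2}:\d{2}:\d{2}:\d{3}))?$"
)


def parse(event):
    # Unchanged sed step: s/;([^,]+)/,\1;\1/g only duplicates the timestamps.
    duplicated = re.sub(r";([^,]+)", r",\1;\1", event)
    rows = []
    for step in duplicated.split(";"):
        m = STEP_RE.match(step)
        rows.append({
            "Start_Loading_Time": m["Start_Loading_Time"],
            "Event_Name": m["Event_Name"],
            "Extra": m["Extra"].strip(", "),  # drop separator commas/spaces
            "End_Loading_Time": m["End_Loading_Time"],  # None for the last step
        })
    return rows


for row in parse(EVENT):
    print(row)
```

The same pattern also parses the original three-field records, where Extra simply comes out empty. In Splunk's rex, the groups would be written (?<Extra>...) rather than Python's (?P<Extra>...).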
Did you try it and run into issues?
Thank you very much @FrankVl, much appreciated mate. I had to update the rex extraction that follows the sed step to match the extra fields, and it now works as I expected.
While I will accept your solution as correct, I was wondering if you could point me to some good sites where I can learn more about regex, specifically ones that cover " | rex field=event mode=sed".
I know regex101 and www.udemy.com, but I never thought regex had this functionality.
Once again, thank you, and regards,
It is not so much a feature of regular expressions. mode=sed uses the syntax of the sed (Unix stream editor) utility to perform string manipulations; the regex is just one part of a sed expression. Generic info on the sed utility: https://linux.die.net/man/1/sed
Note: Splunk only supports a very limited subset of sed functionality, namely replace (s) and character substitution (y). See also the props.conf spec:
SEDCMD-<class> = <sed script>
* Only used at index time.
* Commonly used to anonymize incoming data at index time, such as credit
  card or social security numbers. For more information, search the online
  documentation for "anonymize data."
* Used to specify a sed script which Splunk software applies to the _raw
  field.
* A sed script is a space-separated list of sed commands. Currently the
  following subset of sed commands is supported:
  * replace (s) and character substitution (y).
* Syntax:
  * replace - s/regex/replacement/flags
    * regex is a perl regular expression (optionally containing capturing
      groups).
    * replacement is a string to replace the regex match. Use \n for back
      references, where "n" is a single digit.
    * flags can be either: g to replace all matches, or a number to
      replace a specified match.
  * substitute - y/string1/string2/
    * substitutes the string1[i] with string2[i]
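The two supported commands can be illustrated with a small Python emulation of their semantics as described in the spec above. This is a sketch, not Splunk's implementation, and it rests on three assumptions:

* `sed_replace` and `sed_translate` are hypothetical helpers.
* Python's `re` engine stands in for Splunk's Perl-compatible one.
* Treating a missing flag as 1 follows standard sed behaviour; the spec does not state it.

```python
import re


def sed_replace(text, regex, replacement, flag="1"):
    """Sketch of s/regex/replacement/flags.

    flag "g" replaces every match; a number N replaces only the N-th match
    (in standard sed, omitting the flag behaves like 1).
    """
    pattern = re.compile(regex)
    if flag == "g":
        return pattern.sub(replacement, text)
    target = int(flag)
    seen = 0

    def replace_nth(match):
        nonlocal seen
        seen += 1
        # expand() fills in \1..\9 back references, like sed's replacement.
        return match.expand(replacement) if seen == target else match.group(0)

    return pattern.sub(replace_nth, text)


def sed_translate(text, string1, string2):
    """Sketch of y/string1/string2/: each string1[i] becomes string2[i]."""
    if len(string1) != len(string2):
        raise ValueError("y/// requires strings of equal length")
    return text.translate(str.maketrans(string1, string2))


# The thread's substitution, a masking example, and a y/// example.
print(sed_replace("1,x;2,y", r";([^,]+)", r",\1;\1", "g"))
print(sed_replace("cc=4111222233334444", r"\d{12}(\d{4})", "X" * 12 + r"\1", "g"))
print(sed_translate("ABC-123", "ABC", "xyz"))
```

The numeric flag replaces the N-th match only, not the first N. That is why the sketch counts matches itself instead of using the `count` argument of `re.sub`.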
You are a legend! Thank you for the info, mate.