Splunk Search

How to extract a field in a search and table it with the timestamp of the event?

Communicator

Hi Experts,

I'm getting the output below in a PDF report from Splunk:

2014-10-10 09:58:27 EDT (Framework:INFO) [RID:526] - [sthisisencryptedpassword:firstname_lastname@
serial.mobile.com] - User authenticated

I only need the report to contain,
Time
firstname_lastname

Can I do this in Splunk, or do I need to run a script on the log file before it is sent to Splunk?

Thanks.

1 Solution

Communicator

I'm assuming you have configured time extraction so you have a proper time variable. If so you can add the following onto your search:

<your current search here> | rex "-\s+\[[^:]+:(?<firstname>[^_]+)_(?<lastname>\w+)@[^]]+\] - User authenticated$" | table _time firstname lastname

The rex command uses regular expressions to do the extraction of a first name and last name. (I'll talk more about the regex below).

We then take the output of the rex command and send it to the table command so we can output the time, first name, and last name fields.

Regular Expression

If you view the regular expression in regex101 you can see an interactive explanation of what it's doing (http://regex101.com/r/nR1gK8/1), but a quick rundown is below:

-\s+\[[^:]+:(?<firstname>[^_]+)_(?<lastname>\w+)@[^]]+\] - User authenticated$
  • The - is a literal -
  • \s matches whitespace characters and the + means one or more
  • \[ matches a literal [ character -- usually brackets mean to start a character class so the backslash \ is used to escape the opening bracket
  • [^:]+ is a character class: the [ and ] delimit the class, the ^ at the start of a class means "not", and the : is a literal colon. So while it looks complicated, it says "one or more characters that are not a colon" -- this should match sthisisencryptedpassword
  • : is a literal colon
  • (?<firstname>...) - this is a named capturing group, so anything matched in the ... part will be saved in a field called firstname
  • [^_]+ is one or more characters that are not underscores
  • _ is a literal underscore character
  • (?<lastname>...) - another named capturing group, this one for the field lastname
  • \w+ is one or more word characters
  • @ is a literal @
  • [^]]+ is one or more characters that are not ]
  • \] match a literal ] -- again like the opening bracket, usually brackets mean to start/end a character class, but we want a literal ] so we use a backslash \ to escape it
  • - User authenticated is just literally matching those characters
  • $ is anchoring to the end of the line to try and make it more accurate

Edited to break out the \[ as I incorrectly included it with the explanation below and I added a bit more explanation on the first character class.
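The pattern can also be sanity-checked outside Splunk. Below is a minimal sketch in Python, under two assumptions that are not part of the original answer: Python's re module spells named groups (?P<name>...) rather than (?<name>...), and the sample event contains the line wrap after the @ exactly as it appears in the question. The helper name extract_name is hypothetical.

```python
import re

# Python port of the answer's pattern. Assumptions: (?P<name>...) is Python's
# spelling of (?<name>...), and [^\]] is the explicit form of [^]].
NAME_RE = re.compile(
    r"-\s+\[[^:]+:(?P<firstname>[^_]+)_(?P<lastname>\w+)@[^\]]+\] - User authenticated$"
)

# Sample event from the question, including the line wrap after the @.
event = (
    "2014-10-10 09:58:27 EDT (Framework:INFO) [RID:526] - "
    "[sthisisencryptedpassword:firstname_lastname@\n"
    "serial.mobile.com] - User authenticated"
)


def extract_name(raw):
    """Return (firstname, lastname), or None when the event does not match."""
    match = NAME_RE.search(raw)
    if match is None:
        return None
    return match.group("firstname"), match.group("lastname")


print(extract_name(event))
```

Because [^]]+ matches any character other than ], it also matches the newline, so the wrapped domain does not prevent a match.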

Communicator

Eventually this is what I wanted,

-\s+\[[^:]+:(?<email>[^]]+)] - User authenticated

Thank you triest!

Communicator

Let's say my PDF output has the following,
2014-10-10 09:58:27 EDT (Framework:INFO) [RID:526] - [randompassword123:bob.dole@
politics.com] - User authenticated
2014-10-10 09:59:31 EDT (Framework:INFO) [RID:526] - [sthisisencryptedpassword:tiger_woods@
golf.com] - User authenticated
2014-10-10 09:59:37 EDT (Framework:INFO) [RID:526] - [anotherpassword:roger@
tennis.com] - User authenticated

You can see the e-mail addresses are different: the first has a period, the second has an underscore, and the third doesn't have a last name.

Communicator

You can modify it to just use one field (I called it user) and you can use [^@] to match any character except @ so that you can easily pick up things with _, - etc.

-\s+\[[^:]+:(?<user>.*)@[^]]+\] - User authenticated
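As a rough check of the single-field idea, here is a small Python sketch run against the three sample events posted earlier in the thread. It uses [^@]+ for the user field, as the post describes. The (?P<user>...) spelling is Python's form of a named group, and the helper name extract_user is hypothetical.

```python
import re

# Single-field variant. Assumption: Python's (?P<user>...) stands in for
# (?<user>...). [^@]+ takes everything up to the first @.
USER_RE = re.compile(r"-\s+\[[^:]+:(?P<user>[^@]+)@[^\]]+\] - User authenticated")

# The three sample events from the thread, line wraps included.
events = [
    "2014-10-10 09:58:27 EDT (Framework:INFO) [RID:526] - "
    "[randompassword123:bob.dole@\npolitics.com] - User authenticated",
    "2014-10-10 09:59:31 EDT (Framework:INFO) [RID:526] - "
    "[sthisisencryptedpassword:tiger_woods@\ngolf.com] - User authenticated",
    "2014-10-10 09:59:37 EDT (Framework:INFO) [RID:526] - "
    "[anotherpassword:roger@\ntennis.com] - User authenticated",
]


def extract_user(raw):
    """Return the text between the colon and the @, or None if no match."""
    match = USER_RE.search(raw)
    return match.group("user") if match else None


users = [extract_user(e) for e in events]
print(users)
```

All three address styles (period, underscore, first name only) land in the same field, since the only character excluded from the match is @.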

Communicator

Hi Again - The requirement changed a bit and they need firstname@emailaddress. I tried to define email like how we did for first name like this,

-\s+\[[^:]+:(?<firstname>[^@])+(?<email>)@[^]]+] - User authenticated | table _time firstname@email

However, Splunk doesn't get it and only displays the last letter of the firstname. How can I include the firstname@email in the report?

This is my working expression. As you can see there is no lastname.

rex "-\s+\[[^:]+:(?<firstname>[^@]+)@[^]]+] - User authenticated" | table _time firstname
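The last-letter behaviour can be reproduced outside Splunk. In the first attempt the + sits outside the capturing group, so the group matches one character at a time and keeps only the final repetition, while the empty (?<email>) group captures nothing. A minimal Python sketch, assuming Python's (?P<name>...) spelling for named groups and one of the sample events from this thread:

```python
import re

event = (
    "2014-10-10 09:59:31 EDT (Framework:INFO) [RID:526] - "
    "[sthisisencryptedpassword:tiger_woods@\ngolf.com] - User authenticated"
)

# First attempt: ([^@])+ repeats the group itself, so each repetition
# overwrites the previous capture; the empty email group matches "".
ATTEMPT_RE = re.compile(
    r"-\s+\[[^:]+:(?P<firstname>[^@])+(?P<email>)@[^\]]+] - User authenticated"
)

# Working expression: the + is inside the group, so the whole run is captured.
WORKING_RE = re.compile(
    r"-\s+\[[^:]+:(?P<firstname>[^@]+)@[^\]]+] - User authenticated"
)

attempt = ATTEMPT_RE.search(event)
working = WORKING_RE.search(event)

print(repr(attempt.group("firstname")), repr(attempt.group("email")))
print(repr(working.group("firstname")))
```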

Communicator

Finally got it,

-\s+\[[^:]+:(?<email>[^]]+)] - User authenticated

Thanks!
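For reference, a hedged Python sketch of the same email capture. It assumes the line wrap after the @ shown in the samples is really present in the raw event; if so, [^]]+ captures the newline too, and the sketch strips whitespace afterwards. The (?P<email>...) spelling and the helper name extract_email are Python-side assumptions, not part of the Splunk search.

```python
import re

# Assumption: (?P<email>...) is Python's spelling of (?<email>...).
EMAIL_RE = re.compile(r"-\s+\[[^:]+:(?P<email>[^\]]+)] - User authenticated")

events = [
    "2014-10-10 09:58:27 EDT (Framework:INFO) [RID:526] - "
    "[randompassword123:bob.dole@\npolitics.com] - User authenticated",
    "2014-10-10 09:59:31 EDT (Framework:INFO) [RID:526] - "
    "[sthisisencryptedpassword:tiger_woods@\ngolf.com] - User authenticated",
    "2014-10-10 09:59:37 EDT (Framework:INFO) [RID:526] - "
    "[anotherpassword:roger@\ntennis.com] - User authenticated",
]


def extract_email(raw):
    """Return the address between the colon and the closing bracket."""
    match = EMAIL_RE.search(raw)
    if match is None:
        return None
    # Assumption: any whitespace inside the address is only a line wrap.
    return re.sub(r"\s+", "", match.group("email"))


emails = [extract_email(e) for e in events]
print(emails)
```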

Communicator

This is fantastic. Is there a manual where you can learn all of this?
One more question - What if I want multiple users instead of firstname_lastname and some with only a firstname, without an underscore etc.. I tried to do a * in place of the firstname but it errors out.

Communicator

For Splunk search commands, if you click on the links for rex and table, there's really good online documentation about the various commands.

For the regular expressions, there are lots of online tutorials. It's really a matter of just copying and pasting and slowly learning. Pre-college I did them a little bit and then in college I had a student position where I edited lots of Perl scripts. The regex101 site is really helpful for testing regular expressions as it can really help you understand why it's matching.

I'm not sure what you mean by multiple users; can you give an example?

WARNING: I did not actually test this, but it should work (hopefully there aren't typos)
In terms of "some with only a first name, without an underscore", you were on the right track to look at *. The problem with changing + to * is that \w no longer has to match a character, but the _ is still in the pattern, so an underscore would still be required for a match. The easiest way to modify what I had is to put (?: ... )? around _(?<lastname>\w+)

(?:...) - while the parentheses make it look like a capturing group, the ?: makes it a non-capturing group. The ? at the end makes it optional. Thus we're saying the part inside the parens may or may not exist; match either way.

 -\s+\[[^:]+:(?<firstname>[^_]+)(?:_(?<lastname>\w+))?@[^]]+\] - User authenticated$
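Since the expression above was posted untested, here is a quick Python sketch that exercises it against the three sample events from this thread. The (?P<name>...) spelling and the helper name extract_names are Python-side assumptions. In Python an optional group that did not take part in the match comes back as None, which may not be how Splunk presents a missing lastname field.

```python
import re

# Assumption: (?P<name>...) is Python's spelling of (?<name>...).
# The non-capturing (?:_...)? wrapper makes "_lastname" optional.
OPTIONAL_RE = re.compile(
    r"-\s+\[[^:]+:(?P<firstname>[^_]+)(?:_(?P<lastname>\w+))?@[^\]]+\] - User authenticated$"
)

events = [
    "2014-10-10 09:58:27 EDT (Framework:INFO) [RID:526] - "
    "[randompassword123:bob.dole@\npolitics.com] - User authenticated",
    "2014-10-10 09:59:31 EDT (Framework:INFO) [RID:526] - "
    "[sthisisencryptedpassword:tiger_woods@\ngolf.com] - User authenticated",
    "2014-10-10 09:59:37 EDT (Framework:INFO) [RID:526] - "
    "[anotherpassword:roger@\ntennis.com] - User authenticated",
]


def extract_names(raw):
    """Return (firstname, lastname); lastname is None when there is no underscore."""
    match = OPTIONAL_RE.search(raw)
    if match is None:
        return None
    return match.group("firstname"), match.group("lastname")


names = [extract_names(e) for e in events]
print(names)
```

Note that an address with a period, such as bob.dole, ends up entirely in firstname, because only the underscore separates the two fields in this pattern.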