Splunk Search

How do I create a field that contains only specific values and disregards events that could wrongly be considered a match?

New Member

I am trying to create new fields to search across multiple sources. I have two problems:

  1. When searching the data from source1 and selecting "create new field", I create a field using a regex (I highlight the portion that should be treated as the value). Splunk applies the field label to all the events, but some of the extracted values are not a real match. I need to include only the values that I am interested in, and create the field from those.
  2. When searching across several data sources (say source1 and source2), the values also get mixed up because the column widths vary between events. I need to exclude some of these values. Basically, this is my first problem with one added level of complexity.

Thanks much!

0 Karma

SplunkTrust

Hello @ivonnepena, welcome to Splunk Answers.

I answered this exact same question yesterday, so I'll paste my response below and provide the link too.

As for your second question, are you referring to fixing the length of your values so they look neat in the column?

When extracting a permanent field, you can either use the built-in field extractor, which is kind of crappy, or write your own regular expression. It sounds like you've tried the built-in field extractor. The reason I say it is crappy is that it builds a sloppy regular expression that does not work across the board. The point of a regular expression is to match a pattern even though the value will vary.

If you had the following text and wanted to capture the value between the StatusCode tags, you would need to write a regular expression that captures whatever sits between the tags. Also notice how the values vary (200, Yes, This is a Status Code):

 <StatusCode>200</StatusCode>
 <StatusCode>Yes</StatusCode>
 <StatusCode> This is a Status Code</StatusCode>

If you used the Splunk built-in field extractor, it may capture only the first value and miss all the others. So in my opinion, it's better to write your own regular expression so you can capture 100% of the values. A good way to pick up regex is to practice at www.regex101.com. It took me about a month to get to a very skilled level.
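To make the StatusCode example concrete, here is a minimal sketch in Python rather than in Splunk itself. It assumes the three sample lines above are the whole input, and the group name `status_code` is made up for illustration. Splunk writes a named group as `(?<status_code>...)`, while Python's `re` module needs `(?P<status_code>...)`, so treat the pattern as a starting point to adapt, not a drop-in extraction.

```python
import re

# Sample events mirroring the StatusCode examples above.
events = [
    "<StatusCode>200</StatusCode>",
    "<StatusCode>Yes</StatusCode>",
    "<StatusCode> This is a Status Code</StatusCode>",
]

# Capture whatever sits between the tags, trimming surrounding whitespace.
# The group name "status_code" is hypothetical, chosen for this sketch.
STATUS_RE = re.compile(r"<StatusCode>\s*(?P<status_code>.*?)\s*</StatusCode>")


def extract_status(event):
    """Return the captured value, or None when the event does not match."""
    m = STATUS_RE.search(event)
    return m.group("status_code") if m else None


values = [extract_status(e) for e in events]
print(values)
```

Because the pattern describes the surrounding tags and not the value itself, it keeps working when the value changes from a number to a word or a phrase.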

So back to your question: after clicking Extract New Fields, you will be asked which sourcetype you want to use if you have multiple sourcetypes; if you have only one sourcetype, this step is skipped. If you need the field across multiple sourcetypes, you will need to extract it for each sourcetype. After this step, there will be something that says "I'd prefer to write this regular expression myself". Click this, enter your regular expression, then hit Preview. This will let you see what values were extracted. I like to click non-matches to see what didn't match (usually this part is blank since everything matched). I then click matches and scroll through a dozen events to make sure the right value was extracted. Then you hit Save and go take a look at your new field.
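The same matches versus non-matches check can be rehearsed offline before saving anything. The sketch below is only an approximation of that preview step, written in Python; the `preview` helper and the sample events are hypothetical and are not part of Splunk.

```python
import re


def preview(pattern, events):
    """Split events into (matches, non_matches), similar to the preview tabs."""
    regex = re.compile(pattern)
    matches, non_matches = [], []
    for event in events:
        m = regex.search(event)
        if m:
            # Keep the captured value next to the event it came from.
            matches.append((m.group(1), event))
        else:
            non_matches.append(event)
    return matches, non_matches


# Hypothetical sample events; the last one has no StatusCode tag at all.
sample = [
    "<StatusCode>200</StatusCode>",
    "<StatusCode>Yes</StatusCode>",
    "heartbeat ok",
]

# A too-literal pattern that expects only digits misses the "Yes" event.
matches, non_matches = preview(r"<StatusCode>(\d+)</StatusCode>", sample)
print(len(matches), "matched;", len(non_matches), "did not")
```

Reading the non-matching list is the quickest way to spot a pattern that is too narrow; reading the captured values is how you spot one that is too broad.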

https://answers.splunk.com/answers/439145/field-extraction-problem-1.html#answer-438320

0 Karma

Legend

Please share some sample events that show match as well as non-match.

0 Karma

New Member

Using the create new field option and regex I get:

2016-07-12 21:47:49 Kernel.Warning 152.7.5.35 Jul 13 04:42:08 EIS-BR kernel: [55214.077676] id=TAC pri=6 func=wrlog_logger line=181 ctx=bump0 msg="Unknown Identity: no enabled identity for token: 198.7.100.26:52675 -> 8.96.3.3:3389 act(DISCARD:)

"Unknown identity" is the value for my field msg_1

Then I want to add another value ("Deny udp") in msg_1:

2016-07-29 10:21:27 Local4.Warning 7.7.7.7 Jul 29 2016 15:25:16 Ent-FW : %ASA-4-106023: Deny udp src inside:198.4.1.10/514 dst outside:8.8.8.8/514 by access-group "insideaccessin" [0x0, 0x0]

But instead I get a lot of unintended values matching that field.

Please note that these two search strings are located in different columns of the event, as you can see.

For example, here are the unintended values (this stands in for a screenshot of the top values of the field msg_1; Unknown Identity is there, but so are many other values that we don't want):

Top 10 Values                  Count     %
Trusted Host insert            219,786   43.123%
Protected Resource accept      140,836   27.633%
Unknown Identity               84,839    16.646%
insideaccessin" [0x0, 0x0]     43,739    8.582%
hunsberger"                    703       0.138%
2016-07-24T12                  468       0.092%
2016-07-23T12                  430       0.084%
outsideaccessin" [0x0, 0x0]    418       0.082%
2016-07-24T06                  382       0.075%
2016-07-24T07

Thanks!

0 Karma
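One possible direction for the msg_1 problem, sketched in Python and not tested in Splunk: instead of capturing whatever text sits at a given position, anchor the capture to the context that precedes each wanted value (`msg="` for the TAC events, the `%ASA-...:` tag for the firewall events) and list the wanted values explicitly. The two sample events are copied from the post above; the third event is invented to stand for an unwanted value. In Splunk the named group would be written `(?<msg_1>...)`.

```python
import re

# The two sample events quoted in the thread.
EVENT_TAC = (
    '2016-07-12 21:47:49 Kernel.Warning 152.7.5.35 Jul 13 04:42:08 EIS-BR kernel: '
    '[55214.077676] id=TAC pri=6 func=wrlog_logger line=181 ctx=bump0 '
    'msg="Unknown Identity: no enabled identity for token: '
    '198.7.100.26:52675 -> 8.96.3.3:3389 act(DISCARD:)'
)
EVENT_ASA = (
    '2016-07-29 10:21:27 Local4.Warning 7.7.7.7 Jul 29 2016 15:25:16 Ent-FW : '
    '%ASA-4-106023: Deny udp src inside:198.4.1.10/514 dst outside:8.8.8.8/514 '
    'by access-group "insideaccessin" [0x0, 0x0]'
)
# Hypothetical event, invented here to represent a value that should NOT be captured.
EVENT_OTHER = 'id=TAC pri=6 msg="Trusted Host insert: example only"'

# Anchor on the preceding context, then allow only the two wanted values.
MSG_1_RE = re.compile(r'(?:msg="|%ASA-\d-\d+: )(?P<msg_1>Unknown Identity|Deny udp)')


def extract_msg_1(event):
    """Return the msg_1 value, or None when the event holds no wanted value."""
    m = MSG_1_RE.search(event)
    return m.group("msg_1") if m else None


for event in (EVENT_TAC, EVENT_ASA, EVENT_OTHER):
    print(extract_msg_1(event))
```

Events that contain neither value simply get no msg_1 field, which is the behavior asked for in the original question. The trade-off is that every new wanted value has to be added to the alternation by hand.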