I am trying to create new fields to search across multiple sources. I have two problems:
When searching the data of source1 and selecting "create new field", I create a field using regex (I highlight the portion that should be treated as the value). Splunk applies the field label to all the events, but sometimes the extracted values are not what I intended. I need to include only the values I am interested in and create a field out of those.
When searching across several data sources (say source1 and source2), the values also get mixed up because the column widths vary between events. I need to exclude some of these values. Basically, my problem is my previous question with one added level of complexity.
I answered this exact same question yesterday. I'll paste my response below and provide the link too.
As for your second question, are you referring to fixing the length of your values so they look neat in the column?
When extracting a permanent field, you can either use the built-in field extractor, which is kind of crappy, or write your own regular expression. It sounds like you've tried the built-in field extractor. The reason I say it is crappy is that it builds a sloppy regular expression that does not work across the board. The point of a regular expression is to match a pattern even though the value varies.
If you had the following text and wanted to capture the value between the StatusCode tags, you would need to write a regular expression that captures whatever sits between the tags. Also notice how the values will vary (200, Yes, This is a Status Code).
<StatusCode> This is a Status Code</StatusCode>
If you used the Splunk built-in field extractor, it may capture only the first value and miss all the others. So in my opinion, it's better to write your own regular expression so you can capture 100% of the values. You can pick up regex by going to www.regex101.com and practicing. It took me about a month to get to a very skilled level.
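To make the StatusCode idea concrete, here is a minimal sketch in Python rather than Splunk itself. The first two sample events are assumptions built from the values mentioned above (200, Yes); only the third appears as-is in this post. The field name status_code is also just an illustrative choice. Python spells a named group as (?P<name>...), while Splunk's regex flavor uses (?<name>...), so the pattern would need that small change before being pasted into the field extractor.

```python
import re

# Sample events. The first two are assumed for illustration; the third is from the post.
events = [
    "<StatusCode>200</StatusCode>",
    "<StatusCode>Yes</StatusCode>",
    "<StatusCode> This is a Status Code</StatusCode>",
]

# The named group acts as the field name. [^<]+? takes everything up to the
# closing tag, and \s* on either side trims surrounding whitespace.
STATUS_RE = re.compile(r"<StatusCode>\s*(?P<status_code>[^<]+?)\s*</StatusCode>")


def extract_status(event):
    """Return the text between the StatusCode tags, or None if the tags are absent."""
    match = STATUS_RE.search(event)
    return match.group("status_code") if match else None


values = [extract_status(event) for event in events]
print(values)
```

Because the pattern is anchored on the literal tags instead of on one highlighted sample value, it keeps matching when the value changes length or type.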
So back to your question: after clicking Extract New Fields, you will be asked which sourcetype you want to use if you have multiple sourcetypes; if you have only one sourcetype, this step is skipped. If you need to use a field over multiple sourcetypes, you will need to extract a field for each sourcetype. After this step, there will be a link that says "I'd prefer to write this regular expression myself". Click this and enter the regular expression below, then hit Preview. This lets you see what values were extracted. I like to click Non-Matches to see what didn't match (usually this part is blank since everything matched), then click Matches and scroll through a dozen events to make sure the right value was extracted. Then you hit Save and go take a look at your new field.
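The matches / non-matches check described above can also be rehearsed outside Splunk before saving the extraction. This is only a sketch of the idea in Python, not how Splunk implements its preview: the function name preview, the sample events, and the status_code group name are all hypothetical.

```python
import re

# Hypothetical pattern for illustration; Splunk would use (?<status_code>...) instead of (?P<...>).
FIELD_RE = re.compile(r"<StatusCode>\s*(?P<status_code>[^<]+?)\s*</StatusCode>")


def preview(events, pattern=FIELD_RE):
    """Split events into (event, extracted value) matches and plain non-matches."""
    matches, non_matches = [], []
    for event in events:
        found = pattern.search(event)
        if found:
            matches.append((event, found.group("status_code")))
        else:
            non_matches.append(event)
    return matches, non_matches


# Assumed sample data: two events that carry the tag and one that does not.
sample = [
    "<StatusCode>200</StatusCode>",
    "<StatusCode> This is a Status Code</StatusCode>",
    "an event with no status tag",
]

matches, non_matches = preview(sample)
for event, value in matches:
    print("match:", repr(value))
for event in non_matches:
    print("non-match:", event)
```

If the non-matches list contains events that should have produced a value, or the matches list contains values you did not intend, the pattern needs tightening before you save the field.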
But instead I get a lot of unintended values as matches for that field.
Please note that these two search strings are located in different columns of the event, as you can see.
The unmatched values are, for example (this is intended to be a screenshot of the top values of the field msg_1; as you can see, Unknown identity is there, but many other values that we don't want are included too):