Hello all,
I have data like this
reason="abc";appName=....
reason="xyz";ERServer=...
reason="dfg",ClientBob=...
How can I extract only abc, xyz and dfg?
Note that abc, xyz and dfg might contain ";", ",", "=", double quotes or single quotes.
Basically the text is really dynamic and can contain any kind of character. The only consistent pattern is the set of ending keywords shown above (appName, ERServer, ClientBob).
Thanks
Okay, given the examples you provided for @cpetterborg above, and your statement that only the 3 keywords mentioned above can mark the end of the text you want, a regex that would match looks like this:
rex field=data "reason=\"(?<reason>.*)\".(?:appName|ERServer|ClientBob)"
You can add more delimiting keywords in the second (non-capturing) group, separated by pipes. Keep in mind, however, that this is quite an "expensive" regex, which could significantly impact your search performance. But in your case it might be the only way to achieve what you need!
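To sanity-check the pattern outside Splunk, here is a minimal Python sketch. It assumes Python's re module behaves like Splunk's PCRE for this pattern (Python needs the (?P<name>...) spelling for the named group), and the sample events are made up so that the reason text contains quotes, semicolons, commas and equal signs.

```python
import re

# Same idea as the rex above: a greedy .* that must end at a double quote,
# one separator character, and one of the known keywords.
PATTERN = re.compile(r'reason="(?P<reason>.*)".(?:appName|ERServer|ClientBob)')

# Hypothetical events; the reason text itself contains ", ;, , and =.
EVENTS = [
    'reason="a"b;c=d";appName=....',
    'reason="x,y\'z";ERServer=...',
    'reason="dfg",ClientBob=...',
]


def extract_reason(event):
    """Return the text between reason=" and the quote before a known keyword."""
    match = PATTERN.search(event)
    return match.group("reason") if match else None


reasons = [extract_reason(event) for event in EVENTS]
for reason in reasons:
    print(reason)
```

Because .* is greedy, the capture runs up to the last quote that is followed by one of the keywords, so the keyword list is what decides where the field ends.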
This is exactly what I want, at least logically. However, it doesn't work. Splunk only extracts the text between "reason=" and "appName",
but it ignores ERServer and ClientBob. It seems the "OR" alternation is not recognized properly. Is this a bug?
It works perfectly now after changing to this rex below, thanks a lot mate!
| rex field=_raw "reason=\"(?<reason>.*)(appName|ERServer|ClientBob)"
You can try this,
your index | rex field=_raw "reason\=.(?P<field_name>[^\.]\w+)"
Thanks for the help mate, but it doesn't work for me. It only extracts single word after "reason="
If the first and last character of the reason field will always be a double quote, and the field contains no equal sign, you could try a greedy match like this:
rex field=data "reason=\"(?<reason>.*)\"\S+="
This regex will try to match as many characters as possible, up to the last double quote that is followed by non-whitespace characters and an equal sign.
For my set of test data this worked perfectly, even if the reason contained one or more commas, semicolons or double quotes.
I'm really sorry, the text I want to extract can also contain an equal sign, so this won't work either. My bad, my description of the issue was not clear enough. I have updated the original post with the full range of characters that might be included in the text I want to extract.
Here is a run-anywhere search that shows the rex command that will pick out the field from the data you provided in your question:
| makeresults |
eval data="reason=\"abc\";appName=....
reason=\"xyz\";ERServer=...
reason=\"dfg\",ClientBob=..." |
makemv delim="
" data |
rex field=data "reason=\"(?<reason>[^\"]*)"
I'm not sure if this meets your requirements, but it can be run in any Splunk search bar and produce the results you have requested. The last line is the only one that is really doing any of the work for that purpose. The other lines are only setting up the data that simulates the events as portrayed above.
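For anyone wanting to see where this pattern stops, here is a small Python sketch of the same character class. It assumes Python's re module matches Splunk's PCRE behaviour here, and the second event is a made-up example with a quote inside the reason text.

```python
import re

# [^"]* can never cross a double quote, so the capture ends at the first one.
PATTERN = re.compile(r'reason="(?P<reason>[^"]*)')


def extract_reason(event):
    """Return everything after reason=" up to the next double quote."""
    match = PATTERN.search(event)
    return match.group("reason") if match else None


plain = extract_reason('reason="abc";appName=....')
# Hypothetical event with a double quote embedded in the reason text.
quoted = extract_reason('reason="ab"c";appName=....')

print(plain)
print(quoted)
```

The pattern is cheap and works for the three events as originally posted, but any double quote inside the reason text cuts the capture short.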
Sorry, this will not work, as the contents I want to extract can also contain double quotes. Sorry, I forgot to mention that in the original post (I have updated it now).
As long as there is something like ;wordchars= after the quoted data you want to extract, the following will probably work for you:
| makeresults |
eval data="reason=\"a\\\"b\"c\";appName=....
reason=\"x;yz\";ERServer=...
reason=\"df,g\",ClientBob=..." |
makemv delim="
" data |
rex field=data "reason=\"(?<reason>.*?)\"[;,]\w+="
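The behaviour of the lazy match can be checked outside Splunk with a short Python sketch. It assumes Python's re module behaves like Splunk's PCRE for this pattern; the third event is a shortened, hypothetical line in which another quoted key=value pair sits between the reason and the final keyword.

```python
import re

# Lazy .*? stops at the FIRST double quote that is followed by ; or ,
# then word characters and an equal sign.
PATTERN = re.compile(r'reason="(?P<reason>.*?)"[;,]\w+=')


def extract_reason(event):
    """Return the shortest text that ends right before ";word= or ",word=."""
    match = PATTERN.search(event)
    return match.group("reason") if match else None


simple = extract_reason('reason="x;yz";ERServer=...')
# Backslash-escaped quote plus a bare quote inside the reason text.
escaped = extract_reason('reason="a\\"b"c";appName=....')
# Hypothetical event with an extra key="value" pair before the keyword.
nested = extract_reason('reason="AAABBB";Client="112233";appName="ooiiuu"')

print(simple)
print(escaped)
print(nested)
```

Embedded quotes, semicolons and commas are handled, but the capture ends at the first ";word= sequence. Whether that is the right place to stop depends on whether such a sequence can appear inside the text you want.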
I'm really sorry, the text I want to extract can also contain an equal sign, so this won't work either. My bad, my description of the issue was not clear enough. I have updated the original post with the full range of characters that might be included in the text I want to extract.
Unless the double quotes inside the field are escaped (for example with a backslash) you are pretty much screwed because there doesn't seem to be enough regularity to the string to make extracting it properly an option. If you give an exact example (doesn't have to contain real data, just valid data with all the possibilities, so clean it up for public consumption), it might be possible to help you on this.
Yes you're right that makes sense. 3 examples of the data are as below:
reason="AAABBB";Client="112233",source="aassdd";server="IIHHSS";appName="ooiiuu"
reason="NNCCSA";Network="asdasasd";NextHop="asda",data="asdasasd";Subnet="10.12.12.12,24";RemoteIP="12.12.12.12,mask=255.255.255.0";ClientBob="aabbcc"
reason="dgfsdd";External="asdasas";Policy="asdasasda";Domain="asdasdas";ClientVersion=12312312321";Path="hop1=1213,hop2=23432,hop3=23432,hop4=2343";ERServer=asdadasda"
You still need the examples to have the additional characters, the way they would appear IRL, or at least the = and " characters, because the rex I presented above works on this data.
No it doesn't. From the example above, let's take this
reason="AAABBB";Client="112233",source="aassdd";server="IIHHSS";appName="ooiiuu"
The text I want to extract is everything between reason= and appName=, which is
AAABBB";Client="112233",source="aassdd";server="IIHHSS
The reason I want all of this together is because
1. There are duplicate fields. For example Splunk already has its own field "source" and I don't want to create another
2. Yes, it's possible to separate everything into different columns, so we would have multiple fields like reason, Client, source, server. But as you can see, the texts are really dynamic and the columns are not always the same. The above examples are only 3 among many other scenarios. I would need thousands of eval statements to join the fields back together, which takes too much effort. Those texts are error messages and they are only meaningful when joined together.
Do they always end in a double quote?
Also, it looks like you have key-value pairs using "=", separated with ";" or ",".
Take a look at the extract command.
This will not work, as the contents I want to extract will also contain ";" OR "," (as I mentioned in original post)