Regular expression - need to extract a field

Communicator

I am trying to do a named extraction for the field sample in each event, but it is failing for some reason. Please help! Here are the events:

2017-12-06T11:57:03.744000 POSITION 0 lang=Albanian sample="Unë mund të ha qelq dhe nuk më gjen gjë." constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>

2017-12-06T11:40:03.744000 POSITION 1 lang=Arabic sample="أنا قادر على أكل الزجاج و هذا لا يؤلمني." odd=1 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>

2017-12-06T11:23:03.744000 POSITION 2 lang=Armenian sample="Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։" constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>

2017-12-06T11:06:03.744000 POSITION 3 lang=Chinese sample=" 我能吞下玻璃而不傷身體" odd=3 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>

2017-12-06T10:49:03.744000 POSITION 4 lang=Danish sample="Jeg kan spise glas, det gør ikke ondt på mig." constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T10:32:03.744000 POSITION 5 lang=Euro sample="€." odd=5 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T10:15:03.744000 POSITION 6 lang=French sample="Je peux manger du verre, ça ne me fait pas de mal." constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T09:58:03.744000 POSITION 7 lang=Georgian sample="მინას ვჭამ და არა მტკივა." odd=7 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T09:41:03.744000 POSITION 8 lang=Greek sample="Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα." constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T09:24:03.744000 POSITION 9 lang=Hawaiian sample="Hiki iaʻu ke ʻai i ke aniani; ʻaʻole nō lā au e ʻeha." odd=9 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T09:07:03.744000 POSITION 10 lang=Hebrew sample="אני יכול לאכול זכוכית וזה לא מזיק לי." constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T08:50:03.744000 POSITION 11 lang=Hindi sample="मैं काँच खा सकता हूँ और मुझे उससे कोई चोट नहीं पहुंचती." odd=11 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T08:33:03.744000 POSITION 12 lang=Hindi sample="मैं काँच खा सकता हूँ, मुझे उस से कोई पीडा नहीं होती." constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T08:16:03.744000 POSITION 13 lang=Icelandic sample="Ég get etið gler án þess að meiða mig." odd=13 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T07:59:03.744000 POSITION 14 lang=Japanese sample="私はガラスを食べられます。それは私を傷つけません" constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T07:42:03.744000 POSITION 15 lang=Korean sample="나는 유리를 먹을 수 있어요. 그래도 아프지 않아요" odd=15 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T07:25:03.744000 POSITION 16 lang=Macedonian sample="Можам да јадам стакло, а не ме штета." constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T07:08:03.744000 POSITION 17 lang=Mongolian sample="Би шил идэй чадна, надад хортой биш" odd=17 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T06:51:03.744000 POSITION 18 lang=Old Norse sample="Ek get etið gler án þess að verða sár." constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>
2017-12-06T06:34:03.744000 POSITION 19 lang=Polish sample="Mogę jeść szkło, i mi nie szkodzi." odd=19 constant="double quotes" 'single quotes' \slashes\ `~!@#$%^&*()-_=+{}|;:<>,./? [brackets] <script>alert("raw event unescaped!")</script>

SplunkTrust

Hi @saurabh_tek11,

Can you please add the configuration below to props.conf and check?

EXTRACT-sample = sample=\"(?<sample>.*?)\"

You can also check by running the search below.

YOUR_SEARCH | rex field=_raw "sample=\"(?<sample>.*?)\"" | table _time sample
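
For anyone who wants to try the same pattern outside Splunk, here is a minimal Python sketch. It is not Splunk's implementation, only the same regex run through Python's re module. Two assumptions: Python writes the named group as (?P<sample>...) where the rex pattern above uses (?<sample>...), and the three events are shortened copies of the ones in the question. The helper name extract_sample is made up for illustration.

```python
import re

# Same pattern as the rex above; Python spells the named group (?P<sample>...).
SAMPLE_RE = re.compile(r'sample="(?P<sample>.*?)"')

# Shortened copies of three events from the question (trailing text trimmed).
events = [
    '2017-12-06T11:57:03.744000 POSITION 0 lang=Albanian sample="Unë mund të ha qelq dhe nuk më gjen gjë." constant="double quotes"',
    '2017-12-06T11:40:03.744000 POSITION 1 lang=Arabic sample="أنا قادر على أكل الزجاج و هذا لا يؤلمني." odd=1 constant="double quotes"',
    '2017-12-06T10:32:03.744000 POSITION 5 lang=Euro sample="€." odd=5 constant="double quotes"',
]


def extract_sample(event):
    """Return the text between sample=" and the next ", or None if absent."""
    match = SAMPLE_RE.search(event)
    return match.group("sample") if match else None


samples = [extract_sample(event) for event in events]
for value in samples:
    print(value)
```

The lazy .*? stops at the first closing quote, so the later constant="double quotes" text is not pulled into the field.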

Happy Splunking

Communicator

@kamlesh_vaghela - Thanks. It works in Splunk, but I am also trying to get this extraction working on https://regex101.com.

SplunkTrust

Hi @saurabh_tek11,

Please check this link: https://regex101.com/r/QWdG0g/2

Is it OK?

Communicator

Close, but it still breaks for lang=Hebrew and lang=Arabic. I am trying to understand both of the regexes above,
"sample=\"(?<sample>[^\"]+)\"" and "sample=\"(?<sample>.*?)\"". Is it right that everything after the closing angle bracket > is the body of the regular expression? And in the second regex, what does the ? after .* mean?

In "sample=\"(?<sample>[^\"]+)\"", does the ^ inside the character class signify
a negation (anything that is not ", i.e. keep going until a " is found)
OR
the start of the regex (looking for the first ")? If it is the latter, what is the " at the beginning, in \"(?<sample, doing?

Please enlighten me.

SplunkTrust

Hi @saurabh_tek11,

It is difficult for lang=Hebrew and lang=Arabic. I am able to extract the sample value, but with the " included.

https://regex101.com/r/QWdG0g/5

Communicator

@kamlesh_vaghela - This works in Splunk. Thank you. You have also shown me how quickly a named regex extraction can be done in Splunk using erex.

Out of curiosity, can you help me extract this on https://regex101.com?

SplunkTrust

Depending on the data, I think the following regex would be better:

Match everything except double quotes: "sample=\"(?<sample>[^\"]+)\""
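
To compare this with the earlier .*? version, here is a small Python sketch (an illustration with Python's re module, not Splunk itself). Assumptions: the named group uses Python's (?P<sample>...) spelling, the first event is a shortened copy of the lang=Euro event, and the second event with an empty value is hypothetical and does not appear in the question.

```python
import re

# The two patterns discussed in this thread, in Python's named-group spelling.
LAZY = re.compile(r'sample="(?P<sample>.*?)"')
NEGATED = re.compile(r'sample="(?P<sample>[^"]+)"')

# Shortened copy of the lang=Euro event from the question.
event = 'lang=Euro sample="€." odd=5 constant="double quotes"'
# Hypothetical event with an empty value (not in the original data).
empty_event = 'lang=None sample="" constant="double quotes"'

# On a normal event both patterns capture the same text.
lazy_value = LAZY.search(event).group("sample")
negated_value = NEGATED.search(event).group("sample")
print(lazy_value, negated_value)

# On an empty value they differ: .*? may match zero characters,
# while [^"]+ requires at least one character that is not a quote.
lazy_empty = LAZY.search(empty_event)
negated_empty = NEGATED.search(empty_event)
print(lazy_empty.group("sample") == "", negated_empty is None)
```

So for events like the ones in the question the two forms give the same field value. The negated class simply cannot run past a quote, whereas .*? relies on stopping at the first place the rest of the pattern fits.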

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

Communicator

After some more study, my understanding is that [^\"]+ in

"sample=\"(?<sample>[^\"]+)\""

keeps matching characters until a literal " is reached.
Is this correct, @niketnilay?

SplunkTrust

Yes, that is correct 🙂 regex101.com also has an explanation of this!
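
To make the earlier questions about ^ and about the ? after .* concrete, here is a short Python sketch. It uses Python's re module rather than Splunk, and the test string is a made-up, shortened stand-in for one event.

```python
import re

# Made-up, shortened stand-in for one event.
text = 'sample="abc" constant="double quotes"'

# Inside [...], a leading ^ negates the class:
# [^"]+ means "one or more characters that are not a double quote".
run = re.match(r'[^"]+', text).group(0)
print(run)  # everything before the first quote

# Outside a character class, ^ anchors the match at the start of the string.
starts_with_sample = re.search(r'^sample', text) is not None
starts_with_constant = re.search(r'^constant', text) is not None
print(starts_with_sample, starts_with_constant)

# The ? after .* does not mean "optional" here; it makes the * lazy.
# Greedy .* runs to the LAST quote, lazy .*? stops at the FIRST one.
greedy = re.search(r'sample="(.*)"', text).group(1)
lazy = re.search(r'sample="(.*?)"', text).group(1)
print(greedy)
print(lazy)
```

The " right after sample= in the pattern is just the literal opening quote of the value. It has nothing to do with the ^, which only acts as negation because it is the first character inside the square brackets.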
