Solved: Re: unique search string from json

sfreudiger · ‎03-16-2016

hello there,

i am trying to analyze json data that contains a lot of fields.
here i want to first search for a string that has one static part and one variable part, and then get a count of how many times each distinct string was found.

example:

field1: some text mystr-555 more tex33t
field2: other text mystr-555 more textg5
field3: foobaar mystr-555 bar bar foo
field4: xyz mystr-222 foo 98432
field5: random numbers and text mystr-222 more text

so i search for "*mystr-*" and get 5 different results (since it found 5 different fields).
i'd like to somehow pass only the matched string along so i can do further analysis with it, but i fail miserably 😕

i tried dedup, eval and return, but i most definitely used them the wrong way.

i am very new to working in this field (normally i use grep, awk and sed) and i am aware that i am asking newbie things, but i know it is possible and not that hard. maybe one of you has five minutes to help me out here.

any input would be appreciated,

regards,
sam

DMohn · ‎03-16-2016

Disregard my previous comment, I missed the first part of your question...

You can use the rex command to do the field extraction, and then count by values of your field.

 base_search | rex max_match=100 "(?<myextraction>mystr-\d+)"  | stats count by myextraction

max_match is needed to tell the rex command not to stop after the first match, but to create a multi-value extraction. If you expect more than 100 matches in a single event, you need to raise this value accordingly.
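For readers coming from grep, awk and sed, the logic of that search can be sketched outside Splunk. This is only a rough Python illustration on the sample lines from the question, not how Splunk implements rex or stats; the names `events`, `pattern` and `counts` are made up for the example, and `re.findall` stands in for the multi-value extraction that max_match enables.

```python
import re
from collections import Counter

# Sample lines from the question; each one stands in for a piece of event text.
events = [
    "some text mystr-555 more tex33t",
    "other text mystr-555 more textg5",
    "foobaar mystr-555 bar bar foo",
    "xyz mystr-222 foo 98432",
    "random numbers and text mystr-222 more text",
]

# Same pattern as in the rex command: the literal "mystr-" followed by digits.
pattern = re.compile(r"mystr-\d+")

counts = Counter()
for text in events:
    # findall returns every match in the text, not just the first one,
    # which is roughly what a max_match greater than 1 asks rex to do.
    counts.update(pattern.findall(text))

# Print the distinct strings with their counts, most frequent first.
for value, count in counts.most_common():
    print(f"{value}: count:{count}")
```

On the sample data this reports mystr-555 three times and mystr-222 twice.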

sfreudiger · ‎03-16-2016

i am not allowed to post more since i only have a rep of 40, hence my reply to dmohn here:

dear dmohn,

thank you for your fast reply!
it looks good but i get a "No results found."

what does the \d+ do?

sfreudiger · ‎03-16-2016

ignore my previous post, i was able to get it working.

the example i gave here was not correct, i managed to get what i want with this search:

host=twitter | rex max_match=100 "(?<myextraction>mystr-[0-9][0-9][0-9][0-9]-\d+)" | stats count by myextraction

thank you a lot!!

DMohn · ‎03-16-2016

@sfreudiger:
Glad you managed it anyway!

Just for your information: the regex \d+ means 'one or more digits [0-9]', so you could simplify your extraction to mystr-\d{4}-\d+
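A quick way to convince yourself that the two forms are equivalent is to run both against the same text. The sketch below uses Python's `re` module rather than Splunk, and the sample value `mystr-1234-98765` is hypothetical, since the thread only tells us the format is "mystr-", four digits, a dash, then more digits.

```python
import re

# Hypothetical sample text; only the general format is known from the thread.
sample = "foo mystr-1234-98765 bar"

# re.ASCII restricts \d to the digits 0-9, matching the [0-9] spelling.
long_form = re.compile(r"mystr-[0-9][0-9][0-9][0-9]-\d+", re.ASCII)
short_form = re.compile(r"mystr-\d{4}-\d+", re.ASCII)

long_match = long_form.search(sample).group(0)
short_match = short_form.search(sample).group(0)
print(long_match, short_match)

# \d+ needs at least one digit, so "mystr-" followed by letters does not match.
no_digits = re.search(r"mystr-\d+", "mystr-abc")
print(no_digits)
```

Both patterns extract the same string here, and the last search finds nothing because there is no digit after the dash.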

To debug regex extractions, have a look at https://regex101.com - I gained a ton of regex knowledge there!

sfreudiger · ‎03-16-2016

thank you again!

can't award any points yet, once i have enough i'll come back here and give you some karma 😉

sfreudiger · ‎03-16-2016

for clarification: i would like my result to be something like this:

mystr-555: count:3
mystr-222: count:2

.. and then i want to go on and add date/time information, but that's next and nothing i worry about now
