hello there,
i am trying to analyze json data that contains a lot of fields.
here i want to first search for a string where one part is static and one part is variable, and then get a count of how many times each string was found.
example:
field1: some text mystr-555 more tex33t
field2: other text mystr-555 more textg5
field3: foobaar mystr-555 bar bar foo
field4: xyz mystr-222 foo 98432
field5: random numbers and text mystr-222 more text
so i search for "*mystr-*" and get 5 different results (since it found 5 different fields).
i'd like to somehow pass only the found string further so i can do further analysis with it, but i fail miserably 😕
i tried dedup, eval and return, but i most definitely used them the wrong way.
i am very new to working in this field (normally i use grep, awk and sed) and i am aware that i am asking newbie things, but i know it is possible and not that hard. maybe one of you has five minutes to help me out here.
any input would be appreciated,
regards,
sam
Disregard my previous comment, I missed the first part of your question...
You can use the rex command to do the field extraction, and then count by the values of your field.
base_search | rex max_match=100 "(?<myextraction>mystr-\d+)" | stats count by myextraction
The max_match option is needed to tell the rex command not to stop after the first match, but to create a multi-value extraction. If you expect more than 100 matches in a single event, you need to adjust this accordingly.
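Since you mentioned grep, awk and sed: here is a minimal Python sketch of what the extraction and the count do, in case it helps to see the logic outside of Splunk. It assumes the five sample lines from the question stand in for events, and the helper name count_matches is made up for this illustration. It only imitates "take up to max_match matches per event, then count by value"; it is not how Splunk implements rex or stats.

```python
import re
from collections import Counter

# Sample events copied from the question (stand-ins for real Splunk events).
events = [
    "field1: some text mystr-555 more tex33t",
    "field2: other text mystr-555 more textg5",
    "field3: foobaar mystr-555 bar bar foo",
    "field4: xyz mystr-222 foo 98432",
    "field5: random numbers and text mystr-222 more text",
]

# Same pattern as in the rex command. Python spells named groups (?P<name>...).
pattern = re.compile(r"(?P<myextraction>mystr-\d+)")


def count_matches(lines, max_match=100):
    """Hypothetical helper: collect up to max_match matches per line, then count them."""
    counts = Counter()
    for line in lines:
        matches = [m.group("myextraction") for m in pattern.finditer(line)]
        counts.update(matches[:max_match])  # cap per event, like max_match
    return counts


counts = count_matches(events)
for value, count in counts.most_common():
    print(f"{value}: count:{count}")
# prints:
# mystr-555: count:3
# mystr-222: count:2
```

With max_match=1 only the first match in each line would be counted, which makes no difference for this sample because every line contains the string exactly once.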
not allowed to have more posts since i only have a rep of 40, hence here my reply to dmohn:
dear dmohn,
thank you for your fast reply!
it looks good but i get a "No results found."
what does the \d+ do?
ignore my previous post, i was able to get it working.
the example i gave here was not correct, i managed to get what i want with this search:
host=twitter | rex max_match=100 "(?<myextraction>mystr-[0-9][0-9][0-9][0-9]-\d+)" | stats count by myextraction
thank you a lot!!
@sfreudiger:
Glad you managed it anyway!
Just for your information: the regex \d+ translates to 'one or more digits [0-9]' - So you could simplify your extraction to mystr-\d{4}-\d+
To debug regex extractions, have a look at https://regex101.com - I gained a ton of regex knowledge there!
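If you want to check the simplification locally first, here is a small Python sketch that compares the long and the short spelling of the pattern. The sample strings are invented for this illustration, since the real data is not shown in this thread, and the helper name extract is hypothetical. The re.ASCII flag is an assumption made so that \d means exactly [0-9] in Python as well.

```python
import re

# Long form from the working search and the suggested short form.
# re.ASCII restricts \d to [0-9], so both spellings mean the same thing here.
long_form = re.compile(r"mystr-[0-9][0-9][0-9][0-9]-\d+", re.ASCII)
short_form = re.compile(r"mystr-\d{4}-\d+", re.ASCII)

# Hypothetical sample strings; the real events are not shown in the thread.
samples = [
    "foo mystr-1234-5 bar",
    "foo mystr-0001-987654 bar",
    "foo mystr-555 bar",      # no four-digit block, should not match
    "foo mystr-12-34 bar",    # only two digits before the dash, should not match
]


def extract(pattern, text):
    """Hypothetical helper: return the first match as a string, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


results = [(extract(long_form, s), extract(short_form, s)) for s in samples]
for sample, (a, b) in zip(samples, results):
    print(f"{sample!r}: long={a!r} short={b!r} same={a == b}")
```

Both patterns should agree on every sample, matching or not, which is a quick sanity check before changing the search.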
thank you again!
can't award any points yet, once i have enough i'll come back here and give you some karma 😉
for clarification: i would like my result to be something like this:
mystr-555: count:3
mystr-222: count:2
.. and then i want to go on and add date/time information, but that's next and nothing i worry about now