Solved: Re: unique search string from json

sfreudiger · ‎03-16-2016

hello there,

i am trying to analyze json data that contains a lot of fields.
here i want to first search for a string where one part is static and one part is variable, and then get a count of how many times each distinct string was found.

example:

field1: some text mystr-555 more tex33t
field2: other text mystr-555 more textg5
field3: foobaar mystr-555 bar bar foo
field4: xyz mystr-222 foo 98432
field5: random numbers and text mystr-222 more text

so i search for "*mystr-*" and get 5 different results (since it found 5 different fields).
i'd like to somehow pass only the found string further so i can do further analysis with it, but i fail miserably 😕

i tried dedup, eval and return, but i most definitely used them the wrong way.

i am very new to working in this field (normally i use grep, awk and sed) and i am aware that i am asking newbie things, but i know it is possible and not that hard. maybe one of you has five minutes to help me out here.

any input would be appreciated,

regards,
sam

DMohn · ‎03-16-2016

Disregard my previous comment, I missed the first part of your question...

You can use the rex command to do the field extraction, and then count by values of your field.

 base_search | rex max_match=100 "(?<myextraction>mystr-\d+)"  | stats count by myextraction

The max_match option tells the rex command not to stop after the first match, but to create a multi-value extraction. If you expect more than 100 matches in a single event, you need to adjust this value accordingly.
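
If you want to check the idea outside Splunk first, here is a minimal Python sketch of what the search above does: extract every match of the pattern and count by value. It is only an approximation under stated assumptions, not how Splunk implements rex or stats. Each sample line from the question is treated as one plain string, MAX_MATCH is a hypothetical stand-in for max_match, and Python spells named groups (?P<name>...) where rex uses (?<name>...).

```python
import re
from collections import Counter

# Sample values copied from the question; each line is treated as one string
# (an assumption, the real events are JSON and are not shown in the thread).
events = [
    "some text mystr-555 more tex33t",
    "other text mystr-555 more textg5",
    "foobaar mystr-555 bar bar foo",
    "xyz mystr-222 foo 98432",
    "random numbers and text mystr-222 more text",
]

# Same pattern as in the rex command, with Python's (?P<name>...) group syntax.
pattern = re.compile(r"(?P<myextraction>mystr-\d+)")

MAX_MATCH = 100  # hypothetical stand-in for rex's max_match

counts = Counter()
for event in events:
    # Collect every match in the string, then keep at most MAX_MATCH of them.
    matches = [m.group("myextraction") for m in pattern.finditer(event)]
    counts.update(matches[:MAX_MATCH])

# Rough equivalent of "stats count by myextraction".
for value, count in counts.most_common():
    print(f"{value}: count:{count}")
```

With the five sample lines this counts mystr-555 three times and mystr-222 twice.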

sfreudiger · ‎03-16-2016

not allowed to have more posts since i only have a rep of 40, hence here my reply to dmohn:

dear dmohn,

thank you for your fast reply!
it looks good but i get a "No results found."

what does the \d+ do?

sfreudiger · ‎03-16-2016

ignore my previous post, i was able to get it working.

the example i gave here was not correct, i managed to get what i want with this search:

host=twitter | rex max_match=100 "(?<myextraction>mystr-[0-9][0-9][0-9][0-9]-\d+)" | stats count by myextraction

thank you a lot!!

DMohn · ‎03-16-2016

@sfreudiger:
Glad you managed it anyway!

Just for your information: the regex \d+ translates to 'one or more digits [0-9]' - So you could simplify your extraction to mystr-\d{4}-\d+

To debug regex extractions, have a look at https://regex101.com - I gained a ton of regex knowledge there!
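
As a quick offline check of that simplification, the following Python sketch compares the long form with the \d{4} form. The sample string is hypothetical, since the real values in the twitter data are not shown in this thread. The re.ASCII flag is set so that \d means exactly [0-9] in Python as well.

```python
import re

# Hypothetical sample value: one string with four digits after "mystr-",
# and one with only two digits that should not match.
sample = "foo mystr-2016-42 bar mystr-12-7 baz"

# The pattern from the working search, and the shorter equivalent.
long_form = re.compile(r"mystr-[0-9][0-9][0-9][0-9]-\d+", re.ASCII)
short_form = re.compile(r"mystr-\d{4}-\d+", re.ASCII)

print(long_form.findall(sample))
print(short_form.findall(sample))
```

Both patterns find only mystr-2016-42 here, because mystr-12-7 does not have four digits after the first hyphen.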

sfreudiger · ‎03-16-2016

thank you again!

can't award any points yet, once i have enough i'll come back here and give you some karma 😉

sfreudiger · ‎03-16-2016

for clarification: i would like my result to be something like this:

mystr-555: count:3
mystr-222: count:2

.. and then i want to go on and add date/time information, but that's next and nothing i worry about now
