Splunk Search

Convert categorical string to number

zeophlite
New Member

I have a field in my events that is a string (but does not translate to a number directly)

Is there a way to convert this string to an integer consistently (value does not matter), such as using a hash function? The functions available, such as md5 convert strings to strings, but is there a way to convert this back to an integer? An example is as follows:

user     favorite_fruit     fruit_number
bob      Apple                   1
jane     Pear                    2
pete     Apple                   1

Where user and favorite_fruit are known at index-time, and fruit_number is calculated at search-time. The actual value of fruit_number is arbitrary and doesn't need to be sequential.

I can't use a lookup, as the list of favorite_fruit's is arbitrary.

0 Karma
1 Solution

renjith_nair
Legend

Try something similar. You can use different by clause in streamstats and eventstats based on requirement.

 |stats count|eval fruit="apple,orange,apple,apple,cherry"|eval user="bob" | makemv delim="," fruit| makemv delim="," user|mvexpand fruit|streamstats count|eventstats first(count) as fruit_number by fruit|fields - count

Just add |streamstats count|eventstats first(count) as fruit_number by fruit|fields - count to your original search

---
What goes around comes around. If it helps, hit it with Karma 🙂

View solution in original post

0 Karma

renjith_nair
Legend

Try something similar. You can use different by clause in streamstats and eventstats based on requirement.

 |stats count|eval fruit="apple,orange,apple,apple,cherry"|eval user="bob" | makemv delim="," fruit| makemv delim="," user|mvexpand fruit|streamstats count|eventstats first(count) as fruit_number by fruit|fields - count

Just add |streamstats count|eventstats first(count) as fruit_number by fruit|fields - count to your original search

---
What goes around comes around. If it helps, hit it with Karma 🙂
0 Karma

zeophlite
New Member

Hi Renjith, apologies, I've updated my question to give an example

0 Karma

renjith_nair
Legend

Ok got it.

Try something similar. You can use different by clause in streamstats and eventstats based on requirement.

|stats count|eval fruit="apple,orange,apple,apple,cherry"|eval user="bob" | makemv delim="," fruit| makemv delim="," user|mvexpand fruit|streamstats count|eventstats first(count) as fruit_number by fruit|fields - count

Just add |streamstats count|eventstats first(count) as fruit_number by fruit|fields - count to your original search

---
What goes around comes around. If it helps, hit it with Karma 🙂

zeophlite
New Member

Works great, please edit this into your answer

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Think Like an Architect: Introducing the Splunk Certified Cybersecurity Defense ...

In cybersecurity, defenders respond to threats. Architects design the systems that stop them.    As ...

Best Practices: Splunk auto adjust pipeline queue

When you enable autoAdjustQueue in Splunk, maxSize should be understood as the queue size Splunk starts with ...

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...