
I want to do one-hot encoding on categorical variables for machine learning. How can I do that in Splunk?

jcvytla
New Member

How do I do label encoding on categorical variables in Splunk? I'm new to Splunk and trying to explore its hidden features. I'd also like to know how to split fields the way it is done in Excel.


aljohnson_splun
Splunk Employee

When you use the Machine Learning Toolkit, it automatically converts your categorical variables into indicator variables (columns of 0s and 1s) behind the scenes. Pretty nifty! It uses pandas' get_dummies to do this.

If you want to do so manually, you can try using eval:

| eval {fieldToEncode} = 1
| fillnull

The way this works: the { } around the field name tell eval to take the field's value and use it as the name of the new field. We then assign 1 to that field, and fillnull fills in all the blanks with zeros.
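To try the idea without touching real data, here is a minimal sketch that builds four throwaway events with makeresults. The field name Server_name and the values web01, db01, and app01 are made up for illustration; substitute your own field and values.

```spl
| makeresults count=4
| streamstats count AS n
| eval Server_name = case(n==1, "web01", n==2, "db01", n==3, "web01", n==4, "app01")
| eval {Server_name} = 1
| fillnull value=0 web01 db01 app01
| table Server_name web01 db01 app01
```

Each row should end up with a 1 in the column named after its own Server_name value and a 0 in the other two. Listing the columns explicitly in fillnull is a choice made here so that unrelated fields are left alone; a bare fillnull, as in the answer above, fills every field instead.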


jcvytla
New Member

Hi @aljohnson

If I want to encode the Server_name column, should I use
| eval {Server_name} = 1 ?

As an extension of that question: suppose I have an app_id column whose values look like 1234456122.xxxx

I want to keep only the numerical part and remove the xxxx part. How do I do that in Splunk?


aljohnson_splun
Splunk Employee

That's one way you could do it, sure. You'd also need to add the fillnull command to get the zeros in the columns, though.

For your second question, you can use eval or rex or many other search commands to do that:

http://docs.splunk.com/Documentation/Splunk/7.0.3/SearchReference/Eval
http://docs.splunk.com/Documentation/Splunk/7.0.3/SearchReference/rex
http://docs.splunk.com/Documentation/Splunk/7.0.3/SearchReference/replace
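As a rough sketch of what those commands could look like for the app_id case, the search below shows three alternatives side by side: rex with a named capture group, the eval split function, and the eval replace function. The sample value comes from the question; the output field names (app_id_rex, app_id_split, app_id_replace) are made up for illustration, and the sketch assumes the numeric part always comes first and is followed by a dot.

```spl
| makeresults
| eval app_id = "1234456122.xxxx"
| rex field=app_id "^(?<app_id_rex>\d+)"
| eval app_id_split = mvindex(split(app_id, "."), 0)
| eval app_id_replace = replace(app_id, "[.].*$", "")
| table app_id app_id_rex app_id_split app_id_replace
```

Any one of the three lines is enough on its own. The split variant is the closest to splitting a column on a delimiter in Excel: split returns a multivalue field, and mvindex picks the piece you want by position.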


aeapen
New Member

@aljohnson_splunk, can label encoding be done the same way? I want to convert categorical variables into numbers with some sort of ranking. Is there a method for this?


aljohnson_splun
Splunk Employee

You could use a lookup if the values are static and you know the categories beforehand. Otherwise, I think you'd need to add a custom algorithm: https://docs.splunk.com/Documentation/MLApp/3.2.0/API/Overview
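For a small, fixed set of categories, one way to sketch the static mapping without creating a lookup file is an eval case() expression. The field name severity, its values, and the ranks below are all hypothetical; they only illustrate the shape of the mapping.

```spl
| makeresults count=3
| streamstats count AS n
| eval severity = case(n==1, "low", n==2, "high", n==3, "medium")
| eval severity_rank = case(severity=="low", 1, severity=="medium", 2, severity=="high", 3)
| table severity severity_rank
```

With an actual lookup, the second eval would be replaced by something like | lookup severity_rank.csv severity OUTPUT rank AS severity_rank, assuming you have uploaded a CSV with that (hypothetical) name containing severity and rank columns. The lookup is easier to maintain once the list of categories grows.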


aeapen
New Member

Thank you for the answer.
