Splunk Search

Converting alphanumeric field to numeric values (39.6K:39600)

burras
Communicator

I've seen numerous questions out there that touch on this topic but haven't found an answer that actually meets my specific use case. I have data from several sources that report numeric data (such as bandwidth or other datatypes) but instead of returning the value as a number (such as 39600) it returns in this format: 39.6K. I'm able to ingest those values but Splunk, unsurprisingly, doesn't know how to handle that - it treats it as text instead of a number.

Long story short, I need a way to translate the following data points into numeric values, either at ingest time or at search time:

Congestion  39.6K  55.3K  41.2K  40.2K  39.9K  38.9K  40.9K

We only need to return the first value after "Congestion" - the 39.6K value. The other values are previous poll results and we're collecting that already. The output should end up looking like:

Congestion 39600

This specific data set should never go above "K", but I have other datasets that might go into M or G, etc., so I need something as flexible as possible. I've tried using rex and sed but I've not had any success yet with it. If anyone can provide any help, it'd be greatly appreciated as it will solve multiple issues for us...

0 Karma
1 Solution

niketn
Legend

@burras, you can try the following run anywhere search. Ideally once tested with your data, you should move the same as either macro or Calculated Field Splunk Knowledge Object for easy reuse and maintenance. Following takes care of Kilo, Mega, Giga and Tera number units.

| makeresults
| eval data="123.23K"
| eval dataInt=case( match(data,"K"),tonumber(replace(data,"([^K]+)K","\1"))*1000,
                     match(data,"M"),tonumber(replace(data,"([^M]+)M","\1"))*10000,
                     match(data,"G"),tonumber(replace(data,"([^G]+)G","\1"))*100000,
                     match(data,"T"),tonumber(replace(data,"([^T]+)T","\1"))*1000000)

PS: Assuming you are doing simple math (i.e. 1000 multiplier) and not binary (i.e. 1024). In either case you can change as per your need/use case. First two pipes with makeresults and eval need to be replaced with the actual query you have. You can also test different type/precision of data by changing the value in second eval.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

View solution in original post

s2_splunk
Splunk Employee
Splunk Employee

alt text

0 Karma

s2_splunk
Splunk Employee
Splunk Employee

Sorry, just read your other comment. You'll need to know the unit if you want to do any kind of math on the converted number. So you'll have to do something similar to what niketnilay suggested above and normalize all values to bytes.

0 Karma

niketn
Legend

@burras, you can try the following run anywhere search. Ideally once tested with your data, you should move the same as either macro or Calculated Field Splunk Knowledge Object for easy reuse and maintenance. Following takes care of Kilo, Mega, Giga and Tera number units.

| makeresults
| eval data="123.23K"
| eval dataInt=case( match(data,"K"),tonumber(replace(data,"([^K]+)K","\1"))*1000,
                     match(data,"M"),tonumber(replace(data,"([^M]+)M","\1"))*10000,
                     match(data,"G"),tonumber(replace(data,"([^G]+)G","\1"))*100000,
                     match(data,"T"),tonumber(replace(data,"([^T]+)T","\1"))*1000000)

PS: Assuming you are doing simple math (i.e. 1000 multiplier) and not binary (i.e. 1024). In either case you can change as per your need/use case. First two pipes with makeresults and eval need to be replaced with the actual query you have. You can also test different type/precision of data by changing the value in second eval.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

burras
Communicator

Thanks niketnilay - I ran with this and it looks like it's creating a new field called dataInt that has the new values in it. However, I noticed that any original values that didn't contain one of the alphanumeric identifiers are completely excluded (i.e. cases where congest=0 or any other value under 10K). Is there a way to add another statement to the case where if it doesn't match any of those you just multiply by 1?

0 Karma

niketn
Legend

@burras, new field is for us to test and compare the outputs original field data and new field dataInt.
In order to handle no Suffix in Congestion field value, you need to add a final default case using true() and leave the value as it is.

For your use case it would be:

<YourBaseSearch>
| eval Congestion=case( match(Congestion,"K"),tonumber(replace(Congestion,"([^K]+)K","\1"))*1000,
                      match(Congestion,"M"),tonumber(replace(Congestion,"([^M]+)M","\1"))*10000,
                      match(Congestion,"G"),tonumber(replace(Congestion,"([^G]+)G","\1"))*100000,
                      match(Congestion,"T"),tonumber(replace(Congestion,"([^T]+)T","\1"))*1000000,
                      true(),Congestion)
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

burras
Communicator

Adding the true() statement definitely seems to fix the no suffix issue. However, I'm still seeing invalid results when the suffix if "M". I hate not being able to upload images from work - it would make this a lot easier.

Using the eval statement provided above and an event of:
Network Congestion 2.11M 29.2K 27.4K 27.4K 27.4K 27.5K 37.4K

I get the following results:

Congestion
21100

network_congestion
2.11M

With the same eval statement and a data set of:
Network Congestion 21.7K 8531 8699 8755 8602 9862 0 0 0 0

I get the following results:

Congestion
21700

network_congestion
21.7K

0 Karma

niketn
Legend

@burras, 2.11M is correctly evaluating to 21100. I hope you are treating M as Mega and not Million. Just to ensure that I am not misinterpreting your expectation, what it the result you are expecting for 2.11M?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

burras
Communicator

In this particular case K actually does equate to thousands while M equates to millions - not necessary kilo and mega. 21.7K should value out to 21700 while 2.11M should value out to 2110000. We don't see any values higher than an M ever. I adjusted the M to multiply by 1000000 and that fixed the problem. Thanks for all your help with this and I'll definitely be able to apply to my others use cases!

0 Karma

burras
Communicator

I also noticed that while it works for values of "K", it does not appear to work for values of "M". We just had one come in that was congestion=2.05M and the dataInt field value ended up as 20500 which would mean it's being captured by the "K" case.

0 Karma

niketn
Legend

I tested with two values in my run anywhere example

| eval data="2.05K"
Which returned dataInt as 2050. And

| eval data="2.05M"
Which returned dataInt as 20500. These values seem to be correct as per me. Please validate again. Dont forget that you have a decimal with two digits precision.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

s2_splunk
Splunk Employee
Splunk Employee

If you have your Congestion value in a field called "Congestion", you can use the convert command with the rmunit() function.
If you have multiple values in a single event, the solution looks a little different.
What exactly do your events look like?

0 Karma

burras
Communicator

We've already extracted the value out into a field that contains the alpha identifier (i.e. congestion=39.6K). The line included above is part of a much larger event - but it will always look like the above and will only ever have a single value (no mv fields for this one).

0 Karma

burras
Communicator

Also of note, the values will not always have a "K" in them. We do see events that come in like this:

Congestion 0 55.3K 1836 40.2K 39.9K 38.9K 40.9K

Whatever solution is devised needs to be able to account for situations where the "K" is not present - I tried the convert memk() command but it didn't handle the entries that didn't have the "K" properly (didn't convert them to K base units).

0 Karma
Get Updates on the Splunk Community!

Video | Welcome Back to Smartness, Pedro

Remember Splunk Community member, Pedro Borges? If you tuned into Episode 2 of our Smartness interview series, ...

Detector Best Practices: Static Thresholds

Introduction In observability monitoring, static thresholds are used to monitor fixed, known values within ...

Expert Tips from Splunk Education, Observability in Action, Plus More New Articles on ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...