I've seen numerous questions out there that touch on this topic but haven't found an answer that actually meets my specific use case. I have data from several sources that report numeric data (such as bandwidth or other datatypes) but instead of returning the value as a number (such as 39600) it returns in this format: 39.6K. I'm able to ingest those values but Splunk, unsurprisingly, doesn't know how to handle that - it treats it as text instead of a number.
Long story short, I need a way to translate the following data points into numeric values, either at ingest time or at search time:
Congestion 39.6K 55.3K 41.2K 40.2K 39.9K 38.9K 40.9K
We only need to return the first value after "Congestion" - the 39.6K value. The other values are previous poll results and we're collecting that already. The output should end up looking like:
Congestion 39600
This specific data set should never go above "K", but I have other datasets that might go into M or G, etc., so I need something as flexible as possible. I've tried using rex and sed but I've not had any success yet with it. If anyone can provide any help, it'd be greatly appreciated as it will solve multiple issues for us...
@burras, you can try the following run anywhere search. Ideally once tested with your data, you should move the same as either macro
or Calculated Field
Splunk Knowledge Object
for easy reuse and maintenance. Following takes care of Kilo, Mega, Giga and Tera number units.
| makeresults
| eval data="123.23K"
| eval dataInt=case( match(data,"K"),tonumber(replace(data,"([^K]+)K","\1"))*1000,
match(data,"M"),tonumber(replace(data,"([^M]+)M","\1"))*10000,
match(data,"G"),tonumber(replace(data,"([^G]+)G","\1"))*100000,
match(data,"T"),tonumber(replace(data,"([^T]+)T","\1"))*1000000)
PS: Assuming you are doing simple math (i.e. 1000 multiplier) and not binary (i.e. 1024). In either case you can change as per your need/use case. First two pipes with makeresults
and eval
need to be replaced with the actual query you have. You can also test different type/precision of data by changing the value in second eval.
Sorry, just read your other comment. You'll need to know the unit if you want to do any kind of math on the converted number. So you'll have to do something similar to what niketnilay suggested above and normalize all values to bytes.
@burras, you can try the following run anywhere search. Ideally once tested with your data, you should move the same as either macro
or Calculated Field
Splunk Knowledge Object
for easy reuse and maintenance. Following takes care of Kilo, Mega, Giga and Tera number units.
| makeresults
| eval data="123.23K"
| eval dataInt=case( match(data,"K"),tonumber(replace(data,"([^K]+)K","\1"))*1000,
match(data,"M"),tonumber(replace(data,"([^M]+)M","\1"))*10000,
match(data,"G"),tonumber(replace(data,"([^G]+)G","\1"))*100000,
match(data,"T"),tonumber(replace(data,"([^T]+)T","\1"))*1000000)
PS: Assuming you are doing simple math (i.e. 1000 multiplier) and not binary (i.e. 1024). In either case you can change as per your need/use case. First two pipes with makeresults
and eval
need to be replaced with the actual query you have. You can also test different type/precision of data by changing the value in second eval.
Thanks niketnilay - I ran with this and it looks like it's creating a new field called dataInt that has the new values in it. However, I noticed that any original values that didn't contain one of the alphanumeric identifiers are completely excluded (i.e. cases where congest=0 or any other value under 10K). Is there a way to add another statement to the case where if it doesn't match any of those you just multiply by 1?
@burras, new field is for us to test and compare the outputs original field data and new field dataInt.
In order to handle no Suffix in Congestion field value, you need to add a final default case using true()
and leave the value as it is.
For your use case it would be:
<YourBaseSearch>
| eval Congestion=case( match(Congestion,"K"),tonumber(replace(Congestion,"([^K]+)K","\1"))*1000,
match(Congestion,"M"),tonumber(replace(Congestion,"([^M]+)M","\1"))*10000,
match(Congestion,"G"),tonumber(replace(Congestion,"([^G]+)G","\1"))*100000,
match(Congestion,"T"),tonumber(replace(Congestion,"([^T]+)T","\1"))*1000000,
true(),Congestion)
Adding the true() statement definitely seems to fix the no suffix issue. However, I'm still seeing invalid results when the suffix if "M". I hate not being able to upload images from work - it would make this a lot easier.
Using the eval statement provided above and an event of:
Network Congestion 2.11M 29.2K 27.4K 27.4K 27.4K 27.5K 37.4K
I get the following results:
Congestion
21100
network_congestion
2.11M
With the same eval statement and a data set of:
Network Congestion 21.7K 8531 8699 8755 8602 9862 0 0 0 0
I get the following results:
Congestion
21700
network_congestion
21.7K
@burras, 2.11M is correctly evaluating to 21100. I hope you are treating M as Mega
and not Million
. Just to ensure that I am not misinterpreting your expectation, what it the result you are expecting for 2.11M?
In this particular case K actually does equate to thousands while M equates to millions - not necessary kilo and mega. 21.7K should value out to 21700 while 2.11M should value out to 2110000. We don't see any values higher than an M ever. I adjusted the M to multiply by 1000000 and that fixed the problem. Thanks for all your help with this and I'll definitely be able to apply to my others use cases!
I also noticed that while it works for values of "K", it does not appear to work for values of "M". We just had one come in that was congestion=2.05M and the dataInt field value ended up as 20500 which would mean it's being captured by the "K" case.
I tested with two values in my run anywhere example
| eval data="2.05K"
Which returned dataInt as 2050. And
| eval data="2.05M"
Which returned dataInt as 20500. These values seem to be correct as per me. Please validate again. Dont forget that you have a decimal with two digits precision.
If you have your Congestion value in a field called "Congestion", you can use the convert command with the rmunit() function.
If you have multiple values in a single event, the solution looks a little different.
What exactly do your events look like?
We've already extracted the value out into a field that contains the alpha identifier (i.e. congestion=39.6K). The line included above is part of a much larger event - but it will always look like the above and will only ever have a single value (no mv fields for this one).
Also of note, the values will not always have a "K" in them. We do see events that come in like this:
Congestion 0 55.3K 1836 40.2K 39.9K 38.9K 40.9K
Whatever solution is devised needs to be able to account for situations where the "K" is not present - I tried the convert memk() command but it didn't handle the entries that didn't have the "K" properly (didn't convert them to K base units).