
How can I manipulate an extracted field with a numerical component and a text component?

spencers
Explorer

I have a nightly backup process that reports the total amount of data it offloads in a syslog message sent to my Splunk server. The data is properly indexed in Splunk, and all of the default fields (host, sourcetype, and source) are correctly generated. I'd like to produce a graph showing the amount of data backed up over the last week (or month, or another arbitrary time frame) that I can add to a dashboard for other system administrators. Here are some examples of my data:

07/29/2010 07:30:37 Total number of bytes transferred:     1.20 TB
08/17/2010 07:30:37 Total number of bytes transferred:     2.00 GB
08/18/2010 01:30:37 Total number of bytes transferred:     151.0 MB
08/19/2010 03:20:37 Total number of bytes transferred:     4.15 GB
08/20/2010 03:38:37 Total number of bytes transferred:     654.40 MB

As you might notice, the units of my data can be MB, GB, or TB (and probably KB as well). This is why I'm asking for help. I'm pretty sure I can solve my problem by using two extracted fields, one for the numerical value of the data (byte_number) and another for its unit (byte_unit), and performing manipulations on each field, e.g.:

byte_number = 1.20, 2.00, 151.0, 4.15, 654.40, etc.
byte_unit = TB, GB, MB, GB, MB, etc.

However, I'd like to know if it's possible to extract just one field (bytes_transferred), and use "eval" or some other Splunk function to interpret and graph my data like this:

bytes_transferred = 1.20 TB, 2.00 GB, 151.0 MB, 4.15 GB, 654.40 MB, etc.

I realize that this creates an extracted field with both numerical values and text values, and I'm not sure how well Splunk's built-in functions can handle this.

Any ideas on how I should proceed with my chart if I want to work with just one extracted field?


vskoryk_splunk
Splunk Employee

Here is a workaround for the memk() function lacking support for TB, PB, etc. Create two macros:

# Converts one field with a B/K/M/G/T/P suffix (e.g. 5.0T) to kilobytes, the unit memk() outputs
[vmemk(1)]
args = field
definition = eval $field$=case(\
like($field$,"%B"), tonumber(rtrim($field$,"B"))/1024,\
like($field$,"%K"), tonumber(rtrim($field$,"K")),\
like($field$,"%M"), tonumber(rtrim($field$,"M"))*pow(1024,1), \
like($field$,"%G"), tonumber(rtrim($field$,"G"))*pow(1024,2), \
like($field$,"%T"), tonumber(rtrim($field$,"T"))*pow(1024,3), \
like($field$,"%P"), tonumber(rtrim($field$,"P"))*pow(1024,4)\
)
iseval = 0

# Applies vmemk to every field matched by the given (wildcard) field list, e.g. field_*
[memk2(1)]
args = fields
definition = foreach $fields$ [`vmemk(<<FIELD>>)`]
iseval = 0

Use them as:

| metadata type=hosts | head 1 | fields _raw | eval _raw="description=\"current behavior\" field_B=512.0B field_K=512.0K field_M=512.0M field_G=5.0G field_T=5.0T field_P=5.0P" | extract

| append [ |metadata type=hosts | head 1 | fields _raw | eval _raw="description=\"1field behavior\" field_B=512.0B field_K=512.0K field_M=512.0M field_G=5.0G field_T=5.0T field_P=5.0P" | extract | `vmemk(field_B)` | `vmemk(field_K)` | `vmemk(field_M)`| `vmemk(field_G)`| `vmemk(field_T)`| `vmemk(field_P)`]
| append [| metadata type=hosts | head 1 | fields _raw | eval _raw="description=\"forloop multi-field behavior\" field_B=512.0B field_K=512.0K field_M=512.0M field_G=5.0G field_T=5.0T field_P=5.0P" | extract | `memk2(field_*)`]
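To check the arithmetic without a Splunk instance, the case() logic in vmemk can be mirrored in Python. This is a sketch, not Splunk code: the name to_kb is hypothetical, and like the macro it assumes a single trailing suffix letter and 1024-based units, returning kilobytes. It strips exactly one suffix character, whereas rtrim() strips every trailing copy of that character.

```python
# Multiplier that converts one unit of each suffix into kilobytes (1024-based),
# mirroring the branches of the vmemk case() expression.
KB_FACTORS = {
    "B": 1 / 1024,
    "K": 1,
    "M": 1024 ** 1,
    "G": 1024 ** 2,
    "T": 1024 ** 3,
    "P": 1024 ** 4,
}


def to_kb(value):
    """Convert a string such as '5.0T' to kilobytes.

    Returns None for an unknown or missing suffix, just as case()
    yields null when no branch matches.
    """
    suffix = value[-1:]
    factor = KB_FACTORS.get(suffix)
    if factor is None:
        return None
    # Analogous to tonumber(rtrim(value, suffix)) * factor in the macro.
    return float(value[:-1]) * factor


# Same sample values as the test searches above.
for sample in ["512.0B", "512.0K", "512.0M", "5.0G", "5.0T", "5.0P"]:
    print(sample, to_kb(sample))
```

With these samples, 512.0B becomes 0.5 and 5.0P becomes 5497558138880.0, matching the multipliers in the macro.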

ziegfried
Influencer

You could split it with rex and then convert it with eval, like this:

< your search > | rex field=bytes_transferred "(?<trans_amount>[\d\.]+) (?<trans_unit>\w+)" | eval kb_transferred=case(trans_unit=="KB",trans_amount, trans_unit=="MB",trans_amount*1024, trans_unit=="GB",trans_amount*1048576, trans_unit=="TB",trans_amount*1073741824) 

Explained:

Split the bytes_transferred field into a field containing the amount and a field containing the unit:

| rex field=bytes_transferred "(?<trans_amount>[\d\.]+) (?<trans_unit>\w+)" 

Evaluate the amount in KB based on the unit:

| eval kb_transferred=case(trans_unit=="KB",trans_amount, trans_unit=="MB",trans_amount*1024, trans_unit=="GB",trans_amount*1048576, trans_unit=="TB",trans_amount*1073741824) 
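The two steps can be sketched in Python to sanity-check the KB factors against the sample values from the question. This is an illustration only: kb_transferred is a hypothetical helper, Python spells named groups (?P<name>...) rather than rex's (?<name>...), and, like case(), it returns None (Splunk's null) for units the case() call doesn't list, such as B or PB.

```python
import re

# Same pattern as the rex command: an amount, one space, then a unit word.
PATTERN = re.compile(r"(?P<trans_amount>[\d.]+) (?P<trans_unit>\w+)")

# KB multipliers taken from the case() call (1 MB = 1024 KB, and so on).
KB_PER_UNIT = {"KB": 1, "MB": 1024, "GB": 1048576, "TB": 1073741824}


def kb_transferred(bytes_transferred):
    """Convert a value such as '1.20 TB' to kilobytes, or None if unrecognized."""
    match = PATTERN.search(bytes_transferred)
    if match is None:
        return None
    factor = KB_PER_UNIT.get(match.group("trans_unit"))
    if factor is None:
        return None  # case() also returns null when no branch matches
    return float(match.group("trans_amount")) * factor


for value in ["1.20 TB", "2.00 GB", "151.0 MB", "4.15 GB", "654.40 MB"]:
    print(value, "->", kb_transferred(value))
```

For example, 151.0 MB becomes 154624.0 KB and 2.00 GB becomes 2097152.0 KB.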

gkanapathy
Splunk Employee

I'm not sure what you're looking for. One way or another, you have to separate the numbers from the units if you want all values expressed in the same unit. Whether that is done with automatic extraction, rex, or eval doesn't matter, but there's no magical way to pretend you didn't separate them.


ziegfried
Influencer

I guess there is no easy way to do it without splitting the field into its parts, though you could remove the extra fields after the calculation. Using the convert command, as in | convert memk(bytes_transferred), isn't possible, because memk() only detects formats like 10.3g or 5m and doesn't recognize TB. A custom search command would be another option.


spencers
Explorer

Thanks for the quick response, ziegfried. Your solution uses two extracted fields, though, and as I said in my post, I'd like to try using just one extracted field.
