Transforms field/value extract not fully working

pgullette
Explorer

I have a log that has multiple fields and values and each event has a different set of fields and values. To handle that, I'm using a transforms stanza with a REGEX to separately extract the field and value at search time. The transform seems to be working as expected as Splunk shows all of my fields on the left side with all of their values. But that's where it stops working.

When I try to actually use one of the extracted fields in search, I get very odd behavior. If I do a search with field=value, I get no results, even if I use Splunk's built-in extraction from the field list on the left side to construct my search string. However, if I add an asterisk (*) to the end of the field=value search, then I get results. This makes me think my REGEX is extracting a bit more than it should, but I can't see any extra characters or non-printable ones.

Here is a sample event that is being extracted correctly:

2014-03-12 11:26:32,389 INFO  SSID:AA87309DKj9911FFFFACDD [pool-10251-thread-1] SERVICE_KEY=5688 SERVICE=myService INDEX_POS=0 APPLICATION_ID==APPID~ACCOUNT_NUM==123456789~CUST_SUBTYPE==R~CUST_TYPE==I~ENV_CODE==ENV~MARKET_CODE==123~OPERATOR_ID==123456~ORIGIN_SYSTEM==APP~PSUBMKTGRP_ROW_COUNT==12~RUN_DATE==20140312~SUBMKT_SUB_MARKET_CODE==ABC~TRANSACTION_MODE==O~

And here is my stanza from transforms.conf:

REGEX = ([A-Z0-9_]*?)==([^~]*?)~
FORMAT = $1::$2
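
A minimal sketch of how a search-time transform like this is usually wired up, for readers following along. The stanza name `kv_double_equals` and the sourcetype `my_sourcetype` are hypothetical placeholders and do not appear in the original post; only the REGEX and FORMAT lines come from it.

```
# transforms.conf
# Stanza name is a placeholder (assumption, not from the original post).
[kv_double_equals]
# Capture NAME==VALUE~ pairs; group 1 is the field name, group 2 the value.
REGEX = ([A-Z0-9_]*?)==([^~]*?)~
FORMAT = $1::$2

# props.conf
# Sourcetype name is a placeholder (assumption, not from the original post).
[my_sourcetype]
# REPORT- applies the transform at search time.
REPORT-kv_double_equals = kv_double_equals
```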

In this case, Splunk properly pulls out all the field names (APPLICATION_ID, ACCOUNT_NUM, CUST_SUBTYPE, etc.), and the values are also correct, as the field list on the left side shows. But if my search is something like APPLICATION_ID=APPID, I get no results. Simply changing the search to APPLICATION_ID=APPID* works, however.

Because Splunk properly extracts field names and values for the left-side list in verbose mode but then fails when I search on them, I suspect this could be a bug in Splunk, potentially related to my data having double equal signs. The double equal signs are there to prevent Splunk from trying to auto-extract, since in some cases these fields can contain an equal sign as part of the value.

Hopefully that's enough information for someone to give me some pointers. Thanks.

martin_mueller
SplunkTrust

Regardless of the field extraction, searching for the word APPID doesn't find your event while searching for APPID* does, right?

If that's the case, you're being hampered by a performance optimization Splunk makes. It assumes field values are indexed tokens, which yours are not. You can stop Splunk from making that assumption in fields.conf; see http://docs.splunk.com/Documentation/Splunk/6.0.2/admin/fieldsconf for reference.
You could set INDEXED_VALUE=false for your field, forcing Splunk to do a full-text search for your value.
Or you could use the fact that your values are preceded by an equals sign, so they are the start of an indexed token. I believe you might get away with setting INDEXED_VALUE=s/$/*/. The benefit is that Splunk will still use the indexed tokens for performance gains; your users just don't need to add the asterisk themselves.
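
As a rough sketch of the two options above, a fields.conf entry for one affected field might look like this. The field name is taken from the sample event in the question; use one setting or the other, not both. The second form is the untested suggestion from this answer, so treat it as something to try rather than a confirmed fix.

```
# fields.conf (on the search head), one stanza per affected field
[APPLICATION_ID]
# Option 1: do not assume the value is an indexed token.
INDEXED_VALUE = false

# Option 2 (untested suggestion): rewrite the search term with a sed-style
# substitution that appends a trailing wildcard, so APPID is looked up as APPID*.
# INDEXED_VALUE = s/$/*/
```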

Some background: http://blogs.splunk.com/2011/10/07/cannot-search-based-on-an-extracted-field/

fvegdom
Path Finder

Thanks for this, it solved my problem too.

pgullette
Explorer

So you're saying that I have to exhaustively list all fields that I want to be able to search in fields.conf? I need to be able to search any of them which makes this a daunting task.

Let's say I have 200 fields. So I'll have to put 200 different field definitions in fields.conf. If I'm going to do that, couldn't I just define all the possible REGEX patterns in props or transforms as a regular extract with a field name? And if my assumption is correct, is making them normal extractions better from a performance perspective in terms of them being indexed according to normal heuristics?
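
For illustration, the per-field alternative described above could look roughly like the sketch below, with one explicit extraction per field. The sourcetype name `my_sourcetype` and the `EXTRACT-` class names are hypothetical; the field names and the `==`/`~` delimiters come from the sample event. This only shows the shape of the configuration and makes no claim about whether it changes the search behavior.

```
# props.conf
# Sourcetype name is a placeholder (assumption, not from the original post).
[my_sourcetype]
# One inline search-time extraction per field; the named group sets the field name.
EXTRACT-application_id = APPLICATION_ID==(?<APPLICATION_ID>[^~]*)~
EXTRACT-account_num = ACCOUNT_NUM==(?<ACCOUNT_NUM>[^~]*)~
# ...and so on for every other field name that can occur.
```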

martin_mueller
SplunkTrust

You can check that yourself, see the blog post I included in the answer for background.

pgullette
Explorer

Oh, and one other thing to add that might make a difference. If I search for a dynamically extracted field whose value is a single character, then my search will work. Based on my example event above, if I search for CUST_SUBTYPE=R, then that will work.

Does this behavior still match what you referred to above?

And thanks for the help.

martin_mueller
SplunkTrust

Nah, only for fields where you don't want INDEXED_VALUE=true, which is the default. Basically, only touch that setting if you run into the problem you describe, and only touch it for those specific fields.

pgullette
Explorer

One thing that doesn't seem to make much sense is the [<field name>] stanza in fields.conf. That makes it seem like I have to define the INDEXED_VALUE property for each field that my dynamic REGEX is extracting. If that's the case, then I'm no better off because I still have to define every possible field in fields.conf. The reason I'm using the REGEX is so I don't have to extract each possible field individually. The dynamic REGEX is extracting hundreds of different field names and values.

Will I need to define the INDEXED_VALUE property for each of these?

pgullette
Explorer

Yes, if I search for APPID, I also get no results found.
