Splunk Search

Unable to search for regex extracted fields in fixed length file format

Communicator

Hi,

I'm seeing some very unusual behavior when extracting fields in Splunk 6.2. Basically I can see the fields are extracted successfully, but I can't use them to search. I have the following sample data:

101STUS NVLGCCCPDRf4cc5a8023ce40e28c9f260c376dabe9032134120864                      032000123456789                   191820013550000000000000000ESBtesSSP191820013550000000000000000abcdefSSD00071468C4875691F2CC0000000102763400095    C02763400095    20150721112211485002KO-001HIHI12345                         6_1                                                                     ABCD02                    20150721102147122201507211754000000000000000007400  AU S2015072120150721CRN                                MH48A                       0201 ACSP    20150721112211485ACSC    2015072111221148511215201121520                             BIS                               0000000000                                                                                  00000000                                             

This is a fixed length field log file (from mainframe), with no field separators. Therefore, I am using the following regular expression to extract the fields, which basically just extracts them from their position in the log file:

.{11}(?<Type>.{4}).{1}(?<Direction>.{2}).{32}(?<InputMainID>.{6})(?<InputSecondaryID>.{28})(?<OutputMainID>.{6})(?<OutputSecondaryID>.{28})(?<BusinessID>.{36})(?<TransID>.{36})(?<SystemID>.{20})(?<SequenceNo>.{8})(?<CustomerKey>.{15}).{33}(?<BusinessChannel>.{3})(?<SourceChannel>.{6}).{132}(?<SubmitTime>.{17}).{17}0*(?<Figure>\d{1,16}).{90}(?<OverallStatus>.{4}).{4}(?<UpdateTime>.{17})(?<Status>.{4}).{4}(?<TransUpdateTime>.{17})
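The positional logic of this pattern can be sanity-checked outside Splunk. The following is a minimal Python sketch, not the Splunk configuration itself: it keeps only the first three named fields, uses the leading portion of the sample record above, and the helper name `extract` is made up for illustration. Note that Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`.

```python
import re

# Shortened version of the pattern above: only the first three named fields.
# Python's re module uses (?P<name>...) for named groups.
PATTERN = re.compile(
    r".{11}(?P<Type>.{4}).{1}(?P<Direction>.{2}).{32}(?P<InputMainID>.{6})"
)

# Leading portion of the sample record from the post.
SAMPLE = "101STUS NVLGCCCPDRf4cc5a8023ce40e28c9f260c376dabe9032134120864"


def extract(line):
    """Return the named fields as a dict, or an empty dict if the line is too short."""
    match = PATTERN.match(line)
    return match.groupdict() if match else {}


fields = extract(SAMPLE)
print(fields)
```

If the offsets are right, `Type` comes out as the four characters starting at position 12 of the record, which is what the Splunk field sidebar shows as well.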

Now when I search for the log in Splunk, I can see all the fields created with the correct values.

index=main sourcetype=mytype

However, if I try to add the fields to the search string, I am unable to see any results. For example:

index=main sourcetype=mytype Type=GCCC

I've found that if I put a * on either side of the field value, it does find them, which I find strange:

index=main sourcetype=mytype Type=*GCCC*

This suggests that there may be whitespace around the value, but I don't see any when I look at the extracted values. I've also found that I can successfully search on the fields if I add them as an extra search command after the main search:

index=main sourcetype=mytype | search Type=GCCC

It looks as though the field extraction isn't run until after the main search. However, that isn't the case for many of my other sourcetypes, where I can search on extracted fields directly in the main search.

I've also tried a number of other things to try to get this working:

  • Separating the regexes so each field has its own extraction. I still get the same issue.
  • Extracting only one field from the data to simplify it. I still get the same issue.
  • Adding the following Calculated Field. This works, but I don't want to add an EVAL for every field, as I'm sure there will be performance implications.

    [mytype]
    EVAL-Status = Status

Has anyone seen this before? I've played around a lot with the regex, but could there be a problem with it? Is there a better way to extract the fields for a fixed length file?

I suspect that it's partly because the fields have no separators, therefore Splunk isn't able to do keyword searches on partial matches, can anyone confirm?

Thanks in advance.

Ashley

1 Solution

Esteemed Legend

You are running into this well-known problem:

http://blogs.splunk.com/2011/10/07/cannot-search-based-on-an-extracted-field/

The solution is to put this into fields.conf in the same directory as your field extractions (where props.conf is), with one stanza per affected field, named after that field:

[MyField]
INDEXED_VALUE = false
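
To see why this setting helps, here is a rough Python model of the behavior, offered as a sketch under stated assumptions rather than a description of Splunk's actual implementation. It assumes that, by default, a search for `Type=GCCC` first looks for `GCCC` as a term in the index and only then checks the extracted field, and it approximates index-time segmentation by splitting on non-alphanumeric characters. The function names (`index_terms`, `extract_type`, `search`) are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Leading portion of the sample record from the question.
RAW = "101STUS NVLGCCCPDRf4cc5a8023ce40e28c9f260c376dabe9032134120864"


def index_terms(raw):
    """Crude stand-in for index-time segmentation: lower-case, split on non-alphanumerics."""
    return {term for term in re.split(r"[^0-9a-z]+", raw.lower()) if term}


def extract_type(raw):
    """Search-time extraction of Type by position, as in the question's regex."""
    match = re.match(r".{11}(?P<Type>.{4})", raw)
    return match.group("Type") if match else None


def search(raw, value, indexed_value=True):
    """Model of field=value: optional index-term lookup, then a check on the extracted field."""
    pattern = value.lower()
    # Assumed default behavior: the value must first match some indexed term.
    if indexed_value and not any(fnmatchcase(t, pattern) for t in index_terms(raw)):
        return False
    extracted = extract_type(raw)
    return extracted is not None and fnmatchcase(extracted.lower(), pattern)


print(index_terms(RAW))
print(search(RAW, "GCCC"))                       # term lookup fails
print(search(RAW, "*GCCC*"))                     # wildcard matches the long term
print(search(RAW, "GCCC", indexed_value=False))  # lookup skipped
```

In this model `GCCC` never exists as a term of its own because it is buried inside one long run of characters, so the plain search finds nothing, the wildcard search succeeds, and skipping the term lookup (the analogue of `INDEXED_VALUE = false`) lets the extracted field decide.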


Communicator

Thanks! Yes, this is exactly the issue, thanks for spotting it. I'm curious which would perform better: using the INDEXED_VALUE setting, or transforming the source data at index time to separate the fields (for example by adding commas) so that the values can be found easily. This data is going to be searched heavily, so I'm very conscious of performance.
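
For illustration only, the delimiter idea could look like the following Python sketch. It is not Splunk configuration: the column widths are taken from the first few groups of the regex in the question, and the function name `delimit` is invented for the example.

```python
# Widths of the first few columns, taken from the regex in the question:
# 11 skipped, Type (4), 1 skipped, Direction (2), 32 skipped, InputMainID (6).
WIDTHS = [11, 4, 1, 2, 32, 6]

# Leading portion of the sample record from the question.
RAW = "101STUS NVLGCCCPDRf4cc5a8023ce40e28c9f260c376dabe9032134120864"


def delimit(line, widths=WIDTHS, sep=","):
    """Cut a fixed-width line at the given widths and join the pieces with sep."""
    parts, pos = [], 0
    for width in widths:
        parts.append(line[pos:pos + width])
        pos += width
    parts.append(line[pos:])  # whatever is left after the listed columns
    return sep.join(parts)


DELIMITED = delimit(RAW)
print(DELIMITED)
```

Once each value is surrounded by separators it can stand as its own term, which is what would make a plain `Type=GCCC` style search workable without wildcards.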


Esteemed Legend

It depends on how heavy the demand for the field is. If 75% of your searches need this field, then it would surely be better to do it at index time. Most Splunk infrastructures are very broadly shared, so any particular set of fields generally accounts for only a small fraction of the overall search load; for that reason I have always just used the fields.conf option. Don't forget to click "Accept" to close the question.


Champion

Did you define your field extractions with rex, or are they in props.conf? Whichever one you used, have you tried the other?


Communicator

Hi Jeff, I've defined them in props.conf. Yes, I've used rex to test them, but obviously that works OK because I have to search on the fields after the rex command, e.g. | rex " | search field=value
The issue only appears when searching for the fields within the original search; if I search for them after the first pipe, it's fine.


SplunkTrust

Have you tried quoting the value? index=main sourcetype=mytype Type="*GCCC*"


Communicator

Yeah, I've also tried with quotes, but the result is the same. If I use Type=*GCCC* it does actually work (with or without quotes), but I need to be able to search for the complete string without the wildcards, as they make the search very inefficient.
