Hi,
I'm seeing some very unusual behavior when extracting fields in Splunk 6.2. Basically I can see the fields are extracted successfully, but I can't use them to search. I have the following sample data:
101STUS NVLGCCCPDRf4cc5a8023ce40e28c9f260c376dabe9032134120864 032000123456789 191820013550000000000000000ESBtesSSP191820013550000000000000000abcdefSSD00071468C4875691F2CC0000000102763400095 C02763400095 20150721112211485002KO-001HIHI12345 6_1 ABCD02 20150721102147122201507211754000000000000000007400 AU S2015072120150721CRN MH48A 0201 ACSP 20150721112211485ACSC 2015072111221148511215201121520 BIS 0000000000 00000000
This is a fixed length field log file (from mainframe), with no field separators. Therefore, I am using the following regular expression to extract the fields, which basically just extracts them from their position in the log file:
.{11}(?<Type>.{4}).{1}(?<Direction>.{2}).{32}(?<InputMainID>.{6})(?<InputSecondaryID>.{28})(?<OutputMainID>.{6})(?<OutputSecondaryID>.{28})(?<BusinessID>.{36})(?<TransID>.{36})(?<SystemID>.{20})(?<SequenceNo>.{8})(?<CustomerKey>.{15}).{33}(?<BusinessChannel>.{3})(?<SourceChannel>.{6}).{132}(?<SubmitTime>.{17}).{17}0*(?<Figure>\d{1,16}).{90}(?<OverallStatus>.{4}).{4}(?<UpdateTime>.{17})(?<Status>.{4}).{4}(?<TransUpdateTime>.{17})
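As a quick sanity check outside Splunk, the positional approach can be reproduced on the start of the sample event. This is only a sketch: it uses just the leading part of the regex above, converted to Python's `(?P<name>...)` group syntax, and the names `PREFIX_PATTERN`, `SAMPLE_PREFIX` and `extract_prefix_fields` are made up for illustration.

```python
import re

# Leading portion of the positional regex from the post, rewritten with
# Python's (?P<name>...) named-group syntax. Only the first three fields
# are covered here.
PREFIX_PATTERN = re.compile(
    r".{11}(?P<Type>.{4}).{1}(?P<Direction>.{2}).{32}(?P<InputMainID>.{6})"
)

# First 62 characters of the sample event quoted above.
SAMPLE_PREFIX = "101STUS NVLGCCCPDRf4cc5a8023ce40e28c9f260c376dabe9032134120864"


def extract_prefix_fields(line):
    """Return the positional fields as a dict, or an empty dict if the line is too short."""
    match = PREFIX_PATTERN.match(line)
    return match.groupdict() if match else {}


fields = extract_prefix_fields(SAMPLE_PREFIX)
print(fields)
```

The extracted values come out without any surrounding whitespace, which matches what Splunk shows in the field list.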
Now when I search for the log in Splunk, I can see all the fields created with the correct values.
index=main sourcetype=mytype
However, if I try to add the fields to the search string, I am unable to see any results. For example:
index=main sourcetype=mytype Type=GCCC
I've found that if I put a * on either side of the field value, it does find them, which I find strange:
index=main sourcetype=mytype Type=*GCCC*
This suggests there may be whitespace around the value, but I don't see any when I look at the values. I've also found that I can successfully search on the fields if I add them as an extra search command after the main search:
index=main sourcetype=mytype | search Type=GCCC
It looks as though the field extraction isn't run until after the main search. However, this isn't the case for many of my other sourcetypes, where I can search on extracted fields directly.
I've also tried a number of other things to try to get this working:
Adding the following Calculated Field. This works, but I don't want to add an EVAL for every field, as I'm sure there will be performance implications:
[mytype]
EVAL-Status = Status
Has anyone seen this before? I've played around a lot with the regex, but could there be a problem with this? Is there a better way to extract the fields for a fixed length file?
I suspect that it's partly because the fields have no separators, so Splunk isn't able to do keyword searches on partial matches. Can anyone confirm?
Thanks in advance.
Ashley
You are running into this well-known problem:
http://blogs.splunk.com/2011/10/07/cannot-search-based-on-an-extracted-field/
The solution is to put this into fields.conf, in the same directory as the props.conf that holds your field extractions, with one stanza per extracted field (replace MyField with the field name):
[MyField]
INDEXED_VALUE = false
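A rough way to picture why this happens is sketched below. It is a deliberately simplified model, not Splunk's actual segmentation: it assumes the raw event is split into indexed terms on whitespace only, and that a search like Type=GCCC first requires GCCC to exist as a whole term. The function names are hypothetical.

```python
# First 62 characters of the sample event from the question.
SAMPLE_PREFIX = "101STUS NVLGCCCPDRf4cc5a8023ce40e28c9f260c376dabe9032134120864"


def index_terms(raw):
    """Split on whitespace only: a simplified stand-in for index-time segmentation."""
    return set(raw.lower().split())


def term_lookup_matches(raw, value):
    """Model of Type=GCCC by default: the value must exist as a whole indexed term."""
    return value.lower() in index_terms(raw)


def wildcard_lookup_matches(raw, value):
    """Model of Type=*GCCC*: any indexed term that contains the value qualifies."""
    needle = value.lower()
    return any(needle in term for term in index_terms(raw))


print(term_lookup_matches(SAMPLE_PREFIX, "GCCC"))      # False
print(wildcard_lookup_matches(SAMPLE_PREFIX, "GCCC"))  # True
```

Under this model GCCC is buried inside one long term, so the exact lookup finds nothing while the wildcard form matches. Setting INDEXED_VALUE = false tells Splunk not to assume the field value appears as an indexed term, so the event is no longer filtered out before the extraction runs.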
Thanks! Yes, this is exactly the issue. I'd be curious which would perform better: using the INDEXED_VALUE setting, or doing a transform on the source data to separate the fields at index time (such as adding commas) so that the values can be found easily. This data is going to be heavily searched, so I'm very conscious of performance.
It depends on how heavy the demand for the field is. If 75% of your searches need this field, then surely it would be better to do it at index time. Most Splunk infrastructures are very broadly shared, so any particular set of fields is generally used by only a small fraction of searches overall. For that reason I have always just used the fields.conf option. Don't forget to click "Accept" to close the question.
Did you define your field extraction with rex or are they in props.conf? Have you tried the other if it's one of them?
Hi Jeff, I've defined them in props.conf. Yes, I've used rex to test them, but obviously that works OK because I have to search on the fields after the rex command, e.g. | rex " | search field=value
The issue only appears when searching on the fields within the original search. If I search on them after the first pipe, it's fine.
Have you tried quoting the value? index=main sourcetype=mytype Type="*GCCC*"
Yeah, I've also tried with quotes, but it's the same result. If I use Type=*GCCC* it does actually work (with or without quotes), but I need to be able to search for the complete string without the wildcards, as they make the search very inefficient.