Howdy! So I've been playing around with splunk and all of a sudden something that was working Friday afternoon has stopped working Monday morning.
Running on Windows, 4.1.6. The source in question is something like C:\foo\bar\host\date\port13.stat.
As well as the host, I also want to take the 'port' number and make it an easily searchable field, rather than doing something based on 'source=' every time.
So I set up a field extraction applying to the sourcetype: "\\port(?<port>\d+)\.stat in source"
This has been working perfectly fine. If I search by that source file I will get all my results. Checking the available fields column on the left, I can also click the little button next to 'port' and see 100% of these results have "13" as their 'port'. I can also see all 7000+ of them.
However, if I then click on the '13' (so my search now looks like source="C:\foo\bar\host1\december\port13.stat" port="13"), I get only 500 results. Oddly, there is about one day every week where I seem to get a few hundred results, whereas the rest are just one or two.
Any ideas? I can't think for the life of me what I have changed to break this and have tried 'clean eventdata' and reindexing.
You can solve that by adding the following to your $SPLUNK_HOME/etc/system/local/fields.conf:
[port]
INDEXED_VALUE = false
to tell Splunk that its index does not contain the port field's values.
Here's why... by default, when you specify a search such as:
port=13 | ...
to increase performance Splunk will automatically translate that into the equivalent of:
port=13 AND 13 | ...
which means "search for all events having a "13" in their index, then filter them to pick just those having the field "port" equal to 13, and discard the others (which must have a "13" in some other part of the text)".
So, the basic -default- assumption is "field values are present in the index". In your case, this is just not true, as the "source" field is not in the event content, so its value has not been indexed as a term of the event. By modifying fields.conf you can change the default behaviour so that Splunk's search will just be "extract everything, then compute the port field and pick only the port=13 events".
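To make the two-phase behaviour concrete, here is a toy Python model of it. This is only a sketch and not how Splunk is actually implemented: the tokenizer, the sample events and the names `tokenize`, `extract_port` and `search_port` are all made up for illustration.

```python
import re

def tokenize(raw):
    """Very rough stand-in for index-time segmentation: split on non-alphanumerics."""
    return set(t for t in re.split(r"[^A-Za-z0-9]+", raw) if t)

def extract_port(source):
    """Search-time extraction of 'port' from the source path (illustrative regex)."""
    m = re.search(r"port(\d+)\.stat$", source)
    return m.group(1) if m else None

def search_port(events, value, indexed_value=True):
    hits = []
    for source, raw in events:
        # Phase 1 (default): keep only events whose text contains the value as a term.
        if indexed_value and value not in tokenize(raw):
            continue
        # Phase 2: apply the search-time field filter.
        if extract_port(source) == value:
            hits.append(raw)
    return hits

# Made-up events: (source, raw event text)
events = [
    (r"C:\foo\bar\host1\december\port13.stat", "rx=100 tx=13"),
    (r"C:\foo\bar\host1\december\port13.stat", "rx=200 tx=50"),
    (r"C:\foo\bar\host1\december\port14.stat", "rx=13 tx=7"),
]

default_hits = search_port(events, "13")                         # term lookup first
full_scan_hits = search_port(events, "13", indexed_value=False)  # like INDEXED_VALUE = false
print(len(default_hits), len(full_scan_hits))
```

In this model the default search only returns the port13 events that happen to contain a bare 13 in their text, which is the same pattern as getting 500 results out of 7000+.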
Just to give another example, if your events were something like
2011-01-23 13:05:51 ERROR723 Desc="asdasdasddas"
and you had a field extraction that captures the digits after ERROR as a field named errcode.
Then again, searching for errcode=723 would give you no results, as the Splunk index only contains ERROR723 and not 723 alone. Modifying fields.conf again would solve the problem.
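The same point can be checked with a few lines of Python. Again, this is just a sketch: splitting on non-alphanumeric characters is only an approximation of Splunk's segmentation, and the regex is an assumed form of the errcode extraction.

```python
import re

raw = '2011-01-23 13:05:51 ERROR723 Desc="asdasdasddas"'

# Rough stand-in for segmentation: break on anything that is not a letter or digit.
terms = set(t for t in re.split(r"[^A-Za-z0-9]+", raw) if t)

# Assumed search-time extraction for errcode (not taken from the original post).
errcode = re.search(r"ERROR(\d+)", raw).group(1)

print(errcode)              # the extracted field value
print("723" in terms)       # is the bare value an indexed term?
print("ERROR723" in terms)  # the term that actually exists
```

The field value is 723, but the only term available for lookup is ERROR723, so a lookup on the bare value finds nothing.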
I'm afraid that hasn't worked.
I also tried naming it 'EXTRACT-port', restarting, rebuilding indexes etc.
FYI in case it is of relevance, I have two extracts for 'port' from different sourcetypes. The other one is from inside the file/event, rather than from the source.
Oops, sorry, I misspelled the property name: it is INDEXED_VALUE that you are interested in setting to false. I am modifying my answer accordingly... Sorry for that.
Aha awesome, fixed!
Apologies if this is a common issue, I have a funny feeling I saw this explained elsewhere but dismissed it due to never having used 'INDEXED_VALUE' before. I wonder what I changed to break things...
Now go check everything else is ok.
Edited to add a new answer. The original answer is below; it is still valid, but incomplete:
The below workaround works, but in newer versions (5.0 and later), you can also set up a fields.conf setting:
[port]
INDEXED_VALUE = source::*port<VALUE>*
In this case, you are telling Splunk what to look for in the index when the field port has a given <VALUE>. Here, you want to look in the indexed field source.
The advantage of this is that it preserves the standard search syntax and thus allows default chart drilldowns to work.
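As a rough illustration of what that template does, here is a Python sketch. It assumes the `*` behaves like a shell-style wildcard and uses a made-up helper `indexed_term`; it is not Splunk's actual matching code.

```python
from fnmatch import fnmatchcase

TEMPLATE = "source::*port<VALUE>*"  # value of INDEXED_VALUE in the stanza above

def indexed_term(value):
    """Substitute the searched value into the template (illustrative helper)."""
    return TEMPLATE.replace("<VALUE>", value)

term = indexed_term("13")
field, pattern = term.split("::", 1)

source = r"C:\foo\bar\host1\december\port13.stat"
other = r"C:\foo\bar\host1\december\port14.stat"

print(term)
print(fnmatchcase(source, pattern), fnmatchcase(other, pattern))
```

So a search for port=13 is narrowed by looking for a source matching *port13*, rather than for a bare 13 in the event text.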
The explanation that Paolo gives is correct. The problem is due to the fact that Splunk assumes that your field value is a separate token in the event, so that a search for port=13 internally turns into a search for port=13 AND 13. This searches for items containing the token 13, i.e., it needs to have segmenter characters on each side of the string. By setting INDEXED_VALUE = false for the port field, it will simply scan every event and check, i.e., it will basically be doing a "grep" rather than using Splunk search.
My recommended workaround if this causes unacceptably slow performance is to create a macro:
[port(1)]
args = p
definition = (port$p$ AND port="$p$")
You would then use `port(13)` in your search string instead of port=13, for example:
sourcetype=mysourcetype other "term" `port(13)` field1=value2 | stats count
The disadvantage though is that this is a new syntax that must be learned, but worse, clickthrus in the Splunk UI and charts do not know it and thus won't use the macro.
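For anyone unsure what the macro does to the search string, the substitution can be sketched in Python. This is a toy expansion only, assuming the definition ends with a closing quote after $p$; the `MACROS` table and the `expand` function are made up for illustration.

```python
import re

# Macro body as in the macros.conf stanza (assuming the closing quote after $p$).
MACROS = {"port": (["p"], '(port$p$ AND port="$p$")')}

def expand(search):
    """Replace `name(arg)` calls with the macro body (toy single-argument expansion)."""
    def substitute(match):
        name, arg = match.group(1), match.group(2)
        params, body = MACROS[name]
        return body.replace("$" + params[0] + "$", arg)
    return re.sub(r"`(\w+)\(([^)]*)\)`", substitute, search)

expanded = expand('sourcetype=mysourcetype other "term" `port(13)` field1=value2 | stats count')
print(expanded)
```

The expanded search contains the term port13, which can be looked up directly, plus the ordinary port="13" field filter.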
Aha thanks for that, could very well be useful.
Right now I'm only running through a few MB a day (with a backlog) and almost all data has this field from the source. It may be a performance issue in larger deployments, but I'm not worried about that currently on my little laptop and regardless, yes, that differing syntax isn't nice.
I'm going to wander off now and perhaps read up on that whole "search-time" vs "index time" argument...
If only I could have my cake and eat it!