I am working with a custom application that generates log files, and I think I need to create a new sourcetype and then extract the fields during the indexing phase.
I know the guidance says that 99% of the time you should manage custom fields at search time, but that does not make sense to me in this case. I have a custom log file and want to make it easy for folks to search on information by specific fields, and doing the field extraction at index time seems to make the most sense. Am I incorrect, and if so, what is the recommended process for working with a custom log file and extracting the fields at search time (the regex for the entire entry is long)?
Since I want to extract the fields at the indexing stage, I have updated/added props.conf, transforms.conf and fields.conf in
C:\Program Files\Splunk\etc\system\local. Something is not working correctly and I am wondering if someone can tell me what I am doing wrong.
a. I can see my new sourcetype in the Splunk UI, and the values I put in props.conf show up there, so I am assuming that part is working?
[APP_LOG]
DATETIME_CONFIG =
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
TRANSFORMS-aperf = CUSTOM_LOG
b. I have tested my transform in various tools; right now I am looking to extract just 2 fields before I go further. The transform seems to work in Splunk search and on regular-expression test sites, and does return values. I have tried with and without the FORMAT line - no change. When I import a file with this new sourcetype and do a search, I do not see the "atimestamp" or "app" fields in the Splunk UI, which is what I am expecting to see. Is my understanding of this process wrong?
[CUSTOM_LOG]
REGEX = twofields=(?<atimestamp>[^\s]+)\s(?<app>[^\s]+)\s
##FORMAT = atimestamp::"$1" app::"$2"
WRITE_META = true
Sample Log Entry
2017-06-12T17:50:41.416+0000 SYS1 ERROR {User25|pO8_Z_xZcQPFd3YNzoEmc8fZM6q2aP9eShC8dKN0|} [default thread]com.xxx.QuoteService Required Information for Generating Quote Not Found : nul
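(From this entry I would expect atimestamp=2017-06-12T17:50:41.416+0000 and app=SYS1 once the transform fires.)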
c. My fields.conf looks like this
[atimestamp]
INDEXED=true
[app]
INDEXED=true
Any suggestions would be greatly appreciated
Thank You
I solved my problem
Root cause was typos. For anyone who reads this down the road, here is the working config:
transforms.conf
[PERF2]
REGEX = (?<atimestamp>[^\s]+)\s(?<app>[^\s]+)\s(?<level>[^\s]+)\s\{(?<userid>[^\s|}]*)\|(?<sessionid>[^\s|}]*)\|(?<correlationid>[^\s|}]*)\}\s\[(?<thread>[^\]]+)\](?<sender>[^\s]+)\s(?<message>.*)$
FORMAT = atimestamp::"$1" app::"$2" level::"$3" userid::"$4" sessionid::"$5" correlationid::"$6" thread::"$7" sender::"$8" message::"$9"
WRITE_META = true
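For completeness, the props.conf and fields.conf that pair with it (same sourcetype as in my question; a sketch of my working setup):
props.conf
[APP_LOG]
TRANSFORMS-perf2 = PERF2
fields.conf
[atimestamp]
INDEXED = true
(repeat the same two-line stanza for app, level, userid, sessionid, correlationid, thread, sender and message)
Remember that index-time settings only apply to data indexed after the change, so restart Splunk and re-import a test file to see the fields.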
"it is better to perform most knowledge-building activities, such as field extraction, at search time. Index-time custom field extraction can degrade performance at both index time and search time. When you add to the number of fields extracted during indexing, the indexing process slows. Later, searches on the index are also slower, because the index has been enlarged by the additional fields, and a search on a larger index takes longer."
http://docs.splunk.com/Documentation/Splunk/7.1.2/Indexer/Indextimeversussearchtime
I understand that is the recommended way, but I don't understand why. When I was doing the regular expressions in the search, the search line was long and likely would not make some of our users happy. I keep thinking I must be missing something in the Splunk concept.
Ok, I figured there was a better way to do that part and likely would have cleaned it up once I got the fields extracted and indexed.
The app name is SYS1. When you say extraction tool, are you talking about the field extractions in the UI where you have the choice of regular expression or delimiter? If so, that is where I started, and it works in the search fairly well, but it makes for a very long search line.
I will take a look at that
That "missing" part in the splunk concept is: you don't do the field extraction in every single search (although you could for ad-hoc extractions). You define the field extraction once and then use the field (name) you extracted directly inside your search. That way your search strings won't become (much) longer than without using a field at all.
Regarding "app": try '^[^\s]{28}\s(?[^\s]+)\s' as regex. If the log line really starts with a blank, put this in front of the first '['.
"4. List item" didn't belong to my answer. I just didn't see that one... Please ignore it.
thanks for the questions - from what I read, when you are doing field extraction at index time you put the configs in the global space, which is what I did (Splunk\etc\system\local).
I did check with btool to make sure it was following the path correctly.
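For reference, the checks looked something like this, run from the Splunk bin directory (stanza names from my configs above):
splunk btool props list APP_LOG --debug
splunk btool transforms list CUSTOM_LOG --debug
Both showed the settings resolving from etc\system\local.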
You do raise another question - is it recommended to create a new app for a new sourcetype? That I didn't do.
It's best practice to not put anything in system/local - you could even put it in etc/apps/search if you so desired. But more importantly, just to make sure: you did put these on the indexer, correct?
So here is where I am a bit confused. In my research I saw a reference to an app called the indexer (or something like that), but I don't have it in my current version of Splunk, and the documentation did seem to indicate that indexing should be handled at the global level.
So does your answer mean that I can put the configs in etc/apps/search/local and the ingestion will still be indexed?
thanks
Yes, you could technically put them in search; it would be better to create a separate app for them, if only for organization. What I meant by indexer, though, is a Splunk instance whose role is an indexer. If you are running a standalone instance, my point is moot. Can you tell me what your environment looks like, and I can tailor my answer a little better?
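If you do create a separate app, a bare-bones layout is all you need, e.g. (the app name here is just an example):
etc/apps/my_log_app/
    local/
        props.conf
        transforms.conf
        fields.conf
    default/
        app.conf    (optional, but it makes the app visible in the UI)
On a distributed setup, index-time configs like these belong on the indexer (or heavy forwarder), not the search head.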
hello there,
if you are trying to extract at index time, why not use a structured format such as CSV or JSON for your application log? Write your files in that format, assign the prebuilt props for that particular sourcetype (make sure they are set on the first full Splunk instance, e.g. Heavy Forwarder or Indexer) and enjoy the tstats ride.
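For example, if the app could emit JSON, a stanza like this (a sketch; the sourcetype name is just an example) gets you indexed fields with no regex at all:
[app_log_json]
INDEXED_EXTRACTIONS = JSON
KV_MODE = none
and then something like
| tstats count where index=main sourcetype=app_log_json by app
(index=main is a placeholder for wherever the data lands).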
That is where I thought I would start; however, the format of the log file, which I have little control over, is not set up well for one of those formats unless I am missing something. I tried a couple of sourcetypes to see if I would get what I expected, but even that didn't break down the fields as I had hoped. In addition, I am trying to use meaningful names for the indexed fields to help the searches. This morning, after posting the question, I reviewed the default sourcetypes that Splunk supports and still don't see what I am doing wrong. I will say I haven't looked at tstats, but that is the next thing I will look at. Thanks