Getting Data In

Using Splunk with a custom application log file

acarmack
Explorer

I am working with a custom application that generates log files, and I think I need to create a new source type and then extract the fields during the indexing phase.

  1. I know that they say 99% of the time you should manage custom fields at search time, but that does not make sense to me in this case. I have a custom log file and want to make it easy for folks to search on information by specific fields, and doing the field extraction at index time seems to make the most sense. Am I incorrect? If so, what is the recommended process for working with a custom log file and extracting the fields at search time (the regex for the entire entry is long)?

  2. Since I want to extract the fields at the indexing stage, I have updated/added a props.conf, transforms.conf and fields.conf in
    C:\Program Files\Splunk\etc\system\local. Something is not working correctly, and I am wondering if someone can tell me what I am doing wrong.

a. I can see my new sourcetype in the Splunk UI, and the new values I put in props.conf are available. So I am assuming that part is working?

[APP_LOG]
DATETIME_CONFIG = 
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
TRANSFORMS-aperf = CUSTOM_LOG

b. I have tested my transform in various tools; right now I am looking to extract just 2 fields before I go further. The transform seems to work in Splunk search and on regular expression test sites, and it does return values. I have tried with and without the FORMAT line - no change. When I import a file with this new sourcetype and do a search, I do not see the "atimestamp" or "app" fields in the Splunk UI, which I was expecting to see. Is my understanding of this process wrong?

[CUSTOM_LOG]
REGEX = twofields=(?<atimestamp>[^\s]+)\s(?<app>[^\s]+)\s
##FORMAT = atimestamp::"$1" app::"$2"
WRITE_META = true 

Sample Log Entry

2017-06-12T17:50:41.416+0000 SYS1 ERROR {User25|pO8_Z_xZcQPFd3YNzoEmc8fZM6q2aP9eShC8dKN0|} [default thread]com.xxx.QuoteService Required Information for Generating Quote Not Found : nul

c. My fields.conf looks like this

[atimestamp]
INDEXED=true
[app]
INDEXED=true
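
For what it's worth, my understanding is that once these files take effect and the data is re-ingested, the indexed fields should be searchable directly with something like the following (the index name here is just a placeholder):

index=my_index app::SYS1
| tstats count where index=my_index by app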

Any suggestions would be greatly appreciated

Thank You

1 Solution

acarmack
Explorer

I solved my problem.

The root cause was typos. I can make the following suggestions in case anyone reads this down the road.

  1. Verify your regex carefully - I ended up using https://regex101.com/
  2. Confirm the naming rules in Splunk - I was not careful enough. Keep names simple to start and get more complex down the road.
  3. I did use the FORMAT setting in transforms.conf, so it looked like this

transforms.conf

[PERF2]
REGEX = (?<atimestamp>[^\s]+)\s(?<app>[^\s]+)\s(?<level>[^\s]+)\s\{(?<userid>[^\s|}]*)\|(?<sessionid>[^\s|}]*)\|(?<correlationid>[^\s|}]*)\}\s\[(?<thread>[^\]]+)\](?<sender>[^\s]+)\s(?<message>.*)$
FORMAT = atimestamp::"$1" app::"$2" level::"$3" userid::"$4" sessionid::"$5" correlationid::"$6" thread::"$7" sender::"$8" message::"$9"
WRITE_META = true
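
With those fields indexed, a search can reference them directly, e.g. (the index name is just a placeholder):

index=my_index app::SYS1 level::ERROR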



kmorris_splunk
Splunk Employee

"it is better to perform most knowledge-building activities, such as field extraction, at search time. Index-time custom field extraction can degrade performance at both index time and search time. When you add to the number of fields extracted during indexing, the indexing process slows. Later, searches on the index are also slower, because the index has been enlarged by the additional fields, and a search on a larger index takes longer."

http://docs.splunk.com/Documentation/Splunk/7.1.2/Indexer/Indextimeversussearchtime
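
If the concern is long search strings, note that a search-time extraction only needs to be defined once in props.conf and is then applied automatically to every search - a sketch, using the sourcetype and the first two field names from your post:

props.conf

[APP_LOG]
EXTRACT-twofields = ^(?<atimestamp>[^\s]+)\s(?<app>[^\s]+)\s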


rvany
Communicator
  1. Do extract fields at search time. That's the recommended way.
  2. There's no need to extract the timestamp via field extraction - you just define this in the sourcetype; see props.conf.spec under etc/system/README or the Admin Manual on docs.splunk.com, and the sketch after this list.
  3. Creating a field extraction with the extraction tool should give you the app value in an easy way. What part of your sample log data is the app's name?
  4. List item
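
For the sample timestamp in your post, something along these lines should work in props.conf (a sketch - it assumes millisecond precision and the timestamp always starting the line):

[APP_LOG]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
MAX_TIMESTAMP_LOOKAHEAD = 30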

acarmack
Explorer
  1. I understand that is the recommended way, but I don't understand why. When I was doing regular expressions in the search, the search line was long and likely would not make some of our users happy. I keep thinking I must be missing something in the Splunk concept.

  2. OK, I figured that could be done in a better way, and I likely would have cleaned that up once I got the fields extracted and indexed.

  3. The app name is SYS1 - when you say extraction tool, are you talking about the field extractions in the UI where you have a choice of regular expression or delimiter? If so, that is where I started, and it works in search fairly well, but that is a very long search line.

  4. I will take a look at that


rvany
Communicator

That "missing" part in the splunk concept is: you don't do the field extraction in every single search (although you could for ad-hoc extractions). You define the field extraction once and then use the field (name) you extracted directly inside your search. That way your search strings won't become (much) longer than without using a field at all.

Regarding "app": try '^[^\s]{28}\s(?[^\s]+)\s' as regex. If the log line really starts with a blank, put this in front of the first '['.

"4. List item" didn't belong to my answer. I just didn't see that one... Please ignore it.


CarsonZa
Contributor
  1. On what instance did you put the props and transforms?
  2. props and transforms should really be in ....etc/apps/yourapp/local/ (see the sketch below)
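
A minimal app layout could look like this (the app name is just an example):

etc/apps/my_custom_app/
    local/
        props.conf
        transforms.conf
        fields.conf
    metadata/
        local.meta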

acarmack
Explorer

Thanks for the questions - from what I read, when you are doing field extraction at index time you put the files in the global space, which is what I did (Splunk\etc\system\local).

I did check with btool to make sure it was following the path correctly.
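
Specifically, I ran something like this from the Splunk bin directory (stanza names from my files above):

splunk btool props list APP_LOG --debug
splunk btool transforms list CUSTOM_LOG --debug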

You do raise another question - is it recommended to create a new app for a new sourcetype? That I didn't do.


CarsonZa
Contributor

It's best practice not to put anything in system/local - you could even put it in etc/apps/search if you so desired. But more importantly, just to make sure: you put these on the indexer, correct?


acarmack
Explorer

So here is where I am a bit confused. In my research I saw a reference to an app called the indexer (or something like that), but I don't have it in my current version of Splunk, and the documentation did seem to indicate that indexing should be handled at the global level.

So does your answer mean that I can put stuff in etc/apps/search/local and the ingestion will still be indexed?

thanks


CarsonZa
Contributor

Yes, you could technically put them in search, though it would be better to create a separate app for them, if only for organization. What I meant by indexer, though, is a Splunk instance whose role is an indexer. If you are running a standalone instance, my point is moot. Can you tell me what your environment looks like, and I can tailor my answer a little better?


adonio
Ultra Champion

hello there,

if you are trying to extract at index time, why not use a structured format for your application log, such as CSV, JSON or another? Write your files in that format, assign the prebuilt props for that particular sourcetype (make sure they are set on the first full Splunk instance, e.g. a Heavy Forwarder or Indexer), and enjoy the tstats ride.
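
for example, if the application could write JSON, a single props setting on that first full instance would index the fields (a sketch - the sourcetype name is just an example):

props.conf

[app_log_json]
INDEXED_EXTRACTIONS = json

and then tstats can use the fields directly:

| tstats count where index=my_index sourcetype=app_log_json by app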


acarmack
Explorer

That is where I thought I would start; however, the format of the log file, which I have little control over, is not set up well for one of those formats unless I am missing something. I tried a couple of sourcetypes to see if I would get what I expected, but even that didn't break down the fields as I had hoped. In addition, I am trying to use meaningful names for the indexed fields to help the searches. This morning, after posting the question, I reviewed the default sourcetypes that Splunk supports and still don't see what I am doing wrong. I will say I haven't looked at tstats, but that is my next thing to look at. Thanks.
