Getting Data In

Recommended way to consume ActiveBatch logs?

Path Finder

I want to consume log files generated by jobs running under ActiveBatch. I'm pretty new to Splunk. What would be the best way to set this up? (I could maybe even make an app for this...) I have several questions; any help greatly appreciated.

Which inputs.conf file should hold this? I see that there are many choices... If I make an app, I guess the answer is obvious: in the app's 'local' folder. But if I don't make an app, is it best to put it under etc/apps/search/local? Or etc/system/local? Or somewhere else?

ActiveBatch generates log files in folders of this form (on Windows):


(the last part being some sort of unique number, followed by a date and timestamp, then a "sequence" number). Each run gets a separate log file. The contents of each log file are whatever the job output is, plus a common section at the end describing where the process ran, when, the exit code, etc. (it's unclear whether Splunk can be taught to extract useful data from that common section). I've pasted an example of what that common section looks like at the bottom of this question.

My questions are (sorry, I read the documentation extensively, but can't figure out what the best approach is):
- How do I set things up so that 'source' and 'sourcetype' (and possibly 'host') get extracted from the path name?
- 'host' would correspond to 'MachineName' above in the path.
- 'source' should be the full path up to the '_NNNN-date-...' part, I guess (I don't think it's useful to keep that part).
- 'sourcetype' should correspond to the 'process_name' part of the path above.

To get started I defined the following, for example, in etc/apps/search/local/inputs.conf for now. It works, but 'source' and 'sourcetype' aren't useful, and I have to keep repeating sections like this one...

host = FFSVK05
disabled = false
followTail = 1
index = qa

Here's what the 'common section' I mentioned above looks like. It is always at the end of each file... It would be awesome if Splunk could be taught to extract the info in there in a meaningful way.

*         J O B     S T A T I S T I C S            *
*                                                  *
*          ActiveBatch (r) Version 7               *
*     The Enterprise Job Scheduling System         *
*                Engineered By                     *
*       Advanced Systems Concepts Inc              *
*                *

Job Id               : 33096
Job Name             : some_exe_name
Batch Id             : 33009
Command Line         : \\some_share\some_exe_name.exe some_command_line -foo=bar
Working Directory    : c:\temp
Client Machine       : MachineName
Submitted by         : DOMAIN\user
Job Start Time       : 1/24/2011 9:55:40 AM

Execution User       : DOMAIN\user
Execution Queue      : MachineName
Execution Machine    : MachineName
Job Scheduler        : QASCHED
Job Completed at     : 1/24/2011 10:02:16 AM

Elapsed Time         :      0 00:06:35.791
CPU Time             :      0 00:05:02.140
Completion Status    : 0 (0x0)
                     : (The operation completed successfully. )

-------------Job Object Statistics--------------
Total User Time      :      0 00:05:00.531
Total Kernel Time    :      0 00:00:01.609
Page Faults          : 97807
Process Count        : 1
Peak Process Memory  : 241967104
Peak Job Memory      : 241967104
Read Operations      : 875
Read Byte Count      : 3006947
Write Operations     : 140
Write Byte Count     : 9942
Other I/O Operations : 76578
Other I/O Byte Count : 6066282

  Note: except where specifically noted, all times  
        are based on the Execution Agent Machine.   
*********************End of Log*********************


An alternative to ingesting the log files, which in some cases can be quite large, is to get the summary status (the section at the bottom of the log file) by querying the ActiveBatch database directly using the Splunk DB Connect add-on, with the query below run every 15 minutes.

SELECT
I.ID AS AbatInstanceID,
I.BatchID AS AbatBatchID,
L.Name AS AbatJobName,
I.Name AS AbatTaskName,
J.Path AS AbatJobPath,
I.BeginExecutionTime AS AbatStartTime,
I.EndExecutionTime AS AbatEndTime,
(CAST(J.ElapsedHours AS varchar) + ':' + CAST(J.ElapsedMinutes AS varchar) + ':' + CAST(J.ElapsedSeconds AS varchar)) AS AbatElapsedTime,
J.StateText AS AbatStatus, 
I.QueueName AS AbatQueName,
J.JobLogFile AS AbatLogFile
FROM ActiveBatch.dbo.Instances AS I
JOIN ActiveBatch.dbo.Jobs AS J ON I.ID = J.JobID
JOIN ActiveBatch.dbo.LiteObjects AS L ON I.TemplatePID = L.ID
WHERE DATEPART(year,I.BeginExecutionTime) = DATEPART(year,GETDATE()) AND
DATEPART(month,I.BeginExecutionTime) = DATEPART(month,GETDATE()) AND
DATEPART(day,I.BeginExecutionTime) = DATEPART(day,GETDATE()) AND
DATEPART(hour,I.BeginExecutionTime) = DATEPART(hour,GETDATE()) AND
DATEPART(minute,I.BeginExecutionTime) >= DATEPART(minute,DATEADD(minute,-15,GETDATE()))
AND J.StateText IN ('Succeeded','Failed','Canceled')
AND I.ID != I.BatchID
ORDER BY  I.BeginExecutionTime DESC
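
Once DB Connect is indexing rows from this query, the summary fields can be reported on directly. A quick sketch of a search (the index and sourcetype names here are placeholders for whatever your DB Connect input assigns):

```spl
index=qa sourcetype="activebatch:db" AbatStatus="Failed"
| table AbatJobName AbatStartTime AbatElapsedTime AbatQueName AbatLogFile
```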

New Member

The link above is incorrect. It should be ActiveBatch:



First, let the source default to the log file name. This will turn out to be useful later, for several things.

Second, Splunk can easily extract the host name from the log file path. In inputs.conf, specify host_segment=1 to have Splunk take the first portion of the path name as the host name.

The sourcetype will take a bit more work. Start by setting sourcetype=ActiveBatch in inputs.conf.

If you follow this advice so far, then you should only have one input stanza in inputs.conf for your ActiveBatch files:

[monitor://<path to the top level directory of this stuff>]

You don't need followTail if these files are created and written once, and then never written to again. You can add your monitor stanza to any inputs.conf. If you are going to edit inputs.conf by hand, I suggest that you edit etc/system/local/inputs.conf.
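
Putting those pieces together, the single stanza could look like this. A sketch only: the path placeholder is from above, index=qa comes from the original post, and the attribute values are illustrative, not tested:

```ini
[monitor://<path to the top level directory of this stuff>]
# Take the first path segment as the host name
host_segment = 1
# Initial sourcetype; can be overridden per-file by an index-time transform
sourcetype = ActiveBatch
index = qa
disabled = false
# Optionally skip files older than 2 days instead of using followTail
# ignoreOlderThan = 2d
```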

Now, setting the sourcetype. You will need to use props.conf and transforms.conf to do this. Put both of these files wherever you put the corresponding inputs.conf, for consistency. In props.conf, you need the following stanza:

[source::<same path you used for inputs.conf>]
# "abat_sourcetype" is an example name; it must match the stanza name in your transforms.conf
TRANSFORMS-set_sourcetype = abat_sourcetype

In transforms.conf, you will have the following stanza to assign the sourcetype based on the source file (the stanza name is an example; props.conf must reference it via a TRANSFORMS- attribute):

[abat_sourcetype]
SOURCE_KEY = MetaData:Source
DEST_KEY   = MetaData:Sourcetype
REGEX      = process_name_(\w+)-
FORMAT     = sourcetype::$1

There is no way for Splunk to set source, sourcetype, or host based on data that appears at the end of the file, sorry. However, if you ONLY want Splunk to index that last block of the file, you could do that by editing props.conf and telling Splunk that the event starts with "Job Id" and ends with "End of Log". You would only create one event per log that way, though, and it wouldn't include any of the earlier output.
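
For completeness, a hedged sketch of what that props.conf could look like. The attribute names (SHOULD_LINEMERGE, BREAK_ONLY_BEFORE, EXTRACT-*) are standard props.conf settings, but the regexes are untested guesses against the sample block above, and dropping the earlier job output entirely would additionally need a nullQueue transform:

```ini
[ActiveBatch]
# Merge lines into one event that begins at the "Job Id" line
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^Job Id\s+:
# Search-time extractions for a few "Key : Value" lines from the stats block
EXTRACT-abat_job_id = ^Job Id\s+:\s+(?<job_id>\d+)
EXTRACT-abat_status = ^Completion Status\s+:\s+(?<completion_status>\d+)
EXTRACT-abat_elapsed = ^Elapsed Time\s+:\s+(?<elapsed_time>[\d :.]+)
```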

Hope that gets you started.

Path Finder

Thanks, that clarifies a lot 🙂 Initially I wanted followTail=1 because I wanted to skip the GBs of already existing log files, but I found out that I can use something like ignoreOlderThan=2d, which solves that pretty nicely. I see what you mean about sourcetype; it looks like it'll work out nicely. I'm giving it a try and will come back to this answer. Thank you again.
