Getting Data In

Splunk creates multiple indexes for a single batch file execution

anshumandas
New Member

We are forwarding a directory containing hundreds of batch job execution logs. However, Splunk indexes each log by splitting it into multiple events (3, 4, sometimes 10 events). As a result, the number of events, and with it the volume of indexed data, is growing dramatically. The logs vary in nature and size, but the header and footer details follow similar formats. I have provided a snapshot of a sample log file and how Splunk splits and indexes the data below:

Actual Log File:

===============================================================
= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
Machine Job Starting...........
Waiting for job...

Finished waiting for job
Job Status : (1)

Status code = 1
Job submitted successfully
MachineJob Ending.............

xyz.sh; Message; Program ended successfully at: 03/24/2016 00:44:54

= Exit Status : 0
= System Time (Seconds) : 0 Elapsed Time (Minutes) : 1
= User Time (Seconds) : 0

= Thu 03/24/16 00:44:54 EDT

How Splunk indexes the log file:

Event-1:

3/24/16

12:43:18.000 AM

= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
DataStage Job Starting...........
Waiting for job...

Event-2:

3/24/16

12:43:18.000 AM

= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
Machine Job Starting...........
Waiting for job...

Event-3:

3/24/16

12:44:54.000 AM

= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
Machine Job Starting...........
Waiting for job...

Finished waiting for job
Job Status : (1)

Status code = 1
Job submitted successfully
MachineJob Ending.............

xyz.sh; Message; Program ended successfully at: 03/24/2016 00:44:54

= Exit Status : 0
= System Time (Seconds) : 0 Elapsed Time (Minutes) : 1
= User Time (Seconds) : 0

= Thu 03/24/16 00:44:54 EDT

Question: We would like Splunk to index each log file as a single event instead of multiple events. Can you please suggest an approach for dealing with this scenario? We were not able to find a solution by reading through the blogs and would really love to hear from you.


esix_splunk
Splunk Employee

What you are sending is considered a multiline event. You will need to set up line breaking and timestamp recognition for each sourcetype. In my experience, events like this most often come from AIX or mainframe sources. You will need to find all the different formats and create a sourcetype for each. (Splunk currently can't handle a single file containing multiple sourcetypes.)
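For example, one sourcetype per log format could be assigned at the input layer on the forwarder. This is just a sketch; the paths, sourcetype names, and index name below are hypothetical and need to match your environment:

```ini
# inputs.conf on the universal forwarder (hypothetical paths and names)

# TWS/batch job logs in one directory get one sourcetype...
[monitor:///var/log/batchjobs/tws/*.log]
sourcetype = tws:batchjob
index = batch

# ...and logs in a different format get their own sourcetype,
# since a single file cannot mix sourcetypes.
[monitor:///var/log/batchjobs/mainframe/*.log]
sourcetype = mf:batchjob
index = batch
```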

For each of those sourcetypes, you'll need to configure the event breaking and timestamping. The GUI (the "Add Data" source preview) is a great way to start with this.

Here is a great place to start: http://docs.splunk.com/Documentation/Splunk/6.3.3/Data/Configureeventlinebreaking
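For a log like the sample above, a props.conf stanza along these lines could work. This is a sketch, not a tested config: it assumes the sourcetype is named `batch_job_log`, that every log starts with the long `====...` separator line, and that the `= Thu 03/24/16 00:43:18 EDT` header line should supply the event timestamp. Verify the behavior with the data preview before deploying:

```ini
# props.conf (on the indexer or heavy forwarder) -- sketch only
[batch_job_log]
# Break events on the raw stream instead of merging lines afterwards.
SHOULD_LINEMERGE = false
# Start a new event only at a line of 10+ "=" characters (the job header);
# lines beginning with a single "= " (e.g. "= JOB :") will not break events.
LINE_BREAKER = ([\r\n]+)={10,}
# Take the timestamp from the "= Thu 03/24/16 00:43:18 EDT" line,
# not from the dates embedded in the JOB header.
TIME_PREFIX = ^=\s(?=[A-Z][a-z]{2}\s\d{2}/)
TIME_FORMAT = %a %m/%d/%y %H:%M:%S %Z
MAX_TIMESTAMP_LOOKAHEAD = 64
# These multiline events can be large; raise the truncation limit if needed.
TRUNCATE = 100000
```

With `SHOULD_LINEMERGE = false` and a `LINE_BREAKER` anchored to the header separator, each job log should land in the index as one event rather than being re-broken at every timestamp Splunk happens to find.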
