Getting Data In

Splunk creates multiple events for a single batch file execution

anshumandas
New Member

We are forwarding a directory containing hundreds of batch job execution logs. However, Splunk indexes each log by splitting it into multiple events (3, 4, ... sometimes 10 events). As a result, the number of events, and with it the volume of indexed data, increases dramatically. The logs vary in content and size, but the header and footer details follow a similar format. Below is a snapshot of a sample log file and how Splunk splits and indexes the data:

Actual Log File:

===============================================================
= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
Machine Job Starting...........
Waiting for job...

Finished waiting for job
Job Status : (1)

Status code = 1
Job submitted successfully
MachineJob Ending.............

xyz.sh; Message; Program ended successfully at: 03/24/2016 00:44:54

= Exit Status : 0
= System Time (Seconds) : 0 Elapsed Time (Minutes) : 1
= User Time (Seconds) : 0

= Thu 03/24/16 00:44:54 EDT

How Splunk indexes the log file:

Event-1:

3/24/16

12:43:18.000 AM

= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
DataStage Job Starting...........
Waiting for job...

Event-2:

3/24/16

12:43:18.000 AM

= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
Machine Job Starting...........
Waiting for job...

Event-3:

3/24/16

12:44:54.000 AM

= JOB : ABCD[(0900 03/23/16),(0AAAAAAAAAAARCVF)].tttttt
= USER : deb Sponsor svvbnmn,SHELL=/bin/ksh
= JCLFILE : $HOME/jobs/xyz.sh
= Job Number: 20

= Thu 03/24/16 00:43:18 EDT

ABC for UNIX/ghcgv 11.2
HGF Starting /opt/app/hghj/dxdxfd/VCX/ghcgv $HOME/jobs/xyz.sh
Tivoli Workload Scheduler (UNIX)/ghcgv 11.2 (20130417)
Installed for user "dxdxfd".
Locale LANG set to the following: "en"
stty: : No such device or address
stty: : No such device or address
stty: : No such device or address
+------------------------------------------------------------+
xyz.sh; Message; Program started at: 03/24/2016 00:43:18
Machine Job Starting...........
Waiting for job...

Finished waiting for job
Job Status : (1)

Status code = 1
Job submitted successfully
MachineJob Ending.............

xyz.sh; Message; Program ended successfully at: 03/24/2016 00:44:54

= Exit Status : 0
= System Time (Seconds) : 0 Elapsed Time (Minutes) : 1
= User Time (Seconds) : 0

= Thu 03/24/16 00:44:54 EDT

Question: We would like Splunk to index each log file as a single event instead of multiple events. Can you please suggest an approach for this scenario? We were not able to find a solution reading through the blogs and would really love to hear from you.


esix_splunk
Splunk Employee

What you are sending is considered a multiline event. You will need to set up line breaking and timestamp recognition for each sourcetype. In my experience, events like this most often come from AIX or mainframe sources. You will need to find all the different formats and create a sourcetype for each. (Splunk currently can't handle a single file with multiple sourcetypes.)

For each of those sourcetypes, you'll need to configure the event breaking and timestamping. The GUI is a great way to start for this...
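As a rough sketch, a props.conf stanza for logs like the sample above might look like this. The sourcetype name `batch_job_log` and the regexes are assumptions based on the sample: each file opens with a banner row of `=` characters, and timestamps appear on lines like `= Thu 03/24/16 00:43:18 EDT`. Adjust the patterns to match your actual formats.

```ini
[batch_job_log]
# Merge lines into one event; only break before the "=====" banner
# that starts each job log (assumed from the sample above).
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^={20,}
# Default merge limit is 256 lines; raise it if a single job log
# can be longer than that.
MAX_EVENTS = 2000
# Timestamp lines look like "= Thu 03/24/16 00:43:18 EDT"
TIME_PREFIX = ^= \w{3}\s
TIME_FORMAT = %m/%d/%y %H:%M:%S %Z
MAX_TIMESTAMP_LOOKAHEAD = 30
```

This is a starting point, not a drop-in config: test it against a few representative files with the "Add Data" preview before applying it to the index, since props.conf changes only affect data indexed after the change.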

Here is a great place to start : http://docs.splunk.com/Documentation/Splunk/6.3.3/Data/Configureeventlinebreaking
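For completeness, the forwarder also has to tag the files with that sourcetype so the props.conf stanza applies. A minimal inputs.conf sketch (the monitor path is hypothetical; use your actual log directory):

```ini
[monitor:///opt/app/batchlogs]
# Must match the props.conf stanza name on the indexer / heavy forwarder
sourcetype = batch_job_log
```

Note that event breaking happens at parse time, so the props.conf settings belong on the first "heavy" Splunk instance the data passes through (indexer or heavy forwarder), not on a universal forwarder.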
