Getting Data In

What is the best way to index Excel sheet to Splunk?

Communicator

I have a raw file of .xlsx that is used in reporting and details are all over the spreadsheet.

One part of the s/s looks like:

Proj namE Subname Projdesc
Proj smsid Sub name desc

Start date       Mon01          Mon02           Mon03           Mon04          Mon05
1/1/2001    1/1/2001    2/1/2001    3/1/2001    4/1/2001    5/1/2001

***next lines are part of the spreadsheet next rows

Business1

onshore 0.5

offshore 0.6

Business2

onshore 0.2

offshore 0.3

I need to format the excel in CSV in this way:

Proj smsid, Proj namE, Projdesc, Start date, Subname,Sub name desc, phase, scope, On_Off, Elapse_month, value
    12345, P123456789, Project 12345, 1/1/2001, P12345Sname, P12345Sname Desc,  Business1, onshore, 01, 0.5
    67891, P987654321, Project 67891, 1/1/2001, P67891Sname, P67891Sname Desc,  Business1, offshore, 01, 0.6
    97542, P989754221, Project 97542, 1/1/2001, P97542Sname, P97542Sname Desc,  Business2, onshore, 01, 0.2
    13579, P981357921, Project 13579, 1/1/2001, P13579Sname, P13579Sname Desc,  Business2, onshore, 01, 0.3

My questions:

  1. what is the best way to ingest Excel file into Splunk?
  2. Is there a way to directly upload excel file to splunk and get the same format as the original in splunk?
  3. what is the best way to convert and format (as above) xlsx?
  4. what is the most appropriate way of relating the fields Business1 or Business2 to its subfields onshore and offshore to format it in CSV?
0 Karma
1 Solution

Splunk Employee
Splunk Employee

So because xls files are binary and not plain text, you cannot directly upload them into Splunk. There are some ways to exctract xml formatting out and get that in, a google search should provide helpful there. (But this isnt preferred.) Additionally, Splunk works best for time series based data, meaning events with specific times associated with them. Your data is somewhat in this format, so you can work with it, but you'll need to get familiar with time formatting in Splunk and time operators.

You best method will be to save the tables as CSV, and ingest each table as a different sourcetype. From there, you can use Splunk's SPL to join the sourcetypes or manipulate the data across those sourcetypes as you need to.

You can start reading here about ingesting these files : http://docs.splunk.com/Documentation/Splunk/6.2.1/Data/Extractfieldsfromfileheadersatindextime#Edit_...

Another option, depending on the size of the tables, is to use them as lookups in Splunk. Then you can join on fields, or just load the lookups and work on the data in that format.

View solution in original post

0 Karma

Splunk Employee
Splunk Employee

So because xls files are binary and not plain text, you cannot directly upload them into Splunk. There are some ways to exctract xml formatting out and get that in, a google search should provide helpful there. (But this isnt preferred.) Additionally, Splunk works best for time series based data, meaning events with specific times associated with them. Your data is somewhat in this format, so you can work with it, but you'll need to get familiar with time formatting in Splunk and time operators.

You best method will be to save the tables as CSV, and ingest each table as a different sourcetype. From there, you can use Splunk's SPL to join the sourcetypes or manipulate the data across those sourcetypes as you need to.

You can start reading here about ingesting these files : http://docs.splunk.com/Documentation/Splunk/6.2.1/Data/Extractfieldsfromfileheadersatindextime#Edit_...

Another option, depending on the size of the tables, is to use them as lookups in Splunk. Then you can join on fields, or just load the lookups and work on the data in that format.

View solution in original post

0 Karma

Splunk Employee
Splunk Employee

Well, if all the csv's headers are the same, then you can keep them the same sourcetype. But where the headers are different, you'll need different source types.
From an organization point of view, it seems more logical to have different sourcetypes for each different function, but thats just me.

0 Karma

Communicator

Thanks esix! These data are all in the a single excel spreadsheet. Also, multiple files, with the same format, will be processed weekly. Are you saying to group the data and have them as different sourcetypes (sourcetype1 =the proj details, sourcetype2 =Startdate/Month details, sourcetype3 = Business1-offshore, Business1-onshore, etc) during ingestion? Is it advisable to have multiple sourcetype for every excel file processed?

0 Karma