I have a raw file of .xlsx that is used in reporting and details are all over the spreadsheet.
One part of the s/s looks like:
Proj namE Subname Projdesc
Proj smsid Sub name desc
Start date Mon01 Mon02 Mon03 Mon04 Mon05
1/1/2001 1/1/2001 2/1/2001 3/1/2001 4/1/2001 5/1/2001
***next lines are part of the spreadsheet next rows
Business1
onshore 0.5
offshore 0.6
Business2
onshore 0.2
offshore 0.3
I need to format the excel in CSV in this way:
Proj smsid, Proj namE, Projdesc, Start date, Subname,Sub name desc, phase, scope, On_Off, Elapse_month, value
12345, P123456789, Project 12345, 1/1/2001, P12345Sname, P12345Sname Desc, Business1, onshore, 01, 0.5
67891, P987654321, Project 67891, 1/1/2001, P67891Sname, P67891Sname Desc, Business1, offshore, 01, 0.6
97542, P989754221, Project 97542, 1/1/2001, P97542Sname, P97542Sname Desc, Business2, onshore, 01, 0.2
13579, P981357921, Project 13579, 1/1/2001, P13579Sname, P13579Sname Desc, Business2, onshore, 01, 0.3
My questions:
So because xls files are binary and not plain text, you cannot directly upload them into Splunk. There are some ways to exctract xml formatting out and get that in, a google search should provide helpful there. (But this isnt preferred.) Additionally, Splunk works best for time series based data, meaning events with specific times associated with them. Your data is somewhat in this format, so you can work with it, but you'll need to get familiar with time formatting in Splunk and time operators.
You best method will be to save the tables as CSV, and ingest each table as a different sourcetype. From there, you can use Splunk's SPL to join the sourcetypes or manipulate the data across those sourcetypes as you need to.
You can start reading here about ingesting these files : http://docs.splunk.com/Documentation/Splunk/6.2.1/Data/Extractfieldsfromfileheadersatindextime#Edit_...
Another option, depending on the size of the tables, is to use them as lookups in Splunk. Then you can join on fields, or just load the lookups and work on the data in that format.
So because xls files are binary and not plain text, you cannot directly upload them into Splunk. There are some ways to exctract xml formatting out and get that in, a google search should provide helpful there. (But this isnt preferred.) Additionally, Splunk works best for time series based data, meaning events with specific times associated with them. Your data is somewhat in this format, so you can work with it, but you'll need to get familiar with time formatting in Splunk and time operators.
You best method will be to save the tables as CSV, and ingest each table as a different sourcetype. From there, you can use Splunk's SPL to join the sourcetypes or manipulate the data across those sourcetypes as you need to.
You can start reading here about ingesting these files : http://docs.splunk.com/Documentation/Splunk/6.2.1/Data/Extractfieldsfromfileheadersatindextime#Edit_...
Another option, depending on the size of the tables, is to use them as lookups in Splunk. Then you can join on fields, or just load the lookups and work on the data in that format.
Well, if all the csv's headers are the same, then you can keep them the same sourcetype. But where the headers are different, you'll need different source types.
From an organization point of view, it seems more logical to have different sourcetypes for each different function, but thats just me.
Thanks esix! These data are all in the a single excel spreadsheet. Also, multiple files, with the same format, will be processed weekly. Are you saying to group the data and have them as different sourcetypes (sourcetype1 =the proj details, sourcetype2 =Startdate/Month details, sourcetype3 = Business1-offshore, Business1-onshore, etc) during ingestion? Is it advisable to have multiple sourcetype for every excel file processed?