Developing for Splunk Cloud Services

Using Splunk as a data lake and reporting tool for multiple big data sources

farbodkain
New Member

Hello,
I have multiple source data files with different structures; some are JSON files and others are plain text or XML.

  • I want to import these data into Splunk and create one uniform structure for all of my sources that lets me search and report. In other words, I want to parse all of my complex data into something searchable and reportable (a pipeline).
  • Is it possible to create a pivot table from complex big data and search or select a specific attribute or node to get a specific report and chart (or to join two tables together and report and chart on the result)? I know I can do this on logs or events, but my data has many attributes and nodes.
  • Is Splunk the right choice to use as a data lake?
  • Is Splunk the right choice for me?
Tags (1)
0 Karma
1 Solution

woodcock
Esteemed Legend

Although Splunk does have a decent capability to transform data, it is FAR from an ETL tool, so IMHO, it would be a poor choice for this right now. I say "right now" because soon (possibly this year in version 8.0 which we all expect to be released at .conf 2019 in a few months), a significantly rearchitected Splunk will be released which should give developers the ability to add code to any portion of the input/index pipeline. Whenever this evolution is released (if it ever is), you will be able to do what you would need to do to transform your data on the way in. Until then, Splunk is not flexible enough to transform data in the way that you need.

If you WERE to select Splunk for this project and force this "uniform" data format mandate upon it, I would tell you to use HEC and select JSON as the output format. This method allows you to send multiple structured formats into Splunk where it will convert it into JSON for indexing: http://dev.splunk.com/view/event-collector/SP-CAAAE6P
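As a sketch of what that HEC approach could look like in practice, the snippet below wraps an already-normalized record in the HEC event envelope and posts it to a collector endpoint. The URL, token, and field names are placeholders for illustration, not real values; in a real deployment you would substitute your own HEC endpoint and token.

```python
import json
import urllib.request

# Placeholder values -- substitute your own HEC endpoint and token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_payload(record: dict, sourcetype: str = "_json") -> bytes:
    """Wrap an already-normalized record in the HEC event envelope."""
    payload = {
        "event": record,           # the JSON body Splunk will index
        "sourcetype": sourcetype,  # tells Splunk to treat it as JSON
    }
    return json.dumps(payload).encode("utf-8")

def send_to_hec(record: dict) -> None:
    """POST one event to the HTTP Event Collector (makes a network call)."""
    req = urllib.request.Request(
        HEC_URL,
        data=build_hec_payload(record),
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
    )
    urllib.request.urlopen(req)  # raises on HTTP errors

# Example: a record that originated as XML or plain text, already
# converted to a flat dict before sending.
sample = {"user": "alice", "action": "login", "status": "ok"}
print(build_hec_payload(sample).decode("utf-8"))
```

The key point is that every source format is converted to the same JSON envelope before it reaches Splunk, so indexing and search see one uniform shape.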



woodcock
Esteemed Legend

It is exceedingly unclear what you need, but if you are asking whether Splunk can give you a summary of the data it has indexed, then the answer is definitely yes. For example, you can do something like this:

index=* earliest=-24h latest=now | fieldsummary
0 Karma

farbodkain
New Member

Let me explain it another way: suppose we have a huge volume of data files in multiple formats with different structures (heterogeneous data), such as CSV, JSON, and TXT, all containing the same data but in different and possibly complex formats.
Our requirement is to process and transform these data into a uniform structure so that we can generate reports and dashboards from it.
Would you please help us figure out how we can benefit from Splunk at each step of this process?
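One way to picture the transform step being asked about here: a small normalizer that parses each source format into the same flat-dictionary shape before anything is sent to Splunk. This is a minimal sketch under assumed inputs; the field names and sample records are invented for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def normalize(raw: str, fmt: str) -> list[dict]:
    """Parse one raw payload into a list of flat dicts (the uniform shape).
    Formats handled here: 'csv', 'json', 'xml' -- extend as needed."""
    if fmt == "csv":
        return [dict(row) for row in csv.DictReader(io.StringIO(raw))]
    if fmt == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    if fmt == "xml":
        root = ET.fromstring(raw)
        return [{child.tag: child.text for child in rec} for rec in root]
    raise ValueError(f"unsupported format: {fmt}")

# The same logical record arriving in three different source formats:
sources = [
    ("user,action\nalice,login", "csv"),
    ('{"user": "bob", "action": "logout"}', "json"),
    ("<events><event><user>carol</user>"
     "<action>login</action></event></events>", "xml"),
]

uniform = [rec for raw, fmt in sources for rec in normalize(raw, fmt)]
print(uniform)
```

Once every record has the same shape, the downstream choices (index as JSON via HEC, search, pivot, dashboard) no longer depend on which file format the record came from.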

0 Karma