Getting Data In

Adding Data: Data Type vs. Data Source

Communicator

What is the difference between Data Type and Data Source?

When I use Manager to add a Data Type (choosing the "A file or directory of files" option), Splunk starts indexing the data and monitoring the directory.

Yet when I choose Data Source, click "From files and directories", and specify a directory after selecting source types etc., the data does not get indexed in my Search app.

I need some clarification regarding the difference between these two.

Thanks for taking your time to help!


Influencer

attgjh1,

My understanding of the difference is this...

Data Type section - This is a more guided approach for some of the different types of data Splunk can index. "A file or directory..." is not a great example, as it is just a file or directory either way. But if you look at "Syslog" in the Data Type section, you'll see four options for where to get syslog from, which gives a more guided approach. The options are:

  • Consume any syslog files or directories on this Splunk server
  • Consume syslog over UDP
  • Consume syslog over TCP
  • Configure Splunk to listen for syslog data on any TCP port

Data Source section - This is for "slightly" more experienced users who know exactly what they want. Again, "A file or directory..." is still just a file or directory, so there is not much difference. But syslog has already been separated into the different input types, i.e. the options are already there:

  • From a TCP Port
  • From a UDP Port

So, for example, if you knew you had to collect syslog over UDP, you could go straight to "From a UDP Port" in the Data Source section. If you were unsure how you were going to collect syslog, you could follow the guided section in Data Type for more info. The same goes for the rest.
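If you do go the "From a UDP Port" route, one quick way to check that events arrive is to send a test message yourself. Below is a minimal Python sketch, assuming a UDP input has been set up on the same machine on port 514 (the conventional syslog port; use whatever port you actually configured). The function names are hypothetical helpers for illustration, not part of Splunk, and the message follows the BSD syslog (RFC 3164) layout `<PRI>TIMESTAMP HOSTNAME TAG: MSG`.

```python
import socket
from datetime import datetime


def build_syslog_message(msg, host="myhost", tag="test", facility=1, severity=6, timestamp=None):
    """Format a BSD-style (RFC 3164) syslog line: <PRI>TIMESTAMP HOSTNAME TAG: MSG."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    ts = timestamp or datetime.now()
    pri = facility * 8 + severity  # e.g. facility user (1), severity info (6) -> 14
    # RFC 3164 pads single-digit days with a space, e.g. "Jan  5"
    stamp = f"{ts:%b} {ts.day:2d} {ts:%H:%M:%S}"
    return f"<{pri}>{stamp} {host} {tag}: {msg}"


def send_syslog_udp(msg, server="127.0.0.1", port=514):
    """Send one syslog line over UDP (fire-and-forget: no delivery confirmation)."""
    data = build_syslog_message(msg).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (server, port))


if __name__ == "__main__":
    # Assumed target: a Splunk UDP input listening on localhost:514
    send_syslog_udp("hello from a test sender")
```

Because UDP gives no acknowledgement, the only real confirmation is searching for the test message in Splunk afterwards.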

As for the second part of your question, "...the data do not get indexed in my search app...": if you have already indexed the data through the first option, you cannot index it twice. Splunk has checks in place to stop the same data from being indexed again (in most cases you do not want two events for the same raw data). Put simply, Splunk controls this with the "fishbucket" index, which stores CRCs (cyclic redundancy checks) and seek pointers for the files it has read. For more info on this, see the following blog here. However, unless you are sure of what you are doing, I would not recommend performing actions on the fishbucket, as it could affect your license or your results (i.e. you could index the same data twice) and potentially disrupt your Splunk installation. I just thought it was a good read for understanding a little more about Splunk.
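To illustrate why the second input found nothing to index, here is a simplified Python model of the idea described above: remember a CRC of each file's beginning plus a seek pointer, and only hand over bytes past that pointer. This is a toy sketch, not Splunk's actual implementation; the class name is hypothetical, and the 256-byte head length is an assumption modelled on the `initCrcLength` setting in inputs.conf.

```python
import zlib
from dataclasses import dataclass, field

HEAD_BYTES = 256  # assumed head length, modelled on the inputs.conf initCrcLength setting


@dataclass
class FishbucketSketch:
    """Toy model: CRC of a file's first HEAD_BYTES -> seek pointer (bytes already indexed).

    Files shorter than HEAD_BYTES change their CRC as they grow; this sketch ignores that case.
    """
    seek_pointers: dict = field(default_factory=dict)

    def unread_bytes(self, content: bytes) -> bytes:
        """Return only the bytes not yet 'indexed', then advance the seek pointer."""
        head_crc = zlib.crc32(content[:HEAD_BYTES])
        start = self.seek_pointers.get(head_crc, 0)  # unknown head -> treat as a new file
        self.seek_pointers[head_crc] = len(content)
        return content[start:]


if __name__ == "__main__":
    fb = FishbucketSketch()
    log = b"x" * 300 + b"\n"
    print(len(fb.unread_bytes(log)))             # first input: whole file, prints 301
    print(len(fb.unread_bytes(log)))             # second input, same file: prints 0
    print(fb.unread_bytes(log + b"new line\n"))  # only the appended data: b'new line\n'
```

The second call returning nothing is the situation in the question: a second input pointed at files the first input already read finds no new data to index.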

Hope this helps,

MHibbin

Communicator

This is the problem I'm having now:

I added a source (no UDP and stuff).
It has the following attributes:

Set host: Constant Value
Source type:
Destination index: default
number of files: 1516
App: Search
Status: Enabled

However, the events are not showing up in my search at all.

I'm just trying to load some old log files to run some analysis, but I can't get past this file-indexing part.

Influencer

OK, I was only using that as an example to answer your question, as it was a bit more descriptive. You should definitely look at the Splunk docs on setting your installation up (the docs are separated into different levels of user involvement, from user to admin to developer).

They can be found here: http://docs.splunk.com/

If this has answered your question, can you please mark the answer as accepted (click the empty tick next to my answer) and even up-vote it if you feel like it? This just tells the Splunkbase community that the question has been answered and doesn't require more attention.

Communicator

Thanks a lot. I'm not looking into doing big stuff with UDP and TCP yet.
I'm just using it to monitor some logs (not in real time).
Still trying to figure out lots of stuff here!