Getting Data In

How to specify a source type for virtual indexes?

Explorer

Hi,

I have data in HDFS and I am creating virtual indexes to access it. However, I need to get the whole file content as a single event. For that, I have already created a source type that reads the whole file as one event. How can I apply this source type to virtual indexes?

0 Karma

Path Finder

This technique doesn't seem very well documented, and it looks like Splunk prefers that you do this in a props.conf file.
This answer shows how to do it from the virtual index UI for a Hadoop provider.

In the UI, select Settings > Virtual indexes.
Make sure you have a working data provider configured.
Then, in the Virtual indexes menu, create a new virtual index.

This example uses the following folders:
/data/auditlogs/RHEL_syslog
/data/auditlogs/WindowEvents

Within the UI, an admin should enter the following HDFS path setting:
/data/auditlogs/${sourcetype}

The admin could apply a whitelist if only one of the folders needs to be searched.

When the ${sourcetype} variable is used in the UI, the setting is written to a props.conf file for you.
Whenever a search is run across this virtual index, two sourcetypes should appear, one for each folder.
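
For reference, the UI steps above correspond roughly to a virtual index stanza like the one below. This is only a sketch: the index name auditlogs_vix and the provider name MyHadoopProvider are made up, the whitelist regex is just an illustration, and I am assuming the virtual index itself is stored in indexes.conf, so compare it with what the UI actually writes on your system.

```ini
# indexes.conf (sketch; stanza and provider names are hypothetical)
[auditlogs_vix]
vix.provider = MyHadoopProvider
# The folder name under /data/auditlogs is picked up as the sourcetype
vix.input.1.path = /data/auditlogs/${sourcetype}
# Optional whitelist: only search files under the RHEL_syslog folder
vix.input.1.accept = /RHEL_syslog/
```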

0 Karma

Splunk Employee

[source::/home/somepath/twitter/...]
priority = 100
sourcetype = twitter-hadoop
SHOULD_LINEMERGE = false
DATETIME_CONFIG = NONE

[twitter-hadoop]
KV_MODE = json
EVAL-_time = strptime(postedTime, "%Y-%m-%dT%H:%M:%S.%lZ")

0 Karma

Explorer

Hi, I am looking for a solution that does not depend on props.conf. I have already created the source type, but how can I apply it to a virtual index? That is the question.

0 Karma

Revered Legend
0 Karma

Explorer

Hi, the documentation that mentions props.conf applies the source type to every index. I want it applied only to a specific index.

0 Karma

Revered Legend

IMO, a sourcetype is applied to a data input or data source, not to an index. Props.conf lets you set the sourcetype for a source whose data is accessed through a virtual index.
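
A minimal sketch of what that looks like, assuming the virtual index reads from a hypothetical HDFS path /data/myindex and that my_wholefile_sourcetype stands in for the sourcetype you already created. Because the stanza matches on the source path, it should only affect files under that path and not the data behind other indexes:

```ini
# props.conf (sketch; the path and the sourcetype name are placeholders)
[source::/data/myindex/...]
sourcetype = my_wholefile_sourcetype
priority = 100
```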

0 Karma

Explorer

Hi,

I might not have explained my requirement properly.
Let's say I have three types of data:

1) JSON
2) CSV
3) XML

I need to get the whole file as one event for JSON and XML, and I need the data split into separate events when reading CSV. CSV data goes to one index and XML data goes to another.

In this case, can each type be handled according to its own requirement, i.e., whole-file events for XML and JSON and split events for CSV?

Can we do that with props.conf?

0 Karma

Revered Legend

Give this a try

[source::.../*.xml]
sourcetype=your_xml_sourcetype
priority=100

[source::.../*.csv]
sourcetype=your_csv_sourcetype
priority=100

# Use the correct file extension here
[source::.../*.json]
sourcetype=your_json_sourcetype
priority=100

[your_csv_sourcetype]
# define properties per your requirements

[your_xml_sourcetype]
# define properties per your requirements

[your_json_sourcetype]
# define properties per your requirements
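
As one possible way to fill in those sourcetype stanzas, the sketch below keeps each XML or JSON file as a single event and splits CSV on line endings. It is untested against Hunk. The never-matching LINE_BREAKER is a commonly used trick rather than something documented for virtual indexes, so treat it as an assumption and verify it on a small sample first.

```ini
# props.conf (sketch; sourcetype names as in the stanzas above)
[your_xml_sourcetype]
SHOULD_LINEMERGE = false
# Regex that never matches, so the file is never split into events
LINE_BREAKER = ((?!))
# 0 disables truncation of long events
TRUNCATE = 0
DATETIME_CONFIG = NONE

[your_json_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ((?!))
TRUNCATE = 0
DATETIME_CONFIG = NONE

[your_csv_sourcetype]
SHOULD_LINEMERGE = false
# One event per line
LINE_BREAKER = ([\r\n]+)
```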
0 Karma

Explorer

Thank you, somesoni. I will give it a try and let you know. Are you a Splunker? If so, is there a way to reach you by email?

0 Karma

SplunkTrust

You can identify Splunk employees by the [Splunk] after their username. So @somesoni2 is not a Splunker, but he once was 😉

0 Karma

Splunk Employee

For the most part, JSON does not need a source type. Hunk understands that format without any additional work from you. CSV with a header also does not require any additional work.

So only your XML and header-less CSV will require some additional configuration in props.conf.
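
For CSV without a header, one option is a search-time, delimiter-based field extraction. A rough sketch, assuming a sourcetype named csv-hadoop and three made-up column names (col1, col2, col3) that you would replace with your real ones:

```ini
# props.conf (sketch; the sourcetype name is an assumption)
[csv-hadoop]
REPORT-csv_fields = csv_no_header_fields

# transforms.conf (sketch; stanza and field names are hypothetical)
[csv_no_header_fields]
DELIMS = ","
FIELDS = "col1", "col2", "col3"
```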

In your case, are these three data types stored in the same HDFS directory (/user/data/alldata), or do you have separate directories (/user/data/jsondata, /user/data/xmldata, /user/data/csvdata)?

0 Karma

Explorer

Hi rdaga,

Yes, there is a chance that they might be in the same directory.
Is there a solution if they reside in different directories?

0 Karma

Splunk Employee

Same Hadoop directory:
[source::/user/data/alldata/*.xml]
priority = 100
sourcetype = xml-hadoop

[source::/user/data/alldata/*.csv]
priority = 101
sourcetype = csv-hadoop

Different Hadoop directory:
[source::/user/data/xmldata/...]
priority = 100
sourcetype = xml-hadoop

[source::/user/data/csvdata/...]
priority = 101
sourcetype = csv-hadoop

0 Karma