Hi,
I have data in HDFS and I am creating virtual indexes to access it. However, I need to get the whole file content as a single event. For that, I have already created a sourcetype that reads the whole file. How can I apply that sourcetype to a virtual index?
This technique doesn't seem very well documented, and it looks like Splunk prefers you to do this in a props.conf file.
This answer shows an option for doing it from the Virtual Index UI for a Hadoop provider.
In the UI, select Settings -> Virtual Indexes.
Ensure you have a working data provider configured.
Then, within the Virtual Indexes menu, create a new virtual index.
This example is going to use the following folders:
/data/auditlogs/RHEL_syslog
/data/auditlogs/WindowEvents
Within the UI, an admin should enter the following HDFS path setting:
/data/auditlogs/${sourcetype}
The admin could apply a whitelist if only one of the folders needs to be searched.
By applying the ${sourcetype} variable in the UI, this setting will be written to the configuration files for you.
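For reference, the virtual index definition that the UI writes ends up as vix.* settings. A minimal sketch of what that looks like (the index and provider names here are made up; only the path mirrors the example above, so treat this as an illustration rather than a copy-paste config):

```
[hadoop-audit-logs]
vix.provider = my_hadoop_provider
# ${sourcetype} in the path tells Hunk to derive the sourcetype
# from the matching directory name, e.g. RHEL_syslog or WindowEvents
vix.input.1.path = /data/auditlogs/${sourcetype}/...
```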
Whenever a search is performed across this virtual index, two sourcetypes should appear.
For comparison, the props.conf route (here, the Twitter data example from the Hunk tutorial) looks like this:

[source::/home/somepath/twitter/...]
priority = 100
sourcetype = twitter-hadoop
SHOULD_LINEMERGE = false
DATETIME_CONFIG = NONE

[twitter-hadoop]
KV_MODE = json
EVAL-_time = strptime(postedTime, "%Y-%m-%dT%H:%M:%S.%lZ")
Hi, I am looking for a solution that does not depend on props.conf. I have already created the sourcetype, but how can I apply it to a virtual index? That is the question.
Splunk documentation to the rescue. See this:
http://docs.splunk.com/Documentation/Hunk/6.2.5/Hunktutorial/SearchbySourcetype
Hi, this documentation, which mentions props.conf, applies the sourcetype to every index. I want it to apply only to a specific index.
IMO, the sourcetype is applied to a data input or data source, not to an index. props.conf will allow you to set the sourcetype for a source, and those sources are what get stored in the virtual index.
Hi,
I might not have explained my situation properly.
Let's say I have three types of data:
1) JSON
2) CSV
3) XML
I need to get the whole file for JSON and XML, and I need the data split into events when reading CSV. CSV data goes to one index and XML data goes to another.
In this case, can the data be handled according to those requirements? i.e., whole-file events for XML and JSON, and split events for CSV.
Can we do that with props.conf?
Give this a try:
[source::.../*.xml]
sourcetype = your_xml_sourcetype
priority = 100

[source::.../*.csv]
sourcetype = your_csv_sourcetype
priority = 100

[source::.../*.json]
# use the correct extension of your files
sourcetype = your_json_sourcetype
priority = 100

[your_csv_sourcetype]
# define properties per your requirements

[your_xml_sourcetype]
# define properties per your requirements

[your_json_sourcetype]
# define properties per your requirements
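As a starting point for those placeholder stanzas, one common core-Splunk pattern is to force whole-file events by breaking only before a pattern that never occurs in the data, and to keep CSV as one event per line. This is a sketch using the sourcetype names from above; it is untested against Hunk virtual indexes, so verify the behavior in your environment:

```
# Whole-file events: merge all lines and never break them apart,
# using a break pattern that should not appear in real data.
[your_xml_sourcetype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^this_pattern_should_never_match$
TRUNCATE = 0
MAX_EVENTS = 100000

[your_json_sourcetype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^this_pattern_should_never_match$
TRUNCATE = 0
MAX_EVENTS = 100000
KV_MODE = json

# One event per line for CSV.
[your_csv_sourcetype]
SHOULD_LINEMERGE = false
```

MAX_EVENTS caps the number of lines merged into one event, so raise it if your files are longer than that.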
Thank you somesoni, I will give it a try and let you know. Are you a Splunker? If so, is there a way to reach you by email or so?
You can identify Splunk employees by the [Splunk] after their username - therefore @somesoni2 is not a Splunker, but he once was 😉
For the most part, JSON does not need a sourcetype; Hunk understands that format without any additional work from you. CSV with a header also does not require any additional work.
So that means only your XML, and CSV without headers, will require some additional manipulation in the props.conf files.
In your case, are these 3 data types stored in the exact same HDFS directory (/user/data/alldata), or do you have /user/data/jsondata, /user/data/xmldata, and /user/data/csvdata?
Hi rdaga,
Yes, there is a chance that they might be in the same directory.
Is there also a solution if they reside in different directories?
Same Hadoop directory:

[source::/user/data/alldata/*.xml]
priority = 100
sourcetype = xml-hadoop

[source::/user/data/alldata/*.csv]
priority = 101
sourcetype = csv-hadoop

Different Hadoop directories:

[source::/user/data/xmldata/...]
priority = 100
sourcetype = xml-hadoop

[source::/user/data/csvdata/...]
priority = 101
sourcetype = csv-hadoop