Getting Data In

How to configure Splunk to split a single large file into 2 sourcetypes based on a keyword in the log file?

rakesh_498115
Motivator

Hi ,

I have a single source with a huge number of events. These events are broadly classified into two groups, and all of them are present in the same file. My requirement is to index the file into a single index called "myindex" with two different sourcetypes, "group1" and "group2". The group1 and group2 categories are distinguished by the keywords XXX and YYY in the log file: XXX denotes group1 and YYY denotes group2.

Here is the sample of log file.

// mylog_sample.txt

24-08-2014 10:23:34  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:35  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:36  1w2,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:37  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:39  122,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
25-08-2014 10:23:34  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:35  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:36  1w2,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:37  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:39  122,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456

All the data is present in the same file. Now I want to split it into the two sourcetypes "group1" and "group2" within a single index.

So if I search the data with:

index="myindex" sourcetype="group1"  

it should list the following data:

24-08-2014 10:23:34  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:35  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:36  1w2,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:37  12e,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456
24-08-2014 10:23:39  122,34,56,67,87,90,123, 34,545,45,XXX,56,5768,342,34456

and if I search with the following:

index="myindex" sourcetype="group2"  

it should list the following data:

25-08-2014 10:23:34  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:35  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:36  1w2,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:37  12e,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456
25-08-2014 10:23:39  122,34,56,67,87,90,123, 34,545,45,YYY,56,5768,342,34456

Any help on the above use case would be appreciated. I tried using transforms.conf, but had no luck with the separation. Please post the configuration that suits this requirement.

Many thanks.
Rakesh.


stephanefotso
Motivator

Yes, it is possible, but you have to do it before index time in the data pipeline, since overriding a sourcetype happens at parse time.
I hope this helps: http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Advancedsourcetypeoverrides
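
For reference, the generic override pattern from that page looks roughly like the sketch below; the source path, transform name, regex, and sourcetype values here are placeholders, not your exact configuration:

// props.conf (on the parsing tier: indexer or heavy forwarder)

[source::/path/to/your/file]
TRANSFORMS-set_sourcetype = my_override

// transforms.conf

[my_override]
REGEX = <pattern identifying the events to re-type>
FORMAT = sourcetype::<new_sourcetype>
DEST_KEY = MetaData:Sourcetype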

SGF

rakesh_498115
Motivator

Thanks for the update, Stephane, but this doesn't seem to be working. Below is my configuration.

// inputs.conf

[monitor:///opt/splunk/splunkInput/mylog_sample.txt]
disabled = false
followTail = 0
recursive = false
sourcetype = temp
index = myindex

// transforms.conf

[set_group1_routing]
REGEX = XXX
FORMAT = sourcetype::group1
DEST_KEY = MetaData:Sourcetype

[set_group2_routing]
REGEX = YYY
FORMAT = sourcetype::group2
DEST_KEY = MetaData:Sourcetype

// props.conf

[group1]
TRANSFORMS-350_routing=set_group1_routing
DATETIME_CONFIG = CURRENT
MAX_TIMESTAMP_LOOKAHEAD = 150
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false

[group2]
TRANSFORMS-350_routing=set_group2_routing
DATETIME_CONFIG = CURRENT
MAX_TIMESTAMP_LOOKAHEAD = 150
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false

Let me know if I am missing something. Thanks in advance 🙂


maciep
Champion

It looks like your data will be coming in with a sourcetype of temp initially, so your props can probably look more like this:

[temp]
DATETIME_CONFIG = CURRENT
MAX_TIMESTAMP_LOOKAHEAD = 150
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TRANSFORMS-350_routing=set_group1_routing, set_group2_routing

[group1]

[group2]

The data will come in with a sourcetype of "temp" and hit your props, so along with the timestamp/linebreak settings, your transforms will be applied, which will set the new sourcetype accordingly.
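
Once new data is indexed with these settings in place (they only affect newly indexed events, not what is already in the index), a quick way to check the split is an example search like:

index="myindex" | stats count by sourcetype

You should see group1 and group2 there; any events that matched neither regex would stay as temp.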

Also, if you decide to create field extractions or other search-time settings, they would be configured in the stanzas for the new sourcetypes you created, group1 and group2.
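
For example, a search-time field extraction could live there; the field name and regexes below are purely hypothetical, just to show where such settings would go:

// props.conf (search-time side -- hypothetical example)

[group1]
# hypothetical: pull the XXX marker into a field
EXTRACT-marker = ,(?<marker>XXX),

[group2]
# hypothetical: pull the YYY marker into a field
EXTRACT-marker = ,(?<marker>YYY),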


vincenteous
Communicator

Are you using a heavy forwarder? Where did you put this configuration? Is it on your indexer?


rakesh_498115
Motivator

No, vincenteous... I am using this configuration on the indexer.
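
In that case, after a splunkd restart, one way to confirm the settings are actually being loaded on the indexer is btool, run from $SPLUNK_HOME/bin with the stanza names from the configs above:

// on the indexer

./splunk btool props list temp --debug
./splunk btool transforms list set_group1_routing --debug

The --debug flag shows which .conf file each setting comes from, which helps catch precedence or typo issues. Also keep in mind the new sourcetypes will only apply to events indexed after the change.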
