Getting Data In

Ingestion: Dividing 3 sections of data in a TSV.

rajyah
Communicator

Good day Splunkers!

We have a case where one TSV contains 3 types or categories of data.

The first and third sections of data can be ingested normally, but the problem is the second one: it is a receipt. Is there any way for the indexer to know where to divide those sections and how to ingest them?

By the way, each section has a somewhat unique marker, like:

SECTION1980980989
SECTION1098098435
SECTION1345873485
SECTION1098340982
RECEIPT10912831921
RECEIPT10912830912
RECEIPT10983459821
RECEIPT19898281921
RECEIPT10910293849
SECTION298129381
SECTION298493859
SECTION298439588
SECTION203948533

This is a sample of the TSV. I only included the "unique" markers, since I want to know whether there's a way Splunk can determine how to divide and ingest those three sections of data.

1 Solution

Richfez
SplunkTrust

Now that I understand better what's needed here, hopefully this is the right answer.

What's apparently needed is that for this one input, some events should go to indexA, other events should go to indexB.

This is possible. I'm pretty sure there's no way to do this in the UI, so you'll have to edit the configuration files by hand. But it's not too hard if you take your time, think about what you are doing, and test. Also make backups of your configurations before you start! (It's as easy as tar'ing them up, or making a copy.)

Your main starting point is to use the "Route and filter data" section of the Splunk documentation. It gives a good overview and specifics for quite a few scenarios - unfortunately, your specific one isn't in there.

But there is help, once you know how to look for it. For instance, a web search for "splunk dest_key=index" turns up this answer.

We can modify it though.

 ### transforms.conf
 [index_redirect_section]
 REGEX = ^SECTION
 DEST_KEY = _MetaData:Index
 FORMAT = name_of_index_for_section_events

 [index_redirect_receipt]
 REGEX = ^RECEIPT
 DEST_KEY = _MetaData:Index
 FORMAT = name_of_index_for_receipt_events

 ### props.conf
 [sourcetype, host, or source that you want to redirect - see the docs for examples] 
 TRANSFORMS-route_different_indexes = index_redirect_section, index_redirect_receipt

Now, some caveats:
First, make sure you are editing local versions of the conf files, not default. So not $SPLUNK_HOME/etc/apps/myapp/default/props.conf, but instead $SPLUNK_HOME/etc/apps/myapp/local/props.conf.

Second, these regular expressions will only match if RECEIPT or SECTION is at the beginning of the event. If it isn't, remove the "^" from the front of each REGEX. But then test thoroughly: without the anchor, either word appearing anywhere in an event will reroute it, and unpredictable things may happen. As long as that doesn't happen, it should work.
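If you do drop the anchor, a slightly tighter pattern helps avoid accidental matches. Here's a sketch for the receipt stanza, assuming the markers are always followed by digits as in your sample:

 ### transforms.conf - unanchored variant, only if the marker isn't at line start
 [index_redirect_receipt]
 REGEX = RECEIPT\d+
 DEST_KEY = _MetaData:Index
 FORMAT = name_of_index_for_receipt_events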

Third, the [sourcetype, host or source that you want to redirect] stanza - you didn't tell us the sourcetype, the source, or anything else to work with, so you are on your own for filling that in. We can help, but hopefully the examples in the "Route and filter data" docs, plus the sketch below, will be enough to get that sorted out.
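To illustrate, if your sourcetype happened to be called my_tsv (a made-up name - substitute your real sourcetype, or use a [host::...] or [source::...] stanza as the docs show):

 ### props.conf - hypothetical sourcetype name
 [my_tsv]
 TRANSFORMS-route_different_indexes = index_redirect_section, index_redirect_receipt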

Generally, adding that stanza to the local props.conf tells Splunk to run transforms on the data as it comes in - two of them, in this case. Splunk looks up the named stanzas in transforms.conf and runs both in order. If the regex matches SECTION, it rewrites the destination index of that event to name_of_index_for_section_events. It then runs the next transform, which, if it matches RECEIPT, rewrites the index to name_of_index_for_receipt_events.

So, give that a try and see how it works. If you have problems, post back - in this particular case the details will matter: what you've tried, copies of the configurations you've put in place, and what exactly happens.
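Once it's in place and Splunk has been restarted, a quick sanity check in search will show whether events landed where you expect (the index names here are the placeholders from above):

 index=name_of_index_for_section_events | head 10
 index=name_of_index_for_receipt_events | head 10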

Happy Splunking,
Rich

rajyah
Communicator

I'll accept this as the answer since it's exactly what I was looking for! Thank you for the concrete and detailed explanation. As I thought, I really should fiddle around with props.conf and transforms.conf.

Again, thank you!

Richfez
SplunkTrust

You are welcome. If you get into any specific minor issues with this, be sure to post back here (like, the regex may need a tiny bit of tweaking).

Otherwise, have fun in props.conf! It's a whole new world!

-Rich

rajyah
Communicator

Ah! It seems I forgot to mention something. Do you think it's possible to work with this? If so, I'd like to hear your thoughts on it:

FIELD1[tab]FIELD2[tab]FIELD3[tab]FIELD4[tab]FIELD5[tab]
SECTION1980980989[tab]VALUES1[tab]VALUES2[tab]VALUES3[tab]VALUES4[tab]VALUES5
SECTION1098098435[tab]VALUES1[tab]VALUES2[tab]VALUES3[tab]VALUES4[tab]VALUES5
SECTION1345873485[tab]VALUES1[tab]VALUES2[tab]VALUES3[tab]VALUES4[tab]VALUES5
SECTION1098340982[tab]VALUES1[tab]VALUES2[tab]VALUES3[tab]VALUES4[tab]VALUES5
FIELD1[tab]FIELD2[tab]FIELD3[tab]FIELD4[tab]FIELD5[tab]
RECEIPT10912831921[tab]VALUES1[tab]VALUES2[tab]VALUES3[tab]VALUES4[tab]VALUES5
RECEIPT10912830912[tab]VALUES1[tab]VALUES2[tab]VALUES3[tab]VALUES4[tab]VALUES5
RECEIPT10983459821[tab]VALUES1[tab]VALUES2[tab]VALUES3[tab]VALUES4[tab]VALUES5
RECEIPT19898281921[tab]VALUES1[tab]VALUES2[tab]VALUES3[tab]VALUES4[tab]VALUES5
RECEIPT10910293849[tab]VALUES1[tab]VALUES2[tab]VALUES3[tab]VALUES4[tab]VALUES5
FIELD1[tab]FIELD2[tab]FIELD3[tab]FIELD4[tab]FIELD5[tab]
SECTION298129381
SECTION298493859
SECTION298439588
SECTION203948533

I forgot to include the fields earlier, but I think that's the structure of the log. Any thoughts about this, sir? Have you encountered this case before? Please enlighten me.

Richfez
SplunkTrust

That looks like it should work fine with what we had worked out before. The leading ^ in the REGEX should probably be OK too, and it makes matching more efficient (the anchor tells the regex engine to look for each word, like "SECTION", only at the start of the string, so it doesn't waste time scanning the whole event).

I do see one thing that might not be handled properly. Unless I'm misreading the logs, you have lines that contain only field names, with no leading "RECEIPT" or "SECTION". What did you want done with those? If you want them to go to one of the indexes we've already defined, there's an easy answer:

On the input itself, make sure you set the index=blah setting. That becomes the default index for those events, and indeed any event that doesn't match our specific redirections will just go to that default index. It can be the same as one of the specific indexes - that's no problem at all.
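For instance, a minimal sketch in inputs.conf - the monitor path here is made up, and my_tsv is the same hypothetical sourcetype as before; substitute your own values:

 ### inputs.conf - path, index, and sourcetype are placeholders
 [monitor:///path/to/your/data.tsv]
 index = name_of_index_for_section_events
 sourcetype = my_tsv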

Also, how are you assigning timestamps?

rajyah
Communicator

The log presented is like 3 different logs compiled into one, so each log (SECTION1, RECEIPT, SECTION2) has different fields. What I'm worried about is that the field header rows of the RECEIPT and SECTION2 logs might be ingested as events. But I'll try fiddling with props.conf first and will give an update. Thank you, sir! I've at least got an idea now. 😃

Richfez
SplunkTrust

Well, true, but I expect your line breaking will take care of that. Perhaps it'll need a little tweaking, but that's well documented. Line breaking is very testable: ingest into a temporary index, confirm the events break properly, then delete that index and carry the same settings into production. As long as the lines break properly, everything else should work fine.
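If tweaking is needed, here's a minimal sketch of one-event-per-line breaking in props.conf (my_tsv is still the hypothetical sourcetype name from above):

 ### props.conf - one event per line; my_tsv is a made-up sourcetype
 [my_tsv]
 SHOULD_LINEMERGE = false
 LINE_BREAKER = ([\r\n]+)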

Richfez
SplunkTrust

Why can't it ingest the second section? And by "ingest", what exactly do we mean? And what would dividing it accomplish for you?

Or let me ask a more particular question about something I'm guessing: you have extractions happening on the data, which turn gobbledygook raw logs into pretty fields. For the events with SECTION this is OK, but for the RECEIPT ones it's not working right?

Let us know if that sounds like the problem. If it's not, that's OK too - if you could provide a better example of one event that works right and one that doesn't, plus a clearer description of what "not working right" means, that would help a lot. But never fear, I'm sure we can figure this out.

rajyah
Communicator

Thank you for responding!

Sorry, I think I poorly explained what the post meant.

The ingestion is fine; the question is whether there's a way to divide those three sections of data - like ingesting each section into a different index from one TSV.

Sorry if my explanation is poor.
