Getting Data In

Index By host OR Sourcetype by host

jamie_leclair
Engager

Hello, I have 2 questions I am hoping someone can help me with.

I am trying to figure out how to categorize data by host (IP) at a heavy forwarder, so that data is ultimately categorized according to a list of IPs.

Examples:

1) Have data from host=x.x.x.x OR host=y.y.y.y; sourcetype=vendorA AND index=vendorA
2) Have data from host=a.a.a.a OR host=b.b.b.b; sourcetype=vendorB AND index=vendorB

Currently, I have a series of hosts logging to a heavy forwarder, and the heavy forwarder sends that data over to an indexer cluster. Everything is working, but all the data ends up in main, and I would like to separate that data for both RBAC and extraction reasons.

I hope that makes sense... Any help would be appreciated.

Thank you,
Jamie


oscar84x
Contributor

It's difficult to figure out without being able to see your entire configuration (inputs, outputs, props, etc.).
But another suggestion is to identify the sourcetypes with a regex and start with that. Then you don't have to worry about the source host. If you have a single input on your HF collecting all sourcetypes from all hosts and calling everything syslog, then this might be the best option without having to do a whole overhaul of how you're collecting data.

First, use transforms.conf to assign the sourcetypes based on a regex that identifies each one. You only have to do this once per sourcetype, without worrying about which host the data is coming from:

    [change_sourcetypeA]
    REGEX = <regex that identifies sourcetype A>
    DEST_KEY = MetaData:Sourcetype
    FORMAT = sourcetype::vendorA

REPEAT AS NEEDED

Then, for those sourcetypes, create transforms that sort them into their indexes:

    [indexA]
    DEST_KEY = _MetaData:Index
    REGEX = .
    FORMAT = indexA

    [indexB]
    DEST_KEY = _MetaData:Index
    REGEX = .
    FORMAT = indexB

REPEAT AS NEEDED

Finally, call them into action in props.conf under their respective stanzas:

[syslog]
TRANSFORMS-sourcetypes = change_sourcetypeA, change_sourcetypeB, etc

[vendorA]
TRANSFORMS-newindex = indexA

[vendorB]
TRANSFORMS-newindex = indexB

There are a few moving parts to this, so it might be tricky. I'm also trying to remember whether it will have a problem going back to assign the index to the new sourcetype, or whether it goes through that filter just once. But give it a try.


jamie_leclair
Engager

Oscar,

Thanks for your reply.

I did look into this, and I feel like this is probably the best method for this kind of thing, but I feel like it is going to be a challenge for sure. There is nothing in the log itself that really differentiates one vendor from another.

E.g., Cisco Nexus syslog vs. Juniper syslog vs. Cisco ACI syslog vs. Aruba Wireless Controller syslog.

The only way I feel the transforms will help me in this case is if I can run them based on a CSV list of hosts; that would be perfect if I could do it, but it seems regex is the only way. Also, if all the different devices were located in different subnets, sure, this could work, but they're all over the place, with no easily distinguishable separation.
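That said, from what I've read, transforms can supposedly match on the host metadata rather than the raw event via SOURCE_KEY, so a CSV of hosts could at least be collapsed into a single alternation regex. A rough sketch of what I mean, untested on my end (the IPs and stanza names are placeholders):

    # transforms.conf -- route to an index by host metadata (untested sketch)
    [route_vendorA_by_host]
    SOURCE_KEY = MetaData:Host
    REGEX = ^host::(?:x\.x\.x\.x|y\.y\.y\.y)$
    DEST_KEY = _MetaData:Index
    FORMAT = vendorA

    # props.conf -- apply it to the catch-all sourcetype
    [syslog]
    TRANSFORMS-routebyhost = route_vendorA_by_host

But maintaining that regex by hand isn't much better than a stanza per host.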

Right now I feel like catching it at the input is the easiest way to do this, that or essentially having an HF for every vendor, but that seems like a pretty lame way of doing it. Funny thing is, I use KIWI Syslog to do this bit for me now and it works perfectly; just a simple CSV of host IPs separates all the data.

[udp://x.x.x.x:514]
index = vendorA
sourcetype = vendorA_syslog

[udp://a.a.a.a:514]
index = vendorB
sourcetype = vendorB_syslog


oscar84x
Contributor

Wait. There should be add-ons on Splunkbase for all the sourcetypes you've mentioned. And there should certainly be differentiators between the logs.
Are you using any add-ons at all at the moment?


jamie_leclair
Engager

Oscar,

No, not really. Not on my production servers, anyway. I have experimented with a few add-ons, but my production system is more or less only "applications" I have developed myself (and I am not a developer).

How I do it today is basically: collect all the data via KIWI Syslog Server, use KIWI to separate it by geographical location and manufacturer/device type (Cisco Nexus vs. Cisco ACI are very different); once I have all the like data separated by sourcetype, I manually create all the extractions for the fields we care about and manually create dashboards to highlight those things. Anything else is more or less manual query.
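By "extractions" I mean ordinary search-time field extractions along these lines (the sourcetype, field names, and pattern here are made up purely for illustration):

    # props.conf on the search head -- example of the manual extractions I create
    # (sourcetype, field names, and regex are hypothetical)
    [vendorA_syslog]
    EXTRACT-link_state = Interface (?<interface>\S+), changed state to (?<state>\w+)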

One of the main reasons I need different things separated is RBAC as well. E.g., I have a data center that logs all its devices to a single collector. We only allow certain teams to view certain data, so I ingest different data into different indexes just to perform the RBAC function.
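The RBAC piece itself is just roles scoped to indexes, roughly like this (the role and index names are placeholders):

    # authorize.conf -- restrict a team's role to its own index (sketch)
    [role_vendorA_team]
    importRoles = user
    srchIndexesAllowed = vendorA
    srchIndexesDefault = vendorA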

I will explore some add-ons today in hopes they help me in my lab with these separations. I assume they use stanzas and transforms to do this work, so yeah, in a perfect world this could be good. However, I am just concerned about how these add-ons will perform after manufacturer updates or on very old, out-of-date hardware.

Thanks for the Reply,
Jamie


codebuilder
Influencer

If your data is ending up in "main", that means your indexes are either not created on the indexer cluster, the permissions are not correct, or you have a configuration error within inputs.conf.

"main" is the default index, where events go when they do not match a specific index.

You can also override this behavior within indexes.conf by setting the lastChanceIndex parameter.

https://docs.splunk.com/Documentation/Splunk/8.0.0/Admin/Indexesconf
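A minimal sketch of that override, assuming you create a catch-all index for it (the name "last_chance" is a placeholder; lastChanceIndex is a global setting in indexes.conf on the indexers):

    # indexes.conf on the indexers -- catch events destined for a missing index
    lastChanceIndex = last_chance

    [last_chance]
    homePath = $SPLUNK_DB/last_chance/db
    coldPath = $SPLUNK_DB/last_chance/colddb
    thawedPath = $SPLUNK_DB/last_chance/thaweddb

Events addressed to a nonexistent index will then land in last_chance instead of being dropped.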

----
An upvote would be appreciated and Accept Solution if it helps!

jamie_leclair
Engager

Your comment got me on the right track.

inputs.conf -- on heavy forwarder

[udp://x.x.x.x:514]
index = vendorA
sourcetype = vendorA_syslog

[udp://a.a.a.a:514]
index = vendorB
sourcetype = vendorB_syslog

Obviously I had to do more than this when forwarding it to the indexer cluster, but this was more or less what I was looking for.
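For completeness, the forwarding side on the HF is just a standard tcpout group, something like this (the server names and ports are placeholders for my indexers):

    # outputs.conf on the heavy forwarder (sketch; hosts are placeholders)
    [tcpout]
    defaultGroup = idx_cluster

    [tcpout:idx_cluster]
    server = idx1.example.com:9997, idx2.example.com:9997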

However, I feel this is going to be a pretty inefficient way of categorizing the data, as every single device has to be defined (at least if I want it categorized correctly). I was hoping there was a way I could effectively do the same thing but read all the hosts in via a CSV or something of that nature.

I believe my real answer has to do with data routing and using transforms to do this work; however, it seems the only way to do that is via regex, which is essentially just as inefficient as what I have already done above.

Thanks for your reply,
Jamie


oscar84x
Contributor

Are you using a Universal Forwarder on the hosts, or monitoring logs on the HF?
What is your inputs setup?


jamie_leclair
Engager

I am using a heavy forwarder to receive the data. The inputs are very simple, just syslog (UDP 514) collection; then that data is forwarded over to my indexer cluster.
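Concretely, it is just a catch-all listener along these lines (a sketch of my setup; connection_host = ip sets the host field to the sender's IP):

    # inputs.conf on the HF -- catch-all syslog listener (sketch)
    [udp://514]
    sourcetype = syslog
    connection_host = ip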


oscar84x
Contributor

Is the data being written to files on the HF?
The simplest way would be if you had the data written to directories specific to each host. That way you can accomplish everything you're looking to do with a simple monitor stanza under inputs.conf, like in the example below; you would repeat this for each host.

[monitor:///path/to/file/hostx/file.log]
index = <your_index>
sourcetype = <your_sourcetype>
# host_segment = 4 takes the 4th segment of the path ("hostx") and assigns it as the host
host_segment = 4

jamie_leclair
Engager

I used to use a 3rd-party application to do this for me; then I would just read all the data in from a share. However, I found this is not all that scalable because the application I am using is limited. So I completely understand what you're asking me to do, but the raw data is not anywhere. From what I understand, a heavy forwarder configured for store-and-forward simply indexes the data as well, and that's fine, but nowhere can I see the ability to read from raw files on the HF.

What I am asking for is more or less a proof of concept to see if Splunk can categorize all the data for the purposes of dashboarding and RBAC.

Ultimately, the issue is that I think I am too new to the heavy forwarder and cannot figure out how to categorize the data (separating by lists of host IPs, ideally from a file if possible). I am questioning whether my question was premature and whether I should just start reading up more on the HF.

Thanks for your reply,
Jamie
