I am having some trouble understanding Splunk's correlation features. I think it is really important to understand everything Splunk can do before setting up a good environment, which is why I am asking for help.
I will try to describe what I understand right now; please tell me if I'm wrong:
I realize I am getting confused about the CIM (Common Information Model) and how it can normalize data.
If a good soul could help me, that would be great!
The answer to your question 2 depends on how you have set things up. You can limit the indexes that the data model searches across on the CIM's setup page. That's described in step 4 here: http://docs.splunk.com/Documentation/CIM/latest/User/Install
Here's a great place to start to answer #3:
To add to that, Splunk can search across a variety of disparate data sources at once when appropriate, such as syslog inputs from your firewall, Active Directory login information and your anti-spam devices. All it takes is getting the data into Splunk.
Yes, you are right that having an add-on or app that provides CIM compliance helps a lot, because the CIM data models aggregate all sorts of similar information together. But it can be done without those, too, either by building your own data models or simply by searching across multiple indexes.
Field extraction is a separate but related thing, and a simple example may help show how it makes correlation searches easy "by default". Suppose you have various applications/logs that store the user in differently named fields: "User Name", "user", "clientName" or whatever. And one that just mentions the user's name somewhere in the event, without anything like "user=myusername", so Splunk will see "myusername" but won't know it's actually the user.
In that case, searching for just "myusername" across all indexes will show every event where myusername appears, regardless of what the field is called. Now, if you extract "myusername" into a named field (let's call it "user") and then alias all the names other logs use for the user to the field "user" as well, you have a single field name you can search against, like "user=myusername OR user=anotheruser" or "user!=thatuser". When run, the search finds all events that have a "user" field whose value matches what you asked for.
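To make the aliasing idea concrete, here is a minimal Python sketch that only imitates the concept; it is not how Splunk implements it. The events, the `ALIASES` table, the raw-text pattern and the `normalize`/`search` helpers are all hypothetical and made up for illustration. Each source names the user differently, one only has it in raw text, and after normalization a single "user" field can be filtered on.

```python
import re

# Hypothetical events from four sources; each names (or hides) the user differently.
EVENTS = [
    {"source": "vpn", "User Name": "myusername", "action": "connect"},
    {"source": "web", "user": "anotheruser", "status": "200"},
    {"source": "crm", "clientName": "thatuser", "op": "update"},
    # This source only mentions the user inside its raw text.
    {"source": "legacy", "_raw": "session opened for myusername from 10.0.0.5"},
]

# Hypothetical alias table: original field name -> normalized field name.
ALIASES = {"User Name": "user", "clientName": "user"}

# Hypothetical extraction rule for the raw-text source.
USER_PATTERN = re.compile(r"session opened for (\S+)")


def normalize(event):
    """Return a copy of the event with a single 'user' field where possible."""
    out = dict(event)
    for original, normalized in ALIASES.items():
        if original in event and normalized not in out:
            out[normalized] = event[original]
    if "user" not in out and "_raw" in event:
        match = USER_PATTERN.search(event["_raw"])
        if match:
            out["user"] = match.group(1)
    return out


def search(events, include=(), exclude=()):
    """Rough analogue of 'user=a OR user=b' / 'user!=c' over normalized events."""
    results = []
    for event in map(normalize, events):
        user = event.get("user")
        if user is None:
            continue  # as with user!=..., events lacking the field never match
        if include and user not in include:
            continue
        if user in exclude:
            continue
        results.append(event)
    return results
```

For example, `search(EVENTS, include={"myusername", "anotheruser"})` returns the vpn, web and legacy events, even though none of them originally had the user in the same place.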
This is essentially what CIM compliance does: it maps tags and eventtypes, then aliases or creates fields so they all map to one field name for you. Usually. 🙂
Tags and eventtypes: think of those as logical labels on an event. For example, you can tag events "malware" and then search for all malware-related items across all your events, regardless of where they originally came from: AV, Cisco Firesight, Nessus scans, whatever. If they're tagged correctly, those events will be associated with the right places in the CIM data model and can be searched as one group.
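A rough Python sketch of the tagging idea, again only an analogy. In Splunk an eventtype is defined by a search string; here each hypothetical eventtype is a Python condition instead. The sourcetypes, field names, eventtype names and tag assignments are all invented for illustration. Events from three products end up under one "malware" tag and can be pulled back with a single filter.

```python
# Hypothetical raw events from different security products.
EVENTS = [
    {"sourcetype": "av", "signature": "Trojan.Generic", "host": "pc01"},
    {"sourcetype": "firesight", "classification": "malware-cnc", "src": "10.1.1.7"},
    {"sourcetype": "nessus", "plugin_family": "Backdoors", "dest": "srv02"},
    {"sourcetype": "firesight", "classification": "policy-violation", "src": "10.1.1.9"},
]

# Hypothetical eventtypes: a name plus a condition an event must satisfy
# (standing in for the search string a real eventtype would use).
EVENTTYPES = {
    "av_detection": lambda e: e["sourcetype"] == "av",
    "ids_malware": lambda e: e["sourcetype"] == "firesight"
    and e.get("classification", "").startswith("malware"),
    "vuln_backdoor": lambda e: e["sourcetype"] == "nessus"
    and e.get("plugin_family") == "Backdoors",
}

# Hypothetical tags assigned to each eventtype.
TAGS = {
    "av_detection": {"malware", "attack"},
    "ids_malware": {"malware", "attack"},
    "vuln_backdoor": {"malware", "vulnerability"},
}


def classify(event):
    """Attach matching eventtypes and their combined tags to a copy of the event."""
    eventtypes = [name for name, matches in EVENTTYPES.items() if matches(event)]
    tags = set()
    for name in eventtypes:
        tags |= TAGS.get(name, set())
    return {**event, "eventtype": eventtypes, "tag": sorted(tags)}


def search_by_tag(events, tag):
    """Rough analogue of searching 'tag=<tag>' across every source at once."""
    return [e for e in map(classify, events) if tag in e["tag"]]
```

Here `search_by_tag(EVENTS, "malware")` returns the AV, Firesight and Nessus detections, while the policy-violation event gets no tags and is left out.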
Anyway, good questions! Did the answers help?
I might add: keep reading the Splunk docs, they're awesome!
Thank you guys.
I checked the following link, and it helped me a lot in understanding how to use the CIM: http://docs.splunk.com/Documentation/CIM/latest/User/UsetheCIMtonormalizeOSSECdata
So if I understand correctly, to search through CIM-compliant data, I have to use the tags in order to correlate? And because all the fields are normalized, I can use a field name with the guarantee that it is the same across different sources.
Yes, you apply tags, and you also normalize any fields that do not already match the field names the model expects. That way you can guarantee the fields will be the same across sources, because you have done the normalization yourself. Step 5 on that page shows some examples of that. And, as you correctly pointed out before, there may be add-ons available for your data sources that take care of the tagging and field normalization for you.
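Once events are tagged and their fields normalized, correlating sources mostly comes down to grouping on a shared field name. Here is a small hypothetical Python sketch of that idea (roughly what a `stats ... by dest` style search does in SPL). The events, the "dest"/"user"/"action" values and the `correlate` helper are invented for illustration. It finds hosts that had a failed login and also a malware detection, which only works because both sources call the host "dest".

```python
from collections import defaultdict

# Hypothetical events that are already tagged and normalized, so every
# source uses the same field names ("dest", "user", "action").
EVENTS = [
    {"source": "windows", "tag": ["authentication"], "dest": "srv02", "user": "alice", "action": "failure"},
    {"source": "linux", "tag": ["authentication"], "dest": "web01", "user": "bob", "action": "success"},
    {"source": "av", "tag": ["malware", "attack"], "dest": "srv02", "signature": "Trojan.Generic"},
    {"source": "ids", "tag": ["malware", "attack"], "dest": "db03", "signature": "CnC beacon"},
]


def correlate(events, tag_a, tag_b, field="dest", where_a=None):
    """Return sorted values of `field` seen in both tag_a events and tag_b events.

    `where_a` is an optional extra filter applied only to the tag_a events.
    """
    seen = defaultdict(set)
    for event in events:
        value = event.get(field)
        if value is None:
            continue  # events without the normalized field cannot be joined
        if tag_a in event["tag"] and (where_a is None or where_a(event)):
            seen[value].add(tag_a)
        if tag_b in event["tag"]:
            seen[value].add(tag_b)
    return sorted(v for v, tags in seen.items() if {tag_a, tag_b} <= tags)
```

Calling `correlate(EVENTS, "authentication", "malware", where_a=lambda e: e.get("action") == "failure")` returns `["srv02"]`: the only host that shows up in both the Windows login failures and the AV detections.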
Best of luck!