Splunk Search

How to perform Behavioral Analysis/Cross-data correlation on Logs

tanyongjin
Explorer

Hi,

We are trying to perform analysis on logs to determine whether there is a significant relationship between the log generated during a specific event's occurrence, the log that precedes it, and how it affects the log that follows.

So for all the occurrences of event A, we want to find out which events usually occurred before and after it.

Is there any way, through apps or scripts, that this can be done?

Thank you.

1 Solution

DalJeanis
Legend

Yes, but the words "through app or script" are omitting the most important part of the process.

Over 80 percent of data science is the data preparation: it's going to take a human being to pare down the data relative to event A, determine what identifiable and interesting things might happen before or after it, and then set up your extract script to pull all events related to those things into a pool for analysis.
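As a sketch of that extract step, a Splunk search along these lines could pool the events that immediately precede each occurrence of event A (the index, sourcetype, and field names here are hypothetical placeholders — substitute your own):

```spl
index=main sourcetype=app_logs
| sort 0 _time
| streamstats current=f window=1 last(event_type) as prev_event
| where event_type="A"
| stats count by prev_event
| sort - count
```

The `sort 0 _time` puts events in ascending time order so that `streamstats` carries each event's predecessor forward; reversing the sort order instead would give you the event that follows each occurrence of A.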

Then you are going to run all that data through one of the apps or platforms that can look for patterns in the data. The associate and contingency commands adonio mentioned are two of the tools in Splunk for that kind of analysis.

Once you've got some basic patterns, then you can figure out what the outliers are, and figure out what you need to focus the next layer of attention on.

For example, let's say you want to see what is a normal pattern for an employee who accesses a high-security database. So, you first specify which logs document the access itself. Next, you ask what kind of activities MIGHT be relevant. Off the top of your head, you think that an employee might typically access some other database, or use a specific application to determine something about a user they might be researching, or one or two other things.

So, you take one example employee, and one access attempt to the high-security database, and look at what else they were doing. Suppose you find they locked their screen for a few minutes (...although that kind of thing is NOT usually stored in the logs of a large company, but just GO with me here...)

You run a data analysis and find that a couple of minutes of screen lock is completely typical before this data access. Why? A phone call to a business analyst in that department gives you a surprising answer. That particular department has a duty to process paper records that must be reviewed and verified against the secure database, and physically getting the papers from the secure file room requires locking their terminal. It's not something you would have predicted before looking at the data, but it is typical in that particular high-security work environment.

Now, this is a totally made-up example, but in my real-life data analysis, I've found FAR WEIRDER correlations in things that turned out to make perfect sense.

Which is why a canned app or script is not going to do the whole job for you, unless you were the one who put it in the can.


tanyongjin
Explorer

Thank you for the answer. I do agree that the main work of data science is the data cleaning and preparation itself.

I am researching how Splunk could help break down and analyze the data I am interested in, and what its capabilities are.

The main idea of a tool is to make the work of data preparation and analysis simpler and more straightforward, which I believe is what Splunk as a tool is meant to do. As such, I was interested to find out whether any existing plug-in, app, or script has already implemented this.

However, since adonio has recommended associate and contingency, I will look further into them to see how they can help me find the information I am interested in.

Thank you!

adonio
Ultra Champion

hello tanyongjin,
before diving into apps or scripts, you can start with the contingency and associate commands. read more here:
https://docs.splunk.com/Documentation/Splunk/6.5.3/SearchReference/Associate
http://docs.splunk.com/Documentation/Splunk/6.6.0/SearchReference/Contingency
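to give a feel for them, here is a minimal sketch of each (the index, sourcetype, and field names are hypothetical placeholders — swap in your own):

```spl
index=main sourcetype=app_logs
| contingency event_type status
```

`contingency` builds a cross-tab of how often each pair of values co-occurs, while `associate` looks for fields whose values predict one another:

```spl
index=main sourcetype=app_logs
| fields event_type status user
| associate
```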
hope it helps
