Splunk Use Case- How can I compare two IP addresse...

Splunk_Master01 · ‎08-16-2022

Hi All,

I am trying to build a use case with the below scenarios:

1) Person A can do tasks X and Y but not task Z or,

2) Person A can do tasks Y and Z but not task X or,

3) Person A can either do task X or task Y or task Z

At no given point is Person A allowed to conduct all three tasks and at no given point should the IP addresses of Person A and Person B be the same.

Information is being picked from two separate source types from the same index.

The challenge is picking the IP address when Person A does tasks X and Y and Person B does task Z, how can I get the IP addresses of both Person A and Person B, so as to compare and make sure that they are two different IP addresses, keeping in mind that the IP address is only available in one source type and not the other?

Any assistance on this would be appreciated ...

Splunk_Master01 · ‎08-16-2022

Hi Giuseppe,

Yes. IP address of Person A and Person B are in one source type and the tasks being performed by Person A and Person B are in the other source type and both source types belong to the same index.

gcusello · ‎08-16-2022

Hi @Splunk_Master01,

you can correlate data from different sourcetypes using the stats command, something like this:

index=your_index (sourcetype=sourcetypeA OR sourcetype=sourcetypeB)
| stats dc(ip_address) AS ip_count values(ip_address) AS ip_address values(task) AS task BY person

Having all these correlated information, you can find the rules you like: e.g. if there's more than ip, etc...

Ciao.

Giuseppe

Splunk_Master01 · ‎08-16-2022

Hi Giuseppe,

I have tried this, but it still doesn't bring the IP addresses of both Person A and Person B.

The query picks the IP address of only Person A.

I need both IP addresses to be displayed.

Splunk_Master01 · ‎08-18-2022

I am using the below query:

index=myindex (sourcetype="mysourcetype" OR sourcetype=mysouretype)
| eval USER = mvappend(USERID_A,USERID_X,USERID_Y,USERID_Z)
| stats dc(IP) as IP_COUNT values(IP) as IP values(USERID_X) values(USERID_Y) values(USERID_Z) BY USER

The reason behind usage of mvappend is because the usernames in both source types have different titles.

The results I get with the above query are depicted in the table below:

USER	IP_COUNT	IP	USERID_X	USERID_Y	USERID_Z

The IP that is currently being picked is for "USER" and "USERID_X" but I require IP addresses of USERID_Y and USERID_Z as well, as in some situations they vary. After that I need to put in the IF statement to compare IP addresses to make sure that they are not the same.

yuanliu · ‎08-18-2022

First, how does USERID_A factor into the equation in your illustration? (I interpret "title" as field name.)

Second, how did you determine that only USERID_X is being picked up? More fundamentally, how did you determine that events containing USERID_Y and USERID_Z are actually in your search range?

Let's examine data. Perform this simple test for each of USERID_X, USERID_Y, USERID_Z

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB) USERID_X=*
|  eval USER = mvappend(USERID_A,USERID_X,USERID_Y,USERID_Z)

Are there any events with USERID_Y and USERID_Z? What are values of USER in each case?

Splunk_Master01 · ‎08-18-2022

I have run the below test query changing the USERID to X, Y and Z respectively:

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB) USERID_X=*
|  eval USER = mvappend(USERID_A,USERID_X,USERID_Y,USERID_Z)

Each time the USERID changes to X,Y or Z, results are populated for USER, USERID_X, USERID_Y and USERID_Z.

What I've noticed is that either values in USER, USERID_X and USERID_Y are the same or values in USER, USERID_Y and USERID_Z are the same.

yuanliu · ‎08-18-2022

Maybe I should have designed the test a little more comprehensively, as your main interest is to examine IP. How about

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB) USERID_X=*
| eval USER = mvappend(USERID_A,USERID_X,USERID_Y,USERID_Z)
| eventstats values(IP) as IPs

Replace USERID_X with other field names. Do the resultant IPs field vary? How do values of IPs overlap?

Splunk_Master01 · ‎08-18-2022

I have run the below query as advised.

I get no results when I change the values to X,Y and Z respectively.

However removing the line USERID_X=* generates results of IP addresses.

The users are different in every row, but the IP addresses being picked are for all IP's in a subnet and all subnets and not the specific users that are part of the row.

Results are something as below:

USER

USERID_X

USERID_Y

USERID_Z

IPs

Mary

Jane

Mary

Jane

10.10.1.1

10.10.1.2

10.10.1.3

10.10.2.1

10.10.2.2

10.10.2.3

10.10.3.1

10.10.3.2

10.10.3.3

....

How can I find out which IP belongs to Mary and which IP belongs to Jane from the long list of IPs?

yuanliu · ‎08-18-2022

@Splunk_Master01 wrote:
I have run the below query as advised.
I get no results when I change the values to X,Y and Z respectively.
However removing the line USERID_X=* generates results of IP addresses.

This means that USERID_X, USERID_Y, etc., are not searchable fields in raw search. This is just to get an understanding of the data and unrelated to the solution.

To perform stats on disparate field names, you may iterate over them with foreach. For example,

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
|  foreach USERID_X USERID_Y USERID_Z
    [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>)]
| stats dc(IP) as IP_COUNT values(IP) as IP BY USER

(This assumes that different field names do not appear in the same event.) If these field names have a common pattern, you may use wildcard to enumerate, like foreach USERID_*.

Splunk_Master01 · ‎08-18-2022

I require something like the below:

USER	USERID_X	IP OF USERID_X	USERID_Y	IP OF USERID_Y	USERID_Z	IP OF USERID_Z
MARY MARY JANE	MARY	10.10.1.1	MARY	10.10.1.1	JANE	10.10.1.2
BEATRICE ARTHUR ARTHUR	BEATRICE	10.10.2.1	ARTHUR	10.10.2.3	ARTHUR	10.10.2.3
AMY DIANA BELLA	AMY	10.10.3.1	DIANA	10.10.3.2	BELLA	10.10.3.6
JOSEPH JACINTA JACINTA	JOSEPH	10.10.4.1	JACINTA	10.10.4.1	JACINTA	10.10.4.1

Where the row highlighted in bold needs to be fired as an alert as all the IP's are the same.

How can I achieve something like the table above?

yuanliu · ‎08-19-2022

You need to better describe data to explain how they relate to your mockup. What is the mechanism/criteria to group MARY, MARY, JANE in one row and BEATRICE, ARTHUR, ARTHUR in another row? Is it some sort of session, transaction/sequence? Time bucket?

Splunk_Master01 · ‎08-19-2022

Transactions being conducted by them.

The problem is some data is one sourcetype and IP address is in another sourcetype.

So picking IP addresses of the people is a challenge.

yuanliu · ‎08-19-2022

Still unclear. Do you mean each row is associated with a unique transaction ID?

Splunk_Master01 · ‎08-19-2022

Yes.

yuanliu · ‎08-20-2022

Let's first take the alert action you described in the mockup. This should be sufficient for that purpose, is it not?

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
|  foreach USERID_X USERID_Y USERID_Z
    [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>)]
| stats values(USER) as USER by IP transactionID
| where mvcount(USER) > 1

Judging from your mockup, the above should give something like

IP

transactionID

USER

10.10.4.1

some ID

JOSEPH

JACINTA

What is the use of knowing "IP of USERID_X", "IP of USERID_Y"? (Especially when, as illustrated, MARY has the exact same IP address in the same transaction whether she appears as USERID_X or USERID_Y, so does AUTHUR in the same transaction whether he appears as USERID_Y or USERID_Z.) This table tells you exactly which transaction encountered one or more IP addresses that are each used by more than one user. The alert shouldn't care about what is the original field name that is merged into USER, or which transactions are without such incidents.

This said, suppose the redundant listing has meaning to someone - I also have bosses that are particular about layout, you can spaghettify code to give

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
|  foreach USERID_X USERID_Y USERID_Z
    [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>)
    | eval "IP OF <<FIELD>>" = IP]
| stats values(USER) as USER values(USERID_X) as USERID_X values(eval('IP OF USERID_X')) as "IP OF USERID_X" values(USERID_Y) as USERID_Y values(eval('IP OF USERID_Y')) as "IP OF USERID_Y" values(USERID_Z) as USERID_Z values(eval('IP OF USERID_Z')) as "IP OF USERID_Z" by transactionID

This should produce the same layout as your mockup except the added column of transactionID without which the problem is ill-defined. Again, using wildcard and such could reduce the amount of typing.

Splunk_Master01 · ‎08-20-2022

When I run this:

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
|  foreach USERID_X USERID_Y USERID_Z
    [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>)]
| stats values(USER) as USER by IP transactionID
| where mvcount(USER) > 1

It brings something like this:

IP

transactionID

USER

10.10.4.1

some ID

JOSEPH

JACINTA

The issue is that it is picking JOSEPH's IP address only and not JACINTA's and both users are using different IP's when I check individually for each user.

When I run this:

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
|  foreach USERID_X USERID_Y USERID_Z
    [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>)
    | eval "IP OF <<FIELD>>" = IP]
| stats values(USER) as USER values(USERID_X) as USERID_X values(eval('IP OF USERID_X')) as "IP OF USERID_X" values(USERID_Y) as USERID_Y values(eval('IP OF USERID_Y')) as "IP OF USERID_Y" values(USERID_Z) as USERID_Z values(eval('IP OF USERID_Z')) as "IP OF USERID_Z" by transactionID

I get something close to this:

USER	USERID_X	IP OF USERID_X	USERID_Y	IP OF USERID_Y	USERID_Z	IP OF USERID_Z
MARY MARY JANE	MARY	10.10.1.1	MARY	10.10.1.1	JANE	10.10.1.2
BEATRICE ARTHUR ARTHUR	BEATRICE	10.10.2.1	ARTHUR	10.10.2.3	ARTHUR	10.10.2.3
AMY DIANA BELLA	AMY	10.10.3.1	DIANA	10.10.3.2	BELLA	10.10.3.6
JOSEPH JACINTA JACINTA	JOSEPH	10.10.4.1	JACINTA	10.10.4.1	JACINTA	10.10.4.1

The issue here is that when I use BY Transaction_ID, the IP addresses disappear and I am only left with the fields USER, USERID_X, USERID_Y and USERID_Z.

Removing the line BY TRANSACTION_ID populates the whole table but the issue is that IP addresses duplicate across the fields for USER, USERID_X, USERID_Y and USERID_Z and are not correctly mapped to the correct user.

So I see something like:

USER

USERID_X

IP OF USERID_X

USERID_Y

IP OF USERID_Y

USERID_Z

IP OF USERID_Z

Jane

Martha

Jane

10.10.1.1

Jane

10.10.1.1

Martha

10.10.1.1

And I know that the IP addresses assigned to Martha and Jane are in-correct because I have run searches for each individual user to see their individual IP addresses.

yuanliu · ‎08-20-2022

IP transactionID USER
10.10.4.1 some ID
JOSEPH
JACINTA
The issue is that it is picking JOSEPH's IP address only and not JACINTA's and both users are using different IP's when I check individually for each user.

This is significant because your previous mockup depicts JACINTA and JOSEPH as having the same IP address. If the real world data doesn't match the mockup, you need to examine data further.

My search did not account for "that the IP address is only available in one source type and not the other." The original did not mention transactionID. I'll have to assume that it exists in both. (As a general rule, I cannot overemphasize the importance of describing data for others to be helpful. It is always good to illustrate either sanitized real data samples or simulated data samples that resemble the main characteristics of real data; it is always best to illustrate data in text rather than screenshots.)

Again, focusing on IP alert first.

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
|  foreach USERID_X USERID_Y USERID_Z
    [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>)]
| stats values(IP) as IP by USER transactionID
| mvexpand IP ``` may not be strictly necessary ```
| stats values(USER) as USER by IP transactionID
``` | where mvcount(USER) > 1 ``` ``` this would be the alert ```
| eval users_per_IP = mvcount(USER)
| sort - user_per_IP

Here, the alert filter is commented out; instead, I added a sort command to help observe output with duplicates on top of the table. Does the output match your manual examination in regard to duplicate use of IP address?

Splunk_Master01 · ‎08-20-2022

@yuanliu ,

TransactionID is not present in both source types as well.

The common thing in both source types is the USERID, however, they are renamed differently in both source types. Hence, we had to create a field USER to put all the users inside that field.

Common things in both source types are USERID, USERCODES, REMARKS and all 3 of these fields are renamed differently in both source types.

Other than the above mentioned fields, there is nothing common in both the source types.

I need results to be close to this format:

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
|  foreach USERID_X USERID_Y USERID_Z
    [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>)
    | eval "IP OF <<FIELD>>" = IP]
| stats values(USER) as USER values(USERID_X) as USERID_X values(eval('IP OF USERID_X')) as "IP OF USERID_X" values(USERID_Y) as USERID_Y values(eval('IP OF USERID_Y')) as "IP OF USERID_Y" values(USERID_Z) as USERID_Z values(eval('IP OF USERID_Z')) as "IP OF USERID_Z" by transactionID

But I need information to be picked correctly for each user.

What is happening with the above query is as below:

USER	USERID_X	IP OF USERID_X	USERID_Y	IP OF USERID_Y	USERID_Z	IP OF USERID_Z
BEATRICE		10.10.10.1		10.10.10.1		10.10.10.1
ARTHUR	JANE	10.10.10.3	ARTHUR	10.10.10.3	ARTHUR	10.10.10.3

Beatrice's IP is correct.

Now, ideally, I should see the IP address of Jane under IP OF USERID_X, which should be 10.10.10.4, but it is picking the IP OF USERID_Y who is Arthur and whose IP is 10.10.10.3 instead, because I have grouped BY USER and not BY TRANSACTIONID as it was not common in both source types.

When I analyze my data and look for Jane and Arthur separately, the users do not have the same IP address and Jane has never used Arthur's IP address.

Correct me if I am wrong, when we use the command | foreach USER_ID, USERID_X, USERID_Y, USERID_Z are we trying to combine the information for all these into 1 one common field called USER? (I have never used the | foreach command before, hence, the question)

Something I've realized is that the information of users, can only be the same for:

USER, USERID_X and USERID_Y or

USER, USERID_Y and USERID_Z or

USER, USERID_X and USERID_Z

At no given point are all 4 of these i.e. USER, X, Y and Z picking the same users at the same time.

Maybe that's why the IP address is only picked for the user where the information is the same and leaves the IP address of the one where the user is different?

yuanliu · ‎08-20-2022

TransactionID is not present in both source types as well.
The common thing in both source types is the USERID, however, they are renamed differently in both source types. Hence, we had to create a field USER to put all the users inside that field.
Common things in both source types are USERID, USERCODES, REMARKS and all 3 of these fields are renamed differently in both source types.
Other than the above mentioned fields, there is nothing common in both the source types.

This is truly confusing. If transactionID is not ubiquitous, the problem remains mathematically ill defined. (Unless transactionID and IP appear in the same sourcetype; in that case, you don't need the other sourcetype at all. Does transactionID exist in the same sourcetype as IP?) Back to my earlier question about the mockup: What is the mechanism/criteria to group MARY, MARY, JANE in one row and BEATRICE, ARTHUR, ARTHUR in another row? You need to define the problem to be solvable first.

In order to group these user names and IP addresses, you must use some data (in addition to USERID) that are common in both source type, that relates to your original requirement about "at no given point ...". (Physically, this "given point" is never a mathematical point, but a measurable period of time.) A ubiquitous field across all logs is a common practice to establish such a "given point". If such a field does not exist, another technique is to define a fixed calendar time period, e.g., a day, a shift, etc., to represent this "given point." But that will only work if transactions do not overlap in time. The point is, you must define this "given point" with data that are common in both sourcetypes in order to group them in the way you wanted.

You mentioned transactionID. Is there time overlap between transactions? If not, there might be a way to populate transactionID to all sourcetypes.

Correct me if I am wrong, when we use the command | foreach USER_ID, USERID_X, USERID_Y, USERID_Z are we trying to combine the information for all these into 1 one common field called USER? (I have never used the | foreach command before, hence, the question)

That is the intention.

Something I've realized is that the information of users, can only be the same for:
USER, USERID_X and USERID_Y or
USER, USERID_Y and USERID_Z or
USER, USERID_X and USERID_Z
At no given point are all 4 of these i.e. USER, X, Y and Z picking the same users at the same time.
Maybe that's why the IP address is only picked for the user where the information is the same and leaves the IP address of the one where the user is different?

Since I have not seen actual data, I cannot tell. Put in another way, the characteristics presented above is insufficient to make a determination.

Splunk_Master01 · ‎08-20-2022

TransactionID is in one source type and IP address is in the other source type.

So, this use case is for a maker-checker-verifier where somebody can be a maker and checker but not a verifier or someone can be a checker and verifier but not a maker or someone can be a maker and verifier but not a checker. Those are the USERID_X, USERID_Y and USERID_Z respectively.

Their IP addresses need to be different as well. If one person is a maker and checker it is okay for their IP to be 10.10.10.1 but the verifier needs to have a different IP address e.g. 10.10.10.2 and so on ...

This is where I am stuck. As the IP address of the same individual appears but that of the one who has a different IP address, is not appearing in my results.

I am grouping by USER and I think Splunk is taking USERID, USERID_X, USERID_Y and USERID_Z and thinking that the information belongs to one individual and I have realized that this is not the case because of the below explanation:

The information of users, can only be the same for:

USER, USERID_X and USERID_Y or

USER, USERID_Y and USERID_Z or

USER, USERID_X and USERID_Z

At no given point are all 4 of these i.e. USER, X, Y and Z picking the same users at the same time.

Splunk Use Case- How can I compare two IP addresses when IP address is only available in one sourcetype?

eval

stats

table

Developer Spotlight with Brett Adams

Index This | What can you do to make 55,555 equal 500?

Say goodbye to manually analyzing phishing and malware threats with Splunk Attack ...