Splunk Search

Splunk Use Case- How can I compare two IP addresses when IP address is only available in one sourcetype?

Splunk_Master01
Explorer

Hi All,

I am trying to build a use case with the below scenarios:

1) Person A can do tasks X and Y but not task Z or,

2) Person A can do tasks Y and Z but not task X or,

3) Person A can either do task X or task Y or task Z

At no given point is Person A allowed to conduct all three tasks and at no given point should the IP addresses of Person A and Person B be the same.

Information is being picked from two separate source types from the same index.

The challenge is picking the IP address when Person A does tasks X and Y and Person B does task Z, how can I get the IP addresses of both Person A and Person B, so as to compare and make sure that they are two different IP addresses, keeping in mind that the IP address is only available in one source type and not the other?

Any assistance on this would be appreciated ...

Labels (3)
Tags (2)

yuanliu
SplunkTrust
SplunkTrust

Again, you mentioned "at the same time" without defining a method to determine this "time" in which to perform calculations.  This determination is key to make the problem well defined.

Can you post output of at least two transactions from the following?

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
| table sourcetype _time transactionID USERID_X USERID_Y USERID_Z

You can anonymize as much as needed, but time sequence must be as realistically as is allowed.  In particular, I want to know whether transactions overlap in time. 

0 Karma

Splunk_Master01
Explorer

I have run what you had requested for and the below are the sample results:

SOURCETYPE_TIMETRANSACTION_IDUSERID_XUSERID_YUSERID_Z
sourcetypeB2022-08-20 12:08:12012434PAMELAAMYAMY
sourcetypeB2022-08-20 12:08:12012434PAMELAAMYAMY
sourcetypeB2022-08-20 13:16:10013435DIANAARTHURARTHUR
sourcetypeB2022-08-20 13:16:10013435DIANAARTHURARTHUR

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

In this set of output, there is no sourcetypeA.  Because you indicated that TRANSACTION_ID was only in one sourcetype and is not in the same sourcetype that contained IP, this means that your search returned no IP address.  I think it is safe to say that there is no way to display IP in any format.  Is this correct?

You need to extend your search to include at least two complete transactions including sourcetypeA that you believe belongs to these two transactions. (That will give clues to what criteria are potentially used to determine which data belongs to which transaction.  It is not a perfect way to get those criteria but if you cannot describe it directly, it is worth a try.)

0 Karma

Splunk_Master01
Explorer

I have re-run the query, see below for sample results:

SOURCETYPE_TIMEUSER_IDDEVICE_IDTRANSACTION_IDUSERID_XUSERID_YUSERID_Z
sourcetypeA2022-08-24 08:14:56DANIEL10.10.10.1    
sourcetypeA2022-08-24 08:12:27DAVID10.10.50.2    
sourcetypeB2022-08-23 16:59:02  017235PAMELAPAMELAAMY
sourcetypeB2022-08-23 17:28:07  017550JOHNSTANLEYSTANLEY
0 Karma

Splunk_Master01
Explorer

@yuanliu

I have re-run the query, see below for sample results:

SOURCETYPE_TIMEUSER_IDDEVICE_IDTRANSACTION_IDUSERID_XUSERID_YUSERID_Z
sourcetypeA2022-08-24 08:14:56DANIEL10.10.10.1    
sourcetypeA2022-08-24 08:12:27DAVID10.10.50.2    
sourcetypeB2022-08-23 16:59:02  017235PAMELAPAMELAAMY
sourcetypeB2022-08-23 17:28:07  017550JOHNSTANLEYSTANLEY

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

So, each transaction is reported in a single event in sourcetypeB.  Now, can you run the same query, maybe using some larger time range, as to reveal IP addresses of JOHN, STANLEY, PAMELA, and AMY?  What is really important is to understand WHEN these IP addresses is revealed because you indicate that there is no other relationship between sourcetypeA and sourcetypeB.  Because DANIEL and DAVID are not involved in any of the illustrated transactions, those values are of no use to your task.

Another possibility is that in the entire sourcetypeA, each of these users can only appear once.  Can this be true?  That seems contradict to possibilities you laid out in the question itself.

Additionally, can you explain the listing header USERID (not USERID_X or _Y or _Z) corresponding to sourcetypeA?  Using "| table sourcetype _time transactionID USERID_X USERID_Y USERID_Z", it is not possible to have a field named USERID.

0 Karma

Splunk_Master01
Explorer

@yuanliu 

USERID and IP Adderess belongs to sourcetype A

USERID_X, USERID_Y, USERID_Z and TransactionID belong to sourcetype B.

USERID and USERID_X, Y and Z is what is common between both sourcetypes. 

USERID_X, Y and Z depict Makers, Checkers and Verifiers where for example Pamela can be a maker and a checker but not verifier i.e UserID_X and Y or she can be a checker and a verifier but not a maker i.e USERID_Y and Z or she can be a maker and a verifier but not a checker i.e USERID_X and Z.

What is happening with my query currently, is that it is picking the users correctly, it is also picking IP addresses but it picks IP address for users that are the same. 

So if Pamela is a maker and a checker, it will pick her IP and Arthur is the verifier but it instead of picking Arthur's IP it will pick Pamela's IP address because we have used FOREACH and grouped by USER in the query. 

Is there a way in which I can give the conditions and say that where Splunk sees USERID_X and USERID_Y is the same, it should pick it as one user and give me the IP address and and it should also display USERID_Z and give me the IP address? 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

I still cannot wrap my mind around the requirement because I cannot see where and (most importantly) when the IP addresses of JOHN, STANLEY, PAMELA, and AMY are captured.  Is there some way to show that?  What is the nature of mysourcetypeA?  Is it a real time series or is it just a lookup of permanent or current addresses? (If mysroucetypeA is permanent, your problem can be solved with no need for mysourcetypeB; if it only contains current addresses, your problem has no dynamic solution, and again, information is mysourcetypeB is irrelevant.)


USERID and IP Adderess belongs to sourcetype A

This is another long-lost piece of the puzzle in previous back-and-forths.  You did mention a USERID_A in one earlier reply but all later discussion implied that USERID_X, _Y, _Z were all there were. (I understand your need to obfuscate data.  But whenever you use anonymized field names, consistence is super important.  Field mapping must be one-to-one and cannot change.)

Yet another missing piece of the puzzle was also revealed very late, that USERID_X, _Y, _Z all appeared in each and same event of sourcetypeB. (This is a really strange way to record a multi-party transaction.)  The foreach command is devised based on the following assumption that I explicitly stated: "This assumes that different field names do not appear in the same event."

As mentioned above, there is still one missing piece of information: how is "given point" or "at the same time" can be calculated in mysourcetypeA and mysourcetypeB, respectively?  Without this information, the problem remains ill defined.

Without this most important missing piece, you can try the following blindly:

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
| eval USER = if(sourcetype == "mysourcetypeA", USERID, mvappend(USERID_X, USERID_Y, USERID_Z))
| stats values(IP) as IP values(transactionID) as transactionID by USER
| stats values(USER) as USER by IP transactionID
``` | where mvcount(USER) > 1 ``` ``` this would be the alert ```
| eval users_per_IP = mvcount(USER)
| sort - user_per_IP

The above incorporates the two missing pieces that I just pieced together.  But it will not give any better result than the following search using mysourcetypeA only:

index=myindex (sourcetype="mysourcetypeA")
| stats values(USERID) as USER dc(USERID) as user_per_IP by IP
| sort - user_per_IP

 

0 Karma

Splunk_Master01
Explorer
index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
|  foreach USERID_X USERID_Y USERID_Z
    [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>)
    | eval "IP OF <<FIELD>>" = IP]
| stats values(USER) as USER values(USERID_X) as USERID_X values(eval('IP OF USERID_X')) as "IP OF USERID_X" values(USERID_Y) as USERID_Y values(eval('IP OF USERID_Y')) as "IP OF USERID_Y" values(USERID_Z) as USERID_Z values(eval('IP OF USERID_Z')) as "IP OF USERID_Z" by transactionID

This is the only query which is close to what I require.

I only need it to pick the IP address of the user where the USERID_X is not the same as USERID_Y and USERID_Z 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

As I preambled when introducing that foreach iteration, it is based on the assumption that USERID_X, _Y, _Z, etc., do not appear in the same event.  Otherwise USER will only take on the value of USERID_Z.  As enticing as it is, this is not what you wanted.

In addition, the "IP of <<FIELD>>" assignment is also meaningless in your data set because IP only exists where none of USERID_X, _Y, or _Z exists.  that entire line can only return the last (earliest) IP address in the entire mysourcetypeA.  That is also not what you wanted.

From my observation, the purpose of showing "IP of xxx" is just to visually validate alert output.  But there are a million ways to validate output without involving such laborious tactic.  This said, you can try

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
| eval USER = if(sourcetype == "mysourcetypeA", USERID, mvappend(USERID_X, USERID_Y, USERID_Z))
| eventstats values(IP) as IP values(transactionID) as transactionID by USER
| mvexpand USER
| foreach USERID_X USERID_Y USERID_Z
    [ eval "IP of <<FIELD>>" = if(USER == <<FIELD>>, IP, 'IP of <<FIELD>>') ]
| stats values(USER) as USER values(USERID_X) as USERID_X values(eval('IP OF USERID_X')) as "IP OF USERID_X" values(USERID_Y) as USERID_Y values(eval('IP OF USERID_Y')) as "IP OF USERID_Y" values(USERID_Z) as USERID_Z values(eval('IP OF USERID_Z')) as "IP OF USERID_Z" by transactionID

 

 

0 Karma

Splunk_Master01
Explorer

@yuanliu, or anybody else, please assist.

How can I display IP addresses for the users?

0 Karma

Splunk_Master01
Explorer

I need results to be close to this format:

index=myindex (sourcetype="mysourcetypeA" OR sourcetype=mysouretypeB)
|  foreach USERID_X USERID_Y USERID_Z
    [eval USER = if(isnull(<<FIELD>>), USER, <<FIELD>>)
    | eval "IP OF <<FIELD>>" = IP]
| stats values(USER) as USER values(USERID_X) as USERID_X values(eval('IP OF USERID_X')) as "IP OF USERID_X" values(USERID_Y) as USERID_Y values(eval('IP OF USERID_Y')) as "IP OF USERID_Y" values(USERID_Z) as USERID_Z values(eval('IP OF USERID_Z')) as "IP OF USERID_Z" by transactionID

But I need information be picked correctly for each user.

0 Karma

Splunk_Master01
Explorer

@yuanliu

Correct me if I am wrong. When we use the command | foreach  USER_ID, USERID_X, USERID_Y, USERID_Z are we trying to combine the information for all these into 1 USER?

Something I've realized is that the information can only be the same for:

USER, USERID_X and USERID_Y or

USER, USERID_Y and USERID_Z or

USER, USERID_X and USERID_Z

At no given point are USER, X, Y, Z the same at the same time.

Maybe that's why the IP address is only picked for the user where the information is the same and leaves the IP address of the one where the user is different?

0 Karma

yuanliu
SplunkTrust
SplunkTrust

gcusello's code should result in a list like the following

personip_countip_addresstask
Person A2

10.101.8.1

10.101.9.3

X

Y

Person B3

10.101.7.105

10.101.8.1

10.101.8.12

Z

If in your data you only see one line, check your search time period to make sure both Person A and Person B have activities.

The code does not directly address the part of requirement about At no given point.  You may need a time bucket variable to flesh that out.  In physics, there is no such a thing as a point in time, so I assume that you meant "during no given time interval" (of a predefined length).  In general you don't want to deal with a rolling time interval in Splunk, so choose a meaningful span, say, 1h, and use gcusello's formula with this modifier

index=your_index (sourcetype=sourcetypeA OR sourcetype=sourcetypeB)
| bin span=1h _time
| stats dc(ip_address) AS ip_count values(ip_address) AS ip_address values(task) AS task BY _time person

The span can be any value that is a common multiple of your data collection intervals.

gcusello
SplunkTrust
SplunkTrust

Hi @Splunk_Master01,

maybe there's an error in the question: you sayed:

3) Person A can either do task X or task Y or task Z

At no given point is Person A allowed to conduct all three tasks 

then you don't give any condition for PersonB.

Then some additional information:

  • I suppose that IP address is in a field called "ip_address"
  • you have ip_address of PersonA and PersonB from the first sourcetype, instead tasks are from the second sourcetype, is this correct?

Ciao.

Giuseppe

Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...