Splunk Search

How to combine two searches into one?

sulaimancds
Engager

1st query

 

 

 

index=mail NOT [ | inputlookup suspicoussubject_keywords.csv | rename keyword AS query | fields query ]
| lookup email_domain_whitelist domain AS RecipientDomain output domain as domain_match
| where isnull(domain_match)
| lookup all_email_provider_domains domain AS RecipientDomain output domain as domain_match2
| where isnotnull(domain_match2)
| stats values(recipient) as recipient values(subject) as subject earliest(_time) AS "Earliest" latest(_time) AS "Latest" by RecipientDomain sender
| where mvcount(recipient)=1
| eval subject_count=mvcount(subject)
| sort - subject_count
| convert ctime("Latest")
| convert ctime("Earliest")

 

 

 

2nd query

 

 

 

index=o365
| dedup Id
| rename _time as DateTime, PolicyDetails{}.PolicyName as PolicyName, PolicyDetails{}.Rules{}.RuleName as RuleName, ExchangeMetaData.UniqueID as UniqueID, ExchangeMetaData.Subject as Subject, ExchangeMetaData.From as Sender, ExchangeMetaData.To{} as Recipient, ExchangeMetaData.CC{} as CC, ExchangeMetaData.BCC{} as BCC, ExchangeMetaData.RecipientCount as RecipientCount, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.Count as SensitiveInformationCount, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.SensitiveInformationDetections.DetectedValues{}.Name as PIIName, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.SensitiveInformationDetections.DetectedValues{}.Value as PIIValue, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.Location as Location
| dedup UniqueID
| rex field=Recipient "@(?<domain>.*$)"
| rex field=CC "@(?<domain>.*$)"
| rex field=BCC "@(?<domain>.*$)"
| eval domain=lower(domain)
| lookup email_domain_whitelist domain output domain as domain_match
| where isnull(domain_match)
| stats values(Recipient) values(CC) values(BCC) values(domain) Count sum(SensitiveInformationCount) by PolicyName Subject Sender
| sort +values(domain)

 

 

 

hi i would like to combine the first query into the second query , but the second query only shows those matching the policy , other than that it does not show. i want to show those matching the policy and if does  not match also , please show it, but policy field will be empty.  please advise. index will be o365.

Labels (2)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

Even though your contexts are different from https://community.splunk.com/t5/Splunk-Search/Multiple-sourcetypes-combine-datasets-similar-to-conce..., the constraints are extremely similar.  See if this restated problem is accurate:

  • Preserve all output from the mail search.
  • Add fields of interest from the o365 search if "something" match with that in the mail search.

One critical piece of information you left undefined is that "something" you want to match.  Is it the recipient's domain? Is it recipient?  Is it recipient AND sender?  Is it just sender?  Is it sender and subject?  Can it be sender, recipient, and Subject?  The possible combinations are too many. You need to be very clear about your requirement because volunteers here are not mind readers.  Because the only shared single-value information between the two stats is sender, I will first use "sender" as that "something" in the following to demonstrate a technique.

In the other thread, it is fairly easy to construct a search that does not use the expensive join command.  Your searches are more complex.  So, before attempting a joinless solution, let me first ask: Why not just use inner join, then remove anything that comes only from o365? (If not for reasons related to performance and memory.)  In the following example, I will also collapse several lookup-filter actions in the mail search into index search constraints to improve performance.

index=mail
    NOT
    [| inputlookup suspicoussubject_keywords.csv | rename keyword AS query | fields query ]
    NOT
    [| inputlookup email_domain_whitelist
    | fields domain
    | rename domain AS RecipientDomain]
    [| inputlookup all_email_provider_domains
    | fields domain
    | rename domain AS RecipientDomain]
| stats values(recipient) as recipient values(subject) as subject min(_time) AS "Earliest" max(_time) AS "Latest" by RecipientDomain sender
| where mvcount(recipient)=1
| eval subject_count=mvcount(subject)
| sort - subject_count
| convert ctime("Latest")
| convert ctime("Earliest")
| join type=inner sender=Sender
    [search index=o365
| dedup Id
| rename _time as DateTime, PolicyDetails{}.PolicyName as PolicyName, PolicyDetails{}.Rules{}.RuleName as RuleName, ExchangeMetaData.UniqueID as UniqueID, ExchangeMetaData.Subject as Subject, ExchangeMetaData.From as Sender, ExchangeMetaData.To{} as Recipient, ExchangeMetaData.CC{} as CC, ExchangeMetaData.BCC{} as BCC, ExchangeMetaData.RecipientCount as RecipientCount, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.Count as SensitiveInformationCount, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.SensitiveInformationDetections.DetectedValues{}.Name as PIIName, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.SensitiveInformationDetections.DetectedValues{}.Value as PIIValue, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.Location as Location
| dedup UniqueID
| rex field=Recipient "@(?<domain>.*$)"
| rex field=CC "@(?<domain>.*$)"
| rex field=BCC "@(?<domain>.*$)"
| eval domain=lower(domain)
| lookup email_domain_whitelist domain output domain as domain_match
| where isnull(domain_match)
| stats values(Recipient) values(CC) values(BCC) values(domain) Count sum(SensitiveInformationCount) by PolicyName Subject Sender
| sort +values(domain)]
| where NOT isnotnull(PolicyName) ``` from o365 ``` AND isnull(Earliest) ``` from mail ```

Does this look like what you wanted?  I am fairly certain that sender (alone) is not that "something" you want to "combine" with.  But the above shows a path using join.

Now, join is expensive time-wise and memory-wise; inner join even more so.  There are techniques to avoid join.  But unless the business case is correct, reducing cost would be wasted effort.

Tags (1)
0 Karma

sulaimancds
Engager

Hi, 

 

Recipient domain is the match. In the lookup there is Gmail, in recipient email, it will shows the results. I need to use o365 logs only is that possible with the criteria. 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Based on your original searches, RecipientDomain is a standalone field that directly comes from index mail.  In o365 search, recipient domain is extracted from three possible fields, ExchangeMetaData.To{}, ExchangeMetaData.CC{}, and ExchangeMetaData.BCC{}; the stats function group all of their values into a multivalue field "values(domain)", grouped by Sender.   The following command will join the two searches by these two final fields.  I will then explain some caveats of such a match.  You will need to experiment and refine your requirement.

index=mail
    NOT
    [| inputlookup suspicoussubject_keywords.csv | rename keyword AS query | fields query ]
    NOT
    [| inputlookup email_domain_whitelist
    | fields domain
    | rename domain AS RecipientDomain]
    [| inputlookup all_email_provider_domains
    | fields domain
    | rename domain AS RecipientDomain]
| stats values(recipient) as recipient values(subject) as subject min(_time) AS "Earliest" max(_time) AS "Latest" by RecipientDomain sender
| where mvcount(recipient)=1
| eval subject_count=mvcount(subject)
| sort - subject_count
| convert ctime("Latest")
| convert ctime("Earliest")
| join type=inner RecipientDomain
    [search index=o365
    | dedup Id
    | rename _time as DateTime, PolicyDetails{}.PolicyName as PolicyName, PolicyDetails{}.Rules{}.RuleName as RuleName, ExchangeMetaData.UniqueID as UniqueID, ExchangeMetaData.Subject as Subject, ExchangeMetaData.From as Sender, ExchangeMetaData.To{} as Recipient, ExchangeMetaData.CC{} as CC, ExchangeMetaData.BCC{} as BCC, ExchangeMetaData.RecipientCount as RecipientCount, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.Count as SensitiveInformationCount, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.SensitiveInformationDetections.DetectedValues{}.Name as PIIName, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.SensitiveInformationDetections.DetectedValues{}.Value as PIIValue, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.Location as Location
    | dedup UniqueID
    | rex field=Recipient "@(?<domain>.*$)"
    | rex field=CC "@(?<domain>.*$)"
    | rex field=BCC "@(?<domain>.*$)"
    | eval domain=lower(domain)
    | lookup email_domain_whitelist domain output domain as domain_match
    | where isnull(domain_match)
    | stats values(Recipient) values(CC) values(BCC) values(domain) as RecipientDomain Count sum(SensitiveInformationCount) by PolicyName Subject Sender]
| where NOT isnotnull(PolicyName) ``` from o365 ``` AND isnull(Earliest) ``` from mail ```

Given that the o365 search groups domains by PolicyName, Subject, and Sender, this means that if a sender sent two E-mails with the same subject to totally different groups of recipients under the same policy, both groups of recipients will be returned even if one of the groups do not contain any recipient from that matching domain.

This is to say that, if this is not your intention, you need to rethink how the o365 search should be crafted.

0 Karma

sulaimancds
Engager

there are Complete 115 events  , but nothing is shown under statitcs.

0 Karma

sulaimancds
Engager

hi let me make it easier for you to understand ,

 

| lookup email_domain_whitelist domain AS RecipientDomain output domain as domain_match
| where isnull(domain_match)
| lookup all_email_provider_domains domain AS RecipientDomain output domain as domain_match2
| where isnotnull(domain_match2)
| stats values(recipient) as recipient values(subject) as subject earliest(_time) AS "Earliest" latest(_time) AS "Latest" by RecipientDomain sender
| where mvcount(recipient)=1
| eval subject_count=mvcount(subject)
| sort - subject_count
| convert ctime("Latest")
| convert ctime("Earliest")

 

i need this requirments under O365 index , also the current one which matches the policy should show the results too 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

So you do not want to "combine" results of the two queries into one, just to apply some additional conditions to the o365 search, conditions used in the mail search that haven't been applied in the o365 search. You also want to change the original stats output to be closer to the illustrated mail search. (For example, everything about CC and BCC can be ignored.)  However, your desired search itself has nothing to do with data in index mail.   Is this correct?

If the above is correct, I will start by noting that the filter around lookup email_domain_whitelist is already applied in the o365 search.  Therefore there is no need to reapply.  Secondly, because you have already done most of the coding, I am not sure what is the real question.

If I make some superficial field name adjustments, what you explained can easily be

index=o365
| dedup Id
| rename _time as DateTime, PolicyDetails{}.PolicyName as PolicyName, PolicyDetails{}.Rules{}.RuleName as RuleName, ExchangeMetaData.UniqueID as UniqueID, ExchangeMetaData.Subject as Subject, ExchangeMetaData.From as Sender, ExchangeMetaData.To{} as Recipient, ExchangeMetaData.CC{} as CC, ExchangeMetaData.BCC{} as BCC, ExchangeMetaData.RecipientCount as RecipientCount, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.Count as SensitiveInformationCount, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.SensitiveInformationDetections.DetectedValues{}.Name as PIIName, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.SensitiveInformationDetections.DetectedValues{}.Value as PIIValue, PolicyDetails{}.Rules{}.ConditionsMatched.SensitiveInformation{}.Location as Location
| dedup UniqueID
| rex field=Recipient "@(?<domain>.*$)"
| rex field=CC "@(?<domain>.*$)"
| rex field=BCC "@(?<domain>.*$)"
| eval domain=lower(domain)
| lookup email_domain_whitelist domain output domain as domain_match
| where isnull(domain_match)
| lookup all_email_provider_domains domain output domain as domain_match2
| where isnotnull(domain_match2)
| stats values(Recipient) as Recipient values(Subject) as Subject min(_time) as Earliest max(_time) as Latest by Recipient Sender
| where mvcount(Recipient)==1
| eval subject_count=mvcount(Subject)
| sort - subject_count
| convert ctime("Latest")
| convert ctime("Earliest")

 

0 Karma

sulaimancds
Engager

hi

 

only those matching the  policy will show for o365. i want to show all , and if hitsthe policy , it shoud show that it his the policy PII. Also there is an error running the command . currently it only show hit PII , i want to show the rest also , which does not hit any PII , if the recipient is free-domains like gmail, yahoo, outlook, then it should show the result as well

 

Error in 'stats' command: The output field 'Recipient' cannot have the same name as a group-by field.

0 Karma

sulaimancds
Engager

let me give you the raw log 

{"Organization": "groupxyz.onmicrosoft.com", "MessageId": "<12345678>", "Received": "2023-03-13T01:56:22.9207071", "SenderAddress": "bca@bca.com", "RecipientAddress": "dlf@g.com", "Subject": "12312312332231'", "Status": "Delivered", "ToIP": "111.1.11.1", "FromIP": "12.23.4.2.23232", "Size": 2022121 "MessageTraceId": "4f74644747749djhrhfbf", "Index": 0}

in

hi this is my raw data; how can i show it in a table in a nice format?

index=o365, index=mail also has the same information , maybe we can use mail too

RecipientDomain

sender

recipient

subject

Earliest

Latest

 

 

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

This just goes to show how important it is to illustrate your data (with proper sanitization) before asking a question.  The raw log you showed contains very few fields used in your illustrated codes.  If this is the real data format and the search codes are the real ones you tried, how can you expect the code to do anything?

Strictly using the data you illustrated, assuming it is the exact format that Splunk ingests, Splunk would have already extracted (flattened) all JSON nodes to respective fields, namely FromIP, MessageId, Organization, Received, RecipientAddress, SenderAddress, Size, Status, Subject, and ToIP.  The data contains no array.  So, all fields are single value.  The JSON also has no nested nodes.  So, all fields are flat.  To achieve your illustrated output, all you  need to do is to extract SenderDomain from SenderAddress, then stats on _time to get earliest and latest, i.e.,

| rex field=RecipientAddress "[^@]@(?<RecipientDomain>.+)"
| stats min(_time) as Earliest max(_time) as Latest by RecipientDomain SenderAddress RecipientAddress Subject
| convert ctime(Earliest), ctime(Latest)

So, I didn't bother to rename RecipientAddress as recipient, SenderAddress as sender.  But you already know how to do that if that is desirable.

This, of course, is a far cry from the original problem statement.  I suspect that there are some additions to the search.  But at least the search should match real data.

PickleRick
SplunkTrust
SplunkTrust

Your previous searches included deduping by Id field, some renames on json substructures and whatnot whereas the event you posted is a very simple single-level json structure. You don't have most of the fields you're referring to in your search. Either your search is completely irrelevant to your data or you're not posting the events you're searching from.

0 Karma

sulaimancds
Engager

there is error in the command 

 

Error in 'join' command: Invalid argument: 'sender=sender'

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Let me say first that your 1st search might (but that would need some debugging) be highly suboptimal. (due to a negation and possibly a large list of the negated terms).

In second search you might be getting wrong results. If Id field doesn't uniquely identify combination of interesting fields, you'll be losing data because dedup will return only first event with given Id.

It's not clear what the combined search should return. Yes, there are some common fields but it's not clearly stated what you want to get.

So please show us an example what first search returns, what second one returns and what would be the desired output.

0 Karma

sulaimancds
Engager

Hi,

 

o365 has all mail logs , so let's ignore the first one . I would just need some of the filters from first one to be in second one o365. the problem with second is that only if policy hits , it will show , but i want to show both hit policy and not hit policy. 

 

filter 

1 to 1 

Subject count

and both lookup list.

 

Please try to help me . i can pm you the details you need.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Well, we don't know what data you have in this index. Maybe some people know by heart what kind of data o365 provides (and what TA it's ingested with and so on) but I don't.

So post here some samples of your events (anonymized if needed) and description what you want to achieve.

Remember that this is a community forum so that everyone can benefit (for example if someone in the future has similar problem and stumbles across this thread), not a free technical support service.

0 Karma

sulaimancds
Engager
  • okay thank you 

 

i would like to combine  the first one into o365 , even if does not match the policy it should show it to me  all the results .  recipient should be 1 to 1 and show subject count as well, show bcc and cc as well.

 

the other 

RecipientDomainsenderrecipientsubjectEarliestLatestsubject_count
gmail.com123@xyz.com123@gmail.comdata3/10/2023 18:523/10/2023 18:5213 

 

 

o365

PolicyNameSubjectSenderEarliest EventLatest EventToCCBCCDomainCountTotal Sensitive Information Count
PII data123@xyz.com3/10/2023 18:523/10/2023 18:52123@gmail.com  gmail.com1

 

 

 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

OK, but you said that the o365 index should contain all the information so why read from two indexes if you can get all info from one? "Joining" (I use quotes because it's not relational database join) data from separate datasets is usually more complicated that getting it from one index.

0 Karma

sulaimancds
Engager

Hi, 

 

 

Yes I want to use o365 only with some of the filter like subject count, 1 recipient only and lookup list However O365, only shows those matching the policy. If it does not match the policy, it does not show it to me. Please help for the new query. 

0 Karma

sulaimancds
Engager

Hi, 

 

 

Just show me how to use o365 email logs and show all those hitting an the policy and those not hitting the policy, I will try to figure it out after that. 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

But you're still showing what you're getting from your search, not what data you have in the index. I have no idea whether - for example - emails not hitting any policy are getting logged at all

0 Karma

sulaimancds
Engager

Hi o365 logs has all email captures. 

I've shown you the table above for PII result table.

 

Please help. 

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...