Splunk Search

How to interpret the asterisk as a wildcard in a join left?

JChris_
Path Finder

I have an index where one of the relevant fields is a domain. This index is used in a search in a dashboard, where I have a table showing relevant fields. Let's consider this:

 

index=foo contains 5 logs:

  1. domain=community.example.com serial=123 desc=whatever1
  2. domain=dev.example.com serial=456 desc=whatever2
  3. domain=abc.dev.example.com serial=789 desc=whatever3
  4. domain=def.dev.com serial=098 desc=whatever4
  5. domain=blog.example.com serial=765 desc=whatever5

 

I have a whitelist csv inputlookup with the following:

  1. *.dev.example.com
  2. blog.example.com

 

I want the table to show a column "whitelist" with YES if the domain is in the whitelist. I'm using the query below to do that, but it does not work for wildcard domains. For example, all domains *.dev.example.com (in my example, abc.dev.example.com and def.dev.example.com) should have the column whitelist as YES, but they appear blank. On the other hand, blog.example.com appear with YES in the whitelist column, because it does not have any wildcard in the whitelist.

 

 

index=foo
    | join type=left domain
    [| inputlookup whitelist_foo.csv 
    | eval whitelist="YES"
    | fields domain, serial, desc, whitelist]
| table domain, serial, desc, whitelist

 

 

From what I understood, the join left interprets the asterisk as a string, not as a wildcard. How can I overcome that?

Labels (1)
0 Karma

JChris_
Path Finder

Anyone?

0 Karma

kamlesh_vaghela
SplunkTrust
SplunkTrust

@JChris_ 

Can you please try this?

index=foo | fields domain serial desc
| join [| inputlookup whitelist_foo.csv | eval whitelist="YES" | fields domain  | eval domain=replace(domain,"\*","%") | stats delim="," values(domain) as domain_1] 
| eval domain_1=split(domain_1,",") | mvexpand domain_1
| where like(domain,domain_1)

 

My Sample Search :

| makeresults | eval raw="domain=community.example.com serial=123 desc=whatever1|
domain=dev.example.com serial=456 desc=whatever2|
domain=abc.dev.example.com serial=789 desc=whatever3|
domain=def.dev.com serial=098 desc=whatever4|
domain=blog.example.com serial=765 desc=whatever5" | eval raw=split(raw,"|
") | mvexpand raw | rename raw as _raw | extract | table domain serial desc |
join [
| makeresults | eval _raw="domain
*.example.com
*.dev.com
def.dev.com
blog.example.com"| multikv forceheader=1 | table domain | eval domain=replace(domain,"\*","%") | stats delim="," values(domain) as domain_1] | eval domain_1=split(domain_1,",") | mvexpand domain_1
| where like(domain,domain_1)


 Thanks
KV
▄︻̷̿┻̿═━一

If any of my reply helps you to solve the problem Or gain knowledge, an upvote would be appreciated.

0 Karma

JChris_
Path Finder

Did not work, the column whitelist was empty for everything. Maybe my demo data was not good enough and did not reflect my actual data precisely, here is a better demo data and better explanation of the issue.

 

Consider whitelist_foo.csv as an inputlookup with the following data:

"_time","common_name",description,"issuer_name","requester"
"2021-01-01T00:00:00.000+0000","*.alpha.examplecloud.com","Free text, can be anything.","C=US, O=Amazon, OU=*, CN=Amazon","John"
"2021-01-01T00:00:00.000+0000","partner.example.com","Free text, can be anything.","C=US, O=Let's Encrypt, CN=R3","Mary"
"2021-01-01T00:00:00.000+0000","dev.example.io","Free text, can be anything.","C=US, O=Let's Encrypt, CN=R3","Bob"
"2021-01-01T00:00:00.000+0000","*.dev.example.io","Free text, can be anything.","C=US, O=Amazon, OU=*, CN=Amazon","Victor"
"2021-01-01T00:00:00.000+0000","status.example.com","Free text, can be anything.","C=US, O=Let's Encrypt, CN=R3","Alice"

Take note that common_name and issuer_name can have wildcard (*) and those should be taken into consideration in the dashboard query, just like I have it today.

 

Currently, the table in my dashboard is created using this query:

index=foo NOT 
    [| inputlookup whitelist_foo.csv 
    | table common_name, issuer_name] 
| eval _time=strptime(entry_timestamp, "%FT%T.%3N") 
| table entry_timestamp, not_before, not_after, common_name, issuer_name, serial_number 
| sort -entry_timestamp

 

What this query does is hide all domains included in the whitelist (be it a precise match or a wildcard match). If, for example, in the last few days the certificates below were issued, we would have the following behaviour:

  1. bar.dev.example.io (not shown, because match wildcard)
  2. dev.example.io (not shown, because precise match)
  3. abc.example.com (shown, because no match)
  4. xyz.partner.example.com (shown, because no match)
  5. example.com (shown, because no match)
  6. example.io (shown, because no match)
  7. examplecloud.com (shown, because no match)
  8. lipsum.status.example.com (shown, because I did not whitelist *.status.example.com)
  9. alpha.examplecloud.com (shown, because I only whitelist *.alpha.examplecloud.com)
  10. dev.example.io issued by CA ShaddyCo (shown, because did not match the CA in the whitelist)

 

Hope this give you a better idea of the issue.

0 Karma

kamlesh_vaghela
SplunkTrust
SplunkTrust

@JChris_ 

Have you tried this?

index=foo NOT 
    [| inputlookup whitelist_foo.csv 
    | table common_name, issuer_name | format | fields search ] 
| eval _time=strptime(entry_timestamp, "%FT%T.%3N") 
| table entry_timestamp, not_before, not_after, common_name, issuer_name, serial_number 
| sort - entry_timestamp
0 Karma

JChris_
Path Finder

This is the behaviour I have already with the query I provided in the post above. I want to change this to a table that show ALL entries, but will have a whitelist column with YES/NO.

The query you provided will only show the non-whitelisted 🙂

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...