Solved: zScaler "url" field lacks protocol and doesn't match threat lists in ES

stroud_bc · 12-09-2019

We use the zScaler proxy product and have it configured with NSS to collect logs in Splunk Enterprise. We also download the PhishTank URL watchlist into the Threat_Intelligence framework in Enterprise Security. We have a problem because the URL field in our zScaler logs is stored differently than the IoCs in the PhishTank list.

A zScaler log for an HTTPS connection might look like either of these:
`url=google.com/images
protocol=HTTPS

url=google.com/images
protocol=SSL`

while PhishTank would provide an IoC like this:
https://google.com/images

Glancing at the Web data model, the expectation seems to be that the url field includes the protocol, so it is the logs that need to be fixed, not the threat list (see hxxps://docs.splunk[.]com/Documentation/CIM/4.14.0/User/Web).

At first glance, the easy fix would be to rewrite the "url" field with the "protocol" field in front of it (url_new=protocol."://".url_old or similar), but when the protocol field is SSL it needs to be slightly more complicated. I could write this as an |eval statement, but I'm not sure where to put it so that it takes effect. Would this be a field extraction?

In any case, I need to conditionally add "https://" to the url field for SSL or HTTPS, while adding "http://" in cases where the protocol field is HTTP.
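As a rough sketch, the conditional logic could be written as an eval case() expression like the one below. It assumes the events carry the url and protocol fields exactly as in the samples above, and the sourcetype name zscalernss-web is only a placeholder for whatever your NSS feed uses. Values that already start with a scheme, and protocol values other than HTTP, HTTPS, or SSL, are left untouched.

```spl
sourcetype=zscalernss-web
| eval url=case(
    match(url, "^[A-Za-z][A-Za-z0-9+.-]*://"), url,
    upper(protocol)=="HTTPS" OR upper(protocol)=="SSL", "https://".url,
    upper(protocol)=="HTTP", "http://".url,
    true(), url)
| table protocol url
```

To apply an expression like this at search time without editing every search, Splunk's calculated fields are one option: an EVAL-url entry for the sourcetype in props.conf, or Settings > Fields > Calculated fields. Whether that interacts cleanly with the field definitions your Zscaler add-on already ships is something to check in your own environment.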

Many thanks!

stroud_bc · 01-22-2020

I ended up altering the threat_gen search for URL matches so that the url field becomes a multivalue field holding the three most likely URL protocol prefixes. The only line I added to the stock search was | eval url=mvappend("http://".url, "https://".url, "ftp://".url), which sits directly after the | stats line in the search below. This allows the lookups and the domain extraction to function properly, and the analyst can still review the original log to see the actual protocol used. If you have Zscaler and need a cleaner solution, you can probably add "Web.transport" to the |tstats ... by clause and build the correct value with an eval case statement, but I went with the multivalue option for its simplicity.

| `tstats` values(sourcetype) as sourcetype,values(Web.src),values(Web.dest) from datamodel=Web.Web by Web.http_referrer 
| eval url='Web.http_referrer' 
| eval threat_match_field="http_referrer" 
| `tstats` append=true values(sourcetype) as sourcetype,values(Web.src),values(Web.dest) from datamodel=Web.Web by Web.url 
| eval url=if(isnull(url),'Web.url',url) 
| eval threat_match_field=if(isnull(threat_match_field),"url",threat_match_field)
| stats values(sourcetype) as sourcetype,values(Web.src) as src,values(Web.dest) as dest by url,threat_match_field 
| eval url=mvappend("http://".url, "https://".url, "ftp://".url)
| extract domain_from_url 
| `threatintel_url_lookup(url)` 
| `threatintel_domain_lookup(url_domain)` 
| search threat_collection_key=* 
| `mvtruncate(src)` 
| `mvtruncate(dest)` 
| `zipexpand_threat_matches`
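To see what the added mvappend line does in isolation, here is a small standalone sketch using makeresults. The second sample value (https://example.com/login) is made up for illustration, and the match() guard is my own variation rather than part of the search above: it leaves values that already carry a scheme alone, which may matter if other sources in the Web data model, or http_referrer values, already include the protocol.

```spl
| makeresults
| eval url=split("google.com/images,https://example.com/login", ",")
| mvexpand url
| eval url=if(match(url, "^[A-Za-z][A-Za-z0-9+.-]*://"),
    url,
    mvappend("http://".url, "https://".url, "ftp://".url))
| table url
```

The first row should come back as a multivalue url with the http://, https://, and ftp:// variants of google.com/images, and the second row should keep its single original value. Without the guard, a value that already has a scheme would end up as http://https://... and would no longer match the threat list.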

