Sanity Check - No Web Browsing Data Model?

tretrigh
Path Finder

I'm building a search that takes a URL and returns all events, across separate indexes/products, in which a client (user endpoint, server, etc.) attempted to access it.  The goal is to answer "who tried to visit URL X".

I have reviewed the default CIM data models here: https://docs.splunk.com/Documentation/CIM/5.1.0/User/CIMfields

However, none seem to fit this specific use case.  Can anyone sanity check me to see if I've overlooked one?  Thanks!

tretrigh
Path Finder

Thank you for the input everyone.  @isoutamo - you are correct in that each data source I'm looking at has vastly different data available...

Some sources come from endpoint agents, which have username, endpoint name, IP address (local/public), URL, URL IP, etc.  Other sources come from network devices and might track users by local IP only, but might also record which firewall the request goes through, etc.

I have one source which only lists a single field to identify the user.... the MAC address... really not helpful without an additional lookup.

I ended up using a number of macros and lots of coalesces to make my field names consistent.
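
For readers outside Splunk, the coalesce approach can be sketched in plain Python. This is only an illustration of the idea, not the actual macros: the alias lists and field names below are hypothetical, and the helper mimics the spirit of SPL's `coalesce()` by taking the first populated candidate field.

```python
# Hypothetical mapping: normalized field name -> source field names to try, in order.
# None of these names come from the real data sources in this thread.
FIELD_ALIASES = {
    "url": ["url", "dest_url", "request_url"],
    "user": ["user", "username", "src_user"],
    "src_ip": ["src_ip", "local_ip", "client_ip"],
    "src_mac": ["src_mac", "mac_address"],
}


def coalesce(event, candidates):
    """Return the first non-empty value among the candidate fields, else None."""
    for name in candidates:
        value = event.get(name)
        if value not in (None, ""):
            return value
    return None


def normalize(event):
    """Map one raw event onto the normalized field names."""
    return {target: coalesce(event, names) for target, names in FIELD_ALIASES.items()}


def who_visited(events, url):
    """Return the normalized events whose URL matches exactly."""
    return [n for n in map(normalize, events) if n["url"] == url]
```

A source that only reports a MAC address still normalizes cleanly here; its `user` field simply stays empty until an additional lookup fills it in.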

isoutamo
SplunkTrust

If you can, try to standardise your company's log messages, for example like this:

Mandatory part

Here are some fields/information that every event must contain, regardless of the service. It doesn’t matter if those fields are KV, JSON, or just placement formatted. More important is that these are always present and easily identified in the log event.

 
 
 

| Field | Content | Example | Purpose |
|---|---|---|---|
| timestamp | RFC-3339 formatted timestamp | 2024-07-01T12:13:15.123+03:00 | When an event occurs in this service. |
| log_type | audit/apps/trace/metric | audit | What is the event type from a security content perspective? |
| source_ip | IP of logging systems / service | 10.11.22.123 | Where did the event occur from an IP perspective? |
| source_system | Source System of log event | aa.bb.local | Host or service where event was created. |
| process | Process / service which has processed event | app_abc | Which application / service processed the event. |
| sessionId | Session where event belongs | 82B98B54-9553-43CD-A5AB-E6F45656CD95 | e.g. GUID to identify the entire session. |
| requestId | Request where event belongs | 9DF09DE7-4061-487B-953C-49B73C000E2C | e.g. GUID to identify individual request within session. |
| userId | User's identification on service. | a12345 | Pseudonyms should be used instead of real user IDs to avoid exposing PII. |
| outcome | Status of action | Error | Did the action succeed or fail? |
| errorDetails | Details for action result | Not authorized | A more detailed error message, including the full message, could be part of the service-based payload |
| payload | Application / service specific parts | { "as":23, { "aa":"bb", "cc":12}} | A separate payload based on the real audit trail needs. |

 

That way you could build some kind of DM based on this, but as I said, the payload is usually the interesting part, and it differs for every subsystem, service, etc. There is also a lot of equipment with its own log format.
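
As a rough illustration of the mandatory part, here is a minimal Python sketch that emits one JSON log line and refuses to build an event when a field is missing. The `build_event` helper and the choice of JSON are assumptions made for the example; the post itself allows KV, JSON, or positional formats.

```python
import json
import uuid
from datetime import datetime, timezone

# Field names taken from the mandatory-part list above.
MANDATORY_FIELDS = (
    "timestamp", "log_type", "source_ip", "source_system", "process",
    "sessionId", "requestId", "userId", "outcome", "errorDetails", "payload",
)


def build_event(**fields):
    """Return a JSON log line; raise ValueError if a mandatory field is absent."""
    # Default timestamp: current UTC time in RFC 3339 form with milliseconds.
    event = {"timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds")}
    event.update(fields)
    missing = [name for name in MANDATORY_FIELDS if name not in event]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    return json.dumps(event)


# Example call using the sample values from the list above.
line = build_event(
    log_type="audit",
    source_ip="10.11.22.123",
    source_system="aa.bb.local",
    process="app_abc",
    sessionId=str(uuid.uuid4()),
    requestId=str(uuid.uuid4()),
    userId="a12345",
    outcome="Error",
    errorDetails="Not authorized",
    payload={"aa": "bb", "cc": 12},  # service-specific part, free-form
)
```

Only the envelope is validated; the payload stays free-form, which matches the point that it differs per subsystem.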

isoutamo
SplunkTrust

This is actually quite a simple question, but giving a good answer is extremely hard 😞

My answer, or rather my thinking, is based on what I have learned e.g. in the finance sector.

I suppose that in almost every enterprise-grade business, a URL is just the start/execution point for the real application. This means that when this endpoint is called, it is just e.g. an API Gateway request which is forwarded to one (usually several) backend(s), which process the real request and then return the needed response to the client.

From a technical point of view, this means that there is a session (what the user is doing as a real-life transaction), and it contains several requests (those individual URLs), each of which handles e.g. an individual dashboard or a step in the real process.

Usually there should be a sessionId which is fixed for one real-life transaction, e.g. logging into a web bank and doing whatever you do within one login (e.g. check balance, pay some invoices, transfer money etc.). Then there is a requestId, which is the execution of one individual URL / process step (like see your account balance, check an invoice, modify an invoice, accept it for payment etc.).
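
To make the session/request relationship concrete, here is a small Python sketch that groups events first by sessionId and then by requestId. It assumes every event is a dict carrying both IDs; the sample values in the usage note are made up.

```python
from collections import defaultdict


def group_by_session(events):
    """Return {sessionId: {requestId: [events...]}} for a list of event dicts."""
    sessions = defaultdict(lambda: defaultdict(list))
    for event in events:
        sessions[event["sessionId"]][event["requestId"]].append(event)
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {sid: dict(requests) for sid, requests in sessions.items()}
```

For example, three events from one login, two of which belong to the same request across different backends, end up under a single session key with two request keys.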

When you think about this workflow and the kinds of events all those tens of subsystems generate for a click on one entry-point URL, it's quite obvious that you cannot define any DM which can describe this bunch of events. I suppose that you can do some DM for base audit data, but as the payloads of different requests to backend systems are totally different, it will be extremely hard to create a generic DM for this. If/when needed you can do it yourself, but quite probably it will be different for every customer, or at least for every entry point.

Just some thoughts, not any real answer.

r. Ismo.

 

richgalloway
SplunkTrust

The Web datamodel appears to have the fields needed for that use case.  It is documented at https://docs.splunk.com/Documentation/CIM/5.1.0/User/Web

Do you have data for that DM?  Is the data CIM-compliant so the DM can find it?

---
If this reply helps you, Karma would be appreciated.

tretrigh
Path Finder

Hi @richgalloway - I considered this one.  The description is:

The fields in the Web data model describe web server and/or proxy server data in a security or operational context.


Looking at the fields in this data model, this seems to me to be geared more for web servers, not the clients of those servers.

Many recommended fields in this data model would not apply to the web browsing logs from the client's perspective.  Is attempting to squeeze logs from the clients into this data model commonly done?

And to answer your questions - We have other data (from web servers) which use the Web data model.

Furthermore, the data I want to group/find with this search is definitely NOT CIM compliant.  As the number of data sources for web browsing is high for our environment (something like 10+ sources), many of the sources do not have the same information available.  I'm building a list of fields myself to standardize the names and would ideally map them to a data model.

PickleRick
SplunkTrust

Typically you don't have logs from the client's side 🙂 It's a far more common use case to have web server or proxy server logs.

And yes, part of normalizing your data to make it CIM-compliant is making sure the fields are properly mapped or calculated if they're not there in the original data. You can help yourself on this task at least partially with Add-On Builder.

richgalloway
SplunkTrust

It was not clear from the OP that the data comes from clients rather than servers/proxies - just that the data is *about* clients.  That doesn't mean the DM can't be used, but some fields won't apply.  It's normal for a DM to not have all fields populated.

tretrigh
Path Finder

You're right... the original question wasn't clear enough.  Well it was to me... but that is always the case I suppose!

I'll consider using the existing Web DM or potentially creating a new one that will allow a little more customization for what I'm after.

Thank you for the input.
