Sanity Check - No Web Browsing Data Model?

tretrigh · 01-07-2025

I'm building a search that takes a URL and returns all events from separate indexes/products where a client (user endpoint, server, etc.) attempted to access it. The goal is to answer "who tried to visit URL X".

I have reviewed the default CIM data models here: https://docs.splunk.com/Documentation/CIM/5.1.0/User/CIMfields

However, none seem to fit this specific use case. Can anyone sanity check me to see if I've overlooked one? Thanks!

tretrigh · 01-08-2025

Thank you for the input, everyone. @isoutamo, you are correct that each data source I'm looking at has vastly different data available...

Some sources come from endpoint agents, which provide username, endpoint name, IP address (local/public), URL, URL IP, etc. Other sources come from network devices, which might track users by local IP only but might also record which firewall (FW) the request goes through, etc.

I have one source that lists only a single field to identify the user: the MAC address. That's really not helpful without an additional lookup.

I ended up using a number of macros and lots of coalesce() calls to make my field names consistent.
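The macro-plus-coalesce approach boils down to "for each canonical field, take the first source-specific field that has a value". Below is a minimal Python sketch of that logic (in SPL it would be `eval user=coalesce(...)` and similar). All source field names are hypothetical; the canonical names `user`, `src` and `url` are borrowed from the CIM Web data model. The MAC-to-user mapping stands in for whatever lookup you would actually use.

```python
# Hypothetical source field names; list order = priority, like SPL's coalesce().
FIELD_CANDIDATES = {
    "user": ["username", "user_name", "src_user"],
    "src": ["local_ip", "src_ip", "client_ip"],
    "url": ["url", "request_url", "uri"],
}


def coalesce(event, candidates):
    """Return the value of the first candidate field that is non-empty, else None."""
    for name in candidates:
        value = event.get(name)
        if value not in (None, ""):
            return value
    return None


def normalize(event, mac_to_user=None):
    """Map one raw event onto the canonical field names."""
    out = {canon: coalesce(event, cands) for canon, cands in FIELD_CANDIDATES.items()}
    # Sources that identify the client only by MAC need an extra lookup.
    # Assumes lookup keys are lowercase MACs with the same separators as the event.
    if out["user"] is None and mac_to_user and "mac" in event:
        out["user"] = mac_to_user.get(event["mac"].lower())
    return out
```

For example, `normalize({"mac": "AA:BB:CC:00:11:22", "client_ip": "10.0.0.5", "uri": "/a"}, {"aa:bb:cc:00:11:22": "asmith"})` fills `user` from the lookup and `src`/`url` from the source's own field names. Keeping the candidate lists in one table (or one macro per canonical field) makes it easy to add the next data source.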

isoutamo · 01-08-2025

If you can, try to standardise your company's log messages, for example like this:

Mandatory part

Here are some fields that every event must contain, regardless of the service. It doesn't matter whether those fields are KV, JSON, or positionally formatted; what matters is that they are always present and easy to identify in the log event.

Field	Content	Example	Purpose

timestamp	RFC-3339 formatted timestamp	2024-07-01T12:13:15.123+03:00	When an event occurs in this service.
log_type	audit/apps/trace/metric	audit	What is the event type from a security content perspective?
source_ip	IP of logging system / service	10.11.22.123	Where did the event occur from an IP perspective?
source_system	Source System of log event	aa.bb.local	Host or service where event was created.
process	Process / service which has processed event	app_abc	Which application / service processed the event.
sessionId	Session the event belongs to	82B98B54-9553-43CD-A5AB-E6F45656CD95	e.g. GUID to identify the entire session.
requestId	Request the event belongs to	9DF09DE7-4061-487B-953C-49B73C000E2C	e.g. GUID to identify an individual request within the session.
userId	User’s identification on service.	a12345	Pseudonyms should be used instead of real user IDs to avoid exposing PII.
outcome	Status of action	Error	Did the action succeed or fail?
errorDetails	Details of the action result	Not authorized	A more detailed error message, including the full message, could be part of the service-based payload.
payload	Application / service specific parts	{"as":23, "nested":{"aa":"bb", "cc":12}}	A separate payload based on the real audit trail needs.
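To make the "always present" rule concrete, here is a minimal Python sketch of a logging helper that refuses to emit an event unless every mandatory field from the table above is set. It assumes JSON output (the table allows KV or positional formats too), and the helper name `make_event` is hypothetical. The timestamp is built as RFC 3339 with milliseconds and a UTC offset, matching the example column.

```python
import json
from datetime import datetime, timedelta, timezone

# Mandatory field names, copied as-is from the table.
MANDATORY_FIELDS = (
    "timestamp", "log_type", "source_ip", "source_system", "process",
    "sessionId", "requestId", "userId", "outcome", "errorDetails", "payload",
)


def make_event(**fields):
    """Serialize one event as a JSON line; raise if a mandatory field is missing.

    Extra (service-specific) fields are allowed alongside the mandatory ones.
    """
    missing = [name for name in MANDATORY_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    return json.dumps(fields)


# RFC 3339 timestamp with milliseconds and a +03:00 offset.
ts = datetime(2024, 7, 1, 12, 13, 15, 123000,
              tzinfo=timezone(timedelta(hours=3))).isoformat(timespec="milliseconds")

line = make_event(
    timestamp=ts,
    log_type="audit",
    source_ip="10.11.22.123",
    source_system="aa.bb.local",
    process="app_abc",
    sessionId="82B98B54-9553-43CD-A5AB-E6F45656CD95",
    requestId="9DF09DE7-4061-487B-953C-49B73C000E2C",
    userId="a12345",
    outcome="Error",
    errorDetails="Not authorized",
    payload={"as": 23},
)
```

Enforcing the fixed part at the point of logging is what would make a common DM over these fields feasible; the `payload` stays free-form per service.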

That way you could build some kind of DM on top of this, but as I said, the payload is usually the interesting part, and it differs for every subsystem, service, etc. There is also a lot of equipment that has its own log format.

isoutamo · 01-08-2025

This is actually quite a simple question, but giving a good answer is extremely hard 😞

My answer, or rather my thinking, is based on what I have learned, e.g. in the finance sector.

I suppose that in almost all enterprise-grade businesses, a URL is just the starting/execution point for the real application. This means that when this endpoint is called, it's just, e.g., an API Gateway request that is forwarded to one (usually several) backend(s), which process the real request and then return the needed response to the client.

From a technical point of view, this means there is a session (what the user is doing in a real-life transaction), and it contains several requests (those individual URLs), each of which processes, e.g., an individual dashboard or a step in the real process.

Usually there should be a sessionId that is fixed for one real-life transaction, e.g. logging into a web bank and doing whatever you do within one login (checking your balance, paying some invoices, transferring money, etc.). Then there is a requestId for each execution of one individual URL / process step (like viewing your account balance, checking an invoice, modifying it, accepting it for payment, etc.).
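A small Python sketch of how those IDs tie things together: only the entry-point (e.g. API gateway) event carries the URL, and the backend events for the same click are found through the shared requestId. The event shape, the `url` field on the gateway event, and the process names are all assumptions for illustration.

```python
from collections import defaultdict


def group_by_session(events):
    """Nest events as {sessionId: {requestId: [event, ...]}}."""
    sessions = defaultdict(lambda: defaultdict(list))
    for ev in events:
        sessions[ev["sessionId"]][ev["requestId"]].append(ev)
    return sessions


def events_for_url(events, url):
    """Return every event that shares a requestId with a request made to `url`.

    Assumes only the entry-point event has a (hypothetical) `url` field;
    backend events are linked to it purely via requestId.
    """
    request_ids = {ev["requestId"] for ev in events if ev.get("url") == url}
    return [ev for ev in events if ev["requestId"] in request_ids]


# Hypothetical events from one web-bank session with two clicks.
events = [
    {"sessionId": "s1", "requestId": "r1", "process": "api_gw", "url": "/balance", "userId": "a12345"},
    {"sessionId": "s1", "requestId": "r1", "process": "accounts_backend"},
    {"sessionId": "s1", "requestId": "r2", "process": "api_gw", "url": "/invoices", "userId": "a12345"},
    {"sessionId": "s1", "requestId": "r2", "process": "billing_backend"},
]
```

Here `events_for_url(events, "/balance")` returns the gateway event plus the `accounts_backend` event, and `group_by_session(events)["s1"]` holds the two requests `r1` and `r2`. In a real environment each request would fan out to many more backends, each with its own payload format.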

When you think about this workflow and what kinds of events all those tens of subsystems generate for a single click on one entry-point URL, it's quite obvious that you cannot define a DM that describes this bunch of events. I suppose you could create a DM for the base audit data, but as the payloads of different requests to backend systems are totally different, it will be extremely hard to create a generic DM for them. If/when needed, you can build one yourself, but it will quite probably be different for every customer, or at least for every entry point.

Just some thoughts, not a real answer.

r. Ismo.

richgalloway · 01-07-2025

The Web datamodel appears to have the fields needed for that use case. It is documented at https://docs.splunk.com/Documentation/CIM/5.1.0/User/Web

Do you have data for that DM? Is the data CIM-compliant so the DM can find it?

---
If this reply helps you, Karma would be appreciated.

tretrigh · 01-07-2025

Hi @richgalloway - I considered this one. The description is:

The fields in the Web data model describe web server and/or proxy server data in a security or operational context.

Looking at the fields in this data model, it seems to me to be geared more toward web servers, not the clients of those servers.

Many recommended fields in this data model would not apply to the web browsing logs from the client's perspective. Is attempting to squeeze logs from the clients into this data model commonly done?

And to answer your questions: we have other data (from web servers) that uses the Web data model.

Furthermore, the data I want to group/find with this search is definitely NOT CIM compliant. As the number of data sources for web browsing is high in our environment (something like 10+ sources), many of the sources do not have the same information available. I'm building a list of fields myself to standardize the names and would ideally map them to a data model.

PickleRick · 01-07-2025

Typically you don't have logs from the client's side 🙂 It's a far more common use case to have web server or proxy server logs.

And yes, part of normalizing your data to make it CIM-compliant is making sure the fields are properly mapped, or calculated if they're not present in the original data. The Add-on Builder can help you with at least part of this task.

richgalloway · 01-07-2025

It was not clear from the OP that the data comes from clients rather than servers/proxies - just that the data is *about* clients. That doesn't mean the DM can't be used, but some fields won't apply. It's normal for a DM to not have all fields populated.


tretrigh · 01-07-2025

You're right... the original question wasn't clear enough. Well, it was to me, but that is always the case, I suppose!

I'll consider using the existing Web DM or potentially creating a new one that will allow a little more customization for what I'm after.

Thank you for the input.