Handling nulls in a string
I've got this search
index=my_index data_type=my_sourcetype earliest=-15m latest=now
| eval domain_id=if(isnull(domain_id), "NULL_domain_id", domain_id)
| eval domain_name=if(isnull(domain_name), "NULL_domain_name", domain_name)
| eval group=if(isnull(group), "NULL_Group", group)
| eval non_tier_zero_principal=if(isnull(non_tier_zero_principal), "NULL_non_tier_zero_principal", non_tier_zero_principal)
| eval path_id=if(isnull(path_id), "NULL_path_id", path_id)
| eval path_title=if(isnull(path_title), "NULL_path_title", path_title)
| eval principal=if(isnull(principal), "NULL_principal", principal)
| eval tier_zero_principal=if(isnull(tier_zero_principal), "NULL_tier_zero_principal", tier_zero_principal)
| eval user=if(isnull(user), "NULL_user", user)
| eval key=sha512(domain_id.domain_name.group.non_tier_zero_principal.path_id.path_title.principal.tier_zero_principal.user)
| table domain_id, domain_name, group, non_tier_zero_principal, path_id, path_title, principal, tier_zero_principal, user, key
Because we get repeating events where the only difference is the timestamp, I'm trying to put together a lookup that stores the sha512 key so repeat events can be skipped. What I found is that I can't pass a null value to the sha512 function. Does anyone have a better way of doing this than what I have?
TIA,
Joe
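The coalesce-then-hash approach in the search above can be sketched in Python; `event_key` is a hypothetical helper, and the `NULL_<field>` placeholders mirror the evals in the SPL:

```python
import hashlib

# Fields that make up the dedup key (taken from the search above).
KEY_FIELDS = [
    "domain_id", "domain_name", "group", "non_tier_zero_principal",
    "path_id", "path_title", "principal", "tier_zero_principal", "user",
]

def event_key(event):
    """Build a stable sha512 key over the fields, substituting a
    per-field placeholder whenever a field is missing (null)."""
    parts = [event.get(f) if event.get(f) is not None else "NULL_" + f
             for f in KEY_FIELDS]
    # A separator avoids collisions such as ("ab","c") vs ("a","bc"),
    # which plain string concatenation would allow.
    return hashlib.sha512("|".join(parts).encode("utf-8")).hexdigest()

e1 = {"domain_id": "my_domain_id", "user": "my_user"}
e2 = {"domain_id": "my_domain_id", "user": "my_user"}    # same content
e3 = {"domain_id": "my_domain_id", "user": "other_user"}
```

Two events with identical field values produce identical keys regardless of timestamp, so a key already present in the lookup marks a repeat.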
@bowesmana, @gcusello, and @yuanliu thanks for the responses. This has been shelved due to funding issues. If it gets funded, we will go back to the vendor and see if they can add something that will say this is new or timestamp it so we can keep track that way.

I agree with @gcusello that dedup should suffice, because dedup operates on the same principle. But before getting into code, you need to define what you are looking for using data illustrations. Without such a definition, we could be talking past each other.
So, assuming that you have these raw events
| # | _raw |
|---|------|
| 1 | {"timestamp":"2024-08-20 15:33:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"} |
| 2 | {"timestamp":"2024-08-20 15:32:00.837000","data_type":"finding_export","domain_id":"your_domain_id","domain_name":"your_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"} |
| 3 | {"timestamp":"2024-08-20 15:31:10.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"your_user"} |
| 4 | {"timestamp":"2024-08-20 15:31:05.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"} |
| 5 | {"timestamp":"2024-08-20 15:31:00.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"} |
| 6 | {"timestamp":"2024-08-20 15:30:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","user":"my_user"} |
| 7 | {"timestamp":"2024-08-20 15:28:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"} |
Of the seven (7) events, 1 and 7 differ only in timestamp; 4 and 5 differ only in timestamp; and 3 through 6 are each missing one field or another. Is it your intention to reduce them to the following five (5) events?
| # | _raw |
|---|------|
| 1 | {"timestamp":"2024-08-20 15:33:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"} |
| 2 | {"timestamp":"2024-08-20 15:32:00.837000","data_type":"finding_export","domain_id":"your_domain_id","domain_name":"your_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"} |
| 3 | {"timestamp":"2024-08-20 15:31:10.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"your_user"} |
| 4 | {"timestamp":"2024-08-20 15:31:05.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"} |
| 5 | {"timestamp":"2024-08-20 15:30:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","user":"my_user"} |
If this is what you are looking for, there is no need for complicated manipulations and no need for a lookup. Just do:
index=my_index data_type=my_sourcetype earliest=-15m latest=now
| fillnull value=UNSPEC
| dedup keepempty=true data_type domain_id domain_name path_id path_title user
``` below simply restores null values, not required for dedup ```
| foreach *
[eval <<FIELD>> = if(<<FIELD>> == "UNSPEC", null(), <<FIELD>>)]
Here is an emulation to produce the sample data illustrated above. You can play with it and compare with real data:
| makeresults format=json data="
[{\"timestamp\": \"2024-08-20 15:33:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"domain_name\": \"my_domain_name\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:32:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"your_domain_id\", \"domain_name\": \"your_domain_name\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:31:10.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"your_user\"},
{\"timestamp\": \"2024-08-20 15:31:05.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:31:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:30:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"domain_name\": \"my_domain_name\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:28:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"domain_name\": \"my_domain_name\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"}
]"
| eval _time = strptime(timestamp, "%F %T.%6N")
``` the above emulates
index=my_index data_type=my_sourcetype earliest=-15m latest=now
```

And from a purely SPL point of view, technically you could do any of these to fill the null values:
| foreach domain_id domain_name group non_tier_zero_principal path_id path_title principal tier_zero_principal user [
| fillnull value="NULL_<<FIELD>>" "<<FIELD>>"
]
OR
| foreach domain_id domain_name group non_tier_zero_principal path_id path_title principal tier_zero_principal user [
| eval <<FIELD>>=if(isnull('<<FIELD>>'), "NULL_<<FIELD>>", '<<FIELD>>')
]
OR
| fillnull value="NULL" domain_id domain_name group non_tier_zero_principal path_id path_title principal tier_zero_principal user


Hi @jwhughes58 ,
instead of using the lookup, why don't you dedup on all the fields contained in your events?
Or take a portion of _raw (excluding the timestamp) and dedup on that?
Ciao.
Giuseppe
Buongiorno Giuseppe,
I see what you are saying, but I don't think that will work. Here is what is in an event.
{"timestamp": "2024-08-20 15:30:00.837000", "data_type": "finding_export", "domain_id": "my_domain_id", "domain_name": "my_domain_name", "path_id": "T0MarkSensitive", "path_title": "My Path Title", "user": "my_user"}
Every 15 minutes the binary goes to the API and pulls events. Most of the events are duplicates except for the timestamp. There may or may not be a new event which needs to be alerted on. The monitoring team doesn't want to see any duplication, hence the lookup to save what has already come through.
Now the issue is that not all the fields have values all the time. When a field has no value, the sha512 function doesn't work, which is why I asked if there is a better way than doing isnull on each field.
Ciao,
Joe

Do you understand WHY you are getting duplicates from the API?
At what point would you want a 'new' event not to be treated as a duplicate? Forever? Last 60 minutes?
Depending on that, you could have your alert look back over a longer time window, aggregate common events together with first and last timestamps, and then ignore any 'new' events in the window you are interested in that have a count > 1 in the larger window.
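That aggregation idea might be sketched like this (a hypothetical `truly_new` helper; `_time` values are simplified epoch seconds, and the event list covers the longer lookback window):

```python
def truly_new(events, key_fields, alert_window_start):
    """Aggregate events over the longer lookback window; a key is
    'new' only if its earliest occurrence falls inside the alert
    window, i.e. it has no duplicate earlier in the lookback."""
    first_seen = {}
    for ev in events:
        key = tuple(ev.get(f) for f in key_fields)
        t = ev["_time"]
        if key not in first_seen or t < first_seen[key]:
            first_seen[key] = t
    return [k for k, first in first_seen.items()
            if first >= alert_window_start]

events = [
    {"_time": 10,  "user": "a"},   # seen long before the alert window
    {"_time": 100, "user": "a"},   # repeat inside the window -> ignored
    {"_time": 100, "user": "b"},   # first seen inside the window -> new
]
```

With an alert window starting at t=90, only the `"b"` event is reported as new, because `"a"` already appeared earlier in the lookback.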
