Splunk Search

Handling nulls in a string

jwhughes58
Contributor

I've got this search

index=my_index data_type=my_sourcetype earliest=-15m latest=now
| eval domain_id=if(isnull(domain_id), "NULL_domain_id", domain_id) 
| eval domain_name=if(isnull(domain_name), "NULL_domain_name", domain_name) 
| eval group=if(isnull(group), "NULL_Group", group) 
| eval non_tier_zero_principal=if(isnull(non_tier_zero_principal), "NULL_non_tier_zero_principal", non_tier_zero_principal) 
| eval path_id=if(isnull(path_id), "NULL_path_id", path_id) 
| eval path_title=if(isnull(path_title), "NULL_path_title", path_title) 
| eval principal=if(isnull(principal), "NULL_principal", principal) 
| eval tier_zero_principal=if(isnull(tier_zero_principal), "NULL_tier_zero_principal", tier_zero_principal) 
| eval user=if(isnull(user), "NULL_user", user) 
| eval key=sha512(domain_id.domain_name.group.non_tier_zero_principal.path_id.path_title.principal.tier_zero_principal.user) 
| table domain_id, domain_name, group, non_tier_zero_principal, path_id, path_title, principal, tier_zero_principal, user, key

Because we get repeating events where the only difference is the timestamp, I'm trying to put together a lookup that contains the sha512 key so that already-seen events can be skipped.  What I found is that I can't have a blank value in the sha512 function.  Does anyone have a better way of doing this than what I have?
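
For reference, the skip logic I have in mind is roughly the following; seen_keys.csv is a placeholder name for the lookup, and this is an untested sketch:

index=my_index data_type=my_sourcetype earliest=-15m latest=now
``` the nine eval/if statements from above go here ```
| eval key=sha512(domain_id.domain_name.group.non_tier_zero_principal.path_id.path_title.principal.tier_zero_principal.user)
``` drop events whose key is already recorded in the lookup ```
| lookup seen_keys.csv key OUTPUT key AS seen
| where isnull(seen)
``` remember the new keys for the next run (in practice the alert would fire before this step) ```
| fields key
| outputlookup append=true seen_keys.csv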

TIA,

Joe


jwhughes58
Contributor

@bowesmana, @gcusello, and @yuanliu, thanks for the responses.  This has been shelved due to funding issues.  If it gets funded, we will go back to the vendor and ask whether they can add something that flags an event as new, or timestamps it, so we can keep track that way.


yuanliu
SplunkTrust

I agree with @gcusello that dedup should suffice, because dedup operates on the same principle.  But before going into code, you need to define what you are looking for using data illustrations.  Without such a definition, we could be talking past each other.

So, assuming that you have these raw events

#  _raw
1  {"timestamp":"2024-08-20 15:33:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
2  {"timestamp":"2024-08-20 15:32:00.837000","data_type":"finding_export","domain_id":"your_domain_id","domain_name":"your_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
3  {"timestamp":"2024-08-20 15:31:10.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"your_user"}
4  {"timestamp":"2024-08-20 15:31:05.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
5  {"timestamp":"2024-08-20 15:31:00.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
6  {"timestamp":"2024-08-20 15:30:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","user":"my_user"}
7  {"timestamp":"2024-08-20 15:28:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}

Of the seven (7) events, 1 and 7 differ only in timestamp; 4 and 5 differ only in timestamp; and 3 through 6 are each missing one field or another.  Is it your intention to reduce them to five (5) events, like the following?

#  _raw
1  {"timestamp":"2024-08-20 15:33:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
2  {"timestamp":"2024-08-20 15:32:00.837000","data_type":"finding_export","domain_id":"your_domain_id","domain_name":"your_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
3  {"timestamp":"2024-08-20 15:31:10.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"your_user"}
4  {"timestamp":"2024-08-20 15:31:05.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
5  {"timestamp":"2024-08-20 15:30:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","user":"my_user"}

If this is what you are looking for, there is no need for complicated manipulations and no need for a lookup.  Just do

 

index=my_index data_type=my_sourcetype earliest=-15m latest=now
| fillnull value=UNSPEC
| dedup keepempty=true data_type domain_id domain_name path_id path_title user
``` below simply restores null values, not required for dedup ```
| foreach *
    [eval <<FIELD>> = if(<<FIELD>> == "UNSPEC", null(), <<FIELD>>)]

 

Here is an emulation to produce the sample data illustrated above.  You can play with it and compare with real data:

 

| makeresults format=json data="
        [{\"timestamp\": \"2024-08-20 15:33:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"domain_name\": \"my_domain_name\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
        {\"timestamp\": \"2024-08-20 15:32:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"your_domain_id\", \"domain_name\": \"your_domain_name\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
        {\"timestamp\": \"2024-08-20 15:31:10.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"your_user\"},
        {\"timestamp\": \"2024-08-20 15:31:05.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
        {\"timestamp\": \"2024-08-20 15:31:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
        {\"timestamp\": \"2024-08-20 15:30:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"domain_name\": \"my_domain_name\", \"user\": \"my_user\"},
        {\"timestamp\": \"2024-08-20 15:28:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"domain_name\": \"my_domain_name\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"}
        ]"
| eval _time = strptime(timestamp, "%F %T.%6N")
``` the above emulates
index=my_index data_type=my_sourcetype earliest=-15m latest=now
```

 


bowesmana
SplunkTrust

And from a purely SPL point of view, technically you could do any of these to fill the null values.

| foreach domain_id domain_name group non_tier_zero_principal path_id path_title principal tier_zero_principal user [
    fillnull value="NULL_<<FIELD>>" "<<FIELD>>"
]

OR 

| foreach domain_id domain_name group non_tier_zero_principal path_id path_title principal tier_zero_principal user [
    eval <<FIELD>>=if(isnull('<<FIELD>>'), "NULL_<<FIELD>>", '<<FIELD>>')
]

OR

| fillnull value="NULL" domain_id domain_name group non_tier_zero_principal path_id path_title principal tier_zero_principal user
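
The last form is the most compact, but it fills every empty field with the same "NULL" placeholder; the first two embed the field name in the placeholder (e.g. NULL_domain_id), which keeps the filled values self-describing when they later feed the hash or the table output.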

 


gcusello
SplunkTrust

Hi @jwhughes58 ,

instead of using the lookup, why don't you dedup on all the fields contained in your events?

Or take a portion of _raw (excluding the timestamp) and dedup on that?
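
Something like this rough sketch, assuming the events are JSON with a "timestamp" key (raw_no_ts is just a scratch field name):

index=my_index data_type=my_sourcetype earliest=-15m latest=now
``` strip the timestamp pair out of _raw, then dedup on the remainder ```
| eval raw_no_ts=replace(_raw, "\"timestamp\":\s*\"[^\"]+\",?\s*", "")
| dedup raw_no_ts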

Ciao.

Giuseppe


jwhughes58
Contributor

Buongiorno Giuseppe,

I see what you are saying, but I don't think that will work.  Here is what is in an event.

 

{"timestamp": "2024-08-20 15:30:00.837000", "data_type": "finding_export", "domain_id": "my_domain_id", "domain_name": "my_domain_name", "path_id": "T0MarkSensitive", "path_title": "My Path Title", "user": "my_user"}

 

Every 15 minutes the binary calls the API and pulls events.  Most of the events are duplicates except for the timestamp.  There may or may not be a new event that needs to be alerted on.  The monitoring team doesn't want to see any duplication, hence the lookup to record what has already come through.

Now the issue is that not all the fields have values all the time.  When a field has no value, the sha512 function doesn't work, which is why I asked whether there is a better way than doing isnull on each field.
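
One direction I'm considering is coalesce(), which substitutes the placeholder inline instead of needing a separate eval per field (sketch only, untested):

| eval key=sha512(coalesce(domain_id, "NULL_domain_id").coalesce(domain_name, "NULL_domain_name").
    coalesce(group, "NULL_Group").coalesce(non_tier_zero_principal, "NULL_non_tier_zero_principal").
    coalesce(path_id, "NULL_path_id").coalesce(path_title, "NULL_path_title").
    coalesce(principal, "NULL_principal").coalesce(tier_zero_principal, "NULL_tier_zero_principal").
    coalesce(user, "NULL_user"))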

Ciao,

Joe


bowesmana
SplunkTrust

Do you understand WHY you are getting duplicates from the API?

At what point would you want a 'new' event not to be treated as a duplicate? Forever? Last 60 minutes?

Depending on that, you could make your alert look back over a longer time window, aggregate common events together with their first and last times, and then ignore any 'new' events in the window you are interested in that have a count > 1 in the larger window.  Something like the sketch below.
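
A rough sketch of that idea, assuming a 24-hour lookback around the 15-minute alert window (the field list and window sizes are placeholders):

index=my_index data_type=my_sourcetype earliest=-24h latest=now
| fillnull value=UNSPEC
``` aggregate identical field combinations across the wider window ```
| stats earliest(_time) AS first_seen latest(_time) AS last_seen count BY data_type domain_id domain_name path_id path_title user
``` keep only combinations whose first occurrence falls inside the alert window ```
| where first_seen >= relative_time(now(), "-15m")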

 
