I agree with @gcusello that dedup should suffice, because dedup works on the same principle. But before going into code, you need to define what you are looking for using data illustrations. Without such a definition, we could be talking past each other. So, assuming that you have these raw events:

_raw
1 {"timestamp":"2024-08-20 15:33:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
2 {"timestamp":"2024-08-20 15:32:00.837000","data_type":"finding_export","domain_id":"your_domain_id","domain_name":"your_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
3 {"timestamp":"2024-08-20 15:31:10.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"your_user"}
4 {"timestamp":"2024-08-20 15:31:05.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
5 {"timestamp":"2024-08-20 15:31:00.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
6 {"timestamp":"2024-08-20 15:30:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","user":"my_user"}
7 {"timestamp":"2024-08-20 15:28:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}

Of the seven (7) events, 1 and 7 differ only in timestamp; 4 and 5 differ only in timestamp; 3 through 6 are each missing one field or another. Is it your intention to dedup them to five (5) events like the following?

_raw
1 {"timestamp":"2024-08-20 15:33:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
2 {"timestamp":"2024-08-20 15:32:00.837000","data_type":"finding_export","domain_id":"your_domain_id","domain_name":"your_domain_name","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
3 {"timestamp":"2024-08-20 15:31:10.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"your_user"}
4 {"timestamp":"2024-08-20 15:31:05.837000","data_type":"finding_export","domain_id":"my_domain_id","path_id":"T0MarkSensitive","path_title":"My Path Title","user":"my_user"}
5 {"timestamp":"2024-08-20 15:30:00.837000","data_type":"finding_export","domain_id":"my_domain_id","domain_name":"my_domain_name","user":"my_user"}

If this is what you are looking for, there is no need to perform complicated manipulations and no need for lookup. Just do

index=my_index data_type=my_sourcetype earliest=-15m latest=now
| fillnull value=UNSPEC
| dedup keepempty=true data_type domain_id domain_name path_id path_title user
``` below simply restores null values, not required for dedup ```
| foreach *
[eval <<FIELD>> = if(<<FIELD>> == "UNSPEC", null(), <<FIELD>>)]

Here is an emulation to produce the sample data illustrated above. You can play with it and compare with real data:

| makeresults format=json data="
[{\"timestamp\": \"2024-08-20 15:33:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"domain_name\": \"my_domain_name\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:32:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"your_domain_id\", \"domain_name\": \"your_domain_name\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:31:10.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"your_user\"},
{\"timestamp\": \"2024-08-20 15:31:05.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:31:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:30:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"domain_name\": \"my_domain_name\", \"user\": \"my_user\"},
{\"timestamp\": \"2024-08-20 15:28:00.837000\", \"data_type\": \"finding_export\", \"domain_id\": \"my_domain_id\", \"domain_name\": \"my_domain_name\", \"path_id\": \"T0MarkSensitive\", \"path_title\": \"My Path Title\", \"user\": \"my_user\"}
]"
| eval _time = strptime(timestamp, "%F %T.%6N")
``` the above emulates
index=my_index data_type=my_sourcetype earliest=-15m latest=now
```
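
If you also want to see how many raw events each surviving row stands for, a stats variant may help as a cross-check. This is only a sketch under the same assumption as above, namely that the six listed fields are what define a duplicate; `count` and `latest` are output names picked here for illustration. The fillnull step matters because stats leaves out events in which any of the by-fields is null.

```
| fillnull value=UNSPEC
| stats count max(_time) as latest by data_type domain_id domain_name path_id path_title user
| sort - latest
```

Appended to the emulation, this should return five rows, with a count of 2 on the rows that correspond to events 1 and 4 in the first table. Unlike dedup, stats keeps only the by-fields and the aggregates, so use it to verify the grouping rather than to retrieve the events themselves.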