Different search time in subsearch/join

KellyP · ‎03-25-2024

Hi

I have two sets of data, one is proxy logs (index=netproxy) and the other is an extract of LTE Logs which is logs every time the device joins. I'd like to cross reference the proxy logs with the LTE data so I can extract the IMEI number but the IMEI number could exist in logs outside of the search time window.

The below search works but only if the timeframe is big enough that it includes the device in the proxy logs. Is there a way I can maybe extend the earliest time for 24 hours prior to the search time window? I don't want to do "all time" on the subsearch because the IP Address allocations will change over time and then be matched against the wrong IMEI.

index=netproxymobility sourcetype="zscalernss-web"
| fields transactionsize responsesize requestsize urlcategory serverip ClientIP hostname appname appclass urlclass
type=left ClientIP 
         [ search index=netlte
         | dedup ClientIP
         | fields ClientIP IMEI 
         ]

thanks

gcusello · ‎03-25-2024

Hi @KellyP ,

in the search you shared you forgot the join command, but anyway avoid to use join, and possible forget this command because it's very slow and resource consuming: Splunk isn't a relational DB. t's a search engine.

So you can correlate events in a different way usng stats:

(index=netproxymobility sourcetype="zscalernss-web") OR index=netlte
| stats 
   values(transactionsize) AS transactionsize
   values(responsesize) AS responsesize
   values(requestsize) AS requestsize
   values(urlcategory) AS urlcategory
   values(serverip)serverip
   values(ClientIP) ASClientIP
   values(hostname) AS hostname
   values(appname) AS appname
   values(appclass) AS appclass
   values(urlclass) AS urlclass
   values(IMEI) AS IMEI
   BY ClientIP

if you want onlythe events in both the indexes, you can add an additional clause:

(index=netproxymobility sourcetype="zscalernss-web") OR index=netlte
| stats 
   values(transactionsize) AS transactionsize
   values(responsesize) AS responsesize
   values(requestsize) AS requestsize
   values(urlcategory) AS urlcategory
   values(serverip)serverip
   values(ClientIP) ASClientIP
   values(hostname) AS hostname
   values(appname) AS appname
   values(appclass) AS appclass
   values(urlclass) AS urlclass
   values(IMEI) AS IMEI
   dc(index) AS index_count
   BY ClientIP
| where index_count=2
| fields - index_count

Ciao.

Giuseppe

yuanliu · ‎03-25-2024

There is a recent question about doing this in dashboard. Using a time selector token would be the cleanest.

If you are not doing it in dashboard, but this is a singular use case, I can think of an ugly map, like this

| makeresults
| addinfo
| eval start_24h_earlier = relative_time(info_min_time, "-24h")
| map start_24h_earlier search="search index=netlte earliest=$start_24h_earlier
         | dedup ClientIP
         | fields ClientIP IMEI"
| join ClientIP
    [index=netproxymobility sourcetype="zscalernss-web"
    | fields transactionsize responsesize requestsize urlcategory serverip ClientIP hostname appname appclass urlclass]

Here is a proof of concept:

| makeresults
| addinfo
| eval int_start = relative_time(info_min_time, "-24h"), int_end = relative_time(info_max_time, "-4h")
| map info_min_time info_max_time search="search index=_audit earliest=$int_start$ latest=$int_end$
  | stats min(_time) as in_begin max(_time) as in_end by action"
| join action
    [search index = _audit
    | stats min(_time) as out_begin max(_time) as out_end by action]
| fieldformat in_begin = strftime(in_begin, "%F %T")
| fieldformat in_end = strftime(in_end, "%F %T")
| fieldformat out_begin = strftime(out_begin, "%F %T")
| fieldformat out_end = strftime(out_end, "%F %T")

My output looks like

action	in_begin	in_end	out_begin	out_end
expired_session_token	2024-03-24 23:23:20	2024-03-25 12:25:33	2024-03-25 22:33:51	2024-03-25 23:19:51
login attempt	2024-03-24 22:23:12	2024-03-25 04:25:11	2024-03-25 21:33:26	2024-03-25 21:33:26
quota	2024-03-24 22:23:16	2024-03-25 04:25:17	2024-03-25 21:41:52	2024-03-25 23:21:45
read_session_token	2024-03-24 19:21:02	2024-03-25 19:18:58	2024-03-25 20:17:30	2024-03-25 23:21:49
search	2024-03-24 19:37:09	2024-03-25 11:29:40	2024-03-25 21:15:35	2024-03-25 23:21:05
update	2024-03-24 19:51:57	2024-03-25 09:15:17	2024-03-25 21:13:23	2024-03-25 23:09:53
validate_token	2024-03-24 19:21:02	2024-03-25 19:18:58	2024-03-25 20:17:30	2024-03-25 23:21:49

KendallW · ‎03-25-2024

Would adding "earliest=<24 hours prior to the search time window>" in the subsearch fix this?

KellyP · ‎03-25-2024

that's what I was wondering but not sure how to reference the earliest values from within the join

Different search time in subsearch/join

join

Index This | Forward, I’m heavy; backward, I’m not. What am I?

A Guide To Cloud Migration Success

Join Us for Splunk University and Get Your Bootcamp Game On!