Other Usage

How to join distinct data sources and grouping logically?

Paluri
Explorer

Two different sources returning data in the below format.

Source 1 - Determines the time range for a given date based on the execution of a Job, which logically concludes the End of Day in Application.

Source 2 – Events generated in real time for various use cases in the application. EventID1 is generated as part of the Job in Source1.

 

Source 1

DATE

Start Time

End Time

Day 3

2023-09-12 01:12:12.123

2023-09-13 01:13:13.123

Day 2

2023-09-11 01:11:11.123

2023-09-12 01:12:12.123

Day 1

2023-09-10 01:10:10.123

2023-09-11 01:11:11.123

 

Source 2

Event type

Time

Others

EventID2

2023-09-11 01:20:20.123

 

EventID1

2023-09-11 01:11:11.123

 

EventID9

2023-09-10 01:20:30.123

 

EventID3

2023-09-10 01:20:10.123

 

EventID5

2023-09-10 01:10:20.123

 

EventID1

2023-09-10 01:10:10.123

 

 

There are no common fields available to join the two sources other than the time at which the job is executed and at which the EventID1 is generated.

Expectation is to logically group the events based on Date and derive the stats for each day.

I'm new to Splunk and i would really appreciate if you guys can provide suggestions on how to handle this one.

 

Expected Result

Date

Events

Count

Day 1

EventID1
EventID2
EventID3
- - -
EventID9

1
15
10
- -
8

Day 2

EventID1
EventID2
- - -
EventID9
EventID11

1
2
- -
18
6

             

Labels (1)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

Actually it's not so hard to achieve.

Here is another example, where I have added another Day 0 and some event dates inside and outside any day.

| makeresults
| eval _raw="DATE,Start_Time,End_Time
Day_3,2023-09-12 01:12:12.003,2023-09-13 01:13:13.993
Day_2,2023-09-11 01:11:11.002,2023-09-12 01:12:12.992
Day_1,2023-09-10 01:10:10.001,2023-09-11 01:11:11.991
Day_0,2023-09-04 01:12:12.000,2023-09-06 17:22:13.990"
| multikv forceheader=1
| table DATE Start_Time End_Time
| eval _time = strptime(Start_Time, "%F %T.%Q")
| eval end = strptime(End_Time, "%F %T.%Q"), start=_time
| append [ 
  | makeresults
  | eval _raw="Event type,Time,Others
EventID2,2023-09-11 01:20:20.133,  ``` INSIDE DAY 2 ```
EventID1,2023-09-11 01:11:11.132,  ``` INSIDE DAY 2 ```
EventID9,2023-09-10 01:20:30.131,  ``` INSIDE DAY 1 ```
EventID3,2023-09-10 01:20:10.130,  ``` INSIDE DAY 1 ```
EventID5,2023-09-10 01:10:20.129,  ``` INSIDE DAY 1 ```
EventID1,2023-09-10 01:10:10.128,  ``` INSIDE DAY 1 ```
EventID4,2023-09-07 01:10:10.127,  ``` OUTSIDE ANY ```
EventID3,2023-09-06 06:10:10.126,  ``` INSIDE DAY 0 ```
EventID2,2023-09-05 19:10:10.125,  ``` INSIDE DAY 0 ```
EventID1,2023-09-04 04:10:10.124,  ``` INSIDE DAY 0 ```
EventID0,2023-09-04 01:10:10.123," ``` OUTSIDE ANY ```
  | multikv forceheader=1
  | table Event_type Time
  | eval _time = strptime(Time, "%F %T.%Q")
  | eval eventTime=_time
  | fields - Time
]
| sort _time

| filldown DATE start end
| eval eventIsInside=case(isnull(Event_type), "YES", isnotnull(Event_type) AND _time>=start AND _time<=end, "YES", 1==1, "NO")
| where eventIsInside="YES"
| stats values(*_Time) as *_Time list(Event_type) as eventIDs list(eventTime) as eventTimes by DATE
| eval eventTimes=strftime(eventTimes, "%F %T.%Q")
| table DATE Start_Time End_Time eventIDs eventTimes

You can see that this works by making a common time, which is based on either start time or event time and then sorting by time.

Setting start and end epoch times for the source 1 data means you can then 'filldown' those fields to subsequent event (source 2) rows until the next source 1 Day.

Then as each event source 2 now has the preceeding day's start/end time, it can make the comparison for it's own time.

Hope this helps.

View solution in original post

bowesmana
SplunkTrust
SplunkTrust

If you paste this into your search window, you can see it being done with your example dataset

| makeresults
| eval _raw="DATE,Start_Time,End_Time
Day_3,2023-09-12 01:12:12.123,2023-09-13 01:13:13.123
Day_2,2023-09-11 01:11:11.123,2023-09-12 01:12:12.123
Day_1,2023-09-10 01:10:10.123,2023-09-11 01:11:11.123"
| multikv forceheader=1
| table DATE Start_Time End_Time
| eval _time = relative_time(strptime(Start_Time, "%F %T.%Q"), "@d")
| append [ 
  | makeresults
  | eval _raw="Event type,Time,Others
EventID2,2023-09-11 01:20:20.123,
EventID1,2023-09-11 01:11:11.123,
EventID9,2023-09-10 01:20:30.123,
EventID3,2023-09-10 01:20:10.123,
EventID5,2023-09-10 01:10:20.123,
EventID1,2023-09-10 01:10:10.123,"
  | multikv forceheader=1
  | table Event_type Time
  | eval _time = strptime(Time, "%F %T.%Q")
  | fields - Time
]
| bin _time span=1d
| stats list(*) as * count by _time

but the way you should do this is to

search source1 OR search source2
| eval _time = if(event=from_source_1, 
     relative_time(strptime(Start_Time, "%F %T.%Q"), "@d"), 
     strptime(Time, "%F %T.%Q"))
| bin _time span=1d
| stats list(*) as * count by _time

so this will create a _time field for the source 1 events that is the start of the day, it creates a _time field based on source 2 event times and then uses BIN to create a 1 day grouping and then stats list to collect them together. Count will always be one more than the source 2 events.

Note that this 

  • assumes each source 1 event only occurs once on a day
  • assumes that source 2 events will not occur outside the time range of source 1 range

 

0 Karma

Paluri
Explorer

@bowesmana  thanks for your inputs.

source 2 events are not tied to the physical clock and a single day in application could span multiple days in calendar or multiple days in application can fit in a single calendar day time frame.

i'm exploring the option of populating these two sources separately in dashboard and try to pass the source 1 date/time as inputs to source 2 and get the events by each logical date.

Tags (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Actually it's not so hard to achieve.

Here is another example, where I have added another Day 0 and some event dates inside and outside any day.

| makeresults
| eval _raw="DATE,Start_Time,End_Time
Day_3,2023-09-12 01:12:12.003,2023-09-13 01:13:13.993
Day_2,2023-09-11 01:11:11.002,2023-09-12 01:12:12.992
Day_1,2023-09-10 01:10:10.001,2023-09-11 01:11:11.991
Day_0,2023-09-04 01:12:12.000,2023-09-06 17:22:13.990"
| multikv forceheader=1
| table DATE Start_Time End_Time
| eval _time = strptime(Start_Time, "%F %T.%Q")
| eval end = strptime(End_Time, "%F %T.%Q"), start=_time
| append [ 
  | makeresults
  | eval _raw="Event type,Time,Others
EventID2,2023-09-11 01:20:20.133,  ``` INSIDE DAY 2 ```
EventID1,2023-09-11 01:11:11.132,  ``` INSIDE DAY 2 ```
EventID9,2023-09-10 01:20:30.131,  ``` INSIDE DAY 1 ```
EventID3,2023-09-10 01:20:10.130,  ``` INSIDE DAY 1 ```
EventID5,2023-09-10 01:10:20.129,  ``` INSIDE DAY 1 ```
EventID1,2023-09-10 01:10:10.128,  ``` INSIDE DAY 1 ```
EventID4,2023-09-07 01:10:10.127,  ``` OUTSIDE ANY ```
EventID3,2023-09-06 06:10:10.126,  ``` INSIDE DAY 0 ```
EventID2,2023-09-05 19:10:10.125,  ``` INSIDE DAY 0 ```
EventID1,2023-09-04 04:10:10.124,  ``` INSIDE DAY 0 ```
EventID0,2023-09-04 01:10:10.123," ``` OUTSIDE ANY ```
  | multikv forceheader=1
  | table Event_type Time
  | eval _time = strptime(Time, "%F %T.%Q")
  | eval eventTime=_time
  | fields - Time
]
| sort _time

| filldown DATE start end
| eval eventIsInside=case(isnull(Event_type), "YES", isnotnull(Event_type) AND _time>=start AND _time<=end, "YES", 1==1, "NO")
| where eventIsInside="YES"
| stats values(*_Time) as *_Time list(Event_type) as eventIDs list(eventTime) as eventTimes by DATE
| eval eventTimes=strftime(eventTimes, "%F %T.%Q")
| table DATE Start_Time End_Time eventIDs eventTimes

You can see that this works by making a common time, which is based on either start time or event time and then sorting by time.

Setting start and end epoch times for the source 1 data means you can then 'filldown' those fields to subsequent event (source 2) rows until the next source 1 Day.

Then as each event source 2 now has the preceeding day's start/end time, it can make the comparison for it's own time.

Hope this helps.

Paluri
Explorer

Thanks @bowesmana  this is closer to what i'm trying to achieve and has given me some idea on how to work on splunk searches of this complexity.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

the if (event=from_source1... is a test you will have to make using whatever fields you have to indicate the data is a souce1 event (source/sourcetype/index?)

If you need to only take source 2 events that are inside the source1 window or it can span more than one day, you'll have to do it a bit differently

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...