Splunk Search

Sequential execution in join (or simulating join)

jgm1977
Engager

Hi,

I've been working with Splunk since August, after 20 years in SQL. There are a lot of new things, and I need some help.

I have a daily cron job that runs a search and writes the results to a CSV. This is an example of the data it generates:

index=ans sourcetype="my_source" earliest=1
| stats max(DateLastUpdate) as DateLastUpdate by Ticket
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| table SavedDay Ticket DateLastUpdate
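For readers coming from SQL: `relative_time(now(), "-1d@d")` subtracts one day and snaps to midnight, so `SavedDay` is always yesterday's date. A minimal Python sketch of that one expression (the `saved_day` helper is hypothetical, just for illustration):

```python
from datetime import datetime, timedelta

def saved_day(now: datetime) -> str:
    """Equivalent of strftime(relative_time(now(), "-1d@d"), "%Y%m%d"):
    go back one day, snap to midnight, format as YYYYMMDD."""
    yesterday = (now - timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return yesterday.strftime("%Y%m%d")

# A run on 2020-10-20 stamps every row with SavedDay=20201019
print(saved_day(datetime(2020, 10, 20, 6, 30)))  # → 20201019
```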

When it ran on 20201019 (YYYYMMDD format), it wrote:

SavedDay  Ticket  DateLastUpdate
20201019  A1002   20201019
20201019  AD00X   20200920
...
20201019  ADVCX   20200520
20201019  LDS0X   20200415
20201019  SD10X   20200118

 

On 20201022, it wrote:

SavedDay  Ticket  DateLastUpdate
20201022  PPU0X   20201022
20201022  SD10X   20201021
20201022  XX438   20201021
20201022  44$5%   20201020
20201022  A1002   20201019
20201022  AD00X   20200920
...
20201022  ADVCX   20200520
20201022  LDS0X   20200415

 

The job failed on 20201021 and 20201022, and now I would like to re-run it to generate the missing days' changes, but:

index=ans sourcetype="my_source" earliest=1
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| dedup SavedDay
| table SavedDay
| join type=left SavedDay [| inputlookup "my_lookup.csv"
                           | dedup SavedDay
                           | eval ExistsDay = 1]
| where isnull(ExistsDay)| table SavedDay
| join type=inner SavedDay [search index=ans sourcetype="my_source" earliest=1
                           | eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
                           | ...]

This only returns SavedDay values 20201020 and 20201021, not the rest of the historical data changes.

Another solution doing a "cartesian" join works, but I don't like it because it is inefficient in time and cost (around 8 minutes and 1 GB to regenerate 2 days):

index=ans sourcetype="my_source" earliest=1
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| dedup SavedDay
| table SavedDay
| join type=left max=0 [| inputlookup "my_lookup.csv"
                           | dedup SavedDay
                           | eval ExistsDay = 1]
| where isnull(ExistsDay)| table SavedDay
| join type=inner SavedDay [search index=ans sourcetype="my_source" earliest=1
                           | eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
                           | ...]
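The `join type=left` + `where isnull(ExistsDay)` pattern above is just an anti-join: candidate days minus the days already present in the lookup. Sketched in Python with illustrative data (not the real lookup):

```python
from datetime import date, timedelta

def missing_days(start: date, end: date, lookup_days: set) -> list:
    """Anti-join: every SavedDay in [start, end) that is absent
    from the lookup's SavedDay column."""
    out = []
    d = start
    while d < end:
        day = d.strftime("%Y%m%d")
        if day not in lookup_days:
            out.append(day)
        d += timedelta(days=1)
    return out

# The lookup has runs through 20201019; the 20201021/20201022 runs failed,
# so SavedDay 20201020 and 20201021 are missing.
existing = {"20201019"}  # ... plus all earlier days
print(missing_days(date(2020, 10, 20), date(2020, 10, 22), existing))
# → ['20201020', '20201021']
```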


Is there a way to do something like a loop (sequential execution), e.g. "Date_missed >= Date_of_sourcetype", with a subsearch, or by executing Python?
Can I execute searches or reports, passing them parameter values?

Thanks!!
Javier

1 Solution

jgm1977
Engager

I've found the solution using the "map" command, which works as a loop iteration.

| makeresults count=5000
| streamstats count as rownum
| eval InitialDay = "20200101"
| eval DiffDay = "-".rownum."d@d"
| eval SavedDay=strftime(relative_time(now(),DiffDay),"%Y%m%d")
| where SavedDay >= InitialDay
| join type=left SavedDay[| inputlookup "my_lookup.csv"
                           | dedup SavedDay
                           | eval ExistsDay = 1]
| where isnull(ExistsDay)
| where SavedDay >= "20190101"
and SavedDay < strftime(now(),"%Y%m%d")
| table SavedDay
| map search="search index=ans sourcetype=\"my_source\" earliest=1
| eval year = strftime(strptime(CreationDate,\"%Y-%m-%d %H:%M:%S\"),\"%Y\") | where year>=2020
| eval Day=$SavedDay$
| where Day>= strftime(strptime(UpdateDate,\"%Y-%m-%d %H:%M:%S\"),\"%Y%m%d\")
| stats max(UpdateDate) as UpdateDatef by Ticket Day
| join type=left Ticket Day [| inputlookup \"my_lookup.csv\"
| eval IsIn = 1]
| where isnull(IsIn)
| ...
| table Ticket Day CreationDate UpdateDate"
| table Ticket Day CreationDate UpdateDate

A little annoying because of the double quotes and escaping, but it works!
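What `map` does here, in plain terms: for each row of the outer search it substitutes that row's field values into the `$SavedDay$` token and dispatches one inner search per row. The control flow, sketched in Python (the template rendering is illustrative; a real implementation would dispatch each rendered search to Splunk):

```python
# Hypothetical template standing in for the map search string;
# {SavedDay} plays the role of Splunk's $SavedDay$ token.
TEMPLATE = 'search index=ans sourcetype="my_source" earliest=1 | eval Day={SavedDay} | ...'

def map_command(rows, template):
    """Loop over outer-search rows, substituting each row's fields
    into the template -- one inner search per row, like map does."""
    searches = []
    for row in rows:
        searches.append(template.format(**row))
    return searches

missing = [{"SavedDay": "20201020"}, {"SavedDay": "20201021"}]
for s in map_command(missing, TEMPLATE):
    print(s)  # each rendered search would be dispatched in turn
```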

 

Javier


