Splunk Search

sequential execution in join (or simulating join)

jgm1977
Engager

Hi,

I've been working with Splunk since August, after 20 years working with SQL, so a lot of this is new to me and I need some help.

I have a daily cron job that runs a search and writes the results to a CSV. Here is an example of the search that generates the data:

index=ans sourcetype="my_source" earliest=1
| stats max(DateLastUpdate) as DateLastUpdate by Ticket
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| table SavedDay Ticket DateLastUpdate
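
(For context, the scheduled job then appends these rows to the CSV. I'm sketching this part from memory; it is something along these lines, assuming the output file is the same my_lookup.csv used below and append mode:)

index=ans sourcetype="my_source" earliest=1
| stats max(DateLastUpdate) as DateLastUpdate by Ticket
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| table SavedDay Ticket DateLastUpdate
| outputlookup append=true "my_lookup.csv"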

When it ran on 20201019 (YYYYMMDD format), it wrote:

SavedDay  Ticket  DateLastUpdate
20201019  A1002   20201019
20201019  AD00X   20200920
...
20201019  ADVCX   20200520
20201019  LDS0X   20200415
20201019  SD10X   20200118


On 20201022, it wrote:

SavedDay  Ticket  DateLastUpdate
20201022  PPU0X   20201022
20201022  SD10X   20201021
20201022  XX438   20201021
20201022  44$5%   20201020
20201022  A1002   20201019
20201022  AD00X   20200920
...
20201022  ADVCX   20200520
20201022  LDS0X   20200415


The job failed on 20201021 and 20201022, and now I would like to re-execute it to regenerate the missing changes, but:

index=ans sourcetype="my_source" earliest=1
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| dedup SavedDay
| table SavedDay
| join type=left SavedDay [| inputlookup "my_lookup.csv"
                           | dedup SavedDay
                           | eval ExistsDay = 1]
| where isnull(ExistsDay)
| table SavedDay
| join type=inner SavedDay [search index=ans sourcetype="my_source" earliest=1
                           | eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
                           | ...]

It only returns SavedDay values 20201020 and 20201021, and not the rest of the historical data changes.

Another solution doing a "cartesian" join works, but I don't like it because it's inefficient in time and cost (around 8 minutes and 1 GB to regenerate 2 days).

index=ans sourcetype="my_source" earliest=1
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| dedup SavedDay
| table SavedDay
| join type=left max=0 [| inputlookup "my_lookup.csv"
                           | dedup SavedDay
                           | eval ExistsDay = 1]
| where isnull(ExistsDay)
| table SavedDay
| join type=inner SavedDay [search index=ans sourcetype="my_source" earliest=1
                           | eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
                           | ...]
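
(As an aside, a join-free way to list only the missing days might be to generate the candidate dates with makeresults and exclude the ones already saved via a NOT subsearch. An untested sketch, assuming the lookup holds one row per SavedDay and a 30-day lookback is enough:)

| makeresults count=30
| streamstats count as offset
| eval SavedDay=strftime(relative_time(now(),"-".offset."d@d"),"%Y%m%d")
| search NOT [| inputlookup "my_lookup.csv" | dedup SavedDay | fields SavedDay]
| table SavedDay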

Is there a way to do something similar to a loop (sequential execution), e.g. "Date_missed >= Date_of_sourcetype", with a subsearch, or by executing Python?
Could I execute searches or reports, passing them parameter values?

Thanks!!
Javier

1 Solution

jgm1977
Engager

I've found the solution using the map command, which works like a loop iteration.

| makeresults count=5000
| streamstats count as rownum
| eval InitialDay = "20200101"
| eval DiffDay = "-".rownum."d@d"
| eval SavedDay=strftime(relative_time(now(),DiffDay),"%Y%m%d")
| where SavedDay >= InitialDay
| join type=left SavedDay [| inputlookup "my_lookup.csv"
                           | dedup SavedDay
                           | eval ExistsDay = 1]
| where isnull(ExistsDay)
| where SavedDay >= "20190101"
and SavedDay < strftime(now(),"%Y%m%d")
| table SavedDay
| map search="search index=ans sourcetype=\"my_source\" earliest=1
| eval year = strftime(strptime(CreationDate,\"%Y-%m-%d %H:%M:%S\"),\"%Y\") | where year>=2020
| eval Day=$SavedDay$
| where Day>= strftime(strptime(UpdateDate,\"%Y-%m-%d %H:%M:%S\"),\"%Y%m%d\")
| stats max(UpdateDate) as UpdateDatef by Ticket Day
| join type=left Ticket Day [| inputlookup \"my_lookup.csv\"
| eval IsIn = 1]
| where isnull(IsIn)
.
.
.
| table Ticket Day CreationDate UpdateDate"
| table Ticket Day CreationDate UpdateDate

A little bit annoying because of the double quotes and escaping, but it works!
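
One caveat I noticed: map only launches up to maxsearches subsearches (the default is 10), so with many missing days it needs to be raised. A minimal sketch of the per-row token substitution, reusing the index and sourcetype from above:

| makeresults count=3
| streamstats count as n
| eval SavedDay=strftime(relative_time(now(),"-".n."d@d"),"%Y%m%d")
| table SavedDay
| map maxsearches=100 search="search index=ans sourcetype=my_source earliest=1 | head 1 | eval Day=$SavedDay$ | table Day"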


Javier
