Splunk Search

Sequential or loop execution in a join (or simulating a join)

jgm1977
Engager

Hi,

I've been working with Splunk since August, after 20 years working in SQL. There are a lot of new things, and I need some help.

I have a daily cron job that runs a search and writes the results to a CSV. This is an example of the data it generates:

index=ans sourcetype="my_source" earliest=1
| stats max(DateLastUpdate) as DateLastUpdate by Ticket
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| table SavedDay Ticket DateLastUpdate
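
Each day's output ends up appended to a lookup file, my_lookup.csv, which I check later. Roughly, it is the search above with a write at the end (the append may be done differently in the real job; this is just to show where the CSV comes from):

index=ans sourcetype="my_source" earliest=1
| stats max(DateLastUpdate) as DateLastUpdate by Ticket
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| table SavedDay Ticket DateLastUpdate
``` append the day's snapshot to the lookup that the searches below read ```
| outputlookup append=true "my_lookup.csv"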

When it was executed on 20201019 (YYYYMMDD format), it wrote:

SavedDay   Ticket   DateLastUpdate
20201019   A1002    20201019
20201019   AD00X    20200920
...
20201019   ADVCX    20200520
20201019   LDS0X    20200415
20201019   SD10X    20200118

 

On 20201022, it wrote:

SavedDay   Ticket   DateLastUpdate
20201022   PPU0X    20201022
20201022   SD10X    20201021
20201022   XX438    20201021
20201022   44$5%    20201020
20201022   A1002    20201019
20201022   AD00X    20200920
...
20201022   ADVCX    20200520
20201022   LDS0X    20200415

 

The job failed for 20201021 and 20201022, and now I would like to re-execute it to generate the missing changes, but:

index=ans sourcetype="my_source" earliest=1
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| dedup SavedDay
| table SavedDay
| join type=left SavedDay [| inputlookup "my_lookup.csv"
                           | dedup SavedDay
                           | eval ExistsDay = 1]
| where isnull(ExistsDay)| table SavedDay
| join type=inner SavedDay [search index=ans sourcetype="my_source" earliest=1
                           | eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
                           | ...]

It only returns SavedDay values 20201020 and 20201021, and not the rest of the historical data changes.

Another solution, doing a "cartesian", works, but I don't like it because it isn't efficient in time or cost (around 8 minutes and 1 GB to regenerate 2 days):

index=ans sourcetype="my_source" earliest=1
| eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
| dedup SavedDay
| table SavedDay
| join type=left max=0 [| inputlookup "my_lookup.csv"
                           | dedup SavedDay
                           | eval ExistsDay = 1]
| where isnull(ExistsDay)| table SavedDay
| join type=inner SavedDay [search index=ans sourcetype="my_source" earliest=1
                           | eval SavedDay=strftime(relative_time(now(),"-1d@d"),"%Y%m%d")
                           | ...]

Is there a way to do something like a loop (or sequential execution), e.g. "Date_missed >= Date_of_sourcetype", with a subsearch or by executing Python?
Can I execute searches or reports, passing parameter values?

Thanks!!
Javier

1 Solution

jgm1977
Engager

I've found the solution using the "map" command, which works like a loop iteration.

| makeresults count=5000
| streamstats count as rownum
| eval SavedDay=strftime(now(),"%Y%m%d")
| eval InitialDay = "20200101"
| eval DiffDay = "-".rownum."d@d"
| eval SavedDay=strftime(relative_time(now(),DiffDay),"%Y%m%d")
| where SavedDay >= InitialDay
| join type=left SavedDay[| inputlookup "my_lookup.csv"
                           | dedup SavedDay
                           | eval ExistsDay = 1]
| where isnull(ExistsDay)
| where SavedDay >= "20190101"
and SavedDay < strftime(now(),"%Y%m%d")
| table SavedDay
| map search="search index=ans sourcetype=\"my_source\" earliest=1
| eval year = strftime(strptime(CreationDate,\"%Y-%m-%d %H:%M:%S\"),\"%Y\") | where year>=2020
| eval Day=$SavedDay$
| where Day>= strftime(strptime(UpdateDate,\"%Y-%m-%d %H:%M:%S\"),\"%Y%m%d\")
| stats max(UpdateDate) as UpdateDatef by Ticket Day
| join type=left Ticket Day [| inputlookup \"my_lookup.csv\"
| eval IsIn = 1]
| where isnull(IsIn)
.
.
.
| table Ticket Day CreationDate UpdateDate"
| table Ticket Day CreationDate UpdateDate

A little bit annoying because of the double quotes and escaping, but it works!
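
For anyone who wants to reuse this, here is the same map pattern stripped down, without all the escaping from my real search. count=30 and maxsearches=100 are just example values; by default map stops after 10 inner searches, so raise maxsearches to cover every missing day:

| makeresults count=30
``` one row per candidate day, going back 30 days ```
| streamstats count as rownum
| eval SavedDay=strftime(relative_time(now(), "-".rownum."d@d"), "%Y%m%d")
``` keep only the days that are not already in the lookup ```
| join type=left SavedDay
    [| inputlookup "my_lookup.csv"
     | dedup SavedDay
     | eval ExistsDay=1]
| where isnull(ExistsDay)
| table SavedDay
``` run the rebuild search once per missing day ```
| map maxsearches=100 search="search index=ans sourcetype=\"my_source\" earliest=1
    | eval Day=$SavedDay$
    | where Day >= strftime(strptime(UpdateDate, \"%Y-%m-%d %H:%M:%S\"), \"%Y%m%d\")
    | stats max(UpdateDate) as UpdateDate by Ticket Day"
| table Ticket Day UpdateDate

map runs the quoted search once per input row and replaces $SavedDay$ with that row's value, which is what makes it behave like a loop.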

 

Javier
