Hi, I have a table as the main search, produced by the dbxquery below:
| dbxquery connection=my_connection query="SELECT id, start_date, end_date FROM my_table"
Sample records:
id, start_date, end_date
1, 2020-01-01, 2020-01-04
2, 2020-01-03, 2020-01-05
......
And I have another lookup CSV with just two columns:
date, amount
2020-01-01, 10
2020-01-02, 20
2020-01-03, 10
2020-01-04, 10
2020-01-05, 20
......
The output I want is:
id, start_date, end_date, total
1, 2020-01-01, 2020-01-04, 50 # total sum of 2020-01-01 to 2020-01-04 (10+20+10+10)
2, 2020-01-03, 2020-01-05, 40 # total sum of 2020-01-03 to 2020-01-05 (10+10+20)
What could be the best way to get this done? Thanks in advance!
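To make the requirement concrete, here is a small Python sketch of the desired calculation, using the sample rows above (the variable names are illustrative, not part of any Splunk API):

```python
from datetime import date

# Rows from the dbxquery: (id, start_date, end_date)
ranges = [
    (1, date(2020, 1, 1), date(2020, 1, 4)),
    (2, date(2020, 1, 3), date(2020, 1, 5)),
]

# Rows from lookup.csv: date -> amount
amounts = {
    date(2020, 1, 1): 10,
    date(2020, 1, 2): 20,
    date(2020, 1, 3): 10,
    date(2020, 1, 4): 10,
    date(2020, 1, 5): 20,
}

# total = sum of amounts whose date falls inside [start_date, end_date]
totals = {
    rid: sum(a for d, a in amounts.items() if start <= d <= end)
    for rid, start, end in ranges
}
print(totals)  # {1: 50, 2: 40}
```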
| dbxquery connection=my_connection query="SELECT id, start_date, end_date FROM my_table"
| eval row=mvrange(0,2)
| mvexpand row
| eval date=if(row=0, start_date, end_date)
| append
[| inputlookup lookup.csv]
| sort 0 date
| streamstats sum(amount) as running
| eventstats max(amount) as amount max(running) as running by date
| eventstats range(running) as increase by id
| where row=0
| eval total=amount+increase
| table id start_date end_date total
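The idea behind the streamstats/eventstats steps above is a running (prefix) sum over the lookup's amounts, so each range total becomes a difference of two running values, plus the amount on the start date itself (since the difference only covers dates after the start). A minimal Python sketch of that trick, with illustrative names and assuming the lookup has exactly one row per date:

```python
from datetime import date

# lookup.csv amounts, sorted by date (the "sort 0 date" step)
days = [date(2020, 1, d) for d in range(1, 6)]
amounts = [10, 20, 10, 10, 20]

# running[d] = (amount on d, cumulative sum through d)
# -- the streamstats sum(amount) step
running = {}
acc = 0
for d, a in zip(days, amounts):
    acc += a
    running[d] = (a, acc)

def total(start, end):
    # range(running) by id in the SPL: running(end) - running(start),
    # which covers (start, end], so add back the amount on start itself
    # (the "eval total=amount+increase" step).
    amt_start, run_start = running[start]
    _, run_end = running[end]
    return amt_start + (run_end - run_start)

print(total(date(2020, 1, 1), date(2020, 1, 4)))  # 50
print(total(date(2020, 1, 3), date(2020, 1, 5)))  # 40
```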
Thanks a lot! Below is my solution, but there are some minor differences in the totals. I'm not sure which part might be wrong.
| dbxquery connection=my_connection query="SELECT * FROM my_table"
| join max=0
[| inputlookup lookup.csv | fields *]
| where date >= start_date and date <= end_date
| stats sum(amount) as total by id, start_date, end_date
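The intent of that search, if join max=0 really did produce a Cartesian product of the two row sets, corresponds to this Python sketch (names are illustrative):

```python
from datetime import date
from itertools import product

# dbxquery rows: (id, start_date, end_date)
ranges = [
    (1, date(2020, 1, 1), date(2020, 1, 4)),
    (2, date(2020, 1, 3), date(2020, 1, 5)),
]
# lookup.csv rows: (date, amount)
lookup = [
    (date(2020, 1, 1), 10),
    (date(2020, 1, 2), 20),
    (date(2020, 1, 3), 10),
    (date(2020, 1, 4), 10),
    (date(2020, 1, 5), 20),
]

# Cartesian product of the two row sets (what "join max=0" is hoped to do),
# then the where-clause filter, then stats sum(amount) by id
totals = {}
for (rid, start, end), (d, amount) in product(ranges, lookup):
    if start <= d <= end:
        totals[rid] = totals.get(rid, 0) + amount
print(totals)  # {1: 50, 2: 40}
```

Whether Splunk's join actually behaves this way with no common field is the open question in this thread; the sketch only shows the arithmetic being attempted.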
What field is your join joining on? In your original question, you did not have any common fields between the fields returned by the dbxquery and the lookup file.
Assuming you do have a common field, the join will attach only one event from the lookup file, so you would be missing any data from the lookup file that did not have a corresponding value in the dbxquery results.
Unless I am missing something, I don't see how your query satisfies your requirement.
There is no common field between these two queries. The first one (dbxquery) has start_date/end_date, which are only used for a date-range check against the date field in the second query (lookup.csv).
My solution tries to get the Cartesian product first (max=0 is for this purpose); for each row in the first query, it should return something like the rows below:
id, start_date, end_date, date, amount
1, 2020-01-01, 2020-01-04, 2020-01-01, 10
1, 2020-01-01, 2020-01-04, 2020-01-02, 20
1, 2020-01-01, 2020-01-04, 2020-01-03, 10
1, 2020-01-01, 2020-01-04, 2020-01-04, 10
1, 2020-01-01, 2020-01-04, 2020-01-05, 20
......
Then use the date comparison check date >= start_date AND date <= end_date to filter out the unmatched rows from lookup.csv. After that, we get the following for the first row of the first query:
1, 2020-01-01, 2020-01-04, 2020-01-01, 10
1, 2020-01-01, 2020-01-04, 2020-01-02, 20
1, 2020-01-01, 2020-01-04, 2020-01-03, 10
1, 2020-01-01, 2020-01-04, 2020-01-04, 10
Aggregating the above rows with a sum, we get:
1, 2020-01-01, 2020-01-04, 50
Do you see anything not right? Thanks again!
As I said before, join needs a common field between the two data sources, and you don't appear to have one, at least not the way you have described your data and your search.
I searched and saw someone mention that Splunk could do a Cartesian product this way. I'll do more research; thanks again for your solution.