Solved: Re-write DBXquery to summaryindex daily

LearningGuy · 04-22-2024

Hello,
I have static data of about 200,000 rows (potentially growing) that needs to be moved to a summary index daily.
1) Is it possible to move the data from DBXquery to a summary index and rewrite it daily, so that no old data with an earlier _time remains after the rewrite?

2) Is it possible to use a summary index without _time and make it behave like DBXquery?

The reason I am doing this is that I want to manipulate the data (split, etc.) and move it to another "placeholder" other than a CSV or DBXquery, so that I can correlate it with another index.

For example:

| dbxquery query="SELECT * FROM Table_Test"

The scheduled report with summary indexing enabled then appends something like this:
| summaryindex spool=t uselb=t addtime=t index="summary" file="test_file" name="test" marker="hostname=\"https://test.com/\",report=\"test\""

Please suggest.
Thank you for your help.


LearningGuy · 04-22-2024

Hello,
Thank you again for your help.
Just to clarify: I cannot set _time to the exact time every time I query the data, correct?
So I need to filter on the last update if I want the most recent copy.

I am currently using a CSV as a lookup, but its limitation is size, as you mentioned.

I am trying to replace the CSV lookup by doing the following. Please let me know what you think.
https://community.splunk.com/t5/Splunk-Search/How-to-perform-lookup-from-index-search-with-dbxquery/...

index=vulnerability_index
| table ip_address, vulnerability, score
| append
    [| dbxquery query="select * from tableCompany"]
| stats values(*) as * by ip_address

bowesmana · 04-22-2024

If the data is in an index, it must have been indexed with a timestamp. So if an app was updated 45 days ago and that information was ingested into Splunk with a _time of 45 days ago, the only way to find that data is to search with a time range that covers that time.

But of course you don't know when it was updated.
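If you do need the most recent version of each record without knowing when it was last updated, one option is to search a window wide enough to cover every possible update and keep only the latest value per key. A rough sketch, assuming the data sits in index=summary, is keyed by ip_address (as in the earlier example), and that 90 days (a hypothetical figure) covers every update:

```spl
index=summary earliest=-90d@d
| stats latest(*) as * by ip_address
```

The wider the window, the more data the search has to scan, which is what the roll-forward technique avoids.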

I have used a technique in the past where I roll forward existing index data by running a search at, say, 1am that searches yesterday's data (earliest=-d@d latest=@d) and does a stats latest(*) as * by X Y Z.

Then use collect to write that data back to the index with the current timestamp (1am). Effectively, the index then holds all the items rolled forward from the previous day PLUS any new items added and collected to the same index during the day.

Naturally, you would need to massage the data so that any updates shift previous -> discard, current -> previous, new -> current.

That means your previous day's data is always the latest view of all versions.
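The 1am roll-forward search could look roughly like this. It is a sketch rather than a tested implementation: index=summary is an assumption, X Y Z are the placeholder key fields from the description, and the search would be saved as a report scheduled daily at 1am (cron 0 1 * * *). The eval sets _time explicitly so the collected events carry the run time.

```spl
index=summary earliest=-d@d latest=@d
| stats latest(*) as * by X Y Z
| eval _time=now()
| collect index=summary
```

New items added later the same day only need to be collected into the same index; the next 1am run then picks up both the rolled-forward and the new events. The previous -> discard, current -> previous, new -> current shift is not shown and would need extra eval logic.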

Not sure if this helps.

Have you tried using a KV store for the lookup? That's another story, but you can define accelerated fields on that data, which may make it perform faster than a standard lookup.

LearningGuy · 04-22-2024

Hi,

I am not sure what the purpose of rolling forward the existing index data is.
How do I use "collect" to write data in a scheduled report?
My understanding is that collect is a manual push. I am looking for an automatic daily update.

Where does a KV store lookup save the data? How do I move the DBXquery data to a KV store? Does it require "admin"?

What do you think about the "append" command in the previous post?
Thank you so much for your help.

bowesmana · 04-22-2024

I guess I'm only seeing half the picture here.

I understand you're trying to turn a lookup into an index, so the idea of rolling forward the data is to make 'yesterday' hold the entire dataset you care about, regardless of any update dates.

collect is just a Splunk command that you add to the end of your SPL.

Manual or automatic is about whether a search is scheduled or not, nothing to do with what the SPL does.

https://docs.splunk.com/Documentation/SplunkCloud/9.1.2312/SearchReference/collect

If you have a scheduled saved search, collect will simply write its results to the summary index. It is the same as enabling summary indexing on a scheduled saved search, but you have direct control of the parameters.
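As a sketch, a daily scheduled report that writes the DB Connect results straight into the summary index might end with collect like this. The connection name my_connection is hypothetical (dbxquery takes a connection argument), and the marker mirrors the one in the original question; scheduling the report daily is what makes it automatic.

```spl
| dbxquery connection="my_connection" query="SELECT * FROM Table_Test"
| collect index=summary marker="report=\"test\""
```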

The KV store uses a database inside Splunk; it used to be MongoDB, though I'm not sure if that's still the case. You don't need to care: for all intents and purposes, it's just a lookup backed by a database rather than a CSV.

https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/DefineaKVStorelookupinSplunkWeb

As for the append: I don't know what you're actually trying to merge together from the vulnerability index and the dbxquery. It's a perfectly valid technique for combining data, but what will you do with the result once you have it?

As I said, I've only got half the picture of your whole journey...

LearningGuy · 04-23-2024

Hi,
Thank you for your suggestion.
When I use a scheduled report and look at the recent searches, Splunk appends "| summaryindex" to the end of the search, not the "collect" command. So I thought "collect" always refers to a "manual" push, versus "summaryindex" in a scheduled report.
My understanding is that you have two scheduled reports. Is the following accurate?
1) roll forward existing data (from and to the same summary index, say index A)
2) push new data (from a different index to the summary index, say from index X to index A)

In my case, the data from DBXquery always gets rewritten, so I only need the latest data, but I may use your method in the future. Thanks for this.

Based on the link that you sent and the following post, it looks like I still need the CSV file (see below: it does an inputlookup from the CSV first, then an outputlookup to the KV store).
Is this correct? My goal is to avoid having a CSV file, since there is a size limit.
https://community.splunk.com/t5/Getting-Data-In/How-to-transfer-existing-CSV-data-to-kvstore/m-p/144...

| inputlookup filename.csv | outputlookup lookup_name

Thank you again.

bowesmana · 04-23-2024

summaryindex and collect are synonyms; I believe summaryindex is just an alias for the documented collect command.

Your understanding is correct regarding the two searches. (1) happens before (2), and (2) can be done as often as needed in the same day until (1) happens again the following day.

That link is about moving existing CSV contents to a KV store. You don't need a CSV to get data into a lookup. You can simply do:

search data
| outputlookup kv_store_lookup

Note that a KV store lookup is a lookup definition, not a lookup table file. A CSV is a lookup table file, but can also have a definition associated with it (and it's good practice), whereas a KV store lookup definition just requires the definition and an associated collection to be defined.

You can create collections using the Splunk App for Lookup File Editing:

https://docs.splunk.com/Documentation/SplunkCloud/latest/Knowledge/DefineaKVStorelookupinSplunkWeb
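Once a collection and a KV store lookup definition exist, and a daily scheduled search writes the dbxquery results to it with outputlookup, the correlation with the index can be a plain lookup instead of an append. A sketch, assuming the lookup definition is named kv_store_lookup (the placeholder name used above) and that the collection contains an ip_address field:

```spl
index=vulnerability_index
| lookup kv_store_lookup ip_address
```

Without an OUTPUT clause, lookup adds all of the lookup's non-match fields to the matching events.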

LearningGuy · 04-24-2024

Thanks for your suggestion.
I read the link you provided.
So I can't outputlookup data to a KV store without creating a KV store collection first, correct?
Should I create transforms.conf and collections.conf?
I don't have admin rights.

search data
| outputlookup kv_store_lookup

https://docs.splunk.com/Documentation/Splunk/9.2.1/Knowledge/ConfigureKVstorelookups
https://docs.splunk.com/Documentation/SplunkCloud/9.1.2312/SearchReference/Outputlookup

bowesmana · 04-25-2024

Unfortunately, adding KV stores does require a level of privilege; I believe you need the admin_all_objects capability.

You do have to create the collections.conf and transforms.conf entries. I suspect you will need to run this past the admin, as they will most likely have to create them for you. The Splunk App for Lookup File Editing does allow you to create KV store definitions, both collections and transforms, but you will need those privileges.

There may be some environmental issues around using the KV store that the admins would like to know about 😞

bowesmana · 04-22-2024

Splunk is a time series database, so you cannot have data without _time unless you store that data in a lookup.

If you are using DBXQuery to fetch data and store it in a summary index, it must have _time.

However, you can always set _time to the time you query the data, so if you do this daily, the data with a _time in the last 24 hours will be the most recent copy of the DBXQuery data.
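To correlate that latest daily copy with another index, one sketch is to search both indexes in a single search, each with its own time range, and group by the shared field. It assumes the daily copy was collected into index=summary with _time set to the run time, that both sides share an ip_address field, and that 7 days (hypothetical) is the right window for vulnerability_index. Unlike append, this does not pull the summary data through a subsearch, so subsearch result limits do not come into play.

```spl
(index=vulnerability_index earliest=-7d) OR (index=summary earliest=-24h)
| stats values(*) as * by ip_address
```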

By configuring the index retention period, you can control how long the data exists in the summary index.

Alternatively, you can write the data to a lookup (outputlookup command), which will overwrite any existing data in the lookup, so you only ever have the latest copy.
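A sketch of the lookup alternative as a daily scheduled search: outputlookup replaces the lookup's contents by default (append=false), so each run leaves only the latest copy. The connection name my_connection and the file name table_test.csv are hypothetical.

```spl
| dbxquery connection="my_connection" query="SELECT * FROM Table_Test"
| outputlookup table_test.csv
```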

Note that there are some size constraints on lookups that affect how they behave, but this could be an option.

I am not sure I understand your example of correlating with another index - you cannot use dbxquery on a Splunk index.
