Splunk Search

How to search each value of a lookup table and update the corresponding column using SPL?

Taruchit
Contributor

Hi All,

I have a lookup table, table1.csv, with the following fields:

index
sourcetype
host
last_seen

I have a custom index, idx1, which has the following fields:
orig_index
orig_sourcetype
orig_host

I need to search for each host value from the lookup table in the custom index, fetch the max(_time), and then store that value against the same host in last_seen.

I tried the SPL below to build the search, but it does not fetch any results:

|inputlookup table1.csv 
|eval index=lower(index)
|eval host=lower(host)
|eval sourcetype=lower(sourcetype)
|table index, host, sourcetype
|rename index AS orig_index, host AS orig_host, sourcetype AS orig_sourcetype
|format
|eval searchq=search
|eval searchq="index=\"idx1\"".searchq."|stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host"
|search searchq

However, when I used 

|fields searchq

It gave a proper SPL string as the result:

index="idx1" (orig_host="1.1.1.1" AND orig_index="xxx" AND orig_sourcetype="sourcetype1") OR (orig_host="1.1.1.2" AND orig_index="xxx" AND orig_sourcetype="sourcetype2"))|stats max(_time) AS last_time BY orig_index, orig_sourcetype, orig_host

And when I run the resulting SPL above as a separate search, I get the expected results.

Thus, please share if there is a way to correct the above approach, or if a different approach can help build the solution.

Thank you

1 Solution

JacekF
Path Finder

Will the below query meet your requirements?

index="idx1" 
    [| inputlookup table1.csv 
    | eval index=lower(index) 
    | eval host=lower(host) 
    | eval sourcetype=lower(sourcetype) 
    | table index, host, sourcetype 
    | rename index AS orig_index, host AS orig_host, sourcetype AS orig_sourcetype] 
| stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host
| rename orig_index as index orig_sourcetype as sourcetype orig_host as host
| append [ | inputlookup table1.csv ]
| dedup index sourcetype host 
| outputlookup table1.csv

Please note that if you have many rows in your lookup file, this solution will be time- and resource-consuming, as it needs to load the entire lookup twice. Because dedup keeps the first occurrence of each index/sourcetype/host combination, the freshly computed last_seen rows (which come before the appended lookup rows) take precedence over the old values.

 

Taruchit
Contributor

Hi @JacekF,

Thank you for your inputs and for sharing the code. The SPL does meet the requirement; however, I observed some issues for which I will need your guidance.

I executed the code you shared and removed only the last line containing the outputlookup command, so that I could test and see the results before data gets stored in the table. Also, in order to confirm whether only the hosts in the lookup table are getting updated, I used a search command to filter a couple of hosts.

index="idx1" 
    [| inputlookup table1.csv 
     | search host="10.1.1.1" OR host="10.1.1.2"
    | eval index=lower(index) 
    | eval host=lower(host) 
    | eval sourcetype=lower(sourcetype) 
    | table index, host, sourcetype 
    | rename index AS orig_index, host AS orig_host, sourcetype AS orig_sourcetype] 
| stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host
| rename orig_index as index orig_sourcetype as sourcetype orig_host as host
| append [ | inputlookup table1.csv ]
| dedup index sourcetype host 

In the result, I see that the last_seen value for the two hosts is changed to the current date and time, and the remaining hosts have the old values (which is the expected result).

However, I noted that the SPL gave 50,000 records [Statistics (50,000)] as the result.

But when I read the lookup table in a separate search:

|inputlookup table1.csv

The above command gives 50,653 records. 

 

Thus, I would like your help to understand why there is a mismatch in the number of records between the final result and the inputlookup command, and how we can resolve that issue.

Thank you


JacekF
Path Finder

Most likely you hit the stats command limit. This can be changed in the limits.conf file. You may find this thread useful: Solved: Why do I receive "Limit (50000 results) reached." ... - Splunk Community
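
For illustration, a minimal limits.conf sketch (an assumption on my part that the [stats] maxresultrows setting, which defaults to 50000, is the limit being hit; check the linked thread and your environment before changing it):

[stats]
# assumed relevant setting: maximum number of result rows the stats command returns (default 50000)
maxresultrows = 100000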


Taruchit
Contributor

Hi @JacekF,

Thank you for your inputs.

|inputlookup table1.csv

The above SPL gives 50,653 records.

|inputlookup table1.csv
|stats count BY index, sourcetype, host

|inputlookup table1.csv
|stats values(last_seen) BY index, sourcetype, host

The above two SPLs also give 50,653 records.

 

Thus, if there were a restriction of 50,000 results for the stats command, I think the second SPL should also have returned fewer results (capped at 50,000).

Please share if my understanding is correct or if any correction is needed. 

Thank you


JacekF
Path Finder

You are reaching one of the Splunk limits defined in limits.conf. This is why you receive a total of 50,000 records. I would suggest that you mark this thread as resolved and open a new one regarding the limited number of rows in the result. On the one hand, this will allow others with the same problem as the initial one in this thread to find a solution; on the other hand, you will be more likely to find the answer to the limited-results problem, as more people will take a look at it.


gcusello
SplunkTrust

Hi @Taruchit,

At first, I see that you have a field (searchq) that isn't present in your stats command, and after a stats command you have only the fields in the stats.

In addition, you use as keys orig_index, orig_sourcetype, and orig_host, which you renamed, so they no longer exist.

Then, in the searchq construction, you have to add a space between the first string and the content of the search field.
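
For illustration, the eval from the original search would become something like this (a sketch of just that one fix; the other issues above still apply):

|eval searchq="index=\"idx1\" ".searchq."|stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host"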

Finally, you probably have many records in the lookup; do you want to execute all of them?

I don't like this approach and I'm not sure that it can run, but anyway, try something like this:

| inputlookup table1.csv 
| eval 
   orig_index=lower(index), 
   orig_host=lower(host), 
   orig_sourcetype=lower(sourcetype)
| eval searchq="index=idx1"." orig_index=".lower(index)." orig_sourcetype=".orig_sourcetype." orig_host=".orig_host
| search searchq

Ciao.

Giuseppe

 


Taruchit
Contributor

Hi @gcusello,

Thank you for sharing your inputs and the SPL.

I tried to execute the code below, but it does not fetch any results.

| inputlookup table1.csv 
| eval 
   orig_index=lower(index), 
   orig_host=lower(host), 
   orig_sourcetype=lower(sourcetype)
| eval searchq="index=idx1"." orig_index=".lower(index)." orig_sourcetype=".orig_sourcetype." orig_host=".orig_host
| search searchq

 

In order to check the SPL that got formed and stored in the field searchq, I used the code below:

| inputlookup table1.csv 
| eval 
   orig_index=lower(index), 
   orig_host=lower(host), 
   orig_sourcetype=lower(sourcetype)
| eval searchq="index=idx1"." orig_index=".lower(index)." orig_sourcetype=".orig_sourcetype." orig_host=".orig_host." |stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host |rename orig_* AS *"
| fields searchq

The above code gave a line-by-line SPL string for each host and its corresponding sourcetype and index.

Thus, can you please share how to execute each SPL from the result and store the results in the lookup table?

Thank you


gcusello
SplunkTrust

Hi @Taruchit,

Your search cannot run because after a stats command you have only the fields used in the stats; in your case, searchq isn't one of them.

To use the last row, you have to add searchq to the stats command.

Ciao.

Giuseppe


yuanliu
SplunkTrust

Let me first clarify the requirements: You want to rebuild table1.csv without having to search the whole history.  For example, if your new search outputs only one host, you want a command to update only that host.

Because a lookup table is not a real database, there is no easy way to do this. The problem with your approach is that your | search searchq cannot operate on any indexed data. The preceding generating command is | inputlookup, which only outputs data from table1.csv.

Potentially you can use join or append and some complicated manipulation to achieve what you need, but it may not be cheaper than simply rebuilding the lookup.

To reduce the cost of searching the entire history, consider using tstats.  For example,

| tstats max(_time) as last_seen where earliest=0 by index sourcetype host

Taruchit
Contributor

Hi @yuanliu,

Thank you for your response and sharing your inputs.

I have the following points based on your message:

1. Clarifying the requirement:
I have a lookup table, table1.csv, which has a list of all hosts along with their respective index, sourcetype, and last_seen (in epoch/Unix format).

I also have a summary index (custom index), idx1, which has fields such as orig_index, orig_sourcetype, and orig_host.

I need to take each host in the lookup table, search for it with its respective orig_index and orig_sourcetype in the summary index, fetch the max(_time) value, and store it in the lookup table against that host.

Thus, the end goal is to have the lookup table updated with the last time each host was reported in the summary index. The SPL that carries out that operation will be scheduled to compute results at regular intervals.

2. Thus, I will not be able to use the |tstats command in the SPL, because I am using a custom index to fetch the results.

Please share if you need any more details from my end.

Additionally, do we have any way of invoking the value of the field "searchq" and using it as SPL to fetch the results?

Thank you


yuanliu
SplunkTrust

@Taruchit Thanks for the explanation. The key question remains whether this summary index idx1 contains complete records or just partial records. If it contains complete records, it is much easier to rebuild table1.csv than to try to selectively update rows. The latter requires random record access, which CSV doesn't support outright; or rather, which outputlookup doesn't support. (I assume that your goal is to use outputlookup to update the table.)

Assuming that idx1 contains complete records, you can still use stats or tstats against this index to build table1.csv, e.g.,

 

| tstats max(_time) as last_seen where index=idx1 earliest=0 by orig_index orig_sourcetype orig_host
| rename orig_* as *
| outputlookup table1.csv
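
For comparison, a plain stats version of the same rebuild (a sketch only, in case tstats is not an option in your environment; it searches the idx1 events directly and may be slower):

index=idx1 earliest=0
| stats max(_time) AS last_seen BY orig_index orig_sourcetype orig_host
| rename orig_* AS *
| outputlookup table1.csv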

 

Alternatively, you can exclude idx1 in tstats so it searches only the original indices, e.g.,

 

| tstats max(_time) as last_seen where index!=idx1 earliest=0 by index sourcetype host
| outputlookup table1.csv

 

Hope this helps.


Taruchit
Contributor

Hi @yuanliu,

Thank you for sharing your inputs.

In my scenario, the summary index idx1 has data for hundreds of actual indexes. But I need the lookup table to be updated with the latest timestamp only for those hosts that are already stored in it.

The list of index values in the lookup table is a subset of the list of index values in the summary index.

The data in the lookup table is not governed by me; thus, I do not have visibility into the list of index values for which the hosts are stored in the table.

As a result, I will need to build a dynamic solution that takes each index value and its corresponding sourcetype and host from the lookup table, fetches max(_time) from the summary index, and then stores it back in the lookup table.

Thus, please help by sharing your inputs.

Thank you


yuanliu
SplunkTrust

To update only selected rows in a CSV file, you'd need an external data management system such as a traditional DBMS. It would be easier to base your lookup on that DBMS in that case.

However,


In my scenario, the summary index idx1 has data for hundreds of actual indexes. But I need the lookup table to be updated with the latest timestamp only for those hosts that are already stored in it.

I could be interpreting the underlined part wrong. But to me, that means it contains all the data that table1.csv needs. Therefore, the following should suffice:

| tstats max(_time) as last_seen where index=idx1 earliest=0 by orig_index orig_sourcetype orig_host
| rename orig_* as *
| outputlookup table1.csv

Have you tried it? You can try it without outputlookup to assess performance. You mentioned that you will run this at an interval of a week or so. This search shouldn't take very long, nor is it very demanding, even with a fairly large idx1.
