Splunk Search

How do I search each value of a lookup table and update the corresponding column using SPL?

Taruchit
Contributor

Hi All,

I have a lookup table table1.csv with following fields: -

index
sourcetype
host
last_seen

I have a custom index: idx1 which has following fields: -
orig_index
orig_sourcetype
orig_host

I need to search for each host value from the lookup table in the custom index, fetch the max(_time), and then store that value against the same host in last_seen.

I tried the below SPL to build the search dynamically, but it is not fetching any results: -

|inputlookup table1.csv 
|eval index=lower(index)
|eval host=lower(host)
|eval sourcetype=lower(sourcetype)
|table index, host, sourcetype
|rename index AS orig_index, host AS orig_host, sourcetype AS orig_sourcetype
|format
|eval searchq=search
|eval searchq="index=\"idx1\"".searchq."|stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host"
|search searchq

However, when I used 

|fields searchq

It gave a proper SPL as the result: -

index="idx1" (orig_host="1.1.1.1" AND orig_index="xxx" AND orig_sourcetype="sourcetype1") OR (orig_host="1.1.1.2" AND orig_index="xxx" AND orig_sourcetype="sourcetype2"))|stats max(_time) AS last_time BY orig_index, orig_sourcetype, orig_host

And when I run the above resulting SPL as a separate search, I get the proper results.

Thus, please share if there is a way to correct the above approach, or if a different approach can help build the solution.

Thank you

1 Solution

JacekF
Path Finder

Will the below query meet your requirements?

index="idx1" 
    [| inputlookup table1.csv 
    | eval index=lower(index) 
    | eval host=lower(host) 
    | eval sourcetype=lower(sourcetype) 
    | table index, host, sourcetype 
    | rename index AS orig_index, host AS orig_host, sourcetype AS orig_sourcetype] 
| stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host
| rename orig_index as index orig_sourcetype as sourcetype orig_host as host
| append [ | inputlookup table1.csv ]
| dedup index sourcetype host 
| outputlookup table1.csv

Please note that if you have many rows in your lookup file, this solution will be time- and resource-consuming, as it needs to load the entire lookup twice.
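For reference (an illustrative sketch, not new logic): at search time the subsearch above expands into an OR-ed filter over the lookup rows, similar to the generated string shown earlier in the thread. With two hypothetical rows it would look roughly like:

```
index="idx1"
    ((orig_index="xxx" AND orig_sourcetype="sourcetype1" AND orig_host="1.1.1.1")
     OR (orig_index="xxx" AND orig_sourcetype="sourcetype2" AND orig_host="1.1.1.2"))
| stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host
```

This is also why the approach scales with the lookup size: the expanded filter grows with every row in table1.csv.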

 


Taruchit
Contributor

Hi @JacekF,

Thank you for your inputs and for sharing the code. The SPL does meet the requirement; however, I observed some issues for which I will need your guidance.

I executed the code you shared and only removed the last line with the outputlookup command, so that I could test and see the results before data gets stored in the table. Also, to confirm whether only the hosts in the lookup table get updated, I used a search command to filter a couple of hosts.

index="idx1" 
    [| inputlookup table1.csv 
     | search host="10.1.1.1" OR host="10.1.1.2"
    | eval index=lower(index) 
    | eval host=lower(host) 
    | eval sourcetype=lower(sourcetype) 
    | table index, host, sourcetype 
    | rename index AS orig_index, host AS orig_host, sourcetype AS orig_sourcetype] 
| stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host
| rename orig_index as index orig_sourcetype as sourcetype orig_host as host
| append [ | inputlookup table1.csv ]
| dedup index sourcetype host 

And I see that in the result, the last_seen value for the two hosts is changed to the current date/time, and the remaining hosts have the old values (which is the expected result).

However, I noted that the SPL gave 50,000 records [Statistics (50,000)] as the result.

But, when I read the lookup table in separate search: -

|inputlookup table1.csv

The above command gives 50,653 records. 

 

Thus, I would like your help to understand why there is a mismatch in the number of records between the final result and the inputlookup command, and how we can resolve that issue.

Thank you


JacekF
Path Finder

Most likely you hit the stats command limit. This can be changed in the limits.conf file. You may find this thread useful: Solved: Why do I receive "Limit (50000 results) reached." ... - Splunk Community
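For reference, the 50,000-row ceiling discussed here typically comes from maxresultrows in limits.conf. A minimal sketch of the stanza to adjust on the search head (the stanza name and default are taken from the limits.conf spec as I understand it; verify against your Splunk version before changing it):

```
# limits.conf
[stats]
# default is 50000; raising it increases memory use on the search head
maxresultrows = 100000
```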


Taruchit
Contributor

Hi @JacekF,

Thank you for inputs.

|inputlookup table1.csv

The above SPL gives 50,653 records.

|inputlookup table1.csv
|stats count BY index, sourcetype, host

|inputlookup table1.csv
|stats values(last_seen) BY index, sourcetype, host

The above two SPLs also give 50,653 records.

 

Thus, if there were a restriction of 50,000 results for the stats command, I think the second SPL should have returned fewer results.

Please share if my understanding is correct or if any correction is needed. 

Thank you


JacekF
Path Finder

You are reaching one of the Splunk limits defined in limits.conf. This is why you receive a total of 50,000 records. I would suggest that you mark this thread as resolved and open a new one about the limited number of rows in the result. On the one hand, this will allow others with the same problem as the initial one in this thread to find a solution; on the other hand, you will be more likely to find an answer to the limited-results problem, as more people will take a look at it.


gcusello
SplunkTrust
SplunkTrust

Hi @Taruchit,

at first, I see that you have a field (searchq) that isn't present in your stats command, and after a stats command you have only the fields listed in the stats.

In addition, you use as keys orig_index, orig_sourcetype and orig_host, which you renamed, so they no longer exist.

Then, in the searchq construction, you have to add a space between the first string and the content of the search field.

Lastly, you probably have many records in the lookup; do you want to execute a search for each of them?

I don't like this approach and I'm not sure that it can run, but anyway, try something like this:

| inputlookup table1.csv 
| eval 
   orig_index=lower(index), 
   orig_host=lower(host), 
   orig_sourcetype=lower(sourcetype)
| eval searchq="index=idx1"." orig_index=".lower(index)." orig_sourcetype=".orig_sourcetype." orig_host=".orig_host
| search searchq

Ciao.

Giuseppe

 


Taruchit
Contributor

Hi @gcusello,

Thank you for sharing your inputs and the SPL.

I tried to execute the below code but it does not fetch me any results. 

| inputlookup table1.csv 
| eval 
   orig_index=lower(index), 
   orig_host=lower(host), 
   orig_sourcetype=lower(sourcetype)
| eval searchq="index=idx1"." orig_index=".lower(index)." orig_sourcetype=".orig_sourcetype." orig_host=".orig_host
| search searchq

 

In order to check the SPL that got formed and stored in the searchq field, I used the below code: -

| inputlookup table1.csv 
| eval 
   orig_index=lower(index), 
   orig_host=lower(host), 
   orig_sourcetype=lower(sourcetype)
| eval searchq="index=idx1"." orig_index=".lower(index)." orig_sourcetype=".orig_sourcetype." orig_host=".orig_host." |stats max(_time) AS last_seen BY orig_index, orig_sourcetype, orig_host |rename orig_* AS *"
| fields searchq

The above code gave a line-by-line SPL for each host and its corresponding sourcetype and index.

Thus, can you please share how to execute each SPL from the result and store the results in the lookup table?

Thank you


gcusello
SplunkTrust
SplunkTrust

Hi @Taruchit,

your search cannot run because after a stats command you have only the fields in the stats; in your case, searchq isn't one of them.

To use the last row, you have to add searchq to the stats command.

Ciao.

Giuseppe


yuanliu
SplunkTrust
SplunkTrust

Let me first clarify the requirements: You want to rebuild table1.csv without having to search the whole history.  For example, if your new search outputs only one host, you want a command to update only that host.

Because a lookup table is not a real database, there is no easy way to do this.  The problem with your approach is that your | search searchq cannot operate on any indexed data: the preceding generating command is | inputlookup, which only outputs data from table1.csv.  (Note also that | search searchq treats searchq as a literal search term; it does not execute the string stored in that field.)

Potentially you can use join or append and some complicated manipulation to achieve what you needed, but it may not be cheaper than simply rebuild the lookup.

To reduce the cost of searching the entire history, consider using tstats.  For example,

| tstats max(_time) as last_seen where earliest=0 by index sourcetype host

Taruchit
Contributor

Hi @yuanliu,

Thank you for your response and sharing your inputs.

I have following points based on your message: -

1. Clarifying the requirement: -
I have a lookup table, table1.csv, which has a list of all hosts along with their respective index, sourcetype and last_seen (in epoch/Unix format).

I also have a summary index (custom index), idx1, which has fields such as orig_index, orig_sourcetype and orig_host.

I need to take each host in the lookup table, search for it with its respective orig_index and orig_sourcetype in the summary index, fetch the max(_time) value, and store it in the lookup table against that host.

Thus, the end goal is to have the lookup table updated with the last time each host was reported in the summary index. The SPL that carries out this operation will be scheduled to compute results at definite intervals.

2. Thus, I will not be able to use the |tstats command in the SPL, because I am using a custom index to fetch the results.

Please share if you need any more details from my end.

Additionally, do we have any way of invoking the value of field "searchq" and use it as an SPL to fetch the results?
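For reference on the "invoking the value of searchq" question: SPL has no command that executes a string stored in a field, but the map command can template one search per input row. A hedged sketch (field names are taken from this thread; the maxsearches value and quoting are assumptions, and map launches a separate search per row, so it is expensive on large lookups):

```
| inputlookup table1.csv
| map maxsearches=1000 search="search index=idx1 orig_index=\"$index$\" orig_sourcetype=\"$sourcetype$\" orig_host=\"$host$\"
    | stats max(_time) AS last_seen
    | eval index=\"$index$\", sourcetype=\"$sourcetype$\", host=\"$host$\""
```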

Thank you


yuanliu
SplunkTrust
SplunkTrust

@Taruchit Thanks for the explanation.  The key question remains whether this summary index idx1 contains complete records or just partial records.  If it contains complete records, it is much easier to rebuild table1.csv than trying to selectively update rows.  The latter requires random record access, which CSV doesn't support outright; or rather, outputlookup doesn't support. (I assume that your goal is to use outputlookup to update table.)

Assuming that idx1 contains complete records, you can still use stats or tstats against this index to build table1.csv, e.g.,

 

| tstats max(_time) as last_seen where index=idx1 earliest=0 by orig_index orig_sourcetype orig_host
| rename orig_* as *
| outputlookup table1.csv

 

Alternatively, you can exclude idx1 in tstats so it searches only original indices, e.g,

 

| tstats max(_time) as last_seen where index!=idx1 earliest=0 by index sourcetype host
| outputlookup table1.csv

 

Hope this helps.


Taruchit
Contributor

Hi @yuanliu,

Thank you for sharing your inputs.

In my scenario, the summary index idx1 has data for hundreds of actual indexes. But I need the lookup table updated with the latest timestamp only for those hosts which are already stored in it.

The list of index values in the lookup table is a subset of the list of index values in the summary index.

The data in the lookup table is not governed by me; thus, I do not have visibility into the list of index values for which hosts are stored in the table.

As a result, I will need to build a dynamic solution which takes an index value with its corresponding sourcetype and host from the lookup table, fetches max(_time) from the summary index, and then stores it back in the lookup table.

Thus, please help by sharing your inputs.

Thank you


yuanliu
SplunkTrust
SplunkTrust

To update only selective rows in a CSV file, you'll need an external data management system such as a traditional DBMS.  It would be easier to base your lookup on that DBMS in that case.

However,


In my scenario, the summary index idx1 has data for hundreds of actual indexes. But I need the lookup table updated with the latest timestamp only for those hosts which are already stored in it.

 I could be interpreting the quoted part wrong.  But to me, that means it contains all the data that table1.csv needs.  Therefore, the following should suffice:

| tstats max(_time) as last_seen where index=idx1 earliest=0 by orig_index orig_sourcetype orig_host
| rename orig_* as *
| outputlookup table1.csv

Have you tried it?  You can try it without outputlookup to assess performance.  You mentioned that you will run this at a weekly interval or similar.  This search shouldn't take very long, nor is it very demanding, even with a fairly large idx1.
