Splunk Search

How to find results present in lookup table and not in custom index?

Taruchit
Contributor

Hi All,

I have enquired about this problem earlier as well in Splunk community, thus, apologies for duplicate query.
However, I am unable to get a solution that helps to get my results 100% correct. Moreover, the previous threads are old, thus, not sure if my comments will get visibility in those threads. 

Resources in hand:- 
I have a lookup table which has many fields. I am concerned with two fields: index and host.
I have a custom index, which has many fields, but I need to use orig_index, orig_host.

Requirement: -
I need to fetch list of those hosts for each index which are present in lookup table but not in custom index.

I tried with following with time range of last 24 hours:-

|inputlookup table.csv |fields index, host |search NOT [search index="xxx" |rename orig_* AS *| table index, host | format]

But, when I try to cross check the last reported date time of host value against the index, I get results of last hour. I tried with below SPL for cross check:-

|tstats max(indextime) AS lastTime where index="dummy" AND host="10.10.10.10" BY host |convert ctime(lastTime)

Thus, I need your help to resolve the issue.

Thank you.

Labels (6)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @Taruchit,

ok for _time and _indextime, max(time) is the latest timestamp in a group of events that you group using tstats.

There's only one thing that I don't understand:

you want to find if there are some hosts that should send events in an index and that are missing, is this correct?

If this is your need, my first answer shoudl solve your request:

index=your_index
| eval orig_host=lower(orig_host), orig_index=lower(orig_index)
| stats count BY orig_index orig_host 
| append [ 
   | inputlookup table.csv 
   | eval orig_host=lower(host), orig_index=lower(index), count=0 
   | fields orig_index orig_host count 
   ]
| stats sum(count) AS total BY orig_index orig_host
| where total=0

Ciao.

Giuseppe

View solution in original post

Taruchit
Contributor

I got another solution from Dal Jeanis: -

index=xxx
|stats max(_time) AS _time
BY orig_index, orig_host
|rename orig_* AS *
|eval rectype="xxx"
|inputlookup table.csv append=true
|table _time index host rectype
|eval rectype=coalesce(rectype,"lookup"
|stats values(_time) AS _time values(*) AS * BY index host
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Taruchit,

I answered many times to questions similar to your, but there's a thing that I don't understand:

if you have in the tstats search index=dummy and host=10.10.10.10 what do you want to search?

I can understand if you want to search the index and host of the lookup missing in the main search, but what's your need?

Anyway, if you have in the lookup a list of pairs index and host and you want to check if someone of them are missed, you could run something like this:

| tstats max(indextime) AS lastTime count BY index host 
| eval host=lower(host)
| append [ 
   | inputlookup table.csv WHERE NOT index=xxx 
   | eval host=lower(host), count=0 
   | fields index host count 
   ]
| stats sum(count) AS total values(lastTime) AS last(Time) BY host
| where total=0

Ciao.

Giuseppe

Ciao.

Giuseppe

0 Karma

Taruchit
Contributor

Hi @gcusello,

Thank you for sharing the details. I think there is some communication gap which has prevented you to understand my post.

I will try to share you the details again: -

I have a lookup table and a custom index which is built using mcollect command. The custom index gets data on periodic basis.

I need to fetch the list of hosts and for each index which are present in lookup table and not in the custom index for past 24 hours.

In the lookup table, the index values are stored under field: index, and corresponding host values are stored under field: host. Thus, in lookup table, I have multiple rows of same index name but different host.

In custom index, the index values are stored under field: orig_index. The host values are stored under field: orig_host.

Now to the second part: -

|tstats max(_indextime) AS lastTime where index="dummy" AND host="10.10.10.10" BY host |convert ctime(lastTime)

I am using the above command to check whether the results I get are correct or not. Thus, here I am passing the actual index value and host value and then find if they were reporting in last 24 hours or not. 
In the above example, dummy and 10.10.10.10 are values only for representation purpose. 

 

Since I am using custom index, I cannot use the tstats command which was shared by you.

Please share if I can help by sharing any more details with regards to my problem statement or activities carried out with it.

Thank you

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Taruchit,

you could try something like this:

index=your_index
| eval orig_host=lower(orig_host), orig_index=lower(orig_index)
| stats count BY orig_index orig_host 
| append [ 
   | inputlookup table.csv 
   | eval orig_host=lower(host), orig_index=lower(index), count=0 
   | fields orig_index orig_host count 
   ]
| stats sum(count) AS total BY orig_index orig_host
| where total=0

In this way you extract all the pairs index,host from the lookup that aren't present in the index.

Ciao.

Giuseppe

0 Karma

Taruchit
Contributor

Hi @gcusello

Thank you for sharing the SPL. I executed it for last 24 hours.

But, when I check the max(_indextime) for those hosts, I do see results for past hour. 

Like I executed the SPL at  05/08/2022 13:54. But, when I check the _indextime for the host fetched in the result, I get 08/05/2022 12:59.

Thus, can you please suggest if I need to use _index_earliest and _index_latest or anyother modifications in the SPL to get correct results?

Thank you

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Taruchit,

there's something that I didn't understand: "_indextime" is the timestamp of the indexing of the event, not the event timestamp and it's usually later than the timestamp, so probably there's some misunderstanding.

Are you sure that indexitme is the correct firld to use in the tstats command?

Ciao.

Giuseppe

 

0 Karma

Taruchit
Contributor

Hi @gcusello,

Thank you for your response and sharing your inputs.

I will share below my understanding, please share if you agree or if anything is to be corrected.

_time: It gives the timestamp of extraction of events from a server or machine that generates those events.

_indextime: It gives the timestamp at which events were written in the index, that is, the time at which events arrive in Splunk.

max(_time): It tends to give future events when the events are getting generated in overlapping timezones. For example: The current time of Splunk server is 15:30. The machine that generates events is in different timezone and the machine's local time at which events got generated is 17:00. Thus, when we check in Splunk we need to wait till 17:00 to see those events, even though the events got generated now. 

Thus, to overcome this problem, and have constant way to check if hosts are reporting or not, it was felt the time at which events arrive in the index should be used, and thus, _indextime was used with tstats command.

 

Additionally, I tried using the below SPL and still get results in past 1 hour event though the SPL to fetch missing hosts says that hosts are missing. 

|tstats max(_indextime) AS lastIndexTime, max(_time) AS eventTimeStamp where index="xxx" AND host="host1" |convert ctime(lastIndexTime), ctime(eventTimeStamp)

I executed the main SPL at 6:31 AM and got results for missing hosts. 

Then I used the above SPL for one of those missing host and observed: -

lastIndexTime: 08/08/2022 06:31:10

eventTimeStamp: 08/08/2022 06:29:34

 

Please help by sharing your inputs.

Thank you

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Taruchit,

ok for _time and _indextime, max(time) is the latest timestamp in a group of events that you group using tstats.

There's only one thing that I don't understand:

you want to find if there are some hosts that should send events in an index and that are missing, is this correct?

If this is your need, my first answer shoudl solve your request:

index=your_index
| eval orig_host=lower(orig_host), orig_index=lower(orig_index)
| stats count BY orig_index orig_host 
| append [ 
   | inputlookup table.csv 
   | eval orig_host=lower(host), orig_index=lower(index), count=0 
   | fields orig_index orig_host count 
   ]
| stats sum(count) AS total BY orig_index orig_host
| where total=0

Ciao.

Giuseppe

Taruchit
Contributor

Thank you @gcusello for sharing your inputs and solution. 

I observed in my SPL, some transformation operations were needed to match data between index and lookup table.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

From the purely theoretical point of view, the _time field is just a timestamp. It doesn't have to mean anything. It _can_ be forced to have any value whatsoever. Most typically it's parsed out of the event but might be even configured to have constant or random value. But indeed as a rule of thumb it's supposed to be a "timestamp of the event" and as such it's most usually either parsed out of the event or - if the parsing is misconfigured or the timestamp is not contained within the event - is taken from the forwarder/indexer.

_indextime on the other hand contains the timestamp (as recorded by the indexer) at which the event was written to the index.

So the _indextime is supposed to be "right" if you have your servers properly set up and synchronized with reliable time source and _time doesn't have to be.

In typical case _time will be either slightly "older" than _indextime (in case of sources streaming events continously). In specific cases where the events are ingested in batches (like a server writes logs to a file, then the file is copied over to another server from which it is ingested into splunk), the _indextime-_time delay would be relatively big during some parts of the day and lower during others. Irregular delays between _time and _indextime might suggest interminnent ingestion queues clogging and other forwarding problems.

If you have situation when _time is _after_ _indextime - you most probably have a problem with your sources (typically its time synchronization or timezone settings) and/or parsing rules.

Taruchit
Contributor

Another example of results for your reference from below SPL: -

|tstats max(_indextime) AS lastIndexTime, max(_time) AS eventTimeStamp where index="xxx" AND host="host1" |convert ctime(lastIndexTime), ctime(eventTimeStamp)

-> lastIndexTime: 08/08/2022 02:53:59

-> eventTimeStamp: 08/08/2022 06:52:57

SPL was executed at 7:07 AM

Observation: -

max(_time) is after max(_indextime). 

 

Thank you

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...