What sourcetypes or sources aren't being searched

Splunk Employee

Is there a way to determine what sources and/or sourcetypes AREN'T being searched? If data is coming into Splunk and nobody is really looking at that data then I don't need to keep bringing it in. I just want to find a way to determine this.

1 Solution

SplunkTrust

Not really... Splunk does log the searches it performs: in 5.0 they go into the _audit index (and I think in 4.0 they had their own searches.log). However, it doesn't actually log the sourcetypes that are returned in search results, and that's really what you'd need.

By looking at the logged searches you'll only ever be able to see people searching for explicit sourcetypes like sourcetype="foo". Thus you'll miss lots of searches. You'll miss any search that has sourcetype terms in a macro. You'll also miss searches where the initial search clause doesn't specify a sourcetype term but relies on the other search terms to match the right events. It also won't match prefix (wildcard) searches on sourcetypes, and so on and so forth.
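To get a rough feel for how much these blind spots matter in a given environment, one option is to count how many logged searches mention a sourcetype term at all. This is only a sketch: it assumes the audit events carry the search text in the search field (as the search further down also assumes), hasExplicitSourcetype is a made-up field name, and the regex is deliberately loose, so wildcard terms such as sourcetype=foo* count as "yes" while sourcetype terms hidden inside macros count as "no".

```spl
index=_audit action=search info=granted search=*
| eval hasExplicitSourcetype=if(match(search, "sourcetype\s*="), "yes", "no")
| stats count by hasExplicitSourcetype
```

A large "no" count suggests that any report built from explicit sourcetype terms will understate what is really being searched.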

For the record, here's the best limited search I could come up with. This will find the sourcetypes that nobody has been explicitly searching for with exact sourcetype="foo" searchterms... Although it's kind of a cool search it's not going to be very reliable.

index=_audit action=search info=granted
| eval _raw=search
| eval _raw=mvindex(split(_raw,"|"),0)
| table _raw
| extract
| stats count by sourcetype
| eval hasBeenSearched=1
| append [| metadata index=* type="sourcetypes"
| eval hasBeenSearched="0"]
| stats max(hasBeenSearched) as hasBeenSearched by sourcetype
| search hasBeenSearched="0"

Builder

Bump: is this still not possible in version 7?
Thanks.

SplunkTrust

Still not possible in 7.2.4 without searching the audit index or building your own searches.

In Alerts for Splunk Admins (also available on GitHub) I have queries like "SearchHeadLevel - Search Queries summary exact match", where I attempt to determine what was queried per index. You could do a similar search per sourcetype, but it might get tricky...

Also, you would never get it exactly right for wildcards. I built "SearchHeadLevel - Search Queries summary non-exact match" for that, but it is complicated and slow!

This answer has not been updated in a while. Is there now a way to find out which sources are being searched and which ones are not?

Path Finder

How can I list all sourcetypes, their corresponding indexes, and the time each was last accessed, in table format?
My main goal is to find sourcetypes that no user has searched recently (perhaps not for months), with the focus on last access time.
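One possible starting point, with the same caveats as the accepted answer (it only sees explicit sourcetype= terms and misses macros and wildcards), is to take the latest audit timestamp per explicitly named sourcetype and join that against the index/sourcetype pairs that actually exist. This is a sketch, not a verified solution: searched_sourcetype and lastSearched are hypothetical field names, the regex is a simplification that breaks on quoted sourcetypes containing spaces, and the result only covers the time range the search runs over (and however long _audit is retained).

```spl
index=_audit action=search info=granted search=*
| rex field=search max_match=0 "sourcetype\s*=\s*\"?(?<searched_sourcetype>[^\s\"|)\]]+)"
| stats max(_time) as lastSearched by searched_sourcetype
| rename searched_sourcetype as sourcetype
| append
    [| tstats count where index=* by index sourcetype
     | eval lastSearched=0
     | fields index sourcetype lastSearched]
| stats values(index) as index max(lastSearched) as lastSearched by sourcetype
| where isnotnull(index)
| sort 0 lastSearched
| eval lastSearched=if(lastSearched=0, "no explicit match in audit data", strftime(lastSearched, "%Y-%m-%d %H:%M:%S"))
```

The where isnotnull(index) step drops sourcetype terms that appear in searches but do not correspond to any indexed data (typos, wildcard patterns), and rows sorted to the top are the ones with the oldest, or no, explicit search.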

SplunkTrust

By the way, although the metadata command has no type="indexes" option (I wish it did), the equivalent is to run this: | eventcount index=* summarize="false". That runs basically instantaneously, and it is the search you should be using in your subsearch instead of your index!=_*

SplunkTrust

Your modification is going to have a number of pretty significant problems. It makes sense that you'd get more results with "top index": top returns only 10 values by default, so only 10 indexes are found as having been searched, and more show up as "never searched". But this is erroneous. Put the stats count by index back instead of the top. Also, since your modification uses a raw-event search in the subsearch instead of the fast and cheap metadata search, you will definitely be hitting subsearch limits, causing more inaccuracy. Use a metadata command inside the subsearch.
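Putting both suggestions together (stats count by index instead of top, and eventcount rather than a raw-event search in the subsearch), the index-level variant might look like the sketch below. It is untested against any particular deployment and inherits every limitation of the sourcetype version: only explicit index= terms in the first search clause are detected.

```spl
index=_audit action=search info=granted
| eval _raw=search
| eval _raw=mvindex(split(_raw,"|"),0)
| table _raw
| extract
| stats count by index
| eval hasBeenSearched=1
| append
    [| eventcount index=* summarize=false
     | fields index
     | eval hasBeenSearched=0]
| stats max(hasBeenSearched) as hasBeenSearched by index
| search hasBeenSearched=0
```

In a distributed environment eventcount with summarize=false returns one row per index per server, which the final stats collapses back to one row per index.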

Splunk Employee

Thanks. I also modified it to look for least used indexes.

index=_audit action=search info=granted
| eval _raw=search
| eval _raw=mvindex(split(_raw,"|"),0)
| table _raw
| extract
| top index
| eval hasBeenSearched=1
| append [search index!=_*
| eval hasBeenSearched="0"]
| stats max(hasBeenSearched) as hasBeenSearched by index
| search hasBeenSearched="0"

(I actually got more results with "top index" over "stats count by index".)

SplunkTrust

NOTE: I updated my answer, because I had a | top sourcetype in there before instead of | stats count by sourcetype. Since top has a default limit of 10, this would have limited you to only 10 sourcetypes detected. I changed it to | stats count by sourcetype, which won't have that limitation.

Splunk Employee

Thanks! This helps a lot. Our guidance is for people to specify index= and sourcetype= in their queries, so this will work well.