Splunk Search

use stats instead

hank72
Path Finder

Hi Community, can someone please help me rewrite this search to use stats instead of join?

| rest /services/authentication/users splunk_server=local
   | search type=SAML
   | fields title
   | rename title AS User
   | search
       [| inputlookup 12k_line.csv
       | fields User ]
| join type=inner User
   [| rest /servicesNS/-/-/directory
   | fields author
   | dedup author
   | sort author
   | rename author AS User ]

bowesmana
SplunkTrust

Lots of good comments here from people who know, so I'll just add my thoughts specifically on your use of a lookup as a search constraint.

Subsearches with large search constraints coming from a lookup are less efficient than using the lookup as a lookup, i.e.

| rest /services/authentication/users splunk_server=local
| search type=SAML
| fields title
| rename title AS User
| lookup 12k_line.csv User OUTPUT Found
| where isnotnull(Found)
...
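One caveat worth noting: OUTPUT Found assumes 12k_line.csv actually contains a Found column. If the file only has a User column, a variant (a sketch, assuming User is the sole column) is to output the matched key itself under a new name:

| lookup 12k_line.csv User OUTPUT User AS Found
| where isnotnull(Found)

On a miss, Found stays null, so the where clause filters the row out just the same.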

The general issue is that the subsearch runs first and, in your case, returns the SPL phrase

(User=1 OR User=2 OR ... User=5000 OR ... User=12000)

which is THEN spliced into the SPL that actually gets executed. That huge block of expanded SPL has to be parsed, whereas the lookup is likely to be far more efficient.

You can see what the subsearch will expand to by running this search:

| inputlookup 12k_line.csv
| fields User
| format
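With a few hypothetical usernames in the file, the single search field that format produces would look something like

( ( User="alice" ) OR ( User="bob" ) OR ( User="carol" ) )

multiplied out to all 12k values in your real file.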

yuanliu
SplunkTrust

In addition to everybody's speculations, the biggest problem in the SPL, in my opinion, is that the whole search returns only one field, User; the entire exercise is simply to restrict which User values are allowed.  No inner join or stats is needed for this task, because a plain old subsearch is designed for exactly this.

There are a million ways to do this.  Given that the original SPL liberally applies dedup to rest command outputs, I assume that 12k_line.csv is the largest dataset.  So, I am using that as the lead search. (Any command can be used as the lead search; the corresponding subsearches just need to be adjusted.)

| inputlookup 12k_line.csv
  where 
    [rest /services/authentication/users splunk_server=local
    | search type=SAML
    | fields title
    | rename title AS User]
    [rest /servicesNS/-/-/directory
    | fields author
    | dedup author
    | rename author AS User ]
| fields User

 


livehybrid
Super Champion

Hi @hank72 ,

It's hard to picture exactly what you need, but I think you can achieve this by appending the results from the REST calls and the lookup, then using stats to find users common to all three sources.

 

| rest /services/authentication/users splunk_server=local
| search type=SAML
| fields title
| rename title AS User
| eval source="saml"
| append [| inputlookup 12k_line.csv | fields User | eval source="lookup"]
| append [| rest /servicesNS/-/-/directory splunk_server=local | fields author | dedup author | rename author AS User | eval source="directory"]
| stats dc(source) as source_count by User
| where source_count = 3
| fields User

 

  • The first section retrieves SAML users and assigns source="saml".
  • The first append adds users from your lookup file, assigning source="lookup".
  • The second append adds directory authors, assigning source="directory".
  • stats dc(source) by User counts how many distinct sources (saml, lookup, directory) each user appears in.
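For illustration, with hypothetical users the intermediate table after the stats line (before the where filter) would look something like

User     source_count
alice    3
bob      2
carol    1

and only alice, who appears in all three sources, survives the source_count = 3 filter.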

Note:

  • Using multiple calls and append can be less performant depending on the volume of data returned by each source.

🌟 Did this answer help you? If so, please consider:

  • Adding kudos to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

isoutamo
SplunkTrust

I agree with @yuanliu that you should tell us more to get answers.

Am I right if I assume that this lookup "12k_line.csv" contains more than 10k lines?


livehybrid
Super Champion

@isoutamo 

I also presumed the 12k_line.csv would have > 10,000 events (probably 12,000!). I don't think this should be an issue here though, as append supports 50,000 events by default.
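If the file ever does grow past that default, the limit can be raised per append with the maxout argument, e.g. (100000 here is an arbitrary value)

| append maxout=100000 [| inputlookup 12k_line.csv | fields User | eval source="lookup"]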

@hank72 Please let us know if you have any trouble with the provided search or if I've got the wrong end of your requirements.


yuanliu
SplunkTrust

This is a very confusing search.  Is this some sort of homework?  The reason I ask is that it is always better to describe what you are trying to achieve, what the data looks like, what the desired results are based on sample input, and what the logic is between input and desired output, as opposed to making volunteers read your mind from some SPL snippets.
