Hi Community, can someone please help me by using stats instead of join for this search?
| rest /services/authentication/users splunk_server=local
| search type=SAML
| fields title
| rename title AS User
| search
[| inputlookup 12k_line.csv
| fields User ]
| join type=inner User
[| rest /servicesNS/-/-/directory
| fields author
| dedup author
| sort author
| rename author AS User ]
Lots of good comments here from people who know their stuff, so I'll just add my thoughts specifically regarding your use of a lookup as a search constraint.
Subsearches that build a large search constraint from a lookup are less efficient than using the lookup as an actual lookup, i.e.
| rest /services/authentication/users splunk_server=local
| search type=SAML
| fields title
| rename title AS User
| lookup 12k_line.csv User OUTPUT User AS Found
| where isnotnull(Found)
...
The general issue is that the subsearch runs first and, in your case, returns an SPL phrase like
(User=1 OR User=2 OR ... User=5000 OR ... User=12000)
and only THEN is that added to the SPL that actually gets executed, so that huge block of expanded SPL has to be parsed, whereas the lookup is likely to be far more efficient.
You can see what the subsearch will expand to by running this search
| inputlookup 12k_line.csv
| fields User
| format
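For illustration, if the lookup contained (hypothetical) users alice and bob, format would produce something like
( ( User="alice" ) OR ( User="bob" ) )
which is the literal text that gets spliced into your outer search.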
In addition to everybody's speculations, the biggest problem with the SPL, in my opinion, is that the whole search only returns one field: User; the entire exercise/homework is simply to restrict which User values are allowed. No inner join or stats is needed for this task because a plain old subsearch is designed exactly for it.
There are a million ways to do this. Given that the original SPL applies dedup to the rest command outputs, I assume that 12k_line.csv is the largest dataset, so I am using it as the lead search. (Any command can be used as the lead search; the corresponding subsearches just need to be adjusted.)
| inputlookup 12k_line.csv
where
[| rest /services/authentication/users splunk_server=local
| search type=SAML
| fields title
| rename title AS User]
[| rest /servicesNS/-/-/directory
| fields author
| dedup author
| rename author AS User ]
| fields User
Hi @hank72 ,
It's hard to picture exactly what you need, but I think you can achieve this by appending the results from the REST calls and the lookup, then using stats to find users common to all three sources.
| rest /services/authentication/users splunk_server=local
| search type=SAML
| fields title
| rename title AS User
| eval source="saml"
| append
    [| inputlookup 12k_line.csv
    | fields User
    | eval source="lookup"]
| append
    [| rest /servicesNS/-/-/directory splunk_server=local
    | fields author
    | dedup author
    | rename author AS User
    | eval source="directory"]
| stats dc(source) AS source_count by User
| where source_count = 3
| fields User
I agree with @yuanliu that you should tell us more to get good answers.
Am I right to assume that this lookup "12k_line.csv" contains more than 10k lines?
I also presumed that 12k_line.csv would have more than 10,000 rows (probably 12,000!). I don't think that should be an issue here, though, since append returns up to 50,000 results from its subsearch by default.
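If the lookup ever grows past that default, append's maxout option can raise the limit. A sketch (the 100000 value is just an example, not a recommendation):
| append maxout=100000
    [| inputlookup 12k_line.csv
    | fields User
    | eval source="lookup"]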
@hank72 Please let us know if you have any trouble with the provided search or if I've misunderstood your requirements.
This is a very confusing search. Is this some sort of homework? I ask because it is always better to describe what you are trying to achieve: what the data looks like, what the desired results are based on sample input, and what the logic is between input and output, rather than making volunteers read your mind from some SPL snippets.