How to Optimize this Search

Builder

Hello,
Still learning and getting better at it. However, I have this very complex search in one of my dashboard panels, and I would like to optimize it.

I am reviewing the links @niketnilay offered to another Splunker in this post (https://answers.splunk.com/answers/620388/how-to-make-efficient-and-fast-searches-reports-an.html), and I figured I would post my code for additional feedback and insights.

Initially, I was looking for a way to use post-processing. But unless I am mistaken, that works between panels and not within the same panel, because each panel has only one search. Correct?

I was also thinking of running a main search in a hidden panel where depends equals something that will never happen; therefore keeping the panel hidden. In this panel I would use the and then in the main (visible) panel use . Is this viable?

Here is my complex search. Note: the tokens are selected via dropdown inputs in the dashboard.

  • $Selected_Time_Range.earliest$
  • $Selected_Time_Range.latest$
  • $hostName_tok$
  • $userId_tok$
  • $linuxId_tok$

        <search>
          <query>
            [ search index="*linuxevents"
                AND earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$
                AND (source="/var/log/sudo.log" OR source="/var/log/secure")
                AND host=$hostName_tok$
                AND _raw="*$userId_tok$*"
            | append
                [ search index="*linuxevents" 
                  AND earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$
                  AND (source="/var/log/sudo.log" OR source="/var/log/secure")
                  AND host=$hostName_tok$
                  AND [ search index="*linuxevents" AND source="ps" 
                          AND earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$
                          AND host=$hostName_tok$
                          AND (USER=$userId_tok$ OR (USER="root" AND "*$userId_tok$*"))
                        | dedup pid
                        | sort +pid
                        | table pid ] ]
              | dedup _raw ]
    
              | append
                [ search index="*linuxevents" 
                    AND earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$
                    AND source="/var/log/audit/audit.log" 
                    AND host=$hostName_tok$
                    AND [ search index="*linuxevents" 
                            AND earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$
                            AND source="/var/log/audit/audit.log"  
                            AND host=$hostName_tok$
                            AND ([`multi_field_search("auid euid fsuid id inode_uid oauid ouid sauid suid uid", "$linuxId_tok$")`]
                              OR [`multi_field_search("user acct cwd name", "*$userId_tok$*")`])
                          | dedup event_id
                          | sort +event_id
                          | table event_id ] 
    
              | append                    
                  [search index="*linuxevents" 
                      AND earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$
                      AND source="/var/log/audit/audit.log" 
                      AND host=$hostName_tok$
                      AND ([`multi_field_search("user acct cwd name", "*$userId_tok$*")`]
                        OR [`multi_field_search("auid euid fsuid id inode_uid oauid ouid sauid suid uid", "$linuxId_tok$")`]
                        OR [ search index="*linuxevents" 
                                AND earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$
                                AND source="/var/log/audit/audit.log"  
                                AND host=$hostName_tok$
                                AND [`multi_field_search("user acct cwd name", "*$userId_tok$*")`]
                                AND (auid!="0" AND auid!="4294967295")
                                AND addr!="?"
                                  | head limit=1
                                  | eval addr="\"".addr."\""
                                  | return $addr ])  ] 
    
              | append
                  [ search index="*linuxevents"
                      AND earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$
                      AND source="/var/log/audit/audit.log"
                      AND host=$hostName_tok$
                      AND (_raw="*new auid*" AND "$linuxId_tok$") ]
                  | dedup _raw
                  | transaction event_id ]
    
              | eval log=case(source=="/var/log/sudo.log", "sudo", source=="/var/log/secure", "secure", source=="/var/log/audit/audit.log", "audit")
              | sort +_time
              | table _time, log, host, _raw
          </query>
        </search>
    

Thanks for direction and ideas in advance.
God bless,
Genesius

PS: Am I overthinking this? Is the answer that obvious?

Builder

@woodcock
Here is what I need to provide to our *nix SysAdmins.

Current process: they open 3 separate windows on their monitors. One is to view audit.log, the other to view /var/log/secure, and the last to view sudo.log.
Requirements: a single view of all three on one screen, with the ability to drill down into each individual log if needed.
Initially, we installed the Splunk Linux Auditd app. And while a much better app than the Splunk App for Unix and Linux (utterly useless to our SysAdmins), the Auditd app does not return ALL the events associated with an ssh session as I hope to explain below.

Before I get into that, I would like to comment on the suggestions from gcusello, and why I did or did not follow his valuable advice.

check the number of results of each subsearch, because Splunk has a limit of 50,000 results for subsearches (configurable, but best left alone);

I don't believe any of the subsearches will return more than 50,000 results, but this depends on the Time Picker, which is a question I raised in another post.

you don't need to repeat earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$ in every search, except when you want a different value, because the time range is applied by default;

Removed from the query in all subsearches and the main search.

you don't need to use _raw="*$userId_tok$*", because when you search for a bare string it is matched against _raw; in other words, try to stop thinking of Splunk as a database!

Unfortunately, the events in source="ps" do not have clearly defined field extractions (see the explanation further down on why this search is required).
One of the ps events below shows user1 as the first field (USER). Therefore, I know that event relates to user1. However, there is also an event where USER is root and the string user1 appears elsewhere in the event. This is why my code includes (USER="root" AND "*$userId_tok$*"). Digging deeper into the ps events, I found the field ARGS, which Splunk extracts using ps.sh. Using this field instead of _raw speeds up this subsearch because the wildcard at the beginning of the value is removed. Further testing is required to determine whether the wildcard at the end of the ARGS value can be removed as well.
This SPL line changed from
(USER=$userId_tok$ OR (USER="root" AND "*$userId_tok$*"))
to
(USER=$userId_tok$ OR (USER="root" AND ARGS="$userId_tok$*")).

Sample events from source=ps. From this subsearch I am extracting a list of PIDs associated with user1.

root 7267 14 0.0 00:00:00 0.0 4032 104388 ? S 01:01 sshd: user1_[priv]
user1 7271 2 0.0 00:00:00 0.0 2712 104720 ? S 01:01 sshd: user1@notty

you don't need to use the "AND" boolean operator because it is applied by default;

I understand this. Does this slow the SPL down? If not, I like to include it for readability.

you don't need to use "+" in | sort +_time because ascending order is the default;

My experience has been that if I don’t use | sort +_time, the table comes out with latest event first. I require first event first. Can this be modified in a conf file?
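
On the sort question: as far as I know, it is the "+" that is optional, not the sort itself. | sort _time already sorts ascending (oldest first), so it should behave the same as | sort +_time. One related detail worth checking is that sort keeps only the first 10,000 results by default, and a count of 0 lifts that cap. A minimal sketch, assuming the index and tokens from the question (the single source is only for illustration):

```xml
<search>
  <query>
    index="*linuxevents" host=$hostName_tok$ source="/var/log/secure"
    | sort 0 _time
    | table _time, host, _raw
  </query>
  <!-- Time range set once here instead of inside the query text -->
  <earliest>$Selected_Time_Range.earliest$</earliest>
  <latest>$Selected_Time_Range.latest$</latest>
</search>
```

Dropping the sort entirely is what brings back the default newest-first order of events.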

I see that you're using the same search parameters in all the searches.

At present I don’t see any way around this, which is why I am posting. Is there a command that tells every subsearch to use the same index, host, acct, addr, and auid?

Now that I have answered gcusello’s suggestions, here are the reasons for the complexity of this SPL. There would be several events missing for user1’s entire ssh session without this myriad of subsearches.
There are some field-value pairs that are not known for user1’s ssh session until other (later) events are seen.
Below is an abridged list of the events from this dashboard. I've inserted a number in front of each event for readability.

(1)    type=CRYPTO_KEY_USER msg=audit(1569008008.220:10732729): user pid=20626 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=28:6a:cb:a7:ab:67:d4:85:ff:34:99:b7:c9:f5:55:5c direction=? spid=20626 suid=0 exe="/usr/sbin/sshd" hostname=? addr=10.10.10.10 terminal=? res=success'
(2)    type=CRYPTO_KEY_USER msg=audit(1569008008.220:10732730): user pid=20626 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=48:24:f7:c4:0a:87:94:34:88:22:e1:88:47:1a:15:e2 direction=? spid=20626 suid=0 exe="/usr/sbin/sshd" hostname=? addr=10.10.10.10 terminal=? res=success'
(3)    type=CRYPTO_SESSION msg=audit(1569008008.220:10732731): user pid=20623 uid=0 auid=4294967295 ses=4294967295 msg='op=start direction=from-client cipher=aes128-ctr ksize=128 spid=20626 suid=74 rport=41507 laddr=10.1.1.1 lport=22 exe="/usr/sbin/sshd" hostname=? addr=10.10.10.10 terminal=? res=success'
(4)    type=CRYPTO_SESSION msg=audit(1569008008.220:10732732): user pid=20623 uid=0 auid=4294967295 ses=4294967295 msg='op=start direction=from-server cipher=aes128-ctr ksize=128 spid=20626 suid=74 rport=41507 laddr=10.1.1.1 lport=22 exe="/usr/sbin/sshd" hostname=? addr=10.10.10.10 terminal=? res=success'
(5)    type=USER_AUTH msg=audit(1569008008.350:10732733): user pid=20623 uid=0 auid=4294967295 ses=4294967295 msg='op=pubkey_auth rport=41507 acct="user1" exe="/usr/sbin/sshd" hostname=? addr=10.10.10.10 terminal=? res=success'

The first 4 events do not have an acct field (the value selected from the earlier dropdown). The only way to correlate these 4 events with the ssh session is to capture the addr value from the 5th event of the results, which is from audit.log. This event includes the acct field with the value we are searching for, user1, as well as its corresponding addr, 10.10.10.10. A new subsearch is then run to include these 4 events, as well as any other events in audit, secure or sudo that contain only addr.

(12) type=LOGIN msg=audit(1569008008.358:10732739): pid=20623 uid=0 old auid=4294967295 new auid=1014 old ses=4294967295 new ses=23056

In the 12th event, also from audit.log, the auid is set for user1. Prior audit.log events had auid set to the RedHat default, auid=4294967295. However, Splunk doesn’t extract new auid as its own field. On this event, auid holds both values, 4294967295 and 1014, so it has become a multivalued field. Not good. A custom extraction was created.
Now we have values for the 3 main fields required to perform this search: acct=user1; addr=10.10.10.10; and auid=1014.
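
The custom extraction itself is not shown in this post. As a hedged illustration only, an inline rex along these lines would split the two values into separate fields. The field names old_auid and new_auid are hypothetical, and the CDATA wrapper is there because the named-group syntax contains a "<" character, which is otherwise not allowed inside Simple XML:

```xml
<search>
  <query>
    <![CDATA[
    index="*linuxevents" host=$hostName_tok$ source="/var/log/audit/audit.log" "new auid"
    | rex "old auid=(?<old_auid>\d+)\s+new auid=(?<new_auid>\d+)"
    | table _time, old_auid, new_auid, _raw
    ]]>
  </query>
  <earliest>$Selected_Time_Range.earliest$</earliest>
  <latest>$Selected_Time_Range.latest$</latest>
</search>
```

Against event 12 above, this pattern would be expected to yield old_auid=4294967295 and new_auid=1014. A search-time extraction in props.conf would be the more permanent option, but that depends on how the sourcetype is configured here.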

(42) 2019-09-20T15:33:28.536160-04:00 ruby01-s sshd[20623]: subsystem request for sftp

In event 42, from /var/log/secure, there are no acct, addr or auid fields. This event was discovered because of the source=ps subsearch performed earlier where the pid was extracted (20623). A subsearch for this PID (and all others associated with user1) needs to be performed across all 3 logs to find any other events.

After gathering all the events from the 3 logs based on the acct, auid, or addr fields, as well as the PIDs from source=ps, a final search is run against all events with the transaction command.

(45) type=SYSCALL msg=audit(1569008008.833:10732770): arch=c000003e syscall=2 success=yes exit=8 a0=7fff309e25b0 a1=42 a2=180 a3=8 items=2 ppid=4246 pid=20623 auid=1014 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=23056 comm="sshd" exe="/usr/sbin/sshd" key="logins" 
type=CWD msg=audit(1569008008.833:10732770): cwd="/" 
type=PATH msg=audit(1569008008.833:10732770): item=0 name="/var/log/" inode=130451 dev=fd:03 mode=040755 ouid=0 ogid=0 rdev=00:00 nametype=PARENT 
type=PATH msg=audit(1569008008.833:10732770): item=1 name="/var/log/lastlog" inode=135218 dev=fd:03 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL

In event 45 you can see that the last three (sub)events of the transaction do not include acct, auid, or addr. These events are correlated by running transaction on the event_id field (10732770).
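
A minimal sketch of that grouping step, assuming event_id is the number after the colon in msg=audit(...). The rex is an assumption for illustration (if the field is already extracted in this environment, the rex line can be dropped); eventcount is a field that transaction adds to each grouped result:

```xml
<search>
  <query>
    <![CDATA[
    index="*linuxevents" host=$hostName_tok$ source="/var/log/audit/audit.log"
    | rex "msg=audit\(\d+\.\d+:(?<event_id>\d+)\)"
    | transaction event_id
    | table _time, event_id, eventcount, _raw
    ]]>
  </query>
  <earliest>$Selected_Time_Range.earliest$</earliest>
  <latest>$Selected_Time_Range.latest$</latest>
</search>
```

For event 45, the four records sharing 10732770 would be expected to come back as a single result.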

This process repeats itself multiple times throughout the logs.

I hope this explains our SysAdmins' requirements, and which missing events were discovered during a deep dive into the logs.

A follow-up post will include the updated XML.

Thanks and God bless,
Genesius

Esteemed Legend

Really, I would start from scratch on your core search. It seems poorly constructed and I am quite sure that we can create a far smaller and more efficient search. What exactly are you trying to do with it?

Esteemed Legend

Your idea with the hidden panel does work and I have used it before. The problem is your search. Will you explain what you are trying to do? It definitely needs to be reworked. In particular, it looks like the entire thing is wrapped in a subsearch. For starters, remove the [] that surrounds everything as it is not needed.

Legend

Hi genesiusj,
Using a post-process search within a single panel makes little sense, because the advantage of post-processing is that you run one search and use its results in many panels, instead of running a separate search for each panel.
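
For reference, a minimal sketch of the base-search and post-process pattern in Simple XML. The id base_linux and the two post-process queries are hypothetical; the index, sources and tokens come from the question, and the dropdown inputs are omitted. Because the base search is declared at the dashboard level, it should not need a hidden panel:

```xml
<form>
  <!-- Base search: runs once and is shared by every panel that references its id -->
  <search id="base_linux">
    <query>
      index="*linuxevents" host=$hostName_tok$
        (source="/var/log/sudo.log" OR source="/var/log/secure")
      | fields _time, host, source, _raw
    </query>
    <earliest>$Selected_Time_Range.earliest$</earliest>
    <latest>$Selected_Time_Range.latest$</latest>
  </search>
  <row>
    <panel>
      <table>
        <!-- Post-process search: filters the base results instead of searching the index again -->
        <search base="base_linux">
          <query>search source="/var/log/secure" | table _time, host, _raw</query>
        </search>
      </table>
    </panel>
    <panel>
      <table>
        <search base="base_linux">
          <query>search source="/var/log/sudo.log" | table _time, host, _raw</query>
        </search>
      </table>
    </panel>
  </row>
</form>
```

The explicit fields command is there because a post-process search can only use fields that the base search returns. Splunk's documentation recommends a transforming base search where possible, so a raw-event base like this one is best kept to modest result sizes.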

It's difficult to help you without the main search and the macros you used. Anyway, here are some small hints for your searches:

  • check the number of results of each subsearch, because Splunk has a limit of 50,000 results for subsearches (configurable, but best left alone);
  • you don't need to repeat earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$ in every search, except when you want a different value, because the time range is applied by default;
  • you don't need to use _raw="*$userId_tok$*", because when you search for a bare string it is matched against _raw; in other words, try to stop thinking of Splunk as a database!
  • you don't need to use the "AND" boolean operator because it is applied by default;
  • you don't need to use "+" in | sort +_time because ascending order is the default;
  • I see that you're using the same search parameters in all the searches:

search index="*linuxevents"
AND earliest=$Selected_Time_Range.earliest$ AND latest=$Selected_Time_Range.latest$
AND (source="/var/log/sudo.log" OR source="/var/log/secure")
AND host=$hostName_tok$
AND _raw="*$userId_tok$*"

so you should try to build your search in a different way, putting the search parameters in the main search.
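
As a rough sketch of what that could look like, the shared parameters move into one main search and the time range is set once in the search's earliest and latest elements. This is only a starting point under stated assumptions: it matches events that contain the user ID string and does not reproduce the pid, addr or auid correlation from the original query. The eval, sort and table lines are taken from the question.

```xml
<search>
  <query>
    index="*linuxevents" host=$hostName_tok$
      (source="/var/log/sudo.log" OR source="/var/log/secure" OR source="/var/log/audit/audit.log")
      "$userId_tok$"
    | eval log=case(source=="/var/log/sudo.log", "sudo", source=="/var/log/secure", "secure", source=="/var/log/audit/audit.log", "audit")
    | sort +_time
    | table _time, log, host, _raw
  </query>
  <!-- Subsearches added later inherit this time range unless they set their own -->
  <earliest>$Selected_Time_Range.earliest$</earliest>
  <latest>$Selected_Time_Range.latest$</latest>
</search>
```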

Bye.
Giuseppe

Builder

@gcusello
I have been taking the Fund2 training this week and haven't had a chance to review your comments. I hope to at the beginning of next week.
In the meantime, thank you and God bless,
Genesius

Builder

@gcusello
Thanks for your suggestions. I just posted a deeper explanation of my requirements, along with responses to your suggestions.
Thanks and God bless,
Genesius
