Splunk Dev

Splunk join not working as expected

omrnuki
Explorer

 

Hello,

I am forwarding data from an embedded system to a Splunk Enterprise instance running on a VM. The logs look like this:

  • acces_monitoring (indexed on Splunk; an empty Logoff_time means the session is still online):

    Access_IP       Access_time        Logoff_time
1   192.168.200.55  1699814895.000000
2   192.168.200.55  1699814004.000000  1699814060.000000
3   192.168.200.55  1699811754.000000  1699812677.000000
4   192.168.200.55  1699808364.000000  1699809475.000000
5   192.168.200.55  1699806635.000000  1699806681.000000
6   192.168.200.55  1699791222.000000  1699806628.000000
7   192.168.200.55  1699791125.000000  1699791127.000000
8   192.168.200.55  1699724540.000000  1699724541.000000
9   192.168.200.55  1699724390.000000  1699724474.000000

 

  • command_monitoring:

    Access_IP       exec_time          executed_command
1   192.168.200.55  1699813121.000000  cd ~
2   192.168.200.55  1699813116.000000  cd /opt
3   192.168.200.55  1699813110.000000  prova3
4   192.168.200.55  1699811813.000000  cat sshd_config
5   192.168.200.55  1699811807.000000  cd /etc/ssh
6   192.168.200.55  1699811801.000000  cd etc
7   192.168.200.55  1699811793.000000  cd
8   192.168.200.55  1699811788.000000  ls
9   192.168.200.55  1699811783.000000  e che riconosce le sessioni diverse
10  192.168.200.55  1699811776.000000  spero funziona
11  192.168.200.55  1699809221.000000  cat command_log.log
12  192.168.200.55  1699809210.000000  ./custom_shell.sh
13  192.168.200.55  1699808594.000000  CD /MEDIA
14  192.168.200.55  1699808587.000000  cd /medi
15  192.168.200.55  1699808584.000000  omar

When I try to join the two by running:

 

index=main source="/media/ssd1/ip_command_log/command_log.log"
| eval exec_time=strptime(exec_time, "%a %b %d %H:%M:%S %Y")
| rename ip_execut as Access_IP
| table Access_IP, exec_time, executed_command
| join type=left Access_IP
    [ search index=main source="/media/ssd1/splunk_wtmp_output.txt"
      | dedup  Access_time
      | eval Access_time=strptime(Access_time, "%a %b %d %H:%M:%S %Y")
      | eval Logoff_time=if(Logoff_time="still logged in", now(), strptime(Logoff_time, "%a %b %d %H:%M:%S %Y"))
      | table Access_IP, Access_time, Logoff_time ]
| eval session_active = if(exec_time >= Access_time AND exec_time <= coalesce(Logoff_time, now()), "true", "false")
| where session_active="true"
| table Access_IP, Access_time, Logoff_time, exec_time, executed_command

 

it does not join over every session but only the last one (the one started at 1699814895.000000), so it does not identify any of the commands run on the embedded system in the correct session. What could be the catch?
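To make the intent concrete, here is a plain-Python sketch (not SPL) of what I expect the join to produce: each command row should pair with the session whose [Access_time, Logoff_time] interval contains its exec_time, with a missing Logoff_time treated as "still logged in". The data is a subset of the sample results above; the helper name `match_session` is just for illustration.

```python
import time

# (ip, access_time, logoff_time); None = still logged in
sessions = [
    ("192.168.200.55", 1699814895.0, None),
    ("192.168.200.55", 1699811754.0, 1699812677.0),
    ("192.168.200.55", 1699808364.0, 1699809475.0),
]

# (ip, exec_time, command)
commands = [
    ("192.168.200.55", 1699811813.0, "cat sshd_config"),
    ("192.168.200.55", 1699809221.0, "cat command_log.log"),
]

def match_session(cmd, sessions, now=None):
    """Return the (access, logoff) interval containing the command's exec_time."""
    now = now if now is not None else time.time()
    ip, exec_time, _ = cmd
    for s_ip, access, logoff in sessions:
        # open sessions (logoff is None) extend to "now"
        if s_ip == ip and access <= exec_time <= (logoff or now):
            return (access, logoff)
    return None

for cmd in commands:
    print(cmd[2], "->", match_session(cmd, sessions))
```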
 
Thanks in advance!
1 Solution

bowesmana
SplunkTrust

Your search and your data don't match: you are parsing the time fields with strptime in your SPL, but your data shows they are already in epoch time.

You need to use max=0 in the join statement.

Note that using join is not good practice, as it has a number of limitations and is slow. stats is generally the way to go, but in this case you're effectively using the subsearch as a lookup table for a range search, so be aware of the limitations.
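The effect of max=0 can be sketched in plain Python (a rough analogue of join's behaviour, not Splunk internals): with max=1, the default, each main row picks up only the first matching subsearch row for its key, so only one session survives; max=0 keeps every match. The function name `splunk_like_join` is hypothetical.

```python
def splunk_like_join(main, sub, key, max_matches=1):
    """Left-join dicts on `key`; max_matches=0 means unlimited (like join max=0)."""
    out = []
    for row in main:
        matches = [s for s in sub if s[key] == row[key]]
        if max_matches > 0:
            matches = matches[:max_matches]  # default behaviour: first match only
        for m in matches:
            out.append({**row, **m})
    return out

main = [{"Access_IP": "192.168.200.55", "exec_time": 1699811813.0}]
sub = [
    {"Access_IP": "192.168.200.55", "Access_time": 1699814895.0},
    {"Access_IP": "192.168.200.55", "Access_time": 1699811754.0},
]

print(len(splunk_like_join(main, sub, "Access_IP", max_matches=1)))  # only the first session
print(len(splunk_like_join(main, sub, "Access_IP", max_matches=0)))  # all sessions
```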


ITWhisperer
SplunkTrust

Try something like this (to avoid join):

index=main source="/media/ssd1/ip_command_log/command_log.log"
| eval exec_time=strptime(exec_time, "%a %b %d %H:%M:%S %Y")
| rename ip_execut as Access_IP
| table Access_IP, exec_time, executed_command
| append
    [ search index=main source="/media/ssd1/splunk_wtmp_output.txt"
      | dedup  Access_time
      | eval Access_time=strptime(Access_time, "%a %b %d %H:%M:%S %Y")
      | eval Logoff_time=if(Logoff_time="still logged in", now(), strptime(Logoff_time, "%a %b %d %H:%M:%S %Y"))
      | table Access_IP, Access_time, Logoff_time ]
| eval event_time=coalesce(Access_time, exec_time)
| sort 0 event_time
| streamstats global=f latest(Access_time) as Access_time latest(Logoff_time) as Logoff_time by Access_IP
| where exec_time>=Access_time AND exec_time<=coalesce(Logoff_time,now())
| table Access_IP, Access_time, Logoff_time, exec_time, executed_command
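The append + sort + streamstats pattern above can be sketched in plain Python (an illustration of the idea, not Splunk internals): interleave session rows and command rows by time, carry the latest session bounds forward per IP (like streamstats latest() by Access_IP), and keep only commands that fall inside the current session. The function name `sessionise` is hypothetical.

```python
# Mixed event stream: session rows have "access"/"logoff",
# command rows have "exec"/"cmd". logoff=None means still logged in.
events = [
    {"ip": "192.168.200.55", "access": 1699811754.0, "logoff": 1699812677.0},
    {"ip": "192.168.200.55", "exec": 1699811813.0, "cmd": "cat sshd_config"},
    {"ip": "192.168.200.55", "access": 1699814895.0, "logoff": None},
    {"ip": "192.168.200.55", "exec": 1699808000.0, "cmd": "before any login"},
]

def sessionise(events, now=9e9):
    # sort 0 event_time: order everything by its own timestamp
    ordered = sorted(events, key=lambda e: e.get("access", e.get("exec")))
    latest = {}  # per-IP running session bounds, like streamstats latest() by ip
    kept = []
    for e in ordered:
        if "access" in e:
            latest[e["ip"]] = (e["access"], e["logoff"])
        elif e["ip"] in latest:
            start, end = latest[e["ip"]]
            # where exec_time>=Access_time AND exec_time<=coalesce(Logoff_time, now())
            if start <= e["exec"] <= (end if end is not None else now):
                kept.append(e["cmd"])
    return kept

print(sessionise(events))
```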

omrnuki
Explorer

Is the only advantage that append is not limited in the way join is?

Thank you


bowesmana
SplunkTrust

In general, all subsearches have limitations, but most people coming from SQL assume that join is the way to go in Splunk, which is not the case. The first choice should always be NOT to use join.

Append also has limits on the number of results - there are plenty of discussions on the topic, and Splunk's documentation covers these limits.

So, really, if your data size is large, you need to be aware of these limits. From a performance point of view as well, join is not the best way to go, and searches using join will impact other users on the search head.



omrnuki
Explorer

Thank you! The max=0 flag is indeed what I was missing. The data I provided is the result of the search, which is why it's in epoch time. The logs I am managing are not that big, but I will keep that in mind for the future!

 

Have a nice day!
