Hello,
I have 2 queries where indices are different and have a common field dest_ip which is my focus(same field name in both indices). Please note that there are also some other common fields such as src_ip, action etc.
Query 1:
index=*corelight* sourcetype=*corelight* server_name="*microsoft.com*
additional fields: action, ssl_version, ssl_cipher
Query 2:
index="*firewall*" sourcetype=*traffic* src_ip=10.1.1.100
additional fields: _time, src_zone, src_ip, dest_zone, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
I'm trying to output all the corresponding server_names for each dest_ip, as a table with all the listed fields from both query outputs
I'm new to Splunk and learning my way; I've tried the following so far -
A) using join (which is usually very slow and sometimes doesn't give me a result)
index=*corelight* sourcetype=*corelight* server_name=*microsoft.com*
| join dest_ip
[ search index="*firewall*" sourcetype=*traffic* src_ip=10.1.1.100 | fields src_ip, src_user, dest_ip, rule, action, app, transport, version, session_end_reason, dvc_name, bytes_out ]
| dedup server_name
| table _time, src_ip, dest_ip, transport, dest_port, app, rule, server_name, action, session_end_reason, dvc_name
| rename _time as "timestamp", transport as "protocol"
b)
using an OR
(index=*corelight* sourcetype=*corelight* server_name=*microsoft.com*) OR (index="*firewall*" sourcetype=*traffic* src_ip=10.1.1.100)
| dedup src_ip, dest_ip
| table src_zone, src_ip, dest_zone, dest_ip, server_name, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
| rename src_zone AS From, src_ip AS Source, dest_zone AS To, dest_ip AS Destination, server_name AS SNI, transport AS Protocol, dest_port AS Port, app AS "Application", rule AS "Rule", action AS "Action", session_end_reason AS "End Reason", packets_out AS "Packets Out", packets_in AS "Packets In", src_translated_ip AS "Egress IP", dvc_name AS "DC"
My questions -
Many thanks!
Thanks @bowesmana @yuanliu @gcusello for your help and input.
One thing I'm still missing is being able to populate values in all the fields listed. Let me explain with some screenshots for better context -
The output of just corelight query shows values for the fields -
The output of just the firewall query shows values for all related fields -
However, the output of the suggested OR queries does not populate the values of many of the fields (highlighted in red)-
eg (1).
eg (2).
I only see 5 values when using stats values (*) by * in the OR query (seems like it's just the fields that are common to both indices and none of others listed will be displayed?)
Any suggestions on this?
Thanks!
Please do not use screenshot to illustrate text data. Use text table or text box. But even the two index search screenshots are inconsistent, meaning there is no common dest_ip. You cannot expect all fields to be populated when there is no matching field value. This is basic mathematics.
Like @bowesmana says, find a small number of events that you know have matching dest_ip in both indices, manually calculate what the output should be, then use the proposed searches on this small dataset.
Here is a mock dataset losely based on your screenshots but WITH matching dest_ip
src_zone | src_ip | dest_zone | dest_ip | transport | dest_port | app | rule | action | session_end_reason | packets_out | packets_in | src_translated_ip | dvc_name | index | server_name | ssl_cipher | ssl_version |
trusted | 10.80.110.8 | untrusted | 152.88.1.76 | UDP | 53 | dns_base | whatever1 | blocked | policy_deny | 1 | 0 | whateverNAT | don'tmatter | *firewall* | |||
152.88.1.76 | *corelight* | whatever2 | idon'tcare | TLSv3 |
The first row is from index=*firewall*, the second from *corelight*.
Because your two searches operators on different indices, @gcusello 's search can also be expressed with append (as opposed to OR) without much penalty. Like this
index="*firewall*" sourcetype=*traffic* src_ip=10.0.0.0/8
| append
[search index=*corelight* sourcetype=*corelight* server_name=*microsoft.com*]
| fields src_zone, src_ip, dest_zone, dest_ip, server_name, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
| stats values(*) AS * BY dest_ip
| rename src_zone AS From, src_ip AS Source, dest_zone AS To, dest_ip AS Destination, server_name AS SNI, transport AS Protocol, dest_port AS Port, app AS "Application", rule AS "Rule", action AS "Action", session_end_reason AS "End Reason", packets_out AS "Packets Out", packets_in AS "Packets In", src_translated_ip AS "Egress IP", dvc_name AS "DC"
Using the mock dataset, the output is
Destination | Action | Application | DC | Egress IP | End Reason | From | Packets In | Packets Out | Port | Protocol | Rule | SNI | Source | To |
152.88.1.76 | blocked | dns_base | don'tmatter | whateverNAT | policy_deny | trusted | 0 | 1 | 53 | UDP | whatever1 | whatever2 | 10.80.110.8 | untrusted |
This is a full emulation for you to play with and compare with real data
| makeresults format=csv data="src_zone, src_ip, dest_zone, dest_ip, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
trusted, 10.80.110.8, untrusted, 152.88.1.76, UDP, 53, dns_base, whatever1, blocked, policy_deny, 1, 0, whateverNAT, don'tmatter"
| table src_zone, src_ip, dest_zone, dest_ip, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
| eval index="*firewall*"
``` the above emulates
index="*firewall*" sourcetype=*traffic* src_ip=10.0.0.0/8
```
| append
[makeresults format=csv data="server_name, dest_ip, ssl_version, ssl_cipher
whatever2, 152.88.1.76, TLSv3, idon'tcare"
| eval index="*corelight*"
``` the above emulates
index=*corelight* sourcetype=*corelight* server_name=*microsoft.com*
```]
| fields src_zone, src_ip, dest_zone, dest_ip, server_name, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
| stats values(*) AS * BY dest_ip
| rename src_zone AS From, src_ip AS Source, dest_zone AS To, dest_ip AS Destination, server_name AS SNI, transport AS Protocol, dest_port AS Port, app AS "Application", rule AS "Rule", action AS "Action", session_end_reason AS "End Reason", packets_out AS "Packets Out", packets_in AS "Packets In", src_translated_ip AS "Egress IP", dvc_name AS "DC"
Example 1 - you are using dedup src/dest - you can't do that as I explained in my other post.
Example 2 - dedup here is not useful - you have a multivalue src_ip and you will not have any duplicate src_ip in there relating to the dest_ip, so it's redundant. Best way to work out what's wrong here is to remove the last 2 lines and just let the stats work.
If you work on a small time zone where you KNOW what data you expect - then you can more easily validate what's wrong. As @isoutamo says, you can even remove stats and just to table * - but work with a very small data set where the results are predictable.
Then you can build back the detail again.
All fields should be there if those contains some values.
You could debug it e.g.
Also if/when you are using verbose mode you can see what values you have in Events tab. With Smart or Fast mode this tab is not available.
r Ismo
Just a note on your use of dedup, you will only end up with a single event from ONE of the indexes (whichever is found first), which is one of the reasons why your search is not working as expected.
You COULD use
| dedup index src_ip dest_ip
which would leave you one event from EACH index, however, as @yuanliu has said, fields + stats + rename is generally the optimal way to do the grouping.
However, consider what exactly do you want to see in the other fields, using dedup would only give you ONE value from the event that remains after the dedup, but the stats values(*) as * would give you all values from all events for each of the src_ip grouping.
Avoid join - it's not a Splunk way to do things, has significant limitations and will silently discard data leading to variable results. stats is always the way to join data sets.
When using OR, you cannot dedup src_ip dest_ip immediately after search. That should be performed after stats, like what you do with join.
Using the same structure @gcusello proposed, you can do
(index=*corelight* sourcetype=*corelight* server_name=*microsoft.com*) OR (index="*firewall*" sourcetype=*traffic* src_ip=10.1.1.100)
| fields src_zone, src_ip, dest_zone, dest_ip, server_name, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
| stats values(*) AS * BY dest_ip
| dedup src_ip, dest_ip ``` most likely this is unnecessary after stats ```
| rename src_zone AS From, src_ip AS Source, dest_zone AS To, dest_ip AS Destination, server_name AS SNI, transport AS Protocol, dest_port AS Port, app AS "Application", rule AS "Rule", action AS "Action", session_end_reason AS "End Reason", packets_out AS "Packets Out", packets_in AS "Packets In", src_translated_ip AS "Egress IP", dvc_name AS "DC"
Hi @sdcig ,
as you experienced, don't use join because it's very slow!
use the OR conditions correlated using stats, something like this to adapt to your requirements:
(index=*corelight* sourcetype=*corelight* server_name=*microsoft.com*) OR (index="*firewall*" sourcetype=*traffic* src_ip=10.1.1.100)
| dedup src_ip, dest_ip
| fields src_zone, src_ip, dest_zone, dest_ip, server_name, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
| stats values(*) AS * BY dest_ip
| rename src_zone AS From, src_ip AS Source, dest_zone AS To, dest_ip AS Destination, server_name AS SNI, transport AS Protocol, dest_port AS Port, app AS "Application", rule AS "Rule", action AS "Action", session_end_reason AS "End Reason", packets_out AS "Packets Out", packets_in AS "Packets In", src_translated_ip AS "Egress IP", dvc_name AS "DC"
If possible, try to avoid to use asterisk at the beginning of a string (as in your case in index, sourcetype and host).
Ciao.
Giuseppe
Thanks @gcusello for your response and guidance.
I tried your query and it's giving me results, however it's only populating 5 fields which are ones that are common to both indices.
How do you suggest I modify the query so the output also displays all the following fields that are under index="*firewall*"?
src_zone, src_ip, dest_zone, dest_ip, server_name, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
My intent is to display the data in the following order (that includes fields from both indices) if possible -
***specific fields from index=corelight are in bold for reference
src_zone AS From, src_ip AS Source, dest_zone AS To, dest_ip AS Destination, server_name AS SNI, transport AS Protocol, dest_port AS Port, app AS "Application", rule AS "Rule", action AS "Action", session_end_reason AS "End Reason", packets_out AS "Packets Out", packets_in AS "Packets In", src_translated_ip AS "Egress IP", dvc_name AS "DC", ssl_version AS Version, ssl_cipher AS ENCRPT_ALGO
Thanks!
Hi @sdcig ,
I simplified the search, in the stats command replace the values(*) As * with the five fields you want:
(index=*corelight* sourcetype=*corelight* server_name=*microsoft.com*) OR (index="*firewall*" sourcetype=*traffic* src_ip=10.1.1.100)
| dedup src_ip, dest_ip
| fields src_zone, src_ip, dest_zone, dest_ip, server_name, transport, dest_port, app, rule, action, session_end_reason, packets_out, packets_in, src_translated_ip, dvc_name
| stats
values(src_zone) AS From
values(src_ip) AS Source,
values(dest_zone) AS To
values(server_name) AS SNI
values(transport) AS Protocol
values(dest_port) AS Port
values(app) AS Application
values(rule) AS Rule
values(action) AS Action
values(session_end_reason) AS "End Reason"
values(packets_out) AS "Packets Out"
values(packets_in) AS "Packets In"
values(src_translated_ip) AS "Egress IP"
values(dvc_name) AS DC
values(src_zone) AS src_zone
BY dest_ip
| rename dest_ip AS Destination
if there are fields with different names between the two indexes, use eval coalesce to have the same field name.
Ciao.
Giuseppe