I'm trying to understand if there is a way to improve search time. I am corrolating fields from 2 or 3 indexes where the IP is the same.
Not all indexes will have matching data.
Is there a different search method I should consider? Is there something specific I should look for in the Job Inspector? (I did see search.kv invoked 235 times/ 50 secs, not sure why)
Appreciate any suggestions or tips or directions to look in!
(sourcetype="dns" index=ipamdns NOT "notify" x.x.x) OR ( index=ipamdns sourcetype="dhcp") OR (sourcetype="eventlog_security" security_ip=*)
| eval ip_match=coalesce(dhcp_client_ip, dns_request_client_ip, security_ip)
| stats
values(dhcp_client_ip) as dhcp_ip
values(dhcp_hostname) as dhcp_hostname
values(dns_request_client_ip) as dns_client_ip
values(security_ip) as security_ip
values(Workstation_Name) as hostname
values(Account_Domain) as domain
by ip_match
| where NOT isnull(dhcp_hostname) AND NOT isnull(dns_client_ip)
|fields - src_ip, dns_client_ip, ad_client_ip, ad_client_name
The job inspector shows:
Duration (seconds) Component Invocations Input count Output count
0.24 command.addinfo 236 2,855,808 2,855,808
1.57 command.eval 236 2,855,808 2,855,808
0.22 command.fields 237 2,856,174 2,856,174
5.44 command.prestats 236 2,855,808 168,181
70.45 command.search 236 - 2,855,808
13.16 command.search.filter 235 - -
2.37 command.search.fieldalias 235 2,916,002 2,916,002
0.68 command.search.index 5 - -
0.23 command.search.calcfields 235 2,916,002 2,916,002
0.05 command.search.expand_search 1 - -
0.00 command.search.index.usec_1_8 4,221 - -
0.00 command.search.index.usec_64_512 9 - -
0.00 command.search.index.usec_8_64 685 - -
50.60 command.search.kv 235 - -
3.86 command.search.rawdata 235 - -
0.24 command.search.tags 235 2,855,808 2,855,808
0.24 command.search.typer 235 2,855,808 2,855,808
0.23 command.search.lookups 235 2,916,002 2,916,002
0.15 command.search.summary 236 - -
0.00 command.search.parse_directives 1 - -
0.93 command.stats 238 168,181 -
0.92 command.stats.execute_input 237 168,181 -
0.02 command.stats.execute_output 1 - -
0.00 command.where 1 6,494 366
0.01 dispatch.check_disk_usage 8 - -
0.00 dispatch.createdSearchResultInfrastructure 1 - -
0.07 dispatch.evaluate 1 - -
0.07 dispatch.evaluate.search 1 - -
0.00 dispatch.evaluate.eval 1 - -
0.00 dispatch.evaluate.fields 1 - -
0.00 dispatch.evaluate.stats 1 - -
0.00 dispatch.evaluate.where 1 - -
75.52 dispatch.fetch 237 - -
77.14 dispatch.localSearch 1 - -
0.08 dispatch.optimize.FinalEval 1 - -
0.18 dispatch.optimize.matchReportAcceleration 1 - -
0.00 dispatch.optimize.optimization 1 - -
0.00 dispatch.optimize.reparse 1 - -
0.00 dispatch.optimize.toJson 1 - -
0.00 dispatch.optimize.toSpl 1 - -
0.79 dispatch.preview 60 - -
0.62 dispatch.preview.stats.execute_output 60 - -
0.16 dispatch.preview.command.where 60 210,093 10,060
0.06 dispatch.preview.command.fields 60 10,060 10,060
0.06 dispatch.preview.write_results_to_disk 60 - -
77.23 dispatch.stream.local 236 - -
0.22 dispatch.writeStatus 126 - -
0.03 startup.configuration 1 - -
0.09 startup.handoff 1
I think the biggest improvement has been from changing my query so that the top level sourectype searches could find the relavent events easier, by adding the DHCPREQUEST key word.
Give this a try
(sourcetype="dns" index=ipamdns NOT "notify" x.x.x) OR ( index=ipamdns sourcetype="dhcp") OR (sourcetype="eventlog_security" security_ip=*)
| fields dhcp_client_ip, dns_request_client_ip, security_ip, dhcp_hostname,Workstation_Name,Account_Domain
| eval ip_match=coalesce(dhcp_client_ip, dns_request_client_ip, security_ip)
| stats
values(dhcp_client_ip) as dhcp_ip
values(dhcp_hostname) as dhcp_hostname
values(dns_request_client_ip) as dns_client_ip
values(security_ip) as security_ip
values(Workstation_Name) as hostname
values(Account_Domain) as domain
by ip_match
| where isnotnull(dhcp_hostname) AND isnotnull(dns_client_ip)
The only change was
FROM | where NOT isnull(dhcp_hostname) AND NOT isnull(dns_client_ip)
==> TO | where isnotnull(dhcp_hostname) AND isnotnull(dns_client_ip)
right?
It didn't seem to make any difference when looking at the Job Inspector, event if I expand out to 48 hrs. (only about 890,000 events) Could be it would as data increases though. Thanks for the suggestion.
FYI other change was adding fields command upfront to reduce amount of data to be processed. Did it have any effect on execution time?
I compared just the fields up front change, maybe it was a second faster.
Appreciate the suggestion!
On a 7d search I dropped from about 168s to 159s
1) Since you are anyways checking for NOT isnull(dns_client_ip) later in your Search, it implies that you are only expecting events with dns_request_client_ip. So, I dont see a need of coalesce command. You can perform stats by dns_request_client_ip directly.
2) You should evaluate your base search to see whether you really need NOT or is there another way to find required events. (inclusion is better than exclusion)
3) I have added dc(dhcp_hostname) as dc_dhcp_hostname to allow search instead of where clause.
(sourcetype="dns" index=ipamdns dns_client_ip=* NOT "notify" x.x.x) OR ( index=ipamdns sourcetype="dhcp" dhcp_client_ip=*) OR (sourcetype="eventlog_security" security_ip=*)
| fields dns_client_ip dhcp_client_ip security_ip dhcp_hostname hostname domain
| stats
values(dhcp_client_ip) as dhcp_ip
values(dhcp_hostname) as dhcp_hostname
dc(dhcp_hostname) as dc_dhcp_hostname
values(dns_request_client_ip) as dns_client_ip
values(security_ip) as security_ip
values(Workstation_Name) as hostname
values(Account_Domain) as domain
by dhcp_client_ip
| search dc_dhcp_hostname>0
| fields - dc_dhcp_hostname
I appreciate #2 very much, I think I have found some savings there.
I'm still trying to test the search you gave me (partially obfuscated, not sure why I did that lol). Initial tests I either get no results, or only get dns results and no security or dhcp results included. But I may have a bad field name or something.
I think I also found some savings by | fields - eventtype, another thread suggested that.
Thank you!!
fields inclusion (fields + or simply fields) may improve search performance but not field exclusion (field -). Since field exclusion occurs after field discovery. That is what I did in first pipe after base search.
If you feel coalesce command is required, then your where filterNOT isnull(dns_client_ip)
is contradicting the same. However, I might have misunderstood if that is not the case.
Can you put some of the sample events from the three data sources?
Thank you @niketn.
Since my fields with IP address are different in each sourcetype, if I try to do stats by any one of them results from the other two are dropped out. The NOT filters out results that only have one sourcetype (not enough data to really be helpful here).
security event samples:
--
Timestamp
Workstation Name: PC1
Source Network Address: 10.1.1.1
Account Domain: Domain1
Timestamp
dns_request_client_ip 10.1.1.1
dns_request_name_server x.x.x.x
timestamp
dhcpd DHCPREQUEST for 10.1.1.1 from mac_add (hostname1) via router_ip uid
My goal is to find DNS queries that contain a specific string, and then find the matching fields from the security log and the dhcp log if they exist. If they don't exist we don't have administrative control and can ignore. Matching fields may exist only on the security log, only on the dhcp log, or both.
I really have not done a lot of complicated searches before; the feedback I've gotten so far has been really great! Not sure if there are further refinements anyone would suggest or if the things so far are it.
Change this:
| where NOT isnull(dhcp_hostname) AND NOT isnull(dns_client_ip)
To this:
| search dhcp_hostname="*" AND dns_client_ip="*"
oddly enough that took a few seconds longer although its a bit easier to read for sure.
(Running just prior to making your change took 70.something seconds)
Duration (seconds) Component Invocations Input count Output count
0.23 command.addinfo 227 2,740,564 2,740,564
1.55 command.eval 227 2,740,564 2,740,564
0.21 command.fields 228 2,740,584 2,740,584
5.50 command.prestats 227 2,740,564 159,585
66.84 command.search 228 6,339 2,740,584
9.20 command.search.filter 227 - -
2.35 command.search.fieldalias 226 2,799,235 2,799,235
0.63 command.search.index 5 - -
0.22 command.search.calcfields 226 2,799,235 2,799,235
0.03 command.search.expand_search 1 - -
0.00 command.search.index.usec_1_8 3,762 - -
0.00 command.search.index.usec_8_64 632 - -
51.07 command.search.kv 226 - -
3.81 command.search.rawdata 226 - -
0.23 command.search.tags 226 2,740,564 2,740,564
0.22 command.search.lookups 226 2,799,235 2,799,235
0.22 command.search.typer 226 2,740,564 2,740,564
0.14 command.search.summary 227 - -
0.00 command.search.parse_directives 1 - -
0.91 command.stats 229 159,585 -
0.90 command.stats.execute_input 228 159,585 -
0.02 command.stats.execute_output 1 - -
0.01 dispatch.check_disk_usage 8 - -
0.00 dispatch.createdSearchResultInfrastructure 1 - -
0.04 dispatch.evaluate 1 - -
0.04 dispatch.evaluate.search 2 - -
0.00 dispatch.evaluate.eval 1 - -
0.00 dispatch.evaluate.fields 1 - -
0.00 dispatch.evaluate.stats 1 - -
71.89 dispatch.fetch 228 - -
73.58 dispatch.localSearch 1 - -
0.04 dispatch.optimize.FinalEval 1 - -
0.17 dispatch.optimize.matchReportAcceleration 1 - -
0.00 dispatch.optimize.optimization 1 - -
0.00 dispatch.optimize.reparse 1 - -
0.00 dispatch.optimize.toJson 1 - -
0.00 dispatch.optimize.toSpl 1 - -
0.84 dispatch.preview 60 - -
0.18 dispatch.preview.command.search 60 208,603 701
0.06 dispatch.preview.command.fields 60 701 701
0.18 dispatch.preview.command.search.filter 60 - -
0.66 dispatch.preview.stats.execute_output 60 - -
0.06 dispatch.preview.write_results_to_disk 60 - -
73.66 dispatch.stream.local 227 - -
0.22 dispatch.writeStatus 127 - -
0.02 startup.configuration 1 - -
0.00 startup.handoff 1 - -