<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to make my query efficient ? in Knowledge Management</title>
    <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299589#M5337</link>
    <description>&lt;P&gt;Give this a try. In this version, we've a filter based on dest_hostname retrieved from mwb_feeds, so only relevant firewall records are loaded. If it still doesn't give any records, could you run the query part by part and see on which step you loose all the data?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; (index=pan_logs sourcetype="pan:threat" [search sourcetype=mwb_feeds earliest=-24h@h | stats count by dest_hostname | table dest_hostname] earliest=-1h@h ) OR (sourcetype=mwb_feeds earliest=-24h@h )
 | fields src_ip,dest_ip,url,dest_hostname,_time, Classification
 | eventstats values(Classification) as Classification by dest_hostname
 | where sourcetype="pan:threat" AND isnotnull(Classification)
 | dedup src_ip
 | replace "PSH" WITH "PHISHING", "EMD" WITH "ENGAGED IN MALWARE DISTRIBUTION", "PUP"  WITH "POTENTIALLY UNWANTED PROGRAM", "EXP" WITH "EXPLOITS", "FSA" WITH "FRAUDULENT SERVICES AND APPLICATIONS" IN Classification
 | rename dest_hostname AS HOST | stats values(src_ip) as "SOURCE IP",values(dest_ip) AS "DESTINATION IP",values(url) AS "WEBSITE", values(NEWTIME) AS "TIME CONNECTED" by Classification,HOST
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Tue, 29 Sep 2020 13:30:24 GMT</pubDate>
    <dc:creator>somesoni2</dc:creator>
    <dc:date>2020-09-29T13:30:24Z</dc:date>
    <item>
      <title>How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299579#M5327</link>
      <description>&lt;P&gt;Hi, &lt;/P&gt;

&lt;P&gt;Here is my query that I am currently running. Is there a way to make it more efficient? I am joining 2 sourcetypes from different indexes on a common field value. sourcetype=mwb_feeds are feeds that come in via rss feeds. I only want the inner sub query to look for data that came in for last 14 days to get more current feeds. Please guide on how to proceed. &lt;/P&gt;

&lt;P&gt;index=pan_logs sourcetype="pan:threat"  |  fields src_ip,dest_ip,url,dest_hostname,_time | eval NEWTIME=strftime(_time, "%b %d %H:%M:%S") | join dest_hostname [search sourcetype=mwb_feeds | fields dest_hostname,Classification | replace "PSH" WITH "PHISHING", "EMD" WITH "ENGAGED IN MALWARE DISTRIBUTION", "PUP"  WITH "POTENTIALLY UNWANTED PROGRAM", "EXP" WITH "EXPLOITS", "FSA" WITH "FRAUDULENT SERVICES AND APPLICATIONS" IN Classification] | dedup src_ip | rename dest_hostname AS HOST | stats values(src_ip) as "SOURCE IP",values(dest_ip) AS "DESTINATION IP",values(url) AS "WEBSITE", values(NEWTIME) AS "TIME CONNECTED" by Classification,HOST&lt;/P&gt;

&lt;P&gt;Thanks in advance for all the help !&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 13:26:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299579#M5327</guid>
      <dc:creator>dmenon84</dc:creator>
      <dc:date>2020-09-29T13:26:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299580#M5328</link>
      <description>&lt;P&gt;Give this a try&lt;BR /&gt;
&lt;STRONG&gt;Updated field name for Classification&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(index=pan_logs sourcetype="pan:threat") OR (sourcetype=mwb_feeds)
| fields src_ip,dest_ip,url,dest_hostname,_time Classification
| eventstats values(Classification) as Classificationby dest_hostname
| where sourcetype="pan:threat" AND isnotnull(Classification)
| dedup src_ip
| replace "PSH" WITH "PHISHING", "EMD" WITH "ENGAGED IN MALWARE DISTRIBUTION", "PUP"  WITH "POTENTIALLY UNWANTED PROGRAM", "EXP" WITH "EXPLOITS", "FSA" WITH "FRAUDULENT SERVICES AND APPLICATIONS" IN Classification
| rename dest_hostname AS HOST | stats values(src_ip) as "SOURCE IP",values(dest_ip) AS "DESTINATION IP",values(url) AS "WEBSITE", values(NEWTIME) AS "TIME CONNECTED" by Classification,HOST
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 31 Mar 2017 16:04:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299580#M5328</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2017-03-31T16:04:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299581#M5329</link>
      <description>&lt;P&gt;And, just in case anyone is interested in &lt;STRONG&gt;HOW&lt;/STRONG&gt; this works, and &lt;STRONG&gt;WHY&lt;/STRONG&gt; somesoni's version is more efficient, here's the blow by blow.  &lt;/P&gt;

&lt;P&gt;1) It's usually more efficient in splunk to read all the records at the same time rather than joining...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; (index=pan_logs sourcetype="pan:threat") OR (sourcetype=mwb_feeds)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;2) It's always more efficient to dump all the data you don't need at the earliest possible point, so this command keeps only the fields that you would need on any kind of record... although with the "fields" command, there's no need to explicitly list _time...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; | fields src_ip,dest_ip,url,dest_hostname,_time Classification sourcetype
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;3) This command calculates all the Classifications for each host and attaches them to every event with that host on it...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; | eventstats values(Classification) as Classification by dest_hostname
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;4) This eliminates the records from the mwb_feeds, since their info has already been rolled onto the other events, and keeps only the pan:threat records that are against a desthost with a threat Classification....&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; | where sourcetype="pan:threat" AND isnotnull(Classification)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;5) and back to your original reporting, which was just fine as it was...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; | dedup src_ip
 | replace "PSH" WITH "PHISHING", "EMD" WITH "ENGAGED IN MALWARE DISTRIBUTION", "PUP"  WITH "POTENTIALLY UNWANTED PROGRAM", "EXP" WITH "EXPLOITS", "FSA" WITH "FRAUDULENT SERVICES AND APPLICATIONS" IN Classification
 | rename dest_hostname AS HOST 
 | stats values(src_ip) as "SOURCE IP",values(dest_ip) AS "DESTINATION IP",values(url) AS "WEBSITE", values(NEWTIME) AS "TIME CONNECTED" by Classification,HOST
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;HR /&gt;

&lt;P&gt;updated capitalization of the fieldname &lt;CODE&gt;Classification&lt;/CODE&gt;.&lt;BR /&gt;
added &lt;CODE&gt;sourcetype&lt;/CODE&gt; to the fields list&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 16:51:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299581#M5329</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-03-31T16:51:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299582#M5330</link>
      <description>&lt;P&gt;be sure to mark your code with the "code" button when you post.  It's the button marked 101 010.  Except if you're using  Internet Explorer, then you're stuck with using a grave accent (`) before and after.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 16:56:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299582#M5330</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-03-31T16:56:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299583#M5331</link>
      <description>&lt;P&gt;Another option is select the whole query and press Ctrl + K. Be sure to leave a new line between your text para and code/query block.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 17:59:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299583#M5331</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2017-03-31T17:59:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299584#M5332</link>
      <description>&lt;P&gt;I couldn't have explained it better.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 17:59:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299584#M5332</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2017-03-31T17:59:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299585#M5333</link>
      <description>&lt;P&gt;Strong, the pedantic mode, in this one is. - Yoda&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 18:58:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299585#M5333</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-03-31T18:58:12Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299586#M5334</link>
      <description>&lt;P&gt;Upvote this, I must.  &lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 18:59:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299586#M5334</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2017-03-31T18:59:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299587#M5335</link>
      <description>&lt;P&gt;Thanks a lot for quick response and solution and wonderful explanation. I ran the query you suggested but it doesn't return any data. Just to clarify I am getting in rss threat feeds every 4 hours . I modified my query a bit and when I run it I find 1 match &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=pan_logs sourcetype="pan:threat"  |  fields src_ip,dest_ip,url,dest_hostname,_time | eval NEWTIME=strftime(_time, "%b %d %H:%M:%S") | join dest_hostname [search sourcetype=mwb_feeds earliest=-24h@h  | fields dest_hostname,Classification | replace "PSH" WITH "PHISHING", "EMD" WITH "ENGAGED IN MALWARE DISTRIBUTION", "PUP"  WITH "POTENTIALLY UNWANTED PROGRAM", "EXP" WITH "EXPLOITS", "FSA" WITH "FRAUDULENT SERVICES AND APPLICATIONS" IN Classification] | dedup src_ip | rename dest_hostname AS HOST | stats values(src_ip) as "SOURCE IP",values(dest_ip) AS "DESTINATION IP",values(url) AS "WEBSITE", values(NEWTIME) AS "TIME CONNECTED" by Classification,HOST
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Result returned for last 60 minutes &lt;BR /&gt;
ENGAGED IN MALWARE DISTRIBUTION computersecuritybreached-  callsupport18885691655.xyz    src_ipvalues   dest_ipvalues   computersecuritybreached-callsupport18885691655.xyz/favicon.ico Mar 31 14:26:53&lt;/P&gt;

&lt;P&gt;I ran query recommended by you but it did not return anything. Is it because they are in difference indexes. Also the firewall data is a lot so even first line takes a lot of time to run. I also tried &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(index=pan_logs sourcetype="pan:threat" earliest=-1h@h ) OR (sourcetype=mwb_feeds earliest=-24h@h )
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;but no luck. &lt;/P&gt;

&lt;P&gt;Classification is field in sourcetype=mwb_feeds but doesnt exist in sourcetype="pan:threat". dest_hostname is common field and can sometimes have similar data and thats what I want to match on. sourcetype=mwb_feeds has very few events compared to  pan threat logs. &lt;/P&gt;

&lt;P&gt;Thanks in advance !&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 13:26:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299587#M5335</guid>
      <dc:creator>dmenon84</dc:creator>
      <dc:date>2020-09-29T13:26:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299588#M5336</link>
      <description>&lt;P&gt;capitalization error.  in somesoni2's code, change all &lt;CODE&gt;classification&lt;/CODE&gt; to &lt;CODE&gt;Classification&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 20:00:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299588#M5336</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-03-31T20:00:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299589#M5337</link>
      <description>&lt;P&gt;Give this a try. In this version, we've a filter based on dest_hostname retrieved from mwb_feeds, so only relevant firewall records are loaded. If it still doesn't give any records, could you run the query part by part and see on which step you loose all the data?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; (index=pan_logs sourcetype="pan:threat" [search sourcetype=mwb_feeds earliest=-24h@h | stats count by dest_hostname | table dest_hostname] earliest=-1h@h ) OR (sourcetype=mwb_feeds earliest=-24h@h )
 | fields src_ip,dest_ip,url,dest_hostname,_time, Classification
 | eventstats values(Classification) as Classification by dest_hostname
 | where sourcetype="pan:threat" AND isnotnull(Classification)
 | dedup src_ip
 | replace "PSH" WITH "PHISHING", "EMD" WITH "ENGAGED IN MALWARE DISTRIBUTION", "PUP"  WITH "POTENTIALLY UNWANTED PROGRAM", "EXP" WITH "EXPLOITS", "FSA" WITH "FRAUDULENT SERVICES AND APPLICATIONS" IN Classification
 | rename dest_hostname AS HOST | stats values(src_ip) as "SOURCE IP",values(dest_ip) AS "DESTINATION IP",values(url) AS "WEBSITE", values(NEWTIME) AS "TIME CONNECTED" by Classification,HOST
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 29 Sep 2020 13:30:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299589#M5337</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2020-09-29T13:30:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299590#M5338</link>
      <description>&lt;P&gt;You are awesome !!!! - this works I took out the "where" clause since that was not returning anything &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(index=pan_logs sourcetype="pan:threat" [search sourcetype=mwb_feeds earliest=-24h@h | stats count by dest_hostname | table dest_hostname] earliest=-1h@h ) OR (sourcetype=mwb_feeds earliest=-24h@h ) | fields src_ip,dest_ip,url,dest_hostname,_time, Classification  | eventstats values(Classification) as Classification by dest_hostname   | dedup src_ip
  | replace "PSH" WITH "PHISHING", "EMD" WITH "ENGAGED IN MALWARE DISTRIBUTION", "PUP"  WITH "POTENTIALLY UNWANTED PROGRAM", "EXP" WITH "EXPLOITS", "FSA" WITH "FRAUDULENT SERVICES AND APPLICATIONS" IN Classification
  | rename dest_hostname AS HOST | stats values(src_ip) as "SOURCE IP",values(dest_ip) AS "DESTINATION IP",values(url) AS "WEBSITE", values(NEWTIME) AS "TIME CONNECTED" by Classification,HOST
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 31 Mar 2017 20:20:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299590#M5338</guid>
      <dc:creator>dmenon84</dc:creator>
      <dc:date>2017-03-31T20:20:16Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299591#M5339</link>
      <description>&lt;P&gt;ROFL.  We screened out &lt;CODE&gt;sourcetype&lt;/CODE&gt; in the &lt;CODE&gt;fields&lt;/CODE&gt; command.  No wonder the &lt;CODE&gt;where&lt;/CODE&gt; was giving you nothing.&lt;/P&gt;

&lt;P&gt;Yes, the &lt;CODE&gt;where&lt;/CODE&gt; clause was unneeded once the &lt;CODE&gt;mwb_feeds&lt;/CODE&gt; was moved to a subsearch prior to the search, eliminating all records from unneeded &lt;CODE&gt;src_ip&lt;/CODE&gt;s before they were even selected.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 20:24:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299591#M5339</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-03-31T20:24:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299592#M5340</link>
      <description>&lt;P&gt;Okay, wait.  now that it's working, why are we &lt;CODE&gt;dedup&lt;/CODE&gt;ing the &lt;CODE&gt;src_ip&lt;/CODE&gt; but keeping &lt;CODE&gt;_time&lt;/CODE&gt;??? That's not intuitively obvious.  &lt;/P&gt;

&lt;P&gt;I believe, based on how splunk is defined as returning the most recent records first, that it will give you the most recent time connected for each host, but those values will not be aligned with the src_ip values that they are returned alongside.    Each set of values will have been sorted semantically.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 20:35:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299592#M5340</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-03-31T20:35:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299593#M5341</link>
      <description>&lt;P&gt;I did something like this for time &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(index=pan_logs sourcetype="pan:threat" [search sourcetype=mwb_feeds earliest=-168h@h | stats count by dest_hostname | table dest_hostname] earliest=-24h@h ) OR (sourcetype=mwb_feeds earliest=-168h@h ) | fields src_ip,dest_ip,url,dest_hostname,_time, Classification   | eventstats values(Classification) as Classification by dest_hostname   | dedup src_ip  | eval NEWTIME=strftime(_time, "%b %d")
   | replace "PSH" WITH "PHISHING", "EMD" WITH "ENGAGED IN MALWARE DISTRIBUTION", "PUP"  WITH "POTENTIALLY UNWANTED PROGRAM", "EXP" WITH "EXPLOITS", "FSA" WITH "FRAUDULENT SERVICES AND APPLICATIONS" IN Classification
   | rename dest_hostname AS HOST | rename NEWTIME as "TIME CONNECTED" | stats values(src_ip) as "SOURCE IP",values(dest_ip) AS "DESTINATION IP",values(url) AS "WEBSITE" by "TIME CONNECTED",Classification,HOST | sort -"TIME CONNECTED"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 31 Mar 2017 20:43:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299593#M5341</guid>
      <dc:creator>dmenon84</dc:creator>
      <dc:date>2017-03-31T20:43:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to make my query efficient ?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299594#M5342</link>
      <description>&lt;P&gt;Given that code, deduping by &lt;CODE&gt;src_ip&lt;/CODE&gt; at that point makes no sense to me.  We have records that each connect a &lt;CODE&gt;src_ip&lt;/CODE&gt; to a &lt;CODE&gt;dest_ip&lt;/CODE&gt; at some particular point in time, and we're going to randomly take the first one for any given &lt;CODE&gt;src_ip&lt;/CODE&gt; and discard the rest?&lt;/P&gt;

&lt;P&gt;Also, just as a general practice, you should do all your renaming and reformatting at the very end, after any &lt;CODE&gt;stats&lt;/CODE&gt;-type commands (other than prepwork for the &lt;CODE&gt;stats&lt;/CODE&gt;, like binning &lt;CODE&gt;_time&lt;/CODE&gt; etc).  If you reformat each record, and then do &lt;CODE&gt;stats&lt;/CODE&gt;, then you've cost the computer extra work changing details that it could have just changed once after the sums.  &lt;/P&gt;

&lt;P&gt;One last thing - get in the habit of using &lt;CODE&gt;sort 0&lt;/CODE&gt; rather than just &lt;CODE&gt;sort&lt;/CODE&gt;.  Splunk has a default number of records to return from a sort, the top 100 or something, and if you don't explicitly tell it otherwise, you may lose some results without ever noticing.  &lt;/P&gt;

&lt;P&gt;Compare the output from this to the output from the &lt;CODE&gt;dedup&lt;/CODE&gt; and &lt;CODE&gt;sort&lt;/CODE&gt; version, and see whether anything significant was lost in the other version.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(index=pan_logs sourcetype="pan:threat" 
    [ search sourcetype=mwb_feeds earliest=-168h@h 
    | stats count by dest_hostname 
    | table dest_hostname] 
    earliest=-24h@h ) 
OR (sourcetype=mwb_feeds earliest=-168h@h ) 
| table src_ip, dest_ip, url, dest_hostname, _time, Classification   
| eventstats values(Classification) as Classification by dest_hostname  
| bin _time span=1d
| stats count as hitcount, values(url) as url, values(Classification) as Classification, values(src_ip) as src_ip , values(dest_ip) as dest_ip by dest_hostname _time
| replace "PSH" WITH "PHISHING", "EMD" WITH "ENGAGED IN MALWARE DISTRIBUTION", "PUP"  WITH "POTENTIALLY UNWANTED PROGRAM", "EXP" WITH "EXPLOITS", "FSA" WITH "FRAUDULENT SERVICES AND APPLICATIONS" IN Classification
| eval _time=strftime(_time, "%b %d")
| rename dest_hostname AS HOST, _time as "TIME CONNECTED", src_ip as "SOURCE IP", dest_ip AS "DESTINATION IP",  url AS "WEBSITE"
| sort 0 - "TIME CONNECTED"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 31 Mar 2017 21:36:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/How-to-make-my-query-efficient/m-p/299594#M5342</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-03-31T21:36:41Z</dc:date>
    </item>
  </channel>
</rss>

