<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: simulating a SQL JOIN in Splunk in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10484#M472</link>
    <description>&lt;P&gt;Splunk 4.0 executes lookups more efficiently than &lt;CODE&gt;join&lt;/CODE&gt;. Making the lookup automatic lets you efficiently search using the &lt;CODE&gt;username&lt;/CODE&gt; (e.g. &lt;CODE&gt;source=costtable username=myname&lt;/CODE&gt;), while with &lt;CODE&gt;join&lt;/CODE&gt; you must execute the join on the entire &lt;CODE&gt;costtable&lt;/CODE&gt;. If there is no match on (e.g.) &lt;CODE&gt;userid&lt;/CODE&gt;, then &lt;CODE&gt;join&lt;/CODE&gt; will drop your &lt;CODE&gt;costtable&lt;/CODE&gt; row, while &lt;CODE&gt;lookup&lt;/CODE&gt; will keep it (&lt;CODE&gt;join&lt;/CODE&gt;=inner join vs &lt;CODE&gt;lookup&lt;/CODE&gt;=left join). Lookups can be transparently replaced with a script. Note, with &lt;CODE&gt;join&lt;/CODE&gt;, you need to specify option &lt;CODE&gt;max=0&lt;/CODE&gt;, or you'll only get one result per row in the &lt;CODE&gt;usertable&lt;/CODE&gt;.&lt;/P&gt;</description>
    <pubDate>Wed, 24 Mar 2010 21:31:05 GMT</pubDate>
    <dc:creator>gkanapathy</dc:creator>
    <dc:date>2010-03-24T21:31:05Z</dc:date>
    <item>
      <title>simulating a SQL JOIN in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10480#M468</link>
      <description>&lt;P&gt;I have indexed the contents of a relational database along with a log file. My log contains these fields:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;cost - this is an integer that I want to report on)&lt;/LI&gt;
&lt;LI&gt;userid - this is an ID into the Users table that I've indexed from my database&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;My users table, now indexed in Splunk, contains these fields:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;userid - same ID as above&lt;/LI&gt;
&lt;LI&gt;username - human-readable text of their name&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;I want to end up with a chart which shows the top 20 users ordered by how much they've spent (sum of cost field) over a specific time range (e.g. last 2 weeks). I want to show them by username, not by ID.  What's the best way to do this in Splunk?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Mar 2010 08:25:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10480#M468</guid>
      <dc:creator>Justin_Grant</dc:creator>
      <dc:date>2010-03-23T08:25:00Z</dc:date>
    </item>
    <item>
      <title>Re: simulating a SQL JOIN in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10481#M469</link>
      <description>&lt;P&gt;The best way is to set your users table as a lookup. Periodically (or whenever you update it, or whenever you need it, run):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source=usertable | outputlookup usertable.csv
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You could also just skip putting the user table into the Splunk index and just export it as CSV and place it into a lookup table directly. If you define the lookup table in &lt;CODE&gt;transforms.conf&lt;/CODE&gt;, you can use that name rather than &lt;CODE&gt;usertable.csv&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Then:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source=logfile | lookup usertable.csv userid OUTPUT username | stats sum(cost) by username | sort 20 -sum(cost)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You could also set the lookup as an automatic lookup in &lt;CODE&gt;props.conf&lt;/CODE&gt; so it runs every time you view your log source (or sourcetype or whatever).&lt;/P&gt;</description>
      <pubDate>Tue, 23 Mar 2010 09:07:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10481#M469</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-03-23T09:07:50Z</dc:date>
    </item>
    <item>
      <title>Re: simulating a SQL JOIN in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10482#M470</link>
      <description>&lt;P&gt;Alternatively you could use the JOIN command. Assuming this first search returns the usernames&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source=usertable | dedup username | fields + userid username
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And your cost data has the common field userid, you could use the following to join the outputs.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source=costtable | join userid [source=usertable | fields + userid username]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Bob&lt;/P&gt;

&lt;P&gt;&lt;IMG src="http://www.eqalis.com/Resources/EqalisLogo_trans.jpg" alt="alt text" /&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Mar 2010 22:13:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10482#M470</guid>
      <dc:creator>BobM</dc:creator>
      <dc:date>2010-03-23T22:13:50Z</dc:date>
    </item>
    <item>
      <title>Re: simulating a SQL JOIN in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10483#M471</link>
      <description>&lt;P&gt;how does a lookup approach differ (in performance, ease of management, etc.) from @Equalis's suggestion above to use the JOIN command?&lt;/P&gt;</description>
      <pubDate>Wed, 24 Mar 2010 01:24:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10483#M471</guid>
      <dc:creator>Justin_Grant</dc:creator>
      <dc:date>2010-03-24T01:24:24Z</dc:date>
    </item>
    <item>
      <title>Re: simulating a SQL JOIN in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10484#M472</link>
      <description>&lt;P&gt;Splunk 4.0 executes lookups more efficiently than &lt;CODE&gt;join&lt;/CODE&gt;. Making the lookup automatic lets you efficiently search using the &lt;CODE&gt;username&lt;/CODE&gt; (e.g. &lt;CODE&gt;source=costtable username=myname&lt;/CODE&gt;), while with &lt;CODE&gt;join&lt;/CODE&gt; you must execute the join on the entire &lt;CODE&gt;costtable&lt;/CODE&gt;. If there is no match on (e.g.) &lt;CODE&gt;userid&lt;/CODE&gt;, then &lt;CODE&gt;join&lt;/CODE&gt; will drop your &lt;CODE&gt;costtable&lt;/CODE&gt; row, while &lt;CODE&gt;lookup&lt;/CODE&gt; will keep it (&lt;CODE&gt;join&lt;/CODE&gt;=inner join vs &lt;CODE&gt;lookup&lt;/CODE&gt;=left join). Lookups can be transparently replaced with a script. Note, with &lt;CODE&gt;join&lt;/CODE&gt;, you need to specify option &lt;CODE&gt;max=0&lt;/CODE&gt;, or you'll only get one result per row in the &lt;CODE&gt;usertable&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Mar 2010 21:31:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10484#M472</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-03-24T21:31:05Z</dc:date>
    </item>
    <item>
      <title>Re: simulating a SQL JOIN in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10485#M473</link>
      <description>&lt;P&gt;It is easy to look at this problem and think that since this would definitely be a `join` in SQL, therefore you need to use the `join` command in Splunk. &lt;/P&gt;

&lt;P&gt;However you do not need the join command here and you will be much better off without it.  Not only is join slower for having to run the second search and have a second process getting events off disk, but its searches will quietly truncate at I think 50,000 events, and will quietly self-finalize in some number of seconds in limits.conf whichever comes first.  Both of these problems will bite you and it's usually easy to avoid joins by thinking about how to solve it with lookups or with the plain old stats command and some grouping.&lt;/P&gt;

&lt;P&gt;In your particular case I'll echo Gerald's answer that the lookup is probably the way to go, particularly if the userids and usernames dont change very much relative to the scale of the scheduled search maintaining the lookup table.  &lt;/P&gt;

&lt;P&gt;But here's how to do it just with search and stats.    One big OR clause in search mashes the two data sets together and then we do a little bit of stats command to merge it all into what we want.  &lt;/P&gt;

&lt;P&gt;Run this search over the last 2 weeks: &lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;source=costtable OR source=usertable | stats first(username) as name sum(cost) as total by userid | sort - total | fields username total&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Apr 2010 05:39:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10485#M473</guid>
      <dc:creator>sideview</dc:creator>
      <dc:date>2010-04-23T05:39:57Z</dc:date>
    </item>
    <item>
      <title>Re: simulating a SQL JOIN in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10486#M474</link>
      <description>&lt;P&gt;You might want to read "Splunk for SQL Users"...&lt;/P&gt;

&lt;P&gt;If you are familiar with SQL this quick comparison might be helpful to jump-starting you into using Splunk.&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.innovato.com/splunk/SQLSplunk.html"&gt;http://www.innovato.com/splunk/SQLSplunk.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Apr 2010 06:58:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/simulating-a-SQL-JOIN-in-Splunk/m-p/10486#M474</guid>
      <dc:creator>carasso</dc:creator>
      <dc:date>2010-04-23T06:58:13Z</dc:date>
    </item>
  </channel>
</rss>

