Splunk Search

Splunk query to join two searches

asharmaeqfx
Path Finder

Hi Splunkers,

 

I have a complex query to extract the IDs from first search and join it using that to the second search and then calculate the response times

index=xxxml source=module "matrix-v4" NOT host="xyz.dmz" NOT "somefield1" NOT "somefield2" "<nt3:overall-outcome>*</nt3:overall-outcome>" |where isnotnull(AccuiteCode) |xmlkv | eval MSUserid=AccessCode | eval source1="MS" | join ip [search index=xxxml source="/var/log/production.log" urlPath="/com/system*" "/org/system" NOT "somefield1" NOT "somefield2" | where isnotnull(accessCode) | eval ProdUserID=accessCode| eval source2="Prod"] | where source1="MS" AND source2="Prod" | eval responsetime = Latency/1000 | stats count(responsetime) as Requests avg(responsetime) perc50(responsetime) perc60(responsetime) perc70(responsetime) perc80(responsetime) perc90(responsetime) perc95(responsetime) perc99(responsetime) max(responsetime) by date_mday,date_month,rIdentifier,nt3:overall-outcome,accessCode 

This query only returns the first matched content but we have thousands to rows for the first query. It somehow unable to join it.

Kindly advise.

 

Thanks,

Amit

Labels (3)
0 Karma

MuS
Legend

Hi there,

Take a look at this answer https://community.splunk.com/t5/Splunk-Search/How-to-compare-fields-over-multiple-sourcetypes-withou... it provides examples how it can be done. Basically get a common field over all events and use stats to do the final calculations.

Hope this helps ...

cheers, MuS

0 Karma

asharmaeqfx
Path Finder

Does not helps much. Can you suggest like how to perform join in normal scenario.

0 Karma

thambisetty
SplunkTrust
SplunkTrust

can you try using transforming command (like stats, timechart or chart) before performing join 

make sure you have "ip" field available in subsearch and try to remove duplicates. when join is used as subsearch  in SPL, it has 50000 limitation on results it can return . if you can take only distinct results this would improve search and give you expected results otherwise you will end up seeing partial results.

————————————
If this helps, give a like below.
0 Karma

kennetkline
Path Finder

Would help to see like a single record Json of each source type;  This goes back to the one .conf talk; I have done this a lot us stats as stated. 

Here is how I would go about it;  search verbose to try an get to a single record of source you are looking to join.  I am making some assumption based on your search as posted.

Here is how i do this;  Iterative;  write it one line at a time;  Focus on getting a sample of 2 records to join;  (Hint don't use Join;  It was 2018 before I learned to do this; as I was stuck in SQL Mindset;  It took me a long time to deprogram myself from this bad habit)

1.  Write a single search to show two records to join; 
I am assuming you are not masking your intended search and index, and NOT somefield 1 2 is common across both searches:

2.  where (isnotnull)   I have found just say Field=*   (that removes any null records from the results.  Less results is more;  the more explicit to write the search better  (time, index, sourcetype, host, etc)  just don't wildcard a field=*<something> has to end with star for performance  field=<something>* or just field=* (not null)

(index=xxxml ((source=module "matrix-v4" NOT host="xyz.dmz" "<nt3:overall-outcome>*</nt3:overall-outcome>" AccuiteCode=* ) OR (source="/var/log/production.log" urlPath="/com/system*" "/org/system" accessCode=*)) NOT "somefield1" NOT "somefield2")

3.  you then are defining a sourc1 / 2 variable unique to the source type:  (something like this would work)

| eval sourceA=if(match(source,"module"), "MS", if(match(source,"/var/log/production.log"), "Prod", "Invalid"))

4.  What fields do you need to pass through for calculation:  use stats

I use the following all the time   (vaules(field) as field,  earliest(field) as field, first(field) as field, latest(field) as field, last(field) as field.

Also if you have the same field in different sourcetypes   you can do stuff to match in a stats.  I do this one all the time;  stats count(eval(match(sourcetype, "abc"))) as count_abc, etc.  I think can   stats (values(eval(match(sourcetype,"abc"))) as

I am not sure what all fields you need to pass through I believe the nt3 only is in the first source:

values is needed when only one of the two sources will have the data point

| stats values(nt3:overall-outcome) as nt3:overall-outcome, valuest(Latency) as latency by date_mday,date_month,rIdentifier,accessCode

5> add the response time

| eval responsetime = Latency/1000

6.   final calculation

| stats count(responsetime) as Requests avg(responsetime) perc50(responsetime) perc60(responsetime) perc70(responsetime) perc80(responsetime) perc90(responsetime) perc95(responsetime) perc99(responsetime) max(responsetime) by date_mday,date_month,rIdentifier,nt3:overall-outcome,accessCode


7:  put it all together.

0 Karma
Get Updates on the Splunk Community!

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...