Hi Splunkers,
I have a complex query: extract the IDs from the first search, join them to the second search, and then calculate the response times.
index=xxxml source=module "matrix-v4" NOT host="xyz.dmz" NOT "somefield1" NOT "somefield2" "<nt3:overall-outcome>*</nt3:overall-outcome>" | where isnotnull(AccuiteCode) | xmlkv | eval MSUserid=AccessCode | eval source1="MS" | join ip [search index=xxxml source="/var/log/production.log" urlPath="/com/system*" "/org/system" NOT "somefield1" NOT "somefield2" | where isnotnull(accessCode) | eval ProdUserID=accessCode | eval source2="Prod"] | where source1="MS" AND source2="Prod" | eval responsetime = Latency/1000 | stats count(responsetime) as Requests avg(responsetime) perc50(responsetime) perc60(responsetime) perc70(responsetime) perc80(responsetime) perc90(responsetime) perc95(responsetime) perc99(responsetime) max(responsetime) by date_mday,date_month,rIdentifier,nt3:overall-outcome,accessCode
This query only returns the first matched result, but we have thousands of rows from the first search. Somehow it is unable to join them.
Kindly advise.
Thanks,
Amit
Hi there,
Take a look at this answer https://community.splunk.com/t5/Splunk-Search/How-to-compare-fields-over-multiple-sourcetypes-withou... it provides examples of how this can be done. Basically, get a common field across all events and use stats to do the final calculations.
Hope this helps ...
cheers, MuS
That does not help much. Can you suggest how to perform the join in a normal scenario?
Can you try using a transforming command (like stats, timechart, or chart) before performing the join?
Make sure you have the "ip" field available in the subsearch, and try to remove duplicates. When join is used with a subsearch in SPL, there is a 50,000-row limit on the results the subsearch can return. If you take only distinct results, this will improve the search and give you the expected results; otherwise you will end up seeing partial results.
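For example, a sketch of the subsearch with duplicates removed and only the needed fields kept (field names are taken from the original post; verify "ip" actually exists in these events):

index=xxxml source="/var/log/production.log" urlPath="/com/system*" NOT "somefield1" NOT "somefield2" accessCode=* | eval ProdUserID=accessCode | dedup ip | fields ip ProdUserID

Trimming with dedup and fields keeps the subsearch well under the result limit and makes the join key explicit.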
It would help to see a single JSON record from each source type. This goes back to a .conf talk; I have done this a lot using stats, as stated.
Here is how I would go about it: search in verbose mode to try to get to a single record from each source you are looking to join. I am making some assumptions based on your search as posted.
Here is how I do this: iteratively, writing it one line at a time, focusing on getting a sample of two records to join. (Hint: don't use join. It was 2018 before I learned to do this, as I was stuck in an SQL mindset; it took me a long time to deprogram myself from that bad habit.)
1. Write a single search that shows the two records to join.
I am assuming you are not masking your intended search and index, and that NOT "somefield1" / "somefield2" is common across both searches:
2. Instead of "where isnotnull(field)", I have found you can just say field=* in the base search; that removes any null records from the results. Fewer results is more; the more explicitly you write the search (time, index, sourcetype, host, etc.), the better. Just don't use a leading wildcard: field=*<something> is slow. For performance, the wildcard has to be at the end: field=<something>* or just field=* (not null).
(index=xxxml ((source=module "matrix-v4" NOT host="xyz.dmz" "<nt3:overall-outcome>*</nt3:overall-outcome>" AccuiteCode=* ) OR (source="/var/log/production.log" urlPath="/com/system*" "/org/system" accessCode=*)) NOT "somefield1" NOT "somefield2")
3. You then define a source1/source2 variable unique to each source type (something like this would work):
| eval sourceA=if(match(source,"module"), "MS", if(match(source,"/var/log/production.log"), "Prod", "Invalid"))
4. Decide which fields you need to pass through for the calculation: use stats.
I use the following all the time: values(field) as field, earliest(field) as field, first(field) as field, latest(field) as field, last(field) as field.
Also, if you have the same field in different sourcetypes, you can do matching inside a stats. I do this one all the time: stats count(eval(match(sourcetype, "abc"))) as count_abc, etc. I think you can use values(eval(...)) the same way.
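As a sketch of both idioms (here "abc", accessCode, and host are placeholders, not fields from your data): count(eval(...)) counts events matching a condition, and values(eval(if(...))) collects a field only from events of one sourcetype:

| stats count(eval(match(sourcetype,"abc"))) as count_abc, values(eval(if(match(sourcetype,"abc"), accessCode, null()))) as abc_accessCode by host

Note that inside stats, eval needs an expression that returns a value, which is why the values() form wraps the condition in if(..., field, null()).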
I am not sure which fields you need to pass through; I believe nt3:overall-outcome exists only in the first source.
values() is needed when only one of the two sources will have the data point:
| stats values(nt3:overall-outcome) as nt3:overall-outcome, values(Latency) as Latency by date_mday,date_month,rIdentifier,accessCode
5. Add the response time:
| eval responsetime = Latency/1000
6. Final calculation:
| stats count(responsetime) as Requests avg(responsetime) perc50(responsetime) perc60(responsetime) perc70(responsetime) perc80(responsetime) perc90(responsetime) perc95(responsetime) perc99(responsetime) max(responsetime) by date_mday,date_month,rIdentifier,nt3:overall-outcome,accessCode
7. Put it all together.
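Assembled, the whole thing would look something like the sketch below. This assumes accessCode (with that exact casing) is present in events from both sources after xmlkv; if the first source actually carries it as AccessCode, normalize it first with something like eval accessCode=coalesce(accessCode, AccessCode) before the first stats:

(index=xxxml ((source=module "matrix-v4" NOT host="xyz.dmz" "<nt3:overall-outcome>*</nt3:overall-outcome>" AccuiteCode=*) OR (source="/var/log/production.log" urlPath="/com/system*" "/org/system" accessCode=*)) NOT "somefield1" NOT "somefield2")
| xmlkv
| eval sourceA=if(match(source,"module"), "MS", if(match(source,"/var/log/production.log"), "Prod", "Invalid"))
| stats values(nt3:overall-outcome) as nt3:overall-outcome, values(Latency) as Latency by date_mday,date_month,rIdentifier,accessCode
| eval responsetime = Latency/1000
| stats count(responsetime) as Requests avg(responsetime) perc50(responsetime) perc60(responsetime) perc70(responsetime) perc80(responsetime) perc90(responsetime) perc95(responsetime) perc99(responsetime) max(responsetime) by date_mday,date_month,rIdentifier,nt3:overall-outcome,accessCode

Because the single base search pulls both sources at once and the first stats groups them by the shared fields, there is no subsearch and no 50,000-row join limit to hit.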