i have two indexes:
index#1 contain raw event log.
from this event log i calc for every domain the number of events so that i have:
domain_name description Event_count
in this index i look at time span in the time span selection in search
index#2 contain aggregated information on domain meaning for each domain i have query_count on each day
in this index i calc more data for every domain_name
in this index i look at time span of last 30 days
what i want at the end of the day is that the query will return for me a table that will contain:
event_domain Event_count Dates_Count SumQueries MaxQueries avg30Days
i tried to use join but some domain that appear in index2 don't appear after the join
the query i use:
index="event_raw_data" | join event_domain [search index="domain_agg_info" earliest=-30d | eval epoch33days_ago=relative_time(now(), "-33d@d" ) | eval epochEventDays = strptime(date,"%Y-%m-%d") | where epochEventDays > epoch33days_ago | eventstats dc(date) as "Dates_Count" by event_domain| eventstats count(date) as "Record_count" by event_domain | eventstats max(query_count) as "MaxQueries" by date | eventstats max(Dates_Count) as "MaxDatesCount"| eventstats sum(query_count) as "SumQueries" by event_domain | eventstats avg(customer_count) as "AvgCustomerCount" by event_domain | eval AvgCustomerCount=round(AvgCustomerCount,0)| eval avg30Days=round(if(Record_count < 30,SumQueries/MaxDatesCount,SumQueries/Record_count)) | eval avg30Days=avg30Days+1 | eventstats max(query_count) as "MaxQueries" by event_domain | eval Ratio = round(MaxQueries/avg30Days,3) | where Ratio <= 5 ] | eventstats count as "Event_count" by event_domain | table event_domain,Event_count,Dates_Count,AvgCustomerCount,Ratio,SumQueries,MaxQueries,avg30Days | dedup event_domain,Event_count,Dates_Count,AvgCustomerCount,Ratio,SumQueries,MaxQueries,avg30Days | sort by Event_count desc | head 10
what i need to change to get this data for every domain on index1?
hi, the default settings of join is inner, which is to say if you have 2 indexes A and B, you will get A intersection B if you use default join.
If you want ALL values for the first index you need to specify | join type=left.
I would advise you to look at other options in the join docs - http://docs.splunk.com/Documentation/Splunk/7.1.1/SearchReference/Join
AND
do you really need a join? it has its own limitations
thats not answering my problem
i am familiar with join type=left but that also doesn't work properly.
some domain names that are in both indexes (i checked) doesn't appear after the join (no matter which type of join)
how many rows of data is the join running on?
Also, you have a head10 at the end, could it be that the domain name you are expecting is getting trimmed by the head command?
Suggest, run the query without head and see the job inspector, if the join is not able to pick data due to large volumes the job inspector will state the same.
It will be helpful to get a mock of your data and expected output