Splunk Search

append data from two indexes

mcohen13
Loves-to-Learn

i have two indexes:
index#1 contain raw event log.
from this event log i calc for every domain the number of events so that i have:
domain_name description Event_count
in this index i look at time span in the time span selection in search

index#2 contain aggregated information on domain meaning for each domain i have query_count on each day
in this index i calc more data for every domain_name
in this index i look at time span of last 30 days

what i want at the end of the day is that the query will return for me a table that will contain:
event_domain Event_count Dates_Count SumQueries MaxQueries avg30Days

i tried to use join but some domain that appear in index2 don't appear after the join

the query i use:
index="event_raw_data" | join event_domain [search index="domain_agg_info" earliest=-30d | eval epoch33days_ago=relative_time(now(), "-33d@d" ) | eval epochEventDays = strptime(date,"%Y-%m-%d") | where epochEventDays > epoch33days_ago | eventstats dc(date) as "Dates_Count" by event_domain| eventstats count(date) as "Record_count" by event_domain | eventstats max(query_count) as "MaxQueries" by date | eventstats max(Dates_Count) as "MaxDatesCount"| eventstats sum(query_count) as "SumQueries" by event_domain | eventstats avg(customer_count) as "AvgCustomerCount" by event_domain | eval AvgCustomerCount=round(AvgCustomerCount,0)| eval avg30Days=round(if(Record_count < 30,SumQueries/MaxDatesCount,SumQueries/Record_count)) | eval avg30Days=avg30Days+1 | eventstats max(query_count) as "MaxQueries" by event_domain | eval Ratio = round(MaxQueries/avg30Days,3) | where Ratio <= 5 ] | eventstats count as "Event_count" by event_domain | table event_domain,Event_count,Dates_Count,AvgCustomerCount,Ratio,SumQueries,MaxQueries,avg30Days | dedup event_domain,Event_count,Dates_Count,AvgCustomerCount,Ratio,SumQueries,MaxQueries,avg30Days | sort by Event_count desc | head 10

what i need to change to get this data for every domain on index1?

Tags (1)
0 Karma

Sukisen1981
Champion

hi, the default settings of join is inner, which is to say if you have 2 indexes A and B, you will get A intersection B if you use default join.
If you want ALL values for the first index you need to specify | join type=left.
I would advise you to look at other options in the join docs - http://docs.splunk.com/Documentation/Splunk/7.1.1/SearchReference/Join
AND
do you really need a join? it has its own limitations

0 Karma

mcohen13
Loves-to-Learn

thats not answering my problem
i am familiar with join type=left but that also doesn't work properly.
some domain names that are in both indexes (i checked) doesn't appear after the join (no matter which type of join)

0 Karma

Sukisen1981
Champion

how many rows of data is the join running on?
Also, you have a head10 at the end, could it be that the domain name you are expecting is getting trimmed by the head command?
Suggest, run the query without head and see the job inspector, if the join is not able to pick data due to large volumes the job inspector will state the same.
It will be helpful to get a mock of your data and expected output

0 Karma
Get Updates on the Splunk Community!

Building Reliable Asset and Identity Frameworks in Splunk ES

 Accurate asset and identity resolution is the backbone of security operations. Without it, alerts are ...

Cloud Monitoring Console - Unlocking Greater Visibility in SVC Usage Reporting

For Splunk Cloud customers, understanding and optimizing Splunk Virtual Compute (SVC) usage and resource ...

Automatic Discovery Part 3: Practical Use Cases

If you’ve enabled Automatic Discovery in your install of the Splunk Distribution of the OpenTelemetry ...