Splunk Search

Join subsearch failure

victornajduch
Loves-to-Learn Everything

I have two different searches running against 2 different indexes to pull in realtime syslog data and enrich it with snmp polling data, like circuit information etc. 

My first search is looking for a specific syslog text and returning with all necessary results, while my second search is doing the exact same thing but does not show any stats. 

Each one of these searches and sub searches function individually so I don't understand why one works and not the other. The only ostensible change in either search are the explicit syslog text queries and manual evals to push into the table so I can't make sense of why it's failing. Any ideas or recommendations?

first example - (working)

syslog text - <28>Jun 29 18:22:25 DEVICE mib2d[2775]: SNMP_TRAP_LINK_DOWN: , ifAdminStatus up(1), ifOperStatus down(2), ifName ge-5/0/3

index=syslog "ifOperStatus down"

| rex field=_raw "ifName (?<ifDescr>.+)"
| eval deviceName = host
| eval TriggerDescription = message

| eval Environment="prod"
| eval SourceEventID = ""
| eval AlarmType = "Router"
| eval Domain = "XO"
| eval SourceSystem = "NI Splunk"
| eval SendtoNOC = "Y"
| eval EventStatus = "NEW"
| eval ProductName = "XO"
| eval ElementType = "Device"
| eval TriggerUnitsofMeasure = ""
| eval KPIMeasure = ""
| eval CaseDescription = "Backbone Interface Down"
| eval StateCode = "XO"
| eval Severity = "Major"

| lookup xo-cili-lookup device as deviceName output cili as NEID

| eval Port = ifDescr
| eval TriggerType = "Interface Down"
| eval Cause = TriggerType
| eval DeviceClli = NEID
| eval Vendor = "Juniper"
| eval model=case(match(deviceName, "MCR*|CIR*|mcr*|cir*"), "MX960", match(deviceName, "CTR*|ctr*"), "MX2020", match(deviceName, "RCA*|RCB*|rca*|rcb*"), "PTX5000", match(deviceName, "LCA*|LCB*|lca*|lcb*"), "PTX3000")
| eval DeviceModel = model

| join deviceName, ifDescr [search index=SNMP ifDescr=ae* OR ifDescr=et-* OR ifDescr=xe-* OR ifDescr=ge-* AND ifAlias=*bone*
| eval no_circuitid=""
| rex field=ifAlias ":(?<circuitID>\d+\s?\/[^\/]+[^\/]+\/[^\/]+\/[^\/|\:|\s]+)"
| eval circuitID=coalesce(circuitID, no_circuitid)
| eval AID = circuitID

| eval AlarmKey = deviceName." ".ifAlias." Down"
| stats latest(ifAlias) as ifAlias values latest(_time) as LatestAlertedTS, earliest(_time) as FirstAlertedTS by AlarmKey,deviceName, ifDescr, AID,circuitID]
| table Environment,AlarmKey,FirstAlertedTS, LatestAlertedTS,EventStatus,deviceName,ifDescr,NEID,AID,circuitID,Port, Severity, TriggerType,TriggerDescription,Cause, DeviceClli, Vendor, DeviceModel, SourceEventID, SourceSystem, Domain,ProductName, ElementType, TriggerUnitsofMeasure, KPIMeasure, CaseDescription, SendtoNOC, StateCode, AlarmType
Collapse | dedup AlarmKey

Second example (not working)

Syslog text - <28>Jun 29 18:56:38 DEVICE lfmd[17284]: LFMD_3AH_THRESHOLD_EVENT: Threshold event happened for ifd et-8/0/8(snmpid 525):

index=SYSLOG LFMD_3AH_THRESHOLD_EVENT

| rex field=_raw "event happened for ifd (?<ifDescr>\S+)\(snmpid"
| rare host
| eval deviceName = upper(host)
| eval TriggerDescription = message

| eval Environment="prod"
| eval SourceEventID = ""
| eval AlarmType = "Router"
| eval Domain = "XO"
| eval SourceSystem = "NI Splunk"
| eval SendtoNOC = "Y"
| eval EventStatus = "NEW"
| eval ProductName = "XO"
| eval ElementType = "Device"
| eval TriggerUnitsofMeasure = ""
| eval KPIMeasure = ""
| eval CaseDescription = "Backbone Interface Errors"
| eval StateCode = "XO"
| eval Severity = "Major"

| lookup xo-cili-lookup device as deviceName output cili as NEID

| eval Port = ifDescr
| eval TriggerType = "PCS Errors"
| eval Cause = TriggerType
| eval DeviceClli = NEID
| eval Vendor = "Juniper"
| eval model=case(match(deviceName, "MCR*|CIR*|mcr*|cir*"), "MX960", match(deviceName, "CTR*|ctr*"), "MX2020", match(deviceName, "RCA*|RCB*|rca*|rcb*"), "PTX5000", match(deviceName, "LCA*|LCB*|lca*|lcb*"), "PTX3000")
| eval DeviceModel = model

| join deviceName, ifDescr [search index=SNMP deviceName=RCA* OR deviceName=RCB* OR deviceName=LCA* OR deviceName=RCB* ifAlias=*bone*
| eval no_circuitid=""
| rex field=ifAlias ":(?<circuitID>\d+\s?\/[^\/]+[^\/]+\/[^\/]+\/[^\/|\:|\s]+)"
| eval circuitID=coalesce(circuitID, no_circuitid)
| eval AID = circuitID

| eval AlarmKey = deviceName." ".ifAlias." PCS Errors"

| stats latest(ifAlias) as ifAlias values latest(_time) as LatestAlertedTS, earliest(_time) as FirstAlertedTS by AlarmKey,deviceName, ifDescr, AID, circuitID]
| table Environment,AlarmKey,FirstAlertedTS, LatestAlertedTS,EventStatus,deviceName,ifDescr,NEID,AID,circuitID,Port, Severity, TriggerType,TriggerDescription,Cause, DeviceClli, Vendor, DeviceModel, SourceEventID, SourceSystem, Domain,ProductName, ElementType, TriggerUnitsofMeasure, KPIMeasure, CaseDescription, SendtoNOC, StateCode, AlarmType
Collapse

 




Labels (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

join and subsearches have limits - perhaps you're exceeding those limits?

How many results does the join subsearch return - it will have a default 50,000 result limit and also time limits on the amount of time the subsearch can run.

 

0 Karma

victornajduch
Loves-to-Learn Everything

@bowesmana 
Running the subsearch individually (without adding a | dedup) produces 16700 events but I can trim that number down considerably by using dedup on ifAlias. Are we looking specifically for the number of results/events or how many potential results there were before filtering?

index=SNMP deviceName=RCA* OR deviceName=RCB* OR deviceName=LCA* OR deviceName=LCB* ifDescr=et-* ifAlias=*bone* | dedup ifAlias
| eval no_circuitid=""
| rex field=ifAlias ":(?<circuitID>\d+\s?\/[^\/]+[^\/]+\/[^\/]+\/[^\/|\:|\s]+)"
| eval circuitID=coalesce(circuitID, no_circuitid)
| eval AID = circuitID

| eval AlarmKey = deviceName." ".ifAlias." PCS Errors"

| stats latest(ifAlias) as ifAlias values latest(_time) as LatestAlertedTS, earliest(_time) as FirstAlertedTS by AlarmKey,deviceName, ifDescr, AID, circuitID


16,704 events (6/30/21 8:27:27.000 AM to 6/30/21 8:58:27.000 AM)
Tags (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Using dedup in your search looks like it may break the latest/earliest aggregations in your stats command as you will have filtered out a load of data. Without understanding your data though, I can't say for sure.

Anyway, the limits are the numbers of results passed to the join, so as you are doing the stats you are reducing the numbers.

Does reducing the scope of your search result in consistently correct results (i.e. using a smaller time window). If so, the issue is generally a subsearch limit. 

0 Karma

victornajduch
Loves-to-Learn Everything

@bowesmana 

 

I can restrict each search to explicit variables I know exist within a particular timeframe and this also fails. 

Example - 
search -          index=SYSLOG host=Router_1 LFMD_3AH_THRESHOLD_EVENT et-7/0/20
subsearch -   index=SNMP dataType=IP deviceName=Router_1 ifDescr=et-7/0/20 ifAlias=*bone*

 

This creates only 337 events under the main search and even fewer under the subsearch 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

OK, so it's diagnostics to work out why the join is not working.

Can you try something simple, like this, i.e. just do first search - table the join fields, append the second search and table those join fields and make sure that you get both data sets, with the Origin field differentiating the results

 

<<FIRST_SEARCH>>
| eval Origin="Initial Search"
| table Origin deviceName ifDescr

| append [search index=SNMP deviceName=RCA* OR deviceName=RCB* OR deviceName=LCA* OR deviceName=RCB* ifAlias=*bone*
| eval no_circuitid=""
| rex field=ifAlias ":(?<circuitID>\d+\s?\/[^\/]+[^\/]+\/[^\/]+\/[^\/|\:|\s]+)"
| eval circuitID=coalesce(circuitID, no_circuitid)
| eval AID = circuitID

| eval AlarmKey = deviceName." ".ifAlias." PCS Errors"

| stats latest(ifAlias) as ifAlias values latest(_time) as LatestAlertedTS, earliest(_time) as FirstAlertedTS by AlarmKey,deviceName, ifDescr, AID, circuitID
| eval Origin="Second Search"
| table Origin deviceName ifDescr
]

 

If that works, you can replace the 'append' with your 'join deviceName ifDescr' and see if that works.

I can't see anything that would obviously stop the search working. There are some slightly odd things you are doing, e.g.

| eval circuitID=coalesce(circuitID, no_circuitid)
| eval AID = circuitID

you don't need to create a field 'no_circuitid', just put "" there instead - also there seems little need to duplicate AID as circuitID and then use that in the split by 

Anyway, diagnosing situations where you get NO results - is often down to field names not being present in both data sets for the join, or case of the values - e.g. your deviceName is uppercase in your first search - but you are not upper casing that in your second search - that would make it fail if the second search has lower case deviceNames. Same with ifDescr - check that the case is the same.

 

 

 

0 Karma

victornajduch
Loves-to-Learn Everything

@bowesmana I got this partially figured out by adding lower to one of the evals because the devices within the snmp index are all in lower case - 

| eval deviceName = lower(host)

While this actually gets results to populate, I am now seeing only the top 2 results within the initial search. I'll tinker around with your recommendations later but if you have any ideas that would be greatly appreciated. 
Thanks. 

0 Karma

victornajduch
Loves-to-Learn Everything

@bowesmana 

Tried using append but this seemed to just show me results from both searches as opposed to joining the searches and outputting the common ifDescr field. I think this could work with the join function if I could figure out why I'm only getting the first two routers/device reporting their first search output. 
I played around with removing dedup and changing stats by removing any aggregated or time functions but still nothing. Any ideas?

0 Karma

bowesmana
SplunkTrust
SplunkTrust

The point of this search is to diagnose why search 1 + search 2 are not joining correctly.

You can see the Origin value of each row so all you need to do is to find the same data from each origin and then compare the field names/case of the data, which should given you the reason why it's not joining correctly.

0 Karma

victornajduch
Loves-to-Learn Everything

@bowesmana 
It is joining correctly now that I set the deviceName - Host to (lower) 
I'm trying to figure out why it's only showing 2-3 results while it should have dozens. 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

You should identify one that you know is missing and then adapt your search to understand why - you can put in the filters in both searches for the entity you know is missing and you can either use the append mechanism or use an outer join to see what the differences are that cause it not to show up

 

0 Karma
Get Updates on the Splunk Community!

.conf25 Community Recap

Hello Splunkers, And just like that, .conf25 is in the books! What an incredible few days — full of learning, ...

Splunk App Developers | .conf25 Recap & What’s Next

If you stopped by the Builder Bar at .conf25 this year, thank you! The retro tech beer garden vibes were ...

Congratulations to the 2025-2026 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...