Splunk Search

Search not giving correct results (Join,Append,Union)

punyanit
Path Finder

Hello All,
I am working the below search - When I am running these two main which joined using join command are giving me intended results. But, when the searches are joined, the results from the second search doesn’t come up and yields incorrect results. This "join Hour_Of_Day" join I have highlighted in the search Could anyone help or let me know my mistake.

Little description of the search,
This search captures user login information from VPC oss_logging of the all the users who have successfully authenticated via OKTA.

index=auto_vsc host=dfw* sourcetype=oss_logging userid | rex "userId\=(?P\w+)" | rex "userId\=\s+(?P\w+)"| rex "cuid\=(?P\w+)" | transaction userId source  maxspan=1s startswith="SAMLSessionInterceptor(getLoginUser:247)" |search userId!=null| eval time1=_time | eval time=round(time1)| convert ctime(time)| eval fullname =time.".".cuid| table fullname

Description: Once user is authenticated via OKTA and gathers information successfully from VPC. There are chances that iWarranty response fails(e.g., iWarranty DB is down) and this query gives failures results from iWarranty managed logs.

index=auto_iwarranty sourcetype="*managed_log*" "Content-Type: application/json" | rex "ID\:\s(?P[^\s]+)\s" | rex "dealerId\"\:\"(?P\w*)" | rex "dealerBAC\"\:\"(?P\w*)" | rex "Address\:\s(?P.*)" | rex "cuid\"\:\"(?P\w*)" | rex "resultMessage\"\:\"(?P(.*?)(?=\"))" | transaction ID host requestURL maxspan=10m "Content-Type: application/json" keepevicted=true | where requestURL="https://prism-ins-osb-prod.int.ally.com:443/cps-iw-services/services/cps-businessPartner/businesspartner/dealer/"   | search resultMessage="E9010:  Error – unable to establish a connection to the service.  If this continues, please call the NAMSC or your account manager." | eval customer=if(match(cuid, "^(?!G65UFBC9$)(?!BLINKER1$)(?!UserId00$)(?!BLINKER1$)(?!ROUTE001$)(?!Test1234$)(?!CARVANA1$)(?!CREDITIQ$)(?!SHIFT123$)([A-Z]+\d+.*)$"), "UI", "ThirdParty") | search customer="UI" | eval time=_time| convert ctime(time)| eval fullname =time.".".cuid| dedup fullname | table fullname | union [ search index=auto_iwarranty sourcetype="*managed_log*" "Content-Type: application/json" | rex "ID\:\s(?P[^\s]+)\s" | rex "dealerId\"\:\"(?P\w*)" | rex "dealerBAC\"\:\"(?P\w*)" | rex "Address\:\s(?P.*)" | rex "cuid\"\:\"(?P\w*)" | rex "resultMessage\"\:\"(?P(.*?)(?=\"))" | transaction ID host requestURL maxspan=10m "Content-Type: application/json" keepevicted=true | where requestURL="https://prism-ins-osb-prod.int.ally.com:443/cps-iw-services/services/cps-rating/dealer/programs"  | search resultMessage="null:  java.sql.SQLException: weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool CLMDataSource is Suspended, cannot allocate resources to applications.." OR resultMessage="null:  java.sql.SQLException: weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool CLMDataSourceWithTimeout is Suspended, cannot allocate resources to applications.." | eval customer=if(match(cuid, "^(?!G65UFBC9$)(?!BLINKER1$)(?!UserId00$)(?!BLINKER1$)(?!ROUTE001$)(?!Test1234$)(?!CARVANA1$)(?!CREDITIQ$)(?!SHIFT123$)(?!ABCDEFGH$)([A-Z]+\d+.*)$"), "UI", "ThirdParty")|search customer="UI" |  eval time1=_time | eval time=round(time1)|convert ctime(time)| eval fullname =time.".".cuid| eval fullname1 =time.".".dealer.".".cuid] | dedup fullname1 |table fullname

Description: Once user is authenticated via OKTA, there are chances that VPC request fails (e.g., session issue). This query gives failure results from VPC managed logs.

           index=auto_vsc sourcetype="*managed_log*" IllegalStateException earliest=-4w latest=@d | rex "session\sid\s\"(?P<mysession>\w+)" | rex "invalid\:\s(?P<mysession>\w+)" | join  mysession  [search index=auto_vsc host=dfw* sourcetype=oss_logging earliest=-4w latest=@d | rex "sessionId\s\=\s(?P<mysession>.*?)\!"  | rex "userId\s=\s(?P<userID>\w+)"| rex "cuid\=(?P<cuid>\w+)" | rex "sessionId\s\=\s(?P<mysession>\w+)"| transaction userId source  maxspan=1s startswith="SAMLSessionInterceptor(getLoginUser:247)"  ]| eval time=_time| convert ctime(time)|eval fullname =time.".".cuid | table fullname

Below is the full search:

index=auto_vsc host=dfw* sourcetype=oss_logging userid earliest=-4w latest=@d | rex "userId\=(?P\w+)" | rex "userId\=\s+(?P\w+)"| rex "cuid\=(?P\w+)" | transaction userId source  maxspan=1s startswith="SAMLSessionInterceptor(getLoginUser:247)" |search userId!=null| eval time1=_time | eval time=round(time1)| convert ctime(time)| eval fullname =time.".".cuid| table fullname   | append [search index=auto_iwarranty sourcetype="*managed_log*" "Content-Type: application/json" earliest=-4w latest=@d  | rex "ID\:\s(?P[^\s]+)\s" | rex "dealerId\"\:\"(?P\w*)" | rex "dealerBAC\"\:\"(?P\w*)" | rex "Address\:\s(?P.*)" | rex "cuid\"\:\"(?P\w*)" | rex "resultMessage\"\:\"(?P(.*?)(?=\"))" | transaction ID host requestURL maxspan=10m "Content-Type: application/json" keepevicted=true | where requestURL="https://prism-ins-osb-prod.int.ally.com:443/cps-iw-services/services/cps-businessPartner/businesspartner/dealer/"   | search resultMessage="E9010:  Error – unable to establish a connection to the service.  If this continues, please call the NAMSC or your account manager." | eval customer=if(match(cuid, "^(?!G65UFBC9$)(?!BLINKER1$)(?!UserId00$)(?!BLINKER1$)(?!ROUTE001$)(?!Test1234$)(?!CARVANA1$)(?!CREDITIQ$)(?!SHIFT123$)([A-Z]+\d+.*)$"), "UI", "ThirdParty") | search customer="UI" | eval time=_time| convert ctime(time)| eval fullname =time.".".cuid| dedup fullname | table fullname | union [ search index=auto_iwarranty sourcetype="*managed_log*" "Content-Type: application/json" earliest=-4w latest=@d | rex "ID\:\s(?P[^\s]+)\s" | rex "dealerId\"\:\"(?P\w*)" | rex "dealerBAC\"\:\"(?P\w*)" | rex "Address\:\s(?P.*)" | rex "cuid\"\:\"(?P\w*)" | rex "resultMessage\"\:\"(?P(.*?)(?=\"))" | transaction ID host requestURL maxspan=10m "Content-Type: application/json" keepevicted=true | where requestURL="https://prism-ins-osb-prod.int.ally.com:443/cps-iw-services/services/cps-rating/dealer/programs"  | search resultMessage="null:  java.sql.SQLException: weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool CLMDataSource is Suspended, cannot allocate resources to applications.." OR resultMessage="null:  java.sql.SQLException: weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool CLMDataSourceWithTimeout is Suspended, cannot allocate resources to applications.." | eval customer=if(match(cuid, "^(?!G65UFBC9$)(?!BLINKER1$)(?!UserId00$)(?!BLINKER1$)(?!ROUTE001$)(?!Test1234$)(?!CARVANA1$)(?!CREDITIQ$)(?!SHIFT123$)(?!ABCDEFGH$)([A-Z]+\d+.*)$"), "UI", "ThirdParty")|search customer="UI" |  eval time1=_time | eval time=round(time1)|convert ctime(time)| eval fullname =time.".".cuid| eval fullname1 =time.".".dealer.".".cuid] | dedup fullname1 |table fullname] |  append [search index=auto_vsc sourcetype="*managed_log*" IllegalStateException earliest=-4w latest=@d | rex "session\sid\s\"(?P\w+)" | rex "invalid\:\s(?P\w+)" | join  mysession  [search index=auto_vsc host=dfw* sourcetype=oss_logging earliest=-4w latest=@d | rex "sessionId\s\=\s(?P.*?)\!"  | rex "userId\s=\s(?P\w+)"| rex "cuid\=(?P\w+)" | rex "sessionId\s\=\s(?P\w+)"| transaction userId source  maxspan=1s startswith="SAMLSessionInterceptor(getLoginUser:247)"  ]| eval time=_time| convert ctime(time)|eval fullname =time.".".cuid | table fullname]| eventstats count as result by fullname | where result=1 | eval name=split(fullname, ".")| eval FieldB=mvindex(name,0)| eval FieldC=mvindex(name,1)|eval Hour_Of_Day = strftime(strptime(FieldB, "%m/%d/%Y %H:%M:%S"), "%H") |eval Week_Day = strftime(strptime(FieldB, "%m/%d/%Y %H:%M:%S"),"%a") |eval Today=strftime(now(),"%a")  |eval Current_Day=if(Week_Day=Today,Today,null())  |stats count AS "Total_Login" by Hour_Of_Day,Current_Day |eval DailyLogins=(Total_Login/4)  | stats values(DailyLogins) AS "Current_Day_Average" by Hour_Of_Day,Current_Day
| eval Current_Day_Average=ceil(Current_Day_Average)| ***join Hour_Of_Day***  [search  index=auto_vsc host=dfw* sourcetype=oss_logging userid earliest=-d@d latest=@d  | rex "userId\=(?P\w+)" | rex "userId\=\s+(?P\w+)"| rex "cuid\=(?P\w+)" | transaction userId source  maxspan=1s startswith="SAMLSessionInterceptor(getLoginUser:247)" |search userId!=null| eval time1=_time | eval time=round(time1)| convert ctime(time)| eval completename =time.".".cuid| table completename   | append [search index=auto_iwarranty sourcetype="*managed_log*" "Content-Type: application/json" earliest=-d@d latest=@d | rex "ID\:\s(?P[^\s]+)\s" | rex "dealerId\"\:\"(?P\w*)" | rex "dealerBAC\"\:\"(?P\w*)" | rex "Address\:\s(?P.*)" | rex "cuid\"\:\"(?P\w*)" | rex "resultMessage\"\:\"(?P(.*?)(?=\"))" | transaction ID host requestURL maxspan=10m "Content-Type: application/json" keepevicted=true | where requestURL="https://prism-ins-osb-prod.int.ally.com:443/cps-iw-services/services/cps-businessPartner/businesspartner/dealer/"   | search resultMessage="E9010:  Error – unable to establish a connection to the service.  If this continues, please call the NAMSC or your account manager." | eval customer=if(match(cuid, "^(?!G65UFBC9$)(?!BLINKER1$)(?!UserId00$)(?!BLINKER1$)(?!ROUTE001$)(?!Test1234$)(?!CARVANA1$)(?!CREDITIQ$)(?!SHIFT123$)([A-Z]+\d+.*)$"), "UI", "ThirdParty") | search customer="UI" | eval time=_time| convert ctime(time)| eval completename =time.".".cuid| dedup completename | table completename | union [ search index=auto_iwarranty sourcetype="*managed_log*" "Content-Type: application/json" earliest=-d@d latest=@d | rex "ID\:\s(?P[^\s]+)\s" | rex "dealerId\"\:\"(?P\w*)" | rex "dealerBAC\"\:\"(?P\w*)" | rex "Address\:\s(?P.*)" | rex "cuid\"\:\"(?P\w*)" | rex "resultMessage\"\:\"(?P(.*?)(?=\"))" | transaction ID host requestURL maxspan=10m "Content-Type: application/json" keepevicted=true | where requestURL="https://prism-ins-osb-prod.int.ally.com:443/cps-iw-services/services/cps-rating/dealer/programs"  | search resultMessage="null:  java.sql.SQLException: weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool CLMDataSource is Suspended, cannot allocate resources to applications.." OR resultMessage="null:  java.sql.SQLException: weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool CLMDataSourceWithTimeout is Suspended, cannot allocate resources to applications.." | eval customer=if(match(cuid, "^(?!G65UFBC9$)(?!BLINKER1$)(?!UserId00$)(?!BLINKER1$)(?!ROUTE001$)(?!Test1234$)(?!CARVANA1$)(?!CREDITIQ$)(?!SHIFT123$)(?!ABCDEFGH$)([A-Z]+\d+.*)$"), "UI", "ThirdParty")|search customer="UI" |  eval time1=_time | eval time=round(time1)|convert ctime(time)| eval completename =time.".".cuid| eval fullname1 =time.".".dealer.".".cuid] | dedup fullname1 |table completename] |  append [search index=auto_vsc sourcetype="*managed_log*" IllegalStateException earliest=-d@d latest=@d  | rex "session\sid\s\"(?P\w+)"| rex "invalid\:\s(?P\w+)"|  join mysession  [search index=auto_vsc host=dfw* sourcetype=oss_logging  earliest=-d@d latest=@d | rex field=_raw "sessionId\s\=\s(?P.*?)\!"  | rex "userId\s=\s(?P\w+)"| rex "cuid\=(?P\w+)" | rex "sessionId\s\=\s(?P\w+)"| transaction userId source  maxspan=1s startswith="SAMLSessionInterceptor(getLoginUser:247)"]| eval time=_time| convert ctime(time)|eval completename =time.".".cuid | table completename]| eventstats count as result by completename | where result=1 | eval name1=split(completename, ".")| eval FieldD=mvindex(name1,0)| eval FieldE=mvindex(name1,1)| eval Hour_Of_Day = strftime(strptime(FieldD, "%m/%d/%Y %H:%M:%S"), "%H") |eval time_date = strftime(strptime(FieldD, "%m/%d/%Y %H:%M:%S"),"%w")  |  stats count AS "Current_Day_Actual" by Hour_Of_Day, time_date |chart values(Current_Day_Actual) AS Current_Day_Actual by Hour_Of_Day] | eval DifferenceFromAverage=(Current_Day_Actual-Current_Day_Average) |  table Hour_Of_Day, Current_Day_Actual,Current_Day_Average,DifferenceFromAverage
0 Karma

Richfez
SplunkTrust
SplunkTrust

OK, I've done some looking into this search and I've spotted some places where it might be going wrong.

Splunk has a variety of command types. Some by default limit the number of items returned and others don't. Some are streaming distributed and can run on the indexers (fastest!) and others cannot but instead mean you have to pull all the data back to the search head and run it (much slower because you aren't spreading the work out!).

I think much of what's going on here is that you are using union, transaction, join and append all mixed up in such a way that occasionally records get truncated, ignored, dropped, or removed.

Join, union and append are all limited by various settings in your limits.conf before they give up and just return what they have and move on with life. Some commands, like transaction, aren't affected by this but WILL affect the other commands and convert them from distributable streaming to centralized streaming, which means in this case they're probably bringing back all results to the SH again and ... making stuff go weird. It is not generally recommended to alter limits.conf, though it can be done in some certain cases. A better way is to simply rewrite the search so that it's more efficient.

When I trim out all the rex and eval commands I can see the structure of your command, and it's very SQL like. VERY SQL like.

How Splunk and SQL work are entirely different. It's unfortunate, though, that you can write SPL more or less like you write SQL, except in Splunk it's nearly always the worst, slowest, most error prone way. As you are finding out. For instance, outside of training classes and the DB Connect app (where I write actual SQL because that's what it's for), in 5+ years I've never found a use case for join myself, and I've only come across a couple of places in answers/slack/irc where I think it was the right answer.

Also for those numbers of events, IMO (and this is a guess, but it is based on having working in splunk for half a decade) the searches you are running should only take a couple of seconds probably. Maybe tens of seconds, may only hundredths of seconds, but not minutes.

My take on how to fix this:

If I were you, the first thing I'd do is to move all the rex into search time automatic extractions. You have the regex, so you can either manually build your props.conf and whatnot (there are examples!), or you could use the interactive field extractor and just tell it you want to write your own extractions.

Please be sure to be in a specific app - create a new one if you must! - so that your configurations all remain in one area!

Then I'd start over from square one.

In the new way of doing this, my first take is to ask myself - would any of these smaller pieces be better served as a lookup?

Regardless of that answer, I'd then proceed using stats to group things instead of join and generally just trying to convert your process over to more Splunk native thinking than SQL. Transaction may or may not be needed as well - it's one of those commands that usually isn't needed, but sometimes it is.

It'll not be a fast or easy process - I feel like you probably have a fairly strong background in SQL, which makes this much harder in some ways. But the payoffs will be across your whole system- things will be less weird, less error prone, more consistent, and way, way faster.

0 Karma

Richfez
SplunkTrust
SplunkTrust

For instance, I was intrigued by what was actually in the search.

That first part (up to the second top-level append) could possibly be reduced to

index=auto_vsc host=dfw* sourcetype=oss_logging userid earliest=-4w latest=@d 
| transaction userId source maxspan=1s startswith="SAMLSessionInterceptor(getLoginUser:247)" 
| search userId!=null 
| append 
    [ search index=auto_iwarranty sourcetype="*managed_log*" "Content-Type: application/json" earliest=-4w latest=@d 
    | transaction ID host requestURL maxspan=10m "Content-Type: application/json" keepevicted=true 
    | search customer="UI" AND (requestURL="https://prism-ins-osb-prod.int.ally.com:443/cps-iw-services/services/cps-businessPartner/businesspartner/dealer/" AND resultMessage="E9010:  Error – unable to establish a connection to the service.  If this continues, please call the NAMSC or your account manager.") OR (requestURL="https://prism-ins-osb-prod.int.ally.com:443/cps-iw-services/services/cps-rating/dealer/programs" AND (resultMessage="null:  java.sql.SQLException: weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool CLMDataSource is Suspended, cannot allocate resources to applications.." OR resultMessage="null:  java.sql.SQLException: weblogic.jdbc.extensions.PoolDisabledSQLException: weblogic.common.resourcepool.ResourceDisabledException: Pool CLMDataSourceWithTimeout is Suspended, cannot allocate resources to applications.."))
    | stats count BY fullname
    | fields - count ]
| append ....

And this is just a start.

What I do there is still use append and transaction, but instead of using union to connect in two separate pieces of the same set of data, I just search directly for what I need in this subset of data. I believe the transaction can probably be converted to a stats, but it'll take seeing some of the resulting data possibly, or at least a bit more work than I feel like doing in this first pass of the first section of the search.

Instead of dedup in your subsearch, I did a stats count BY fullname then removed the extra field. This is just as fast as dedup much of the time, and often it's faster. It's certain to be generally faster than using table.

Why did I do all of that? It's because your appended in two-sets-of-data were really all from the same source. It took a bit of extra search logic, but it wasn't like you didn't have that search logic already in there, you just had it scattered between two subsearches connected with a union that had only a single field involved (fullname). So I combined it into one.

It's VERY possible I got the actual words, quotes, pipes or something a bit off. I can't run your search, and if you'll noticed I depend hard on having all those extractions moved out of inline and into props.conf. Also do that with the evals, or put them back in? But from what I see they're not needed in there because you don't use that information inside there at all, but threw it all away with the table command you ran previously. So why bother with making Splunk do the work?

0 Karma

punyanit
Path Finder

Hi @rich7177,

Thank you so for your efforts , i really appreciate this.
i have removed union from the search- with the help of your comment.
i will start working in this again, will keep you updated

0 Karma

cmerriman
Super Champion

Based on your comment about the subsearch event count, you're hitting a limit that's truncating the results.
https://docs.splunk.com/Documentation/Splunk/7.3.1/SearchReference/Join

Join commands, such as join, append, etc. have a default limit set in limits.conf of 10000.

My suggestion is to see about either reworking your search to exclude the joins, moving the searches around so that the first subsearch is actually the base search (so long that the base search wouldn't hit the subsearch limit), or massively increase the limit in limits.conf. It isn't recommended to always increase limits.conf unless you understand the architecture of your environment and know that it can handle it.

0 Karma

Richfez
SplunkTrust
SplunkTrust

I added code tags to your code - I think there was a lot of special characters that were already eaten by the editor before I did so, so I'm not sure it was effective. You might want to double-check (or edit and repaste the code bits, but this time using the little code button in the tool bar for each).

0 Karma

Richfez
SplunkTrust
SplunkTrust

Speaking of which, I can take away those code tags again, but if i do it'll go back to being unreadable.

0 Karma

cmerriman
Super Champion

this is INCREDIBLY long and i thank you for taking so much time writing all of this, but i'm just going to ask you two simple question:
how long do these searches take individually
how many results do each of these searches yield individually?

0 Karma

punyanit
Path Finder

168,977 events- For the first search
8,673 events - For second search

And both searches are taking 1-2 mins of time

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...