Splunk Search
Highlighted

What alternative can be used for the join command in my search to avoid the subsearch limit?

Communicator

I am basically doing two searches where the results of the 1st search serves as input for the 2nd search.

There are also two sourcetypes: TICKETOPENED & TICKETCLOSED, both with a common field TICKETNUMBER. Search on TICKETOPENED is filtered using the field TYPE=EQUIPMENT, which is not on TICKETCLOSED. The resulting TICKETNUMBER from TICKETOPENED is used to do a subsearch on TICKETCLOSED. Search command below:

index=xyz sourcetype=TICKET_OPENED TYPE=EQUIPMENT | dedup TICKET_NUMBER | join TICKET_NUMBER [ search index=xyz sourcetype=TICKET_CLOSED | fields TICKET_NUMBER, CLOSE_DATE ] | table TICKET_NUMBER CREATE_DATE CLOSE_DATE

As from above, the fields CLOSEDATE from TICKETCLOSED and CREATEDATE from TICKETOPENED are also being extracted since the main purpose of the search is to find the open to close duration of tickets.

The problem is that the above search does not return all results. Using "All time"(Jan to now) only returns x events, however, some per month searches yield more than the all time. I've read that "join" has a certain limit. As additional info, TICKETNUMBER are in the millions (both for TICKETOPENED & TICKETCLOSED) and the TYPE=EQUIPMENT filter on TICKETOPENED narrows it to thousands.

Sorry for the long narrative and thank you in advance.

Highlighted

Re: What alternative can be used for the join command in my search to avoid the subsearch limit?

SplunkTrust
SplunkTrust

Well, for many reasons you're better off taking a step back and trying to do this without join. Here's a way forward, although I have a couple questions and I'll have to come back and update this answer once or twice before we get there.

First, run this search -

index=xyz (sourcetype=TICKET_OPENED TYPE=EQUIPMENT ) OR sourcetype=TICKET_CLOSED | 

stats values(CREATEDATE) values(CLOSEDATE) by TICKET_NUMBER

questions - 1) For each row (ie each TICKETNUMBER), how many values come back for CREATEDATE and CLOSE DATE ? Is it just one of each per ticket? 2) what do the values of CREATEDATE and CLOSEDATE look like, exactly? If you have multiple values coming back for each TICKET_NUMBER here, then we'll have to convert them to epochtime values with the eval command, so we can give back only the earliest create date and the latest close date.

UPDATES:
1) Incorporating findings that there's only ever one CREATEDATE and one CLOSEDATE per ticket.
technically you can use earliest() and latest() instead of values() in the stats command. Since that's more common I've done that below. However there's a certain virtue in using values() even stil, because if that assumption ever breaks, you'll notice it right away.
2) Incorporating the requirement I missed before, -- that type=equipment is present only on the opened side of the equation, but that you need to only end up with those ticket numbers. Note that we pass along the type field to stats, and then filter after the aggregation.
3) Below I'm also calculating the number of days each ticket was open.

index=xyz (sourcetype=TICKET_OPENED TYPE=EQUIPMENT ) OR sourcetype=TICKET_CLOSED | 
stats latest(CREATE_DATE) as CREATE_DATE earliest(CLOSE_DATE) as CLOSE_DATE values(TYPE) as type by TICKET_NUMBER 
| search type="EQUIPMENT"
| eval created_epochtime=strptime(CREATE_DATE,"%Y-%m-%d %H:%M:%S.%Q")
| eval closed_epochtime=strptime(CLOSE_DATE,"%Y-%m-%d %H:%M:%S.%Q")
| eval secondsOpen=closed_epochtime-created_epochtime
| eval daysOpen=round(secondsOpen/(60*60*24),2) 

Extra ideas, directions.
4) if you want a graph of the average number of days tickets were open, tack a | timechart avg(daysOpen) on the end.

5) If the TYPE=EQUIPMENT events are only a tiny subset of the overall tickets, you might also use a subsearch in the search clause to improve efficiency, and I mean a true subsearch, not just square bracket subsearch syntax as implemented by join/append.

index=xyz (sourcetype=TICKET_OPENED TYPE=EQUIPMENT ) OR ( sourcetype=TICKET_CLOSED [ search index=xyz sourcetype=TICKET_OPENED TYPE=EQUIPMENT |dedup TICKET_NUMBER] | 
stats latest(CREATE_DATE) as CREATE_DATE earliest(CLOSE_DATE) as CLOSE_DATE values(TYPE) as type by TICKET_NUMBER 
| search type="EQUIPMENT"
| eval created_epochtime=strptime(CREATE_DATE,"%Y-%m-%d %H:%M:%S.%Q")
| eval closed_epochtime=strptime(CLOSE_DATE,"%Y-%m-%d %H:%M:%S.%Q")
| eval secondsOpen=closed_epochtime-created_epochtime
| eval daysOpen=round(secondsOpen/(60*60*24),2)
0 Karma
Highlighted

Re: What alternative can be used for the join command in my search to avoid the subsearch limit?

Communicator

Thank you for the response. Yes, just one result each for CREATE_DATE and CLOSE DATE, sample result below:

TICKETNUMBER CREATEDATE CLOSE_DATE

12345607 2016-08-09 04:37:02.0202 2016-08-30 02:45:30.3030

12345608 2016-08-16 06:01:18.1818 2016-08-29 05:11:46.4646

12345609 2016-08-18 21:31:05.0505 2016-08-26 11:54:39.3939

12345610 2016-08-19 18:22:25.2525 2016-08-30 20:34:33.3333

12345611 2016-08-23 15:51:35.3535 2016-08-26 15:45:15.1515

12345612 2016-08-25 08:56:39.3939 2016-08-27 02:04:08.0808

12345613 2016-08-25 09:32:58.5858 2016-08-27 01:42:17.1717

12345614 2016-08-25 19:06:55.5555 2016-08-25 19:37:45.4545

CREATEDATE & CLOSEDATE are actually derived from the timestamps of the events.

0 Karma
Highlighted

Re: What alternative can be used for the join command in my search to avoid the subsearch limit?

SplunkTrust
SplunkTrust

Oh cool. Well that makes it nice and simple. I assume you'd like to calculate the number of days each ticket was open, so I'll update my answer with that.

0 Karma
Highlighted

Re: What alternative can be used for the join command in my search to avoid the subsearch limit?

Communicator

Thank you for the detailed explanation, however, this does not work as your suggested search pulls all TICKETCLOSED TICKETNUMBERs. Keep i mind that I am using sourcetype=TICKETOPENED TYPE=EQUIPMENT to filter TICKETNUMBER and use that as input for TICKETCLOSED. The result I am getting shows a lot of TICKETNUMBER without CREATE_DATEs.

0 Karma
Highlighted

Re: What alternative can be used for the join command in my search to avoid the subsearch limit?

Super Champion

how about something like this? The remove eval will search for TICKETNUMBERS that only have a TICKETCLOSED

index=xyz (sourcetype=TICKET_OPENED TYPE=EQUIPMENT) OR (sourcetype=TICKET_CLOSED)
|stats latest(CREATE_DATE) as CREATE_DATE earliest(CLOSE_DATE) as CLOSE_DATE values(sourcetype) as sourcetypes dc(souretype) as numberSourcetypes by TICKET_NUMBER 
|eval remove=if(numberSourcetypes=1 AND sourcetypes="TICKET_CLOSED",1,0)
|search remove=0
0 Karma
Highlighted

Re: What alternative can be used for the join command in my search to avoid the subsearch limit?

Communicator

Again, only TICKETNUMBER that are being filtered using the "TYPE=EQUIPMENT" of the sourcetype=TICKETOPENED are being used as input for sourcetype=TICKET_CLOSED.

The suggested searches are pulling tickets that do not have the primary criteria of sourcetype=TICKET_OPENED TYPE=EQUIPMENT and missing the point of finding an alternative for the subsearch I am doing while avoiding the limitation of the JOIN command.

0 Karma
Highlighted

Re: What alternative can be used for the join command in my search to avoid the subsearch limit?

Super Champion

I accidentally put |search remove=1 instead of |search remove=0

in the search I provided above, the initial search looks for TICKETOPEN events with TYPE=EQUIPMENT and any TICKETCLOSED events. the stats command should grab the latest CREATEDATE and the earliest CLOSEDATE for each TICKETNUMBER as well as which sourcetypes and how many sourcetypes reported for each TICKETNUMBER. By running the remove eval and searching for 0, you should be removing any TICKETNUMBERS that only have a TICKETCLOSED TICKETNUMBER and should be left with any TICKETNUMBER that has TYPE=EQUIPMENT.

If the above truly doesn't work and I'm not understanding clearly what the issue is, the only other option I can think of would be to reverse the joins, although you still might run into your limit issue.

index=xyz sourcetype=TICKET_CLOSED | fields TICKET_NUMBER, CLOSE_DATE| join TICKET_NUMBER [ search index=xyz sourcetype=TICKET_OPENED TYPE=EQUIPMENT | dedup TICKET_NUMBER  ] | table TICKET_NUMBER CREATE_DATE CLOSE_DATE
0 Karma