Splunk Search

What alternative can be used for the join command in my search to avoid the subsearch limit?

christopheryu
Communicator

I am basically doing two searches where the results of the 1st search serves as input for the 2nd search.

There are also two sourcetypes: TICKET_OPENED & TICKET_CLOSED, both with a common field TICKET_NUMBER. Search on TICKET_OPENED is filtered using the field TYPE=EQUIPMENT, which is not on TICKET_CLOSED. The resulting TICKET_NUMBER from TICKET_OPENED is used to do a subsearch on TICKET_CLOSED. Search command below:

index=xyz sourcetype=TICKET_OPENED TYPE=EQUIPMENT | dedup TICKET_NUMBER | join TICKET_NUMBER [ search index=xyz sourcetype=TICKET_CLOSED | fields TICKET_NUMBER, CLOSE_DATE ] | table TICKET_NUMBER CREATE_DATE CLOSE_DATE

As from above, the fields CLOSE_DATE from TICKET_CLOSED and CREATE_DATE from TICKET_OPENED are also being extracted since the main purpose of the search is to find the open to close duration of tickets.

The problem is that the above search does not return all results. Using "All time"(Jan to now) only returns x events, however, some per month searches yield more than the all time. I've read that "join" has a certain limit. As additional info, TICKET_NUMBER are in the millions (both for TICKET_OPENED & TICKET_CLOSED) and the TYPE=EQUIPMENT filter on TICKET_OPENED narrows it to thousands.

Sorry for the long narrative and thank you in advance.

cmerriman
Super Champion

how about something like this? The remove eval will search for TICKET_NUMBERS that only have a TICKET_CLOSED

index=xyz (sourcetype=TICKET_OPENED TYPE=EQUIPMENT) OR (sourcetype=TICKET_CLOSED)
|stats latest(CREATE_DATE) as CREATE_DATE earliest(CLOSE_DATE) as CLOSE_DATE values(sourcetype) as sourcetypes dc(souretype) as numberSourcetypes by TICKET_NUMBER 
|eval remove=if(numberSourcetypes=1 AND sourcetypes="TICKET_CLOSED",1,0)
|search remove=0
0 Karma

christopheryu
Communicator

Again, only TICKET_NUMBER that are being filtered using the "TYPE=EQUIPMENT" of the sourcetype=TICKET_OPENED are being used as input for sourcetype=TICKET_CLOSED.

The suggested searches are pulling tickets that do not have the primary criteria of sourcetype=TICKET_OPENED TYPE=EQUIPMENT and missing the point of finding an alternative for the subsearch I am doing while avoiding the limitation of the JOIN command.

0 Karma

cmerriman
Super Champion

I accidentally put |search remove=1 instead of |search remove=0

in the search I provided above, the initial search looks for TICKET_OPEN events with TYPE=EQUIPMENT and any TICKET_CLOSED events. the stats command should grab the latest CREATE_DATE and the earliest CLOSE_DATE for each TICKET_NUMBER as well as which sourcetypes and how many sourcetypes reported for each TICKET_NUMBER. By running the remove eval and searching for 0, you should be removing any TICKET_NUMBERS that only have a TICKET_CLOSED TICKET_NUMBER and should be left with any TICKET_NUMBER that has TYPE=EQUIPMENT.

If the above truly doesn't work and I'm not understanding clearly what the issue is, the only other option I can think of would be to reverse the joins, although you still might run into your limit issue.

index=xyz sourcetype=TICKET_CLOSED | fields TICKET_NUMBER, CLOSE_DATE| join TICKET_NUMBER [ search index=xyz sourcetype=TICKET_OPENED TYPE=EQUIPMENT | dedup TICKET_NUMBER  ] | table TICKET_NUMBER CREATE_DATE CLOSE_DATE
0 Karma

sideview
SplunkTrust
SplunkTrust

Well, for many reasons you're better off taking a step back and trying to do this without join. Here's a way forward, although I have a couple questions and I'll have to come back and update this answer once or twice before we get there.

First, run this search -

index=xyz (sourcetype=TICKET_OPENED TYPE=EQUIPMENT ) OR sourcetype=TICKET_CLOSED | 

stats values(CREATE_DATE) values(CLOSE_DATE) by TICKET_NUMBER

questions - 1) For each row (ie each TICKET_NUMBER), how many values come back for CREATE_DATE and CLOSE DATE ? Is it just one of each per ticket? 2) what do the values of CREATE_DATE and CLOSE_DATE look like, exactly? If you have multiple values coming back for each TICKET_NUMBER here, then we'll have to convert them to epochtime values with the eval command, so we can give back only the earliest create date and the latest close date.

UPDATES:
1) Incorporating findings that there's only ever one CREATE_DATE and one CLOSE_DATE per ticket.
technically you can use earliest() and latest() instead of values() in the stats command. Since that's more common I've done that below. However there's a certain virtue in using values() even stil, because if that assumption ever breaks, you'll notice it right away.
2) Incorporating the requirement I missed before, -- that type=equipment is present only on the opened side of the equation, but that you need to only end up with those ticket numbers. Note that we pass along the type field to stats, and then filter after the aggregation.
3) Below I'm also calculating the number of days each ticket was open.

index=xyz (sourcetype=TICKET_OPENED TYPE=EQUIPMENT ) OR sourcetype=TICKET_CLOSED | 
stats latest(CREATE_DATE) as CREATE_DATE earliest(CLOSE_DATE) as CLOSE_DATE values(TYPE) as type by TICKET_NUMBER 
| search type="EQUIPMENT"
| eval created_epochtime=strptime(CREATE_DATE,"%Y-%m-%d %H:%M:%S.%Q")
| eval closed_epochtime=strptime(CLOSE_DATE,"%Y-%m-%d %H:%M:%S.%Q")
| eval secondsOpen=closed_epochtime-created_epochtime
| eval daysOpen=round(secondsOpen/(60*60*24),2) 

Extra ideas, directions.
4) if you want a graph of the average number of days tickets were open, tack a | timechart avg(daysOpen) on the end.

5) If the TYPE=EQUIPMENT events are only a tiny subset of the overall tickets, you might also use a subsearch in the search clause to improve efficiency, and I mean a true subsearch, not just square bracket subsearch syntax as implemented by join/append.

index=xyz (sourcetype=TICKET_OPENED TYPE=EQUIPMENT ) OR ( sourcetype=TICKET_CLOSED [ search index=xyz sourcetype=TICKET_OPENED TYPE=EQUIPMENT |dedup TICKET_NUMBER] | 
stats latest(CREATE_DATE) as CREATE_DATE earliest(CLOSE_DATE) as CLOSE_DATE values(TYPE) as type by TICKET_NUMBER 
| search type="EQUIPMENT"
| eval created_epochtime=strptime(CREATE_DATE,"%Y-%m-%d %H:%M:%S.%Q")
| eval closed_epochtime=strptime(CLOSE_DATE,"%Y-%m-%d %H:%M:%S.%Q")
| eval secondsOpen=closed_epochtime-created_epochtime
| eval daysOpen=round(secondsOpen/(60*60*24),2)
0 Karma

christopheryu
Communicator

Thank you for the detailed explanation, however, this does not work as your suggested search pulls all TICKET_CLOSED TICKET_NUMBERs. Keep i mind that I am using sourcetype=TICKET_OPENED TYPE=EQUIPMENT to filter TICKET_NUMBER and use that as input for TICKET_CLOSED. The result I am getting shows a lot of TICKET_NUMBER without CREATE_DATEs.

0 Karma

christopheryu
Communicator

Thank you for the response. Yes, just one result each for CREATE_DATE and CLOSE DATE, sample result below:

TICKET_NUMBER CREATE_DATE CLOSE_DATE

12345607 2016-08-09 04:37:02.0202 2016-08-30 02:45:30.3030

12345608 2016-08-16 06:01:18.1818 2016-08-29 05:11:46.4646

12345609 2016-08-18 21:31:05.0505 2016-08-26 11:54:39.3939

12345610 2016-08-19 18:22:25.2525 2016-08-30 20:34:33.3333

12345611 2016-08-23 15:51:35.3535 2016-08-26 15:45:15.1515

12345612 2016-08-25 08:56:39.3939 2016-08-27 02:04:08.0808

12345613 2016-08-25 09:32:58.5858 2016-08-27 01:42:17.1717

12345614 2016-08-25 19:06:55.5555 2016-08-25 19:37:45.4545

CREATE_DATE & CLOSE_DATE are actually derived from the timestamps of the events.

0 Karma

sideview
SplunkTrust
SplunkTrust

Oh cool. Well that makes it nice and simple. I assume you'd like to calculate the number of days each ticket was open, so I'll update my answer with that.

0 Karma
Get Updates on the Splunk Community!

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...

Industry Solutions for Supply Chain and OT, Amazon Use Cases, Plus More New Articles ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Enterprise Security Content Update (ESCU) | New Releases

In November, the Splunk Threat Research Team had one release of new security content via the Enterprise ...