For example i have such event
PassengerID=F123 Origin=LHR Destination=BER Flight=1121 DepartureDate=07AUG
PassengerID=2RE3 Origin=BER Destination=MOW Flight=5434 DepartureDate=08AUG
CheckinStatus=Success
So i use
|search CheckinStatus=Success
|stats dc(PassengerID) as "Checked IN" by Origin,Destination,Flight,DepartureDate
|sort by Flight
For case where only one flight segment and one passengerID it's work well.
But it didnt work as i want to, if one event got two or more flight segments.
In the statistics results, all possible combinations are counted and i have duplicates with wrong information like:
LHR - BER 1121 07AUG 2checkedin
LHR - BER 1121 08AUG 2checkedin
LHR - MOW 1121 07AUG 2checkedin
LHR - MOW 1121 08AUG 2checkedin
BER - BER 1121 07AUG 2checkedin
BER - BER 1121 08AUG 2checkedin
BER - MOW 1121 07AUG 2checkedin
BER - MOW 1121 08AUG 2checkedin
LHR - MOW 5434 07AUG 2checkedin
etc.
How i should count, that if one passenger registered on few flights at one time (one event)
Okay, the design of your incoming events is problematic, because if someone's trip has three segments, and they check in for the first, there is no way to identify whether they have checked in for the first, the second, or all three.
We're going to assume that Success indicates that all segments have been checked in.
We're also going to assume that PassengerID, Origin, Destination, Flight and DepartureDate are all multivalue fields that will always
have exactly the same number of members.
The following strategy zips together the relevant values into a single mv field, mvexpands that mv field into one record per item, pulls off the passenger ID and does the distinct count, then finally pulls each field apart again for presentation.
your base search
| search CheckinStatus=Success
| eval myFlightData = mvzip(mvzip(mvzip(mvzip(PassengerID,Origin,">>>"),Destination,","),Flight,","),DepartureDate,",")
| fields myFlightData
| mvexpand myFlightData
| makemv delim=">>>" myFlightData
| eval PassengerID = mvindex(myFlightData,0)
| eval myFlightData = mvindex(myFlightData,1)
| stats dc(PassengerID) as "Checked IN" by myFlightData
| rex field=myFlightData "^(?<Origin>[^,]+),(?<Destination>[^,]+),(?<Flight>[^,]+),(?<DepartureDate>.*)$"
| table Origin Destination Flight DepartureDate "Checked IN"
Okay, the design of your incoming events is problematic, because if someone's trip has three segments, and they check in for the first, there is no way to identify whether they have checked in for the first, the second, or all three.
We're going to assume that Success indicates that all segments have been checked in.
We're also going to assume that PassengerID, Origin, Destination, Flight and DepartureDate are all multivalue fields that will always
have exactly the same number of members.
The following strategy zips together the relevant values into a single mv field, mvexpands that mv field into one record per item, pulls off the passenger ID and does the distinct count, then finally pulls each field apart again for presentation.
your base search
| search CheckinStatus=Success
| eval myFlightData = mvzip(mvzip(mvzip(mvzip(PassengerID,Origin,">>>"),Destination,","),Flight,","),DepartureDate,",")
| fields myFlightData
| mvexpand myFlightData
| makemv delim=">>>" myFlightData
| eval PassengerID = mvindex(myFlightData,0)
| eval myFlightData = mvindex(myFlightData,1)
| stats dc(PassengerID) as "Checked IN" by myFlightData
| rex field=myFlightData "^(?<Origin>[^,]+),(?<Destination>[^,]+),(?<Flight>[^,]+),(?<DepartureDate>.*)$"
| table Origin Destination Flight DepartureDate "Checked IN"
Yep, i think almost like you, and used mvindex/mvfilter/mvzip and it was just what i want
now it seems like:
index=prod* (sourcetype=sb_sabre sabre_service=ACS_CheckInPassengerRQ sabre_direction="<<<") earliest=-1d@d latest=@d
| rex "(?m).*\<\?xml version=\"1.0\" encoding=\"UTF-8\"\?\>\n(?<sabre_xml_body><.*>)"
| spath input=sabre_xml_body | search "soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.Result.ns2:Status"=Success
| spath input=sabre_xml_body output=PassengerID path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:PassengerDetailList.ns2:PassengerDetail.ns2:PassengerID"
| spath input=sabre_xml_body output=Flight path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Flight"
| spath input=sabre_xml_body output=Airline path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Airline"
| spath input=sabre_xml_body output=Departure path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Origin"
| spath input=sabre_xml_body output=Arrival path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:PassengerDetailList.ns2:PassengerDetail.ns2:Destination"
| spath input=sabre_xml_body output=Date path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:DepartureDate"
|replace "2017-*" with * in Date
|eval Flight_1_1=mvindex(Flight,0)
|eval Flight_2_2=mvindex(Flight,1)
|eval Origin_1=mvindex(Departure,0)
|eval Origin_2=mvindex(Departure,1)
|eval Destination_1=mvindex(Arrival,0)
|eval Destination_2=mvindex(Arrival,1)
|eval Flight1="SU"." ".Flight_1_1
|eval Date_1=mvindex(Date,0)
|eval Date_2=mvindex(Date,1)
|eval Flight2="SU"." ".Flight_2_2." / ".Date_2
|eval Flight_1=mvzip(Flight1,Date_1," / ")
|eval Segment_1=mvzip(Origin_1,Destination_1,"-")
|eval segment2=mvzip(Origin_2,Destination_2,"-")
|eval Flight_2=if(mvcount(Flight2)>=0,Flight2," ")
|eval Segment_2=if(mvcount(segment2)>=0,segment2,"-")
|dedup PassengerID
|stats dc(PassengerID) as "Checkin_count" by Segment_1,Flight_1,Segment_2,Flight_2
|sort by Check_in_count |reverse
Huge code but it's my first expirience with Splunk)
Now i see records for any event (two segment that max what i found). With this report i see how people ussualy fly, also i see which composite flight is popular etc.
Next task will be to create real-time based comparsing current stats with previous to compare if last time one of composite flight was popular, but today is not and start to find out what wrong with such flights.
Main trouble, that not each flight is daily repeated, don't want to see 100% deviation with previous day if flight wasn't in schedule.
Thanks everyone for attention and help.
@Baguvik - thanks for posting your code. It's good for people to see how others solved their problems.
You've hardcoded only for two flights, based on what you found in the data. Unless that is a system-defined limitation, it's best to code it more generically.
You probably don't want to compare with "previous", meaning the immediately prior similar flight, but with the AVERAGE and xth percentile of prior similar flights over some length of time.
For that, you might want to look at putting your historical results into a summary index, so you can pull it out directly rather than re-calculating it every time. You might also consider daily loading it FROM the summary index into a lookup file of the exact values you want to compare against today, so that no calculation of prior stats is being done at all in real time - only the calculation of the realtime side of the equation.
Finally, do NOT use realtime unless the business NEED is actually to be realtime. Like, will someone need up-to-the-second information and act on it that exact second?
This use case seems like a typical report to browse a couple of times an hour to see what is going on. If something seems to be up, the user will spend some time poking around and thinking about what might be issues... and moment by moment data is not going to change what they do.
Pseudo-real-time is better for almost all similar business use cases. Basically, calculate a trailing stat every few minutes... 2,3,5,6,10,12,15 are useful numbers to consider, with 5 10 and 15 far-and-away the most common choices. The very most common choice I see for near-realtime is the 5 minute span, with the calculation delayed either 1,2 or 3 minutes to allow the events to get indexed. (The data from 12:00:00 to 12:04:59 gets calculated at 12:07:00 for example.)
Almost all near-real-time use cases do not require anything more than that.
So your final data doesn't have the passenger information in it. If it did, you could use the dedup
function on the passenger info to remove the duplicates, but you may need to figure out which flight is the one you want to count. Do you have a way of determining that?
Well,
[15/Aug/2017:23:53:49] ACS_CheckInPassengerRQ Servis1_id_1 Session_id <<< RESPONSE http://my_site/websvc
Response Data:
<?xml version="1.0" encoding="UTF-8"?>
<soap-env:Envelope xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/"><soap-env:Header><eb:MessageHeader xmlns:eb="http://www.ebxml.org/namespaces/messageHeader" eb:version="1.0" soap-env:mustUnderstand="1"><eb:From><eb:PartyId eb:type="URI">None</eb:PartyId></eb:From><eb:To><eb:PartyId eb:type="URI">SYSTEM</eb:PartyId></eb:To><eb:CPAId>SU</eb:CPAId><eb:ConversationId>Service_id</eb:ConversationId><eb:Service>ACS_CheckInPassengerRQ</eb:Service><eb:Action>ACS_CheckInPassengerRS</eb:Action><eb:MessageData><eb:MessageId>123456</eb:MessageId><eb:Timestamp>2017-08-15T20:53:49</eb:Timestamp><eb:RefToMessageId>54321</eb:RefToMessageId></eb:MessageData></eb:MessageHeader><wsse:Security xmlns:wsse="http://schemas.xmlsoap.org/ws/2002/12/secext"><wsse:BinarySecurityToken valueType="String" EncodingType="wsse:Base64Binary">sharedid</wsse:BinarySecurityToken></wsse:Security></soap-env:Header><soap-env:Body><ns3:ACS_CheckInPassengerRS xmlns:ns3="service_url_xxx" xmlns:ns2="http://services.xxx.com/STL/v3"><ItineraryPassengerList><ns2:ItineraryPassenger><ns2:ItineraryDetail><ns2:Airline>SU</ns2:Airline><ns2:Flight>101</ns2:Flight><ns2:Origin>JFK</ns2:Origin><ns2:DepartureDate>2017-08-16</ns2:DepartureDate><ns2:DepartureTime>02:25PM</ns2:DepartureTime><ns2:DepartureGate>GATE</ns2:DepartureGate><ns2:AircraftType>777</ns2:AircraftType></ns2:ItineraryDetail><ns2:PassengerDetailList><ns2:PassengerDetail><ns2:LineNumber>1</ns2:LineNumber><ns2:LastName>TEST</ns2:LastName><ns2:FirstName>TEST</ns2:FirstName><ns2:PassengerID>UniqueID_1</ns2:PassengerID><ns2:BookingClass>N</ns2:BookingClass><ns2:Cabin>Y</ns2:Cabin><ns2:Destination>SVO</ns2:Destination><ns2:DepartureGate>GATE</ns2:DepartureGate><ns2:Seat>25D</ns2:Seat><ns2:SmokingRowFlag>N</ns2:SmokingRowFlag><ns2:BoardingPassFlag>*</ns2:BoardingPassFlag><ns2:GroupCode>AR</ns2:GroupCode><ns2:GroupCount>2</ns2:GroupCount><ns2:PassengerType>F</ns2:PassengerType><ns2:CheckInNumber>90</ns2:CheckInNumber><ns2:BagCount>NB</ns2:BagCount><ns2:EditCodeList><ns2:EditCode>F</ns2:EditCode><ns2:EditCode>OB</ns2:EditCode><ns2:EditCode>WB</ns2:EditCode><ns2:EditCode>ET</ns2:EditCode><ns2:EditCode>DOCS</ns2:EditCode><ns2:EditCode>ETI</ns2:EditCode></ns2:EditCodeList><ns2:FreeTextInfoList><ns2:FreeTextInfo><ns2:TextLine><ns2:Text>NHPX</ns2:Text></ns2:TextLine></ns2:FreeTextInfo></ns2:FreeTextInfoList></ns2:PassengerDetail></ns2:PassengerDetailList></ns2:ItineraryPassenger><ns2:ItineraryPassenger><ns2:ItineraryDetail><ns2:Airline>SU</ns2:Airline><ns2:Flight>10</ns2:Flight><ns2:Origin>SVO</ns2:Origin><ns2:DepartureDate>2017-08-17</ns2:DepartureDate><ns2:DepartureTime>08:25AM</ns2:DepartureTime><ns2:DepartureGate>GATE</ns2:DepartureGate><ns2:AircraftType>321</ns2:AircraftType></ns2:ItineraryDetail><ns2:PassengerDetailList><ns2:PassengerDetail><ns2:LineNumber>1</ns2:LineNumber><ns2:LastName>TEST</ns2:LastName><ns2:FirstName>TEST</ns2:FirstName><ns2:PassengerID>UniqueID_2</ns2:PassengerID><ns2:BookingClass>N</ns2:BookingClass><ns2:Cabin>Y</ns2:Cabin><ns2:Destination>LED</ns2:Destination><ns2:DepartureGate>GATE</ns2:DepartureGate><ns2:Seat>11D</ns2:Seat><ns2:SmokingRowFlag>N</ns2:SmokingRowFlag><ns2:BoardingPassFlag>*</ns2:BoardingPassFlag><ns2:GroupCode>AE</ns2:GroupCode><ns2:GroupCount>2</ns2:GroupCount><ns2:PassengerType>F</ns2:PassengerType><ns2:CheckInNumber>8</ns2:CheckInNumber><ns2:BagCount>NB</ns2:BagCount><ns2:EditCodeList><ns2:EditCode>F</ns2:EditCode><ns2:EditCode>IB</ns2:EditCode><ns2:EditCode>WB</ns2:EditCode><ns2:EditCode>ET</ns2:EditCode><ns2:EditCode>DOCS</ns2:EditCode><ns2:EditCode>ETI</ns2:EditCode></ns2:EditCodeList><ns2:FreeTextInfoList><ns2:FreeTextInfo><ns2:TextLine><ns2:Text>NHPX</ns2:Text></ns2:TextLine></ns2:FreeTextInfo></ns2:FreeTextInfoList></ns2:PassengerDetail></ns2:PassengerDetailList></ns2:ItineraryPassenger></ItineraryPassengerList><Result messageId="uniq_3" timeStamp="2017-08-15T20:53:49.284Z"><ns2:Status>Success</ns2:Status><ns2:CompletionStatus>Complete</ns2:CompletionStatus><ns2:System>ACS-BSO</ns2:System></Result></ns3:ACS_CheckInPassengerRS></soap-env:Body></soap-env:Envelope>
host = xxxx
soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Flight = 101 soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Flight = 10
That what i got for true example, one passenger registered at two Flight
index=* (sourcetype=sb_sabre sabre_service=ACS_CheckInPassengerRQ sabre_direction="<<<") earliest=-1d@d latest=@d
| rex "(?m).*\<\?xml version=\"1.0\" encoding=\"UTF-8\"\?\>\n(?<.*>)"
| spath input=sabre_xml_body
| search "soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.Result.ns2:Status"=Success
| spath input=sabre_xml_body output=PartyID "soap-env:Envelope.soap-env:Header.eb:MessageHeader.eb:To.eb:PartyId"
| spath input=sabre_xml_body output=PassengerID path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:PassengerDetailList.ns2:PassengerDetail.ns2:PassengerID"
| spath input=sabre_xml_body output=Flight path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Flight"
| spath input=sabre_xml_body output=Airline path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Airline"
| spath input=sabre_xml_body output=Origin path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Origin"
| spath input=sabre_xml_body output=Destination path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:PassengerDetailList.ns2:PassengerDetail.ns2:Destination"
| spath input=sabre_xml_body output=Departure_date path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:DepartureDate"
| eval Trace=Origin."-".Destination
| eval Flight=Flight."/".Departure_date
| stats dc(PassengerID) as "Registered" by Trace,Flight
| sort by - "Flight"
This search, which I use, to display information that is understandable to the customer. How to be careful to use a dedup
that would not lose the necessary information
@Baguvik, I have re-posted your data and query with code button 101010
so that special characters do not escape.
Using next search now
index=* (sourcetype=sb_sabre sabre_service=ACS_CheckInPassengerRQ sabre_direction="<<<") earliest=-1d@d latest=@d
| rex "(?m).*\<\?xml version=\"1.0\" encoding=\"UTF-8\"\?\>\n(?<sabre_xml_body><.*>)"
| spath input=sabre_xml_body
| search "soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.Result.ns2:Status"=Success
| spath input=sabre_xml_body output=PartyID "soap-env:Envelope.soap-env:Header.eb:MessageHeader.eb:To.eb:PartyId"
| spath input=sabre_xml_body output=PassengerID path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:PassengerDetailList.ns2:PassengerDetail.ns2:PassengerID"
| spath input=sabre_xml_body output=Flight_1 path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Flight"
| spath input=sabre_xml_body output=Airline path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Airline"
| spath input=sabre_xml_body output=Departure path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:Origin"
| spath input=sabre_xml_body output=Arrival path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:PassengerDetailList.ns2:PassengerDetail.ns2:Destination"
| spath input=sabre_xml_body output=Departure_date path="soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:DepartureDate"
| eval Route=Departure."-".Arrival
| eval Flight=Airline." ".Flight
| stats dc(PassengerID) as "Checked in" by Route,Flight,Departure_date
| sort by - "Flight"
seem like it count every check in now. But if i try to search all result by clicking Flight cell value it will exclude results where two flight segment or more (because 2 airlines and flight records at one event). It is not a big trouble, because customer won't use this). I can use Flight_1 to find every event for any flight.
Try the following things:
1) Add success criteria to your base search. Filtering success events afterwards will impact performance
index=* (sourcetype=sb_sabre sabre_service=ACS_CheckInPassengerRQ sabre_direction="<<<") "<ns2:Status>Success</ns2:Status>" earliest=-1d@d latest=@d
2) Extract Departure Time as well to be added to your Departure Date
soap-env:Envelope.soap-env:Body.ns3:ACS_CheckInPassengerRS.ItineraryPassengerList.ns2:ItineraryPassenger.ns2:ItineraryDetail.ns2:DepartureTime
3) Use mvzip()
to concatenate fields you are interested in (even if they are multi-valued)
| eval Route=mvzip(Origin,Destination,"-")
| eval Flight=mvzip(Airline,Flight,"-")
| eval DapartureDateTime=mvzip(DepartureDate,DepartureTime," ")
4) Use eventstats to calculate distinct count of Passenger IDs by Flight Route DapartureDateTime and leaving the multivalued fields intact:
| table PassengerID Flight Route DapartureDateTime
| eventstats dc(PassengerID) as "Checked In" by Flight Route DapartureDateTime
PS: Please mask all sensitive information in mocked xml data you have provided like IDs and Security tokens etc.
Thanks for your help, but it still wrong, it count every combination and that is result for flight where 2 passenger registered at two segments in one time, so in result i see that for each flight counted 4 passengers, but it was only 2 passengers at one Flight and 2 at another (Right route SU 10 SVO-LED / SU 25 LED-SVO but it combine first Destination and second Origin values at SU-25 ROUTE and i got LED-LED route)
http://take.ms/2I4yb
http://take.ms/X3nwZ
P.S. If result Table will see like that it will be also right (as customer want to)
LED--(SU23/07-08-2017)-->SVO--(SU100/08-08-2017)-->JFK 5 passengers (five PassengerID for LED-SVO and five for SVO-JFK)
LED--(SU23/07-08-2017)-->SVO 15 passengers
For example:
There are people who fly by draft, for example, from Saint-Petersburg(LED) to New York(JFK) (via SVO), but other fly only direct to Moscow(SVO) if i divide and count such cases right it also will be nice option, but i don't know how to do that)
Sorry I am not able to get your original request. Since you are planning to aggregate events by Flight
, Route
, and DapartureDateTime
, I was expecting you wanted unique passenger count for the same. Please let me know what you want instead.
Your use case and the corresponding data/table output that you expect.
Further, for the final request if you need to link connecting flight as well, you would need to provide data for the same. We should be able to create a link to correlate these like using Booking ID or something. Please make sure you anonymize/mock data.
My goal to create report wich will show daily statistic about registered passengers for each Flight. But i have such logs wich have two Flight segment, and it's a trouble because duplicate same fields are a common story for xml and json. I can't determine that first half of PassengerIDs belongs to first Flight, and second part of PassengerIDs belong to another Flight, i have no unique field name for each flight.
I see two ways to solve this and both of them unreachable:
1.Split logging (one log - one Flight segment)
2.Give unique names for second segment fields like Origin_2/Destination_2/DepartueDate_2/PassengerID_2(can be more than 1) etc.
Split logging is not an option. Logs - primary, changing the primary is bad.
Do not rename for the same reason.
I hope that someone solve this case without logging changes
(I masked/replace all sensetive data, thanks for your worry)