Splunk Search

How to expand columns with mvfields if count of values are different for each column

Baguvik
Explorer

I ll show example it's much easier than explain:

index=* <base_search>
|eval Flight=mvzip(date,route,"/")
|eval Passenger=mvzip(Last,Name,Seat," / ")
|table _time,Field1,Field2 

In one event we can find one or two Flight segment but count of passengers can be 1-6.
In this example i show how my search react to event with Two flight segment and Two passengers wich checked in to both flight
so in result i got

    _time                                        Flight                    Passenger
  2017-09-01                                09-02/JFK-LHR                MR KEY / 15F
                                            09-02/LHR-CDG                MRS KEY /15E
                                                                         MR KEY / 10A
                                                                         MRS KEY / 10 B

If i mvexpand passenger field it will lead to duplicates of Flight, if I mvexpand flight it will show 4 passenegers for each flight.
My goal to connect first half of passengers with first flight, second part with second flight, than expand to create single record for each registereted passenger (I wan to create dashboard with searching by flight/LastName/Seat etc.)
so perfect result is:

_time            Flight             Passenger
2017-09-01    09-02/JFK-LHR        MR KEY / 15F
2017-09-01    09-02/JFK-LHR        MRS KEY / 15F
2017-09-01    09-02/LHR-CDG        MR KEY / 10A
2017-09-01    09-02/LHR-CDG        MRS KEY / 10B

That is main trouble, pls help with it.
The second minor question that is:
I got maximum two records of mvfield "Departure" airport in my event but "Destination" airport for each passenger
So for this example it seems like:

   Departure=JFK Departure=LHR  Destination=LHR Destination=LHR  Destination=CDG Destination=CDG 

When I trying to route=mvzip(Departure,Destination) it seems like:

 route=JFK-LHR route=LHR-LHR 

The first is correct but second value isn't correct, is it possible to ignore duplicates of Destination values, because i need only first value of Destination and first value of second half of values if there is two flight segment in one event?
Then i will use mvzip to get correct combinations of Departure-Destination.

0 Karma
1 Solution

DalJeanis
SplunkTrust
SplunkTrust

update - the problem is still in the design of your extract, but this will solve it for the moment.

This is coded for a maximum of 6 passenger seats per record, but updating the eval line that creates passsegs can take that up to any size. More than about 10 passengers would make it better to rewrite the code.

The assumption is that all seats for the first flight are located before all seats for the second flight, and all passengers on multi-segment flights continue on the second segment. Everything else is handled by the code.


| makeresults 
| eval mydata="09-02,09-02 JFK-LHR,LHR-CDG JOHN,JANE,JOHN,JANE KEYSTONE,KEYSTONE,KEYSTONE,KEYSTONE 15E,15F,10A,10B!!!!09-03,09-03 JFK-LHR,LHR-CDG ULAG,PIOTR,JANET,ULAG,PIOTR,JANET SMIRNOFF,NABOCEK,HENDERSON,SMIRNOFF,NABOCEK,HENDERSON 15D,15E,15F,10A,10B,10C!!!!09-04,09-04 JFK-LHR HUEY,DEWEY,LOUIE,SCROOGE DUCK,DUCK,DUCK,DUCK 15D,15E,15F,15C"
| makemv delim="!!!!" mydata
| mvexpand mydata
| makemv delim=" " mydata
| eval date=mvindex(mydata,0)
| eval _time= strptime("2017-".date,"%Y-%m-%d")
| eval route=mvindex(mydata,1)
| eval FirstName=mvindex(mydata,2)
| eval LastName=mvindex(mydata,3)
| eval Seat=mvindex(mydata,4)
| makemv delim="," date
| makemv delim="," route
| makemv delim="," FirstName
| makemv delim="," LastName
| makemv delim="," Seat
| fields - mydata
| rename COMMENT as "The above makes test data for 2*2, 3*2 and 4*1 cases." 

| rename COMMENT as "Count and label the route segments." 
| eval numsegs = mvcount(route)
| eval routesegs=mvrange(1,numsegs+1)
| eval Flight=mvzip(routesegs,mvzip(date,route," / "),": ") 

| rename COMMENT as "Count and label the passenger segments." 
| eval numpass=mvcount(Seat)/numsegs 
| eval passsegs=mvappend(mvindex(mvappend("0","1","1","1","1","1","1"),1,numpass),
        if(mvcount(route)>1,mvindex( mvappend("0","2","2","2"),1,numpass),null()),
        if(mvcount(route)>2,mvindex(mvappend("0","3","3"),1,numpass),null()))   
| eval Passenger=mvzip(passsegs,mvzip(mvzip(FirstName,LastName," "),Seat," / "),": ")
| table _time Flight Passenger
| mvexpand Flight
| mvexpand Passenger 
| where substr(Flight,1,2) = substr(Passenger,1,2)

Output as follows...

2017-09-02  1: 09-02 / JFK-LHR  1: JOHN KEYSTONE / 15E  
2017-09-02  1: 09-02 / JFK-LHR  1: JANE KEYSTONE / 15F  
2017-09-02  2: 09-02 / LHR-CDG  2: JOHN KEYSTONE / 10A  
2017-09-02  2: 09-02 / LHR-CDG  2: JANE KEYSTONE / 10B  
2017-09-03  1: 09-03 / JFK-LHR  1: ULAG SMIRNOFF / 15D  
2017-09-03  1: 09-03 / JFK-LHR  1: PIOTR NABOCEK / 15E  
2017-09-03  1: 09-03 / JFK-LHR  1: JANET HENDERSON / 15F  
2017-09-03  2: 09-03 / LHR-CDG  2: ULAG SMIRNOFF / 10A  
2017-09-03  2: 09-03 / LHR-CDG  2: PIOTR NABOCEK / 10B  
2017-09-03  2: 09-03 / LHR-CDG  2: JANET HENDERSON / 10C  
2017-09-04  1: 09-04 / JFK-LHR  1: HUEY DUCK / 15D  
2017-09-04  1: 09-04 / JFK-LHR  1: DEWEY DUCK / 15E  
2017-09-04  1: 09-04 / JFK-LHR  1: LOUIE DUCK / 15F  
2017-09-04  1: 09-04 / JFK-LHR  1: SCROOGE DUCK / 15C 

The problem is in your base search, or in your data creation.

You have accreted data together into Name and Seat that do not belong together, because the seats are not on the same plane. The seats from the first flight segment should not be in the same event and the same field as the seats from the second flight segment, without being connected somehow to which flight segment they are part of.

If your base search just selects the data, which is already in that (unfortunate) format, then let us know that and we can decide on a workaround. If it is in another format, such as the checkin format from this one ... https://answers.splunk.com/answers/564480/how-to-collect-statistics-if-same-field-can-have-m.html#an... ... then we can help you get it into a workable form.

View solution in original post

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

update - the problem is still in the design of your extract, but this will solve it for the moment.

This is coded for a maximum of 6 passenger seats per record, but updating the eval line that creates passsegs can take that up to any size. More than about 10 passengers would make it better to rewrite the code.

The assumption is that all seats for the first flight are located before all seats for the second flight, and all passengers on multi-segment flights continue on the second segment. Everything else is handled by the code.


| makeresults 
| eval mydata="09-02,09-02 JFK-LHR,LHR-CDG JOHN,JANE,JOHN,JANE KEYSTONE,KEYSTONE,KEYSTONE,KEYSTONE 15E,15F,10A,10B!!!!09-03,09-03 JFK-LHR,LHR-CDG ULAG,PIOTR,JANET,ULAG,PIOTR,JANET SMIRNOFF,NABOCEK,HENDERSON,SMIRNOFF,NABOCEK,HENDERSON 15D,15E,15F,10A,10B,10C!!!!09-04,09-04 JFK-LHR HUEY,DEWEY,LOUIE,SCROOGE DUCK,DUCK,DUCK,DUCK 15D,15E,15F,15C"
| makemv delim="!!!!" mydata
| mvexpand mydata
| makemv delim=" " mydata
| eval date=mvindex(mydata,0)
| eval _time= strptime("2017-".date,"%Y-%m-%d")
| eval route=mvindex(mydata,1)
| eval FirstName=mvindex(mydata,2)
| eval LastName=mvindex(mydata,3)
| eval Seat=mvindex(mydata,4)
| makemv delim="," date
| makemv delim="," route
| makemv delim="," FirstName
| makemv delim="," LastName
| makemv delim="," Seat
| fields - mydata
| rename COMMENT as "The above makes test data for 2*2, 3*2 and 4*1 cases." 

| rename COMMENT as "Count and label the route segments." 
| eval numsegs = mvcount(route)
| eval routesegs=mvrange(1,numsegs+1)
| eval Flight=mvzip(routesegs,mvzip(date,route," / "),": ") 

| rename COMMENT as "Count and label the passenger segments." 
| eval numpass=mvcount(Seat)/numsegs 
| eval passsegs=mvappend(mvindex(mvappend("0","1","1","1","1","1","1"),1,numpass),
        if(mvcount(route)>1,mvindex( mvappend("0","2","2","2"),1,numpass),null()),
        if(mvcount(route)>2,mvindex(mvappend("0","3","3"),1,numpass),null()))   
| eval Passenger=mvzip(passsegs,mvzip(mvzip(FirstName,LastName," "),Seat," / "),": ")
| table _time Flight Passenger
| mvexpand Flight
| mvexpand Passenger 
| where substr(Flight,1,2) = substr(Passenger,1,2)

Output as follows...

2017-09-02  1: 09-02 / JFK-LHR  1: JOHN KEYSTONE / 15E  
2017-09-02  1: 09-02 / JFK-LHR  1: JANE KEYSTONE / 15F  
2017-09-02  2: 09-02 / LHR-CDG  2: JOHN KEYSTONE / 10A  
2017-09-02  2: 09-02 / LHR-CDG  2: JANE KEYSTONE / 10B  
2017-09-03  1: 09-03 / JFK-LHR  1: ULAG SMIRNOFF / 15D  
2017-09-03  1: 09-03 / JFK-LHR  1: PIOTR NABOCEK / 15E  
2017-09-03  1: 09-03 / JFK-LHR  1: JANET HENDERSON / 15F  
2017-09-03  2: 09-03 / LHR-CDG  2: ULAG SMIRNOFF / 10A  
2017-09-03  2: 09-03 / LHR-CDG  2: PIOTR NABOCEK / 10B  
2017-09-03  2: 09-03 / LHR-CDG  2: JANET HENDERSON / 10C  
2017-09-04  1: 09-04 / JFK-LHR  1: HUEY DUCK / 15D  
2017-09-04  1: 09-04 / JFK-LHR  1: DEWEY DUCK / 15E  
2017-09-04  1: 09-04 / JFK-LHR  1: LOUIE DUCK / 15F  
2017-09-04  1: 09-04 / JFK-LHR  1: SCROOGE DUCK / 15C 

The problem is in your base search, or in your data creation.

You have accreted data together into Name and Seat that do not belong together, because the seats are not on the same plane. The seats from the first flight segment should not be in the same event and the same field as the seats from the second flight segment, without being connected somehow to which flight segment they are part of.

If your base search just selects the data, which is already in that (unfortunate) format, then let us know that and we can decide on a workaround. If it is in another format, such as the checkin format from this one ... https://answers.splunk.com/answers/564480/how-to-collect-statistics-if-same-field-can-have-m.html#an... ... then we can help you get it into a workable form.

View solution in original post

0 Karma

Baguvik
Explorer

You right, event format is bad, actually it is same events like in https://answers.splunk.com/answers/564480/how-to-collect-statistics-if-same-field-can-have-m.html#an... and i cant separate one event into two if there two flight segment. Easy to manage event with only one flight segment, but hard to define fields if there is two flight segment.Can't change logs it's primal.

0 Karma

Baguvik
Explorer

Sorry, I forgot to say thanks ), your query was very helpful. Mvappend with segment number and |where substr triks helped me. Then i just used |rex to split necessary fields and add search rows with tokens for dashboard.And its working as i want. My result query:

|<my search and field extraction>
|replace "2017-*" with * in Date
|eval Me=mvappend("1","1","1","1","1","1")
|eval Ne=mvappend("2","2","2","2","2","2")
|eval Start_Time=strftime(_time,"%T")
|eval Start_Date=strftime(_time, "%Y-%m-%d") 
|eval num=mvcount(Flight) 
|eval Origin=mvindex(Departure,0)
|eval Origin_2=mvindex(Departure,1)
|eval Destination=mvindex(Arrival,0)
|eval Destination_2=mvindex(Arrival,-1)
|eval route=mvappend(mvzip(Origin,Destination,"-"),
(mvzip(Origin_2,Destination_2,"-")))
| eval numsegs= mvcount(route)
| eval routesegs=mvrange(1,numsegs+1)
| eval Flight5=mvzip(mvzip(routesegs,mvzip(Date,route,"#"),":"),Flight,"##")
|eval numseats=mvcount(Seat)/num
|eval numseats_2=numseats-1
|eval SeatClass=mvzip(Seat,Class_of_service,"#")
|eval passeg1=mvindex(SeatClass,0,numseats_2)
|eval passeg2=mvindex(SeatClass,numseats,-1)
|eval passegs=mvappend(mvzip(Me,passeg1,": "),
if(num>1,mvzip(Ne,passeg2,": "),null))
|eval FirstLast=mvzip(LastName,FirstName,"#")
|eval passenger=mvzip(passegs,FirstLast,";")
|table Start_Date,Start_Time,Flight5,passenger
| mvexpand Flight5
| mvexpand passenger
| where substr(Flight5,1,2) = substr(passenger,1,2)
| rex field=passenger "(?<CheckedInSeat>\w+)#(?<Class_Of_Service>\w+);(?<LastName>\w+)"#"(?<FirstName>\w+)"
|rex field=Flight5 "(?<Segment>\w+):(?<datem>\w+)-(?<dated>\w+)#(?<Departure>\w+)-(?<Destination>\w+)##(?<Flight>\w+)"
|eval Flight_Date=datem."-".dated
|fields - Flight5,passenger,datem,dated
|table  Last* First* Start* Fl*  Seg* Flight_Date Dep* Des* Class_Of_Service CheckedInSeat

DalJeanis
SplunkTrust
SplunkTrust

@Baguvik - thanks for posting the final code. It's great to see how things work out.

0 Karma

somesoni2
Revered Legend

Give this a try (assuming there is always two flight segments in your events, but number of passengers can be different, but will be an even number due to two flight segments)

index=* <base_search>
 |eval Flight=mvzip(date,route,mvrange(1,3),"/")
 |eval Passenger=mvzip(Last,Name,Seat," / ")
| mvexpand Flight 
| rex field=Flight "(?<Flight>.+),(?<sno>\d)" 
| eval Passenger1=if(sno=1, mvindex(Passenger,0,(mvcount(Passenger)/2)-1),mvindex(Passenger,(mvcount(Passenger)/2),-1)) | mvexpand Passenger1
 |table _time,Flight ,Passenger1 | rename Passenger1  as Passenger
0 Karma

Baguvik
Explorer

made a mistake in

   |table _time,Field1,Field2 

sorry, should be

|table _time,Flight,Passenger

0 Karma

somesoni2
Revered Legend

Can you provide how your data looks like from this search?

index=* <base_search>
 | table date,route,Last,Name,Seat
0 Karma

Baguvik
Explorer

As previous result but more columns: 4 results for Last name, First Name and Seat, 2 results for route, and one for _time

0 Karma

somesoni2
Revered Legend

So it won't be always 4 passengers and two routes, how do you decide on how many passengers (and which ones) belong to which route?

0 Karma

Baguvik
Explorer

count of passengers for first segment=count of passenger for second segment
That is same people but they registered in two flight segments at one time(at one log event)

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!