Situation : I have fields sessionId and personName. This session ID has many-to-may mapping with personName.
Need is : I want the count of personName associated with sessionId.
Query I am using : | table sessionId, personName, it gives following
sessionId personName
1. 1234 Name1
2. 1234 Name 2
3. 1234 Name1
4. 1234 Name3
5. 4321 Name1
6. 4321 Name3
7. 4321 Name3
What I want :
sessionId personName count
1. 1234 Name1 2
2. 1234 Name2 1
3. 1234 Name3 1
4. 4321 Name1 1
5. 4321 Name3 2
If I'm correctly understanding your goal, you want to be able to see the top x number of personName
values per sessionId
. Try this:
your base search | top 2 personName BY sessionId showperc=false
In this example, I used x=2
, but you can replace the 2 there with another number and, depending on your use case, perhaps replace it with a token that is set elsewhere.
sorry, I am not getting the logic of you query. You can ref. above post for detailed explanation, I just posted
Did this work for you? Or did you find another solution?
As I understand it, you want to:
1. Collapse instances where the sessionId and personName are the same, appending to the end of such collapsed instance the count of times they were the same (transforming, for example, seven identical lines of 92S38LP31QG4E93Z0A0 NAME1
into 92S38LP31QG4E93Z0A0 NAME1 7
)
2. Sort the resulting events for each sessionId in descending order by the count values
3. Set a threshold of events to retain for each sessionId - top 2, top 3, etc. So if you have a total of five sessionIds in the original data, then you will have a maximum of 5x events retained (possibly less than 5x if, for example, you set a threshold of 3 but you only have 2 personName values for a particular sessionId).
So the logic of the query I posted is quite simple: top
does all three at once. It applies a count value of personName instances per sessionId, sorts the events in descending order within each sessionId, and then retains the top x items.
Have you tried applying my query to your data to see if it works? In my sample data, it achieved what I detailed here.
Maybe it would help to have a shared dataset that we could reference to track your goals against the behavior of the query. Here's a run-anywhere command that will generate >75 events with sessionId and personName values:
| makeresults count=5 | eval sessionId=1234, personName="Name1" | append [ | makeresults count=4 | eval sessionId=1234, personName="Name2" ] | append [ | makeresults count=6 | eval sessionId=1234, personName="Name3" ] | append [ | makeresults count=5 | eval sessionId=4321, personName="Name1" ] | append [ | makeresults count=4 | eval sessionId=4321, personName="Name2" ] | append [ | makeresults count=6 | eval sessionId=4321, personName="Name3" ] | append [ | makeresults count=9 | eval sessionId=5678, personName="Name1" ] | append [ | makeresults count=54 | eval sessionId=5678, personName="Name2" ]
If I append to that command the following:
| top 2 personName BY sessionId showperc=false
The result is a table like this:
sessionId personName count
1234 Name3 6
1234 Name1 5
4321 Name3 6
4321 Name1 5
5678 Name2 54
5678 Name1 9
Does that match your goal/requirements?
I also want to have a where clause -- where I want all the sessionId with highest number of occurrence. And this occurrence value should be configurable.
So, lets say, if I want the occurrence threshold is 3 I should all the below three
1. 1234 Name1 20
2. 1234 Name2 15
3. 1234 Name3 1
4. 4321 Name1 2
5. 4321 Name3 6
6. 4321 Name3 9
if I want the occurrence threshold is 2 I should all the below three
1. 1234 Name1 20
2. 1234 Name2 15
3. 4321 Name3 6
4. 4321 Name3 9
So, if there is sessionID (lets say 3243) with just 1 occurrence, then it should not come in the result.
Try the updated answer.
You can just replace | table sessionId, personName
with | stats count by sessionId, personName
in your search.
Update
User following for your sorting and filter requirement.
your base search
| stats count by sessionId, personName
| where count>3
| sort -count
I also want to have a where clause -- where I want all the sessionId with highest number of occurrence. And this occurrence value should be configurable.
So, lets say, if I want the occurrence threshold is 3 I should all the below three
1. 1234 Name1 20
2. 1234 Name2 15
3. 1234 Name3 1
4. 4321 Name1 2
5. 4321 Name3 6
6. 4321 Name3 9
if I want the occurrence threshold is 2 I should all the below three
1. 1234 Name1 20
2. 1234 Name2 15
3. 4321 Name3 6
4. 4321 Name3 9
So, if there is sessionID (lets say 3243) with just 1 occurrence, then it should not come in the result.
No, this doesn't help -
It gives something like this , assuming the count is more that 10:
1. 1234 Name1 20
2. 4321 Name2 15
3. 1234 Name3 1
4. 4321 Name1 2
5. 4321 Name3 6
6. 4321 Name3 9
Change the where clause according to the threshold you want to use.
this is not helping either. Basically I want the key (sessionId in our case) to be repetitive if there are different personName occurring multiple times
The query your base search | stats count by sessionId, personName
does do that. Could you post the exact query you're using, corresponding output (which I believe have issues) and corresponding expected output?
92S38LP31QG4E93Z0A0 NAME1
92S38LP31QG4E93Z0A0 NAME1
92S38LP31QG4E93Z0A0 NAME1
92S38LP31QG4E93Z0A0 NAME2
92S38LP31QG4E93Z0A0 NAME2
92S38LP31QG4E93Z0A0 NAME2
92S38LP31QG4E93Z0A0 NAME2
92S38LP31QG4E93Z0A0 NAME2
92S38LP31QG4E93Z0A0 NAME3
92S38LP31QG4E93Z0A0 NAME3
92S38LP31QG4E93Z0A0 NAME3
92S38LP31QG4E93Z0A0 NAME3
92S38LP31QG4E93Z0A0 NAME3
1826SSQ98518QAID99A NAME1
1826SSQ98518QAID99A NAME2
1826SSQ98518QAID99A NAME3
1826SSQ98518QAID99A NAME4
1826SSQ98518QAID99A NAME3
1826SSQ98518QAID99A NAME2
1826SSQ98518QAID99A NAME2
1826SSQ98518QAID99A NAME3
0LKJ68OP19A8865A92Q NAME1
0LKJ68OP19A8865A92Q NAME1
0LKJ68OP19A8865A92Q NAME2
0LKJ68OP19A8865A92Q NAME2
0LKJ68OP19A8865A92Q NAME4
92S38LP31QG4E93Z0A0 NAME1 3
92S38LP31QG4E93Z0A0 NAME2 5
92S38LP31QG4E93Z0A0 NAME3 5
1826SSQ98518QAID99A NAME1 1
1826SSQ98518QAID99A NAME2 3
1826SSQ98518QAID99A NAME3 3
1826SSQ98518QAID99A NAME4 1
0LKJ68OP19A8865A92Q NAME1 2
0LKJ68OP19A8865A92Q NAME2 2
0LKJ68OP19A8865A92Q NAME4 1
Now, if I say my occurence count is 2 then I should get
92S38LP31QG4E93Z0A0 NAME2 5
92S38LP31QG4E93Z0A0 NAME3 5
1826SSQ98518QAID99A NAME2 3
1826SSQ98518QAID99A NAME3 2
0LKJ68OP19A8865A92Q NAME1 2
0LKJ68OP19A8865A92Q NAME2 2
Instead what I am getting is --> coz it is just checking the count(end column). But I want to check the occurence count of the sessionID
92S38LP31QG4E93Z0A0 NAME1 3
92S38LP31QG4E93Z0A0 NAME2 5
92S38LP31QG4E93Z0A0 NAME3 5
1826SSQ98518QAID99A NAME2 3
1826SSQ98518QAID99A NAME3 3
0LKJ68OP19A8865A92Q NAME1 2
0LKJ68OP19A8865A92Q NAME2 2
And I am using exact same query you mentioned
| stats count by sessionId, personName
| where count>3
| sort -count
I also want to have a where clause -- where I want all the sessionId with highest number of occurrence. And this occurrence value should be configurable.
So, lets say, if I want the occurrence threshold is 3 I should all the below three
1. 1234 Name1 20
2. 1234 Name2 15
3. 1234 Name3 1
4. 4321 Name1 2
5. 4321 Name3 6
6. 4321 Name3 9
if I want the occurrence threshold is 2 I should all the below three
1. 1234 Name1 20
2. 1234 Name2 15
3. 4321 Name3 6
4. 4321 Name3 9
So, if there is sessionID (lets say 3243) with just 1 occurrence, then it should not come in the result.