Hi
I wonder whether someone can help me please.
I'm using the code below to run a search which works fine.
index=main detail.statusCode=200 | dedup detail.ipAddress | stats count(eventId) | rename count(eventId) AS "Registrations"
But I'm told, and I'm not sure whether this is correct, that dedup
is quite labour intensive for Splunk to remove duplicate entries, so I changed the code to:
index=main detail.statusCode=200 | stats dc(detail.ipAddress) | stats count(eventId) | rename count(eventId) AS "Registrations"
The problem I have is that although the parameters for the two are exactly the same, I retrieve more events for the second search, but I have no idea why. The figure shown for the 1st search also returns a stats count of 74, whereas the second returns 0
Could someone tell me please? By default ,does 'stats dc' return another set of records?
Many thanks and kind regards
Chris
Hi All, I've gone through the query @pwmcity provided, but unfortunately I couldn't get the figures to match those coming out of the dedup query.
So I continued to work on this, and managed to get this to work by using:
| stats distinct_count(detail.ipAddress) AS "Registrations"
I hope this helps.
Kind Regards
Chris
Hi All, I've gone through the query @pwmcity provided, but unfortunately I couldn't get the figures to match those coming out of the dedup query.
So I continued to work on this, and managed to get this to work by using:
| stats distinct_count(detail.ipAddress) AS "Registrations"
I hope this helps.
Kind Regards
Chris
this is because you're removing the eventId in the first stats, try :
| stats dc(detail.ipAddress) by eventId | stats count
Also, know that that instead of this:
| stats count(eventId) | rename count(eventId) AS "Registrations"
you can do this:
| stats count(eventId) AS "Registrations"
And as far as performance, just try both and inspect the time taken
Hi @pwmcity, many thanks for taking the time to reply to my post and for the solution. Kind regards. Chris
Hi,
you are right, dedup is slow and resource heavy, espacially with a huge amount of events.
The problem with your second search is, if you run the search up to this point: index=main detail.statusCode=200 | stats dc(detail.ipAddress)
you do not have the eventId field anymore. Therefore, your following| stats count(eventId)
returns 0. So you have to refactor your search, but some infos could be useful. Is there only one eventId per ipAdress? What is the intention of your search? Do you have some sample data?
If i interpret your search correctly, i would say you get the same result (74) with just the following search:
index=main detail.statusCode=200 | stats dc(detail.ipAddress)
But however, this is the distinct count of ipadresses, which might not equal the number of registrations, but this depends on your data.
Greetings
Tom
Hi @tom_frotscher, thank you for taking the time to reply to my post and for pointing out the error over the eventId.
Kind regards
Chris
Till the following line your query is good. But you have lost "eventId" in your data due to stats
index=main detail.statusCode=200 | stats dc(detail.ipAddress)
Hence
| stats count(eventId) is returning 0.
So instead of | stats count(eventId) use | stats count
So your final query will look like this
index=main detail.statusCode=200 | stats dc(detail.ipAddress) | stats count | rename count AS "Registrations"