I understand you're trying to do something roughly equivalent to the SQL query
SELECT MAX(filesize), filename FROM whatever
With Splunk it doesn't work like this.
There are at least three ways to do this, each with its pros and cons.
1. The "obvious" splunky way - use eventstats to find the maximum file size, add it to every event, and then filter for the events whose size matches it.
<initial search>
| eventstats max(filesize) as maxsize
| where filesize=maxsize
Pros - it's easy to understand and intuitive if you know some SPL
Cons - eventstats is memory-hungry which can easily lead you to resource exhaustion over bigger data sets.
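If you want to see what this does before running it on real data, here is a small self-contained sketch. The three events, the file names and the sizes are made up with makeresults purely for illustration; only the last two lines are the actual technique.

```spl
| makeresults count=3
| streamstats count as n
| eval filename="file".n.".log", filesize=n*100
| eventstats max(filesize) as maxsize
| where filesize=maxsize
```

After eventstats every event carries maxsize=300, so the where clause should leave only the file3.log event. If several files share the maximum size, all of them are returned.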
2. Find maximum file size in a subsearch and then search for all files with this file size
<your search> [ <your search>
| stats max(filesize) as filesize ]
Pros - it is relatively easy to understand if you know about subsearches. In specific cases when you can use PREFIX(), the subsearch can be quite fast, and the outer search can then also turn out to be quite efficient.
Cons - it uses subsearch which is subject to limitations, most importantly regarding run time. If you cannot optimize your subsearch with PREFIX() and you're not using indexed fields, your subsearch will have to comb through all matching data (but since it's stats it will be map-reduced). If the subsearch hits the limits it will be silently finalized which means you might get wrong/incomplete results.
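To make it clearer what happens here: the subsearch result is turned into a search condition and pasted into the outer search. Assuming the subsearch finds a maximum of 1048576 (a made-up value), the outer search effectively becomes something like:

```spl
<your search> ( ( filesize="1048576" ) )
```

You can check the generated condition yourself by running the subsearch on its own with format appended:

```spl
<your search>
| stats max(filesize) as filesize
| format
```

As for PREFIX(), a rough sketch of the optimized variant could look like the one below. It is an assumption that your raw events literally contain filesize=<number> as a separate indexed token; if they don't, this will not work. <your index> is a placeholder.

```spl
<your search> [ | tstats max(PREFIX(filesize=)) as filesize where index=<your index> ]
```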
3. Sort the data and pick the first one. This one might not be the best choice for this particular problem but a generalized solution (using "head X" or "dedup X") might be used for a generalized problem of finding top X events with a specific parameter.
<your search>
| sort - filesize
| head 1
Pros - if I remember correctly, the sort | head part might be optimized to not have to return all sorted data. As I wrote - it's a narrower version of the general solution. And the general solution is worth knowing since it's often better than alternatives.
Cons - if your initial search has already moved processing to the search head (SH) tier and your dataset is big, it might not be very efficient. It also might not be obvious what it does at first glance.
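For the generalized "top X" problem mentioned above, a minimal sketch would be the following. The number 5 is arbitrary.

```spl
<your search>
| sort - filesize
| head 5
```

And a per-group variant with dedup, here assuming you want the 3 biggest files for each host (host is just an example field, use whatever you group by):

```spl
<your search>
| sort 0 - filesize
| dedup 3 host
```

The 0 in sort 0 lifts the default limit of 10000 results, which matters here because dedup needs to see the sorted events for every host and not only the first 10000 overall.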
It's working.
Thanks for the solution and explanation!!
Hi @Nithiya1 ,
sorry, what is your issue?
The search seems to be correct.
Maybe File_Size_MB isn't a number, or is it something else?
Ciao.
Giuseppe