Solved: How to find the file name of largest file using splunk query

Nithiya1

I have events with a file name and a file size.

I would like to find the name of the largest file.

My query:

<search>| stats max(File_Size_MB) AS Large_File_Size by File_Name

I'd appreciate any help here.

PickleRick

I understand you're trying to do something roughly equivalent to the SQL query

SELECT MAX(filesize), filename FROM whatever

With Splunk it doesn't work like this.

There are at least three ways to do this, each with its pros and cons.

1. The "obvious" splunky way - use eventstats to find the maximum file size, add that value to every event, and then keep only the events whose size matches it.

<initial search>
| eventstats max(filesize) as maxsize
| where filesize=maxsize

Pros - it's easy to understand and intuitive if you know some SPL.

Cons - eventstats is memory-hungry, which can easily lead to resource exhaustion on bigger data sets.

2. Find the maximum file size in a subsearch and then search for all files with this file size.
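Applied to the field names from the question (File_Name and File_Size_MB), method 1 could look roughly like the sketch below. The tonumber() conversion and the stats step that collapses the data to one row per file are assumptions about the data, not something stated in the thread; drop them if the size is already numeric and each file appears only once.

```spl
<your search>
| eval File_Size_MB = tonumber(File_Size_MB)
| stats max(File_Size_MB) AS File_Size_MB by File_Name
| eventstats max(File_Size_MB) AS Max_File_Size_MB
| where File_Size_MB = Max_File_Size_MB
| table File_Name File_Size_MB
```

If several files share the maximum size, all of them are returned.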

<your search> [ <your search>
| stats max(filesize) as filesize ]

Pros - it is relatively easy to understand if you know about subsearches. In specific cases where you can use PREFIX(), the subsearch can be quite fast, and the outer search can then also turn out to be quite efficient.

Cons - it uses a subsearch, which is subject to limits, most importantly on run time. If you cannot optimize the subsearch with PREFIX() and you're not using indexed fields, the subsearch has to comb through all matching data (although, since it's stats, the work will be map-reduced). If the subsearch hits a limit it is silently finalized, which means you might get wrong or incomplete results.

3. Sort the data and pick the first result. This might not be the best choice for this particular problem, but the generalized form (using "head X" or "dedup X") solves the broader problem of finding the top X events ranked by a specific field.
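As a rough sketch of method 2 with the field names from the question: the subsearch returns a single row with one field, which Splunk turns into a search term of the form File_Size_MB=<value> for the outer search. The explicit search keyword is there because a subsearch has to start with a generating command; this assumes <your search> is a plain set of search terms.

```spl
<your search>
    [ search <your search>
    | stats max(File_Size_MB) AS File_Size_MB ]
| table File_Name File_Size_MB
```

The subsearch's output field has to carry the same name as the field in the outer events, which is why the maximum is renamed back to File_Size_MB.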

<your search>
| sort - filesize
| head 1

Pros - if I remember correctly, the sort | head part might be optimized so that not all of the sorted data has to be returned. As I wrote, this is a narrower version of the general solution, and the general solution is worth knowing since it's often better than the alternatives.

Cons - if your initial search has already moved processing to the SH (search head) tier and your dataset is big, it might not be very efficient. It also might not be obvious at first glance what it does.
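The generalized "head X" and "dedup X" forms mentioned in method 3 might look like the sketches below, again using the field names from the question. The first keeps the five largest files overall:

```spl
<your search>
| sort - File_Size_MB
| head 5
| table File_Name File_Size_MB
```

The second keeps the three largest files per group. Grouping by the default host field is only an assumption for illustration; any field works. Without a count, sort keeps only the first 10000 results, so sort 0 is used here to make sure no group is dropped before dedup runs:

```spl
<your search>
| sort 0 - File_Size_MB
| dedup 3 host
| table host File_Name File_Size_MB
```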

Nithiya1

It's working.

Thanks for the solution and explanation!

gcusello

Hi @Nithiya1,

Sorry, what is your issue?

The search seems to be correct.

Maybe File_Size_MB isn't a number, or is it something else?

Ciao.

Giuseppe
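One way to check the "isn't a number" possibility is sketched below. It lists the events whose File_Size_MB cannot be parsed as a number (for example, because the value carries a unit suffix), relying on tonumber() returning NULL in that case. Events where the field is missing altogether show up as well.

```spl
<your search>
| where isnull(tonumber(File_Size_MB))
| table File_Name File_Size_MB
```

If this returns nothing, the field values are numeric and the problem lies elsewhere.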
