Hello,
I have a custom log from a monitoring tool, and its output looks like this:
Disk_Space Days Path
10G 4days /path/of/data/userA
20G 5days /path/of/data/userA/folderA
10G 4days /path/of/data/userB
20G 5days /path/of/data/userB/folderA
20G 5days /path/of/data/userB/folderB
20G 10days /path/of/data/userA/folderB/subfolder_a
.....
Is it possible to filter the data, for example entries over 5 days old and larger than 10G, and then send an email to userA, userB, and so on?
Or do I have to rewrite this log to some other format like JSON?
Thank you very much!
It's possible to work with this format, but we need to consider a few factors before settling on a final solution.
Assuming Disk_Space, Days, and Path are already extracted fields in Splunk, we still need to extract the numeric parts to do numerical comparisons.
For example, to sort on Disk_Space, the digits need to be extracted, which can be done with a regex: rex field=Disk_Space "(?<DU>\d+)"
But what if the disk space usage for some mounts is in TB/MB? In that case, the unit needs to be checked with a condition and converted to GB before doing any comparison.
Similarly, Days can be extracted with "(?<DAY>\d+)". But what if it changes to 1month after 30/31 days?
So with your sample data, the fields have to be extracted and compared, and then an alert can be sent.
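To make the unit problem concrete, here is a minimal standalone Python sketch of the same normalization logic (these helper names, the unit table, and the 30-day month are my assumptions for illustration, not part of the Splunk search):

```python
import re

def to_gb(disk_space: str) -> float:
    """Normalize a size string such as '10G', '2TB', or '512MB' to GB."""
    m = re.match(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)", disk_space)
    if not m:
        raise ValueError(f"unparseable size: {disk_space!r}")
    value, unit = float(m.group(1)), m.group(2).upper()
    # Assumed unit table; extend if the tool emits other suffixes.
    factors = {"M": 1 / 1024, "MB": 1 / 1024,
               "G": 1.0, "GB": 1.0,
               "T": 1024.0, "TB": 1024.0}
    return value * factors[unit]

def to_days(days: str) -> int:
    """Extract the numeric part of '4days'; treat '1month' as 30 days (assumption)."""
    n = int(re.match(r"(\d+)", days).group(1))
    return n * 30 if "month" in days.lower() else n
```

This mirrors what the rex extractions plus the conditional eval do inside Splunk: strip the digits, then convert everything to one unit before comparing.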
Here is a sample solution based on dummy data.
index=_* earliest=-5m|stats count by source| eval Days=1|accum Days|eval Days=Days."Days"
|eval Disk_Space=if(count > 1000,round((count/1024))."TB",count."GB")
Result
source count Days Disk_Space
C:\Program Files\Splunk\var\log\introspection\disk_objects.log 13 1Days 13GB
C:\Program Files\Splunk\var\log\introspection\kvstore.log 16 2Days 16GB
C:\Program Files\Splunk\var\log\introspection\resource_usage.log 191 3Days 191GB
C:\Program Files\Splunk\var\log\splunk\health.log 80 4Days 80GB
C:\Program Files\Splunk\var\log\splunk\metrics.log 853 5Days 853GB
C:\Program Files\Splunk\var\log\splunk\splunkd_access.log 9 6Days 9GB
C:\Program Files\Splunk\var\log\splunk\splunkd_ui_access.log 275 7Days 275GB
Now extract the fields and perform the unit (TB->GB) check and conversion (if needed):
index=_* earliest=-5m|stats count by source| eval Days=1|accum Days|eval Days=Days."Days"
|eval Disk_Space=if(count > 1000,round((count/1024))."TB",count."GB")
|rex field=source "C:\\\\Program Files\\\\Splunk\\\\var\\\\log\\\\(?<USER>\w+)\\\\"
|rex field=Disk_Space "(?<DU>\d+)(?<UNIT>\w+)"|rex field=Days "(?<DAY>\d+)"
|eval DU=if(UNIT=="TB",DU*1024,DU)
Finally, apply the filter and send mail to the users:
index=_* earliest=-5m|stats count by source| eval Days=1|accum Days|eval Days=Days."Days"
|eval Disk_Space=if(count > 1000,round((count/1024))."TB",count."GB")
|rex field=source "C:\\\\Program Files\\\\Splunk\\\\var\\\\log\\\\(?<USER>\w+)\\\\"
|rex field=Disk_Space "(?<DU>\d+)(?<UNIT>\w+)"|rex field=Days "(?<DAY>\d+)"
|eval DU=if(UNIT=="TB",DU*1024,DU)
| where DAY > 5 AND DU > 10|sendmail to=USER@mydomain.com
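For readers who want to check the filtering logic outside Splunk, here is a hypothetical end-to-end Python sketch of the same pipeline over the sample data (the function name, thresholds, and path pattern are assumptions mirroring the search above):

```python
import re

LOG = """\
10G 4days /path/of/data/userA
20G 5days /path/of/data/userA/folderA
10G 4days /path/of/data/userB
20G 5days /path/of/data/userB/folderA
20G 5days /path/of/data/userB/folderB
20G 10days /path/of/data/userA/folderB/subfolder_a
"""

def offenders(log: str, min_days: int = 5, min_gb: int = 10):
    """Return {user: [(gb, days, path), ...]} for rows over both thresholds."""
    result = {}
    for line in log.splitlines():
        size, days, path = line.split()
        gb = int(re.match(r"(\d+)", size).group(1))      # rex on Disk_Space
        day = int(re.match(r"(\d+)", days).group(1))      # rex on Days
        user_m = re.match(r"/path/of/data/(\w+)", path)   # rex for USER
        if user_m and day > min_days and gb > min_gb:
            result.setdefault(user_m.group(1), []).append((gb, day, path))
    return result
```

On the sample data only the 20G/10days row for userA passes both conditions, matching what the `where DAY > 5 AND DU > 10` clause would keep.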
If you have control over the content of the log file, I suggest handling the unit conversion/data format before pushing it to Splunk. The most commonly used and recommended format is key=value, which Splunk understands without any extra configuration.
@renjith.nair, thanks for your reply. For Disk_Space and Days, they will always use GB and days, never TB or months.
My sample log might be different from yours; my bad for not explaining it clearly at first.
There is only one monitoring log file, e.g. /log/usage.log, and it contains content like:
Disk_Space Days Path
10G 4days /path/of/data/userA
20G 5days /path/of/data/userA/folderA
10G 4days /path/of/data/userB
20G 5days /path/of/data/userB/folderA
20G 5days /path/of/data/userB/folderB
20G 10days /path/of/data/userA/folderB/subfolder_a
30G 40days /path/of/data/userA/folderB/subfolder_a
.....
So using the approach you provided, I get the result below:
source count Days Disk_Space
/log/usage.log 2 1Days 2GB
I think something still needs to be fixed in my log file. Would you please share more ideas?
Thank you!
@garumaru,
The first part of the SPL is just there to generate dummy data similar to yours, so you don't need to worry about it. Since you are always using GB and days, you can start from:
"your existing search to get Disk_Space Days Path fields"
|rex field=Path "\/path\/of\/data\/(?<USER>\w+)"
|rex field=Disk_Space "(?<DU>\d+)"|rex field=Days "(?<DAY>\d+)"
| where DAY > 5 AND DU > 10
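Once the per-user rows are filtered, the last step is one alert per user. In Splunk this would normally be an alert action; as a standalone illustration, here is a hypothetical Python sketch that builds one message per user (the domain name is a placeholder, and nothing is actually sent):

```python
from email.message import EmailMessage

def build_alerts(offenders: dict, domain: str = "mydomain.com"):
    """Build one alert email per user; 'domain' is a placeholder assumption."""
    messages = []
    for user, rows in sorted(offenders.items()):
        msg = EmailMessage()
        msg["To"] = f"{user}@{domain}"
        msg["Subject"] = f"Disk usage alert for {user}"
        # One line per offending path, in the same column order as the log.
        body = "\n".join(f"{gb}G {day}days {path}" for gb, day, path in rows)
        msg.set_content(body)
        messages.append(msg)
    return messages
```

Each message could then be handed to an SMTP client; the point is only to show the per-user grouping that the `sendmail` step relies on.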