Splunk Search

Writing a search query that validates file format

Path Finder

Hi Guys,

I have a question regarding file validation. Is it possible in Splunk to validate the structure/format of a file? For instance, if you have a table with the following columns:

Name | Surname | Date

This table is ingested into Splunk. How would I write a query to check that in a different file, it contains the same headers, and the format of the values in the second file is in the correct format. For example, a date value should be in the correct format such as dd/mm/yyyy as opposed to mm/dd/yyyy.

So is there a way in Splunk to write a query that validates this?

Thanks

0 Karma
1 Solution

SplunkTrust
SplunkTrust

Yes, and there are probably 73 different ways to do this.

One method may be to use regular expressions and some built-in eval functions. This should be fairly flexible, though depending on your skill with regex, there may be some difficulties [See note 1].

Obviously, there will be quite a bit of work in deciding "what's proper" and what isn't. But let's say that dates should be dd/mm/yyyy. You can check any sort of date format with the eval function strptime. For instance, here's a handful of different date "styles" and if they fit into d/m/y or m/d/y. This is run anywhere - just paste it in and press enter and it should give you output.

| makeresults 
| eval Dt="4/3/2017 1/1/2017 12/21/2017 2/30/2017 30/2/2017" 
| makemv delim=" " Dt 
| mvexpand Dt 
| eval MMDDYYYYDate = strptime(Dt, "%m/%d/%Y"), DDMMYYYYDate = strptime(Dt, "%d/%m/%Y")
| eval MMDDYYYYFits = if(isnull(MMDDYYYYDate),"No","Yes"), DDMMYYYYFits = if(isnull(DDMMYYYYDate),"No","Yes")

The first four lines are just "setting the stage". The 5th line tries to fit that created field Dt into a couple of date styles. The last line then checks if it worked - a sample of the logic to find out if it matches or not.

Likewise, you can do at least some very rudimentary checking of strings for certain formatting, but without knowing what exact formatting you need to check for, I'll just throw out a couple of examples.

Let's suppose you have a field that needs to have only 12 to 14 capital letters or numbers (no lowercase, no spaces, etc...), you could test that with something like

.... | rex field="myfield" "^(?<IsValid>[A-Z0-9]{12,14})$"

So, a few sites to get more information about specific use cases:
First, this Regular Expressions example site is very useful for a lot of pattern matching examples for particular use cases.
As always, Regex 101 is also a great site - less about showing you patterns and more about letting you test them easily, but it's fabulous nevertheless.

Note 1: We love REGEX here. Ask away (though generally one question per regex or at least task). Also, if you get on the Splunk Slack group (google for instructions on signing up) there's a channel dedicated to regex where there's a handful of very knowledgeable folks who love to help!

Happy Splunking!
-Rich

View solution in original post

0 Karma

SplunkTrust
SplunkTrust

Yes, and there are probably 73 different ways to do this.

One method may be to use regular expressions and some built-in eval functions. This should be fairly flexible, though depending on your skill with regex, there may be some difficulties [See note 1].

Obviously, there will be quite a bit of work in deciding "what's proper" and what isn't. But let's say that dates should be dd/mm/yyyy. You can check any sort of date format with the eval function strptime. For instance, here's a handful of different date "styles" and if they fit into d/m/y or m/d/y. This is run anywhere - just paste it in and press enter and it should give you output.

| makeresults 
| eval Dt="4/3/2017 1/1/2017 12/21/2017 2/30/2017 30/2/2017" 
| makemv delim=" " Dt 
| mvexpand Dt 
| eval MMDDYYYYDate = strptime(Dt, "%m/%d/%Y"), DDMMYYYYDate = strptime(Dt, "%d/%m/%Y")
| eval MMDDYYYYFits = if(isnull(MMDDYYYYDate),"No","Yes"), DDMMYYYYFits = if(isnull(DDMMYYYYDate),"No","Yes")

The first four lines are just "setting the stage". The 5th line tries to fit that created field Dt into a couple of date styles. The last line then checks if it worked - a sample of the logic to find out if it matches or not.

Likewise, you can do at least some very rudimentary checking of strings for certain formatting, but without knowing what exact formatting you need to check for, I'll just throw out a couple of examples.

Let's suppose you have a field that needs to have only 12 to 14 capital letters or numbers (no lowercase, no spaces, etc...), you could test that with something like

.... | rex field="myfield" "^(?<IsValid>[A-Z0-9]{12,14})$"

So, a few sites to get more information about specific use cases:
First, this Regular Expressions example site is very useful for a lot of pattern matching examples for particular use cases.
As always, Regex 101 is also a great site - less about showing you patterns and more about letting you test them easily, but it's fabulous nevertheless.

Note 1: We love REGEX here. Ask away (though generally one question per regex or at least task). Also, if you get on the Splunk Slack group (google for instructions on signing up) there's a channel dedicated to regex where there's a handful of very knowledgeable folks who love to help!

Happy Splunking!
-Rich

View solution in original post

0 Karma