Hi All,
I have a field in my data called 'message', which contains information about the status of a file. I'd like to categorize files as either success or failure files based on the content of that field. For example, if the message contains values like (success, processed, completed), I want to label the corresponding file as a success file; if it contains values like (failed, failure), I want to label it as a failure file. How can I implement this in an SPL query? I tried the query below, but it does not give the expected result.
index=mulesoft environment=DEV applicationName="Test"
|stats values(content.FileName) as Filename1 values(content.ErrorMsg) as errormsg values(content.Error) as error values(message) as message values(priority) as priority min(timestamp) AS Logon_Time, max(timestamp) AS Logoff_Time BY correlationId
| eval SuccessFileName=case(match(message, "File put Succesfully*|Successfully created file data*|Archive file processed successfully*|Summary of all Batch*|processed successfully for file name*|SUCCESS") AND not match(priority,"ERROR|WARN"),FileName1,1=1,null())
| eval FailureFileName=case(match(message,"Failed to process file:"),FileName1,1=1,null()) |table SuccessFileName FailureFileName Response correlationId
The problem is not in the use of case, but in the regex you applied. (I think this very same problem was discussed recently. Is this another homework question?) There is an unnecessary asterisk (*) at the end of several expressions, but that's not necessarily a real problem. There is also the coding choice of case vs. if; the latter would be more expressive and concise in your use case. But that's not a problem, either.
The problem is that the regexes probably do not match the data. For volunteers to help you, you need to post the output from:
index=mulesoft environment=DEV applicationName="Test"
|stats values(content.FileName) as Filename1 values(content.ErrorMsg) as errormsg values(content.Error) as error
values(message) as message values(priority) as priority min(timestamp) AS Logon_Time, max(timestamp) AS Logoff_Time BY correlationId
(Anonymize as needed.) If you ask a data analytics question, you need to illustrate data.
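The if-versus-case remark above can be illustrated with a minimal sketch. It runs on made-up data from makeresults rather than the real index, so the sample message, priority, and file name values are assumptions, and the pattern list is shortened to alternatives without the trailing asterisks. It only covers the single-valued case.

```spl
| makeresults
| eval message="Archive file processed successfully", priority="INFO", Filename1="sample.csv"
| eval SuccessFileName=if(match(message, "processed successfully|SUCCESS") AND NOT match(priority, "ERROR|WARN"), Filename1, null())
| table message, priority, SuccessFileName
```

Field names are case-sensitive in SPL, so the sketch refers to Filename1 exactly as the stats clause names it. The posted query refers to FileName1 in its eval lines, which may be worth checking.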
I suppose the problem lies elsewhere. Your point is valid - the "regex" is not very well written, and those asterisks are superfluous, but they shouldn't break anything.
From the original question (which was a bit of a "stream of consciousness", with no paragraph breaks and no spaces after full stops), I suppose that stats values() produces multivalued fields, because a single correlationId can apply to several different files, each of which can have a different result.
But that's just my suspicion.
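If that suspicion is right, one way to check is to test each value of the multivalued field separately. A minimal sketch, assuming made-up message values built with split() in place of the real stats output; mvfilter() keeps only the values for which the match() expression is true, and the field names failed_messages and has_failure are hypothetical:

```spl
| makeresults
| eval message=split("File put Successfully;Failed to process file: a.csv", ";")
| eval failed_messages=mvfilter(match(message, "Failed to process file:"))
| eval has_failure=if(isnotnull(failed_messages), "yes", "no")
| table message, failed_messages, has_failure
```

Another option would be to classify each event before the stats call, while message is still single-valued.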
This example, using makeresults, shows how a case statement can do the labelling:
| makeresults
| eval _raw="message
success job,
processed job,
completed job,
failed job,"
| multikv forceheader=1
| eval status = case(
like(message, "%success%") OR like(message, "%processed%") OR like(message, "%completed%"), "success",
like(message, "%failed%") OR like(message, "%failure%"), "failure",
true(), "other"
)
| table _time, message, status
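As far as I know, like() comparisons are case-sensitive, so a value such as "Success job" would fall through to "other" in the example above. A variant sketch, assuming made-up sample values, that uses match() with the (?i) flag to ignore case and one regex alternation per label:

```spl
| makeresults
| eval message=split("Success job,processed job,FAILED job,queued job", ",")
| mvexpand message
| eval status=case(
    match(message, "(?i)success|processed|completed"), "success",
    match(message, "(?i)failed|failure"), "failure",
    true(), "other")
| table message, status
```

case() returns the value for the first condition that is true, so the order of the branches decides the label when a message matches more than one pattern.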