I have a log file like this:
2024-02-15 09:07:47,770 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-202] The Upload Service /app1/service/site/upload failed in 0.124000 seconds, {comments=xxx-123, senderCompany=Company1, source=Web, title=Submitted via Site website, submitterType=Others, senderName=ROMAN , confirmationNumber=ND_50249-02152024, clmNumber=99900468430, name=ROAMN Claim # 99900468430 Invoice.pdf, contentType=Email}
2024-02-15 09:07:47,772 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-202] Exception from executeScript: 0115100898 Duplicate Child Exception - ROAMN Claim # 99900468430 Invoice.pdf already exists in the location.
---
---
---
2024-02-15 09:41:16,762 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-200] The Upload Service /app1/service/site/upload failed in 0.138000 seconds, {comments=yyy-789, senderCompany=Company2, source=Web, title=Submitted via Site website, submitterType=Public Adjuster, senderName=Tristian, confirmationNumber=ND_52233-02152024, clmNumber=99900470018, name=Tristian CLAIM #99900470018 PACKAGE.pdf, contentType=Email}
2024-02-15 09:41:16,764 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-200] Exception from executeScript: 0115100953 Document not found - Tristian CLAIM #99900470018 PACKAGE.pdf
We need to search index=<myindex> "/alfresco/service/site/upload failed" and produce a table with the following information:
_time | clmNumber | confirmationNumber | name | Exception |
2024-02-15 09:07:47 | 99900468430 | ND_50249-02152024 | ROAMN Claim # 99900468430 Invoice.pdf | 0115100898 Duplicate Child Exception - ROAMN Claim # 99900468430 Invoice.pdf already exists in the location |
2024-02-15 09:41:16 | 99900470018 | ND_52233-02152024 | Tristian CLAIM #99900470018 PACKAGE.pdf | 0115100953 Document not found - Tristian CLAIM #99900470018 PACKAGE.pdf |
The exception is in a separate event in the log file, immediately after the line that holds the first four fields. Both events share the SessionID and may also share the document name, but one SessionID can carry multiple transactions, so the name can differ.
I created the following search for this purpose, but it returns an exception with a different document name:
(index="myindex" "/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*") OR
(index="myindex" "Exception from executeScript")
| rex "clmNumber=(?<ClaimNumber>[^,]+)"
| rex "confirmationNumber=(?<SubmissionNumber>[^},]+)"
| rex "contentType=(?<ContentType>[^},]+)"
| rex "name=(?<DocName>[^,]+)"
| rex "(?<SessionID>\[http-nio-8080-exec-\d+\])"
| eval EventType=if(match(_raw, "Exception from executeScript"), "Exception", "Upload Failure")
| eventstats first(EventType) as first_EventType by SessionID
| where EventType="Upload Failure"
| join type=outer SessionID [
search index="myindex" "Exception from executeScript"
| rex "Exception from executeScript: (?<Exception>[^:]+)"
| rex "(?<SessionID>\[http-nio-8080-exec-\d+\])"
| rex "(?<ExceptionDocName>.+\.pdf)"
| eval EventType="Exception"
| eventstats first(EventType) as first_EventType by SessionID
]
| where EventType="Exception" OR isnull(Exception)
| table _time, ClaimNumber, SubmissionNumber, ContentType, DocName, Exception
| sort _time desc ClaimNumber
Here is the result that I got -
_time | clmNumber | confirmationNumber | name | Exception |
2024-02-15 09:07:47 | 99900468430 | ND_50249-02152024 | ROMAN Claim # 99900468430 Invoice.pdf | 0115105149 Duplicate Child Exception - Rakesh lease 4 already exists in the location. |
2024-02-15 09:41:16 | 99900470018 | ND_52233-02152024 | Tristian CLAIM #99900470018 PACKAGE.pdf | 0115105128 Duplicate Child Exception - Combined 4 Point signed Ramesh 399 Coral Island. disk 3 already exists in the location. |
So although the first four fields in the table are correct, I believe the exception comes from a different event in the log that has the same SessionID.
How can we fix the search to produce the expected result?
Thanks in advance.
First, thank you for clearly illustrating the input data and the desired output. Note that join is a performance killer and best avoided; in this case it is overkill.
If I read your requirement from the complex SPL correctly, all you want is to correlate INFO and ERROR events so that each exception is reported alongside its failed claim, file, etc. Although it is not difficult to extract the claim number from both event types in the illustrated format, SessionID is an easier correlation field because it appears in both in exactly the same form.
Additionally, there is no need to extract clmNumber and confirmationNumber because Splunk extracts them automatically. Only name needs a rex: its value contains unquoted spaces, so the automatic extraction garbles it.
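To see the difference between the automatic extraction and the rex on name, you can try a quick emulation with a single shortened event taken from the INFO sample above. auto_name is a made-up field name used only for the comparison:

```spl
| makeresults
| eval _raw="clmNumber=99900468430, name=ROAMN Claim # 99900468430 Invoice.pdf, contentType=Email"
| extract ``` automatic key=value extraction, as used in the emulations in this thread ```
| rename name AS auto_name ``` keep the automatically extracted value for comparison ```
| rex "\bname=(?<name>[^,]+)" ``` capture everything up to the next comma, spaces included ```
| table clmNumber auto_name name
```

If the automatic value stops at the first space, auto_name will hold only the start of the file name, while name should hold the whole value.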
This is a simpler search that should satisfy your requirement:
index="myindex" ("/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*")
OR ("Exception from executeScript")
| rex "\bname=(?<name>[^,]+)"
```| rex "clmNumber=(?<ClaimNumber>[^,]+)"
| rex "confirmationNumber=(?<SubmissionNumber>[^},]+)"
| rex "contentType=(?<ContentType>[^},]+)" ```
| rex "(?<SessionID>\[http-nio-8080-exec-\d+\])"
| rex "Exception from executeScript: (?<Exception>[^:]+)"
| fields clmNumber confirmationNumber name Exception SessionID
| stats min(_time) as _time values(*) as * by SessionID
Your sample logs should give
SessionID | _time | Exception | clmNumber | confirmationNumber | name |
[http-nio-8080-exec-200] | 2024-02-15 09:41:16.762 | 0115100953 Document not found - Tristian CLAIM #99900470018 PACKAGE.pdf | 99900470018 | ND_52233-02152024 | Tristian CLAIM #99900470018 PACKAGE.pdf |
[http-nio-8080-exec-202] | 2024-02-15 09:07:47.769 | 0115100898 Duplicate Child Exception - ROAMN Claim # 99900468430 Invoice.pdf already exists in the location. | 99900468430 | ND_50249-02152024 | ROAMN Claim # 99900468430 Invoice.pdf |
Of course, you can remove SessionID from the display and rearrange the field order.
You can play with the following emulation and compare it with real data:
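For instance, you could append table after the stats line to drop SessionID and put the columns in the order of your desired output. This is a sketch that reuses the search above unchanged. The final sort is only a suggestion:

```spl
index="myindex" ("/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*")
OR ("Exception from executeScript")
| rex "\bname=(?<name>[^,]+)"
| rex "(?<SessionID>\[http-nio-8080-exec-\d+\])"
| rex "Exception from executeScript: (?<Exception>[^:]+)"
| fields clmNumber confirmationNumber name Exception SessionID
| stats min(_time) as _time values(*) as * by SessionID
| table _time clmNumber confirmationNumber name Exception ``` SessionID left out, columns reordered ```
| sort - _time ``` newest first ```
```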
| makeresults
| eval data = split("2024-02-15 09:07:47,770 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-202] The Upload Service /app1/service/site/upload failed in 0.124000 seconds, {comments=xxx-123, senderCompany=Company1, source=Web, title=Submitted via Site website, submitterType=Others, senderName=ROMAN , confirmationNumber=ND_50249-02152024, clmNumber=99900468430, name=ROAMN Claim # 99900468430 Invoice.pdf, contentType=Email}
2024-02-15 09:07:47,772 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-202] Exception from executeScript: 0115100898 Duplicate Child Exception - ROAMN Claim # 99900468430 Invoice.pdf already exists in the location.
---
---
---
2024-02-15 09:41:16,762 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-200] The Upload Service /app1/service/site/upload failed in 0.138000 seconds, {comments=yyy-789, senderCompany=Company2, source=Web, title=Submitted via Site website, submitterType=Public Adjuster, senderName=Tristian, confirmationNumber=ND_52233-02152024, clmNumber=99900470018, name=Tristian CLAIM #99900470018 PACKAGE.pdf, contentType=Email}
2024-02-15 09:41:16,764 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-200] Exception from executeScript: 0115100953 Document not found - Tristian CLAIM #99900470018 PACKAGE.pdf", "
")
| mvexpand data
| rename data AS _raw
| rex "^(?<_time>\S+ \S+)"
| eval _time = strptime(_time, "%F %T,%3N")
| extract
``` the above emulates
(index="myindex" "/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*") OR
(index="myindex" "Exception from executeScript")
```
| rex "\bname=(?<name>[^,]+)"
```| rex "clmNumber=(?<ClaimNumber>[^,]+)"
| rex "confirmationNumber=(?<SubmissionNumber>[^},]+)"
| rex "contentType=(?<ContentType>[^},]+)" ```
| rex "(?<SessionID>\[http-nio-8080-exec-\d+\])"
| rex "Exception from executeScript: (?<Exception>[^:]+)"
| fields clmNumber confirmationNumber name Exception SessionID
| stats min(_time) as _time values(*) as * by SessionID
The first query returns results like this:
SessionID | _time | Exception | clmNumber | confirmationNumber | name |
[http-nio-8080-exec-101] | 2024-02-15 00:06:38.457 | 0115100018 Could not match parameter list [names, keep] to an operation. --- (Many More) |
BTW, when the first query runs, for a split second it looks as if it will return data like the second query (the | makeresults emulation). Then the results get mixed up and all the data is jumbled, with nothing in the last three columns. Not sure if this information helps.
Thanks a lot for your reply Yuanliu.
When I run the code below, I get a very skewed result. The SessionID and _time columns are populated. In the Exception column, every exception for that day shows up in a single row (since I am running a day's worth of report), whether or not it is related to "confirmationNumber=ND_*". The other three fields are empty.
index="myindex" ("/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*")
OR ("Exception from executeScript")
| rex "\bname=(?<name>[^,]+)"
```| rex "clmNumber=(?<ClaimNumber>[^,]+)"
| rex "confirmationNumber=(?<SubmissionNumber>[^},]+)"
| rex "contentType=(?<ContentType>[^},]+)" ```
| rex "(?<SessionID>\[http-nio-8080-exec-\d+\])"
| rex "Exception from executeScript: (?<Exception>[^:]+)"
| fields clmNumber confirmationNumber name Exception SessionID
| stats min(_time) as _time values(*) as * by SessionID
Secondly, some of my data has the same SessionID but a different dataset, and I cannot see the _time for the second transaction on the same SessionID. Here is sample data:
| makeresults
| eval data = split("2024-02-15 09:07:47,770 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-202] The Upload Service /app1/service/citizens/upload failed in 0.124000 seconds, {comments=xxx-123, senderCompany=Company1, source=Web, title=Submitted via Site website, submitterType=Others, senderName=ROMAN , confirmationNumber=ND_50249-02152024, clmNumber=99900468430, name=ROAMN Claim # 99900468430 Invoice.pdf, contentType=Email}
2024-02-15 09:07:47,772 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-202] Exception from executeScript: 0115100898 Duplicate Child Exception - ROAMN Claim # 99900468430 Invoice.pdf already exists in the location.
2024-02-15 09:10:47,770 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-202] The Upload Service /app1/service/citizens/upload failed in 0.124000 seconds, {comments=xxx-123, senderCompany=Company1, source=Web, title=Submitted via Site website, submitterType=Others, senderName=Bob , confirmationNumber=ND_55555-02152024, clmNumber=99900468999, name=Bob Claim # 99900468999 Invoice.pdf, contentType=Email}
2024-02-15 09:10:48,772 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-202] Exception from executeScript: 0115101000 Document not found - Bob Claim # 99900468999 Invoice.pdf already exists in the location.
2024-02-15 09:41:16,762 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-200] The Upload Service /app1/service/citizens/upload failed in 0.138000 seconds, {comments=yyy-789, senderCompany=Company2, source=Web, title=Submitted via Site website, submitterType=Public Adjuster, senderName=Tristian, confirmationNumber=ND_52233-02152024, clmNumber=99900470018, name=Tristian CLAIM #99900470018 PACKAGE.pdf, contentType=Email}
2024-02-15 09:41:16,764 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-200] Exception from executeScript: 0115100953 Document not found - Tristian CLAIM #99900470018 PACKAGE.pdf", "
")
And here is the result:
SessionID | _time | Exception | clmNumber | confirmationNumber | name |
[http-nio-8080-exec-200] | 2024-02-15 09:41:16.762 | 0115100953 Document not found - Tristian CLAIM #99900470018 PACKAGE.pdf | 99900470018 | ND_52233-02152024 | Tristian CLAIM #99900470018 PACKAGE.pdf |
[http-nio-8080-exec-202] | 2024-02-15 09:07:47.769 | 0115100898 Duplicate Child Exception - ROAMN Claim # 99900468430 Invoice.pdf already exists in the location. 0115101000 Document not found - Bob Claim # 99900468999 Invoice.pdf already exists in the location. | 99900468430 99900468999 | ND_50249-02152024 ND_55555-02152024 | Bob Claim # 99900468999 Invoice.pdf ROAMN Claim # 99900468430 Invoice.pdf |
How can we fix the first query so that it provides data for all columns correctly?
Thanks in advance for your time!
Thank you for providing the emulation! It is really important to illustrate data characteristics when dealing with data analytics. I assumed that each session would handle only one claim. If that is not the case, we'll have to extract the claim number for correlation. There are many ways to do this. Because the claim number is embedded in the file name in your samples, I will show the simplest method, which applies to both INFO and ERROR events. (An alternative is to use the file name for correlation.) So:
(index="myindex" "/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*") OR
(index="myindex" "Exception from executeScript")
| rex "\bname=(?<name>[^,]+)"
| rex "(?i) claim # *(?<claimNumber>\S+)"
| rex "(?<SessionID>\[http-nio-8080-exec-\d+\])"
| rex "Exception from executeScript: (?<Exception>[^:]+)"
| fields claimNumber confirmationNumber name Exception
| stats min(_time) as _time values(*) as * by claimNumber
Here is the full emulation and result:
| makeresults
| eval data = split("2024-02-15 09:07:47,770 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-202] The Upload Service /app1/service/citizens/upload failed in 0.124000 seconds, {comments=xxx-123, senderCompany=Company1, source=Web, title=Submitted via Site website, submitterType=Others, senderName=ROMAN , confirmationNumber=ND_50249-02152024, clmNumber=99900468430, name=ROAMN Claim # 99900468430 Invoice.pdf, contentType=Email}
2024-02-15 09:07:47,772 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-202] Exception from executeScript: 0115100898 Duplicate Child Exception - ROAMN Claim # 99900468430 Invoice.pdf already exists in the location.
2024-02-15 09:10:47,770 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-202] The Upload Service /app1/service/citizens/upload failed in 0.124000 seconds, {comments=xxx-123, senderCompany=Company1, source=Web, title=Submitted via Site website, submitterType=Others, senderName=Bob , confirmationNumber=ND_55555-02152024, clmNumber=99900468999, name=Bob Claim # 99900468999 Invoice.pdf, contentType=Email}
2024-02-15 09:10:48,772 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-202] Exception from executeScript: 0115101000 Document not found - Bob Claim # 99900468999 Invoice.pdf already exists in the location.
2024-02-15 09:41:16,762 INFO [com.mysite.core.app1.upload.FileUploadWebScript] [http-nio-8080-exec-200] The Upload Service /app1/service/citizens/upload failed in 0.138000 seconds, {comments=yyy-789, senderCompany=Company2, source=Web, title=Submitted via Site website, submitterType=Public Adjuster, senderName=Tristian, confirmationNumber=ND_52233-02152024, clmNumber=99900470018, name=Tristian CLAIM #99900470018 PACKAGE.pdf, contentType=Email}
2024-02-15 09:41:16,764 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] [http-nio-8080-exec-200] Exception from executeScript: 0115100953 Document not found - Tristian CLAIM #99900470018 PACKAGE.pdf", "
")
| mvexpand data
| rename data AS _raw
| rex "^(?<_time>\S+ \S+)"
| eval _time = strptime(_time, "%F %T,%3N")
| extract
``` the above emulates
(index="myindex" "/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*") OR
(index="myindex" "Exception from executeScript")
```
| rex "\bname=(?<name>[^,]+)"
| rex "(?i) claim # *(?<claimNumber>\S+)"
```| rex "clmNumber=(?<ClaimNumber>[^,]+)"
| rex "confirmationNumber=(?<SubmissionNumber>[^},]+)"
| rex "contentType=(?<ContentType>[^},]+)" ```
| rex "(?<SessionID>\[http-nio-8080-exec-\d+\])"
| rex "Exception from executeScript: (?<Exception>[^:]+)"
| fields claimNumber confirmationNumber name Exception
| stats min(_time) as _time values(*) as * by claimNumber
claimNumber | _time | Exception | confirmationNumber | name |
99900468430 | 2024-02-15 09:07:47.769 | 0115100898 Duplicate Child Exception - ROAMN Claim # 99900468430 Invoice.pdf already exists in the location. | ND_50249-02152024 | ROAMN Claim # 99900468430 Invoice.pdf |
99900468999 | 2024-02-15 09:10:47.769 | 0115101000 Document not found - Bob Claim # 99900468999 Invoice.pdf already exists in the location. | ND_55555-02152024 | Bob Claim # 99900468999 Invoice.pdf |
99900470018 | 2024-02-15 09:41:16.762 | 0115100953 Document not found - Tristian CLAIM #99900470018 PACKAGE.pdf | ND_52233-02152024 | Tristian CLAIM #99900470018 PACKAGE.pdf |
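To check what the regex "(?i) claim # *(?<claimNumber>\S+)" picks up, you can feed it file names directly. The first two names come from the samples above. The third, "Rakesh lease 4", is a document name from the earlier mismatched result. It has no "claim #" in it, so it is expected to leave claimNumber empty:

```spl
| makeresults
| eval data = split("ROAMN Claim # 99900468430 Invoice.pdf|Tristian CLAIM #99900470018 PACKAGE.pdf|Rakesh lease 4", "|")
| mvexpand data
| rename data AS _raw
| rex "(?i) claim # *(?<claimNumber>\S+)" ``` case-insensitive; zero or more spaces allowed after # ```
| table _raw claimNumber
```

A stats ... by claimNumber drops events like the third one, because they have no claimNumber. This is one possible reason for getting fewer rows than expected.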
Thanks Yuanliu,
This is working, but not completely. I should get 75 records in the result, because I get 75 rows if I just search for
index="myindex" "/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*"
But when I switch to the search provided above, I get only 23 rows.
Going back to the original requirement:
First, the search needs to find all the records returned by:
index="myindex" "/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*"
From each of those events, fetch _time, clmNumber, confirmationNumber, and name into the table (4 columns).
Then check the next line [for the same SessionID] for an exception (Exception from executeScript) and put whatever follows it in a fifth column.
Maybe I was not clear about the requirements earlier.
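One way to express "the next exception on the same thread" without join is to sort events per SessionID in time order and number the transactions with streamstats. Each upload-failure event then starts a new group, and the exception that follows it falls into that group. This is a sketch that has not been run against real data. is_upload and txn are hypothetical helper fields. It also assumes that the exception event always comes after its upload-failure event on the same thread, and before the next upload-failure event on that thread:

```spl
index="myindex" ("/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*")
OR ("Exception from executeScript")
| rex "\bname=(?<name>[^,]+)"
| rex "(?<SessionID>\[http-nio-8080-exec-\d+\])"
| rex "Exception from executeScript: (?<Exception>[^:]+)"
| eval is_upload=if(match(_raw, "Exception from executeScript"), 0, 1) ``` 1 on upload-failure events, 0 on exceptions ```
| sort 0 SessionID _time ``` oldest first within each thread ```
| streamstats sum(is_upload) as txn by SessionID ``` txn goes up by 1 at each upload failure ```
| stats min(_time) as _time values(clmNumber) as clmNumber values(confirmationNumber) as confirmationNumber values(name) as name earliest(Exception) as Exception by SessionID txn
| where isnotnull(confirmationNumber) ``` drop groups with no upload-failure event, e.g. exceptions before the first one ```
| table _time clmNumber confirmationNumber name Exception
```

Because each group starts at an upload-failure event, this should give one row per event returned by the first search. earliest(Exception) keeps only the first exception after that event on the same thread.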
Now we are deep in the weeds of actual data. The number of rows depends only on how many unique claimNumber values the regex "(?i) claim # *(?<claimNumber>\S+)" extracts from events matching both search filters. A meaningful test would be:
(index="myindex" "/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*")
| rex "(?i) claim # *(?<claimNumber>\S+)"
| stats dc(clmNumber) as clmCount dc(claimNumber) as claimCount
Do they give 23? 75? Or does one give 75 and the other 23? (According to your description, claimCount should be 23.) If the two counts are equal, there is nothing to change.
If you get different counts for clmNumber and claimNumber, you can do another test:
(index="myindex" "/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*")
| rex "(?i) claim # *(?<claimNumber>\S+)"
| table _time clmNumber claimNumber _raw
Then refine the regex. If you need help with the regex, post sample data for which claimNumber is not extracted.
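To collect such samples, you could use a variant of the second test that keeps only the upload-failure events where the regex finds nothing. This sketch reuses the name rex from earlier in the thread so that the full file name is shown:

```spl
index="myindex" "/app1/service/site/upload failed" AND "source=Web" AND "confirmationNumber=ND_*"
| rex "(?i) claim # *(?<claimNumber>\S+)"
| rex "\bname=(?<name>[^,]+)"
| where isnull(claimNumber) ``` no "claim #" pattern found in this event ```
| table _time clmNumber name _raw
```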