All Apps and Add-ons

Application Log field extractions based on SQL

geraldcontreras
Path Finder

Hi All,

Really hoping someone out there can help me with this.

We have an in house app that generates message logs which contain SQL.

Each query can be different so simple regex extraction wont work because the query can change. 

Below are 2 _raw examples of different queries:

Example 1:

 

 

 

{"message":"Completed SQL Query","context":{"query":"INSERT INTO \"Messages\" (\"toAddress\", \"fromAddress\", \"templateId\", \"subject\", \"senderObjectTypeId\", \"senderObjectId\", \"ccAddresses\", \"entityId\", \"addresseeObjectTypeId\", \"addresseeObjectId\", \"addresseeId\", \"senderId\", \"type\", \"inbound\", \"status\", \"groupNo\", \"priority\", \"objectKey\", \"uuid\", \"created\", \"createdBy\", \"updated\", \"updatedBy\") VALUES ('somone@hotmail.com', 'something@mail.com', '12345', 'invoice', '1', '1234', '<array>\n<XML_Serializer_Tag>something@mail.co</XML_Serializer_Tag>\n</array>', '123', '12', '12347', '1234564', '123456', 'Email', 0, 'queued', '12345678', '1', 'messages/11111-1111-1111-1111-111111111', '11111-1111-1111-1111-111111111', '2020-09-02T09:10:31+04:00', 12345678, '2020-09-02T09:10:31+04:00', 12345678)

 

 

 

Example 2:

 

 

 

{"message":"Completed SQL Query","context":{"query":"INSERT INTO \"Messages\" (\"parentId\", \"subject\", \"status\", \"entityId\", \"inbound\", \"spamScore\", \"spamReport\", \"type\", \"addresseeObjectTypeId\", \"addresseeObjectId\", \"addresseeId\", \"toAddress\", \"fromAddress\", \"senderId\", \"objectKey\", \"uuid\", \"created\", \"createdBy\", \"updated\", \"updatedBy\") VALUES ('111111', 'Invoice for you', 'received', '11', 1, '1', 'Spam detection software, running on the system \"xyz.net\", has\nidentified this incoming email as possible spam. The original message\nhas been attached to this so you can view it (if it isn''t spam) or label\nsimilar future email. MIME_HTML_ONLY BODY: Message only has text/html MIME parts\n\n', 'Email', '1', '1234', '123456', 'recipient@mai.com', 'sender@hotmail.com', NULL, 'messages/111111-1111-1111-1111-11111111', '111111-1111-1111-1111-11111111', '2020-08-27T01:28:14+00:00', 1, '2020-08-27T01:28:14+00:00', 1) 

 

 

 

 

As you can see in the above example the SQl fields can different and/or the same but in a different order.

Is there a way i can extract the fields based on the the "INSERT INTO" fields?

So parentID, toAddress

will know to extract the fields in that order etc etc?

IE- it will get the field names based on the first "INSERT INTO" section and populate from the "VALUES", regardless of the order? Would i create an extraction for the "INSERT INTO" fields and "VALUES" fields then SPATH them?

EG This Regex will work for Example 2 but not example 1 (Also for now i dont need comments on my regex now...just answers on thew core of my question please)

 

 

^[^\)\n]*\)\s+\w+\s+\('(?P<parentId>\d+)[^ \n]* '(?P<subject>[^']+)',\s+'(?P<message_status>[a-z]+)',\s+'(?P<entityId>[^']+)(?:[^'\n]*'){2}(?P<spam_score>\d+\.\d+)(?:[^,\n]*,){6}\s+'(?P<type>\w+)(?:[^'\n]*'){4}(?P<addresseeObjectId>\d+)[^ \n]* '(?P<recipient_Id>\d+)[^ \n]* '(?P<RecipientAddress>[^']+)',\s+'(?P<SenderAddress>\w+@\w+\.\w+)

 

 

 

Thanks in advance, i know its a convoluted 

Any insights id really appreciate 

Labels (2)
Tags (1)
0 Karma

thambisetty
SplunkTrust
SplunkTrust

examples you shared are  complete event or partial event ?

————————————
If this helps, give a like below.
0 Karma

geraldcontreras
Path Finder

Hi, 

Thanks for responding, really appreciate it.

below is the complete event, obviously i have changed data values for privacy.

Regardless, i need to know how to get the values of "INSERT INTO" as field names... and then the corresponding field values from "VALUES" for each field name........ where the order and name of field names in "INSERT INTO" can change.

 

 

{"message":"Completed SQL Query","context":{"query":"INSERT INTO \"Messages\" (\"parentId\", \"subject\", \"status\", \"entityId\", \"inbound\", \"spamScore\", \"spamReport\", \"type\", \"addresseeObjectTypeId\", \"addresseeObjectId\", \"addresseeId\", \"toAddress\", \"fromAddress\", \"senderObjectTypeId\", \"senderObjectId\", \"senderId\", \"objectKey\", \"uuid\", \"created\", \"createdBy\", \"updated\", \"updatedBy\") VALUES ('11111111', 'myname', 'received', '111', 1, '1.1111', 'Spam detection software, running on the system \"mailmail.net\", has\nidentified this incoming email as possible spam.  The original message\nhas been attached to this so you can view it (if it isn''t spam) or label\nsimilar future email.  If you have any questions, see\n@@CONTACT_ADDRESS@@ for details.\n\nContent preview:  some preview of the email\n\nContent analysis details:   (0.0 points, 5.0 required)\n\n pts rule name              description\n---- ---------------------- --------------------------------------------------\n 0.0 FREEMAIL_FROM          Sender email is freemail (nsendername[at]gmail.com)\n 0.0 HTML_MESSAGE           BODY: HTML included in message\n 0.0 T_MIME_NO_TEXT         No text body parts\n\n', 'Email', '1', '1', '658546', 'receivername@mail.com', 'sendername@gmail.com', '11', '111111', '123456', 'messages/1111111-111111-11111-111111-11111111', '11111111-11111-11111-11111-11111111', '2020-09-02T07:26:59+00:00', 1, '2020-09-02T07:26:59+00:00', 1) SELECT SCOPE_IDENTITY()","bind":[],"timer":{"query":{"time":"0.02","totalTime":"0.05","count":111}}},"level":111,"level_name":"NAME","channel":"query","datetime":{"date":"2020-09-02 07:26:59.232241","timezone_type":3,"timezone":"UTC"},"extra":{"url":"/something/orrather/bound","ip":"1.1.1.1","http_method":"POST","server":"","referrer":null,"instanceTag":"111111111","requestId":"12jh1iu2hu1ikh2k1uh2ku1"}}

 

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
index=_internal | head 1
| eval _raw="{\"message\":\"Completed SQL Query\",\"context\":{\"query\":\"INSERT INTO \\\"Messages\\\" (\\\"parentId\\\", \\\"subject\\\", \\\"status\\\", \\\"entityId\\\", \\\"inbound\\\", \\\"spamScore\\\", \\\"spamReport\\\", \\\"type\\\", \\\"addresseeObjectTypeId\\\", \\\"addresseeObjectId\\\", \\\"addresseeId\\\", \\\"toAddress\\\", \\\"fromAddress\\\", \\\"senderObjectTypeId\\\", \\\"senderObjectId\\\", \\\"senderId\\\", \\\"objectKey\\\", \\\"uuid\\\", \\\"created\\\", \\\"createdBy\\\", \\\"updated\\\", \\\"updatedBy\\\") VALUES ('11111111', 'myname', 'received', '111', 1, '1.1111', 'Spam detection software, running on the system \\\"mailmail.net\\\", has\\\\nidentified this incoming email as possible spam.  The original message\\\\nhas been attached to this so you can view it (if it isn''t spam) or label\\\\nsimilar future email.  If you have any questions, see\\\\n@@CONTACT_ADDRESS@@ for details.\\\\n\\\\nContent preview:  some preview of the email\\\\n\\\\nContent analysis details:   (0.0 points, 5.0 required)\\\\n\\\\n pts rule name              description\\\\n---- ---------------------- --------------------------------------------------\\\\n 0.0 FREEMAIL_FROM          Sender email is freemail (nsendername[at]gmail.com)\\\\n 0.0 HTML_MESSAGE           BODY: HTML included in message\\\\n 0.0 T_MIME_NO_TEXT         No text body parts\\\\n\\\\n', 'Email', '1', '1', '658546', 'receivername@mail.com', 'sendername@gmail.com', '11', '111111', '123456', 'messages/1111111-111111-11111-111111-11111111', '11111111-11111-11111-11111-11111111', '2020-09-02T07:26:59+00:00', 1, '2020-09-02T07:26:59+00:00', 1) SELECT SCOPE_IDENTITY()\",\"bind\":[],\"timer\":{\"query\":{\"time\":\"0.02\",\"totalTime\":\"0.05\",\"count\":111}}},\"level\":111,\"level_name\":\"NAME\",\"channel\":\"query\",\"datetime\":{\"date\":\"2020-09-02 07:26:59.232241\",\"timezone_type\":3,\"timezone\":\"UTC\"},\"extra\":{\"url\":\"/something/orrather/bound\",\"ip\":\"1.1.1.1\",\"http_method\":\"POST\",\"server\":\"\",\"referrer\":null,\"instanceTag\":\"111111111\",\"requestId\":\"12jh1iu2hu1ikh2k1uh2ku1\"}}"
| spath output=query path=context.query
| rex field=query "INSERT INTO \"Messages\" \((?<columns>.*)\) VALUES \((?<values>.*)\) SELECT SCOPE"
| eval values=replace(values, "\'\'", "\'")
| rex max_match=0 field=values "(?<value>(\d+|'.*?[^\\\]')(|, ))"
| makemv delim="," columns
| eval columns=mvmap(columns,trim(columns))
| eval combined=mvzip(columns,value,"=")
| mvexpand combined
| fields combined
| rex field=combined "\"(?<name>[^\"]*)\"=(?<value>.*)"
| eval {name}=value
| fields - _raw, combined, name, value
| stats values(*) as *
| table *

Couple of things to note:

  1. first rex assumes "table" is called Messages and that there is SELECT SCOPE after the values which it might not be the case for all your events
  2. double single quote in the middle of one of the fields in the word isn't was escaped with a backslash but not converted back
  3. works on the example given and uses mvexpand to create multiple events and stats to join them back but this should be done with an event identifier if using multiple events

Might work as a starting point for you though

0 Karma
Get Updates on the Splunk Community!

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...