Splunk Search

Can you help me with an XML field extraction?

Builder

I have a log file, that outputs different formats depending on the portion of the application doing the logging. Some of the events output like the XML sample data shown here. I'd like to find some way to extract the key-value pairs out of them. If a transforms/props can be put in place, that will recognize events like this one and extract the fields I need without interfering with the other single-line machine data and JSON entries in the log that would be nice. If multiple lines can be entered to account for XML and JSON even better.

Honestly, Id be happy with an inline solution with |extract |xmlkv |xpath or something like that.

Otherwise, I will be forced to write some pretty nasty |REX statements for each field.

Thanks!
Looking for:

Clientid 11111111
MemberFirstName Jane
MemberLastName Doe
Gender FEMALE
DOB 11/11/1911
EmployeeIDNum xxxxx
MentorFirstName

MentorLastName

Event Samples:

2018-09-25 12:48:23,599 [tp-bio-8001-exec-151] [  STANDARD] [                    ] [        PHSInt:01.01] (og.Domain_FW_Apollo_Int_.Action) INFO  hostname01.domain.com|10.200.200.200|HTTP|AssessmentServices|Services|SaveAssessmentAnswers|AD0A0F376B08E09090B78F37816A41733  - INSERTING INTO SERVICE REQUEST LOG:--SERVICEREQUESTTYPE -->:SaveAssessmentAnswers--SERVICEREQUESTSTATUS -->:--TRANSACTIONID-->:3740e6fc-99xx-43f2-ba47-4630da0aaeda--MEMBERELIGID-->:--PID-->:--PARTICIPANTID-->:--DEBUGMESSAGE-->:[hostname01.domain.com] --REQUEST-->:<apol:SaveAssessmentAnswersReq xmlns:apol="http://www.domain.com/xsds/ApolloAssessment.xsd"><apol:TransactionID>3740e6fc-59ee-43f2-ba47-4630da0aaeda</apol:TransactionID><apol:ClientID>11111111</apol:ClientID><apol:AssessmentTypeID>931</apol:AssessmentTypeID><apol:ParticipantAnswerList><apol:PartQuestionAnswer><apol:QuestionID>ImpersonatorDetail</apol:QuestionID><apol:AnswerList><apol:Answer><apol:LabelID>MEMBER</apol:LabelID></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>MemberFirstName</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>Jane</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>MemberLastName</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>Doe</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>Gender</apol:QuestionID><apol:AnswerList><apol:Answer><apol:LabelID>FEMALE</apol:LabelID></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>DOB</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>01/01/1911</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>EmployeeIDNum</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>35121212121212</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer></apol:ParticipantAnswerList></apol:SaveAssessmentAnswersReq>--RESPONSE-->:<ns1:SaveAssessmentAnswersRes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.domain.com/xsds/ApolloAssessment.xsd"> <ns1:TransactionID>3740e6fc-59ee-43f2-ba47-4630da0aaeda</ns1:TransactionID>   <ns1:AdditionalDataList> <ns1:AdditionalData> <ns1:Name>Message</ns1:Name> <ns1:Value>We’re sorry, we’re not able to verify your account information. Please contact your benefits  administrator.</ns1:Value> </ns1:AdditionalData> </ns1:AdditionalDataList> </ns1:SaveAssessmentAnswersRes> --REFERENCEID-->:

2018-09-25 12:47:21,248 [tp-bio-8001-exec-177] [  STANDARD] [                    ] [        PHSInt:01.01] (og.Alere_FW_Apollo_Int_.Action) INFO  hostname.domain.com|10.214.6.60|HTTP|AssessmentServices|Services|SaveAssessmentAnswers|A6E53D8C7F19456C1484D3F2307AB5FDB  - INSERTING INTO SERVICE REQUEST LOG:--SERVICEREQUESTTYPE -->:SaveAssessmentAnswers--SERVICEREQUESTSTATUS -->:--TRANSACTIONID-->:a8667bd9-2be5-4655-8d9a-dd47e8111ce4--MEMBERELIGID-->:--PID-->:--PARTICIPANTID-->:--DEBUGMESSAGE-->:[hostname.domain.com] --REQUEST-->:<apol:SaveAssessmentAnswersReq xmlns:apol="http://www.domain.com/xsds/ApolloAssessment.xsd"><apol:TransactionID>axx67bd9-2be5-4655-8d9a-dd47e8111ce4</apol:TransactionID><apol:ClientID>11121212</apol:ClientID><apol:AssessmentTypeID>931</apol:AssessmentTypeID><apol:ParticipantAnswerList><apol:PartQuestionAnswer><apol:QuestionID>ImpersonatorDetail</apol:QuestionID><apol:AnswerList><apol:Answer><apol:LabelID>PARENT</apol:LabelID></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>MentorFirstName</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>Jane</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>MentorLastName</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>Doe</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>MemberFirstName</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>Aiden</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>MemberLastName</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>Doe</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>Gender</apol:QuestionID><apol:AnswerList><apol:Answer><apol:LabelID>MALE</apol:LabelID></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>DOB</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>01/1/2001</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer><apol:PartQuestionAnswer><apol:QuestionID>EmployeeIDNum</apol:QuestionID><apol:AnswerList><apol:Answer><apol:Value>351111111111</apol:Value></apol:Answer></apol:AnswerList></apol:PartQuestionAnswer></apol:ParticipantAnswerList></apol:SaveAssessmentAnswersReq>--RESPONSE-->:<ns1:SaveAssessmentAnswersRes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns1="http://www.domain.com/xsds/ApolloAssessment.xsd"> <ns1:TransactionID>axx67bd9-2be5-4655-8d9a-dd47e8111ce4</ns1:TransactionID>   <ns1:AdditionalDataList> <ns1:AdditionalData> <ns1:Name>Message</ns1:Name> <ns1:Value>We’re sorry, we’re not able to verify your account information. Please contact your benefits  administrator.</ns1:Value> </ns1:AdditionalData> </ns1:AdditionalDataList> </ns1:SaveAssessmentAnswersRes> --REFERENCEID-->:

Other Samples in the log that are not XML.

2018-09-252018-09-25  13:17:4613:17:46,,541541 [ [tp-bio-8004-exec-171tp-bio-8004-ex ] [  STANDARD] [                    ] [        PHSInt:01.01] (lo_Data_System_BatchLog.Action) INFO  hostname06.domain.com|10.200.200.200|HTTP|HealthIndicatorsInt|Services|saveHealthData|A30AC19E66FD562E79942068C75D03XXF  - In UpdateBatchLog:ID=20001,Type=ProcessEvent,Action=P-212799085,Status=INFO,Message=Processing of EventNew HD,Exception=

JSON I think.

2018-09-25 13:17:45,929 [  PegaRULES-Batch-18] [  STANDARD] [                    ] [ ApolloCCBatch:01.01] (on.Domain_FW_Apollo_Int_.Action) INFO    - INSERTING INTO SERVICE REQUEST LOG:--SERVICEREQUESTTYPE -->:MPEAPI--SERVICEREQUESTSTATUS -->:200--TRANSACTIONID-->:DOE--MEMBERELIGID-->:99999999--PID-->:999999999--PARTICIPANTID-->:JOHN--DEBUGMESSAGE-->:[hostname04.domain.com] OK [Time Elapsed=697.0ms]--REQUEST-->:{     "MemberProductEligibilityRequest":{        "requestHeader":{           "applicationName":"APPLICATION",         "transactionId":"bc99999b547b64cf99a01cabd625e0bc7"      },      "consumerDetails":{           "firstName":"JOHN",         "lastName":"DOE",         "dateOfBirth":"1900-05-09T00:00:00Z",         "searchId":"999999999",         "contractNumber":"999999"      },      "filteringAttributes":{           "includeExtendedAttributes":"true",         "applyFilters":"true"      },      "requestDetails":{           "requestType":"BIG5",         "searchType":"ALL"      }   }}--RESPONSE-->:{"MemberProductEligibilityResponse":{"responseHeader":{"transactionId":"bc2706b547b64cf99a01cabd625e0bc7"},"consumerDetails":[{"demographics":{**** Section suppressed for logging ****},"contactDetails":{**** Section suppressed for logging ****},"idSet":{**** Section suppressed for logging ****},"populationDetails":{"populationEffectiveDate":"2018-01-01T00:00:00Z","populationCancelDate":"9999-12-31T00:00:00Z","populationId":"POP33477","populationDateAssigned":"2017-12-12T00:00:00Z","populationBrandingType":"Optum Logo","populationBrandingEffectiveDate":"2018-01-01T00:00:00Z"},"coverageDetails":{"recordType":"HEALTH_COVERAGE","employeeStatus":"A","contractNumber":"0999999","eligibilitySourceSystem":"CS","planVariation":"0106","reportingCode":"0106","customerName":"TESLA","coverageType":"M","coverageEffectiveDate":"2018-01-01T00:00:00Z","hireDate":"2001-01-04T00:00:00Z","stateOfIssue":"CA","legalEntity1":"20020","marketSite":"0004422"},"extendedAttributes":{"ecExtended":[],"elExtended":[],"euExtended":[{"typeCode":"EU3","value":"0004422","effectiveDate":"2001-01-01T00:00:00Z","cancelDate":"9999-12-31T00:00:00Z"},{"typeCode":"EU3","value":"0004422","effectiveDate":"2001-01-01T00:00:00Z","cancelDate":"9999-12-31T00:00:00Z"}],"cuExtended":[],"suExtended":[],"muExtended":[]},"productDetails":{"product":[{"source":"Optum","productEvent1":"Productname for Life","productEffectiveDate":"2018-01-01T00:00:00Z","productTerminationDate":"2199-12-31T00:00:00Z"}]}}]}}--REFERENCEID-->:999999
0 Karma

Builder

For now I ended up doing this.

| rex field=_raw "LabelID\>(?P<LabelID>[^\<]+)"
| rex field=_raw "MemberLastName\<\/apol\:QuestionID\>\<apol\:AnswerList\>\<apol\:Answer\>\<apol\:Value\>(?P<MemberLastName>[^\<]+)"
| rex field=_raw "MemberFirstName\<\/apol\:QuestionID\>\<apol\:AnswerList\>\<apol\:Answer\>\<apol\:Value\>(?P<MemberFirstName>[^\<]+)"
| rex field=_raw "MentorFirstName\<\/apol\:QuestionID\>\<apol\:AnswerList\>\<apol\:Answer\>\<apol\:Value\>(?P<MentorFirstName>[^\<]+)"
| rex field=_raw "MentorLastName\<\/apol\:QuestionID\>\<apol\:AnswerList\>\<apol\:Answer\>\<apol\:Value\>(?P<MentorLastName>[^\<]+)"
| rex field=_raw "DOB\<\/apol\:QuestionID\>\<apol\:AnswerList\>\<apol\:Answer\>\<apol\:Value\>(?P<DOB>[^\<]+)"
| rex field=_raw "EmployeeIDNum\<\/apol\:QuestionID\>\<apol\:AnswerList\>\<apol\:Answer\>\<apol\:Value\>(?P<EmployeeIDNum>[^\<]+)"
| rex field=_raw "Gender\<\/apol\:QuestionID\>\<apol\:AnswerList\>\<apol\:Answer\>\<apol\:LabelID\>(?P<EmployeeGender>[^\<]+)"
| rex field=_raw "\<apol\:ClientID\>(?P<ClientID>[^\<]+)"
| fillnull value=""
| eval Name=MemberFirstName." ".MemberLastName
| eval Name=Upper(Name)
| stats count(EmployeeIDNum) as Total by Name MemberLastName MemberFirstName MentorFirstName MentorLastName DOB EmployeeIDNum EmployeeGender LabelID ClientID

Which results in a table like this.

Name MemberLastName MemberFirstName MentorFirstName MentorLastName DOB EmployeeIDNum EmployeeGender LabelID ClientID Total
John Doe Doe John Leisha Wilker 11/22/2022 999999999999 MALE PARENT 11504455 2

0 Karma

New Member

Your xml is rather complex! I'm not sure you can use traditional spath approach here, at least because you are not able to "link" the extracted question/answer together.

Instead of writing a tons of regex, you can try this way

index="main" sourcetype="test" 
| rex max_match=1 field=_raw "<apol:(?<ClientID>ClientID)\>(?<clientid>.*?)</apol:ClientID>" 
| rex max_match=10 field=_raw "<apol:QuestionID>(?<question>.*?)</apol:QuestionID>" 
| rex max_match=10 field=_raw "<apol:Answer><.*?>(?<answer>.*?)<.*?></apol:Answer>" 
| eval tempid = mvzip(ClientID, clientid, "###")
| eval tempqa = mvzip(question, answer, "###")
| eval q_and_a = mvappend(tempid,tempqa)
| mvexpand q_and_a 
| rex field=q_and_a "(?<Question>.*)###(?<Answer>.*)" 
| table Question Answer

It outputs

Question            Answer
ClientID            11111111
ImpersonatorDetail  MEMBER
MemberFirstName     Jane
MemberLastName      Doe
Gender              FEMALE
DOB                 01/01/1911
EmployeeIDNum       35121212121212
0 Karma

Builder

Thanks for responding Paolo. I'll see if I can work my code and your code into something better.

0 Karma