Hi All -
I need help with a fairly complex search I am being asked to build by a user.
The ask is that the below fields are extracted from this XML sample:
[2024-09-10 07:27:46.424 (TID:14567876)] <subMerchantData> [2024-09-10 07:27:46.424 (TID:dad4d2e725854048)] <pfId>499072</pfId> [2024-09-10 07:27:46.424 (TID:145767627)] <subName>testname</subName> [2024-09-10 07:27:46.424 (TID:dad4d2e725854048)] <subId>123456</subId> [2024-09-10 07:27:46.424 (TID:145767627)] <subStreet>1 TEST LANE</subStreet> [2024-09-10 07:27:46.424 (TID:145767627)] <subCity>HongKong</subCity> [2024-09-10 07:27:46.424 (TID:145767627)] <subState>HK</subState> [2024-09-10 07:27:46.424 (TID:dad4d2e725854048)] <subCountryCode>344</subCountryCode> [2024-09-10 07:27:46.424 (TID:dad4d2e725854048)] <subPostalCode>1556677</subPostalCode> [2024-09-10 07:27:46.424 (TID:dad4d2e725854048)] <subTaxId>-15566777</subTaxId> [2024-09-10 07:27:46.424 (TID:14567876)] </subMerchantData>
This search doesn't return anything, I believe because these are not extracted fields:
index=test merchantCode=MERCHANTCODE1 subCountryCode=* subState=* orderCode=* | stats count by merchantCode subCountryCode subState orderCode
In addition to the fields in that search, the user wants these: SubState, SubCountryCode, SubCity, PFID, SubName, SubID, SubPostalCode, SubTaxID
However, I'm not sure how this can be done. Could anyone help with writing a search that would let me extract this info and use it in a stats count?
Thanks,
Tom
First things first. What does your raw data look like? The sample you pasted: is it one event, or are these multiple events? Where from and how are you getting this data? Because it looks as if it was XML horribly butchered by splitting it into single lines and sending each line separately. And that's the first thing that should be fixed, instead of trying to do workarounds at search time.
Hi @tomjb94 ,
yes, obviously you have to extract the fields using regexes.
I can help you with the following regex, which extracts all the values except orderCode, because I don't know which part of the logs it is in. If you want my help with this, please highlight this value in your logs using bold.
Anyway, you can use a search like the following (except orderCode):
index=test
| rex "^\[2024-09-10 07:27:46\.424 \(TID:(?<merchantCode>\d+).*\<subState\>(?<subState>\w+).*\<subCountryCode\>(?<subCountryCode>\d+)"
| search merchantCode=MERCHANTCODE1 subCountryCode=* subState=*
| stats count by merchantCode subCountryCode subState
You can test the regex at https://regex101.com/r/KZMUxp/1
It also isn't clear to me whether you need the other fields as well (SubState, SubCountryCode, SubCity, PFID, SubName, SubID, SubPostalCode, SubTaxID).
If yes, you have to extract all of them. If you want my help, please indicate which part of the log each of them is in.
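For reference, a minimal sketch of extracting all of them could look like the search below. It is only an illustration under some assumptions: that the whole subMerchantData block arrives as a single event, that merchantCode is already an extracted field, and that the field names on the left (pfId, subName and so on) are acceptable as output names. Each pattern matches on the tag name only, so it does not depend on the timestamp or the TID. orderCode is left out because it does not appear in the sample.

```
index=test merchantCode=MERCHANTCODE1
| rex "<pfId>(?<pfId>[^<]+)</pfId>"
| rex "<subName>(?<subName>[^<]+)</subName>"
| rex "<subId>(?<subId>[^<]+)</subId>"
| rex "<subCity>(?<subCity>[^<]+)</subCity>"
| rex "<subState>(?<subState>[^<]+)</subState>"
| rex "<subCountryCode>(?<subCountryCode>[^<]+)</subCountryCode>"
| rex "<subPostalCode>(?<subPostalCode>[^<]+)</subPostalCode>"
| rex "<subTaxId>(?<subTaxId>[^<]+)</subTaxId>"
| stats count by merchantCode subCountryCode subState subCity pfId subName subId subPostalCode subTaxId
```

Keep in mind that stats count by drops any event that is missing one of the by fields, so if some tags are optional you may want a fillnull for those fields before the stats. If each line of the XML is actually indexed as a separate event, this will not work as written, because the values would never be together in one event.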
Ciao.
Giuseppe
Hi Giuseppe,
Many thanks for your response, it's greatly appreciated. I need the rex to be dynamic, regardless of the particular timestamp in the original message I sent, as it's going to be a saved search. In addition, when I run this I get 0 results despite running it exactly within the timestamp of that particular message in Splunk. I think this search may be quite expensive on our indexers, so for now I'll just get this working with the existing extracted fields.
Thanks again,
Tom
Hi @tomjb94 ,
good for you, see you next time!
Let us know if we can help you more, or please accept one answer for the benefit of the other people in the Community.
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉