Hi,
I'm indexing events in JSON format and I need a way to extract the pipe-delimited values in the 'Subject' field shown below into individual fields:
RecipientAddress: bla@bla.com
SenderAddress: fred@fred.com
Size: 201828
Status: FilteredAsSpam
Subject: 1|fdbe21c9-xxxxx|195.168.1.1|Comms@fred.com|([Ext]Hi, join us for the 10-year roundup) 12/11/2020 8:21:14 AM
ToIP: null
I'm struggling to get a regex to work, and I'm not sure whether I need to take the JSON formatting into account.
Thanks.
Sorry, silly mistake in my regex, which I have now corrected. Try the following. Serves me right for not testing it!
rex field=Subject "(?P<number>[^\|]+)\|(?P<id>[^\|]+)\|(?P<ip>[^\|]+)\|(?P<email>[^\|]+)\|(?P<subject>.+)"
Perfect. Works like a charm! Many thanks.
I'm guessing that you copied and pasted that example (with redaction) from the event view in the search results? (Please use the </> code formatter, as it helps preserve formatting.)
If you are applying the regex to the _raw field, you will need to account for the JSON formatting in the original event, so your regex might need to begin with something like:
"Subject\": ....etc..."
However, if it's well-formed JSON (such that it displays nicely in the search results) and you are doing the extraction in a search, you can use spath to pull out the fields for you and then apply the regex to the Subject field alone.
your search...|spath|table Subject
From here, assuming the pipe-delimited fields are consistent, a simple rex command should work:
rex field=Subject "(?P<number>[^\|]+)\|(?P<id>[^\|]+)\|(?P<ip>[^\|]+)\|(?P<email>[^\|]+)\|(?P<subject>.+)"
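If you want to sanity-check the pattern outside Splunk first, here is a rough Python sketch. It is only an approximation: Splunk's rex uses PCRE while this uses Python's re module, which I'm assuming behaves the same for a pattern this simple. The raw_event string is a hypothetical reconstruction built from the field values quoted in the question, not the real event, and split_subject is just an illustrative helper name. The last group uses .+ to take the rest of the line.

```python
import json
import re

# Hypothetical raw event, rebuilt from the field values quoted in the question.
raw_event = json.dumps({
    "RecipientAddress": "bla@bla.com",
    "SenderAddress": "fred@fred.com",
    "Size": 201828,
    "Status": "FilteredAsSpam",
    "Subject": "1|fdbe21c9-xxxxx|195.168.1.1|Comms@fred.com|"
               "([Ext]Hi, join us for the 10-year roundup) 12/11/2020 8:21:14 AM",
    "ToIP": None,
})

# Same shape as the rex pattern: four pipe-free fields, then the remainder.
SUBJECT_RE = re.compile(
    r"(?P<number>[^|]+)\|(?P<id>[^|]+)\|(?P<ip>[^|]+)\|(?P<email>[^|]+)\|(?P<subject>.+)"
)


def split_subject(raw):
    """Parse the JSON event (roughly the job spath does), then split Subject."""
    subject = json.loads(raw)["Subject"]
    match = SUBJECT_RE.match(subject)
    # Return None when Subject does not have all five pipe-delimited parts.
    return match.groupdict() if match else None


fields = split_subject(raw_event)
print(fields)
```

The point of parsing the JSON first is the same as running spath before rex: the regex only ever sees the Subject value, so it does not have to deal with quotes, key names, or escaping from the raw event.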
So yes, it's well-formed JSON, and I'm running the query from Search, where the Subject field is being extracted as expected.
I tried your regex and it didn't seem to like it. The first field should be the 1 after 'Subject:' and before the first pipe, the second field should be the message ID between the first and second pipes, and so on.
Whenever I tried doing it myself, it kept trying to grab the first character before any of the pipes, kinda ruining things!