I'm attempting to create a field extraction from the web UI (I'm not an admin and don't have access to "*.conf" files) using regex on a field that's not "_raw". I can't seem to find any documentation on it, but I see some existing extractions with the format:
"FieldToExtractFrom":"(?<ExtractedField>\d+)"
Using that same format, though, I don't get the automatic extraction, even though the regex DOES work if I run it manually at search time with "rex". Is there some way to specify an inline regex field extraction from a field other than "_raw"?
EDIT: Field transformations did the trick, since it looks like extractions can only work on "_raw".
Extract a field from an existing field automatically:
You can use a field transform to automatically extract a new field from an existing auto-extracted field (this lets you choose a source key to extract from).
In the GUI, go to Settings --> Fields, then Field Transformations. Click New, then fill in the fields and click Save. The transformation will work automatically for all new searches in the relevant app context.
Extract a field from the raw data automatically:
If you need to automatically extract a new field from _raw in the first place, we would recommend using automatic extractions. Note that this will work on _raw, so your regex will be a little different: it will have to match on data from the whole event, so you might need something closer to FieldToExtractFrom:(?<ExtractedField>\d+) as your regex (using your example).
You can access this from the GUI by going to Settings --> Fields, then Field Extractions. Click New, then fill in the fields and click Save. The extraction will work automatically for all new searches in the relevant app context.
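To illustrate what "matching on the whole event" means, here is a minimal Python sketch of that regex applied to a made-up raw event. The event text and the function name are assumptions for illustration only, and Python's re module spells named groups (?P<name>...) where Splunk accepts (?<name>...), so this shows the regex behaviour rather than Splunk itself.

```python
import re

# Same pattern as above, with Python's (?P<name>...) spelling for the named group.
RAW_PATTERN = re.compile(r"FieldToExtractFrom:(?P<ExtractedField>\d+)")

def extract_from_raw(raw_event):
    """Search the whole event text; return the captured digits or None."""
    match = RAW_PATTERN.search(raw_event)
    return match.group("ExtractedField") if match else None

# Hypothetical raw event; the literal prefix locates the value wherever it sits.
print(extract_from_raw("host=web01 FieldToExtractFrom:12345 status=ok"))
```

Because the pattern keys on the literal "FieldToExtractFrom:" prefix rather than on position, it should keep matching if other text is added before or after it in the event.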
Extract data inline from a field with rex:
For one-off extractions, you can use rex inline, like so:
| rex field=FieldToExtractFrom "(?<new_field_name>regexhere)"
As an example, say you wanted to extract the first name from a name field, where you have name="First Last", you could use:
| rex field=name "^(?<first_name>\w+?)\s"
Which would extract the first name and put it in a new field called first_name.
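If you want to sanity-check that regex outside Splunk, a rough Python equivalent is below. It is only a sketch: the function name is made up, and Python needs (?P<first_name>...) instead of (?<first_name>...).

```python
import re

# Lazy \w+? followed by \s: capture the first word, stopping at the first whitespace.
FIRST_NAME = re.compile(r"^(?P<first_name>\w+?)\s")

def extract_first_name(name):
    """Return the first word of `name`, or None when the value has no whitespace."""
    match = FIRST_NAME.search(name)
    return match.group("first_name") if match else None

print(extract_first_name("First Last"))  # First
```

Note that a value with no space (for example a single-word name) does not match at all, so no first_name would be produced for it.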
If you have an example to post then I might be able to give a more specific answer for your use case, but I hope this helps.
I want to do this automatically at every search, though, rather than stapling the "rex" command to every search. My understanding was that field extractions were the right way to do this, but maybe not?
Yes, we would definitely recommend using automatic extractions if possible. Note that this will work on _raw, so your regex will be a little different.
You can access this from the GUI by going to Settings --> Fields, then Field Extractions. Click New, then fill in the fields and click Save. The extraction will work automatically for all new searches in the relevant app context.
I see - and it can only work on _raw? I was hoping I could use a specific field to pull from, so I'm not married to some strict ordering in the future...
You can use a field transform to achieve the result you want. Instead of doing a new field extraction, you can go to Settings --> Fields, then Field Transformations. This lets you choose a source key to extract from.
That said, I still don't quite follow regarding how pulling from _raw instead of a specific field limits you to a strict order for the future. Are you able to explain this in a bit more detail, or even better, give an example? If anything, pulling from _raw gives you more flexibility, hence why the GUI extraction is the way it is.
(Using a CSV with headers as input data)
The default construction of the regex using the field extractor UI was to essentially skip the first two columns of the CSV, and look for the fields to extract in the third column. If I added another column of data that went before the field I wanted to use for the regex, it would break, unlike if I'm scoping it right to the field name (which will be correctly identified by the headers in the CSV).
Ahh yes, if you're using the field extractor UI, then it will often create regexes that rely on fixed data positioning. As an alternative, you can rewrite the regex to only look at the column with that header, in which case it won't matter whereabouts the column is placed in the CSV. I completely understand what you mean though, especially if you're relying on the field extractor UI.
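To make the difference concrete, here is a small Python sketch comparing a position-based regex with one scoped to a named column. The CSV contents, column names and the "build-" value format are all invented for the example, and the positional pattern is only an approximation of the kind of regex the field extractor UI might generate.

```python
import csv
import io
import re

# Hypothetical CSV input: "Build" is the column to extract from.
ORIGINAL = "Host,User,Build\nweb01,alice,build-1234\n"
# The same data after a "Region" column is added before "Build".
REORDERED = "Host,User,Region,Build\nweb01,alice,emea,build-1234\n"

# Positional: skip exactly two columns, then expect the value in the third.
POSITIONAL = re.compile(r"^(?:[^,]*,){2}build-(?P<BuildNumber>\d+)")
# Scoped: applied only to the value of the named column.
SCOPED = re.compile(r"build-(?P<BuildNumber>\d+)")

def positional_extract(csv_text):
    """Run the positional regex against the first data line of the CSV."""
    data_line = csv_text.splitlines()[1]
    match = POSITIONAL.search(data_line)
    return match.group("BuildNumber") if match else None

def scoped_extract(csv_text, column="Build"):
    """Look the column up by header, then run the regex on that value only."""
    row = next(csv.DictReader(io.StringIO(csv_text)))
    match = SCOPED.search(row[column])
    return match.group("BuildNumber") if match else None

print(positional_extract(ORIGINAL), scoped_extract(ORIGINAL))
print(positional_extract(REORDERED), scoped_extract(REORDERED))
```

The positional version stops matching once the extra column shifts the data, while the header-scoped version is unaffected, which is the behaviour you get from choosing a source key in a field transformation.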
I've updated my original answer and the one above with some information on field transformations, which I think will solve your problem.
Got it! One more question, I think 🙂 Looking at transformations, I'm not understanding how "Format" comes into play. I have the following regex:
(?:\d*\.){2}(?<BuildNumber>\d+)\.(?<BuildRevision>\d+)\.(?<BuildArch>[^\.]*)\.(?<BuildBranch>[^\.]*)
Format says to "Specify the event format in terms of field names and values", but I'm already naming the fields in my regex extraction.
If you're already naming your fields in the extraction, then you don't need to enter anything in the format box 🙂
The format box is an alternative, so using your example, you could have this instead:
(?:\d*\.){2}(\d+)\.(\d+)\.([^\.]*)\.([^\.]*)
In which case you'd have to put BuildNumber::$1 BuildRevision::$2 BuildArch::$3 BuildBranch::$4 in the format box.
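As a rough illustration of why the two approaches are equivalent, the Python sketch below runs both regexes against a made-up build string and maps the numbered groups through the same format string. The sample value and the format-parsing helper are assumptions for the example; the helper is a simplified stand-in, not how Splunk actually processes the Format box.

```python
import re

# Named-group version (Python spells it (?P<name>...)).
NAMED = re.compile(
    r"(?:\d*\.){2}(?P<BuildNumber>\d+)\.(?P<BuildRevision>\d+)"
    r"\.(?P<BuildArch>[^\.]*)\.(?P<BuildBranch>[^\.]*)"
)
# Unnamed-group version, paired with a format string that names the groups.
NUMBERED = re.compile(r"(?:\d*\.){2}(\d+)\.(\d+)\.([^\.]*)\.([^\.]*)")
FORMAT = "BuildNumber::$1 BuildRevision::$2 BuildArch::$3 BuildBranch::$4"

def extract_named(value):
    """Field names come straight from the named capture groups."""
    match = NAMED.search(value)
    return match.groupdict() if match else {}

def extract_with_format(value, fmt=FORMAT):
    """Simplified emulation: map each name::$n pair to capture group n."""
    match = NUMBERED.search(value)
    if not match:
        return {}
    fields = {}
    for pair in fmt.split():
        name, ref = pair.split("::")
        fields[name] = match.group(int(ref.lstrip("$")))
    return fields

sample = "10.0.19041.1.amd64fre.vb_release"  # hypothetical build string
print(extract_named(sample) == extract_with_format(sample))  # True
```

Either way you end up with the same four fields, so it comes down to whether you prefer the names in the regex or in the format box.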
@doweaver If this solved your problem it'd be great if you could mark this as Answered, just to help others in the future. Cheers!