How to create a regex for a log file which contains multiple values throughout the log that require the same field name?

mssoni · ‎12-14-2022

Hello Team,

This is the first time I am posting a question and hope that I have explained it thoroughly.

I am trying to create a regex for a log file in which the same group of values repeats throughout a single log, so each repetition requires the same field name. However, Splunk does not allow the same field name to be used again.

Here is the sample log:

/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9

Note: Each text value is 4 characters long and each number contains 10 digits.

How can I move forward to achieve a field extraction and format like this?
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9

Thank You in Advance 😊

yuanliu · ‎12-14-2022

I am not sure what you mean by "Splunk does not allow to use the same field name again." (This is SPL, of course anything is possible®😃)

Because your field values are separated by a known, fixed string "/TXT1/TXT2", a literal solution would be

| eval samefield = split(_raw, "/TXT1/TXT2")
| eval samefield = mvfilter(len(samefield) != 0)
| eval samefield = mvmap(samefield, "/TXT1/TXT2" . samefield)

This is an emulation I used to test the above; you can play with it and compare it with real data.

| makeresults
| eval _raw  = "/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9"
``` data emulation above ```

There can be several variants based on fixed text values.
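One such variant, sketched here as an illustration rather than taken from the original answer, keeps the fixed text inside a rex pattern and lets the nine remaining segments be anything that is not a slash. The sample string is shortened to two records, and mvexpand and table are added only to show one record per row.

```spl
| makeresults
| eval _raw = "/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9"
| rex max_match=0 "(?<samefield>/TXT1/TXT2(/[^/]+){9})"
| mvexpand samefield
| table samefield
```

Each resulting row should hold one /TXT1/TXT2/... record of 11 segments.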

If you want to relax the condition that the text portion is known and fixed, you can instead match on the text and number characteristics you described:

| rex max_match=0 "(?<samefield>(/\w{4}){2}(/\d{10}){9})"
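To try a pattern-based extraction like this, the emulation needs values that actually fit the description in the question (4-character text, 10-digit numbers), because the placeholder NMBR1 is not numeric. The following is a minimal sketch with two records. The values ABCD, EFGH, IJKL, MNOP and all the digit strings are invented for illustration, and the number segments are assumed to be exactly 10 digits.

```spl
| makeresults
| eval _raw = "/ABCD/EFGH/1000000001/1000000002/1000000003/1000000004/1000000005/1000000006/1000000007/1000000008/1000000009/IJKL/MNOP/2000000001/2000000002/2000000003/2000000004/2000000005/2000000006/2000000007/2000000008/2000000009"
| rex max_match=0 "(?<samefield>(/\w{4}){2}(/\d{10}){9})"
| mvexpand samefield
| table samefield
```

Setting max_match=0 lets rex keep every match instead of only the first, so the single field name samefield holds all records as a multivalue field. mvexpand then turns each value into its own row.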

You can even generalize this to only require a total of 11 path segments:

| rex max_match=0 "(?<samefield>(/[^\/]+){11})"
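A quick way to check that the segment-count pattern splits the event where expected is to count the extracted values. This sketch reuses the placeholder sample shortened to two records. The field name record_count is made up for the illustration.

```spl
| makeresults
| eval _raw = "/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9"
| rex max_match=0 "(?<samefield>(/[^\/]+){11})"
| eval record_count = mvcount(samefield)
| table record_count samefield
```

With the two-record sample, record_count should be 2. Because this pattern only counts segments, a record with a missing or extra segment would shift every boundary after it, so the stricter patterns are the safer choice when the layout is known.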

mssoni · ‎12-14-2022

The provided sample is a single log and is not separated into fields, but I want to implement fields with the help of a regex, in this format:

/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9
/TXT1/TXT2/NMBR1/NMBR2/NMBR3/NMBR4/NMBR5/NMBR6/NMBR7/NMBR8/NMBR9

yuanliu · ‎12-14-2022

@mssoni Have you tried my code? Yes, I do understand that's one string and that's what I emulated: One very long string.

mssoni · ‎12-15-2022

Not yet. I am really confused about how to place this in the query; this is the first time I am working with Splunk's front end.

yuanliu · ‎12-15-2022

"really confused how to place this in the query"

One way to help others help you is to show the SPL you have at hand (simplified and anonymized as needed), then explain what confuses you, what error you get, or why the result is not what you wanted.

I already explained how I tested my sample code against your sample data and the description of your requirements. I assume that you know where the search window is. Just paste the data emulation code above the manipulation code, and you can review the results. Like this:

[Screenshot: Screen Shot 2022-12-15 at 10.08.30 PM.png]

(In this example, I used the shortest code.) Does this look like what you need? Then, just replace the data emulation part with your own event search.
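As a rough sketch of that last step, the emulation lines (makeresults and the eval that sets _raw) are dropped and an ordinary event search takes their place. The index and sourcetype names below are placeholders, not real names from this thread, and mvexpand and table are optional additions that show one record per row.

```spl
index=your_index sourcetype=your_sourcetype
| rex max_match=0 "(?<samefield>(/[^\/]+){11})"
| mvexpand samefield
| table _time samefield
```

Any of the extraction variants discussed in this thread can be substituted for the rex line.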

Hope this helps.
