Getting Data In

Regex extraction with different field delimiters

michael_vi
Path Finder

Hi all,

I have a very specifc regex extraction (search time extraction)

_raw data example:

| union
[| makeresults
| eval _raw = "Dec-28-2023|12:05:46,836|10.150.6.118|148:|some branch|uswer_name|d168a8b9-5647-421b-97ba-f2aa3bceb69a|1:Creation page stack|Success|action_portfolio_forms_c_save.action|8970:PORTFOLIO ONBOARDING - FORMS CAPTURE||3065254228||||||| ~newType|||"]
[| makeresults
| eval _raw = "Dec-28-2023|12:05:46,836|10.150.6.118|148:|some branch|uswer_name|d168a8b9-5647-421b-97ba-f2aa3bceb69a|1:Creation page stack|Success|action_portfolio_forms_c_save.action|8970:PORTFOLIO ONBOARDING - FORMS CAPTURE||3065254228|||||||oldType~newType|||"]
[| makeresults
| eval _raw = "Dec-28-2023|12:05:46,836|10.150.6.118|148:|some branch|uswer_name|d168a8b9-5647-421b-97ba-f2aa3bceb69a|1:Creation page stack|Success|action_portfolio_forms_c_save.action|8970:PORTFOLIO ONBOARDING - FORMS CAPTURE||3065254228||||||||||"]

I want to extract 2 fields from position 19 of the pipe and until 20th position that include (or may not) 2 fields that need to be extracted

new: comes right after 19th | and before ~

old: comes after ~ and before 20th |

There are 3 option that the data may appear:

|<space>~newType|

|oldType~newType|

|<null><null>|

The problem that I have is when no data is presented (3rd option) , then the props.conf doesn't parse it

In the end I need to have 2 fields, based on the example above:

oldnew
<space>newType
oldTypenewType
<space><space>

 

props.conf 

 

[user_activity]
REPORT-bb_extract = REPORT-bb_extract
EXTRACT-oldAccountType = ^(?:[^|]*\|){19}(?<old>[^\~|\|]*)
EXTRACT-newAccountType = (?:[^~]*\~){1}(?<new>[^|]*)

 

transforms.conf

 

[REPORT-bb_extract]
KEEP_EMPTY_VALS = true
DELIMS = "|"
FIELDS = "DATE","TIME","ip","branch","appName","userName","actionID","actionType","actionStatus","actionName","action","srcPortfolioId","refID","currency","TotalAmount","secondPortfolioId","multiTransfer","field18","field19","id2","field21","new","old"

 

 

 

 

 

 

1. How can I extract the field that may or may not include some value

2. How can I fix the the second regex to start with ~ after 19th | 

Thanks

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Try something like this

| rex "([^\|]*\|){19}(?<oldType>[^~\|]*)~?(?<newType>[^\|]*)"
0 Karma

michael_vi
Path Finder

Good idea to combine...

Is it possible to replace the NULL with space w/o using SPL? Within the porps.conf

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @michael_vi,

using my search, you replace the NULLs with space in SPL,

I'm not sure that's possible to do it in props.conf, but you could create a calculated field that automatically transforms null in space.

Ciao.

Giuseppe

0 Karma

michael_vi
Path Finder

@gcusello 

The problem is that, once props can't extract the field. it won't be shown in extracted fileds.

so..  | fillnull value=NULL will not work

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @michael_vi,

using this search, you have two evals (that you can save as calculated fields) that makes the same job:

| union
[| makeresults
| eval _raw = "Dec-28-2023|12:05:46,836|10.150.6.118|148:|some branch|uswer_name|d168a8b9-5647-421b-97ba-f2aa3bceb69a|1:Creation page stack|Success|action_portfolio_forms_c_save.action|8970:PORTFOLIO ONBOARDING - FORMS CAPTURE||3065254228||||||| ~newType|||"]
[| makeresults
| eval _raw = "Dec-28-2023|12:05:46,836|10.150.6.118|148:|some branch|uswer_name|d168a8b9-5647-421b-97ba-f2aa3bceb69a|1:Creation page stack|Success|action_portfolio_forms_c_save.action|8970:PORTFOLIO ONBOARDING - FORMS CAPTURE||3065254228|||||||oldType~newType|||"]
[| makeresults
| eval _raw = "Dec-28-2023|12:05:46,836|10.150.6.118|148:|some branch|uswer_name|d168a8b9-5647-421b-97ba-f2aa3bceb69a|1:Creation page stack|Success|action_portfolio_forms_c_save.action|8970:PORTFOLIO ONBOARDING - FORMS CAPTURE||3065254228||||||||||"]
| rex "^([^\|]*\|){19}(?<OldType>[^\~]*)\~(?<NewType>[^\|]*)"
| eval 
   OldType=if(isnull(OldType)," ",OldType),
   NewType=if(isnull(NewType)," ",NewType)
| table OldType NewType

So you can have them in interesting fields.

Ciao.

Giuseppe

0 Karma

michael_vi
Path Finder

Something is missing.

The 'NewType' filed may or may not start with the ~, if no data then it will start with the |
In the attached picture the field 'NewType' is missing for the event that has only pipes.

michael_vi_0-1704023161369.png

 

0 Karma

michael_vi
Path Finder

Managed to fix the old and new in props.conf:

| rex "^([^\|]*\|){19}(?<OldType>[^\~|\|]*)\~|\|"
| rex "^([^\|]*\|){19}.+~(?<NewType>[^\|]*)\|"

Still having trouble with the || (null values)

0 Karma

dtburrows3
Builder

So the approach I took here is to use an EXTRACT in props.conf to target the entire value between pipe 19 and pipe 20.
And then use EVALS is props to parse out that extracted value depending on its format.
    Edit:
    Noticed that there is no need to add an additional EXTRACT to props to get the full value because there is a field already extracted named 'id2' doing the same thing. So an even simpler way of doing this would be,

 

 

props.conf entry for forcing empty string if they are null.

 

[user_activity]
...
EVAL-oldType = if(NOT (match('id2', "^\s*$") OR isnull(id2)), mvindex(split(id2, "~"), 0), "")
EVAL-newType = if(NOT (match('id2', "^\s*$") OR isnull(id2)), mvindex(split(id2, "~"), 1), "")

 

Evidence of the null values being forced to empty strings

dtburrows3_0-1704044915856.png


props.conf entries for forcing to single whitespace if values are null from _raw

[user_activity]
...
EVAL-oldType = if(NOT (match(id2, "^\s*$") OR isnull(id2)), if(mvindex(split(id2, "~"), 0)=="", " ", mvindex(split(id2, "~"), 0)), " ")
EVAL-newType = if(NOT (match(id2, "^\s*$") OR isnull(id2)), if(mvindex(split(id2, "~"), 1)=="", " ", mvindex(split(id2, "~"), 1)), " ")

Evidence for the desired values are single whitespaces.

dtburrows3_1-1704045104314.png

 

 


I believe this works against the example you provided and you can see in the screenshot below that I have evaluated some boolean value inline in the search to show if the values are actually null or not.

dtburrows3_0-1704041789793.png

 

 




0 Karma

michael_vi
Path Finder

Looks good. I'll check it.

I also thought of using EVAL after extraction to replace NULLs and spaces:

EVAL-OldType=if(isnull(OldType) OR OldType = " ", "noData", OldType)
EVAL-NewType=if(isnull(NewType) OR NewType = " ", "noData", NewType)
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @michael_vi,

please try this:

| union
[| makeresults
| eval _raw = "Dec-28-2023|12:05:46,836|10.150.6.118|148:|some branch|uswer_name|d168a8b9-5647-421b-97ba-f2aa3bceb69a|1:Creation page stack|Success|action_portfolio_forms_c_save.action|8970:PORTFOLIO ONBOARDING - FORMS CAPTURE||3065254228||||||| ~newType|||"]
[| makeresults
| eval _raw = "Dec-28-2023|12:05:46,836|10.150.6.118|148:|some branch|uswer_name|d168a8b9-5647-421b-97ba-f2aa3bceb69a|1:Creation page stack|Success|action_portfolio_forms_c_save.action|8970:PORTFOLIO ONBOARDING - FORMS CAPTURE||3065254228|||||||oldType~newType|||"]
[| makeresults
| eval _raw = "Dec-28-2023|12:05:46,836|10.150.6.118|148:|some branch|uswer_name|d168a8b9-5647-421b-97ba-f2aa3bceb69a|1:Creation page stack|Success|action_portfolio_forms_c_save.action|8970:PORTFOLIO ONBOARDING - FORMS CAPTURE||3065254228||||||||||"]
| rex "^([^\|]*\|){19}(?<OldType>[^\~]*)\~(?<NewType>[^\|]*)"
| fillnull value=" " OldType
| fillnull value=" " NewType
| table OldType NewType

You can test the regex at https://regex101.com/r/gETm2b/1

Ciao.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...