Splunk Search

Multiline Regex trouble - Can't get fields to be associated with keys

guilmxm
SplunkTrust
SplunkTrust

Hi,

I am in great troubles with a multilines events i'm trying to analyse, and associated required regex to extract fields.

An example of an event (sql query output):

---- Identification ----
Date :  Mon Apr 28 19:00:00 DFT 2014
Hostname :  MYHOST01
Script : RQ_TB.ksh
Version courante : 1.0
------------------------

   Database Connection Information

 Database server        = DB2/AIX64 9.5.3
 SQL authorization ID   = MYF0001
 Local database alias   = MYF0002

-- 
-- SUMMARY OF USER TABLE DATA SIZES
-- 

select CURRENT SERVER as DBNAME, CURRENT TIMESTAMP as CURRENT_TIMESTAMP, S.USER_DATA_L_SIZE_KB, DEC( (S.USER_DATA_L_SIZE_KB/1073741824.0), 31, 11 ) as USER_DATA_L_SIZE_TB, COALESCE( CEIL( DEC( (S.USER_DATA_L_SIZE_KB/1073741824.0), 31, 11 ) ), 1 ) as USER_DATA_L_ENTITLEMENT_REQ_TB from ( select ( sum(A.DATA_OBJECT_L_SIZE) + sum(A.LONG_OBJECT_L_SIZE) + sum(A.LOB_OBJECT_L_SIZE) + sum(XML_OBJECT_L_SIZE) ) as USER_DATA_L_SIZE_KB from SYSIBMADM.ADMINTABINFO as A, ( select TABSCHEMA, TABNAME, OWNER, OWNERTYPE, TYPE, STATUS, TABLEID, TBSPACEID from SYSCAT.TABLES where OWNERTYPE = 'U' and TYPE IN ('G', 'H', 'L', 'S', 'T', 'U')  ) as T where A.TABNAME = T.TABNAME and A.TABSCHEMA = T.TABSCHEMA ) as S 

DBNAME             CURRENT_TIMESTAMP          USER_DATA_L_SIZE_KB  USER_DATA_L_SIZE_TB               USER_DATA_L_ENTITLEMENT_REQ_TB   
------------------ -------------------------- -------------------- --------------------------------- ---------------------------------
MYF0002           2014-04-28-19.00.01.768168            110325200                     0.10274834930                                1.

  1 record(s) selected.


-- 
-- BREAKDOWN OF USER TABLE DATA SIZES
-- 

select rtrim(A.TABSCHEMA) SCHEMA, rtrim(A.TABNAME) TABLENAME, sum(A.DATA_OBJECT_L_SIZE) as DATA_OBJECT_L_SIZE_KB, sum(A.LONG_OBJECT_L_SIZE) as LONG_OBJECT_L_SIZE_KB, sum(A.LOB_OBJECT_L_SIZE) as LOB_OBJECT_L_SIZE_KB, sum(XML_OBJECT_L_SIZE) as XML_OBJECT_L_SIZE_KB, ( sum(A.DATA_OBJECT_L_SIZE) + sum(A.LONG_OBJECT_L_SIZE) + sum(A.LOB_OBJECT_L_SIZE) + sum(XML_OBJECT_L_SIZE) ) as USER_DATA_L_SIZE_KB, T.COMPRESSION, T.PCTPAGESSAVED as Taux_de_compression from SYSIBMADM.ADMINTABINFO as A, ( select TABSCHEMA, TABNAME, OWNER, OWNERTYPE, TYPE, STATUS, COMPRESSION, TABLEID, TBSPACEID, PCTPAGESSAVED from SYSCAT.TABLES where OWNERTYPE = 'U' and TYPE IN ('G', 'H', 'L', 'S', 'T', 'U')  ) as T where A.TABNAME = T.TABNAME and A.TABSCHEMA = T.TABSCHEMA group by A.TABSCHEMA, A.TABNAME, T.COMPRESSION, T.PCTPAGESSAVED order by A.TABSCHEMA, A.TABNAME 

SCHEMA                                                                                                                           TABLENAME                                                                                                                        DATA_OBJECT_L_SIZE_KB LONG_OBJECT_L_SIZE_KB LOB_OBJECT_L_SIZE_KB XML_OBJECT_L_SIZE_KB USER_DATA_L_SIZE_KB  COMPRESSION TAUX_DE_COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- --------------------- --------------------- -------------------- -------------------- -------------------- ----------- -------------------
SCHEMA01                                                                                                                           ADVISE_INDEX                                                                                                                                       128                     0                  144                    0                  272 N                            -1
SCHEMA01                                                                                                                           ADVISE_INSTANCE                                                                                                                                    128                     0                    0                    0                  128 N                            -1
SCHEMA01                                                                                                                           ADVISE_MQT                                                                                                                                         128                     0                  144                    0                  272 N                            -1
SCHEMA01                                                                                                                           ADVISE_PARTITION                                                                                                                                   128                     0                  144                    0                  272 N                            -1
SCHEMA01                                                                                                                           ADVISE_TABLE                                                                                                                                       128                     0                  144                    0                  272 N                            -1
SCHEMA01                                                                                                                           ADVISE_WORKLOAD                                                                                                                                    128                     0                  144                    0                  272 N                            -1
SCHEMA01                                                                                                                           EXPLAIN_ARGUMENT                                                                                                                                   128                     0                  144                    0                  272 N                            -1
SCHEMA01                                                                                                                           EXPLAIN_DIAGNOSTIC                                                                                                                                 128                     0                    0                    0                  128 N                            -1
SCHEMA01                                                                                                                           EXPLAIN_DIAGNOSTIC_DATA                                                                                                                            128                     0                  144                    0                  272 N                            -1
SCHEMA01                                                                                                                           EXPLAIN_INSTANCE                                                                                                                                   128                     0                    0                    0                  128 N                            -1
SCHEMA01                                                                                                                           EXPLAIN_OBJECT                                                                                                                                     128                     0                    0                    0                  128 N                            -1
SCHEMA01                                                                                                                           EXPLAIN_OPERATOR                                                                                                                                   128                     0                    0                    0                  128 N                            -1
SCHEMA01                                                                                                                           EXPLAIN_PREDICATE                                                                                                                                  128                     0                  144                    0                  272 N                            -1
SCHEMA01                                                                                                                           EXPLAIN_STATEMENT                                                                                                                                  128                     0                  144                    0                  272 N                            -1
SCHEMA01                                                                                                                           EXPLAIN_STREAM                                                                                                                                     128                     0                  144                    0                  272 N                            -1
SCHEMA02                                                                                                                             TE_DEC_ENC_REM                                                                                                                                14523392                     0                    0                    0             14523392 N                             0

Because i need to be able to extract all data within the event, i'm indexing it in multi-line events with config as:

props.conf:

[db2compress]

# your settings
MAX_EVENTS=100000
NO_BINARY_CHECK=1
TIME_FORMAT=%a %b %d %H:%M:%S DFT %Y
TIME_PREFIX=Date :  

REPORT-extract_regfields = regfields

EXTRACT-hostname = (?i)Hostname :  (?P<HOSTNAME>\w+)
EXTRACT-database_server = (?i)Database server        = (?P<DATABASE_SERVER>[0-9a-zA-Z/]+)
EXTRACT-sql_auth_id = (?i)SQL authorization ID   = (?P<SQL_AUTH_ID>\w+)
EXTRACT-database_alias = (?i)Local database alias   = (?P<DATABASE_ALIAS>\w+)
EXTRACT-entitlement = (?im)^\w+\s+\d+\-\d+\-\d+\-\d+\.\d+\.\d+\.\d+\s+\d+\s+\d+\.\d+\s+(?P<ENTITLEMENT>[^\.]+)
EXTRACT-size_KB = (?im)^(?:[^\.\n]*\.){3}\d+\s+(?P<SIZE_KB>[^ ]+)
EXTRACT-size_TB = (?im)^\w+\s+\d+\-\d+\-\d+\-\d+\.\d+\.\d+\.\d+\s+\d+\s+(?P<SIZE_TB>[^ ]+)

Transforms.conf:

[regfields]
REGEX = (?im)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+(?P<TAUX_DE_COMPRESSION>[\-]*\d+)
MV_ADD = True

Events are being indexed with success as multi-lines, and everything could seem to be ok.

BUT, the regex used to extract fields from the schema detail:

(?im)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+(?P<TAUX_DE_COMPRESSION>[\-]*\d+)

Does not to seem to do the job, when i try to achieve some simple stats with Splunk, i get impossible results (such as a simple stats count(TABLENAME) by SCHEMA)

When i check in details with a "stats values(DATA_OBJECT_L_SIZE_KB) by SCHEMA,TABLENAME" for example, i see the values contains every values of the full event field and not the result of the association between keys (and so the value for this table only) as it should be

when i achieve a "table SCHEMA,TABLENAME,DATA_OBJECT_L_SIZE_KB" for example, then the data is correct but even a stats after the table command reports bad results.

So i think the issue in my regex, but this is driving me crazy and i can't get to know why...

If i remove the multi-line mode of the regex with such command:

index=db2compress sourcetype=db2compress
| rex max_match=1 "(?m-s)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+(?P<TAUX_DE_COMPRESSION>[\-]*\d+)"

Then i off course only get the first result, so the multi line is required, i'm thinking in something with back line return or something or like that, but everything i tried has failed.

Thank you VERY VERY much for any help !

0 Karma
1 Solution

guilmxm
SplunkTrust
SplunkTrust

Ok, got it to work 🙂 !!!

Finally, using mvexpand works great but you need filter fields before the mvexpand to get rid of that memory issue which truncates results, mainly for people with the same need;

<search stuff> | fields <my interesting fields > | fields - _raw | mvexpand ...

In my case: (for the example purpose i let the rex in the example but i moved it to transforms with MV_ADD to enable multi-value mode)

index=db2compress sourcetype=db2compress
| rex max_match=1000 field=_raw "(?m)(?P<AAA>\w+\s+\w+\s+\w+\s+\d+\s+\d+\s+\d+\s+\d+\s+\w+\s+[\-]*\d+)"
| fields AAA | fields - _raw
| mvexpand AAA | makemv AAA
| eval SCHEMA=mvindex(AAA,0)
| eval TABLENAME=mvindex(AAA,1)
| eval DATA_OBJECT_L_SIZE_KB=mvindex(AAA,2)
| eval LONG_OBJECT_L_SIZE_KB=mvindex(AAA,3)
| eval LOB_OBJECT_L_SIZE_KB=mvindex(AAA,4)
| eval XML_OBJECT_L_SIZE_KB=mvindex(AAA,5)
| eval USER_DATA_L_SIZE_KB=mvindex(AAA,6)
| eval COMPRESSION=mvindex(AAA,7)
| eval TAUX_DE_COMPRESSION=mvindex(AAA,8)
| stats values(SCHEMA) As SCHEMA, values(DATA_OBJECT_L_SIZE_KB) As DATA_OBJECT_L_SIZE_KB, values(LONG_OBJECT_L_SIZE_KB) As LONG_OBJECT_L_SIZE_KB, values(LOB_OBJECT_L_SIZE_KB) As LOB_OBJECT_L_SIZE_KB, values(XML_OBJECT_L_SIZE_KB) As XML_OBJECT_L_SIZE_KB, values(USER_DATA_L_SIZE_KB) As USER_DATA_L_SIZE_KB, values(COMPRESSION) As COMPRESSION, values(TAUX_DE_COMPRESSION) As TAUX_DE_COMPRESSION by TABLENAME
| fields SCHEMA,TABLENAME,DATA_OBJECT_L_SIZE_KB,LONG_OBJECT_L_SIZE_KB,LOB_OBJECT_L_SIZE_KB,XML_OBJECT_L_SIZE_KB,USER_DATA_L_SIZE_KB,COMPRESSION,TAUX_DE_COMPRESSION

View solution in original post

guilmxm
SplunkTrust
SplunkTrust

Ok, got it to work 🙂 !!!

Finally, using mvexpand works great but you need filter fields before the mvexpand to get rid of that memory issue which truncates results, mainly for people with the same need;

<search stuff> | fields <my interesting fields > | fields - _raw | mvexpand ...

In my case: (for the example purpose i let the rex in the example but i moved it to transforms with MV_ADD to enable multi-value mode)

index=db2compress sourcetype=db2compress
| rex max_match=1000 field=_raw "(?m)(?P<AAA>\w+\s+\w+\s+\w+\s+\d+\s+\d+\s+\d+\s+\d+\s+\w+\s+[\-]*\d+)"
| fields AAA | fields - _raw
| mvexpand AAA | makemv AAA
| eval SCHEMA=mvindex(AAA,0)
| eval TABLENAME=mvindex(AAA,1)
| eval DATA_OBJECT_L_SIZE_KB=mvindex(AAA,2)
| eval LONG_OBJECT_L_SIZE_KB=mvindex(AAA,3)
| eval LOB_OBJECT_L_SIZE_KB=mvindex(AAA,4)
| eval XML_OBJECT_L_SIZE_KB=mvindex(AAA,5)
| eval USER_DATA_L_SIZE_KB=mvindex(AAA,6)
| eval COMPRESSION=mvindex(AAA,7)
| eval TAUX_DE_COMPRESSION=mvindex(AAA,8)
| stats values(SCHEMA) As SCHEMA, values(DATA_OBJECT_L_SIZE_KB) As DATA_OBJECT_L_SIZE_KB, values(LONG_OBJECT_L_SIZE_KB) As LONG_OBJECT_L_SIZE_KB, values(LOB_OBJECT_L_SIZE_KB) As LOB_OBJECT_L_SIZE_KB, values(XML_OBJECT_L_SIZE_KB) As XML_OBJECT_L_SIZE_KB, values(USER_DATA_L_SIZE_KB) As USER_DATA_L_SIZE_KB, values(COMPRESSION) As COMPRESSION, values(TAUX_DE_COMPRESSION) As TAUX_DE_COMPRESSION by TABLENAME
| fields SCHEMA,TABLENAME,DATA_OBJECT_L_SIZE_KB,LONG_OBJECT_L_SIZE_KB,LOB_OBJECT_L_SIZE_KB,XML_OBJECT_L_SIZE_KB,USER_DATA_L_SIZE_KB,COMPRESSION,TAUX_DE_COMPRESSION

guilmxm
SplunkTrust
SplunkTrust

I am having s starting sucess by extracting full lines with a regex, then playing with mxexpand and others, such :

index=db2compress sourcetype=db2compress
| rex max_match=1000 field=_raw "(?m)(?P<AAA>\w+\s+\w+\s+\w+\s+\d+\s+\d+\s+\d+\s+\d+\s+\w+\s+[\-]*\d+)"
| mvexpand AAA | makemv AAA
| eval SCHEMA=mvindex(AAA,0)
| eval TABLENAME=mvindex(AAA,1)
| eval DATA_OBJECT_L_SIZE_KB=mvindex(AAA,2)
| eval LONG_OBJECT_L_SIZE_KB=mvindex(AAA,3)
| eval LOB_OBJECT_L_SIZE_KB=mvindex(AAA,4)
| eval XML_OBJECT_L_SIZE_KB=mvindex(AAA,5)
| eval USER_DATA_L_SIZE_KB=mvindex(AAA,6)
| eval COMPRESSION=mvindex(AAA,7)
| eval TAUX_DE_COMPRESSION=mvindex(AAA,8)
| stats values(SCHEMA) As SCHEMA, values(DATA_OBJECT_L_SIZE_KB) As DATA_OBJECT_L_SIZE_KB, values(LONG_OBJECT_L_SIZE_KB) As LONG_OBJECT_L_SIZE_KB, values(LOB_OBJECT_L_SIZE_KB) As LOB_OBJECT_L_SIZE_KB, values(XML_OBJECT_L_SIZE_KB) As XML_OBJECT_L_SIZE_KB, values(USER_DATA_L_SIZE_KB) As USER_DATA_L_SIZE_KB, values(COMPRESSION) As COMPRESSION, values(TAUX_DE_COMPRESSION) As TAUX_DE_COMPRESSION by TABLENAME
| fields SCHEMA,TABLENAME,DATA_OBJECT_L_SIZE_KB,LONG_OBJECT_L_SIZE_KB,LOB_OBJECT_L_SIZE_KB,XML_OBJECT_L_SIZE_KB,USER_DATA_L_SIZE_KB,COMPRESSION,TAUX_DE_COMPRESSION

That the way the result is correct, but i'm having issues with the mvexpand command and memory limitations.

Does any one knows an easiest method ?

0 Karma

lmyrefelt
Builder

I dont think m = multiline is enough you need a g as in global as well to for the regex to hit and extract multiple fields from this event if it is to span over multiple lines / events

0 Karma

guilmxm
SplunkTrust
SplunkTrust

Help Please 🙂 I really need to get this to work, and probably i shall not not be far but...

0 Karma

guilmxm
SplunkTrust
SplunkTrust

Perhaps i misunderstood your last comment.

Some fields can have multi values when they are common to others:

--> In the example, the schema has multiple tables, so the TABLENAME field has multiple values

Some fields cannot have multiple values, in the example above, the size of the table (for ex DATA_OBJECT_L_SIZE_KB) has only one value associated with its table.

But if you do a "stats by SCHEMA", multiple values shall be retrieved

With structured data such as CSV, Splunks does this natively, i need the same using regex to extract fields

0 Karma

guilmxm
SplunkTrust
SplunkTrust

Well not necessary, i just thought this would be the solution. (extracting fields by fields on line by line)

My goal is to be able to have the same behavior as it would have been structured data such as csv:

SCHEMA,TABLENAME,VALUE
...

And to be able to issue statistical analysis.

How would you extract the all line using regex ?

I think i could then use mvindex or other to split the line into multiple fields

0 Karma

lmyrefelt
Builder

sorry ... Your problem is that you dont want splunk to do extractions on the whole multiline event ? Your regex seems to hit and extract all the fields ? So for each event you get multiple values for one field ? And this is not what you want ?

If this is not what you want, perhaps you should add more captures groups to your regex and name the fields with field_1/field_2 etc etc ?

0 Karma

guilmxm
SplunkTrust
SplunkTrust

Thanks for answering

Can you give me more details, do you mean adding "/g" add the end of the regex ?

With:
| rex "(?im)^(?P\w+)\s+(?P\w+)\s+(?P\d+)\s+(?P\d+)\s+(?P\d+)\s+(?P\d+)\s+(?P\d+)\s+(?P\w+)\s+(?P[-]*\d+)/g"

I don't get any hit

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...