Hi,
I am having great trouble with a multi-line event I'm trying to analyse, and with the regex required to extract fields from it.
An example of an event (SQL query output):
---- Identification ----
Date : Mon Apr 28 19:00:00 DFT 2014
Hostname : MYHOST01
Script : RQ_TB.ksh
Version courante : 1.0
------------------------
Database Connection Information
Database server = DB2/AIX64 9.5.3
SQL authorization ID = MYF0001
Local database alias = MYF0002
--
-- SUMMARY OF USER TABLE DATA SIZES
--
select CURRENT SERVER as DBNAME, CURRENT TIMESTAMP as CURRENT_TIMESTAMP, S.USER_DATA_L_SIZE_KB, DEC( (S.USER_DATA_L_SIZE_KB/1073741824.0), 31, 11 ) as USER_DATA_L_SIZE_TB, COALESCE( CEIL( DEC( (S.USER_DATA_L_SIZE_KB/1073741824.0), 31, 11 ) ), 1 ) as USER_DATA_L_ENTITLEMENT_REQ_TB from ( select ( sum(A.DATA_OBJECT_L_SIZE) + sum(A.LONG_OBJECT_L_SIZE) + sum(A.LOB_OBJECT_L_SIZE) + sum(XML_OBJECT_L_SIZE) ) as USER_DATA_L_SIZE_KB from SYSIBMADM.ADMINTABINFO as A, ( select TABSCHEMA, TABNAME, OWNER, OWNERTYPE, TYPE, STATUS, TABLEID, TBSPACEID from SYSCAT.TABLES where OWNERTYPE = 'U' and TYPE IN ('G', 'H', 'L', 'S', 'T', 'U') ) as T where A.TABNAME = T.TABNAME and A.TABSCHEMA = T.TABSCHEMA ) as S
DBNAME CURRENT_TIMESTAMP USER_DATA_L_SIZE_KB USER_DATA_L_SIZE_TB USER_DATA_L_ENTITLEMENT_REQ_TB
------------------ -------------------------- -------------------- --------------------------------- ---------------------------------
MYF0002 2014-04-28-19.00.01.768168 110325200 0.10274834930 1.
1 record(s) selected.
--
-- BREAKDOWN OF USER TABLE DATA SIZES
--
select rtrim(A.TABSCHEMA) SCHEMA, rtrim(A.TABNAME) TABLENAME, sum(A.DATA_OBJECT_L_SIZE) as DATA_OBJECT_L_SIZE_KB, sum(A.LONG_OBJECT_L_SIZE) as LONG_OBJECT_L_SIZE_KB, sum(A.LOB_OBJECT_L_SIZE) as LOB_OBJECT_L_SIZE_KB, sum(XML_OBJECT_L_SIZE) as XML_OBJECT_L_SIZE_KB, ( sum(A.DATA_OBJECT_L_SIZE) + sum(A.LONG_OBJECT_L_SIZE) + sum(A.LOB_OBJECT_L_SIZE) + sum(XML_OBJECT_L_SIZE) ) as USER_DATA_L_SIZE_KB, T.COMPRESSION, T.PCTPAGESSAVED as Taux_de_compression from SYSIBMADM.ADMINTABINFO as A, ( select TABSCHEMA, TABNAME, OWNER, OWNERTYPE, TYPE, STATUS, COMPRESSION, TABLEID, TBSPACEID, PCTPAGESSAVED from SYSCAT.TABLES where OWNERTYPE = 'U' and TYPE IN ('G', 'H', 'L', 'S', 'T', 'U') ) as T where A.TABNAME = T.TABNAME and A.TABSCHEMA = T.TABSCHEMA group by A.TABSCHEMA, A.TABNAME, T.COMPRESSION, T.PCTPAGESSAVED order by A.TABSCHEMA, A.TABNAME
SCHEMA TABLENAME DATA_OBJECT_L_SIZE_KB LONG_OBJECT_L_SIZE_KB LOB_OBJECT_L_SIZE_KB XML_OBJECT_L_SIZE_KB USER_DATA_L_SIZE_KB COMPRESSION TAUX_DE_COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- --------------------- --------------------- -------------------- -------------------- -------------------- ----------- -------------------
SCHEMA01 ADVISE_INDEX 128 0 144 0 272 N -1
SCHEMA01 ADVISE_INSTANCE 128 0 0 0 128 N -1
SCHEMA01 ADVISE_MQT 128 0 144 0 272 N -1
SCHEMA01 ADVISE_PARTITION 128 0 144 0 272 N -1
SCHEMA01 ADVISE_TABLE 128 0 144 0 272 N -1
SCHEMA01 ADVISE_WORKLOAD 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_ARGUMENT 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_DIAGNOSTIC 128 0 0 0 128 N -1
SCHEMA01 EXPLAIN_DIAGNOSTIC_DATA 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_INSTANCE 128 0 0 0 128 N -1
SCHEMA01 EXPLAIN_OBJECT 128 0 0 0 128 N -1
SCHEMA01 EXPLAIN_OPERATOR 128 0 0 0 128 N -1
SCHEMA01 EXPLAIN_PREDICATE 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_STATEMENT 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_STREAM 128 0 144 0 272 N -1
SCHEMA02 TE_DEC_ENC_REM 14523392 0 0 0 14523392 N 0
Because I need to be able to extract all the data within the event, I'm indexing it as multi-line events with the following config:
props.conf:
[db2compress]
# your settings
MAX_EVENTS=100000
NO_BINARY_CHECK=1
TIME_FORMAT=%a %b %d %H:%M:%S DFT %Y
TIME_PREFIX=Date :
REPORT-extract_regfields = regfields
EXTRACT-hostname = (?i)Hostname : (?P<HOSTNAME>\w+)
EXTRACT-database_server = (?i)Database server = (?P<DATABASE_SERVER>[0-9a-zA-Z/]+)
EXTRACT-sql_auth_id = (?i)SQL authorization ID = (?P<SQL_AUTH_ID>\w+)
EXTRACT-database_alias = (?i)Local database alias = (?P<DATABASE_ALIAS>\w+)
EXTRACT-entitlement = (?im)^\w+\s+\d+\-\d+\-\d+\-\d+\.\d+\.\d+\.\d+\s+\d+\s+\d+\.\d+\s+(?P<ENTITLEMENT>[^\.]+)
EXTRACT-size_KB = (?im)^(?:[^\.\n]*\.){3}\d+\s+(?P<SIZE_KB>[^ ]+)
EXTRACT-size_TB = (?im)^\w+\s+\d+\-\d+\-\d+\-\d+\.\d+\.\d+\.\d+\s+\d+\s+(?P<SIZE_TB>[^ ]+)
transforms.conf:
[regfields]
REGEX = (?im)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+(?P<TAUX_DE_COMPRESSION>[\-]*\d+)
MV_ADD = True
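As a quick sanity check outside Splunk, the [regfields] pattern can be exercised with Python's re module against a few sample rows. This is only a sketch: Splunk uses PCRE, which differs slightly from Python's re, and the sample event below is a shortened, hypothetical excerpt of the output above.

```python
import re

# Same pattern as in [regfields]; Python's re is close enough to PCRE
# for a quick check of whether each data row yields its own match.
PATTERN = re.compile(
    r"^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+"
    r"(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+"
    r"(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+"
    r"(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+"
    r"(?P<TAUX_DE_COMPRESSION>-?\d+)",
    re.IGNORECASE | re.MULTILINE,
)

# Shortened sample of the multi-line event (data rows only).
event = """SCHEMA01 ADVISE_INDEX 128 0 144 0 272 N -1
SCHEMA01 ADVISE_INSTANCE 128 0 0 0 128 N -1
SCHEMA02 TE_DEC_ENC_REM 14523392 0 0 0 14523392 N 0"""

# One dict per matched row, mirroring MV_ADD's one-value-per-match behavior.
rows = [m.groupdict() for m in PATTERN.finditer(event)]
for row in rows:
    print(row["SCHEMA"], row["TABLENAME"], row["DATA_OBJECT_L_SIZE_KB"])
```

Note that with `re.MULTILINE` each data row produces a separate, correctly aligned match here; the header and dashed separator lines do not match at all, since they contain no digit-only columns.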
Events are being indexed successfully as multi-line, and everything would seem to be OK.
BUT the regex used to extract fields from the schema detail:
(?im)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+(?P<TAUX_DE_COMPRESSION>[\-]*\d+)
does not seem to do the job: when I try some simple stats in Splunk, I get impossible results (even for a simple stats count(TABLENAME) by SCHEMA).
When I check in detail with, for example, a "stats values(DATA_OBJECT_L_SIZE_KB) by SCHEMA,TABLENAME", I see the values contain every value from the full event, and not the value associated with that particular table, as they should.
When I run a "table SCHEMA,TABLENAME,DATA_OBJECT_L_SIZE_KB", the data is correct, but even a stats after the table command reports bad results.
So I think the issue is in my regex, but this is driving me crazy and I can't figure out why...
If I remove the multi-line mode of the regex with a command such as:
index=db2compress sourcetype=db2compress
| rex max_match=1 "(?m-s)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+(?P<TAUX_DE_COMPRESSION>[\-]*\d+)"
then of course I only get the first result, so multi-line matching is required. I suspect something related to line returns, but everything I've tried has failed.
Thank you VERY VERY much for any help!
Ok, got it to work 🙂 !!!
Finally, using mvexpand works great, but you need to filter the fields before the mvexpand to avoid the memory issue that truncates results. Mainly for people with the same need:
<search stuff> | fields <my interesting fields > | fields - _raw | mvexpand ...
In my case (for the purpose of the example I left the rex in the search, but I moved it to transforms.conf with MV_ADD to enable multi-value mode):
index=db2compress sourcetype=db2compress
| rex max_match=1000 field=_raw "(?m)(?P<AAA>\w+\s+\w+\s+\w+\s+\d+\s+\d+\s+\d+\s+\d+\s+\w+\s+[\-]*\d+)"
| fields AAA | fields - _raw
| mvexpand AAA | makemv AAA
| eval SCHEMA=mvindex(AAA,0)
| eval TABLENAME=mvindex(AAA,1)
| eval DATA_OBJECT_L_SIZE_KB=mvindex(AAA,2)
| eval LONG_OBJECT_L_SIZE_KB=mvindex(AAA,3)
| eval LOB_OBJECT_L_SIZE_KB=mvindex(AAA,4)
| eval XML_OBJECT_L_SIZE_KB=mvindex(AAA,5)
| eval USER_DATA_L_SIZE_KB=mvindex(AAA,6)
| eval COMPRESSION=mvindex(AAA,7)
| eval TAUX_DE_COMPRESSION=mvindex(AAA,8)
| stats values(SCHEMA) As SCHEMA, values(DATA_OBJECT_L_SIZE_KB) As DATA_OBJECT_L_SIZE_KB, values(LONG_OBJECT_L_SIZE_KB) As LONG_OBJECT_L_SIZE_KB, values(LOB_OBJECT_L_SIZE_KB) As LOB_OBJECT_L_SIZE_KB, values(XML_OBJECT_L_SIZE_KB) As XML_OBJECT_L_SIZE_KB, values(USER_DATA_L_SIZE_KB) As USER_DATA_L_SIZE_KB, values(COMPRESSION) As COMPRESSION, values(TAUX_DE_COMPRESSION) As TAUX_DE_COMPRESSION by TABLENAME
| fields SCHEMA,TABLENAME,DATA_OBJECT_L_SIZE_KB,LONG_OBJECT_L_SIZE_KB,LOB_OBJECT_L_SIZE_KB,XML_OBJECT_L_SIZE_KB,USER_DATA_L_SIZE_KB,COMPRESSION,TAUX_DE_COMPRESSION
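For readers who want to verify the whole-line-then-split logic without a Splunk instance, the rex → mvexpand → makemv/mvindex pipeline above can be mimicked in Python. This is a sketch under assumptions: the sample event is a shortened, hypothetical excerpt, and the field names simply follow the SPL above.

```python
import re

# Step 1 (the rex/AAA step): capture each data row as a whole line.
ROW = re.compile(
    r"^\w+\s+\w+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\w+\s+-?\d+$",
    re.MULTILINE,
)

# Field names in positional order, matching the mvindex calls in the SPL.
FIELDS = ["SCHEMA", "TABLENAME", "DATA_OBJECT_L_SIZE_KB",
          "LONG_OBJECT_L_SIZE_KB", "LOB_OBJECT_L_SIZE_KB",
          "XML_OBJECT_L_SIZE_KB", "USER_DATA_L_SIZE_KB",
          "COMPRESSION", "TAUX_DE_COMPRESSION"]

# Shortened sample of the multi-line event (data rows only).
event = """SCHEMA01 ADVISE_INDEX 128 0 144 0 272 N -1
SCHEMA02 TE_DEC_ENC_REM 14523392 0 0 0 14523392 N 0"""

# Step 2 (mvexpand): one record per matched line.
# Step 3 (makemv + mvindex): split each line on whitespace into named fields.
records = [dict(zip(FIELDS, line.split())) for line in ROW.findall(event)]
for rec in records:
    print(rec["SCHEMA"], rec["TABLENAME"], rec["USER_DATA_L_SIZE_KB"])
```

The design point is the same as in the SPL: matching whole lines first sidesteps the per-field multi-value alignment problem, and positional splitting afterwards keeps each value tied to its own row.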
I am having some initial success by extracting full lines with a regex, then playing with mvexpand and others, such as:
index=db2compress sourcetype=db2compress
| rex max_match=1000 field=_raw "(?m)(?P<AAA>\w+\s+\w+\s+\w+\s+\d+\s+\d+\s+\d+\s+\d+\s+\w+\s+[\-]*\d+)"
| mvexpand AAA | makemv AAA
| eval SCHEMA=mvindex(AAA,0)
| eval TABLENAME=mvindex(AAA,1)
| eval DATA_OBJECT_L_SIZE_KB=mvindex(AAA,2)
| eval LONG_OBJECT_L_SIZE_KB=mvindex(AAA,3)
| eval LOB_OBJECT_L_SIZE_KB=mvindex(AAA,4)
| eval XML_OBJECT_L_SIZE_KB=mvindex(AAA,5)
| eval USER_DATA_L_SIZE_KB=mvindex(AAA,6)
| eval COMPRESSION=mvindex(AAA,7)
| eval TAUX_DE_COMPRESSION=mvindex(AAA,8)
| stats values(SCHEMA) As SCHEMA, values(DATA_OBJECT_L_SIZE_KB) As DATA_OBJECT_L_SIZE_KB, values(LONG_OBJECT_L_SIZE_KB) As LONG_OBJECT_L_SIZE_KB, values(LOB_OBJECT_L_SIZE_KB) As LOB_OBJECT_L_SIZE_KB, values(XML_OBJECT_L_SIZE_KB) As XML_OBJECT_L_SIZE_KB, values(USER_DATA_L_SIZE_KB) As USER_DATA_L_SIZE_KB, values(COMPRESSION) As COMPRESSION, values(TAUX_DE_COMPRESSION) As TAUX_DE_COMPRESSION by TABLENAME
| fields SCHEMA,TABLENAME,DATA_OBJECT_L_SIZE_KB,LONG_OBJECT_L_SIZE_KB,LOB_OBJECT_L_SIZE_KB,XML_OBJECT_L_SIZE_KB,USER_DATA_L_SIZE_KB,COMPRESSION,TAUX_DE_COMPRESSION
That way the result is correct, but I'm having issues with the mvexpand command and memory limitations.
Does anyone know an easier method?
I don't think m (multiline) is enough; you also need g (global) for the regex to match and extract multiple fields from this event if it is to span multiple lines/events.
Help please 🙂 I really need to get this to work, and I'm probably not far off...
Perhaps I misunderstood your last comment.
Some fields can have multiple values when they are shared with others:
--> In the example, the schema has multiple tables, so the TABLENAME field has multiple values.
Some fields cannot have multiple values: in the example above, the size of a table (e.g. DATA_OBJECT_L_SIZE_KB) has only one value, associated with its table.
But if you do a "stats by SCHEMA", multiple values should be retrieved.
With structured data such as CSV, Splunk does this natively; I need the same behavior using regex to extract the fields.
Well, not necessarily; I just thought this would be the solution (extracting fields line by line).
My goal is to get the same behavior as if it were structured data such as CSV:
SCHEMA,TABLENAME,VALUE
...
and to be able to run statistical analysis.
How would you extract the whole line using regex?
I think I could then use mvindex or similar to split the line into multiple fields.
Sorry... Your problem is that you don't want Splunk to do extractions on the whole multi-line event? Your regex seems to match and extract all the fields? So for each event you get multiple values for one field? And this is not what you want?
If this is not what you want, perhaps you should add more capture groups to your regex and name the fields field_1/field_2, etc.?
Thanks for answering.
Can you give me more details? Do you mean adding "/g" at the end of the regex?
With:
| rex "(?im)^(?P
I don't get any hits.