Why can't I generate KV pairs from nested field?

plynch52
Explorer

Here is a single record

Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2,99296=2]

The KV pairs that I want to extract are inside the last [ ]. So far, all I have been able to retrieve is FIELDn=string, where string is the whole "number=number" pair that I want broken into a key and a value.

I have tried transforms.conf (REGEX = ([0-9]+)=([0-9]+) FORMAT = $1::$2) with a REPORT in props.conf to reference this.

Search string is

index=* OR index=_* sourcetype=Shield | rex field=_raw "(?ms)(?=[^N]*(?:Rule Hits Digest|N.*NetDefender Rule Hits Digest))^(?P[^\\[]+)[^\\]\\n]*\\]\\[(?P[^\\]]+)\\]\\[(?P\\d+)\\]\\[(?P\\d+)[^\\]\\n]*\\]\\[(?P[^\\]]+)" offset_field=_extracted_fields_bounds 

It is the stats field that I need to break up into KV pairs. The number of KV pairs varies, and the keys range from 1 to 1 million. I want to sum the counts for each key.

DalJeanis
Legend

Over at regex101.com, that regex looks thoroughly broken. The double escaping may be needed in some circumstances, but I don't believe it is needed for rex within Splunk. The ^ after Digest asserts the beginning of the field... and you have the lookahead there... which seems needlessly complicated. In light of my results, it really IS overcomplicated.

I believe you are running into a limit on Splunk's ability to extract multiple matches into multivalue fields. After futzing around a bit, I realized the obvious... this regex works...

| rex field=george max_match=0 "(?<kvpair>\d*=\d*)"

However, it only succeeds when there are up to about 25-30 kvpairs; beyond that it fails without a message. I suspect some sort of catastrophic backtracking, but I can't see why that would be the case. In any case, one workable solution I found was to split the list into groups of up to 10 kvpairs, then split each group into individual kvpairs. You can take it from there.

| makeresults | eval _raw=" Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2]"
| rex field=_raw max_match=0 "(?<kvgroup>(\d*=\d*[,\]]){1,10})"
| mvexpand kvgroup
| rex field=kvgroup max_match=10 "(?<kvpair>\d*=\d*)"
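
One possible way to take it from there, as an untested sketch rather than a definitive answer: expand the multivalue kvpair field into one row per pair, split each pair into a key and a count, and sum the counts per key. The field names rule_id, hits, and total_hits are placeholders chosen for illustration, not names from the original data. Note that mvexpand is subject to memory limits, so very large events may need tuning.

```spl
| mvexpand kvpair
| rex field=kvpair "(?<rule_id>\d+)=(?<hits>\d+)"
| stats sum(hits) AS total_hits BY rule_id
```

Appended to the search above, this should return one row per key with its summed count.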

plynch52
Explorer

Thanks,
as a newbie I figured there had to be some way. The original regex was generated by Splunk. It parsed the kv pairs.
