Why can't I generate KV pairs from a nested field?

Explorer

Here is a single record:

Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2,99296=2]

The KV pairs that I want to extract are inside the last [ ]. So far, all I have been able to retrieve is FIELDn=string, where string is the whole "number=number" KV pair that I want broken into a key and a value.

I have tried a transforms.conf stanza (REGEX = ([0-9]+)=([0-9]+), FORMAT = $1::$2) referenced by a REPORT setting in props.conf.
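
For reference, the attempt described above would correspond to stanzas roughly like the following. This is only a sketch: the stanza name shield_rule_hits_kv and the class name rule_hits are made up for illustration, and the [Shield] stanza assumes the sourcetype used in the search.

```
# transforms.conf
# Stanza name is hypothetical; REGEX and FORMAT are the values quoted above.
[shield_rule_hits_kv]
REGEX = ([0-9]+)=([0-9]+)
FORMAT = $1::$2

# props.conf
# Assumes the sourcetype is Shield; REPORT-<class> points at the transforms stanza.
[Shield]
REPORT-rule_hits = shield_rule_hits_kv
```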

The search string is:

index=* OR index=_* sourcetype=Shield | rex field=_raw "(?ms)(?=[^N]*(?:Rule Hits Digest|N.*NetDefender Rule Hits Digest))^(?P[^\\[]+)[^\\]\\n]*\\]\\[(?P[^\\]]+)\\]\\[(?P\\d+)\\]\\[(?P\\d+)[^\\]\\n]*\\]\\[(?P[^\\]]+)" offset_field=_extracted_fields_bounds 

The stats field is the one I need to break up into KV pairs. The number of KV pairs varies, and the keys range from 1 to 1 million. I want to sum the counts for each key.


SplunkTrust

Over at regex101.com, that regex looks thoroughly broken. The double escaping may be needed in some circumstances, but I don't believe it is needed for rex within Splunk. The ^ after Digest asserts the beginning of the field, and you have the lookahead in front of it, which seems needlessly complicated. In light of my results, it really IS overcomplicated.

I believe you are running into a limit on Splunk's ability to extract multiple matches into a multivalue field. After futzing around a bit, I realized the obvious: this regex works...

| rex field=george max_match=0 "(?<kvpair>\d*=\d*)"

However, it only succeeds when there are up to about 25-30 kvpairs, then fails without a message. I suspect some sort of catastrophic backtracking, but I can't see why that would be the case. In any case, one workable solution I found was to split the list into groups of 10 kvpairs, then split out the individual kvpairs. You can take it from there.

| makeresults | eval _raw=" Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2]"
| rex field=_raw max_match=0 "(?<kvgroup>(\d*=\d*[,\]]){1,10})"
| mvexpand kvgroup
| rex field=kvgroup max_match=10 "(?<kvpair>\d*=\d*)"
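
To get from the kvpair values to the per-key sums asked for in the question, one possible continuation is sketched below. The field names rule_id, hits and total_hits are made up for illustration, and this has not been run against the original data.

```
| mvexpand kvpair
| rex field=kvpair "(?<rule_id>\d+)=(?<hits>\d+)"
| stats sum(hits) AS total_hits BY rule_id
```

The mvexpand puts each kvpair on its own row, the rex splits it into a key and a count, and the stats adds up the counts per key across all events in the search.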


Explorer

Thanks,
as a newbie I figured there had to be some way. The original regex was generated by Splunk. It parsed the kv pairs.
