
Why can't I generate KV pairs from nested field?

plynch52
Explorer

Here is a single record

Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2,99296=2]

Inside the last [ ] are the KV pairs that I want to extract. So far, all I have been able to retrieve are fields of the form FIELDn=string, where string is the whole "number=number" pair that I want broken into a key and a value.

I have tried a transforms.conf stanza (REGEX = ([0-9]+)=([0-9]+) with FORMAT = $1::$2), referenced by a REPORT setting in props.conf.

The search string is:

index=* OR index=_* sourcetype=Shield | rex field=_raw "(?ms)(?=[^N]*(?:Rule Hits Digest|N.*NetDefender Rule Hits Digest))^(?P[^\\[]+)[^\\]\\n]*\\]\\[(?P[^\\]]+)\\]\\[(?P\\d+)\\]\\[(?P\\d+)[^\\]\\n]*\\]\\[(?P[^\\]]+)" offset_field=_extracted_fields_bounds 

It is the stats field that I need to break up into KV pairs. The number of KV pairs varies, and the keys range from 1 to 1 million. I want to sum the counts for each key.


DalJeanis
Legend

Over at regex101.com, that regex looks thoroughly broken. The double escaping may be needed in some circumstances, but I don't believe it is needed for rex within Splunk. The ^ after Digest asserts the beginning of the field... and you have the lookahead there... which seems needlessly complicated. In light of my results, it really IS overcomplicated.

I believe you are running into a limit on Splunk's ability to extract multiple matches into multivalue fields. After futzing around a bit, I realized the obvious... this regex works...

| rex field=george max_match=0 "(?<kvpair>\d*=\d*)"

However, it only succeeds for numbers of kvpairs up to about 25-30, then fails without a message. I suspect there is some sort of catastrophic backtracking, but I can't see why that would be the case. In any case, one workable solution I found was to split the list up into units of 10 kvpairs, then split the individual kvpairs. You can take it from there.

| makeresults | eval _raw=" Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2]"
| rex field=_raw max_match=0 "(?<kvgroup>(\d*=\d*[,\]]){1,10})"
| mvexpand kvgroup
| rex field=kvgroup max_match=10 "(?<kvpair>\d*=\d*)"
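
To take it from there and get the per-key sums the question asks for, here is a minimal sketch, assuming the multivalue kvpair field produced by the search above. It would be appended to that search; the field names rule_id, hits, and total_hits are made up for illustration and are not from the original post.

```
| mvexpand kvpair
| rex field=kvpair "(?<rule_id>\d+)=(?<hits>\d+)"
| stats sum(hits) AS total_hits BY rule_id
```

The mvexpand gives each pair its own result row, the rex splits that pair into a key and a count, and stats adds up the counts for each key across all events. I use \d+ rather than \d* here so that a bare "=" with no digits cannot match.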


plynch52
Explorer

Thanks,
as a newbie I figured there had to be some way. The original regex was generated by Splunk. It parsed the kv pairs.
