Why can't I generate KV pairs from nested field?

plynch52
Explorer

Here is a single record

Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2,99296=2]

The KV pairs that I want to extract are inside the last [ ]. So far, all I have been able to retrieve is FIELDn=string, where string is the whole "number=number" KV pair that I want broken into a key and a value.

I have tried transforms.conf (REGEX = ([0-9]+)=([0-9]+) FORMAT = $1::$2) with a REPORT in props.conf to reference this.

Search string is

index=* OR index=_* sourcetype=Shield | rex field=_raw "(?ms)(?=[^N]*(?:Rule Hits Digest|N.*NetDefender Rule Hits Digest))^(?P[^\\[]+)[^\\]\\n]*\\]\\[(?P[^\\]]+)\\]\\[(?P\\d+)\\]\\[(?P\\d+)[^\\]\\n]*\\]\\[(?P[^\\]]+)" offset_field=_extracted_fields_bounds 

It is the stats field that I need to break up into KV pairs. The number of KV pairs varies, and the keys range from 1 to 1 million. I want to sum the counts for each key.


DalJeanis
Legend

Over at regex101.com, that regex looks thoroughly broken. The double escaping may be needed in some circumstances, but I don't believe it is needed for rex within Splunk. The ^ after Digest asserts the beginning of the field... and you have the lookahead there... which seems needlessly complicated. In light of my results, it really IS overcomplicated.

I believe you are running into a problem with the limits of Splunk's ability to extract multiple copies into multivalue fields. After futzing around a bit, I realized the obvious... this regex works...

| rex field=george max_match=0 "(?<kvpair>\d*=\d*)"

However, it only succeeds for numbers of kvpairs up to about 25-30, then fails without a message. I suspect there is some sort of catastrophic backtracking, but I can't see why that would be the case. In any case, one workable solution I found was to split the list up into units of 10 kvpairs, then split the individual kvpairs. You can take it from there.

| makeresults | eval _raw=" Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2]"
| rex field=_raw max_match=0 "(?<kvgroup>(\d*=\d*[,\]]){1,10})"
| mvexpand kvgroup
| rex field=kvgroup max_match=10 "(?<kvpair>\d*=\d*)"
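
For anyone who wants the "take it from there" part spelled out, here is a minimal sketch of one way to finish the job: expand the multivalue kvpair field, split each pair into a key and a count, and sum the counts per key. The field names rule_id, hits and total_hits are just names picked for illustration, and the record is a shortened version of the sample with one key repeated so the summing is visible. It has not been run against the original data, so treat it as a starting point.

```spl
| makeresults
| eval _raw="Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99189=3]"
| rex field=_raw max_match=0 "(?<kvgroup>(\d*=\d*[,\]]){1,10})"
| mvexpand kvgroup
| rex field=kvgroup max_match=10 "(?<kvpair>\d*=\d*)"
| mvexpand kvpair
| rex field=kvpair "(?<rule_id>\d+)=(?<hits>\d+)"
| stats sum(hits) AS total_hits BY rule_id
```

The first five lines are the same chunking approach as above. The second mvexpand gives one result per kvpair, the last rex splits it on the = sign, and stats adds up the counts for each key. Against real events you would replace the makeresults and eval lines with your base search.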


plynch52
Explorer

Thanks,
as a newbie I figured there had to be some way. The original regex was generated by Splunk. It parsed the kv pairs.
