Splunk Search

How to extract curly brackets in regular expression

ssaenger
Communicator

Hi,

I am having a problem extracting fields that have curly brackets {}
I have the log file line;
2015.06.24 11:55:13.567;:;12.34.567.241;:;somehost;:;21947UIGHFKD99HKW8R;:;deviceId;:;F90HSDUC0A49A2C;:;1001;:;Ref=0003313C;:;303;:;320;:;28;:;0xA5;:;co.cab.infra.exception.ApplicationException: tlv ids of DeviceId=DeviceId{DeviceId='27896RAWG96B'} is null or empty; applicationInstanceId=APPLICATION_INSTANCE_3;:;siteId;:;siteUid;:;

and i am using the reg ex to extract the fields;

(?<date_time>\d{4}\.\d{2}\.\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3});:;(?<ip>\d+\.\d+\.\d+\.\d+);:;(?<host>[a-z0-9]+);:;(?<requester_id>[0-9A-F]+);:;(?<req_id>.+);:;(?<drm_domain_id>.+);:;(?<status>.+);:;(?<data>.+);:;(?<request_type>.+);:;(?<device_id>.+);:;(?<latency>.+);:;(?<col12>.+);:;(?<col13>.+);:;(?<site_id>.+);:;(?<site_uid>.+);:;

(Please note that i know the >< are incorrect but its the only way i could get it display on the forum! I am that new to this :))

however Splunk is not extracting col13, for this line. Other lines are extracted fine and i believe its due to the Curly Bracket. i have tried to extract the line by delimiting it /{ and entering all the other characters, however this has not worked.
Obviously the best way would have been to transform it however Splunk only supports one delimiting value and mine is ;:;

Thanks in advance.

Tags (2)
0 Karma
1 Solution

woodcock
Esteemed Legend

Try this:

(?<date_time>\d{4}\.\d{2}\.\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3});:;(?<ip>\d+\.\d+\.\d+\.\d+);:;(?<host>[^;]+);:;(?<requester_id>[^;]+);:;(?<req_id>[^;]+);:;(?<drm_domain_id>[^;]+);:;(?<status>[^;]+);:;(?<data>[^;]+);:;(?<request_type>[^;]+);:;(?<device_id>[^;]+);:;(?<latency>[^;]+);:;(?<col12>[^;]+);:;(?<col13>.+?);:;(?<site_id>[^;]+);:;(?<site_uid>[^;]+);:;

The problem is both greediness and some of your character classes (I use [^;] everywhere instead).

View solution in original post

0 Karma

woodcock
Esteemed Legend

Try this:

(?<date_time>\d{4}\.\d{2}\.\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3});:;(?<ip>\d+\.\d+\.\d+\.\d+);:;(?<host>[^;]+);:;(?<requester_id>[^;]+);:;(?<req_id>[^;]+);:;(?<drm_domain_id>[^;]+);:;(?<status>[^;]+);:;(?<data>[^;]+);:;(?<request_type>[^;]+);:;(?<device_id>[^;]+);:;(?<latency>[^;]+);:;(?<col12>[^;]+);:;(?<col13>.+?);:;(?<site_id>[^;]+);:;(?<site_uid>[^;]+);:;

The problem is both greediness and some of your character classes (I use [^;] everywhere instead).

0 Karma

ssaenger
Communicator

Thanks Woodcock, that worked. 🙂

0 Karma

skoelpin
SplunkTrust
SplunkTrust

You could always use a look-behind.. If your trying to extract the Device ID you can do

(?<Extracted_Field>\{DeviceId\=\'[A-Z0-9]{12}\'\}(?=\s is\null\sor))
0 Karma

richgalloway
SplunkTrust
SplunkTrust

According to www.regex101.com, your regex string is failing in the requester_id field. This regex works.

(?<date_time>\d{4}\.\d{2}\.\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3});:;(?<ip>\d+\.\d+\.\d+\.\d+);:;(?<host>[a-z0-9]+);:;(?<requester_id>[0-9A-Z]+);:;(?<req_id>.+);:;(?<drm_domain_id>.+);:;(?<status>.+);:;(?<data>.+);:;(?<request_type>.+);:;(?<device_id>.+);:;(?<latency>.+);:;(?<col12>.+);:;(?<col13>.+);:;(?<site_id>.+);:;(?<site_uid>.+);:;
---
If this reply helps you, Karma would be appreciated.
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Can’t Make It to Boston? Stream .conf25 and Learn with Haya Husain

Boston may be buzzing this September with Splunk University and .conf25, but you don’t have to pack a bag to ...

Splunk Lantern’s Guide to The Most Popular .conf25 Sessions

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Unlock What’s Next: The Splunk Cloud Platform at .conf25

In just a few days, Boston will be buzzing as the Splunk team and thousands of community members come together ...