Splunk Search

How do I create a capture group that continues until it matches two exact characters in a row?

rbechtold
Communicator

Hey everyone,

This question probably shows my lack of understanding of regex, but it is giving me a headache, and it isn't the first time I've run into this problem. I have a raw log in this format (with the personal information removed):

...field_one="3-5713-064B-AB34", field_two="45", message="This is a test message: it has no real relevance, but I need it to explain the problem I am facing. The log has both numbers and letters, as well as [brackets], {curly brackets}, Periods..., And many other types of punctuation and characters, such as !@#$%^&*( and ). The other problem is there are also "quotation marks". There are lots of "quotation marks" and commas, in the text itself. Therefore breaking by either quotation marks or commas does not work. This is the end of the test message.", uniquely_named_field="1", field_three="20180105"...

The way I'm attempting to extract the data at the moment looks like this:

\.{3}\w+\=\"(?P[^"]+)\"\,\s\w+\=\"(?P\d+)\"\,\s\w+\=\"(?P[^"]+\"[^"]+\"[^"]+\"[^"]+\"[^"]+)\"\,\s\w+\=\"(?P\d+)\"\,\s\w+\=\"(?P\d+)\"\.{3}

I've tested this in regex101 and it appears to extract correctly. However, it only works when the message field ends exactly at the fifth quotation mark, that is, when the message text contains exactly four quotation marks of its own.

I would like to write an extraction that stops either when it reaches a quotation mark followed by a comma (\"\,) or when it reaches uniquely_named_field. However, I can't seem to find a way to do this, and my regex looks terrible.

If anyone could help me out, I would greatly appreciate it.


martin_mueller
SplunkTrust

Here's how you would capture "everything up to the first quote-comma": message="(.*?)",

Not sure what you mean by "or by the uniquely_named_field", do you have an example event where that would be necessary?
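To illustrate the lazy match above outside Splunk, here is a minimal sketch using Python's re module as a stand-in for Splunk's regex engine. The event string is a shortened, made-up version of the sample log, and the variable names are only for illustration.

```python
import re

# Shortened, hypothetical stand-in for the sample event in the question.
event = (
    'field_one="3-5713-064B-AB34", field_two="45", '
    'message="It has "quotation marks" and commas, in the text. End of message.", '
    'uniquely_named_field="1", field_three="20180105"'
)

# (.*?) is lazy: it captures as little as possible, so the match ends at the
# first place where a quotation mark is immediately followed by a comma.
lazy = re.compile(r'message="(.*?)",')
message = lazy.search(event).group(1)
print(message)
```

The embedded quotation marks and the commas inside the text do not end the capture, because none of them form the two-character sequence quote-comma. If the message itself could contain a quote directly followed by a comma, this pattern would stop there, which is where anchoring on the next field name would help.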


niketn
Legend

@rbechtold, based on the details and sample data provided in the question, it seems like you need to use the extract command:

<YourBaseSearch>
| extract kvdelim="=" pairdelim=","

The following is a run-anywhere example based on the sample data provided in the question:

| makeresults
| eval _raw="field_one=\"3-5713-064B-AB34\", field_two=\"45\", message=\"This is a test message: it has no real relevance, but I need it to explain the problem I am facing. The log has both numbers and letters, as well as [brackets], {curly brackets}, Periods..., And many other types of punctuation and characters, such as !@#$%^&*( and ). The other problem is there are also \"quotation marks\". There are lots of \"quotation marks\" and commas, in the text itself. Therefore breaking by either quotation marks or commas does not work. This is the end of the test message.\", uniquely_named_field=\"1\", field_three=\"20180105\""
| extract kvdelim="=" pairdelim=","

micahkemp
Champion

If you know that your message field is followed by the uniquely_named_field field, you could do:

message="(.*)", uniquely_named_field=
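As a rough sketch of why anchoring on the next field name can matter, the Python snippet below (re module standing in for Splunk's regex engine) uses a made-up event whose message contains a quote directly followed by a comma. The event text and variable names are assumptions for illustration only.

```python
import re

# Hypothetical event: the message itself contains the sequence quote-comma.
event = (
    'field_two="45", message="He said "stop", then left.", '
    'uniquely_named_field="1", field_three="20180105"'
)

# Without an anchor, the lazy pattern ends at the first quote-comma it sees.
short = re.search(r'message="(.*?)",', event).group(1)

# Anchored on the following field name, the capture runs to the real end.
anchored = re.search(r'message="(.*)", uniquely_named_field=', event).group(1)

print(short)
print(anchored)
```

Here the unanchored pattern cuts the message off early, while the anchored one returns the whole message. The anchored version relies on uniquely_named_field always following message in the event.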

mayurr98
Super Champion

I did not understand your question. What exactly do you want to extract? Can you give the field name and the value you expect for it, based on the sample event you provided?

