Getting Data In

SEDCMD regex for JSON in props.conf doesn't work, but works in regex101 and on the sed command line

0xlc
Path Finder

Hello,

I have a JSON event that looks like this:

https://pastebin.com/xHebS2x3

I need to get rid of the field 'sql_queries', so I am using SEDCMD-whatever = s/regex//g with this regex:

regex101, without escaping:

s/,"sql_queries":"([^"\\]|\\.)*"//g

sed command line, with escaping:

s/,"sql_queries":"\([^"\\]\|\\.\)*"//g

It works on the sed command line and in regex101, but not in Splunk.

Can anyone help, please?


FrankVl
Ultra Champion

I don't see how this "works in regex101"? The match ends somewhere in the middle of the query:
https://regex101.com/r/EQXG0e/1

This might work: ,"sql_queries":".*?,"
https://regex101.com/r/EQXG0e/3
(as double quotes seem to be escaped inside the query). I'm not sure that is perfectly reliable, though: I don't have access to pastebin here, so I can only see the small sample that one of the other commenters put into regex101.
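One detail of the lazy pattern above is easy to check outside Splunk: it also consumes the `,"` that opens the next key, so replacing the match with nothing leaves broken JSON. A minimal sketch in Python, assuming a made-up event (the `user` field and all values are invented, not taken from the pastebin sample); the SEDCMD equivalent would presumably be a replacement of `,"` instead of an empty string, which this sketch does not test in Splunk itself:

```python
import json
import re

# Hypothetical event; the field names and values are invented for illustration.
record = {
    "user": "app",
    "sql_queries": 'SELECT "id", "name" FROM t; COMMIT',
    "@timestamp": "2019-01-01T00:00:00Z",
}
# Compact separators mimic a log line with no spaces between fields.
event = json.dumps(record, separators=(",", ":"))

# The lazy pattern suggested in the post above.
LAZY = re.compile(r',"sql_queries":".*?,"')


def strip_to_empty(text):
    """Replace the match with nothing, as s/regex//g would."""
    return LAZY.sub("", text)


def strip_keep_delimiter(text):
    """Replace the match with ',"' so the next key keeps its opening."""
    return LAZY.sub(',"', text)


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(strip_to_empty(event))        # next key has lost its leading ,"
    print(strip_keep_delimiter(event))  # still parses as JSON
```

The pattern also needs another field to follow sql_queries; if it were the last field in the event there would be no `,"` to stop at.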

And can you share in what sense it is not working in Splunk? Is it not stripping anything, or is it stripping too little / too much in certain cases?


Ranazar
Path Finder

I'm not terribly familiar with regex, but it looks like you might have a reliable pattern of two closing curlies - }} - followed eventually by a double quote. So perhaps:

,"sql_queries":".*?}}.*?"

https://regex101.com/r/IYPCl3/1

So, grab everything until you get to the first set of double closing curlies, and then grab everything after that until the first double quote.
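The behaviour described above can be tried out quickly in Python. This is only a sketch on two invented events (the `user` field, the query text and the token structure are assumptions, not the real data): one whose query contains `}}` and one simpler query that does not.

```python
import json
import re

# The pattern from the post above: up to the first "}}", then to the next quote.
CURLIES = re.compile(r',"sql_queries":".*?}}.*?"')


def make_event(query):
    """Build a compact, hypothetical JSON event around the given query text."""
    record = {
        "user": "app",
        "sql_queries": query,
        "@timestamp": "2019-01-01T00:00:00Z",
    }
    return json.dumps(record, separators=(",", ":"))


def strip_field(text):
    """Remove whatever the pattern matches, as s/regex//g would."""
    return CURLIES.sub("", text)


# Query that ends with a nested structure, so "}}" is present.
with_curlies = make_event('INSERT INTO t VALUES (1) {"token": {"id": 7}} COMMIT')
# Simpler query with no "}}" anywhere.
simple = make_event("SELECT 1")

if __name__ == "__main__":
    print(strip_field(with_curlies))  # sql_queries removed
    print(strip_field(simple))        # unchanged: the pattern never matches
```

On events without `}}` in the query the pattern does not match at all, so the field would be left in place.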


0xlc
Path Finder

As I said, it's not always like that. In the end I asked the devs, and the field order is always the same, so I just grab everything from sql_queries to @timestamp and that's it.
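The exact regex used for this is not shown in the thread, so the following is only one possible sketch of "grab from sql_queries to @timestamp". It assumes that @timestamp always directly follows sql_queries and uses a lookahead so that the `,"@timestamp"` part itself is left in place. The sample events are invented.

```python
import json
import re

# Assumption: "@timestamp" is always the field right after "sql_queries".
# The lookahead checks for it without consuming it.
TO_TIMESTAMP = re.compile(r',"sql_queries":".*?"(?=,"@timestamp")')


def make_event(query):
    """Build a compact, hypothetical JSON event around the given query text."""
    record = {
        "user": "app",
        "sql_queries": query,
        "@timestamp": "2019-01-01T00:00:00Z",
    }
    return json.dumps(record, separators=(",", ":"))


def strip_field(text):
    """Remove the sql_queries field, leaving the following field untouched."""
    return TO_TIMESTAMP.sub("", text)


events = [
    make_event("SELECT 1"),
    make_event('SELECT "id", "name" FROM t; COMMIT'),
    make_event('INSERT INTO t VALUES (1) {"token": {"id": 7}} COMMIT'),
]

if __name__ == "__main__":
    for event in events:
        print(strip_field(event))
```

Because the match is tied to the next field name rather than to the content of the query, it does not depend on COMMIT, curly braces or escaped quotes being present. The trade-off is that it breaks if the field order ever changes.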


somesoni2
Revered Legend

Try something like this: https://regex101.com/r/z6dUPg/1

Also ensure that you're creating the props.conf on the heavy forwarder or the indexers (whichever comes first in your data flow).


0xlc
Path Finder

I have an indexer cluster and I push props.conf from the master after every edit.

That works, but the problem is that not every sql_queries value has 'COMMIT' at the end or { token......, some are simpler.

I have another regex which matches from sql_queries to timestamp (excluded) and it works, but it's too generic.


somesoni2
Revered Legend

You then have to find all (or most) of the patterns that you may encounter and build a regex that handles all of them. If you can provide samples of those, the Splunk community can help you better. It would also be easier for you if the order of the fields (sql_queries and timestamp) could be made static, provided you can control the logging.


0xlc
Path Finder

The regex I posted matches every pattern, but it just doesn't work in Splunk.

Yes, I am going to ask the devs whether the field position is static.


somesoni2
Revered Legend

I would say it doesn't work because of those escaped backslashes (you have to include 3 of them to be counted as 1, as opposed to just 2 in other tools).
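If backslash counting is the problem, one way to sidestep it might be to write the backslash as the hex escape \x5c, which PCRE-style engines accept both inside and outside a character class. The sketch below only checks, in Python, that the hex form matches the same text as the original pattern on an invented event; how Splunk's .conf parsing treats either form is an assumption that would still need testing on an indexer.

```python
import json
import re

# Original pattern from the question (regex101 form).
ORIGINAL = re.compile(r',"sql_queries":"([^"\\]|\\.)*"')
# Same logic with each literal backslash written as the hex escape \x5c.
HEX_FORM = re.compile(r',"sql_queries":"(?:[^"\x5c]|\x5c.)*"')

# Hypothetical event: the query contains escaped quotes and ends in a backslash.
record = {
    "user": "app",
    "sql_queries": 'SELECT "a" FROM t -- trailing backslash \\',
    "@timestamp": "2019-01-01T00:00:00Z",
}
event = json.dumps(record, separators=(",", ":"))


def strip_with(pattern, text):
    """Remove whatever the given compiled pattern matches."""
    return pattern.sub("", text)


if __name__ == "__main__":
    print(strip_with(ORIGINAL, event))
    print(strip_with(HEX_FORM, event))
```

Both forms remove the whole field here, including the escaped quotes and the escaped trailing backslash, so the hex version can be tried in SEDCMD without changing what the regex is meant to match.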
