Use CLONE_SOURCETYPE only for matching events

patpro
Path Finder

Hello, I'm currently trying to convert some mixed-text events into JSON. The log file is made of some pure text log lines and some other lines that start with plain text and end with some JSON.

I have created a transforms.conf rule to extract the JSON and to clone the event into _json sourcetype:

[json_extract_rspamd]
SOURCE_KEY = _raw
DEST_KEY = _raw
LOOKAHEAD = 10000
#REGEX = ^([^{]+)({.+})$
REGEX = ^(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d) (#\d+)\(([^)]+)\) ([^;]+); lua[^{]+{(.+})$
FORMAT = {"date":"$1","ida":"$2","process":"$3","idb":"$4",$5
CLONE_SOURCETYPE = _json

 

This works, but unfortunately it also clones every event from that log file. Is there a way to trigger CLONE_SOURCETYPE only when the REGEX matches?

isoutamo
SplunkTrust

Hi

When you use CLONE_SOURCETYPE, it always clones all events. If you don't need all of them, you must filter out the unwanted ones and format only the events you need as JSON.

You should never use _json as your real sourcetype. It's there just for reference. You should always define your own sourcetype names, even if the events are, for example, in JSON format!

r. Ismo

PickleRick
SplunkTrust
* If CLONE_SOURCETYPE is used as part of a transform, the transform creates a
  modified duplicate event for all events that the transform is applied to via
  normal props.conf rules.

I draw your attention to "for all events that the transform is applied to via normal props.conf rules".

So either restrict the transform to the relevant events by host/source matching, if possible, or clone all events and then filter out the ones you don't need (not the prettiest idea, I know).

And yes, @isoutamo's remark about _json is a good point.
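
The "clone all events and then filter" option could look roughly like the sketch below. It is untested and rests on several assumptions: the sourcetype names rspamd and rspamd:json and the transform name rspamd_json_drop_unmatched are made up for illustration, CLONE_SOURCETYPE in json_extract_rspamd is assumed to be changed from _json to rspamd:json, and clones of events that did not match the REGEX are assumed to keep their original plain-text _raw (so they do not start with a curly brace).

```ini
# props.conf (sourcetype names are hypothetical)
[rspamd]
# Attach the cloning transform by sourcetype. According to transforms.conf.spec,
# host/source stanzas would be matched a second time by the cloned events.
TRANSFORMS-clone_json = json_extract_rspamd

[rspamd:json]
# This stanza only sees the clones, because they carry the new sourcetype.
TRANSFORMS-drop_unmatched = rspamd_json_drop_unmatched

# transforms.conf
[rspamd_json_drop_unmatched]
# A clone that was not rewritten by FORMAT does not begin with "{":
# route it to nullQueue so it is never indexed.
REGEX = ^[^{]
DEST_KEY = queue
FORMAT = nullQueue
```

The original events stay in the rspamd sourcetype untouched, and only the rewritten JSON clones should survive in rspamd:json. It is worth checking on a test index that the filter really runs after the clone has been rewritten.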

patpro
Path Finder

Well, I think the part I’ve misunderstood is

via normal props.conf rules

I thought

all events that the transform is applied to

was about the actual transform (i.e. the REGEX + FORMAT), so I didn’t understand why events that don't match the REGEX would get cloned too.

Now I get it, I’ll try and find a way to work around that later.

About copying _json into a new sourcetype and using that copy in my config: what would be the gain (apart from following best practice)?

PickleRick
SplunkTrust

_json is the generic sourcetype that Splunk applies to events when it detects them as JSON, so relying on it is not the best idea.

It's best to be as specific as possible with your definition. Ideally, a sourcetype should define the set of parameters responsible for proper event breaking and timestamp recognition (the so-called Great Eight).

So while _json is fairly generic, you should have your own specific settings, for example for time extraction.

Another thing is that even though the format of the event might just be JSON, specific sourcetypes can have different additional aliases or calculated fields defined (for example for CIM compliance). So you want to have your events "pinned" to the specific sourcetype instead of the generic _json.
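
As an illustration of what such a specific sourcetype might look like, here is a minimal, untested sketch. The stanza name rspamd:json is hypothetical, and the timestamp settings assume that every event starts with {"date":"YYYY-MM-DD HH:MM:SS", as produced by the FORMAT in the question. The eight settings before KV_MODE are the ones usually referred to as the Great Eight.

```ini
# props.conf (stanza name is hypothetical)
[rspamd:json]
# Event breaking: one event per line, no line merging.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 10000
# Timestamp recognition: read the "date" value at the start of the event.
TIME_PREFIX = ^\{"date":"
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
# Event breaking on universal forwarders.
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)
# Search-time JSON field extraction.
KV_MODE = json
```

These settings take effect when data is ingested directly under this sourcetype. Whether line breaking and timestamp extraction are re-evaluated for events that arrive via CLONE_SOURCETYPE is something to verify, since a clone may simply inherit the timestamp of the original event.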

patpro
Path Finder

Okay, thank you. I am quite happy with the default JSON handling, but I’ll take a look and maybe do some tuning.

Thanks again
