Extract fields in JSON during index time

Contributor

Hi,
I'm a newbie to Splunk field extractions and would appreciate any help on this.
I have JSON-format logs like the ones below:

I want source and tag to be fields, i.e. they should not appear in the events but as separate fields, the way default fields appear on the left-hand side of the UI. I also want the word "line:" removed, so that basically only my line event appears in Splunk. How can I achieve this?
I believe props.conf and transforms.conf should be the solution, but I don't know how to approach that. What should the regex in my transforms capture? I don't understand what my regex should do.

Re: Extract fields in JSON during index time

Contributor

RAW DATA:

{"line":"[ERROR ] CWWKS9660E: The orb element with the defaultOrb id attribute requires a user registry but no user registry became available within 10 seconds. As a result, no application will start. Ensure that you have configured an appropriate user registry in the server.xml file.","source":"stderr","tag":"itec-artifactory.fmr.com:6555/com.fmr.pl000123.demo.actionate:0.0.1-14/ActionateDEVACTIONATE.1.385y3873nb5k4m7xsmwxokgum/92e6e10df174"}
{"line":"[AUDIT ] CWWKS4104A: LTPA keys created in 1.184 seconds. LTPA key file: /opt/ibm/wlp/output/defaultServer/resources/security/ltpa.keys","source":"stdout","tag":"itec-artifactory.fmr.com:6555/com.fmr.pl000123.demo.actionate:0.0.1-14/ActionateDEVACTIONATE.1.385y3873nb5k4m7xsmwxokgum/92e6e10df174"}
{"line":"[AUDIT ] CWWKZ0058I: Monitoring dropins for applications. ","source":"stdout","tag":"itec-artifactory.fmr.com:6555/com.fmr.pl000123.demo.actionate:0.0.1-14/ActionateDEVACTIONATE.1.385y3873nb5k4m7xsmwxokgum/92e6e10df174"}
{"line":"[ERROR ] CWWKG0074E: Unable to update the configuration for jndiReferenceEntry with the unique identifier customDataSourceFactoryEntry because of the exception: The value jdbc/actionateDB for attribute jndiName is not unique.","source":"stderr","tag":"itec-artifactory.fmr.com:6555/com.fmr.pl000123.demo.actionate:0.0.1-14/ActionateDEVACTIONATE.1.385y3873nb5k4m7xsmwxokgum/92e6e10df174"}

Re: Extract fields in JSON during index time

Contributor

Could someone kindly help me write the regex for source and tag? I'm finding it very difficult to frame since this is new to me.

Re: Extract fields in JSON during index time

Splunk Employee

I don't believe you'll need any regex, based on what I'm seeing, though perhaps I don't understand exactly what you want to display. I am using the latest Splunk 6.5, and this is what I get when I ingest your events and assign _json as the sourcetype. That is simply the raw event viewer.

The fields are being created properly. Look at the left side of the screenshot above to see the extracted fields. You can then simply use the table command to display the data the way you'd like to see it.
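
As a rough illustration outside Splunk of what JSON field extraction produces for these events, the Python sketch below parses one of the raw events posted earlier in the thread and lists each key with its value. It only mimics the idea that every top-level JSON key becomes a field; it is not how Splunk implements the _json sourcetype.

```python
import json

# One raw event copied from the sample data posted earlier in the thread.
event = (
    '{"line":"[AUDIT ] CWWKZ0058I: Monitoring dropins for applications. ",'
    '"source":"stdout",'
    '"tag":"itec-artifactory.fmr.com:6555/com.fmr.pl000123.demo.actionate:0.0.1-14/'
    'ActionateDEVACTIONATE.1.385y3873nb5k4m7xsmwxokgum/92e6e10df174"}'
)

# Each top-level JSON key becomes a field name, each value a field value.
fields = json.loads(event)

for name, value in fields.items():
    print(f"{name} = {value}")
```

The three names printed (line, source, tag) correspond to the fields the poster wants to see in the field sidebar.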

Re: Extract fields in JSON during index time

Contributor

Hi @sdaniels,

I did that. But I don't want source and tag to be displayed in events. They should be as only fields on the left side.
Is that possible?

Re: Extract fields in JSON during index time

Splunk Employee

Responded below. Thanks.

Re: Extract fields in JSON during index time

Splunk Employee

My reply wasn't posting in the comments section, so I'm responding to your comment here.

Sure. Why do you want it out of the raw event if it doesn't affect searching and viewing the data the way you want? In props.conf you can use the SEDCMD setting:

http://docs.splunk.com/Documentation/Splunk/6.2.4/Data/Anonymizedatausingconfigurationfiles#Anonymiz...

The link above shows how to anonymize data using a sed script: match a pattern and replace it. In your case, you replace it with nothing. If you do this, you may then have to create a regex to pull out the source and tag fields manually, though I'm not sure. Right now the _json sourcetype is taking care of that for you. Try it out. You can use Regexr.com to experiment with regex matching if you need to change anything.

Something like this in props.conf to remove source, and then something similar for tag:
SEDCMD-dumpsrc = s/,"source":"[^"]*"//g
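
To get a feel for what a sed-style substitution of this kind does to an event, here is a small Python sketch run outside Splunk. It assumes a pattern of the form s/,"source":"[^"]*"//g and applies the equivalent `re.sub` to a sample event in the thread's format; the tag value "example-tag" is a placeholder, not real data.

```python
import json
import re

# A sample event in the thread's format; "example-tag" is a placeholder value.
raw = (
    '{"line":"[AUDIT ] CWWKZ0058I: Monitoring dropins for applications. ",'
    '"source":"stdout","tag":"example-tag"}'
)

# Python equivalent of the assumed sed expression s/,"source":"[^"]*"//g
stripped = re.sub(r',"source":"[^"]*"', "", raw)
print(stripped)

# The result is still valid JSON, but the source key is gone from the text.
remaining = json.loads(stripped)
print(sorted(remaining))
```

After the substitution only line and tag remain in the event text, which is why anything that reads fields from the rewritten text can no longer see source.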

Re: Extract fields in JSON during index time

Contributor

Hi @sdaniels,

I couldn't attach an image in the comments section, so I'm responding here in the answers section.
Thank you for the response, but I believe that doesn't solve our customer's requirement completely.
As you said, I can use SEDCMD to remove the word "line:".
But I want only the parts highlighted below to appear in the _raw events. Is that possible? How do we achieve that?

When I perform the search, the values of source and tag should not appear in the _raw events; they should appear only as extracted fields.

I tried the below props and transforms but it doesn't seem to work. Could you please help?

PROPS

[httpevent]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=json
KV_MODE=none
SHOULD_LINEMERGE=true
category=Structured
disabled=false
pulldown_type=true
TRANSFORMS-fields = field1,field2

TRANSFORMS

[field1]
REGEX = (?:[^"\n]*"){7}(?P<source>[^"]+)
FORMAT = source::$1

[field2]
REGEX = (?:[^,\n]*,){2}"\w+":"(?P<tag>[^"]+)
FORMAT = tag::$1
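
One way to check what these two patterns actually capture is to run them against the sample events outside Splunk. The Python sketch below does that, with the capture written as a plain group (an assumption, since FORMAT refers to `$1`). Python's `re` module is not the same engine Splunk uses, but these particular constructs should behave the same way. The helper name `first_group` is made up for this example.

```python
import re

# Patterns from the transforms above; plain capture groups are an assumption.
SOURCE_RE = re.compile(r'(?:[^"\n]*"){7}([^"]+)')
TAG_RE = re.compile(r'(?:[^,\n]*,){2}"\w+":"([^"]+)')

# Tag value shared by the sample events posted in the thread.
TAG_VALUE = (
    "itec-artifactory.fmr.com:6555/com.fmr.pl000123.demo.actionate:0.0.1-14/"
    "ActionateDEVACTIONATE.1.385y3873nb5k4m7xsmwxokgum/92e6e10df174"
)

# Sample event whose line text contains no comma.
no_comma = (
    '{"line":"[AUDIT ] CWWKZ0058I: Monitoring dropins for applications. ",'
    '"source":"stdout","tag":"' + TAG_VALUE + '"}'
)

# Sample event whose line text contains a comma ("As a result, no ...").
with_comma = (
    '{"line":"[ERROR ] CWWKS9660E: The orb element with the defaultOrb id '
    'attribute requires a user registry but no user registry became available '
    'within 10 seconds. As a result, no application will start. Ensure that you '
    'have configured an appropriate user registry in the server.xml file.",'
    '"source":"stderr","tag":"' + TAG_VALUE + '"}'
)


def first_group(pattern, event):
    """Return the first capture group of the first match, or None."""
    match = pattern.search(event)
    return match.group(1) if match else None


for label, event in (("no comma", no_comma), ("with comma", with_comma)):
    print(label, "source ->", first_group(SOURCE_RE, event))
    print(label, "tag    ->", first_group(TAG_RE, event))
```

On the event without a comma, both patterns pick up the intended values. On the event with a comma inside the line text, the tag pattern lands on the source value instead, because it counts commas rather than JSON keys. A quote inside the line text would throw off the source pattern in the same way.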

Re: Extract fields in JSON during index time

Splunk Employee

I'm not sure what you are trying to accomplish here. If you only want the part highlighted in yellow to appear in the raw message, you'd need to modify _raw and delete the rest using SEDCMD. The fields that appear on the lower left of the Search page are extracted from _raw. If you remove data from _raw, it's not available to create fields, so you wouldn't have fields for source and tag. Is there a security concern here? Is it about abstracting away complexity from the user? Why does your customer want it done this particular way?

Re: Extract fields in JSON during index time

Contributor

Hi @sdaniels,

Basically, we had earlier indexed the Dynatrace collector logs for monitoring, and these logs appeared in their normal format in Splunk.
Now the Dynatrace collectors (in the images above) are running in Docker containers. Since the collectors were dockerized, their logs appear in JSON format. We are trying to see whether we can make this JSON appear like the old, regular non-JSON collector logs. Is that possible?
