Splunk Search

How to get the type extracted in regex?

Motivator

Hi,

I have this data and I'm challenged (not hard to do) on how to get the type extracted. On the first line type=Thermostat is what is needed and writing regex for that is easy. The problem comes on the 2nd line where now type would be equal to "Zone" which is not what we are looking for. The second line type would actually be zoneType=smoke. Is there a way to do this in one regex?

{"id":"1410210064975923625.0","name":"Thermostat","type":"Thermostat","timeStamp":1513238363397,"troubleName":"bootloadFail"}   
     {"id":"117.0","name":"Zone 5","type":"Zone","timeStamp":1517253504269,"troubleName":"senLowBat","zoneType":"smoke"}    
Tags (3)
0 Karma
1 Solution

Super Champion

Hey try this search:

...|rex field=_raw "name\":\"(?<Name>\w+)"|rex field=_raw "zoneType\":\"(?<Name1>\w+)"|eval Name=coalesce(Name1,Name)

View solution in original post

0 Karma

SplunkTrust
SplunkTrust

hey @dbcase

Try this run anywhere search

| makeresults 
| eval _raw="{\"id\":\"1410210064975923625.0\",\"name\":\"Thermostat\",\"type\":\"Thermostat\",\"timeStamp\":1513238363397,\"troubleName\":\"bootloadFail\"}" 
| append 
    [| makeresults 
    | eval _raw="{\"id\":\"117.0\",\"name\":\"Zone 5\",\"type\":\"Zone\",\"timeStamp\":1517253504269,\"troubleName\":\"senLowBat\",\"zoneType\":\"smoke\"}"] 
| rex field=_raw max_match=0 "(zoneType|type)\":\"(?<dtype>\w+)" 
| mvexpand dtype 
| stats count by dtype 
| where dtype!="Zone"

let me know if this helps!

0 Karma

Legend

It is tough to write a regular expression for JSON or XML files, as the field names are built-in. Here are a couple of choices that might be easier than writing regexes:

Option 1:

Look at the great answer provided by @somesoni2 here
https://answers.splunk.com/answers/489517/what-is-the-best-approach-for-a-search-time-extrac.html

Option 2:

Copy a sample of the data onto a test box. Bring the data into Splunk using the Add Data wizard so that you can use the Data Preview. In the preview, choose "json_no_timestamp" for the sourcetype. Then click the Save As button next to the sourcetype name - and create your own sourcetype. For this example, name it "detector." Tell Splunk where to save this sourcetype - this defines where you will find the resulting props.conf entries. For example, if you choose the app "Searching and Reporting," you will find your new sourcetype in SPLUNK_HOME/etc/apps/search/local/props.conf. It will be located in a stanza named "[detector]" Now, Splunk should be able to identify all the fields in your data properly.

As long as you are in the Data Preview, you can open some of the menus on the left and tell Splunk where to find the timestamp, etc. These settings will also be stored in the "[detector]" stanza of props.conf. When you are happy with the results, copy the stanza into production.

Option 3:
Like option 2, but use "json" instead of "json_no_timestamp" for the base sourcetype. In some ways, this is the easiest method BUT: it creates "index time fields." This means that the data will take up more space on disk and may be slower to retrieve (how much disk and how much slower depends on the volume of data). So I would avoid this option in favor of the other ones.

HTH

0 Karma

Motivator

Hi Iguinn,

In this case the appearance is a bit deceiving, whereas the string I posted looks like JSON, the real raw data looks like this
{"analyticType":"CustomAnalytic","buildTarget":"blah","product":"blah","clientSessionId":"DXSGCPC-CEAJTMA","Properties":{"index":0,"args":["{\"id\":\"1123.0\",\"name\":\"Garage Door 2\",\"type\":\"Zone\",\"timeStamp\":1517014287452,\"troubleName\":\"sensCom\",\"zoneType\":\"door\"}"],"category":"Event"}}

Ugly I know..... Sadly I can't get the developer to change the format. So Splunk shows it as JSON like but all the backslashes throws it off.

0 Karma

Motivator

Lets try this again

{"analyticType":"CustomAnalytic","buildTarget":"blah","product":"blah","clientSessionId":"DXSGCPC-CEAJTMA","Properties":{"index":0,"args":["{\"id\":\"1123.0\",\"name\":\"Garage Door 2\",\"type\":\"Zone\",\"timeStamp\":1517014287452,\"troubleName\":\"sensCom\",\"zoneType\":\"door\"}"],"category":"Event"}}
0 Karma

Legend

In @somesoni2 's answer, he gives a transform that will look for key-value pairs in the data:

 REGEX = \"([^\"]+)\":\"([^\"]+)\" 
 FORMAT = $1::$2

This is a heckuva regular expression, but Splunk interprets it as "for every data pair that is found, where the two values are surrounded by double-quotes and separated by a colon, take the first item as the field name and the second item as the field value. You could change his regular expression to add "and there could be an optional backslash in front of the double-quote" by changing the regular expression to:

 REGEX = \"([^\"]+)\\?\":\\?\"([^\"]+)\\?\" 

(I think!) Try it and see what happens. I like this better than the other regexes, because it can handle events that have different fields - it doesn't require that all events have the same fields or that the fields are named consistently; it just extracts what it finds in each event.

The good thing is that field extractions happen at search time, so you can change them as often as you like without having to re-index the data...

0 Karma

Motivator

Attempt #2, nada

(?!"type":"Zone")(type|zoneType)...(?<dtype>\w+)
0 Karma

Super Champion

Hey try this search:

...|rex field=_raw "name\":\"(?<Name>\w+)"|rex field=_raw "zoneType\":\"(?<Name1>\w+)"|eval Name=coalesce(Name1,Name)

View solution in original post

0 Karma

Motivator

Hi,

Interesting approach, didn't think of doing it that way. Unfortunately though I can't use the name, this particular one the name matches the type but other records the name could be Porch or Downstairs, or Jimmy's room.

0 Karma

Super Champion

not sure does it works or not!

0 Karma

Motivator

Hi

Sorry no, by using the name instead of the type, where I would expect Thermostat, sometimes I get Downstairs because the name is a user input description field where the type is the type of device. They can match but often they don't

Does that make sense?

This query works, its just inefficient

index=wholesale_app CustomAnalytic Properties.index=0|rename Properties.args{} as pr |rex field=pr "(type|zoneType)...(?<dtype>\w+)"|where dtype!="Zone"|stats count by dtype
0 Karma

Motivator

check that, the regex in that query doesn't work, grrrrrrr

0 Karma

Super Champion

ohh..my mistake you are expecting type and not name

...|rex field=_raw "type\":\"(?<dtype>\w+)"|rex field=_raw "zoneType\":\"(?<dtype1>\w+)"|eval dtype=coalesce(dtype1,dtype)

if I understood it correctly...

0 Karma

Motivator

PERFECT! Many thanks!!!!

0 Karma

Motivator

now it works

index=wholesale_app CustomAnalytic Properties.index=0|rename Properties.args{} as pr |rex max_match=10 field=pr "(type|zoneType)...(?<dtype>\w+)"|stats count by dtype|where dtype!="Zone"
0 Karma

Motivator

I have a regex that looks like this, the trouble is it still matches type=Zone. I could filter out type=Zone post regex I guess but that seems inefficient.

(type|zoneType)...(?<dtype>\w+)
0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!