Hi,
I have this data and I'm challenged (not hard to do) on how to get the type extracted. On the first line type=Thermostat is what is needed and writing regex for that is easy. The problem comes on the 2nd line where now type would be equal to "Zone" which is not what we are looking for. The second line type would actually be zoneType=smoke. Is there a way to do this in one regex?
{"id":"1410210064975923625.0","name":"Thermostat","type":"Thermostat","timeStamp":1513238363397,"troubleName":"bootloadFail"}
{"id":"117.0","name":"Zone 5","type":"Zone","timeStamp":1517253504269,"troubleName":"senLowBat","zoneType":"smoke"}
Hey try this search:
...|rex field=_raw "name\":\"(?<Name>\w+)"|rex field=_raw "zoneType\":\"(?<Name1>\w+)"|eval Name=coalesce(Name1,Name)
hey @dbcase
Try this run anywhere search
| makeresults
| eval _raw="{\"id\":\"1410210064975923625.0\",\"name\":\"Thermostat\",\"type\":\"Thermostat\",\"timeStamp\":1513238363397,\"troubleName\":\"bootloadFail\"}"
| append
[| makeresults
| eval _raw="{\"id\":\"117.0\",\"name\":\"Zone 5\",\"type\":\"Zone\",\"timeStamp\":1517253504269,\"troubleName\":\"senLowBat\",\"zoneType\":\"smoke\"}"]
| rex field=_raw max_match=0 "(zoneType|type)\":\"(?<dtype>\w+)"
| mvexpand dtype
| stats count by dtype
| where dtype!="Zone"
let me know if this helps!
It is tough to write a regular expression for JSON or XML files, as the field names are built-in. Here are a couple of choices that might be easier than writing regexes:
Option 1:
Look at the great answer provided by @somesoni2 here
https://answers.splunk.com/answers/489517/what-is-the-best-approach-for-a-search-time-extrac.html
Option 2:
Copy a sample of the data onto a test box. Bring the data into Splunk using the Add Data wizard so that you can use the Data Preview. In the preview, choose "json_no_timestamp" for the sourcetype. Then click the Save As button next to the sourcetype name - and create your own sourcetype. For this example, name it "detector." Tell Splunk where to save this sourcetype - this defines where you will find the resulting props.conf entries. For example, if you choose the app "Searching and Reporting," you will find your new sourcetype in SPLUNK_HOME/etc/apps/search/local/props.conf. It will be located in a stanza named "[detector]" Now, Splunk should be able to identify all the fields in your data properly.
As long as you are in the Data Preview, you can open some of the menus on the left and tell Splunk where to find the timestamp, etc. These settings will also be stored in the "[detector]" stanza of props.conf. When you are happy with the results, copy the stanza into production.
Option 3:
Like option 2, but use "json" instead of "json_no_timestamp" for the base sourcetype. In some ways, this is the easiest method BUT: it creates "index time fields." This means that the data will take up more space on disk and may be slower to retrieve (how much disk and how much slower depends on the volume of data). So I would avoid this option in favor of the other ones.
HTH
Hi Iguinn,
In this case the appearance is a bit deceiving, whereas the string I posted looks like JSON, the real raw data looks like this
{"analyticType":"CustomAnalytic","buildTarget":"blah","product":"blah","clientSessionId":"DXSGCPC-CEAJTMA","Properties":{"index":0,"args":["{\"id\":\"1123.0\",\"name\":\"Garage Door 2\",\"type\":\"Zone\",\"timeStamp\":1517014287452,\"troubleName\":\"sensCom\",\"zoneType\":\"door\"}"],"category":"Event"}}
Ugly I know..... Sadly I can't get the developer to change the format. So Splunk shows it as JSON like but all the backslashes throws it off.
Lets try this again
{"analyticType":"CustomAnalytic","buildTarget":"blah","product":"blah","clientSessionId":"DXSGCPC-CEAJTMA","Properties":{"index":0,"args":["{\"id\":\"1123.0\",\"name\":\"Garage Door 2\",\"type\":\"Zone\",\"timeStamp\":1517014287452,\"troubleName\":\"sensCom\",\"zoneType\":\"door\"}"],"category":"Event"}}
In @somesoni2 's answer, he gives a transform that will look for key-value pairs in the data:
REGEX = \"([^\"]+)\":\"([^\"]+)\"
FORMAT = $1::$2
This is a heckuva regular expression, but Splunk interprets it as "for every data pair that is found, where the two values are surrounded by double-quotes and separated by a colon, take the first item as the field name and the second item as the field value. You could change his regular expression to add "and there could be an optional backslash in front of the double-quote" by changing the regular expression to:
REGEX = \"([^\"]+)\\?\":\\?\"([^\"]+)\\?\"
(I think!) Try it and see what happens. I like this better than the other regexes, because it can handle events that have different fields - it doesn't require that all events have the same fields or that the fields are named consistently; it just extracts what it finds in each event.
The good thing is that field extractions happen at search time, so you can change them as often as you like without having to re-index the data...
Attempt #2, nada
(?!"type":"Zone")(type|zoneType)...(?<dtype>\w+)
Hey try this search:
...|rex field=_raw "name\":\"(?<Name>\w+)"|rex field=_raw "zoneType\":\"(?<Name1>\w+)"|eval Name=coalesce(Name1,Name)
Hi,
Interesting approach, didn't think of doing it that way. Unfortunately though I can't use the name, this particular one the name matches the type but other records the name could be Porch or Downstairs, or Jimmy's room.
not sure does it works or not!
Hi
Sorry no, by using the name instead of the type, where I would expect Thermostat, sometimes I get Downstairs because the name is a user input description field where the type is the type of device. They can match but often they don't
Does that make sense?
This query works, its just inefficient
index=wholesale_app CustomAnalytic Properties.index=0|rename Properties.args{} as pr |rex field=pr "(type|zoneType)...(?<dtype>\w+)"|where dtype!="Zone"|stats count by dtype
check that, the regex in that query doesn't work, grrrrrrr
ohh..my mistake you are expecting type and not name
...|rex field=_raw "type\":\"(?<dtype>\w+)"|rex field=_raw "zoneType\":\"(?<dtype1>\w+)"|eval dtype=coalesce(dtype1,dtype)
if I understood it correctly...
PERFECT! Many thanks!!!!
now it works
index=wholesale_app CustomAnalytic Properties.index=0|rename Properties.args{} as pr |rex max_match=10 field=pr "(type|zoneType)...(?<dtype>\w+)"|stats count by dtype|where dtype!="Zone"
I have a regex that looks like this, the trouble is it still matches type=Zone. I could filter out type=Zone post regex I guess but that seems inefficient.
(type|zoneType)...(?<dtype>\w+)