Getting Data In

Creating dashboards based on field-names rather than field-values in nested-JSON

Explorer

Hi Splunkers,

I have events coming to Splunk Enterprise in the following JSON format:

{
    "ip" : "1.1.1.1"
    "mac" : "010203040506"
    "policies" : {
        "policy_name_1" : {
            "rule_name_in_policy1" : {
                "status" : "Unmatched"
                "timestamp" : "15012456757"
            }
        },
        "policy_name_2" : {
            "rule_name_in_policy2" : {
                "status" : "Matched"
                "timestamp" : "15012446751"
            }
        },
        "policy_name_3" : {
            "rule_name_in_policy3" : {
                "status" : "Matched"
                "timestamp" : "15012456487"
            }
        }
    }
    "username" : "abstract"
}

I want to create a 'matched' dashboard which shows a pie chart conveying "rule_name_in_policy1 is matched by 25 hosts, rule_name_in_policy2 is matched by 3 hosts,... and so on). To achieve this, I can roughly think of a search string that would store the rule_names in a variable_a and possibly do a "timechart count by variable_a". But I don't know how to do this. I also can't figure out how to filter out all instances of (policies.policy_name_x.rule_name_in_policyx.status=Matched).

I'm new to SPL. Can someone please help me with writing the correct search string?

EDIT: Updated the event data format to include double quotes. These double quotes are not allowing me to use the rex command suggested by DalJeanis in his very well-written and clear answer.

0 Karma

Explorer

After some thought on the answer by Dal above where he laid out PCRE to be used by Splunk rex command, I think the event format stated by me in the question, is going to make it too complicated for the user to write such regexes if he wants to add another custom dashboard. Furthermore, there is the issue of the rex command not able to work with double-quotes in event data (although it works on regex 101).

I'm thinking of alternate formats that would simplify search queries for the user. Here is another that I can think of:

{
    "ip":"1.1.1.1",
    "mac":"010203040506",
    "policies":[
        {
            "policy_name":"policy_name1",
            "rule_name":"rule_name2",
            "status":"Matched",
            "timestamp":"15012456487"
        },
        {
            "policy_name":"policy_name2",
            "rule_name":"rule_name2",
            "status":"unmatched",
            "timestamp":"15012456487"
        },
        {
            "policy_name":"policy_name3",
            "rule_name":"rule_name3",
            "status":"Matched",
            "timestamp":"15012456487"
        }
    ],
    "username":"abstract"
}

With the above format, I can write a query as follows and extract all the event data:

index=fsctcenter 
| spath output=rname path=policies{}.rname 
| spath output=status path=policies{}.status 
| eval x=mvzip(rname,status)| mvexpand x 
| eval x = split(x,",") 
| eval rname=mvindex(x,0) 
| eval status=mvindex(x,1) 
| table ip, rname, status

This search string gives me a result like follows:
alt text

Now, a small question remains. Of the three events that can be seen in the image, how do I filter just the one with status=Matched for creating a timechart?

Please excuse my naivete with SPL. I'm still all ears for better event format and better search queries for generating the dashboard I described in the initial question.

0 Karma

SplunkTrust
SplunkTrust

This version extracts the Rule Names and the ips, then gives you a count of the matched ones.

| makeresults 
| eval _raw="{
     ip : 1.1.1.1
     mac : 010203040506
     policies : {       
         policy_name_1 : {
             rule_name_in_policy1 : {
                 status : Unmatched
                 timestamp : 15012456757
             }
         },
         policy_name_1A : {
             rule_name_in_policy1A : {
                 status : Matched
                 timestamp : 15012456757
             }
         },
         policy_name_2 : {
             rule_name_in_policy2 : {
                 status : Matched
                 timestamp : 15012446751
             }
         },
         policy_name_3 : {
             rule_name_in_policy3 : {
                 status : Matched
                 timestamp : 15012456487
             }
         }
     }
     username : abstract
 }"
| appendpipe [| rex mode=sed "s/(\d+\.\d+\.\d+)\.1/\1.2/g"]
| appendpipe [| rex mode=sed "s/(\d+\.\d+)(\.\d+\.\d+)/2.2\2/g s/policy2/policy4/g"]
| rename COMMENT as "The above just enters test data."

| rename COMMENT as "Extract ip and RuleNames (only for Matched) and then stats them up."
| eval myfield=_raw
| rex field=myfield "ip\s+:\s+(?<ip>\d+\.\d+\.\d\.\d+)"
| rex field=myfield "(?<PolicyName>\w+)\s+:\s*{\s*(?<RuleName>\w+)\s+:\s*{\s*status\s+:\s*(?<RuleStatus>Matched)" max_match=0
| stats dc(ip) as hostcount by RuleName

It would not be difficult to extract the timestamps. If you were going to do that, then you probably would also want to extract the unmatched records so that if a host flipped back and forth it would be detected.

I can't tell what the timestamps are - either they are epoch times in July, counting to the 1/10 second, or they are some other format.

0 Karma

Explorer

Thanks a lot Dal for your response. This works perfectly. However, I wanted to know if it would be possible to avoid executing the 'rex' command here on _raw data? The reason is that using rex on _raw data would make my app very tightly coupled with the data format and any future changes in data formats will force me to update all dashboards and saved searched as well. If possible, I would like to avoid such a scenario.

In my case, the event data format is also very flexible and I can change my code to change the event format if that allows writing a simpler search query.

I also forgot to mention that the above KV pairs are all enclosed in double quotes, and although I was able to tweak the regex provided by you to work on double quotes data (checked it via regex 101), I cannot put it in the rex command. It gives me errors (since the rex command cannot accept double quotes?). Here is the tweaked regex I got to work on regex 101:

(?<PolicyName>"[a-zA-Z0-9_ \.]*")\s*:\s*{\s*"rule"\s*:\s*{\s*(?<RuleName>"[a-zA-Z0-9_ \-]*")\s*:\s*{\s*"since"\s*:\s*"[0-9]*","status"\s*:\s*(?<RuleStatus>"Matched")
0 Karma