Hi all,
I am trying to extract the values that follow the keys context, userid, username, and groupid.
Sample partial event:
{ "type": "login","context": "Rsomeserver:8877-T1670321752-P18407-T030-C000025-S38","sequence": 998,"message": { "state": "ok","agent": true,"userid": "User0000000949","loginid": "somelogin101","ownerid": "system","username": "John Smith","cssurl": "[\"/css/somepage.css\",\"/branding/\"]","groupid": "Group0000000945","windows": [ {"name":"something","id":"someid","url":"/someurl//
I started with this approach:
"context": "(?<SessionID>[^\"]*)".*?"username"+: "(?<Username>[^\"]*)"
This seems to compile on regex101, but rex throws an error:
Error in 'SearchParser': Missing a search command before '^'. Error at position '141' of search query 'search index=<removed> ("\"login\"\,\"contex...{snipped} {errorcontext = ?<userid>[^\"]*)"}'.
My aim is to then use this data to join on the context value with another search, but first I'm looking for help on where I'm going wrong with my rex.
As the JSON seems to be truncated, I don't think I can treat it as JSON, so any help with a rex extraction would be greatly appreciated.
As a further comment: join is rarely the right way to correlate two searches in Splunk.
It has limitations (its subsearch is subject to result-count and run-time limits) and can silently give you the wrong results.
It's best to start by solving a join problem with stats. A typical starting point is
(search data_set_1) OR (search data_set_2)
| get_session_id_from_data_here
| stats values(*) as * by sessionId
Getting the session id depends on which data set the event comes from. Typically this involves something like
| rex field=data_set_1_field "(?<id_1>session id from here)"
| rex field=data_set_2_field "(?<id_2>session id from here)"
| eval sessionId=coalesce(id_1, id_2)
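The idea behind the stats approach can be sketched outside Splunk. The following Python is only an illustration of the grouping logic, not SPL; the event dictionaries, the field values, and the `coalesce` helper are all made up for the example. Events from both data sets go into one pool, each one gets a common session id, and the field values are then collected per session.

```python
from collections import defaultdict

# Hypothetical events from two data sets; each carries its session id
# under a different field name (id_1 or id_2).
events = [
    {"id_1": "S38", "action": "click"},
    {"id_2": "S38", "username": "John Smith"},
    {"id_1": "S39", "action": "view"},
]


def coalesce(*values):
    """Return the first value that is not None, like eval's coalesce()."""
    for value in values:
        if value is not None:
            return value
    return None


# Rough equivalent of: stats values(*) as * by sessionId
merged = defaultdict(lambda: defaultdict(set))
for event in events:
    session_id = coalesce(event.get("id_1"), event.get("id_2"))
    if session_id is None:
        continue  # events without a session id fall out of the grouping
    for field, value in event.items():
        merged[session_id][field].add(value)
```

Because everything is grouped in a single pass over one pool of events, no subsearch is involved, which is why this pattern avoids the limits that join runs into.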
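The quotes inside your rex string need to be escaped. The regex is passed to rex as a double-quoted string, so an unescaped " ends the string early and the search parser tries to read the rest as search syntax.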
| rex "context\":\s\"(?<SessionID>[^\"]*)\".*?\"username\"+:\s\"(?<Username>[^\"]*)"
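As a sanity check outside Splunk, the pattern itself can be run against the sample event. The sketch below uses Python rather than SPL, so treat it as an illustration only: Python's `re` module spells named groups `(?P<name>...)` instead of `(?<name>...)`, and the variable names are made up for this example. It suggests the regex logic was sound and only the quoting inside the rex string was at fault.

```python
import re

# The truncated sample event from the question, kept as raw strings so the
# backslash-escaped quotes inside cssurl survive unchanged.
event = (
    r'{ "type": "login","context": "Rsomeserver:8877-T1670321752-P18407-T030-C000025-S38",'
    r'"sequence": 998,"message": { "state": "ok","agent": true,"userid": "User0000000949",'
    r'"loginid": "somelogin101","ownerid": "system","username": "John Smith",'
    r'"cssurl": "[\"/css/somepage.css\",\"/branding/\"]","groupid": "Group0000000945",'
    r'"windows": [ {"name":"something","id":"someid","url":"/someurl//'
)

# Same pattern as the rex above, with Python's named-group syntax.
pattern = re.compile(
    r'"context":\s"(?P<SessionID>[^"]*)".*?"username"+:\s"(?P<Username>[^"]*)"'
)

match = pattern.search(event)
session_id = match.group("SessionID")
username = match.group("Username")
```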
So my aim is to execute the below, which should tally up the number of events that a given "context" has executed, and subsequently logged. This context(id) is another name for a session.
index=myindex ("events") OR ("events2") | rex "context.{3}\"(?<context>.[a-zA-Z0-9_:-]+)" | stats count by context
I'd then like to join the above on context with the search below, so that I can get the user details related to the above results
index=myindex ("\"login\"\,\"context\"") AND ("username") | rex "context\":\s\"(?<context>[^\"]*)\".*?\"userid\"+:\s\"(?<userid>[^\"]*)\".*?\"username\"+:\s\"(?<username>[^\"]*)\".*?\"groupid\"+:\s\"(?<groupid>[^\"]*)" | table context userid groupid username
I'd then like to only show unique rows based on the userid
And finally, I'd then like to be able to show a count of the unique rows above
First, if you have any influence at all on the developers, persuade, plead with, or beg them to make the logs complete. Second, because you are confident that groupid is always included in the login event, I would recommend mending the partial JSON into conformant objects, like this
| rex mode=sed "s/(\"groupid\": *\"[^\"]+\"),.*/\1}}/"
```| eval valid = if(json_valid(_raw), "yes", "no")```
| spath
Your sample input now becomes
context | message.agent | message.cssurl | message.groupid | message.loginid | message.ownerid | message.state | message.userid | message.username | sequence | type |
Rsomeserver:8877-T1670321752-P18407-T030-C000025-S38 | true | ["/css/somepage.css","/branding/"] | Group0000000945 | somelogin101 | system | ok | User0000000949 | John Smith | 998 | login |
This would be much easier to handle.
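To see why the mending step works, the same substitution can be tried outside Splunk. This is a minimal Python sketch, not SPL, and the variable names are made up for the example: it cuts the event right after the groupid pair, closes the two open objects, and then parses the result as ordinary JSON.

```python
import json
import re

# The truncated sample event from the question, kept as raw strings so the
# backslash-escaped quotes inside cssurl survive unchanged.
raw = (
    r'{ "type": "login","context": "Rsomeserver:8877-T1670321752-P18407-T030-C000025-S38",'
    r'"sequence": 998,"message": { "state": "ok","agent": true,"userid": "User0000000949",'
    r'"loginid": "somelogin101","ownerid": "system","username": "John Smith",'
    r'"cssurl": "[\"/css/somepage.css\",\"/branding/\"]","groupid": "Group0000000945",'
    r'"windows": [ {"name":"something","id":"someid","url":"/someurl//'
)

# Same substitution as the sed expression: keep everything up to and including
# the groupid pair, drop the rest, and close the "message" and outer objects.
mended = re.sub(r'("groupid": *"[^"]+"),.*', r'\1}}', raw)

parsed = json.loads(mended)  # would raise ValueError if still not valid JSON
user_id = parsed["message"]["userid"]
group_id = parsed["message"]["groupid"]
```

Note that this only works when groupid is present and is the last field you need; anything after it, such as windows, is discarded.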
To achieve your combined search, you want to retrieve all events from both searches, then perform stats on them, like this
index=myindex (("events") OR ("events2")) OR (("\"login\"\,\"context\"") AND ("username"))
| rex mode=sed "s/(\"groupid\": *\"[^\"]+\"),.*/\1}}/" ``` you can design another rex to make "events or events2" conformant ```
| spath
| rename message.* AS *
| rex "\"context\"\s*:\s*\"(?<context>[^\"]+)" | rex "\"type\"\s*:\s*\"(?<type>[^\"]+)\"" ``` unnecessary if "events or events2" are already mended ```
| stats dc(type) count by username userid groupid context
| where 'dc(type)' > 1