Transaction, Grouping, Summary Index, Collect, Stats: Is there a best way to group my data and provide grouped-data authorizations?

rewritex
Contributor
<141>Jun 99 15:03:13 f5-vpn-99 zzm1[3645]: 01490506:5: fi87dde3: Received User-Agent header: Mozilla%2f5
<141>Jun 99 15:03:13 f5-vpn-99 zzm1[3645]: 01490544:5: fi87dde3: Received client info - Type: IE Version
<141>Jun 99 15:03:13 f5-vpn-99 zzm1[3645]: 01490500:5: fi87dde3: **New session** from client IP 255.255.255.0
<141>Jun 99 15:03:17 f5-vpn-99 dpi[3594]: 01490116:5: fi87dde3: User chose option: 1
<141>Jun 99 15:03:22 f5-vpn-99 dpi[3594]: 01490116:5: fi87dde3: User chose option: 1
<141>Jun 99 15:04:08 f5-vpn-99 dpi[3594]: 01490010:5: fi87dde3: Username 'skatergator'
<141>Jun 99 15:04:10 f5-vpn-99 dpi[3594]: 01490115:5: fi87dde3: Following rule 'fallback' from item 'iRule 
<141>Jun 99 15:04:32 f5-vpn-99 dpi[3594]: 01490010:5: fi87dde3: Username 'skatergator2'
<141>Jun 99 15:04:33 f5-vpn-99 dpi[3594]: 01490115:5: fi87dde3: Following rule 'G-S-DOC1-VPN-CONTRACT-USERS' 
<141>Jun 99 15:04:33 f5-vpn-99 dpi[3594]: 01490008:5: fi87dde3: Connectivity resource '/SALES/NA_SALES_VPN' 
<141>Jun 99 15:04:33 f5-vpn-99 dpi[3594]: 01490128:5: fi87dde3: Webtop '/SALES/WT_NA_SALES_VPN' assigned
<141>Jun 99 15:04:33 f5-vpn-99 dpi[3594]: 01490115:5: fi87dde3: Following rule 'fallback' from item 'Ro
<141>Jun 99 15:04:33 f5-vpn-99 dpi[3594]: 01490005:5: fi87dde3: Following rule 'Out' from item 'Full VP
<141>Jun 99 15:04:33 f5-vpn-99 dpi[3594]: 01490102:5: fi87dde3: Access policy result: Network_Access
<141>Jun 99 15:04:42 f5-vpn-99 zzm1[3645]: 01490549:5: fi87dde3: Assigned PPP Dynamic IPv4: 255.255.254.0
<141>Jun 99 15:04:42 f5-vpn-99 zzm1[3645]: 01490505:5: fi87dde3: PPP tunnel 0x57008bb13800 started.
<141>Jun 99 15:05:13 f5-vpn-99 zzm1[3645]: 01490501:5: fi87dde3: **Session deleted** due to user logout request.
<141>Jun 99 15:05:14 f5-vpn-99 zzm1[3645]: 01490505:5: fi87dde3: PPP tunnel 0x57008bb13800 closed.
<141>Jun 99 15:05:53 f5-vpn-99 zzm1[3645]: 01490521:5: fi87dde3: Session statistics - bytes in: 146965, bytes out: 283837

Background
These events are intertwined with other log data. To isolate the VPN data, I filter on the process field with a NOT clause so that only VPN events remain. My search string is:

index=index1 process=* NOT (logiz OR cron OR crond OR syslog-ng OR tmtm OR snm OR ssd OR httpd OR mpd) .. 

This sits at the start of all my search panels.
Question: Should this go into my props.conf?

The Data
The data above is what the transaction command returns when grouped by session_id. Grouping lets me call on these fields together: username, session_id, department, user-agent, statistics. The problem is that without the transaction, whenever I search for a field I get only the single-line event that contains it. Using transaction session_id creates one session_id mega event... Question: Is there a better and faster way to group my events? As a side note, the transaction command also builds the 'duration' field, which is a required field.

I use the transaction command because it groups this all together for me.
The resource '/SALES/' event info is important because that is how I isolate data for the sales department.
The Username 'skatergator' event line is important because that is how I get a username.
Note: each event line has the session_id.

The use case:
Sales Department
I want to search the last 6 months by username or session_id. I am interested in all the session information: duration, stats, user-agent info, time and date.

Shipping Department
I want to search the last 6 months by username or session_id. I am interested in all the session information: duration, stats, user-agent info, time and date.

Question: Is there an overall better way (or any way, if it comes down to it) to accomplish the task I am working on? The task is isolating session data by department, presenting that isolated data in a locked-down way and, most importantly, keeping the search fast. I am actively working on this, and I think it's pretty fun, but I do need advice and help. Thank you!

Fields

session_id - ^(?:[^:]*\:){5}\s(?<vpn_session_id>([^\:]+))
username - (?=[^U]*(?:Username '|U.*Username '))^[^'\n]*'(?P<username>[^']+)
agency - (?=[^R]*(?:Resource: /|R.*Resource: /))^[^/\n]*/(?P<agency>\w+)
process - already built ( used with a NOT to remove unnecessary logs)
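
To try an extraction inline before deciding where it should live, the session_id pattern above can be run with rex. This is only a sketch: it assumes the raw events look like the sample at the top of the post and reuses index=index1 from the search string. For the sample lines, the pattern should pick up fi87dde3 as vpn_session_id.

```spl
index=index1 process=*
| rex field=_raw "^(?:[^:]*\:){5}\s(?<vpn_session_id>[^\:]+)"
| stats count BY vpn_session_id
```

The stats count at the end is just a quick check that every event received a session id.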

Approach:
1) transaction -> collect -> index summary

index=index1 process=* NOT (logiz OR cron OR crond OR syslog-ng OR tmtm OR snm OR ssd OR httpd OR mpd) 
| transaction session_id startswith="New session" endswith="Session deleted" 
| search department=SALES
| collect index=logs-sales addtime=true

Pros:
creates a separate department index (index=logs-sales)
allows roles, permissions, and authorization to work using restricted access

Cons:
duration and _time are screwed up, so durations are not accurate
searching for anything over 24hrs fails (could be because the search takes over 3 minutes)

2) transaction -> permission based dashboard

index=index1 process=* NOT (logiz OR cron OR crond OR syslog-ng OR tmtm OR snm OR ssd OR httpd OR mpd)
| transaction vpn_session_id 
| search department=sales username=$username$ session_id=$session_id$ 
| eval duration=tostring(duration, "duration") 
| eval user_agent_string_decoded=urldecode(user_agent_string_encoded) 
| table username session_id duration user_agent_string_decoded

Pros:
Full session is created
Duration is created and is accurate

Cons:
search time is really slow
permissions don't quite lock down the search as expected
* I can lock down a dashboard, but the user can adjust the URL to gain access to search

3)
Using stats, eventstats, append, intersect commands to figure out another way to group data.
After the grouping is figured out, I can then figure out the detail of permissions/authorizations
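
A minimal sketch of the stats idea, assuming the vpn_session_id, username and department fields are already extracted as in the searches above, and using the process values (dpi, zzm1) that appear in the sample events. The names session_start, session_end and event_count are made up for illustration.

```spl
index=index1 (process=dpi OR process=zzm1)
| stats min(_time) AS session_start
        max(_time) AS session_end
        values(username) AS username
        values(department) AS department
        count AS event_count
        BY vpn_session_id
| eval duration = session_end - session_start
| search department=SALES
| eval duration = tostring(duration, "duration")
| table vpn_session_id username department session_start session_end duration event_count
```

Two caveats with this sketch. First, duration here runs from the first to the last event seen for a session inside the search window, so it includes trailing events such as the session statistics line and will be short for sessions that straddle the window. Second, values(username) becomes multivalued when a session logs more than one Username event, as the sample does.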


rewritex
Contributor

I've found a solution for my task ...

I'm using a join instead of a transaction to grab the events and then I collect them which pushes the unmodified events over to my summary index. Now within the summary index I run a transaction as needed and all the duration works as expected. The duration seems to be lost "sometimes" when doing it before the collect ...

index=vpnlogs (process=tmn* OR process=apdd) | join session_id [search group=sales] | collect index=vpn-logs-sales

The other problem was that, since these are VPN sessions, not all of a session's events get grouped together: a user will log in now but log out hours or days later. Yes, there is a session_id field on every event, but in my scenario I needed to isolate events by a secondary field. The disconnect events did have my session_id field, plus fields stating that it was a disconnect, so I run a collect for just the disconnects and pipe those into another summary index. When setting up the roles, I grant each group access to its own summary index and grant every group access to the disconnect summary index. This solves my issues.

Issue: VPN logs from multiple groups are being piped into an index that also includes other logs. No group/pipe tagging is available.
Task: all groups want to see just their own data and want it kept private from the others.

Solution:

Put this into a report and schedule it to run at a set interval

index=vpnlogs (process=tmn* OR process=apdd) | join session_id [search group=sales] | collect index=vpn-logs-sales

1) isolate the VPN logs away from all other logs

  index=vpnlogs (process=tmn* OR process=apdd)

2) join the events using a common session_id field then group the events by group field

 | join session_id [search group=sales]

3) send the groups data to their own summary index

collect index=vpn-logs-sales

4) create a separate summary index for the disconnect logs (index=vpn-stats), then create a report and schedule it to collect the data

index=vpnlogs (process=tmn* OR process=apdd) | search ppp_tunnel=* OR session_deleted=* OR session_statistics=* | collect index=vpn-stats

5) create a role for each group with a Restrict search terms setting covering just their own summary index and the disconnect logs index

Role: Sales Restrict Search Terms: index=vpn-logs-sales OR index=vpn-stats

Lastly, I am looking at adding a scheduled dedup to remove some of the duplicate disconnect events.


dhogland_splunk
Splunk Employee

rewritex

First off, great seeing you at SplunkLive! yesterday, hope you enjoyed the event.

Now, down to business...
Is there a way that you can re-write your initial search to be more inclusive and less exclusive?
"process=* NOT (logiz OR cron OR crond OR syslog-ng OR tmtm OR snm OR ssd OR httpd OR mpd)"

If you remove the NOT and get away from the "process=*", you may find your search more efficient with something along the lines of
(process=dpi OR process=zzm1)

I also like the approach of grouping transactions by vpn_session_id, as this seems more accurate and more likely to contain all relevant data about that session. I'd suggest, though, adding a maxspan= or, better yet, a maxevents= option if you can, just to help your search close out transactions.
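
As a sketch of that suggestion, the limits can be added directly to the transaction call. The values maxspan=24h and maxevents=50 are placeholders chosen for illustration, not recommendations, and duration_readable is a made-up field name. Since VPN sessions can stay open for a long time, a maxspan that is too short would break long sessions apart, so it needs to be sized against the real data.

```spl
index=index1 (process=dpi OR process=zzm1)
| transaction vpn_session_id startswith="New session" endswith="Session deleted" maxspan=24h maxevents=50
| eval duration_readable = tostring(duration, "duration")
```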

Can you provide some examples of how the time is being screwed up in option 1? I think this option best fits your need so I'd like to see more around your cons there and if we can alleviate those, I think you're going to have what you want/need.


rewritex
Contributor

Hi dhogland,

Yea thanks for the response and SplunkLive! was pretty cool... It must be funny (maybe even frustrating) listening to questions from people who have had a few drinks...

performance
Yes, I have updated my search string to use OR instead of NOT.. I was doing a (process=dpi* OR zzm) but parsed it out with your suggestions. I'm also testing the <param name="searchModeLevel">fast</param>

close out transactions
I'll need to research maxevents. I don't think I've seen any grouped events over 21 events, while the average looks like 15-17 events per group.

index summary / duration
The data in the summary index has the transaction vpn_session_id grouped structure minus the duration field.
When I attempt to create a duration field the time comes back as duration=0 for each group
If I posted the sample data, it would look just like the above data, just duration=0

1) index=test1 | transaction vpn_session_id
** produces duration=0

2) index=test1 | transaction vpn_session_id startswith="New session" endswith="Session deleted"
** produces no data found .. this is weird, bug?

3) index=test1 | transaction vpn_session_id startswith="New session"
** produces duration=0

4) index=test1 | transaction vpn_session_id endswith="Session deleted"
** produces duration=0

Things I am trying
My search:
index=sslvpn (process=dpi OR process=dpi1 OR process=dpi2 OR process=dpi3 OR process=zzm OR process=tmm4) | transaction vpn_session_id startswith="New session" endswith="Session deleted" | search department=SALES

testing
addinfo
collect - using addtime=(true|false)
stats - to try and calculate duration with _time
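
One thing worth testing, offered as an assumption rather than a confirmed diagnosis: as I read the collect documentation, when events still carry a _raw field, collect saves that _raw text, and only when _raw is absent does it write the fields out as key=value pairs. If that is what is happening here, the computed duration field would not survive the trip into the summary index, and each collected transaction would arrive as a single event, which transaction then reports as duration=0. The sketch below keeps only named fields before collecting (table drops _raw). The field name session_duration is hypothetical, and index=logs-sales is taken from approach 1.

```spl
index=sslvpn (process=dpi OR process=zzm)
| transaction vpn_session_id startswith="New session" endswith="Session deleted"
| search department=SALES
| eval session_duration = duration
| table _time vpn_session_id username department session_duration
| collect index=logs-sales
```

If this behaves as expected, a search on index=logs-sales should be able to read session_duration directly instead of re-running transaction.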
