Splunk Search

User keyword Lookup and Replace

gnovak
Builder

I'm trying to use lookups to do a keyword search and I can't wrap my head around the right way to do this.

I've got some web logs I'm looking at in Splunk that contain data identifying which operating system and browser a user is using. The string that contains this data doesn't always follow the same format, so my regexes haven't been successful. I'm planning on making a chart of the most popular browsers and the most popular operating systems. I'd like to try the following as a new approach:

  1. Make a csv of all the operating systems and a csv of all the browsers.
  2. Use the lookup command to do a keyword search that locates these keywords and renames them to more identifiable terms (example: Windows NT 6.1 = Windows 7).
  3. Perform a count of how many times the new identifiable term (example: Windows 7) has been found for the given period of time.

I have a simple search like this. I am looking at one particular object to get the information I need:

sourcetype=access_logs command=GET company_logo | dedup username

The type of information I get back in the results is:

10.10.10.10 10.120.130.140 www.testing.somedomain.com [22/Jul/2013:19:22:08 +0000] 304 "GET /blahblah-tmf/images/company_logo.png HTTP/1.1" [booberry] (http-apr-8080-exec-3) 1 - "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0"

So, I want to pipe this search to the lookup file, look for the keywords I have listed, rename those keywords to something else and put them in a field, and then count how many times those renamed keywords were found. Even if I don't use the lookup command, it would be cool if I could somehow do this with an automatic lookup.
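
One way to get these counts without a lookup file at all is to map keywords to friendly names inline with eval. This is only a rough sketch: the field name os_name is made up for illustration, the Windows NT 6.1 mapping comes from the example above, and the other two mappings are the usual Windows NT version numbers, which would need extending to cover real traffic.

```spl
sourcetype=access_logs command=GET company_logo
| dedup username
| eval os_name=case(
    like(_raw, "%Windows NT 6.1%"), "Windows 7",
    like(_raw, "%Windows NT 6.0%"), "Windows Vista",
    like(_raw, "%Windows NT 5.1%"), "Windows XP",
    true(), "Other")
| stats count BY os_name
| sort - count
```

Because case() returns the value for the first condition that matches, more specific keywords should be listed before more general ones.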

My lookup file for the browser csv I started looked like:

keyword,browser_type
Trident/4.0,IE8
Trident/5.0,IE9
Trident/6.0,IE10
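
For the lookup route, keyword matching generally needs a wildcard lookup rather than an exact-match one. A sketch, assuming several things that are not in the original post: a lookup definition named browser_lookup that points at this CSV with its match type set to WILDCARD(keyword), keyword values wrapped in asterisks (for example *Trident/4.0*), and a field named http_user_agent extracted from the trailing quoted string of each event.

```spl
sourcetype=access_logs command=GET company_logo
| dedup username
| rex "\"(?<http_user_agent>[^\"]+)\"\s*$"
| lookup browser_lookup keyword AS http_user_agent OUTPUT browser_type
| fillnull value="Unknown" browser_type
| stats count BY browser_type
```

A second lookup built the same way could cover operating systems.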

I checked a few other questions on this but didn't get it right just yet so figured I'd dump that here. I tried this one: http://splunk-base.splunk.com/answers/84799/find-multiple-keywords-in-file-and-show-them-on-a-chart

My search is this so far:
sourcetype=access_logs command=GET company_logo | dedup username

Any ideas?

0 Karma
1 Solution

lguinn2
Legend

Before you go too far down this path, you might look at the question and answer on IIS User Agent Extraction.

There is no definitive list of possible user agents, and no algorithm for deriving the OS and browser from the user agent. But the technology add-ons mentioned in the IIS User Agent Extraction question are pretty good.

0 Karma

gnovak
Builder

OK, I actually got this to work by extracting the entire line "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0", placing it into a field called "http_user_agent", and downloading the necessary CSV file for the app (see the README). This worked. 🙂 Thanks for the tips!

0 Karma

gnovak
Builder

Well, the field extractor is not letting me extract that information into a field, so I guess I have to do this manually.
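
If the interactive field extractor will not cooperate, an inline rex can pull the user agent out at search time. A minimal sketch, assuming the user agent is always the last double-quoted string in the event, as in the sample event in the original question; http_user_agent is just the field name the add-on is said to expect.

```spl
sourcetype=access_logs command=GET company_logo
| rex "\"(?<http_user_agent>[^\"]+)\"\s*$"
| stats count BY http_user_agent
```

Checking the distinct values this way makes it easy to see whether the extraction is catching the whole string before wiring it into a lookup.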

0 Karma

lguinn2
Legend

Yes, I think that will work...

0 Karma

gnovak
Builder

Oh wait, are you saying I should make a field out of the entire string "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0" that I might find in the logs and then pass that to the lookup file?

0 Karma

gnovak
Builder

Well, the search I was using originally was sourcetype=access_logs command=GET company_logo | dedup username. I was trying to get only one count of the browser and OS each user was using when they log in to the web app. I'm going to keep playing around with this a bit though.

0 Karma

lguinn2
Legend

And you could also send a message to the author of the plug-in. I am sure he would answer...

0 Karma

lguinn2
Legend

If you were using the sourcetype of access_combined or access_combined_wcookie (which are built into Splunk), you would have a field named useragent. You could set a field alias of http_user_agent and that would solve the problem.

For your sourcetype, I don't know what field you have, but it should include the entire string
"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0"

Again, setting a field alias would create the field name that the script expects. That would be easier than changing the script.
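
To try this out before creating a permanent alias, the field can be renamed inline at search time. A sketch using the built-in access_combined sourcetype and its useragent field mentioned above:

```spl
sourcetype=access_combined
| rename useragent AS http_user_agent
| stats count BY http_user_agent
```

A field alias does the same job permanently and keeps the original useragent field as well, whereas rename replaces it for that search only.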

0 Karma

gnovak
Builder

Well the one plugin expects a field http_user_agent which I don't have. I tried maybe changing the script to look at a different field but so far no dice. It's a cool plugin though.

0 Karma