Loop through URL and http_referrer to find original request

bababou
Explorer

Hi everyone,

I'd like to trace the flow from a given final URL back to the original URL the user typed.

My web proxy logs contain the following fields:
_time, src_ip, http_referrer, http_method, URL

For example:
003, 1.1.1.1, htp://www.bbb.com/ads.html, GET, htp://www.ccc.com/ccc.html
002, 1.1.1.1, htp://www.aaa.com/, GET, htp://www.bbb.com/ads.html
001, 1.1.1.1, -, GET, htp://www.aaa.com/

Given the final URL (ccc.com/ccc.html), I want to walk back in time through the (http_referrer, URL) pairs and find every URL up to the original request (aaa.com), the one with http_referrer="-".

Sometimes the flow is spread across 10 different requests interleaved with other web traffic, so it is hard to trace by hand.

In a program I would do this with a single loop, but I cannot find a loop construct in Splunk.

Can you help me? Thanks.

1 Solution

bababou
Explorer

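I solved my problem with an external script, used as a custom search command:


import splunk.Intersplunk

# Read the events piped into the command. The single pass below assumes they
# arrive newest first, which is the usual order for a Splunk event search.
results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()

# The URL to start from is passed as the option url="...".
keywords, options = splunk.Intersplunk.getKeywordsAndOptions()
httpref = options.get('url', '-')

newresults = []

for result in results:
    # Stop at the original request (referrer "-") or when the referrer is missing.
    if httpref in (None, '', '-'):
        break
    if result.get('url') == httpref:
        newresults.append(result)
        # Next, look for the request that fetched the referring page.
        httpref = result.get('http_referer')

splunk.Intersplunk.outputResults(newresults)

The field names url and http_referer must match the names extracted in your events. With the script registered as the custom command referer, I call it like this:

... | referer url="htp://www.ccc.com/ccc.html" | table _time, http_referer, url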
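
For anyone who wants to try the logic outside Splunk: the script above finds the chain in one pass only when the events arrive newest first. Below is a minimal standalone sketch of the same walk that does not depend on event order. It indexes the rows by URL and guards against referrer loops. The names trace_chain and max_hops are hypothetical, the rows are the three sample events from the question, and the sketch assumes each URL appears only once (it ignores _time and src_ip when picking a match).

```python
def trace_chain(rows, final_url, max_hops=50):
    """Walk from final_url back to the request whose referrer is '-'."""
    # Index rows by URL; if a URL occurs more than once, the first row seen wins.
    by_url = {}
    for row in rows:
        by_url.setdefault(row["url"], row)

    chain, seen, current = [], set(), final_url
    # Stop on a missing referrer, a loop, or after max_hops steps.
    while current not in (None, "", "-") and current not in seen and len(chain) < max_hops:
        row = by_url.get(current)
        if row is None:
            break  # the referring request is not in the data
        chain.append(row)
        seen.add(current)
        current = row.get("http_referer")
    return chain


# Sample events from the question, deliberately out of order.
rows = [
    {"_time": "001", "http_referer": "-", "url": "htp://www.aaa.com/"},
    {"_time": "003", "http_referer": "htp://www.bbb.com/ads.html", "url": "htp://www.ccc.com/ccc.html"},
    {"_time": "002", "http_referer": "htp://www.aaa.com/", "url": "htp://www.bbb.com/ads.html"},
]
chain = trace_chain(rows, "htp://www.ccc.com/ccc.html")
for row in chain:
    print(row["_time"], row["http_referer"], row["url"])
```

If the same URL is requested many times by different clients, a real version would probably also need to match on src_ip and pick the closest earlier request by _time; that is left out here to keep the sketch short.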

somesoni2
Revered Legend

See Splunk's map command, which is a looping operator.

technoe
Explorer

How is the data indexed? Maybe you could use the first or last stats functions instead of looping through each event...

bababou
Explorer

Some kind of "transaction" could also be fine, ideally a table with _time and url.

jsie_splunk
Splunk Employee

When you say "interested", how do you want the data expressed? As a single field containing the full path?

bababou
Explorer

What really interests me is the whole path, not only the first and last requests.
In this example: aaa.com -> bbb.com/ads.html -> ccc.com/ccc.html
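
To make that concrete, here is a small sketch of how a chain of URLs (newest first, as a backward walk through the referrers would produce it) could be turned into a single path string like the one above. The helper names short and format_path are hypothetical, and the shortening rule (drop the scheme, a leading "www." and a trailing slash) is only an assumption based on how the example is written.

```python
def short(url):
    # Display form only: drop the scheme, a leading "www." and a trailing slash.
    rest = url.split("://", 1)[-1]
    if rest.startswith("www."):
        rest = rest[4:]
    return rest.rstrip("/")


def format_path(urls_newest_first):
    # The chain is collected from the final URL backwards, so reverse it
    # to read from the original request to the final one.
    return " -> ".join(short(u) for u in reversed(urls_newest_first))


path = format_path([
    "htp://www.ccc.com/ccc.html",
    "htp://www.bbb.com/ads.html",
    "htp://www.aaa.com/",
])
print(path)  # aaa.com -> bbb.com/ads.html -> ccc.com/ccc.html
```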
