Solved: Joining two log files that have a common field

Al · ‎11-04-2010

Hi All -

I am pretty new at advanced splunk searching, so I'm probably missing something very easy. I have two access log files that are of the same request, but from different servers, logging different things. They share a common field that is unique per request. I'd like to join these two files in a splunk search. I've easily whipped up a search using join which seems to work, however the main search results screen only shows one of the two files as output. I'd like to see a combination of both files instead. Here are examples:

file 1:

10.10.10.10 - - [04/Nov/2010:10:40:02 -0400] "GET /favicon.ico HTTP/1.1" 200 318 "-" AAABBBCCCDDDEEEFFF

file 2:

10.10.10.10 myusername AAABBBCCCDDDEEEFFF

AAABBBCCCDDDEEEFFF is unique, and common between the two. I turned that into a field called UniqueID on both. What I'm ultimately after is a single result that looks like (or something similar):

10.10.10.10 - - [04/Nov/2010:10:40:02 -0400] "GET /favicon.ico HTTP/1.1" 200 318 "-" AAABBBCCCDDDEEEFFF myusername

The join I have come up with is simply:

sourcetype="access_log_1" | join UniqueID [search sourcetype="access_log_2" ]

If I save these results a csv, it works as expected - however, I can't see the results layed out nicely in the splunk interface. Am I missing something?

Thanks,

Al

sideview · ‎11-04-2010

You dont really want to use join in this case. Join is the most intuitive direction to go in, but it's not very efficient and it's pretty cumbersome.

there are two other ways to do it that are usually better -- transaction and stats.

1) Stats. Im not sure what you're looking to do, but here's a made-up example showing the basic idea:

sourcetype="access_log_1" OR search sourcetype="access_log_2" | stats first(someField) first(someOtherField) sum(bytes) by UniqueID

This will allow you to merge the two rows into a single row and calculate statistics on the pairs.

2) Transaction

sourcetype="access_log_1" OR search sourcetype="access_log_2" | transaction UniqueID

This is a little different in how it does it, but similar. transaction takes the two single line events and basically makes a single multiline event. Any fields present on either event get carried over, (and where the field is present on both you will get multivalued fields in the process)

View solution in original post

sideview · ‎11-04-2010

You dont really want to use join in this case. Join is the most intuitive direction to go in, but it's not very efficient and it's pretty cumbersome.

there are two other ways to do it that are usually better -- transaction and stats.

1) Stats. Im not sure what you're looking to do, but here's a made-up example showing the basic idea:

sourcetype="access_log_1" OR search sourcetype="access_log_2" | stats first(someField) first(someOtherField) sum(bytes) by UniqueID

This will allow you to merge the two rows into a single row and calculate statistics on the pairs.

2) Transaction

sourcetype="access_log_1" OR search sourcetype="access_log_2" | transaction UniqueID

This is a little different in how it does it, but similar. transaction takes the two single line events and basically makes a single multiline event. Any fields present on either event get carried over, (and where the field is present on both you will get multivalued fields in the process)

loopbit · ‎03-16-2016

The results seems to be outer join of SQL. It works perfectly. Thank you~

sideview · ‎11-05-2010

Just a note - but often after a few more days of headscratching people come back to stats. It's the least intuitive but generally performs the best. Once you get used to it it starts to feel simpler than transaction.

Al · ‎11-05-2010

Hi nick,

transaction is perfect. Thank you very much.

Al

karthicjayarama · ‎11-16-2014

Joining two log files that have a common field

Developer Spotlight with Paul Stout

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

Data-Driven Success: Splunk & Financial Services