Solved: Re: How to tell splunk that 2 entries are related ...

zsimic · ‎05-01-2011

Hi,

I'm trying to feed some data coming from a SQL database to Splunk. I have multiple tables that I'm trying to "flatten out" for Splunk, and I'd like to know if there is any way to keep the connections that existed in the SQL DB. I have something like this:

Table A:
- Columns: timestamp, eventid, class, user, host and multiple other key-value pairs
Table B:
- Columns: timestamp, eventid, severity, message
- Many rows in Table B will correspond to 1 row in Table A
Table C:
- Columns: eventid, type, value
- Many rows in Table C will also correspond to 1 row in Table A

What I would like to know is basically what's the best approach to teach Splunk about the connection that exists here via 'eventid'? (which is just a number). I'm planning on simply outputting the contents of Table A and B in this form:

For Table A, one line per row:

[time] eventid=<number>, class=<string>, user=<string>, host=<string>, etc...

For Table B, also one line per row:

[time], eventid=<number>, severity=<string>: <message>

For Table C, also one line per row:

[time], eventid=<number>, type=<string>, value=<decimal number>

I'm new to Splunk, so maybe what I'm worrying about here is irrelevant for Splunk...
Is there a way to correlate entries from B and C without repeating 'class=...' in each entry output for B and C?
Can Splunk basically find all 'eventid' numbers that were generated for a given 'class' (in a given time frame), then fetch all B and C items for all those 'eventid'-s? And could it sum/average the 'value' found in C for such a 'class'?

I don't want to repeat 'class=' in each line for B and C because the amount of data repeated would be huge:

there are many fields like 'class' (like 'user' and 'host' shown above), some of them with relatively long values
there are many entries in B and C for 1 row in A...

To explain this with a more concrete example: suppose I search for all Table A entries with class=FOO, I get a set of entries found with their 'eventid', I would like to now show a chart per day with the number (count) of entries in Table B having one of the 'eventid'-s in that set.

Also, with Table C's 'value', I would like to see a chart with average 'value'-s per day, all the 'value'-s considered being averaged in that set of 'eventid'-s, and by 'type' (found in the Table C output).

What to do basically (how to best organize the data I output) to make sure Splunk has everything it needs to make the connections?

mslvrstn · ‎05-01-2011

I believe you'll feel pretty comfortable with the command you want : join

search sourcetype=A class=FOO | join eventid [search sourcetype=B] | timechart span=24h count by eventid

After the join, you'll have a set of results, one line for every event in B that had an eventid in the first search.
You can do something similar for C, or perhaps it will make more sense to make the join search [sourcetype=B OR sourcetype=C]

View solution in original post

mslvrstn · ‎05-01-2011

I believe you'll feel pretty comfortable with the command you want : join

search sourcetype=A class=FOO | join eventid [search sourcetype=B] | timechart span=24h count by eventid

After the join, you'll have a set of results, one line for every event in B that had an eventid in the first search.
You can do something similar for C, or perhaps it will make more sense to make the join search [sourcetype=B OR sourcetype=C]

zsimic · ‎05-01-2011

Allright! Made it work, all I needed to do was add index=imported as well in the join 🙂 as in:

index=imported sourcetype=A class=FOO | join eventid [search index=imported sourcetype=B] | timechart span=24h count by eventid

zsimic · ‎05-01-2011

The join doesn't seem to work, I tried but I get no entries returned. The first part of the search had to be like this because of how I organized the data (had to omit the first 'search' word otherwise I get nothing):

index=imported sourcetype=A class=FOO

That returns all results as I expected, but adding a join yields 0 events, no matter what I try

How to tell splunk that 2 entries are related when feeding it data from a SQL database?

Wondering How to Build Resiliency in the Cloud?

Updated Data Management and AWS GDI Inventory in Splunk Observability

Introducing the Splunk Community Dashboard Challenge!