All Apps and Add-ons

Using Splunk to create and view table metadata

sjanwity
Communicator

I am completely new to Splunk!

I have a database where updates are performed in a very ad-hoc way: I delete records and insert a new record with the new values. I hook this up to another 'logging' database which records each insert/delete operation and the details of the affected record (e.g. the timestamp of the operation, the ID of the record, etc.). I'd like to use Splunk to group these update operations and view them (what field was changed, and when), bearing in mind that I also do normal inserts and deletes.

I am aware that you can input SQL and use Splunk to present the data, but how do I go about effectively creating a new field by itself?

For example:

Table Customers:

Customer ID | Customer Name | Customer Address
001         | John F        | 213 Privet Drive
002         | Kyle A        | 16 Gammon Road


Table Customers-History:

TIMESTAMP             | OPERATION | Customer ID | Customer Name | Customer Address
1-Dec-2010 09:52:1232 | INSERT    | 002         | Kyle A        | 10 Gammon Road
2-Dec-2010 09:54:9500 | DELETE    | 002         | Kyle A        | 10 Gammon Road
2-Dec-2010 09:54:9500 | INSERT    | 002         | Kyle A        | 16 Gammon Road
2-Dec-2010 09:55:9921 | DELETE    | 003         | Josh C        | 21 Drury Lane

In the above example, the 2nd and 3rd rows of the Customers-History table show an edit operation. I want Splunk to go through and record all the changes to the Customers table, automatically grouping up the edit operations as well. How should I do this?

1 Solution

aweitzman
Motivator

What you want to do is quite doable, but if you're completely new to Splunk, then you should start with the tutorial. Your first order of business will be getting your logs into Splunk and ensuring that the fields that you want to appear are being extracted properly. Then your question about a search that will group update operations will be easier to answer.


sjanwity
Communicator

I have accepted @aweitzman's answer as, although it did not really solve my question, his help from the comments did.


sjanwity
Communicator

@aweitzman It seems I've completely misunderstood Splunk. Yes, that is the case: I am viewing this table through dbquery. I would like it imported into Splunk, but my understanding was that I would use Splunk to display this data first, then create a scheduler later to get the data into Splunk itself. I thought, though, that I could form a dbquery which indexes the database data automatically. Is this not the case?

By 'grouping' I mean: have Splunk see that the insert-delete operations are actually edit operations and present the information as such. For example, on the data above, Splunk would show two 'change' records for customer 'Kyle A': an insert on 1-Dec-2010 9:52 and an edit on 2-Dec-2010 9:54.


sjanwity
Communicator

@aweitzman My key is a compound key, though (it has to use the ID and transaction time so it doesn't mistake separate, independent insert-deletes for edits). When trying your snippet out I get command="dbquery", A database error occurred: ORA-00933: SQL command not properly ended when appending the code to the end. Am I missing something? When you say 'main search goes here', what do you mean?


sjanwity
Communicator

@aweitzman I did, just a bit confused about how the DB connector works! I know the searches and the search commands, but I am completely unsure how to use them in the DB connector context. I thought at first that the DB connector was a separate app where you can simply combine SQL and Splunk syntax together and it would automatically help you query your database and then display the data (with the data being stored in Splunk as an interim), but from your messages I gather this is not the case?


aweitzman
Motivator

You understand things perfectly. You use the dbquery command for your SQL syntax to get some results, and then you pipe those results through Splunk commands.

However, I must note I informed you incorrectly about transaction. I just realized that since dbquery returns a table rather than events, transaction won't work. (However, that just means your search should get you 0 results rather than an error.)

What that means for you is that your ultimate solution here is going to involve creating a database input, so that all of your logs are in Splunk natively. Then you'll be able to work on the data using the transaction command to get the answers you want.

(Just to be clear, using dbquery to get a table and then piping it through a Splunk command to get another table, or a chart, will work with many, many Splunk commands, but transaction is not one of them. Unfortunately, transaction is the capability you're asking for in your question, so you'll need to go the database input route to solve that problem.)
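For example, a minimal sketch of that pattern (assuming a DB Connect v1 connection named mydb and the Customers-History table from the question; both names are placeholders for your own):

```
| dbquery mydb "SELECT * FROM Customers_History" | stats count by OPERATION
```

Since dbquery returns a table, report-style commands like stats, sort, and table will work on its output, while event-based commands like transaction will not.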

sjanwity
Communicator

By "creating a database input", which tutorial topic should I be looking at, and do you mean through dbconnect or native splunk? I'm confused because I don't know whether this is from a dbconnector perspective or a native splunk perspective (in fact I think that's the root of my problem - I jumped right into dbconnect before trying to understand splunk itself, so now I don't know what is unique to which perspective...)


aweitzman
Motivator

Database inputs are a dbconnect concept. Look here: http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Configuredatabasemonitoring

That said, understanding Splunk itself is key, especially understanding the ways in which it differs from SQL. In any Splunk setup there is native Splunk data you can see by doing the search index=_internal. Exercise the concepts you learned in the tutorial by spending some time doing searches on this data to better "get" it, and then apply what you know about dbconnect to bring more data into the system to work with.
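For example, a couple of throwaway practice searches against that internal data (just sketches to get a feel for fields and sourcetypes; the log_level field is one splunkd normally extracts):

```
index=_internal | stats count by sourcetype
index=_internal sourcetype=splunkd log_level=ERROR | head 20
```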

sjanwity
Communicator

I think I've gotten the data in - I've posted a screenshot of my dbconnect app and a sample SQL query I tried (with meaningful data blotted out, sorry!): http://pbrd.co/1q4a4Sa

What would my next steps be - using transactions? Note that a data model has already been built.
Would this task be easier done by editing the Splunk config files rather than in the DB connector itself?


aweitzman
Motivator

Your picture shows you doing a database query within the DB Connect app, which appears to be the functional equivalent of running dbquery from the search box. So this will not help you.

Once you have created a database input as described in the link above, you should be able to search your data entirely within Splunk, without using dbquery or the Database Query part of the DB Connect app. Once this is established, you should be able to run regular Splunk commands (like transaction) on the raw events and get useful responses.
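As a sketch, assuming the history table has been indexed with a sourcetype of customers_history (a placeholder name, as is CUSTOMER_ID, which stands in for however your key field is extracted), such a search might look like:

```
sourcetype=customers_history
| transaction CUSTOMER_ID startswith="DELETE" endswith="INSERT" maxspan=1s
```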

sjanwity
Communicator

Oh alright, I think I understand now...

Wouldn't this be easier to define in the config files - and if so, I'm guessing in outputs.conf? Also, if I get the data into Splunk, how do I search by a specific table? Or will that be indexed as one of the fields, i.e. sourcetype?


aweitzman
Motivator

Wouldn't what be easier, exactly? Creating a database input? I think at the level you're at right now, using the UI to help you as much as you can is your best bet.

There aren't really tables in Splunk the way you're thinking about them. It's a very different mindset than SQL. You can use the sourcetype to segregate your data into meaningful sections. It would probably work for you at this point to assign a different sourcetype to each table/view you input. (Over time, you might realize that you may get more performant searches if some tables/views were grouped together into a single sourcetype, but you can attack that problem later.)

sjanwity
Communicator

Thanks for all your help so far! I've gotten the data into Splunk and gotten transactions to work. My Splunking adventure is almost complete; however, the data is not displaying as I want it. I've also changed the output a bit: I think it is better to see the changes on a field as well as grouping update events. My transaction command is currently
transaction TYPE_NAME FIELD_NAME OBJECT_KEY
but it's not displaying the data as I want.

Currently the data is displayed like this:
http://i.stack.imgur.com/RZwib.jpg

But in reality the database is this:
http://i.stack.imgur.com/ulV2A.jpg

As you can see, the initial insert has completely disappeared, and the last delete as well.

Why are the last delete event and initial insert event missing? It should group the insert and delete events that occurred on 11-May-11, but if I use maxspan then the initial insert and subsequent delete become separate events. How should I do this, and how do I make Splunk rename the INSERT-DELETE event to an UPDATE?


aweitzman
Motivator

The last delete and initial insert are missing because they're not part of your defined transactions. If you want to include them, add keeporphans=true to your transaction command.

As for renaming the event, create a new field based on the contents of the OPERATION field and use that instead. Add the following to the end of your search:

| eval TOTALOPERATION=if(OPERATION=="INSERT" AND OPERATION=="DELETE","UPDATE",OPERATION)

This creates a new field TOTALOPERATION which contains either UPDATE if your original OPERATION value contains both INSERT and DELETE, or the original value of the OPERATION field if it doesn't. (I know it seems strange to test for equality against two different values in the same logical operation, but it works in Splunk because the field is multivalued.)

sjanwity
Communicator

Thanks for the TOTALOPERATION field, it worked nicely! However, keeporphans is still returning me this result:

http://i.stack.imgur.com/ulV2A.jpg

My transaction command is currently | transaction TYPE_NAME FIELD_NAME OBJECT_KEY keeporphans=true |, am I doing something wrong?

It's also not showing the operation on records which were changed multiple times:
http://i.stack.imgur.com/NgaHw.jpg


aweitzman
Motivator

(Your first image is the one with your table, not Splunk results, so I'm not sure what it means, but I'll try to interpret it anyway.)

You need to add the qualifiers I mentioned earlier to your transaction command, to limit them to just the UPDATE part of things. You'll end up with something like this:

| transaction TYPE_NAME FIELD_NAME OBJECT_KEY keeporphans=true maxspan=1s maxevents=2 startswith="DELETE" endswith="INSERT"|


sjanwity
Communicator

Whoops sorry, I copied the wrong image to you. The image is this:
http://i.stack.imgur.com/1hVDt.jpg

i.e. it's still not showing the initial insert and the ending delete operation even when I put keeporphans=true

The problem with maxevents=2 is that it then splits up all the events that are associated with that key chain (i.e. TYPE_NAME, FIELD_NAME, OBJECT_KEY). Is it possible to show all the events associated with that key chain along with the UPDATE/INSERT/DELETE operations? For example, the expected output would be the 4 events recorded in the database as INSERT, DELETE, INSERT, DELETE, but shown in Splunk with the same data and with the operations INSERT, UPDATE, DELETE, all in the same Splunk statistic/'event'.


aweitzman
Motivator

I'm not sure I understand what the problem is. You get three separate events (INSERT, UPDATE, DELETE) when you do what I suggested, right? You don't want to cram all three separate events into a single Splunk event because you'll lose all the information about what order each separate thing occurred.

In other words, resist the urge to try and combine these into a single event. You'll be able to do a lot more with your data by keeping them separate. What is your motivation for trying to put them into a single event?


sjanwity
Communicator

I apologise, the customers example was a bit poor; the table is supposed to be a sort of ledger table for the database. Here TYPE_NAME represents a table name, OBJECT_KEY represents a key, and FIELD_NAME is a field of that key in that table. So in http://i.stack.imgur.com/1hVDt.jpg we have: "In the table RiskManagementBookConfig, for field 'enabled' on the key 'Y:LD_ERISK_A' we deleted the previous record of 'false' and inserted the value 'true'".

From this architecture, I just think it's easier to see the report as a log of changes for that specific key-chain. Each key-chain represents a field on the database, so in effect it'll be a log for the viewer on how the field was changed in the past specified time range. Does this affect your advice to me?


aweitzman
Motivator

Are you sorting your results? I think if you sorted your results by your key-chain fields, and then by time, you'd get the kind of thing you're looking for without having to cram everything into a single event. You'd get each key-chain's events next to each other in your results, thus being able to see the kind of thing you're trying to see.
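Putting the pieces from this thread together, a sorted report might look something like this sketch (the sourcetype name is a placeholder; the other commands are the ones worked out above):

```
sourcetype=customers_history
| transaction TYPE_NAME FIELD_NAME OBJECT_KEY keeporphans=true maxspan=1s maxevents=2 startswith="DELETE" endswith="INSERT"
| eval TOTALOPERATION=if(OPERATION=="INSERT" AND OPERATION=="DELETE","UPDATE",OPERATION)
| sort TYPE_NAME FIELD_NAME OBJECT_KEY _time
| table _time TYPE_NAME FIELD_NAME OBJECT_KEY TOTALOPERATION
```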


sjanwity
Communicator

Yes I am, but I still think it's better to view them as a single event (and I'm curious as to how it would work anyway). Purely academically, is this possible and if so, how?
