All Apps and Add-ons

Using Splunk to create and view table metadata

sjanwity
Communicator

I am completely new to Splunk!

I have a database where updates are performed in a very ad-hoc way: I delete records and insert a new record with the new values. I hook this up to another 'logging' database which records each insert/delete operation and the details of the affected record (e.g. the timestamp of the operation, the ID of the record, etc.). I'd like to use Splunk to group these update operations and view them (what field was changed, and when), bearing in mind that I also do normal inserts and deletes.

I am aware that you can input SQL and use Splunk to present the data, but how do I go about effectively creating a new field by itself?

For example:

Table Customers:

Customer ID | Customer Name | Customer Address
001         | John F        | 213 Privet Drive
002         | Kyle A        | 16 Gammon Road


Table Customers-History:

TIMESTAMP             | OPERATION | Customer ID | Customer Name | Customer Address
1-Dec-2010 09:52:1232 | INSERT    | 002         | Kyle A        | 10 Gammon Road
2-Dec-2010 09:54:9500 | DELETE    | 002         | Kyle A        | 10 Gammon Road
2-Dec-2010 09:54:9500 | INSERT    | 002         | Kyle A        | 16 Gammon Road
2-Dec-2010 09:55:9921 | DELETE    | 003         | Josh C        | 21 Drury Lane

In the above example, the 2nd and 3rd rows of the Customers-History table show an edit operation. I want Splunk to go through and record all the changes to the Customers table, automatically grouping up the edit operations as well. How should I do this?

1 Solution

aweitzman
Motivator

What you want to do is quite doable, but if you're completely new to Splunk, then you should start with the tutorial. Your first order of business will be getting your logs into Splunk and ensuring that the fields that you want to appear are being extracted properly. Then your question about a search that will group update operations will be easier to answer.


sjanwity
Communicator

I have accepted @aweitzman's answer as, although it did not really solve my question, his help from the comments did.


sjanwity
Communicator

@aweitzman It seems I've completely misunderstood Splunk. Yes, that is the case: I am viewing this table through dbquery. I would like it imported into Splunk, but my understanding was that I would use Splunk to display this data first, then create a scheduler later to get the data into Splunk itself. I thought, though, that I could form a dbquery which indexes the database data automatically. Is this not the case?

By 'grouping' I mean: have Splunk see that the insert-delete operations are actually edit operations and present the information as such. For example, on the data above, Splunk would show two 'change' records for customer 'Kyle A': an insert on 1-Dec-2010 9:52 and an edit on 2-Dec-2010 9:54.


sjanwity
Communicator

@aweitzman My key is a compound key, though (it has to use the ID and transaction time so it doesn't mistake separate, independent insert-deletes for edits). When trying your snippet out I get command="dbquery", A database error occurred: ORA-00933: SQL command not properly ended when appending the code to the end. Am I missing something? When you say 'main search goes here', what do you mean?


sjanwity
Communicator

@aweitzman I did, just a bit confused about how the DB connector works! I know the searches and the search commands, but I am completely unsure how to use them in the DB connector context. I thought at first that the DB connector was a separate app where you can simply combine SQL and Splunk syntax together and it would automatically help you query your database and then display the data (with the data being stored in Splunk as an interim), but from your messages I gather this is not the case?


aweitzman
Motivator

You understand things perfectly. You use the dbquery command for your SQL syntax to get some results, and then you pipe those results through Splunk commands.

However, I must note I informed you incorrectly about transaction. I just realized that since dbquery returns a table rather than events, transaction won't work. (However, that just means your search should get you 0 results rather than an error.)

What that means for you is that your ultimate solution here is going to involve creating a database input, so that all of your logs are in Splunk natively. Then you'll be able to work on the data using the transaction command to get the answers you want.

(Just to be clear, using dbquery to get a table and then piping it through a Splunk command to get another table, or a chart, will work with many, many Splunk commands, but transaction is not one of them. Unfortunately, transaction is the capability you're asking for in your question, so you'll need to go the database input route to solve that problem.)
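For example, a minimal sketch of that pattern (assuming a DB Connect v1 connection named mydb and the Customers-History table from the question; both names are placeholders for your own):

```
| dbquery mydb "SELECT * FROM Customers_History" | stats count by OPERATION
```

Since dbquery returns a table, report-style commands like stats, sort, and table will work on its output, while event-based commands like transaction will not.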

sjanwity
Communicator

By "creating a database input", which tutorial topic should I be looking at, and do you mean through dbconnect or native splunk? I'm confused because I don't know whether this is from a dbconnector perspective or a native splunk perspective (in fact I think that's the root of my problem - I jumped right into dbconnect before trying to understand splunk itself, so now I don't know what is unique to which perspective...)


aweitzman
Motivator

Database inputs are a dbconnect concept. Look here: http://docs.splunk.com/Documentation/DBX/latest/DeployDBX/Configuredatabasemonitoring

That said, understanding Splunk itself is key, especially understanding the ways in which it differs from SQL. In any Splunk setup there is native Splunk data you can see by doing the search index=_internal. Exercise the concepts you learned in the tutorial by spending some time doing searches on this data to better "get" it, and then apply what you know about dbconnect to bring more data into the system to work with.
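For example, a couple of throwaway practice searches against that internal data (just sketches to get a feel for fields and sourcetypes; the log_level field is one splunkd normally extracts):

```
index=_internal | stats count by sourcetype
index=_internal sourcetype=splunkd log_level=ERROR | head 20
```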

sjanwity
Communicator

I think I've gotten the data in - I've posted a screenshot of my dbconnect app and a sample SQL query I tried (with meaningful data blotted out, sorry!): http://pbrd.co/1q4a4Sa

What would my next steps be - using transactions? Note that a data model has already been built.
Would this task be easier done by editing the Splunk config files rather than in the DB connector itself?


aweitzman
Motivator

Your picture shows you doing a database query within the DB Connect app, which appears to be the functional equivalent of running dbquery from the search box. So this will not help you.

Once you have created a database input as described in the link above, you should be able to search your data entirely within Splunk, without using dbquery or the Database Query part of the DB Connect app. Once this is established, you should be able to run regular Splunk commands (like transaction) on the raw events and get useful responses.
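As a sketch, assuming the history table has been indexed with a sourcetype of customers_history (a placeholder name, as is CUSTOMER_ID, which stands in for however your key field is extracted), such a search might look like:

```
sourcetype=customers_history
| transaction CUSTOMER_ID startswith="DELETE" endswith="INSERT" maxspan=1s
```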

sjanwity
Communicator

Oh alright, I think I understand now...

Wouldn't this be easier to define in the config files - and if so, I'm guessing in outputs.conf? Also, if I get the data into Splunk, how do I search by a specific table? Or will that be indexed as one of the fields, i.e. sourcetype?


aweitzman
Motivator

Wouldn't what be easier, exactly? Creating a database input? I think at the level you're at right now, using the UI to help you as much as you can is your best bet.

There aren't really tables in Splunk the way you're thinking about them. It's a very different mindset than SQL. You can use the sourcetype to segregate your data into meaningful sections. It would probably work for you at this point to assign a different sourcetype to each table/view you input. (Over time, you might realize that you may get more performant searches if some tables/views were grouped together into a single sourcetype, but you can attack that problem later.)

sjanwity
Communicator

Thanks for all your help so far! I've gotten the data into Splunk and gotten transactions to work. My Splunking adventure is almost complete; however, the data is not displaying as I want it. I've also changed the output a bit: I think it is better to see the changes on a field as well as grouping update events. My transaction command is currently
transaction TYPE_NAME FIELD_NAME OBJECT_KEY
but it's not displaying the data as I want.

Currently the data is displayed like this:
http://i.stack.imgur.com/RZwib.jpg

But in reality the database is this:
http://i.stack.imgur.com/ulV2A.jpg

As you can see, the initial insert has completely disappeared, and the last delete as well.

Why are the last delete event and initial insert event missing? It should group the insert and delete events that occurred on 11-May-11, but if I use maxspan then the initial insert and subsequent delete become separate events. How should I do this, and how do I make Splunk rename the INSERT-DELETE event to an UPDATE?


aweitzman
Motivator

The last delete and initial insert are missing because they're not part of your defined transactions. If you want to include them, add keeporphans=true to your transaction command.

As for renaming the event, create a new field based on the contents of the OPERATION field and use that instead. Add the following to the end of your search:

| eval TOTALOPERATION=if(OPERATION=="INSERT" AND OPERATION=="DELETE","UPDATE",OPERATION)

This creates a new field TOTALOPERATION which contains either UPDATE if your original OPERATION value contains both INSERT and DELETE, or the original value of the OPERATION field if it doesn't. (I know it seems strange to test for equality against two different values in the same logical operation, but it works in Splunk because the field is multivalued.)

sjanwity
Communicator

Thanks for the TOTALOPERATION field, it worked nicely! However, keeporphans is still returning me this result:

http://i.stack.imgur.com/ulV2A.jpg

My transaction command is currently | transaction TYPE_NAME FIELD_NAME OBJECT_KEY keeporphans=true |, am I doing something wrong?

It's also not showing the operation on records which were changed multiple times:
http://i.stack.imgur.com/NgaHw.jpg


aweitzman
Motivator

(Your first image is the one with your table, not Splunk results, so I'm not sure what it means, but I'll try to interpret it anyway.)

You need to add the qualifiers I mentioned earlier to your transaction command, to limit them to just the UPDATE part of things. You'll end up with something like this:

| transaction TYPE_NAME FIELD_NAME OBJECT_KEY keeporphans=true maxspan=1s maxevents=2 startswith="DELETE" endswith="INSERT"|


sjanwity
Communicator

Whoops sorry, I copied the wrong image to you. The image is this:
http://i.stack.imgur.com/1hVDt.jpg

i.e. it's still not showing the initial insert and the ending delete operation even when I put keeporphans=true

The problem with maxevents=2 is that it then splits up all the events that are associated with that key chain (i.e. TYPE_NAME, FIELD_NAME, OBJECT_KEY). Is it possible to show all the events associated with that key chain along with the UPDATE/INSERT/DELETE operations? For example, the expected output would be the 4 events recorded in the database as INSERT, DELETE, INSERT, DELETE, but shown in Splunk with the same data and with the operations INSERT, UPDATE, DELETE, all in the same Splunk statistic/'event'.


aweitzman
Motivator

I'm not sure I understand what the problem is. You get three separate events (INSERT, UPDATE, DELETE) when you do what I suggested, right? You don't want to cram all three separate events into a single Splunk event because you'll lose all the information about what order each separate thing occurred.

In other words, resist the urge to try and combine these into a single event. You'll be able to do a lot more with your data by keeping them separate. What is your motivation for trying to put them into a single event?


sjanwity
Communicator

I apologise, the customers example was a bit poor; the table is supposed to be a sort of ledger table for the database. Here TYPE_NAME represents a table name, OBJECT_KEY represents a key, and FIELD_NAME is a field of that key in that table. So in http://i.stack.imgur.com/1hVDt.jpg we have: "In the table RiskManagementBookConfig, for field 'enabled' on the key 'Y:LD_ERISK_A' we deleted the previous record of 'false' and inserted the value 'true'".

From this architecture, I just think it's easier to see the report as a log of changes for that specific key-chain. Each key-chain represents a field on the database, so in effect it'll be a log for the viewer on how the field was changed in the past specified time range. Does this affect your advice to me?


aweitzman
Motivator

Are you sorting your results? I think if you sorted your results by your key-chain fields, and then by time, you'd get the kind of thing you're looking for without having to cram everything into a single event. You'd get each key-chain's events next to each other in your results, thus being able to see the kind of thing you're trying to see.
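Putting the pieces from this thread together, a sorted report might look something like this sketch (the sourcetype name is a placeholder; the other commands are the ones worked out above):

```
sourcetype=customers_history
| transaction TYPE_NAME FIELD_NAME OBJECT_KEY keeporphans=true maxspan=1s maxevents=2 startswith="DELETE" endswith="INSERT"
| eval TOTALOPERATION=if(OPERATION=="INSERT" AND OPERATION=="DELETE","UPDATE",OPERATION)
| sort TYPE_NAME FIELD_NAME OBJECT_KEY _time
| table _time TYPE_NAME FIELD_NAME OBJECT_KEY TOTALOPERATION
```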


sjanwity
Communicator

Yes I am, but I still think it's better to view them as a single event (and I'm curious as to how it would work anyway). Purely academically, is this possible and if so, how?
