Splunk Search

Is it possible to tell Splunk to group events based on a column?

sjanwity
Communicator

This is an extension of the question http://answers.splunk.com/answers/171571/using-splunk-to-create-and-view-table-metadata.html

I am namely talking about the stat command or the transaction command, unless there is a command which would do this job better! I am receiving input from a database and grouping them up into delete-insert pairs based on their key and update time, due to a technical debt which disallowed us from doing proper SQL updates.

Some rows however, are not joining, as even though the actions are supposed to be atomic, I found that some rows get updated into the database 1ms apart, breaking the update_time field. I therefore want to add a special column in splunk and then use a simple equals evaluation to indicate that these rows should be joined. However, I am having trouble telling splunk's transaction or statto combine rows based on these 2 columns being equal. Can anyone help?

sjanwity
Communicator

@aweitzman This is an extension to my old question, but I figured out you'd be bored of helping me by now! I found that using stat combined the fields better than transaction did because I make database updates in batches and whilst the operations are atomic, it still created situations abit like this:

DELETE  (key: fdgd)
INSERT   (key: abcd)
INSERT   (key: fdgd)

i.e. actions performed on another row got in between the updates, which I think is confusing the transaction command and not making it work properly.

The stat command combines everything beautifully when I use it with the _time field, however there are cases where the operations were performed 1 milisecond away which is causing the operation to break on those few cases.

aweitzman
Motivator

transaction should handle it, but when it doesn't do what you want, sometimes you need to find another way. Glad you were able to get what you needed.

sjanwity
Communicator

wait, are you saying it was supposed to be able to handle it? How does transaction work on the background, anyway? I thought it was breaking on those few cases because there were rows for other records being executed in between the insert-delete pair, thus confusing the transaction command.

0 Karma

aweitzman
Motivator

Yes, transaction is supposed to be able to handle it even when there are other records in between, since the command is qualified with the fields that you're working with, and it should be able to separate them out based on that. But sometimes reality doesn't match documentation, and you have to hack your way around it. And it sounds like you've done that.

sjanwity
Communicator

would you perchance have any idea why? or how I could make them match up? I don't get how stat works the way I'm expecting transaction to work and am afraid that this little technical debt will bite me back big time in the future

0 Karma

aweitzman
Motivator

Can you give an example of what you're doing with stats that works for you? It does surprise me at some level that it solves your problem, but I'm not seeing how you're using it, so I can't really tell.

sjanwity
Communicator

nevermind, I just wasn't sorting the transaction results properly. Transactions are now grouping them up correctly after changing from UPDATE_TIME -> _time to EXECUTED_TIME (which, I think, is the timestamp of when the command was actually sent from the client) -> _time which I found on another table and did a join on. Yay!

aweitzman
Motivator

Woohoo! Glad to hear you got it working.

aweitzman
Motivator

I have no idea why. Maybe it's a bug? I don't use transaction very much, but in order to help you I cobbled together some examples, most using standard Splunk data, but some even using dbquery to pull the data in, and it seemed to work fine for me.

0 Karma

aweitzman
Motivator

You haven't told us what the contents of your special new column is, so it's going to be hard to understand what you're really trying to do. You wouldn't use transaction to test whether the values of two separate columns are equal anyways, and while you could do it with stats, that's only if you want to aggregate that information across your entire result set, which it doesn't sound like you want to do.

It seems like you want that information for each event, in which case you can simply create a new field using eval, like eval newfield=if(field1==field2,1,0) and then work with that.

(That said, if the times are really just a millisecond off, maybe expanding the maxspan of your transaction from 1s to 2s might solve your problem, I would guess.)

the_wolverine
Champion

YES. Your description is not clear but using a transaction should work if the column value is really the same. Basically:

<your search> | transaction <your_column_name> 
0 Karma
Get Updates on the Splunk Community!

The Splunk Success Framework: Your Guide to Successful Splunk Implementations

Splunk Lantern is a customer success center that provides advice from Splunk experts on valuable data ...

Splunk Training for All: Meet Aspiring Cybersecurity Analyst, Marc Alicea

Splunk Education believes in the value of training and certification in today’s rapidly-changing data-driven ...

Investigate Security and Threat Detection with VirusTotal and Splunk Integration

As security threats and their complexities surge, security analysts deal with increased challenges and ...