Knowledge Management

Create a summary index that creates fields for each result per day

dsollen
Explorer

I have data coming in which can roughly be looked at as having four fields

Timestamp, source, flag, count

What I want to do is create a summary index which gives a count of all flags seen for a source in a day. Specifically I want to make a new field for each unique flag value that contains a count So if I had something like

Tuesday, foo, A 20
Tuesday, foo, B, 30
Tuesday, foo, C, 21

I would want a search for Tuesday from source foo to return

 Timestamp = Tuesday, source = foo, A=20, B=30, C=21

I want to use a summary index that keeps track of these tallies by day (based off of the linux time stamp), so that it updates by the fly.

Can this be done using a summary index? I'm not certain primarily how to create new fields based off of the values of "flag". I'm also not entirely certain the most efficient way to have symbolic index generate data per day for the least say 90 days.

0 Karma
1 Solution

somesoni2
Revered Legend

You can create a field for column values like this (run anywhere sample, first two lines are to just generate data)

| gentimes start=-1 | eval temp="Tuesday, foo, A, 20# Tuesday, foo, B, 30# Tuesday, foo, C, 21" | table temp | makemv temp delim="#" | mvexpand temp | rex field=temp "(?<Timestamp>\w+),\s*(?<source>\w+),\s*(?<flag>\w+),\s*(?<count>\w+)" | fields - temp 
| eval temp=Timestamp."##".source | xyseries temp flag count | rex field=temp "(?<Timestamp>\w+)##(?<source>\w+)" | fields - temp| table Timestamp source *

OR
Updated to handle duplicate events which xyseries doesn't handle

| gentimes start=-1 | eval temp="Tuesday, foo, A, 20# Tuesday, foo, B, 30# Tuesday, foo, C, 21" | table temp | makemv temp delim="#" | mvexpand temp | rex field=temp "(?<Timestamp>\w+),\s*(?<source>\w+),\s*(?<flag>\w+),\s*(?<count>\w+)" | fields - temp | eval temp=Timestamp."##".source | chart sum(count) over temp by flag  | rex field=temp "(?<Timestamp>\w+)##(?<source>\w+)" | fields - temp| table Timestamp source *

View solution in original post

somesoni2
Revered Legend

You can create a field for column values like this (run anywhere sample, first two lines are to just generate data)

| gentimes start=-1 | eval temp="Tuesday, foo, A, 20# Tuesday, foo, B, 30# Tuesday, foo, C, 21" | table temp | makemv temp delim="#" | mvexpand temp | rex field=temp "(?<Timestamp>\w+),\s*(?<source>\w+),\s*(?<flag>\w+),\s*(?<count>\w+)" | fields - temp 
| eval temp=Timestamp."##".source | xyseries temp flag count | rex field=temp "(?<Timestamp>\w+)##(?<source>\w+)" | fields - temp| table Timestamp source *

OR
Updated to handle duplicate events which xyseries doesn't handle

| gentimes start=-1 | eval temp="Tuesday, foo, A, 20# Tuesday, foo, B, 30# Tuesday, foo, C, 21" | table temp | makemv temp delim="#" | mvexpand temp | rex field=temp "(?<Timestamp>\w+),\s*(?<source>\w+),\s*(?<flag>\w+),\s*(?<count>\w+)" | fields - temp | eval temp=Timestamp."##".source | chart sum(count) over temp by flag  | rex field=temp "(?<Timestamp>\w+)##(?<source>\w+)" | fields - temp| table Timestamp source *

dsollen
Explorer

As to my question, can you clarify how the xyseries handles duplicate field values, the specification doesn't seem to specify this. If I have:

Tuesday, foo, A 20
Tuesday, foo, A 30

Would the above command return? Ideally I would want a single foo field with 50. I could sum these values ahead of time, but if the xyseries already knows to do this then it would be wasteful to do that summation earlier. Do you know if it would already be smart enough to sum values?

0 Karma

dsollen
Explorer

Answered my own question. It does not sum, it just returned the last value seen. I switched to the chart command from you second example, except that I used chart sum(count) over temp by flag, which does the summation as part of the same step, which I presume is more efficient, at the very least it's mildly easier to read. Plus, there is already a sichart command so using the chart format allows me to switch to sichart easily

0 Karma

somesoni2
Revered Legend

The xyseries will remove the duplicate values (for the temp, flag combination) taking the last value into consideration. So, for your example data above, it would just give A=30.

To handle this per your requirement, try the udpated answer using chart command.

0 Karma

dsollen
Explorer

Thank you. With this I was able to figure out the date part myself, I'll be figuring out how to make this a symbolic index next (I umm...think I can do that with this, I'm a little surprised that the convenience si* commands don't seem to have a clear fit, but I've only glanced over them so were see). I wanted to elaborate on what the command above did for future splunk newbs doing this, and I'll post a second comment with a small followup question 🙂

First, elaboration on what's happening. I had a little trouble parsing what was happening above at first, due to my lack of splunk experience and the first two (2.5 on my monitor) lines doing other things (I'm glad you clarified that they were just building data first so I didn't have to waste as much time parsing it, but again it was technically 2.5 lines on my screen so still took a second for me to figure out that the first fields - temp command was where sample data generation stopped).

For anyone else looking at this in the future the important command was the xyseries (or chart in the second example) command(s) which does almost exactly what I wanted. It takes three arguments, first one field to provide a unique key, with one event per unique key generated, then it takes an "x name" and "x value" as the next two arguments, in this case flag and count respectively. since flag is the x_name that equates to making one column for each value in flag, and with count as x_value each column created will have the contents of count stored into them.

Prior to XYseries the eval combines timestamp and src, my two unique keys, to have a single unique key to use for xyseries. After the xyseries then the rex and "fields - temp" commands break that unique key back up the temp field created by eval back into the original source and timestamp values.

I added an eval day = strtime(_time, "$x") in front of everything to convert the timestamps to unique dates per my other requirement. I also had to change the final rex comand to replace both the \w+ regex with a simpler .+, since both my date and source included non word characters at this point.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...