
Summary index merges multiple-line values into one row

LearningGuy
Builder


A summary index merges multiple-line values into one row, while a regular index puts the values on separate lines. So when I use the stats values command on the summary index to group by ip, the merged values are not unique.
Questions:
1) How do I make the summary index put multiple values on separate lines, like a regular index does?
2) When I use the stats values command, should it return unique values?
Thank you so much for your help.

See the example below:


1a) Search using the regular index

index=regular_index
| table company, ip

company     ip
companyA    1.1.1.1
companyA
companyB    1.1.1.2
companyB
companyB


1b) Search the regular index after grouping with stats values

index=regular_index
| stats values(company) AS company BY ip
| table company, ip

company     ip
companyA    1.1.1.1
companyB    1.1.1.2



2a) Search using the summary index

index=summary report=test_ip
| table company, ip

company                       ip
companyA companyA             1.1.1.1
companyB companyB companyB    1.1.1.2


2b) Search the summary index after grouping with stats values

index=summary report=test_ip
| stats values(company) AS company BY ip
| table company, ip

company                       ip
companyA companyA             1.1.1.1
companyB companyB companyB    1.1.1.2
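To see why question 2 gets the answer it does, here is a rough Python model of what stats values(company) AS company BY ip does. For each group it collects the distinct values of a field and sorts them. This is only a sketch. It assumes a multivalue field can be modeled as a Python list and a flattened summary field as one plain string, and stats_values is a hypothetical name, not a Splunk API.

```python
from collections import defaultdict


def stats_values(events, field, by):
    """Rough model of `stats values(<field>) AS <field> BY <by>`.

    Collects the distinct values of `field` for each value of `by` and
    returns them sorted. A multivalue field is modeled as a list; a plain
    string counts as one single value, however many spaces it contains.
    """
    groups = defaultdict(set)
    for event in events:
        value = event.get(field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        groups[event[by]].update(values)
    return {key: sorted(vals) for key, vals in groups.items()}


# Regular index: company is a real multivalue field.
regular = [
    {"ip": "1.1.1.1", "company": ["companyA", "companyA"]},
    {"ip": "1.1.1.2", "company": ["companyB", "companyB", "companyB"]},
]
# Summary index: the same values stored as one space-separated string.
summary = [
    {"ip": "1.1.1.1", "company": "companyA companyA"},
    {"ip": "1.1.1.2", "company": "companyB companyB companyB"},
]

print(stats_values(regular, "company", "ip"))
print(stats_values(summary, "company", "ip"))
```

In this model, values() does return unique values. The catch is that uniqueness is judged on the whole stored string, so "companyA companyA" counts as one distinct value.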

 


LearningGuy
Builder

Hello,
What do you mean by "I have to seek there for answers"?
As I showed in the example, the summary index does not change the data, but it merges the values into one row:

companyA
companyA

becomes

companyA companyA

Here are my original questions:
1) How do I make the summary index put multiple values on separate lines, like a regular index does?
2) When I use the stats values command, should it return unique values?

Thank you for your help


PickleRick
SplunkTrust

Summary index is not something that materializes out of thin air. Something is putting data there. Find out what and how.


LearningGuy
Builder

Perhaps you don't understand my question. I put a very specific example in my original post.
I do understand the data and how it got here, because I was the one who created the summary index, but I could not post my company's data here, so all the fields are imaginary. The point is that after summary indexing, the data does not change, but it gets merged into one line.


Can you provide an example of what you meant?

Thanks


ITWhisperer
SplunkTrust

The collect command (assuming that is what you are using to populate your summary index) merely puts the events from the pipeline into your index. What @PickleRick is driving at is that it is how you construct your search before the collect command that determines how the events look in the summary index. Perhaps you could share an anonymised version of your search SPL (preferably in a code block), so we can suggest changes before the collect command.

LearningGuy
Builder

Hello @ITWhisperer @PickleRick,
Maybe I didn't explain it clearly. I am confused about why you need to know how I constructed the search.
I don't think it matters, because the summary index is derived from the same set of commands.
I attached a screenshot below; I hope it explains things. Thank you.
[Screenshot: CloudGuy_0-1698594330291.png]




ITWhisperer
SplunkTrust

So, the search you are using to populate the summary index has multiple values in each row for the company field, as does the company field in the summary index. Is this not what you expected?

LearningGuy
Builder

Yes, I expect them to be formatted the same.
CompanyA CompanyA are merged into a single row instead of separate rows, so when I use stats values on the summary index, it does not treat CompanyA as one unique value; instead, "CompanyA CompanyA CompanyA" is one unique value. Thanks
See below.

[Screenshot: CloudGuy_0-1698595749378.png]

 


PickleRick
SplunkTrust

OK. So it's not that the "index" merges the values. It's the collect command that can sometimes behave oddly with them.

See https://docs.splunk.com/Documentation/Splunk/9.1.1/SearchReference/Collect#Change_how_collect_summar...

LearningGuy
Builder

Hello,

1) There is no "collect" command used in ***the group of commands*** (refer to the photo)
2) The merging into one row happened after summary indexing.

The group of commands came from running a lookup on company.csv against vulnerability_index
(imaginary tables, but the concept is the same; I can't use the real data/fields):

index=vulnerability_index
| lookup company.csv ip AS ip OUTPUTNEW ip, company, location

vulnerability_index

ip         vulnerability
1.1.1.1    vuln1
1.1.1.2    vuln2


company.csv

company     ip         location
companyA    1.1.1.1    locationA1
companyA    1.1.1.1    locationA2


Results:
Note that the two companyA values are separated by a line break ("Enter" / carriage return); they are not on the same line.

company     ip         location
companyA    1.1.1.1    locationA1
companyA               locationA2


After moving ***the group of commands*** into a summary index and searching the summary index, the company values are merged into one row: companyA companyA (and the same happens to locationA1 locationA2).

company              ip         location
companyA companyA    1.1.1.1    locationA1 locationA2



Is this normal behavior for a summary index?
If yes, is there a way to keep the regular format?

Thank you!!
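The multivalue fields originate at the lookup step. company.csv has two rows for 1.1.1.1, so each output field gets two values. Below is a minimal Python model of that behavior. It assumes that OUTPUTNEW only fills fields the event does not already have, and that every matching row contributes one value per output field. lookup_outputnew is a hypothetical name, and real lookups have extra behavior that is not modeled here, such as a limit on the number of matches.

```python
def lookup_outputnew(events, table, key, outputs):
    """Rough model of `lookup <table> <key> OUTPUTNEW <outputs>`.

    Every table row whose `key` matches the event contributes one value
    per output field, so two matching rows produce a two-value field.
    Fields the event already has are left untouched (the OUTPUTNEW part).
    """
    enriched = []
    for event in events:
        matches = [row for row in table if row[key] == event[key]]
        new_event = dict(event)
        for field in outputs:
            if field in new_event or not matches:
                continue
            values = [row[field] for row in matches]
            new_event[field] = values if len(values) > 1 else values[0]
        enriched.append(new_event)
    return enriched


vulnerability_index = [
    {"ip": "1.1.1.1", "vulnerability": "vuln1"},
    {"ip": "1.1.1.2", "vulnerability": "vuln2"},
]
company_csv = [
    {"company": "companyA", "ip": "1.1.1.1", "location": "locationA1"},
    {"company": "companyA", "ip": "1.1.1.1", "location": "locationA2"},
]

results = lookup_outputnew(vulnerability_index, company_csv, "ip",
                           ["ip", "company", "location"])
for event in results:
    print(event)
```

The lists produced here correspond to the values shown stacked on separate lines in the search results. As observed above, once they are written to the summary index, they come back as a single string.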





PickleRick
SplunkTrust

Ok. We need to get the terminology straight.

1. There is no such thing as a "summary index" as a type of index. Splunk has only two types of indexes: events and metrics. You can have a summary index in the sense of an index that receives your summaries, but that is purely an organizational matter.

1a. You can have both summary events and any other kind of events in the same index.

2. There is summary indexing, meaning a process in which you generate data that is saved into your indexes for summarizing purposes.

3. There is no such thing as commands in the index. Searches can read from indexes and write to them, but they are not in an index.

So you're either using collect explicitly, or it's done implicitly as a result of the summary indexing option in a scheduled search.

4. Indexes just hold data. They don't do anything with it. The data is either permanently transformed before being written to the index (that's what happens when the data is collected into the summary index) or dynamically transformed on read according to the sourcetype/source/host definitions (in your case, the definition for the stash sourcetype). The index has nothing to do with it.

Summary indexing in a scheduled search works the same way as the collect command does: the results are written to an intermediate csv file, from which they are ingested into the destination index. But here you can't control the details the way you can with a manually invoked collect command. So you can either fiddle with the configuration described in the article I linked (might work, might not; I haven't tried it myself), manually split the results at search time (but that might be problematic if you have spaces in your field values; in that case you could try to delimit multivalued fields differently before collecting), or split your events so that you don't have multivalued fields before collecting the summaries.
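A small Python model can illustrate the "manually split at search time" option and the caveat about spaces. It assumes, as observed earlier in this thread, that a multivalue field ends up in the summary index as its values joined by spaces. flatten and split_back are hypothetical helpers. In SPL, the corresponding steps would be something like eval company=mvjoin(company, "|") before collecting, and makemv delim="|" company (or eval company=split(company, "|")) when reading the summary back.

```python
def flatten(value, delim=" "):
    """Model of a multivalue field being stored as one delimited string."""
    return delim.join(value) if isinstance(value, list) else value


def split_back(value, delim=" "):
    """Model of re-splitting the stored string into separate values on read."""
    return value.split(delim)


# Simple values survive a round trip through a space delimiter.
company = ["companyA", "companyA"]
stored = flatten(company)
print(stored)              # companyA companyA
print(split_back(stored))  # ['companyA', 'companyA']

# Values that contain spaces do not: the space delimiter is ambiguous.
names = ["Company A", "Company B"]
print(split_back(flatten(names)))  # ['Company', 'A', 'Company', 'B']

# A delimiter that never occurs in the data keeps them intact.
print(split_back(flatten(names, "|"), "|"))  # ['Company A', 'Company B']
```

The choice of "|" is arbitrary. Any character that cannot appear in the field values will do.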

LearningGuy
Builder

Hello,
Thanks for your assistance. I will accept your solution. Can you also comment on the points below?
By ***the group of commands*** I meant ***the group of searches***; I will use this term from now on.

When I checked "enable summary indexing" on a scheduled report, it automatically appended the following command to the end of the search:
| summaryindex spool=t uselb=t addtime=t index="summary" file="[filename].stash_new" name="test_ip" marker="hostname=\"https://test.com/\",report=\"test_ip\""

index=summary report=test_ip | dedup sourcetype
This shows that the sourcetype is stash, while the original sourcetype is syslog.

I read the link you sent; it states that changing the sourcetype will incur license usage:
sourcetype
Syntax: sourcetype=<string>
Description: The name of the source type that you want to specify for the events. By specifying a sourcetype outside of stash, you will incur license usage. This option is not valid when output_format=hec.
Default: stash

The solution you suggested is to split the events so they won't have multivalue fields before summary indexing.

Or can I split the multivalue fields after summary indexing?

Thanks
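Splitting the events before they are summarized means turning one result with parallel multivalue fields into one result per value. In SPL this is commonly done in three steps: zip the related fields with mvzip, expand with mvexpand, then split the pair back apart with split and mvindex. The Python sketch below models only the effect on the example data. expand_paired is a hypothetical helper, and it assumes the paired fields have the same number of values (zip stops at the shortest).

```python
def expand_paired(event, fields):
    """Model of splitting one event into one event per value position.

    The i-th value of each field in `fields` ends up in the i-th new event,
    keeping related values (e.g. company and location) together, which is
    what mvzip + mvexpand is typically used for in SPL. zip() stops at the
    shortest field, so unequal lengths silently drop extra values.
    """
    columns = [event[f] if isinstance(event[f], list) else [event[f]]
               for f in fields]
    expanded = []
    for combo in zip(*columns):
        new_event = dict(event)
        new_event.update(zip(fields, combo))
        expanded.append(new_event)
    return expanded


event = {
    "ip": "1.1.1.1",
    "vulnerability": "vuln1",
    "company": ["companyA", "companyA"],
    "location": ["locationA1", "locationA2"],
}
for row in expand_paired(event, ["company", "location"]):
    print(row)
```

Each resulting row holds only single values, so nothing has to be flattened when the summaries are collected. Running stats values(company) BY ip over these rows would then list companyA once.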


PickleRick
SplunkTrust

The "summaryindex" command is just an alias for the "collect" command (I told you you're using that command, didn't I? 😉)

But seriously - yes, summary indexing is a way of producing synthetic events containing some pre-aggregated values so you can later rely on those values instead of calculating the statistics from the raw data.

So the idea is that you produce some set of pre-calculated fields which will be stored in the summary index in a predefined format - that's why you use the stash sourcetype and that's why this sourcetype does not incur any additional license usage.
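As a rough illustration of that idea (not of Splunk's internals), the Python sketch below pre-aggregates hypothetical raw events into per-hour counts once, the way a scheduled summary-indexing search would. It then answers a later question from those stored rows alone. The event data and function names are made up for the example.

```python
from collections import Counter

# Hypothetical raw events as (hour, ip) pairs.
raw_events = [(0, "1.1.1.1"), (0, "1.1.1.1"), (0, "1.1.1.2"), (1, "1.1.1.1")]


def summarize(events):
    """Model of a scheduled search: pre-aggregate a count per (hour, ip)."""
    counts = Counter(events)
    return [{"hour": hour, "ip": ip, "count": n}
            for (hour, ip), n in sorted(counts.items())]


def total_from_summary(rows, ip):
    """Later report: add up the stored counts instead of rescanning raw data."""
    return sum(row["count"] for row in rows if row["ip"] == ip)


summary_rows = summarize(raw_events)
print(summary_rows)
print(total_from_summary(summary_rows, "1.1.1.1"))  # 3
```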

 


PickleRick
SplunkTrust

Index as such doesn't do anything with the data. It just stores it.

So if your data is transformed somehow it's up to your searches which generate the summaries - you have to seek there for answers.
