
mvexpand memory issue

mbasharat
Builder

Hi,

I have a very large dataset that appears as multivalued, as shown below:

| makeresults

| eval data1="Windows_7,Unknown,Windows_2012,Windows_7,Windows_8,Windows_10"
| eval data2="LAPTOP PC,SERVER,APPLIANCE,DESKTOP,ROUTER,SWITCH"

| makemv delim="," data1
| makemv delim="," data2

When I try to use mvexpand with the technique below, it exceeds the default memory limit of 500MB. I increased it to 5000MB (the setting I changed is shown after the example below), but the problem remains. I also tried limiting _raw and several other techniques, but to no avail. I cannot use stats because the cardinality is high; I actually have more fields to expand, roughly 8. The search below is a great solution that I have been using in many places for smaller datasets, but it does not work for this larger one. For example, my saved search produces ~6,000 records, and when those records are expanded I expect around 30,000 rows. I need the output to look like the result below, but without mvexpand, so that we do not have to worry about mvexpand memory limits and still get the complete dataset. Is this possible?

| makeresults

| eval data1="Windows_7,Unknown,Windows_2012,Windows_7,Windows_8,Windows_10"
| eval data2="LAPTOP PC,SERVER,APPLIANCE,DESKTOP,ROUTER,SWITCH"

| makemv delim="," data1
| makemv delim="," data2

| eval Asset_Category=mvrange(0,mvcount(data1))
| mvexpand Asset_Category
| eval data1=mvindex(data1,Asset_Category)
| eval data2=mvindex(data2,Asset_Category)
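
For reference, the memory limit I raised is (if I have the stanza right) the mvexpand setting in limits.conf on the search head, i.e. something like:

[mvexpand]
max_mem_usage_mb = 5000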

Thanks in advance!!!!


ITWhisperer
SplunkTrust

Are you able to re-run yesterday's search by setting the earliest and latest times? Does it still work?

As I said, if you have any events where the multivalue field (CATEGORY) has zero elements, the process as it currently stands breaks down and gives the duplicate values error. Can you run the search up to the point where the mvexpand would take place and then do an mvcount(CATEGORY) to see whether there are any events where this is zero? For example, something like the check below.
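
A minimal sketch of that check, using the lookup and field name from your search (adjust as needed):

| lookup vuln_affected_entities_info CVE AS cve OUTPUT CATEGORY
| eval category_count=coalesce(mvcount(CATEGORY), 0)
| stats count by category_count

Any row with a category_count of 0 identifies the events I mean.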

Btw, good catch on the sort 0 (and increasing your limits)


ITWhisperer
SplunkTrust

By the way, instead of 

| sort 0 - subrow

you could try 

| reverse

which might be quicker and does not have the same restrictions as sort.


bowesmana
SplunkTrust

Depending on what fields you absolutely need AFTER the mvexpand, try first to remove any fields you will not use after you have expanded the events. If you are working with the raw events but also have fields extracted, then remove _raw if you do not need it any more.
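
For example, with the sample fields from your question, something like this before the expansion (just a sketch; note that a plain fields command keeps internal fields such as _raw, which is why _raw is removed explicitly):

| fields - _raw
| fields data1 data2
| eval Asset_Category=mvrange(0,mvcount(data1))
| mvexpand Asset_Category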

From your question, I understand that you want to expand the rows, so there is one row for each of the

Windows_7,Unknown,Windows_2012,Windows_7,Windows_8,Windows_10

attributes.

You should also NOT create the multivalue fields BEFORE you mvexpand, as Splunk then has to expand all those fields too.

Do this AFTER the expansion, e.g.

| makeresults

| eval Asset_Category=mvrange(0,mvcount(split("Windows_7,Unknown,Windows_2012,Windows_7,Windows_8,Windows_10", ",")))
| mvexpand Asset_Category
| eval data1=mvindex(split("Windows_7,Unknown,Windows_2012,Windows_7,Windows_8,Windows_10", ","),Asset_Category)
| eval data2=mvindex(split("LAPTOP PC,SERVER,APPLIANCE,DESKTOP,ROUTER,SWITCH", ","),Asset_Category)

There is no point in expanding all the multivalue fields, as all you are trying to do is get a SINGLE value corresponding to Asset_Category in each of the expanded events.


mbasharat
Builder

Hi @bowesmana ,

I have tried all that but to no avail. See my search below:

index=nessus sourcetype="tenable:sc:vuln" severity!=informational ip=* dnsName=*

| fields - index, source, sourcetype, splunk_server, splunk_server_group, host, eventtype, field, linecount, punct, tag, tag::eventtype, _raw

| fields ip, dnsName, firstSeen, plugin_id, cve, cvss2_base_score, cvssV3BaseScore, severity, vprScore, state

| makemv delim="," cve

| stats

dc(ip) AS Affected_IP_Count,
dc(dnsName) AS Affected_DNS_Count,
latest(firstSeen) AS First_Seen_Date,
latest(state) AS State,
latest(vprScore) AS VPR_Score,
latest(severity) AS Tenable_Severity,
latest(cvss2_base_score) AS cvss2_base_score,
latest(cvssV3BaseScore) AS cvssV3BaseScore

by plugin_id, cve

| lookup vuln_affected_entities_info CVE AS cve OUTPUT Scan_Type, CATEGORY, KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System

| eval Asset_Category=mvrange(0,mvcount(CATEGORY))
| mvexpand Asset_Category
| eval Scan_Type=mvindex(Scan_Type,Asset_Category)
| eval CATEGORY=mvindex(CATEGORY,Asset_Category)
| eval KISAM_OS_Grouped=mvindex(KISAM_OS_Grouped,Asset_Category)
| eval Nessus_OS_Grouped=mvindex(Nessus_OS_Grouped,Asset_Category)
| eval EOPS_Assignment_Group=mvindex(EOPS_Assignment_Group,Asset_Category)
| eval Responsible_Organization=mvindex(Responsible_Organization,Asset_Category)
| eval KISAM_GSS=mvindex(KISAM_GSS,Asset_Category)
| eval Project_Type=mvindex(Project_Type,Asset_Category)
| eval KISAM_Projects=mvindex(KISAM_Projects,Asset_Category)
| eval UCMDB_Environment=mvindex(UCMDB_Environment,Asset_Category)
| eval Boundary_Name=mvindex(Boundary_Name,Asset_Category)
| eval Boundary_System=mvindex(Boundary_System,Asset_Category)

| rename cvss2_base_score AS CVSS_Score_v2, cvssV3BaseScore AS CVSS_Score_v3, cve AS CVE, plugin_id AS Plugin_ID

| lookup cvss_score_v2_nvd_severity CVSS_Score_v2 AS CVSS_Score_v2 OUTPUT NVD_Severity_v2
| lookup cvss_score_v3_nvd_severity    CVSS_Score_v3 AS CVSS_Score_v3 OUTPUT NVD_Severity_v3

| eval First_Seen_Date=strftime(First_Seen_Date,"%m/%d/%y %H:%M:%S")

| eval Age_Days=ceiling((now()-strptime(First_Seen_Date,"%m/%d/%y"))/86400)
| eval CVE_Age=case(
Age_Days<30,"A_0 to 29 Days",
Age_Days>=365,"G_365 Days+",
Age_Days>=180,"F_180 to 364 Days",
Age_Days>=120,"E_120 to 179 Days",
Age_Days>=90,"D_90 to 119 Days",
Age_Days>=60,"C_60 to 89 Days",
Age_Days>=30,"B_30 to 59 Days",
0==0,"H_No Age Data")

| eval Remediation_Status=case(
State="fixed","Remediated",
State="open","Not_Remediated",
State="reopened","Not_Remediated")

| eval Time_to_Remediation_Days=if(Remediation_Status=="Remediated",'Age_Days',"Not_Remediated_Yet")

| eval CVE_Age_Remediation=case(
Time_to_Remediation_Days<30,"A_0 to 29 Days",
Time_to_Remediation_Days>=365,"G_365 Days+",
Time_to_Remediation_Days>=180,"F_180 to 364 Days",
Time_to_Remediation_Days>=120,"E_120 to 179 Days",
Time_to_Remediation_Days>=90,"D_90 to 119 Days",
Time_to_Remediation_Days>=60,"C_60 to 89 Days",
Time_to_Remediation_Days>=30,"B_30 to 59 Days",
0==0,"H_Not Remediated Yet")

| eval Tenable_Severity = upper(Tenable_Severity)

| fields Plugin_ID, CVE, CVSS_Score_v2, CVSS_Score_v3, NVD_Severity_v2, NVD_Severity_v3, Tenable_Severity, VPR_Score, First_Seen_Date, Age_Days, CVE_Age, State, Remediation_Status, Time_to_Remediation_Days, CVE_Age_Remediation, Affected_IP_Count, Affected_DNS_Count, Scan_Type, CATEGORY, KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System

bowesmana
SplunkTrust

@mbasharat 

It looks like you are still expanding all the multivalue fields. Your lookup is BEFORE the mvexpand and, judging by the eval statements that follow, those lookup fields are what is causing the memory overflow.

If you place the lookup AFTER the mvexpand like this

| lookup vuln_affected_entities_info CVE AS cve OUTPUT CATEGORY
| eval Asset_Category=mvrange(0,mvcount(CATEGORY))
| mvexpand Asset_Category

| lookup vuln_affected_entities_info CVE AS cve OUTPUT Scan_Type,  KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System

| eval Scan_Type=mvindex(Scan_Type,Asset_Category)
| eval CATEGORY=mvindex(CATEGORY,Asset_Category)
| eval KISAM_OS_Grouped=mvindex(KISAM_OS_Grouped,Asset_Category)
| eval Nessus_OS_Grouped=mvindex(Nessus_OS_Grouped,Asset_Category)
| eval EOPS_Assignment_Group=mvindex(EOPS_Assignment_Group,Asset_Category)
| eval Responsible_Organization=mvindex(Responsible_Organization,Asset_Category)
| eval KISAM_GSS=mvindex(KISAM_GSS,Asset_Category)
| eval Project_Type=mvindex(Project_Type,Asset_Category)
| eval KISAM_Projects=mvindex(KISAM_Projects,Asset_Category)
| eval UCMDB_Environment=mvindex(UCMDB_Environment,Asset_Category)
| eval Boundary_Name=mvindex(Boundary_Name,Asset_Category)
| eval Boundary_System=mvindex(Boundary_System,Asset_Category)

So, you do a simple lookup to get just CATEGORY, which gives you the count to expand, then do the expansion, followed by a second lookup that returns all the MV fields, which you then mvindex out.

This is a very simple change that avoids mvexpand having to add all the MV fields. Adding those fields after the expansion achieves the same thing but doesn't hit the memory limits of mvexpand.
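
As an aside, the block of repeated mvindex evals could also be collapsed with foreach, which does the same thing (just a sketch, not needed for the fix itself):

| foreach Scan_Type CATEGORY KISAM_OS_Grouped Nessus_OS_Grouped EOPS_Assignment_Group Responsible_Organization KISAM_GSS Project_Type KISAM_Projects UCMDB_Environment Boundary_Name Boundary_System
    [ eval <<FIELD>>=mvindex('<<FIELD>>', Asset_Category) ]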

mbasharat
Builder

Hi @bowesmana,

Just tried and same issue! 😞


mbasharat
Builder

Hi @bowesmana,

I am also testing your provided solution in parallel. Will report back shortly. TY!
