Splunk Search

mvexpand memory issue

mbasharat
Contributor

Hi,

I have a very large dataset that appears as multivalued, as below:

| makeresults

| eval data1="Windows_7,Unknown,Windows_2012,Windows_7,Windows_8,Windows_10"
| eval data2="LAPTOP PC,SERVER,APPLIANCE,DESKTOP,ROUTER,SWITCH"

| makemv delim="," data1
| makemv delim="," data2

When I try mvexpand using the technique below, it exceeds the default memory limit of 500MB. I increased the limit to 5000MB, but the problem remains. I tried limiting _raw and multiple other techniques, to no avail. I cannot use stats because the cardinality is high: I have ~8 fields to expand. The technique below is a great solution that I have been using in many places for smaller datasets, but not for larger ones. For example, my saved search produces ~6,000 records; when those records are expanded, I expect around 30,000 rows. I need the output below, but without mvexpand, so we do not have to worry about mvexpand memory limits yet still get the complete dataset. Is this possible?

| makeresults

| eval data1="Windows_7,Unknown,Windows_2012,Windows_7,Windows_8,Windows_10"
| eval data2="LAPTOP PC,SERVER,APPLIANCE,DESKTOP,ROUTER,SWITCH"

| makemv delim="," data1
| makemv delim="," data2

| eval Asset_Category=mvrange(0,mvcount(data1))
| mvexpand Asset_Category
| eval data1=mvindex(data1,Asset_Category)
| eval data2=mvindex(data2,Asset_Category)
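For readers following along outside Splunk, the mvrange + mvexpand + mvindex pattern above amounts to pairing the multivalue fields by position. A minimal Python sketch of the same expansion (illustrative only; field names are taken from the example above):

```python
# Emulate the SPL pattern: mvrange(0, mvcount(data1)) + mvexpand + mvindex.
# Each event holds parallel multivalue fields; expansion pairs them by index.
def expand_event(event, fields):
    """Yield one flat row per index position of the multivalue fields."""
    n = len(event[fields[0]])                   # mvcount(data1)
    for i in range(n):                          # mvrange(0, n), then mvexpand
        yield {f: event[f][i] for f in fields}  # mvindex(field, i)

event = {
    "data1": "Windows_7,Unknown,Windows_2012,Windows_7,Windows_8,Windows_10".split(","),
    "data2": "LAPTOP PC,SERVER,APPLIANCE,DESKTOP,ROUTER,SWITCH".split(","),
}
rows = list(expand_event(event, ["data1", "data2"]))
# rows[0] == {"data1": "Windows_7", "data2": "LAPTOP PC"}
```

The memory cost in Splunk comes from materialising every expanded copy of the event at once, which is what the answers below try to avoid.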

Thanks in advance!



ITWhisperer
SplunkTrust

Try something like this:

| makeresults

| eval data1="Windows_7,Unknown,Windows_2012,Windows_7,Windows_8,Windows_10"
| eval data2="LAPTOP PC,SERVER,APPLIANCE,DESKTOP,ROUTER,SWITCH"
| append 
    [| makeresults

| eval data1="Unknown,Windows_2012,Windows_7,Windows_8,Windows_10"
| eval data2="LAPTOP PC,APPLIANCE,ROUTER,SWITCH,DESKTOP"
]
| makemv delim="," data1
| makemv delim="," data2
| streamstats count as row
| eval subrows=mvcount(data1)
| streamstats sum(subrows) as subrow
| eval topsubrow=subrow
| append
    [| makeresults
    | eval subrow=1
    ]
| fields - _time
| makecontinuous subrow
| sort - subrow
| filldown
| eval data1=mvindex(data1,topsubrow-subrow)
| eval data2=mvindex(data2,topsubrow-subrow)

I added an extra event to show it working across multiple events. The only gotcha is if the first event has only one value in the multivalue field, but this can be worked around with a bit more SPL.

If the order of the subrows is important/needs to be preserved, you just need to do an extra calculation with the mvindex.
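The subrow/topsubrow mechanics above can also be sketched outside SPL. The following Python is illustrative only (assuming parallel multivalue fields, as in the example): each event carries a cumulative row budget (topsubrow), the missing row numbers are generated as makecontinuous would, and walking them from the top down while reusing the nearest owning event plays the role of sort - subrow plus filldown.

```python
# Sketch of the makecontinuous/filldown trick: no per-event fan-out in memory,
# just one generated row number per final output row.
def expand_without_mvexpand(events, fields):
    numbered = []
    top = 0
    for ev in events:
        top += len(ev[fields[0]])        # streamstats sum(subrows) as subrow
        numbered.append((top, ev))       # eval topsubrow=subrow
    expanded = []
    for subrow in range(top, 0, -1):     # makecontinuous subrow | sort - subrow
        # filldown: the owning event is the one with the smallest topsubrow >= subrow
        topsubrow, ev = min((p for p in numbered if p[0] >= subrow),
                            key=lambda p: p[0])
        i = topsubrow - subrow           # eval Asset_Category=topsubrow-subrow
        expanded.append({f: ev[f][i] for f in fields})
    return expanded

events = [
    {"data1": ["Windows_7", "Unknown"], "data2": ["LAPTOP PC", "SERVER"]},
    {"data1": ["Windows_10"], "data2": ["SWITCH"]},
]
rows = expand_without_mvexpand(events, ["data1", "data2"])
```

In this sketch the descending sort reverses the event order, while topsubrow - subrow walks each event's values in their original index order.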

mbasharat
Contributor

Hi @ITWhisperer,

Thanks to both you and @bowesmana. I have attached my search below; can you please check it and assist in tweaking it based on your response above?


Thanks in advance!


index=nessus sourcetype="tenable:sc:vuln" severity!=informational ip=* dnsName=*

| fields - index, source, sourcetype, splunk_server, splunk_server_group, host, eventtype, field, linecount, punct, tag, tag::eventtype, _raw

| fields ip, dnsName, firstSeen, plugin_id, cve, cvss2_base_score, cvssV3BaseScore, severity, vprScore, state

| makemv delim="," cve

| stats

dc(ip) AS Affected_IP_Count,
dc(dnsName) AS Affected_DNS_Count,
latest(firstSeen) AS First_Seen_Date,
latest(state) AS State,
latest(vprScore) AS VPR_Score,
latest(severity) AS Tenable_Severity,
latest(cvss2_base_score) AS cvss2_base_score,
latest(cvssV3BaseScore) AS cvssV3BaseScore

by plugin_id, cve

| lookup vuln_affected_entities_info CVE AS cve OUTPUT Scan_Type, CATEGORY, KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System

| eval Asset_Category=mvrange(0,mvcount(CATEGORY))
| mvexpand Asset_Category
| eval Scan_Type=mvindex(Scan_Type,Asset_Category)
| eval CATEGORY=mvindex(CATEGORY,Asset_Category)
| eval KISAM_OS_Grouped=mvindex(KISAM_OS_Grouped,Asset_Category)
| eval Nessus_OS_Grouped=mvindex(Nessus_OS_Grouped,Asset_Category)
| eval EOPS_Assignment_Group=mvindex(EOPS_Assignment_Group,Asset_Category)
| eval Responsible_Organization=mvindex(Responsible_Organization,Asset_Category)
| eval KISAM_GSS=mvindex(KISAM_GSS,Asset_Category)
| eval Project_Type=mvindex(Project_Type,Asset_Category)
| eval KISAM_Projects=mvindex(KISAM_Projects,Asset_Category)
| eval UCMDB_Environment=mvindex(UCMDB_Environment,Asset_Category)
| eval Boundary_Name=mvindex(Boundary_Name,Asset_Category)
| eval Boundary_System=mvindex(Boundary_System,Asset_Category)

| rename cvss2_base_score AS CVSS_Score_v2, cvssV3BaseScore AS CVSS_Score_v3, cve AS CVE, plugin_id AS Plugin_ID

| lookup cvss_score_v2_nvd_severity CVSS_Score_v2 AS CVSS_Score_v2 OUTPUT NVD_Severity_v2
| lookup cvss_score_v3_nvd_severity    CVSS_Score_v3 AS CVSS_Score_v3 OUTPUT NVD_Severity_v3

| eval First_Seen_Date=strftime(First_Seen_Date,"%m/%d/%y %H:%M:%S")

| eval Age_Days=ceiling((now()-strptime(First_Seen_Date,"%m/%d/%y"))/86400)
| eval CVE_Age=case(
Age_Days<30,"A_0 to 29 Days",
Age_Days>=365,"G_365 Days+",
Age_Days>=180,"F_180 to 364 Days",
Age_Days>=120,"E_120 to 179 Days",
Age_Days>=90,"D_90 to 119 Days",
Age_Days>=60,"C_60 to 89 Days",
Age_Days>=30,"B_30 to 59 Days",
0==0,"H_No Age Data")

| eval Remediation_Status=case(
State="fixed","Remediated",
State="open","Not_Remediated",
State="reopened","Not_Remediated")

| eval Time_to_Remediation_Days=if(Remediation_Status=="Remediated",'Age_Days',"Not_Remediated_Yet")

| eval CVE_Age_Remediation=case(
Time_to_Remediation_Days<30,"A_0 to 29 Days",
Time_to_Remediation_Days>=365,"G_365 Days+",
Time_to_Remediation_Days>=180,"F_180 to 364 Days",
Time_to_Remediation_Days>=120,"E_120 to 179 Days",
Time_to_Remediation_Days>=90,"D_90 to 119 Days",
Time_to_Remediation_Days>=60,"C_60 to 89 Days",
Time_to_Remediation_Days>=30,"B_30 to 59 Days",
0==0,"H_Not Remediated Yet")

| eval Tenable_Severity = upper(Tenable_Severity)

| fields Plugin_ID, CVE, CVSS_Score_v2, CVSS_Score_v3, NVD_Severity_v2, NVD_Severity_v3, Tenable_Severity, VPR_Score, First_Seen_Date, Age_Days, CVE_Age, State, Remediation_Status, Time_to_Remediation_Days, CVE_Age_Remediation, Affected_IP_Count, Affected_DNS_Count, Scan_Type, CATEGORY, KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System


ITWhisperer
SplunkTrust

Try something like this:

index=nessus sourcetype="tenable:sc:vuln" severity!=informational ip=* dnsName=*

| fields - index, source, sourcetype, splunk_server, splunk_server_group, host, eventtype, field, linecount, punct, tag, tag::eventtype, _raw

| fields ip, dnsName, firstSeen, plugin_id, cve, cvss2_base_score, cvssV3BaseScore, severity, vprScore, state

| makemv delim="," cve

| stats

dc(ip) AS Affected_IP_Count,
dc(dnsName) AS Affected_DNS_Count,
latest(firstSeen) AS First_Seen_Date,
latest(state) AS State,
latest(vprScore) AS VPR_Score,
latest(severity) AS Tenable_Severity,
latest(cvss2_base_score) AS cvss2_base_score,
latest(cvssV3BaseScore) AS cvssV3BaseScore

by plugin_id, cve

| lookup vuln_affected_entities_info CVE AS cve OUTPUT Scan_Type, CATEGORY, KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System

| eval subrows=mvcount(CATEGORY)
| streamstats sum(subrows) as subrow
| eval topsubrow=subrow
| append
    [| makeresults
    | eval subrow=1
    ]
| fields - _time
| makecontinuous subrow
| sort - subrow
| filldown
| eval Asset_Category=topsubrow-subrow

| eval Scan_Type=mvindex(Scan_Type,Asset_Category)
| eval CATEGORY=mvindex(CATEGORY,Asset_Category)
| eval KISAM_OS_Grouped=mvindex(KISAM_OS_Grouped,Asset_Category)
| eval Nessus_OS_Grouped=mvindex(Nessus_OS_Grouped,Asset_Category)
| eval EOPS_Assignment_Group=mvindex(EOPS_Assignment_Group,Asset_Category)
| eval Responsible_Organization=mvindex(Responsible_Organization,Asset_Category)
| eval KISAM_GSS=mvindex(KISAM_GSS,Asset_Category)
| eval Project_Type=mvindex(Project_Type,Asset_Category)
| eval KISAM_Projects=mvindex(KISAM_Projects,Asset_Category)
| eval UCMDB_Environment=mvindex(UCMDB_Environment,Asset_Category)
| eval Boundary_Name=mvindex(Boundary_Name,Asset_Category)
| eval Boundary_System=mvindex(Boundary_System,Asset_Category)

| rename cvss2_base_score AS CVSS_Score_v2, cvssV3BaseScore AS CVSS_Score_v3, cve AS CVE, plugin_id AS Plugin_ID

| lookup cvss_score_v2_nvd_severity CVSS_Score_v2 AS CVSS_Score_v2 OUTPUT NVD_Severity_v2
| lookup cvss_score_v3_nvd_severity    CVSS_Score_v3 AS CVSS_Score_v3 OUTPUT NVD_Severity_v3

| eval First_Seen_Date=strftime(First_Seen_Date,"%m/%d/%y %H:%M:%S")

| eval Age_Days=ceiling((now()-strptime(First_Seen_Date,"%m/%d/%y"))/86400)
| eval CVE_Age=case(
Age_Days<30,"A_0 to 29 Days",
Age_Days>=365,"G_365 Days+",
Age_Days>=180,"F_180 to 364 Days",
Age_Days>=120,"E_120 to 179 Days",
Age_Days>=90,"D_90 to 119 Days",
Age_Days>=60,"C_60 to 89 Days",
Age_Days>=30,"B_30 to 59 Days",
0==0,"H_No Age Data")

| eval Remediation_Status=case(
State="fixed","Remediated",
State="open","Not_Remediated",
State="reopened","Not_Remediated")

| eval Time_to_Remediation_Days=if(Remediation_Status=="Remediated",'Age_Days',"Not_Remediated_Yet")

| eval CVE_Age_Remediation=case(
Time_to_Remediation_Days<30,"A_0 to 29 Days",
Time_to_Remediation_Days>=365,"G_365 Days+",
Time_to_Remediation_Days>=180,"F_180 to 364 Days",
Time_to_Remediation_Days>=120,"E_120 to 179 Days",
Time_to_Remediation_Days>=90,"D_90 to 119 Days",
Time_to_Remediation_Days>=60,"C_60 to 89 Days",
Time_to_Remediation_Days>=30,"B_30 to 59 Days",
0==0,"H_Not Remediated Yet")

| eval Tenable_Severity = upper(Tenable_Severity)

| fields Plugin_ID, CVE, CVSS_Score_v2, CVSS_Score_v3, NVD_Severity_v2, NVD_Severity_v3, Tenable_Severity, VPR_Score, First_Seen_Date, Age_Days, CVE_Age, State, Remediation_Status, Time_to_Remediation_Days, CVE_Age_Remediation, Affected_IP_Count, Affected_DNS_Count, Scan_Type, CATEGORY, KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System

mbasharat
Contributor

Hi @ITWhisperer 

In addition to the previous error I mentioned, I am also seeing truncated results and the error below:

The specified span would result in too many (>250000) rows. 


mbasharat
Contributor

Hi @ITWhisperer ,

Your provided solution works. However, I am getting the error below:

Unexpected duplicate values in field 'subrow' have been detected.


ITWhisperer
SplunkTrust

This is because the first row has only a single value in the multivalue field, which is the gotcha I mentioned earlier. I will try to dig out the solution to that and repost.
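A tiny numeric illustration (Python, not SPL) of why the duplicate appears: the appended seed row carries subrow=1, and when the first event's multivalue count is 1, its running total is also 1.

```python
# Running total of mvcounts (streamstats sum(subrows) as subrow) versus the
# appended seed row with subrow=1: a first event with mvcount 1 collides.
counts = [1, 3, 2]           # mvcount per event; first event has only one value
subrows, total = [], 0
for c in counts:
    total += c
    subrows.append(total)    # -> [1, 4, 6]
seed = 1                     # the | append [| makeresults | eval subrow=1 ] row
duplicate = seed in subrows  # True: two rows share subrow=1, hence the error
```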


mbasharat
Contributor

Thank you, @ITWhisperer! Awaiting your response.

I still need the complete dataset and no results truncation, please.


ITWhisperer
SplunkTrust

Sounds like you need to increase your limits.


ITWhisperer
SplunkTrust

OK, I added a couple of lines before the makecontinuous to effectively remove the additional event if the mvcount for the first original event is 1:

index=nessus sourcetype="tenable:sc:vuln" severity!=informational ip=* dnsName=*

| fields - index, source, sourcetype, splunk_server, splunk_server_group, host, eventtype, field, linecount, punct, tag, tag::eventtype, _raw

| fields ip, dnsName, firstSeen, plugin_id, cve, cvss2_base_score, cvssV3BaseScore, severity, vprScore, state

| makemv delim="," cve

| stats

dc(ip) AS Affected_IP_Count,
dc(dnsName) AS Affected_DNS_Count,
latest(firstSeen) AS First_Seen_Date,
latest(state) AS State,
latest(vprScore) AS VPR_Score,
latest(severity) AS Tenable_Severity,
latest(cvss2_base_score) AS cvss2_base_score,
latest(cvssV3BaseScore) AS cvssV3BaseScore

by plugin_id, cve

| lookup vuln_affected_entities_info CVE AS cve OUTPUT Scan_Type, CATEGORY, KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System

| eval subrows=mvcount(CATEGORY)
| streamstats sum(subrows) as subrow
| eval topsubrow=subrow
| append
    [| makeresults
    | eval subrow=1
    ]
| fields - _time
| eventstats min(topsubrow) as firsttop
| where isnotnull(topsubrow) or topsubrow != firsttop
| makecontinuous subrow
| sort - subrow
| filldown
| eval Asset_Category=topsubrow-subrow

| eval Scan_Type=mvindex(Scan_Type,Asset_Category)
| eval CATEGORY=mvindex(CATEGORY,Asset_Category)
| eval KISAM_OS_Grouped=mvindex(KISAM_OS_Grouped,Asset_Category)
| eval Nessus_OS_Grouped=mvindex(Nessus_OS_Grouped,Asset_Category)
| eval EOPS_Assignment_Group=mvindex(EOPS_Assignment_Group,Asset_Category)
| eval Responsible_Organization=mvindex(Responsible_Organization,Asset_Category)
| eval KISAM_GSS=mvindex(KISAM_GSS,Asset_Category)
| eval Project_Type=mvindex(Project_Type,Asset_Category)
| eval KISAM_Projects=mvindex(KISAM_Projects,Asset_Category)
| eval UCMDB_Environment=mvindex(UCMDB_Environment,Asset_Category)
| eval Boundary_Name=mvindex(Boundary_Name,Asset_Category)
| eval Boundary_System=mvindex(Boundary_System,Asset_Category)

| rename cvss2_base_score AS CVSS_Score_v2, cvssV3BaseScore AS CVSS_Score_v3, cve AS CVE, plugin_id AS Plugin_ID

| lookup cvss_score_v2_nvd_severity CVSS_Score_v2 AS CVSS_Score_v2 OUTPUT NVD_Severity_v2
| lookup cvss_score_v3_nvd_severity    CVSS_Score_v3 AS CVSS_Score_v3 OUTPUT NVD_Severity_v3

| eval First_Seen_Date=strftime(First_Seen_Date,"%m/%d/%y %H:%M:%S")

| eval Age_Days=ceiling((now()-strptime(First_Seen_Date,"%m/%d/%y"))/86400)
| eval CVE_Age=case(
Age_Days<30,"A_0 to 29 Days",
Age_Days>=365,"G_365 Days+",
Age_Days>=180,"F_180 to 364 Days",
Age_Days>=120,"E_120 to 179 Days",
Age_Days>=90,"D_90 to 119 Days",
Age_Days>=60,"C_60 to 89 Days",
Age_Days>=30,"B_30 to 59 Days",
0==0,"H_No Age Data")

| eval Remediation_Status=case(
State="fixed","Remediated",
State="open","Not_Remediated",
State="reopened","Not_Remediated")

| eval Time_to_Remediation_Days=if(Remediation_Status=="Remediated",'Age_Days',"Not_Remediated_Yet")

| eval CVE_Age_Remediation=case(
Time_to_Remediation_Days<30,"A_0 to 29 Days",
Time_to_Remediation_Days>=365,"G_365 Days+",
Time_to_Remediation_Days>=180,"F_180 to 364 Days",
Time_to_Remediation_Days>=120,"E_120 to 179 Days",
Time_to_Remediation_Days>=90,"D_90 to 119 Days",
Time_to_Remediation_Days>=60,"C_60 to 89 Days",
Time_to_Remediation_Days>=30,"B_30 to 59 Days",
0==0,"H_Not Remediated Yet")

| eval Tenable_Severity = upper(Tenable_Severity)

| fields Plugin_ID, CVE, CVSS_Score_v2, CVSS_Score_v3, NVD_Severity_v2, NVD_Severity_v3, Tenable_Severity, VPR_Score, First_Seen_Date, Age_Days, CVE_Age, State, Remediation_Status, Time_to_Remediation_Days, CVE_Age_Remediation, Affected_IP_Count, Affected_DNS_Count, Scan_Type, CATEGORY, KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System

mbasharat
Contributor

Hi @ITWhisperer 

Same error + truncated results/rows.

Unexpected duplicate values in field 'subrow' have been detected.


ITWhisperer
SplunkTrust

Do you have any events where CATEGORY is null? This would cause a problem. If so, presumably these events can be removed?

| where isnotnull(CATEGORY)
| eval subrows=mvcount(CATEGORY)

mbasharat
Contributor

I do have nulls in the data, but they have to be included. If it helps, see the attached screenshot. That's how the fields from the lookup are output.

test.jpg


ITWhisperer
SplunkTrust

The issue would arise specifically if CATEGORY is null, which it isn't in your screenshot. Can you try this to see if you get any events?

index=nessus sourcetype="tenable:sc:vuln" severity!=informational ip=* dnsName=*

| fields - index, source, sourcetype, splunk_server, splunk_server_group, host, eventtype, field, linecount, punct, tag, tag::eventtype, _raw

| fields ip, dnsName, firstSeen, plugin_id, cve, cvss2_base_score, cvssV3BaseScore, severity, vprScore, state

| makemv delim="," cve

| stats

dc(ip) AS Affected_IP_Count,
dc(dnsName) AS Affected_DNS_Count,
latest(firstSeen) AS First_Seen_Date,
latest(state) AS State,
latest(vprScore) AS VPR_Score,
latest(severity) AS Tenable_Severity,
latest(cvss2_base_score) AS cvss2_base_score,
latest(cvssV3BaseScore) AS cvssV3BaseScore

by plugin_id, cve

| lookup vuln_affected_entities_info CVE AS cve OUTPUT Scan_Type, CATEGORY, KISAM_OS_Grouped, Nessus_OS_Grouped, EOPS_Assignment_Group, Responsible_Organization, KISAM_GSS, Project_Type, KISAM_Projects, UCMDB_Environment, Boundary_Name, Boundary_System

| where isnull(CATEGORY)

If you do, what does that mean for the "mvexpand"?


mbasharat
Contributor

Hi @ITWhisperer

It has some nulls/unknowns. This means the asset does not have a category, but there is an organization, GSS, etc. So yes, there is other data, as mentioned, which means I have to find a solution that replaces mvexpand but still gets all the rows expanded.


ITWhisperer
SplunkTrust

Can you identify a different field which is always fully populated that could be used for the expand?


mbasharat
Contributor

They all have nulls, but I already have them filled with a value (Null/Unknown/Unassigned) from another job. So in my provided search, the blanks already have a value; it is in the screenshot.


mbasharat
Contributor

Hi @ITWhisperer 

So I changed two things:

> sort 0 on subrow

> increased the row output limit to 1000000.

I re-ran the search and the result sets came out complete. Then today, I re-ran the same saved search; it started populating results fine, but within a few seconds it truncated the results, and I watched the stats flip from ~30k down to 4k with the same error, "Unexpected duplicate values in field 'subrow'". BTW, I had excluded the eventstats piece because I needed all the results.

ITWhisperer
SplunkTrust

The issue for this solution is where the field that the "mvexpand-like" process is being performed on has no values for one or more events. This is true even if you were using the real mvexpand; that is, if you mvexpand a field which has no entries for a particular event, that event gets removed. In order to emulate this behaviour, you could evaluate mvcount for the mvexpand field and remove all events which have no values. Alternatively, you could find another field that does have a consistent mvcount for all the events you want to keep and use that instead. With the solution proposed, there is the potential to use more than one field. By this I mean do mvcounts on two or more fields and, for each event, take the maximum of these counts as the basis for the expansion. This is something you can't do with mvexpand.

As for removing the eventstats, eventstats doesn't remove any events. The purpose of the eventstats in the middle of the process is to deal with the case of the first event only having one value in the mv field, since the append adds an event with topsubrow number 1 and the first event in this case would also have topsubrow number 1 (hence the duplicate). The where clause removes the appended event since it was only there to ensure that the sequence generated by makecontinuous starts with 1.
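The multi-field idea above can be sketched as follows (Python, illustrative only; the field names are assumptions taken from this thread): base the expansion on the largest mvcount across the candidate fields and pad the shorter ones, so an event survives even when CATEGORY itself is empty.

```python
# Expand on the max mvcount across several fields, padding missing values,
# so an event is kept even when one field (e.g. CATEGORY) has no values.
def max_count_expand(event, fields):
    n = max(len(event.get(f) or []) for f in fields)  # max of the mvcounts
    rows = []
    for i in range(n):
        row = {}
        for f in fields:
            vals = event.get(f) or []
            row[f] = vals[i] if i < len(vals) else None  # pad short fields
        rows.append(row)
    return rows

event = {"CATEGORY": [], "Scan_Type": ["Active", "Passive"]}
rows = max_count_expand(event, ["CATEGORY", "Scan_Type"])
# both rows survive even though CATEGORY has no values
```

A plain mvexpand on CATEGORY would have dropped this event entirely, which is the behaviour difference described above.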


mbasharat
Contributor

Hmmm, then the question becomes: why did it run perfectly fine yesterday with my two adjustments, but not today?
