I have a query that extracts useful info from a storage system report.
rex "quota list --verbose (?<fs>[A-Z0-9_]+) " | rex max_match=1000 "ViVol: (?<vivol>(?!user)[A-Za-z0-9]+)\nUsage\s+:\s+(?<usage>[0-9. A-Za-z]+)\n\s*Limit\s+:\s+(?<limit>[0-9A-Z. ]+)" | table fs, vivol, usage, limit
There is a single line at the start of the report with the filesystem which I extract as the "fs" field. Then there are several volume descriptions containing separate lines for the volume, usage and limit.
This query produces a single-value field for "fs" then three multi-value fields "vivol", "usage" and "limit". e.g.
fs vivol usage limit
VOL_XYZ 320 800
VOL_123 50 150
When I export this to Excel (using CSV) the multi-value fields are all within a single cell. I want them on separate rows. If I use mvexpand I get the unexpected behaviour that it will properly expand one field but leave the others unexpanded. If I expand all three fields they lose correlation so I get rows that are mixed-up.
FIRST_FS VOL_123 320 300
How do I turn my three multi-value fields into tuples? I want to keep them together so the first row in "vivol" matches the first rows in "usage" and "limit". Bear in mind there are many "fs" events (about 100 of them).
Combine the corresponding values with mvzip, then mvexpand, and extract the fields.
| eval combined_data=mvzip(mvzip(vivol,usage,"|"),limit,"|")
| mvexpand combined_data
| rex field=combined_data "^(?<vivol>[^|]*)\|(?<usage>[^|]*)\|(?<limit>[^|]*)"
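A run-anywhere sketch of the same mvzip / mvexpand / rex sequence, for trying it out without the original report. The sample values are borrowed from examples elsewhere in this thread, and the makeresults scaffolding is only an assumption standing in for the real events:

```spl
| makeresults
| eval fs="FIRST_FS"
| eval vivol=split("VOL_ABC,VOL_XYZ,VOL_123", ",")
| eval usage=split("100,320,50", ",")
| eval limit=split("300,800,150", ",")
| eval combined_data=mvzip(mvzip(vivol, usage, "|"), limit, "|")
| mvexpand combined_data
| rex field=combined_data "^(?<vivol>[^|]*)\|(?<usage>[^|]*)\|(?<limit>[^|]*)"
| table fs, vivol, usage, limit
```

Each resulting row should carry one volume together with its own usage and limit, with fs repeated on every row. The "|" delimiter is safe only as long as it never appears in the values themselves.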
| makeresults
| eval x="another_single_value_field"
| eval f1=split("a1,a2,a3",",")
| eval f2=split("b1,b2,b3",",")
| eval f3=split("c1,c2,c3",",")
| eval f4=split("d1,d2,d3",",")
`comment(" this is solution multiple fields mvexpand ")`
`comment(" create counter ")`
| eval _counter=mvrange(0,mvcount(f1))
`comment(" prepare to batch ")`
| stats list(*) as * by _counter
`comment(" batch all fields except _counter ")`
| foreach *
[ eval <<FIELD>>=if(mvcount(<<FIELD>>)=1,<<FIELD>>,mvindex(<<FIELD>>,_counter))]
This approach is free of mvexpand memory limits.
Hi @to4kawa
This method works perfectly if there is only one row. Would it be possible to make it work for multiple rows?
| makeresults | eval x="another_single_value_field" | eval f1=split("a1,a2,a3",",") | eval f2=split("b1,b2,b3",",") | eval f3=split("c1,c2,c3",",") | eval f4=split("d1,d2,d3",",")
| append
[ | makeresults | eval x="another_single_value_field" | eval f1=split("x1,y2,z3",",") | eval f2=split("x1,y2,z3",",") | eval f3=split("x1,y2,z3",",") | eval f4=split("x1,y2,z3",",") ]
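One possible way to make the stats-based approach handle several rows is to number each row first and add that number to the by clause, so values from different rows are never pooled into the same list. This is an untested sketch: the field name row is an assumption introduced here, not part of the original answer.

```spl
| makeresults | eval x="another_single_value_field" | eval f1=split("a1,a2,a3",",") | eval f2=split("b1,b2,b3",",") | eval f3=split("c1,c2,c3",",") | eval f4=split("d1,d2,d3",",")
| append
    [ | makeresults | eval x="another_single_value_field" | eval f1=split("x1,y2,z3",",") | eval f2=split("x1,y2,z3",",") | eval f3=split("x1,y2,z3",",") | eval f4=split("x1,y2,z3",",") ]
| streamstats count as row
| eval _counter=mvrange(0,mvcount(f1))
| stats list(*) as * by row _counter
| foreach *
    [ eval <<FIELD>>=if(mvcount(<<FIELD>>)=1,<<FIELD>>,mvindex(<<FIELD>>,_counter))]
| fields - row
```

Grouping by row as well as _counter keeps each original event's lists separate, so mvindex picks the value that belongs to that event.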
maybe https://splunkbase.splunk.com/app/3936/ is of some use?
Separate the data into lines (events):
| rex max_match=0 "(?<line>[^\)]+\)\n\N+)" | mvexpand line | table line
Next, do your extractions:
| rex field=line "quota list --verbose (?<fs>[A-Z0-9_]+) " | rex field=line max_match=1000 "ViVol: (?<vivol>(?!user)[A-Za-z0-9]+)\nUsage\s+:\s+(?<usage>[0-9. A-Za-z]+)\n\s*Limit\s+:\s+(?<limit>[0-9A-Z. ]+)" | table fs, vivol, usage, limit
Use the SPL command filldown:
| filldown fs
to get:
fs vivol usage limit
FIRST_FS VOL_123 50 150
then take it from there.
Updated regex a bit to select the values as per the example:
| rex field=line "quota list --verbose (?<fs>[A-Z0-9_]+) "
| rex field=line max_match=1000 "ViVol: (?<vivol>(?!user)[A-Za-z0-9_]+)\nUsage\s+:\s+(?<usage>[0-9.]+)[A-Za-z\s\n]+Limit\s+:\s+(?<limit>[0-9]+)[A-Za-z\s+()]+"
| table fs, vivol, usage, limit
Use mvzip, makemv and then reset the fields based on index.
First, mvzip the multi-values into a new field:
| eval reading=mvzip(vivol, usage) // create multi-value field for reading
| eval reading=mvzip(reading, limit) // add the third field
At this point you'll have a multi-value field called reading. Here's an example of a field value (a list of four items):
"VOL_ABC,100,300", "VOL_XYZ,320,800", "VOL_123, 50,150", "VOL_FOO, 80,120"
Expand the field and restore the values:
| mvexpand reading // separate the multi-value field into separate events
| makemv reading delim="," // convert the reading into a multi-value
| eval vivol=mvindex(reading, 0) // set vivol to the first value of reading
| eval usage=mvindex(reading, 1) // set usage to the second value of reading
| eval limit=mvindex(reading, -1) // set limit to the last value of reading
Thanks, it helped me so much 🙂.
Assuming that all the mv fields MUST have the same number of items...
| eval myFan=mvrange(0,mvcount(vivol))
| mvexpand myFan
| eval vivol=mvindex(vivol,myFan)
| eval usage=mvindex(usage,myFan)
| eval limit=mvindex(limit,myFan)
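A run-anywhere sketch of the mvrange approach applied to two events at once, since the question mentions about 100 "fs" events. The volume values come from the examples in this thread; SECOND_FS and the makeresults/append scaffolding are hypothetical stand-ins for real data:

```spl
| makeresults
| eval fs="FIRST_FS", vivol=split("VOL_ABC,VOL_XYZ", ","), usage=split("100,320", ","), limit=split("300,800", ",")
| append
    [ | makeresults
      | eval fs="SECOND_FS", vivol=split("VOL_123,VOL_FOO", ","), usage=split("50,80", ","), limit=split("150,120", ",") ]
| eval myFan=mvrange(0, mvcount(vivol))
| mvexpand myFan
| eval vivol=mvindex(vivol, myFan), usage=mvindex(usage, myFan), limit=mvindex(limit, myFan)
| table fs, vivol, usage, limit
```

Because the index is computed per event, each event fans out on its own and the values stay paired with the right fs. Only the small index field is expanded; the other fields are picked by position afterwards.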
Thanks a lot!
Hi DalJeanis,
your solution is ingenious.
Thanks so much!
Seriously this is a great help
Thanks @sk314. To be fair, this question was left unanswered for four years and 35 hours. Some improvements have been made to the docs since this answer, but this example is still better, IMO.
This is so great. I am writing this comment (and upvoting) AFTER searching for this answer and using it for the third time. Quite ungrateful. 😕
This worked for some JSON data I had, where I needed to preserve relationships among elements of an array.
Very helpful, thanks. I ended up with a completed search that did exactly what I wanted using the above stuff.
source="/Znfs200g/Mainframe/splunk/volSpaceReport.txt"
| rex max_match=0 "(?:PRIVATE\s+)(?<vol>\d+)\s+(?<vol_pct>\d+)"
| eval my_zip=mvzip(vol,vol_pct)
| mvexpand my_zip
| makemv my_zip delim=","
| eval vol=mvindex(my_zip,0)
| eval vol_pct=mvindex(my_zip,1)
| eventstats sum(vol) as vol_sum
| eval weighted_vol_pct=(vol_pct*vol/vol_sum)
| stats sum(weighted_vol_pct) as Average_HardDisk_Utilization
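I ran into the same issue with two multi-valued fields and arrived at a different solution: make a copy of the field to preserve the order, mvexpand the original, use mvfind to look up each expanded value's position in the copy, use mvindex with that position to pick the matching value from each field that was NOT expanded, then drop the copy. Note that mvfind treats its second argument as a regular expression and returns the first match, so this relies on the expanded values being unique and free of regex metacharacters. It would look something like: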
...| eval vivolIndex=vivol | mvexpand vivol | eval idx=mvfind(vivolIndex,vivol) | eval usage=mvindex(usage,idx) | eval limit=mvindex(limit,idx) | fields - vivolIndex ...
This solution worked better for me as I was using a stats list(x) list(y) and needed to keep the values correlated.