Dashboards & Visualizations

Need to expand a table from one row to the actual number of rows it should have

Mrig342
Contributor

Hi All,

I have logs like the one below, and I need to create a table out of them.

 

<p align='center'><font size='4' color=blue>Disk Utilization for gcgnamslap in Asia Testing -10.100.158.51 </font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>Filesystem</th><th>Type</th><th>Blocks</th><th>Used %</th><th>Available %</th><th>Usage %</th><th>Mounted on</th></tr>
<tr><td> /dev/root </td><td> ext3 </td><td> 5782664 </td><td> 1807636 </td><td> 3674620 </td><td bgcolor=red> 33% </td><td> / </td></tr>
<tr><td> devtmpfs </td><td> devtmpfs </td><td> 15872628 </td><td> 0 </td><td> 15872628 </td><td bgcolor=white> 0% </td><td> /dev </td></tr>
<tr><td> tmpfs </td><td> tmpfs </td><td> 15878640 </td><td> 10580284 </td><td> 5298356 </td><td bgcolor=red> 67% </td><td> /dev/shm </td></tr>
<tr><td> tmpfs </td><td> tmpfs </td><td> 15878640 </td><td> 26984 </td><td> 15851656 </td><td bgcolor=white> 1% </td><td> /run </td></tr>
<tr><td> tmpfs </td><td> tmpfs </td><td> 15878640 </td><td> 0 </td><td> 15878640 </td><td bgcolor=white> 0% </td><td> /sys/fs/cgroup </td></tr>
<tr><td> /dev/md1 </td><td> ext3 </td><td> 96922 </td><td> 36667 </td><td> 55039 </td><td bgcolor=red> 40% </td><td> /boot </td></tr>
<tr><td> /dev/md6 </td><td> ext3 </td><td> 62980468 </td><td> 28501072 </td><td> 31278452 </td><td bgcolor=red> 48% </td><td> /usr/sw </td></tr>
<tr><td> /dev/mapper/cp1 </td><td> ext4 </td><td> 1126568640 </td><td> 269553048 </td><td> 800534468 </td><td bgcolor=white> 26% </td><td> /usr/p1 </td></tr>
<tr><td> /dev/mapper/cp2 </td><td> ext4 </td><td> 1126568640 </td><td> 85476940 </td><td> 984610576 </td><td bgcolor=white> 8% </td><td> /usr/p2 </td></tr>
</table></body></html>

 

I used the below query to get the table:

 

... | rex field=_raw "Disk\sUtilization\sfor\s(?P<Server>[^\s]+)\sin\s(?P<Region>[^\s]+)\s(?P<Environment>[^\s]+)\s\-(?P<Server_IP>[^\s]+)\s\<"
| rex field=_raw max_match=0 "\<tr\>\<td\>\s(?P<Filesystem>[^\s]+)\s\<\/td\>\<td\>\s(?P<Type>[^\s]+)\s\<\/td\>\<td\>\s(?P<Blocks>[^\s]+)\s\<\/td\>\<td\>\s(?P<Used>[^\s]+)\s\<\/td\>\<td\>\s(?P<Available>[^\s]+)\s\<\/td\>\<td\sbgcolor\=\w+\>\s(?P<Usage>[^\%]+)\%\s\<\/td\>\<td\>\s(?P<Mounted_On>[^\s]+)\s\<\/td\>\<\/tr\>"
| table Server,Region,Environment,Server_IP,Filesystem,Type,Blocks,Used,Available,Usage,Mounted_On | dedup Server,Region,Environment,Server_IP
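
If it helps to sanity-check the row-extraction regex outside Splunk, here is a rough Python equivalent (a hypothetical standalone sketch, not part of the search) showing that the pattern pulls parallel value tuples out of each table row:

```python
import re

# Two sample <tr> rows copied from the raw event
sample = (
    "<tr><td> /dev/root </td><td> ext3 </td><td> 5782664 </td>"
    "<td> 1807636 </td><td> 3674620 </td><td bgcolor=red> 33% </td>"
    "<td> / </td></tr>"
    "<tr><td> devtmpfs </td><td> devtmpfs </td><td> 15872628 </td>"
    "<td> 0 </td><td> 15872628 </td><td bgcolor=white> 0% </td>"
    "<td> /dev </td></tr>"
)

# Same structure as the SPL rex: seven capture groups per row
row_re = re.compile(
    r"<tr><td>\s(\S+)\s</td><td>\s(\S+)\s</td><td>\s(\S+)\s</td>"
    r"<td>\s(\S+)\s</td><td>\s(\S+)\s</td>"
    r"<td\sbgcolor=\w+>\s([^%]+)%\s</td><td>\s(\S+)\s</td></tr>"
)

# findall with a global pattern mirrors rex max_match=0
rows = row_re.findall(sample)
for fs, typ, blocks, used, avail, usage, mount in rows:
    print(fs, typ, blocks, used, avail, usage, mount)
```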

 

And below is the table I am getting:

Server: gcgnamslap
Region: Asia
Environment: Testing
Server_IP: 10.100.158.51
Filesystem: /dev/root, devtmpfs, tmpfs, tmpfs, tmpfs, /dev/md1, /dev/md6, /dev/mapper/p1, /dev/mapper/p2
Type: ext3, devtmpfs, tmpfs, tmpfs, tmpfs, ext3, ext3, ext4, ext4
Blocks: 5782664, 15872628, 15878640, 15878640, 15878640, 96922, 62980468, 1126568640, 1126568640
Used: 1807636, 0, 10580284, 26984, 0, 36667, 28501072, 269553048, 85476940
Available: 3674620, 15872628, 5298356, 15851656, 15878640, 55039, 31278452, 800534468, 984610576
Usage: 33, 0, 67, 1, 0, 40, 48, 26, 8
Mounted_On: /, /dev, /dev/shm, /run, /sys/fs/cgroup, /boot, /usr/sw, /usr/p1, /usr/p2

Here, the fields Filesystem, Type, Blocks, Used, Available, Usage, and Mounted_On all come up in a single row as multivalue fields. I want the table separated into rows like below:

Server | Region | Environment | Server_IP | Filesystem | Type | Blocks | Used | Available | Usage | Mounted_On
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/root | ext3 | 5782664 | 1807636 | 3674620 | 33 | /
gcgnamslap | Asia | Testing | 10.100.158.51 | devtmpfs | devtmpfs | 15872628 | 0 | 15872628 | 0 | /dev
gcgnamslap | Asia | Testing | 10.100.158.51 | tmpfs | tmpfs | 15878640 | 10580284 | 5298356 | 67 | /dev/shm
gcgnamslap | Asia | Testing | 10.100.158.51 | tmpfs | tmpfs | 15878640 | 26984 | 15851656 | 1 | /run
gcgnamslap | Asia | Testing | 10.100.158.51 | tmpfs | tmpfs | 15878640 | 0 | 15878640 | 0 | /sys/fs/cgroup
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/md1 | ext3 | 96922 | 36667 | 55039 | 40 | /boot
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/md6 | ext3 | 62980468 | 28501072 | 31278452 | 48 | /usr/sw
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/mapper/p1 | ext4 | 1126568640 | 269553048 | 800534468 | 26 | /usr/p1
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/mapper/p2 | ext4 | 1126568640 | 85476940 | 984610576 | 8 | /usr/p2

Please help me create a query that produces the table in the expected manner above.

Your kind inputs are highly appreciated..!!

Thank You..!!


richgalloway
SplunkTrust
SplunkTrust

You can use the mvexpand command to separate the multi-value fields into separate rows. First, however, you must preserve the relationship between the field values by converting them into single-value tuples. Do that using mvzip, then break the tuples apart using split.

... | rex field=_raw "Disk\sUtilization\sfor\s(?P<Server>[^\s]+)\sin\s(?P<Region>[^\s]+)\s(?P<Environment>[^\s]+)\s\-(?P<Server_IP>[^\s]+)\s\<"
| rex field=_raw max_match=0 "\<tr\>\<td\>\s(?P<Filesystem>[^\s]+)\s\<\/td\>\<td\>\s(?P<Type>[^\s]+)\s\<\/td\>\<td\>\s(?P<Blocks>[^\s]+)\s\<\/td\>\<td\>\s(?P<Used>[^\s]+)\s\<\/td\>\<td\>\s(?P<Available>[^\s]+)\s\<\/td\>\<td\sbgcolor\=\w+\>\s(?P<Usage>[^\%]+)\%\s\<\/td\>\<td\>\s(?P<Mounted_On>[^\s]+)\s\<\/td\>\<\/tr\>"
``` Combine related values ```
| eval tuple = mvzip(Filesystem, mvzip(Type, mvzip(Blocks, mvzip(Used, mvzip(Available, mvzip(Usage, Mounted_On))))))
``` Create a new row for each tuple ```
| mvexpand tuple
``` Break the tuple apart ```
| eval tuple = split(tuple, ",")
| eval Filesystem = mvindex(tuple,0), Type = mvindex(tuple,1), Blocks = mvindex(tuple, 2), Used = mvindex(tuple,3), Available = mvindex(tuple, 4), Usage = mvindex(tuple, 5), Mounted_On = mvindex(tuple, 6)
| table Server,Region,Environment,Server_IP,Filesystem,Type,Blocks,Used,Available,Usage,Mounted_On 
| dedup Server,Region,Environment,Server_IP
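
As a mental model of what mvzip, mvexpand, and split are doing above, here is a hypothetical Python analogue (not SPL): zip the parallel multivalue lists into delimited tuples, emit one row per tuple, then break each tuple apart.

```python
# Parallel multivalue fields, as produced by rex max_match=0
filesystem = ["/dev/root", "devtmpfs"]
fstype = ["ext3", "devtmpfs"]
usage = ["33", "0"]

# mvzip: join values at the same index into comma-delimited tuples
tuples = [",".join(vals) for vals in zip(filesystem, fstype, usage)]

# mvexpand + split: one row per tuple, then break each apart again
rows = [t.split(",") for t in tuples]
print(rows)  # [['/dev/root', 'ext3', '33'], ['devtmpfs', 'devtmpfs', '0']]
```

Because mvzip delimits with a comma by default, this approach assumes none of the values contain commas themselves.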

 

---
If this reply helps you, Karma would be appreciated.


dtburrows3
Builder

I agree with @richgalloway's response here for this particular example.
However, if you want a more generalized approach to parsing a table ingested into Splunk with this format in _raw (one that accounts for different headers and does not require building a regex for each table), you may want to try the SPL below.

<base_search>

    ``` Doing this before an mvexpand occurring later in the search pipeline can help attribute the expanded data back to its original event (if needed) ```
    | streamstats
        count as event_count
    ``` Extract info from the table title (this is the one line that would need to be adjusted for each unique table, depending on the title format and the data you want extracted from it) ```
    | rex field=_raw "Disk\s+Utilization\s+for\s+(?<Server>[^\s]+)\s+in\s+(?<Region>[^\s]+)\s+(?<Environment>[^\s]+)\s+\-(?<Server_IP>[^\s]+)"
    ``` Parse individual rows from the table as a multivalued field ```
    | spath input=_raw path=table.tr output=data
    | fields - _raw
    ``` mvexpand each row to its own event ```
    | mvexpand data
    ``` replace tags with standardized breakers for parsing ```
    | eval
        item_data=replace(replace(replace(replace('data', "\<\/t(?:h|d)\>\<t(?:h|d)[^\>]*\>", "<BREAK>"), "^\<t(?:h|d)[^\>]*\>", "<START>"), "\<\/t(?:h|d)\>", "<END>"), "(\<START\>|\<BREAK\>)(\<BREAK\>|\<END\>)", "\1#NULL#\2")
    ``` set a row count for each table ```
    | streamstats
        count as row_number
            by event_count
    ``` assign row type depending on the row number (assumes that the table has a header) ```
    | eval
        row_number='row_number'-1,
        row_type=if(
            'row_number'==0,
                "header",
                "data"
            ),
        ``` remove start and end tags ```
        data_mv=split(replace(replace(item_data, "\<START\>", ""), "\<END\>", ""), "<BREAK>"),
        ``` trim off any leading or trailing spaces from each entry in the table ```
        data_mv=case(
            mvcount(data_mv)==1, trim(data_mv, " "),
            mvcount(data_mv)>1, mvmap(data_mv, trim(data_mv, " "))
            )
    | fields - data, item_data
    ``` pull header fieldnames into each row of the table to stitch it all together ```
    | eventstats
        list(eval(case('row_type'=="header", 'data_mv'))) as header
            by event_count
    ``` mvzip to attribute each table entry to the header field (we will loop through this field to build the table further down the search pipeline) ```
    | eval
        data_mv_zip=mvzip(header, data_mv, "<DELIM>")
    | fields - header, data_mv
    ``` we no longer need the row that only holds the header info now that it has been pulled into each data entry row of the table ```
    | where NOT 'row_type'=="header"
    ``` initialize json object that will hold KV pairs of header_fieldname/row_entry ```
    | eval
        table_json=json_object()
    ``` loop through the MV field and build the json_object containing all KV assignments ```
    | foreach mode=multivalue data_mv_zip
        [
            | eval
                key=mvindex(split('<<ITEM>>', "<DELIM>"), 0),
                value=mvindex(split('<<ITEM>>', "<DELIM>"), 1),
                table_json=json_set(table_json, 'key', 'value')
            ]
    | fields - key, value, row_type, data_mv_zip
    ``` spath the json_object just built ```
    | spath input=table_json
    ``` remove the json_object now that we dont need it ```
    | fields - table_json
    ``` list out data in the order that you want it displayed ```
    | table _time, event_count, Server, Region, Environment, Server_IP, "Filesystem", "Type", "Blocks", "Used %", "Available %", "Usage %", "Mounted on"
    ``` replace any placeholder #NULL# values with actual null() values now that the table has been generated ```
    | foreach *
        [
            | eval
                <<FIELD>>=if(
                    '<<FIELD>>'=="#NULL#",
                        null(),
                        '<<FIELD>>'
                    )
            ]
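
To see what the chained replace() calls in the eval above are doing, here is a hypothetical Python rendition of the same four substitutions (plus the final split), run on a single row's cell markup:

```python
import re

def cells(tr_inner):
    """Mimic the SPL replace() chain: cell boundaries become <BREAK>,
    the row's edges become <START>/<END>, and empty cells get a
    #NULL# placeholder so split() preserves their position."""
    s = re.sub(r"</t[hd]><t[hd][^>]*>", "<BREAK>", tr_inner)
    s = re.sub(r"^<t[hd][^>]*>", "<START>", s)
    s = re.sub(r"</t[hd]>", "<END>", s)
    s = re.sub(r"(<START>|<BREAK>)(<BREAK>|<END>)", r"\1#NULL#\2", s)
    s = s.replace("<START>", "").replace("<END>", "")
    return [c.strip() for c in s.split("<BREAK>")]

row = "<td>a</td><td></td><td bgcolor=red> 33% </td>"
print(cells(row))  # ['a', '#NULL#', '33%']
```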


You can see the output I got locally using this method below.

[Screenshot: dtburrows3_0-1702834876527.png]


This method should also account for null values in the HTML table, whereas the regex shared in your original post assumes that table entries always have a leading and trailing space wrapping a non-null value. That is true for the table you shared, but for general HTML table parsing you may run into issues doing it that way.

As a proof of concept, here is the output of some simulated data I put together: three separate events, each with its own HTML table, data set, and headers, piped through this method.

[Screenshot: dtburrows3_1-1702836519739.png]


Below is the SPL used to generate this screenshot.

| makeresults
    | eval
        _raw="<p align='center'><font size='4' color=blue>Example: Table 1</font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>sample_field_1</th><th>sample_field_2</th><th>sample_field_3</th><th>sample_field_4</th></tr>
<tr><td>table1_row1_column1</td><td></td><td>table1_row1_column3</td><td></td></tr>
<tr><td>table1_row2_column1</td><td>table1_row2_column2</td><td>table1_row2_column3</td><td>table1_row2_column4</td></tr>
<tr><td>table1_row3_column1</td><td></td><td>table1_row3_column3</td><td>table1_row3_column4</td></tr>
<tr><td>table1_row4_column1</td><td>table1_row4_column2</td><td>table1_row4_column3</td><td>table1_row4_column4</td></tr>
<tr><td></td><td></td><td></td><td>table1_row5_column4</td></tr>
</table></body></html>"
    | append
        [
            | makeresults
                | eval
                    _raw="<p align='center'><font size='4' color=blue>Example: Table 2</font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>sample_field_5</th><th>sample_field_6</th><th>sample_field_7</th></tr>
<tr><td></td><td>table2_row1_column2</td><td>table2_row1_column3</td></tr>
<tr><td>table2_row2_column1</td><td>table2_row2_column2</td><td>table2_row2_column3</td></tr>
<tr><td>table2_row3_column1</td><td></td><td>table2_row3_column3</td></tr>
<tr><td></td><td>table2_row4_column2</td><td></td></tr>
<tr><td>table2_row5_column1</td><td>table2_row5_column2</td><td>table2_row5_column3</td></tr>
</table></body></html>"
            ]
    | append
        [
            | makeresults
                | eval
                    _raw="<p align='center'><font size='4' color=blue>Example: Table 3</font></p>
            <table align=center border=2>
            <tr style=background-color:#2711F0 ><th>sample_field_1</th><th>sample_field_2</th><th>sample_field_3</th><th>sample_field_4</th><th>sample_field_5</th><th>sample_field_6</th></tr>
            <tr><td>table3_row1_column1</td><td></td><td>table1_row1_column3</td><td></td><td>table1_row1_column5</td><td>table1_row1_column6</td></tr>
            <tr><td>table3_row2_column1</td><td>table3_row2_column2</td><td>table1_row2_column3</td><td>table3_row2_column4</td><td>table1_row2_column5</td><td>table1_row2_column6</td></tr>
            </table></body></html>"
            ]         
             
    ``` Extract Table Titles ```
    | spath input=_raw path=p.font output=table_title
    | streamstats
        count as event_count
    | spath input=_raw path=table.tr output=data
    | fields - _raw
    | mvexpand data
    | eval
        item_data=replace(replace(replace(replace('data', "\<\/t(?:h|d)\>\<t(?:h|d)[^\>]*\>", "<BREAK>"), "^\<t(?:h|d)[^\>]*\>", "<START>"), "\<\/t(?:h|d)\>", "<END>"), "(\<START\>|\<BREAK\>)(\<BREAK\>|\<END\>)", "\1#NULL#\2")
    | streamstats
        count as row_number
            by event_count
    | eval
        row_number='row_number'-1,
        row_type=if(
            'row_number'==0,
                "header",
                "data"
            ),
       data_mv=split(replace(replace(item_data, "\<START\>", ""), "\<END\>", ""), "<BREAK>"),
       data_mv=case(
           mvcount(data_mv)==1, trim(data_mv, " "),
           mvcount(data_mv)>1, mvmap(data_mv, trim(data_mv, " "))
           )
    | fields - data, item_data
    | eventstats
        list(eval(case('row_type'=="header", 'data_mv'))) as header
            by event_count
    | eval
        data_mv_zip=mvzip(header, data_mv, "<DELIM>")
    | fields - header, data_mv
    | where NOT 'row_type'=="header"
    | eval
        table_json=json_object()
    | foreach mode=multivalue data_mv_zip
        [
            | eval
                key=mvindex(split('<<ITEM>>', "<DELIM>"), 0),
                value=mvindex(split('<<ITEM>>', "<DELIM>"), 1),
                table_json=json_set(table_json, 'key', 'value')
            ]
    | fields - key, value, row_type, data_mv_zip
    | spath input=table_json
    | fields - table_json
    | fields + _time, table_title, event_count, sample_field_*
    | foreach *
        [
            | eval
                <<FIELD>>=if(
                    '<<FIELD>>'=="#NULL#",
                        null(),
                        '<<FIELD>>'
                    )
            ]
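
The eventstats/mvzip/json_object portion of the pipeline is essentially pairing each data row's cells with the header names; in hypothetical Python terms, it boils down to:

```python
# Header row captured by eventstats; data rows left after the where clause
header = ["Filesystem", "Type", "Usage %"]
data_rows = [
    ["/dev/root", "ext3", "33%"],
    ["devtmpfs", "devtmpfs", "0%"],
]

# mvzip(header, data_mv, "<DELIM>") followed by json_set per <<ITEM>>
# is the SPL way of building dict(zip(header, row)) for each row
table = [dict(zip(header, row)) for row in data_rows]
print(table[0])  # {'Filesystem': '/dev/root', 'Type': 'ext3', 'Usage %': '33%'}
```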


 


Mrig342
Contributor

Hi @richgalloway

Thank you for your inputs..!! I am able to get the table as expected with the help of the query.

Cheers..!!

 
