Dashboards & Visualizations

Need to expand and create a table from one row to actual number of rows it should be

Mrig342
Contributor

Hi All,

I got a logs like below and I need to create a table out of it.

 

<p align='center'><font size='4' color=blue>Disk Utilization for gcgnamslap in Asia Testing -10.100.158.51 </font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>Filesystem</th><th>Type</th><th>Blocks</th><th>Used %</th><th>Available %</th><th>Usage %</th><th>Mounted on</th></tr>
<tr><td> /dev/root </td><td> ext3 </td><td> 5782664 </td><td> 1807636 </td><td> 3674620 </td><td bgcolor=red> 33% </td><td> / </td></tr>
<tr><td> devtmpfs </td><td> devtmpfs </td><td> 15872628 </td><td> 0 </td><td> 15872628 </td><td bgcolor=white> 0% </td><td> /dev </td></tr>
<tr><td> tmpfs </td><td> tmpfs </td><td> 15878640 </td><td> 10580284 </td><td> 5298356 </td><td bgcolor=red> 67% </td><td> /dev/shm </td></tr>
<tr><td> tmpfs </td><td> tmpfs </td><td> 15878640 </td><td> 26984 </td><td> 15851656 </td><td bgcolor=white> 1% </td><td> /run </td></tr>
<tr><td> tmpfs </td><td> tmpfs </td><td> 15878640 </td><td> 0 </td><td> 15878640 </td><td bgcolor=white> 0% </td><td> /sys/fs/cgroup </td></tr>
<tr><td> /dev/md1 </td><td> ext3 </td><td> 96922 </td><td> 36667 </td><td> 55039 </td><td bgcolor=red> 40% </td><td> /boot </td></tr>
<tr><td> /dev/md6 </td><td> ext3 </td><td> 62980468 </td><td> 28501072 </td><td> 31278452 </td><td bgcolor=red> 48% </td><td> /usr/sw </td></tr>
<tr><td> /dev/mapper/cp1 </td><td> ext4 </td><td> 1126568640 </td><td> 269553048 </td><td> 800534468 </td><td bgcolor=white> 26% </td><td> /usr/p1 </td></tr>
<tr><td> /dev/mapper/cp2 </td><td> ext4 </td><td> 1126568640 </td><td> 85476940 </td><td> 984610576 </td><td bgcolor=white> 8% </td><td> /usr/p2 </td></tr>
</table></body></html>

 

I used below query to get the table:

 

... | rex field=_raw "Disk\sUtilization\sfor\s(?P<Server>[^\s]+)\sin\s(?P<Region>[^\s]+)\s(?P<Environment>[^\s]+)\s\-(?P<Server_IP>[^\s]+)\s\<"
| rex field=_raw max_match=0 "\<tr\>\<td\>\s(?P<Filesystem>[^\s]+)\s\<\/td\>\<td\>\s(?P<Type>[^\s]+)\s\<\/td\>\<td\>\s(?P<Blocks>[^\s]+)\s\<\/td\>\<td\>\s(?P<Used>[^\s]+)\s\<\/td\>\<td\>\s(?P<Available>[^\s]+)\s\<\/td\>\<td\sbgcolor\=\w+\>\s(?P<Usage>[^\%]+)\%\s\<\/td\>\<td\>\s(?P<Mounted_On>[^\s]+)\s\<\/td\>\<\/tr\>"
| table Server,Region,Environment,Server_IP,Filesystem,Type,Blocks,Used,Available,Usage,Mounted_On | dedup Server,Region,Environment,Server_IP

 

And below is the table  I am getting:

ServerRegionEnvironmentServer_IPFilesystemTypeBlocksUsedAvailableUsageMounted_On
gcgnamslapAsiaTesting10.100.158.51
/dev/root
devtmpfs
tmpfs
tmpfs
tmpfs
/dev/md1
/dev/md6
/dev/mapper/p1
/dev/mapper/p2
ext3
devtmpfs
tmpfs
tmpfs
tmpfs
ext3
ext3
ext4
ext4
5782664
15872628 15878640
15878640
15878640
96922
62980468
1126568640
1126568640
1807636
0
10580284
26984
0
36667
28501072 269553048
85476940
3674620
15872628
5298356
15851656
15878640
55039
31278452
800534468
984610576
33
0
67
1
0
40
48
26
8
/
/dev
/dev/shm
/run
/sys/fs/cgroup
/boot
/usr/sw
/usr/p1
/usr/p2

Here, the fields Filesystem,Type,Blocks,Used,Available,Usage_Percent and Mounted_On are coming up in one row. I want the table to be separated according to the rows like below:

ServerRegionEnvironmentServer_IPFilesystemTypeBlocksUsedAvailableUsageMounted_On
gcgnamslapAsiaTesting10.100.158.51/dev/rootext357826641807636367462033/
gcgnamslapAsiaTesting10.100.158.51devtmpfsdevtmpfs158726280158726280/dev
gcgnamslapAsiaTesting10.100.158.51tmpfstmpfs1587864010580284529835667/dev/shm
gcgnamslapAsiaTesting10.100.158.51tmpfstmpfs1587864026984158516561/run
gcgnamslapAsiaTesting10.100.158.51tmpfstmpfs158786401807636158786400/sys/fs/cgroup
gcgnamslapAsiaTesting10.100.158.51/dev/md1ext396922366675503940/boot
gcgnamslapAsiaTesting10.100.158.51/dev/md6ext362980468285010723127845248/usr/sw
gcgnamslapAsiaTesting10.100.158.51/dev/mapper/p1ext4112656864026955304880053446826/usr/p1
gcgnamslapAsiaTesting10.100.158.51/dev/mapper/p2ext41126568640854769409846105768 /usr/p2

Please help to create a query to get the table in the above expected manner.

Your kind inputs are highly appreciated..!!

Thank You..!!

Labels (2)
0 Karma
1 Solution

richgalloway
SplunkTrust
SplunkTrust

You can use the mvexpand command to separate the multi-value fields into separate rows.  First, however, you must preserve the relation between the field values by converting them into single-value tuples.  Do that using mvzip then break the tuples apart using split.

... | rex field=_raw "Disk\sUtilization\sfor\s(?P<Server>[^\s]+)\sin\s(?P<Region>[^\s]+)\s(?P<Environment>[^\s]+)\s\-(?P<Server_IP>[^\s]+)\s\<"
| rex field=_raw max_match=0 "\<tr\>\<td\>\s(?P<Filesystem>[^\s]+)\s\<\/td\>\<td\>\s(?P<Type>[^\s]+)\s\<\/td\>\<td\>\s(?P<Blocks>[^\s]+)\s\<\/td\>\<td\>\s(?P<Used>[^\s]+)\s\<\/td\>\<td\>\s(?P<Available>[^\s]+)\s\<\/td\>\<td\sbgcolor\=\w+\>\s(?P<Usage>[^\%]+)\%\s\<\/td\>\<td\>\s(?P<Mounted_On>[^\s]+)\s\<\/td\>\<\/tr\>"
``` Combine related values ```
| eval tuple = mvzip(Filesystem, mvzip(Type, mvzip(Blocks, mvzip(Used, mvzip(Available, mvzip(Usage, Mounted_On))))))
``` Create a new row for each tuple ```
| mvexpand tuple
``` Break the tuple apart ```
| eval tuple = split(tuple, ",")
| eval Filesystem = mvindex(tuple,0), Type = mvindex(tuple,1), Blocks = mvindex(tuple, 2), Used = mvindex(tuple,3), Available = mvindex(tuple, 4), Usage = mvindex(tuple, 5), Mounted_On = mvindex(tuple, 6)
| table Server,Region,Environment,Server_IP,Filesystem,Type,Blocks,Used,Available,Usage,Mounted_On 
| dedup Server,Region,Environment,Server_IP

 

---
If this reply helps you, Karma would be appreciated.

View solution in original post

dtburrows3
Builder

I agree with @richgalloway response here for this particular example. 
However if you want a more generalize approach to parsing out a table being ingested in Splunk with this format in _raw (capable of accounting for different headers and not having to build out a regex for each time) you may want to try the SPL below.

<base_search>

    ``` Doing this before an mvexpand occurring later in the search pipeline can help attribute the expanded data back to it's orginal event (if needed) ```
    | streamstats
        count as event_count
    ``` Extract Info from table title (this is the one line that would need to be adjusted for each unique table depending on title format and data you would want extracted from it ```
    | rex field=_raw "Disk\s+Utilization\s+for\s+(?<Server>[^\s]+)\s+in\s+(?<Region>[^\s]+)\s+(?<Environment>[^\s]+)\s+\-(?<Server_IP>[^\s]+)"
    ``` Parse individual rows from the table as a multivalued field ```
    | spath input=_raw path=table.tr output=data
    | fields - _raw
    ``` mvexpand each row to it's own event ```
    | mvexpand data
    ``` replace tags with standardized breakers for parsing ```
    | eval
        item_data=replace(replace(replace(replace('data', "\<\/t(?:h|d)\>\<t(?:h|d)[^\>]*\>", "<BREAK>"), "^\<t(?:h|d)[^\>]*\>", "<START>"), "\<\/t(?:h|d)\>", "<END>"), "(\<START\>|\<BREAK\>)(\<BREAK\>|\<END\>)", "\1#NULL#\2")
    ``` set a row count for each table ```
    | streamstats
        count as row_number
            by event_count
    ``` assign row type depending on the row number (assumes that the table has a header) ```
    | eval
        row_number='row_number'-1,
        row_type=if(
            'row_number'==0,
                "header",
                "data"
            ),
        ``` remove start and end tags ```
        data_mv=split(replace(replace(item_data, "\<START\>", ""), "\<END\>", ""), "<BREAK>"),
        ``` trim of any leading or tailing spaces from each entry in the table ```
        data_mv=case(
            mvcount(data_mv)==1, trim(data_mv, " "),
            mvcount(data_mv)>1, mvmap(data_mv, trim(data_mv, " "))
            )
    | fields - data, item_data
    ``` pull header fieldnames into each row of the table to stitch it all together ```
    | eventstats
        list(eval(case('row_type'=="header", 'data_mv'))) as header
            by event_count
    ``` mvzip to attribute each table entry to the header field (we will loop through this field to build the table further down the search pipeline) ```
    | eval
        data_mv_zip=mvzip(header, data_mv, "<DELIM>")
    | fields - header, data_mv
    ``` we no longer need the row that only holds the header info now that it has been pulled into each data entry row of the table ```
    | where NOT 'row_type'=="header"
    ``` initialize json object that will hold KV pairs of header_fieldname/row_entry ```
    | eval
        table_json=json_object()
    ``` loop through the MV field and build the json_object containing all KV assignments ```
    | foreach mode=multivalue data_mv_zip
        [
            | eval
                key=mvindex(split('<<ITEM>>', "<DELIM>"), 0),
                value=mvindex(split('<<ITEM>>', "<DELIM>"), 1),
                table_json=json_set(table_json, 'key', 'value')
            ]
    | fields - key, value, row_type, data_mv_zip
    ``` spath the json_object just built ```
    | spath input=table_json
    ``` remove the json_object now that we dont need it ```
    | fields - table_json
    ``` list out data in the order that you want it displayed ```
    | table _time, event_count, Server, Region, Environment, Server_IP, "Filesystem", "Type", "Blocks", "Used %", "Available %", "Usage %", "Mounted on"
    ``` remove any placeholder #NULL# values with actual null() values since table has been generated ```
    | foreach *
        [
            | eval
                <<FIELD>>=if(
                    '<<FIELD>>'=="#NULL#",
                        null(),
                        '<<FIELD>>'
                    )
            ]


You can see the output I got locally using this method below.

dtburrows3_0-1702834876527.png


This method should also account for null values in a the html table as well, whereas the regex shared from your original post makes the assumption that table entries always have a leading and trailing space wrapping a non-null value. Which is true for this table you shared, but for generalizing parsing a html table you may run into issues doing it that way.

As a proof on concept here is some output of some simulation data I put together with 3 separate events, each with their own html table and sets of data with their own headers and piping them through this method.

dtburrows3_1-1702836519739.png


Below is the SPL used to generate this screenshot.

| makeresults
    | eval
        _raw="<p align='center'><font size='4' color=blue>Example: Table 1</font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>sample_field_1</th><th>sample_field_2</th><th>sample_field_3</th><th>sample_field_4</th></tr>
<tr><td>table1_row1_column1</td><td></td><td>table1_row1_column3</td><td></td></tr>
<tr><td>table1_row2_column1</td><td>table1_row2_column2</td><td>table1_row2_column3</td><td>table1_row2_column4</td></tr>
<tr><td>table1_row3_column1</td><td></td><td>table1_row3_column3</td><td>table1_row3_column4</td></tr>
<tr><td>table1_row4_column1</td><td>table1_row4_column2</td><td>table1_row4_column3</td><td>table1_row4_column4</td></tr>
<tr><td></td><td></td><td></td><td>table1_row5_column4</td></tr>
</table></body></html>"
    | append
        [
            | makeresults
                | eval
                    _raw="<p align='center'><font size='4' color=blue>Example: Table 2</font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>sample_field_5</th><th>sample_field_6</th><th>sample_field_7</th></tr>
<tr><td></td><td>table2_row1_column2</td><td>table2_row1_column3</td></tr>
<tr><td>table2_row2_column1</td><td>table2_row2_column2</td><td>table2_row2_column3</td></tr>
<tr><td>table2_row3_column1</td><td></td><td>table2_row3_column3</td></tr>
<tr><td></td><td>table2_row4_column2</td><td></td></tr>
<tr><td>table2_row5_column1</td><td>table2_row5_column2</td><td>table2_row5_column3</td></tr>
</table></body></html>"
            ]
    | append
        [
            | makeresults
                | eval
                    _raw="<p align='center'><font size='4' color=blue>Example: Table 3</font></p>
            <table align=center border=2>
            <tr style=background-color:#2711F0 ><th>sample_field_1</th><th>sample_field_2</th><th>sample_field_3</th><th>sample_field_4</th><th>sample_field_5</th><th>sample_field_6</th></tr>
            <tr><td>table3_row1_column1</td><td></td><td>table1_row1_column3</td><td></td><td>table1_row1_column5</td><td>table1_row1_column6</td></tr>
            <tr><td>table3_row2_column1</td><td>table3_row2_column2</td><td>table1_row2_column3</td><td>table3_row2_column4</td><td>table1_row2_column5</td><td>table1_row2_column6</td></tr>
            </table></body></html>"
            ]         
             
    ``` Extract Table Titles ```
    | spath input=_raw path=p.font output=table_title
    | streamstats
        count as event_count
    | spath input=_raw path=table.tr output=data
    | fields - _raw
    | mvexpand data
    | eval
        item_data=replace(replace(replace(replace('data', "\<\/t(?:h|d)\>\<t(?:h|d)[^\>]*\>", "<BREAK>"), "^\<t(?:h|d)[^\>]*\>", "<START>"), "\<\/t(?:h|d)\>", "<END>"), "(\<START\>|\<BREAK\>)(\<BREAK\>|\<END\>)", "\1#NULL#\2")
    | streamstats
        count as row_number
            by event_count
    | eval
        row_number='row_number'-1,
        row_type=if(
            'row_number'==0,
                "header",
                "data"
            ),
       data_mv=split(replace(replace(item_data, "\<START\>", ""), "\<END\>", ""), "<BREAK>"),
       data_mv=case(
           mvcount(data_mv)==1, trim(data_mv, " "),
           mvcount(data_mv)>1, mvmap(data_mv, trim(data_mv, " "))
           )
    | fields - data, item_data
    | eventstats
        list(eval(case('row_type'=="header", 'data_mv'))) as header
            by event_count
    | eval
        data_mv_zip=mvzip(header, data_mv, "<DELIM>")
    | fields - header, data_mv
    | where NOT 'row_type'=="header"
    | eval
        table_json=json_object()
    | foreach mode=multivalue data_mv_zip
        [
            | eval
                key=mvindex(split('<<ITEM>>', "<DELIM>"), 0),
                value=mvindex(split('<<ITEM>>', "<DELIM>"), 1),
                table_json=json_set(table_json, 'key', 'value')
            ]
    | fields - key, value, row_type, data_mv_zip
    | spath input=table_json
    | fields - table_json
    | fields + _time, table_title, event_count, sample_field_*
    | foreach *
        [
            | eval
                <<FIELD>>=if(
                    '<<FIELD>>'=="#NULL#",
                        null(),
                        '<<FIELD>>'
                    )
            ]


 

richgalloway
SplunkTrust
SplunkTrust

You can use the mvexpand command to separate the multi-value fields into separate rows.  First, however, you must preserve the relation between the field values by converting them into single-value tuples.  Do that using mvzip then break the tuples apart using split.

... | rex field=_raw "Disk\sUtilization\sfor\s(?P<Server>[^\s]+)\sin\s(?P<Region>[^\s]+)\s(?P<Environment>[^\s]+)\s\-(?P<Server_IP>[^\s]+)\s\<"
| rex field=_raw max_match=0 "\<tr\>\<td\>\s(?P<Filesystem>[^\s]+)\s\<\/td\>\<td\>\s(?P<Type>[^\s]+)\s\<\/td\>\<td\>\s(?P<Blocks>[^\s]+)\s\<\/td\>\<td\>\s(?P<Used>[^\s]+)\s\<\/td\>\<td\>\s(?P<Available>[^\s]+)\s\<\/td\>\<td\sbgcolor\=\w+\>\s(?P<Usage>[^\%]+)\%\s\<\/td\>\<td\>\s(?P<Mounted_On>[^\s]+)\s\<\/td\>\<\/tr\>"
``` Combine related values ```
| eval tuple = mvzip(Filesystem, mvzip(Type, mvzip(Blocks, mvzip(Used, mvzip(Available, mvzip(Usage, Mounted_On))))))
``` Create a new row for each tuple ```
| mvexpand tuple
``` Break the tuple apart ```
| eval tuple = split(tuple, ",")
| eval Filesystem = mvindex(tuple,0), Type = mvindex(tuple,1), Blocks = mvindex(tuple, 2), Used = mvindex(tuple,3), Available = mvindex(tuple, 4), Usage = mvindex(tuple, 5), Mounted_On = mvindex(tuple, 6)
| table Server,Region,Environment,Server_IP,Filesystem,Type,Blocks,Used,Available,Usage,Mounted_On 
| dedup Server,Region,Environment,Server_IP

 

---
If this reply helps you, Karma would be appreciated.

Mrig342
Contributor

Hi @richgalloway

Thank you for your inputs.. !! I am able to get the table as expected with the help of the query.

Cheers..!!

 

Get Updates on the Splunk Community!

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

Get Inspired! We’ve Got Validation that Your Hard Work is Paying Off

We love our Splunk Community and want you to feel inspired by all your hard work! Eric Fusilero, our VP of ...