Hi All,
I have logs like the one below and I need to create a table out of them.
<p align='center'><font size='4' color=blue>Disk Utilization for gcgnamslap in Asia Testing -10.100.158.51 </font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>Filesystem</th><th>Type</th><th>Blocks</th><th>Used %</th><th>Available %</th><th>Usage %</th><th>Mounted on</th></tr>
<tr><td> /dev/root </td><td> ext3 </td><td> 5782664 </td><td> 1807636 </td><td> 3674620 </td><td bgcolor=red> 33% </td><td> / </td></tr>
<tr><td> devtmpfs </td><td> devtmpfs </td><td> 15872628 </td><td> 0 </td><td> 15872628 </td><td bgcolor=white> 0% </td><td> /dev </td></tr>
<tr><td> tmpfs </td><td> tmpfs </td><td> 15878640 </td><td> 10580284 </td><td> 5298356 </td><td bgcolor=red> 67% </td><td> /dev/shm </td></tr>
<tr><td> tmpfs </td><td> tmpfs </td><td> 15878640 </td><td> 26984 </td><td> 15851656 </td><td bgcolor=white> 1% </td><td> /run </td></tr>
<tr><td> tmpfs </td><td> tmpfs </td><td> 15878640 </td><td> 0 </td><td> 15878640 </td><td bgcolor=white> 0% </td><td> /sys/fs/cgroup </td></tr>
<tr><td> /dev/md1 </td><td> ext3 </td><td> 96922 </td><td> 36667 </td><td> 55039 </td><td bgcolor=red> 40% </td><td> /boot </td></tr>
<tr><td> /dev/md6 </td><td> ext3 </td><td> 62980468 </td><td> 28501072 </td><td> 31278452 </td><td bgcolor=red> 48% </td><td> /usr/sw </td></tr>
<tr><td> /dev/mapper/cp1 </td><td> ext4 </td><td> 1126568640 </td><td> 269553048 </td><td> 800534468 </td><td bgcolor=white> 26% </td><td> /usr/p1 </td></tr>
<tr><td> /dev/mapper/cp2 </td><td> ext4 </td><td> 1126568640 </td><td> 85476940 </td><td> 984610576 </td><td bgcolor=white> 8% </td><td> /usr/p2 </td></tr>
</table></body></html>
I used the query below to get the table:
... | rex field=_raw "Disk\sUtilization\sfor\s(?P<Server>[^\s]+)\sin\s(?P<Region>[^\s]+)\s(?P<Environment>[^\s]+)\s\-(?P<Server_IP>[^\s]+)\s\<"
| rex field=_raw max_match=0 "\<tr\>\<td\>\s(?P<Filesystem>[^\s]+)\s\<\/td\>\<td\>\s(?P<Type>[^\s]+)\s\<\/td\>\<td\>\s(?P<Blocks>[^\s]+)\s\<\/td\>\<td\>\s(?P<Used>[^\s]+)\s\<\/td\>\<td\>\s(?P<Available>[^\s]+)\s\<\/td\>\<td\sbgcolor\=\w+\>\s(?P<Usage>[^\%]+)\%\s\<\/td\>\<td\>\s(?P<Mounted_On>[^\s]+)\s\<\/td\>\<\/tr\>"
| table Server,Region,Environment,Server_IP,Filesystem,Type,Blocks,Used,Available,Usage,Mounted_On | dedup Server,Region,Environment,Server_IP
And below is the table I am getting:
Server | Region | Environment | Server_IP | Filesystem | Type | Blocks | Used | Available | Usage | Mounted_On |
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/root devtmpfs tmpfs tmpfs tmpfs /dev/md1 /dev/md6 /dev/mapper/cp1 /dev/mapper/cp2 | ext3 devtmpfs tmpfs tmpfs tmpfs ext3 ext3 ext4 ext4 | 5782664 15872628 15878640 15878640 15878640 96922 62980468 1126568640 1126568640 | 1807636 0 10580284 26984 0 36667 28501072 269553048 85476940 | 3674620 15872628 5298356 15851656 15878640 55039 31278452 800534468 984610576 | 33 0 67 1 0 40 48 26 8 | / /dev /dev/shm /run /sys/fs/cgroup /boot /usr/sw /usr/p1 /usr/p2 |
Here, the fields Filesystem, Type, Blocks, Used, Available, Usage, and Mounted_On are all coming up in a single row. I want the table to be separated into individual rows like below:
Server | Region | Environment | Server_IP | Filesystem | Type | Blocks | Used | Available | Usage | Mounted_On |
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/root | ext3 | 5782664 | 1807636 | 3674620 | 33 | / |
gcgnamslap | Asia | Testing | 10.100.158.51 | devtmpfs | devtmpfs | 15872628 | 0 | 15872628 | 0 | /dev |
gcgnamslap | Asia | Testing | 10.100.158.51 | tmpfs | tmpfs | 15878640 | 10580284 | 5298356 | 67 | /dev/shm |
gcgnamslap | Asia | Testing | 10.100.158.51 | tmpfs | tmpfs | 15878640 | 26984 | 15851656 | 1 | /run |
gcgnamslap | Asia | Testing | 10.100.158.51 | tmpfs | tmpfs | 15878640 | 0 | 15878640 | 0 | /sys/fs/cgroup |
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/md1 | ext3 | 96922 | 36667 | 55039 | 40 | /boot |
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/md6 | ext3 | 62980468 | 28501072 | 31278452 | 48 | /usr/sw |
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/mapper/cp1 | ext4 | 1126568640 | 269553048 | 800534468 | 26 | /usr/p1 |
gcgnamslap | Asia | Testing | 10.100.158.51 | /dev/mapper/cp2 | ext4 | 1126568640 | 85476940 | 984610576 | 8 | /usr/p2 |
Please help me create a query that produces the table in the expected format above.
Your kind inputs are highly appreciated..!!
Thank You..!!
You can use the mvexpand command to separate the multi-value fields into separate rows. First, however, you must preserve the relation between the field values by converting them into single-value tuples. Do that using mvzip then break the tuples apart using split.
... | rex field=_raw "Disk\sUtilization\sfor\s(?P<Server>[^\s]+)\sin\s(?P<Region>[^\s]+)\s(?P<Environment>[^\s]+)\s\-(?P<Server_IP>[^\s]+)\s\<"
| rex field=_raw max_match=0 "\<tr\>\<td\>\s(?P<Filesystem>[^\s]+)\s\<\/td\>\<td\>\s(?P<Type>[^\s]+)\s\<\/td\>\<td\>\s(?P<Blocks>[^\s]+)\s\<\/td\>\<td\>\s(?P<Used>[^\s]+)\s\<\/td\>\<td\>\s(?P<Available>[^\s]+)\s\<\/td\>\<td\sbgcolor\=\w+\>\s(?P<Usage>[^\%]+)\%\s\<\/td\>\<td\>\s(?P<Mounted_On>[^\s]+)\s\<\/td\>\<\/tr\>"
``` Combine related values ```
| eval tuple = mvzip(Filesystem, mvzip(Type, mvzip(Blocks, mvzip(Used, mvzip(Available, mvzip(Usage, Mounted_On))))))
``` Create a new row for each tuple ```
| mvexpand tuple
``` Break the tuple apart ```
| eval tuple = split(tuple, ",")
| eval Filesystem = mvindex(tuple,0), Type = mvindex(tuple,1), Blocks = mvindex(tuple, 2), Used = mvindex(tuple,3), Available = mvindex(tuple, 4), Usage = mvindex(tuple, 5), Mounted_On = mvindex(tuple, 6)
| table Server,Region,Environment,Server_IP,Filesystem,Type,Blocks,Used,Available,Usage,Mounted_On
``` Include the per-filesystem fields in the dedup so the expanded rows are not collapsed back into one row per server ```
| dedup Server,Region,Environment,Server_IP,Filesystem,Mounted_On
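If you want to try the mvzip/mvexpand/split pattern on its own before pointing it at your real events, here is a minimal, self-contained sketch using makeresults (the field names and values are made up for illustration):
| makeresults
| eval Filesystem = split("/dev/root,/dev/md1", ","), Usage = split("33,40", ",")
``` Pair up the related multi-value entries ```
| eval tuple = mvzip(Filesystem, Usage)
``` Create a new row for each tuple ```
| mvexpand tuple
``` Break each tuple back into its own fields ```
| eval tuple = split(tuple, ","), Filesystem = mvindex(tuple,0), Usage = mvindex(tuple,1)
| table Filesystem, Usage
This should return two rows (/dev/root with 33 and /dev/md1 with 40), which is the same transformation the full query above applies across all seven columns.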
I agree with @richgalloway's response here for this particular example.
However, if you want a more generalized approach to parsing a table ingested into Splunk with this format in _raw (one that accounts for different headers without having to build out a regex each time), you may want to try the SPL below.
<base_search>
``` Doing this before an mvexpand occurring later in the search pipeline can help attribute the expanded data back to its original event (if needed) ```
| streamstats
count as event_count
``` Extract info from the table title (this is the one line that would need to be adjusted for each unique table, depending on the title format and the data you want extracted from it) ```
| rex field=_raw "Disk\s+Utilization\s+for\s+(?<Server>[^\s]+)\s+in\s+(?<Region>[^\s]+)\s+(?<Environment>[^\s]+)\s+\-(?<Server_IP>[^\s]+)"
``` Parse individual rows from the table as a multivalued field ```
| spath input=_raw path=table.tr output=data
| fields - _raw
``` mvexpand each row to its own event ```
| mvexpand data
``` replace tags with standardized breakers for parsing ```
| eval
item_data=replace(replace(replace(replace('data', "\<\/t(?:h|d)\>\<t(?:h|d)[^\>]*\>", "<BREAK>"), "^\<t(?:h|d)[^\>]*\>", "<START>"), "\<\/t(?:h|d)\>", "<END>"), "(\<START\>|\<BREAK\>)(\<BREAK\>|\<END\>)", "\1#NULL#\2")
``` set a row count for each table ```
| streamstats
count as row_number
by event_count
``` assign row type depending on the row number (assumes that the table has a header) ```
| eval
row_number='row_number'-1,
row_type=if(
'row_number'==0,
"header",
"data"
),
``` remove start and end tags ```
data_mv=split(replace(replace(item_data, "\<START\>", ""), "\<END\>", ""), "<BREAK>"),
``` trim off any leading or trailing spaces from each entry in the table ```
data_mv=case(
mvcount(data_mv)==1, trim(data_mv, " "),
mvcount(data_mv)>1, mvmap(data_mv, trim(data_mv, " "))
)
| fields - data, item_data
``` pull header fieldnames into each row of the table to stitch it all together ```
| eventstats
list(eval(case('row_type'=="header", 'data_mv'))) as header
by event_count
``` mvzip to attribute each table entry to the header field (we will loop through this field to build the table further down the search pipeline) ```
| eval
data_mv_zip=mvzip(header, data_mv, "<DELIM>")
| fields - header, data_mv
``` we no longer need the row that only holds the header info now that it has been pulled into each data entry row of the table ```
| where NOT 'row_type'=="header"
``` initialize json object that will hold KV pairs of header_fieldname/row_entry ```
| eval
table_json=json_object()
``` loop through the MV field and build the json_object containing all KV assignments ```
| foreach mode=multivalue data_mv_zip
[
| eval
key=mvindex(split('<<ITEM>>', "<DELIM>"), 0),
value=mvindex(split('<<ITEM>>', "<DELIM>"), 1),
table_json=json_set(table_json, 'key', 'value')
]
| fields - key, value, row_type, data_mv_zip
``` spath the json_object just built ```
| spath input=table_json
``` remove the json_object now that we don't need it ```
| fields - table_json
``` list out data in the order that you want it displayed ```
| table _time, event_count, Server, Region, Environment, Server_IP, "Filesystem", "Type", "Blocks", "Used %", "Available %", "Usage %", "Mounted on"
``` replace any placeholder #NULL# values with actual null() values now that the table has been generated ```
| foreach *
[
| eval
<<FIELD>>=if(
'<<FIELD>>'=="#NULL#",
null(),
'<<FIELD>>'
)
]
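If you want the column names from this generic parse to match the field names in your expected table, you could finish with a rename; the quoted names are taken from the headers in your sample log, so adjust them if your real headers differ:
| rename "Used %" as Used, "Available %" as Available, "Usage %" as Usage, "Mounted on" as Mounted_On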
You can see the output I got locally using this method below.
This method should also account for null values in the html table, whereas the regex shared in your original post assumes that table entries always have a leading and trailing space wrapping a non-null value. That is true for the table you shared, but for generalized parsing of an html table you may run into issues doing it that way.
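To see that difference for yourself, here is a quick sketch against a hypothetical row with one empty cell; because the rex pattern expects a space-wrapped value inside every <td>, the whole match fails and nothing is extracted:
| makeresults
| eval _raw="<tr><td> /dev/root </td><td></td><td> 33% </td></tr>"
``` The empty <td></td> breaks the "<td> value </td>" assumption, so all three fields come back null ```
| rex field=_raw "\<td\>\s(?P<Filesystem>[^\s]+)\s\<\/td\>\<td\>\s(?P<Type>[^\s]+)\s\<\/td\>\<td\>\s(?P<Usage>[^\%]+)\%\s\<\/td\>"
| table _raw, Filesystem, Type, Usage
Running this should show _raw populated but Filesystem, Type, and Usage all empty; the spath-based method above handles the same situation by inserting the #NULL# placeholder instead of dropping the row.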
As a proof of concept, here is the output of some simulation data I put together: 3 separate events, each with its own html table, data, and headers, piped through this method.
Below is the SPL used to generate this screenshot.
| makeresults
| eval
_raw="<p align='center'><font size='4' color=blue>Example: Table 1</font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>sample_field_1</th><th>sample_field_2</th><th>sample_field_3</th><th>sample_field_4</th></tr>
<tr><td>table1_row1_column1</td><td></td><td>table1_row1_column3</td><td></td></tr>
<tr><td>table1_row2_column1</td><td>table1_row2_column2</td><td>table1_row2_column3</td><td>table1_row2_column4</td></tr>
<tr><td>table1_row3_column1</td><td></td><td>table1_row3_column3</td><td>table1_row3_column4</td></tr>
<tr><td>table1_row4_column1</td><td>table1_row4_column2</td><td>table1_row4_column3</td><td>table1_row4_column4</td></tr>
<tr><td></td><td></td><td></td><td>table1_row5_column4</td></tr>
</table></body></html>"
| append
[
| makeresults
| eval
_raw="<p align='center'><font size='4' color=blue>Example: Table 2</font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>sample_field_5</th><th>sample_field_6</th><th>sample_field_7</th></tr>
<tr><td></td><td>table2_row1_column2</td><td>table2_row1_column3</td></tr>
<tr><td>table2_row2_column1</td><td>table2_row2_column2</td><td>table2_row2_column3</td></tr>
<tr><td>table2_row3_column1</td><td></td><td>table2_row3_column3</td></tr>
<tr><td></td><td>table2_row4_column2</td><td></td></tr>
<tr><td>table2_row5_column1</td><td>table2_row5_column2</td><td>table2_row5_column3</td></tr>
</table></body></html>"
]
| append
[
| makeresults
| eval
_raw="<p align='center'><font size='4' color=blue>Example: Table 3</font></p>
<table align=center border=2>
<tr style=background-color:#2711F0 ><th>sample_field_1</th><th>sample_field_2</th><th>sample_field_3</th><th>sample_field_4</th><th>sample_field_5</th><th>sample_field_6</th></tr>
<tr><td>table3_row1_column1</td><td></td><td>table1_row1_column3</td><td></td><td>table1_row1_column5</td><td>table1_row1_column6</td></tr>
<tr><td>table3_row2_column1</td><td>table3_row2_column2</td><td>table1_row2_column3</td><td>table3_row2_column4</td><td>table1_row2_column5</td><td>table1_row2_column6</td></tr>
</table></body></html>"
]
``` Extract Table Titles ```
| spath input=_raw path=p.font output=table_title
| streamstats
count as event_count
| spath input=_raw path=table.tr output=data
| fields - _raw
| mvexpand data
| eval
item_data=replace(replace(replace(replace('data', "\<\/t(?:h|d)\>\<t(?:h|d)[^\>]*\>", "<BREAK>"), "^\<t(?:h|d)[^\>]*\>", "<START>"), "\<\/t(?:h|d)\>", "<END>"), "(\<START\>|\<BREAK\>)(\<BREAK\>|\<END\>)", "\1#NULL#\2")
| streamstats
count as row_number
by event_count
| eval
row_number='row_number'-1,
row_type=if(
'row_number'==0,
"header",
"data"
),
data_mv=split(replace(replace(item_data, "\<START\>", ""), "\<END\>", ""), "<BREAK>"),
data_mv=case(
mvcount(data_mv)==1, trim(data_mv, " "),
mvcount(data_mv)>1, mvmap(data_mv, trim(data_mv, " "))
)
| fields - data, item_data
| eventstats
list(eval(case('row_type'=="header", 'data_mv'))) as header
by event_count
| eval
data_mv_zip=mvzip(header, data_mv, "<DELIM>")
| fields - header, data_mv
| where NOT 'row_type'=="header"
| eval
table_json=json_object()
| foreach mode=multivalue data_mv_zip
[
| eval
key=mvindex(split('<<ITEM>>', "<DELIM>"), 0),
value=mvindex(split('<<ITEM>>', "<DELIM>"), 1),
table_json=json_set(table_json, 'key', 'value')
]
| fields - key, value, row_type, data_mv_zip
| spath input=table_json
| fields - table_json
| fields + _time, table_title, event_count, sample_field_*
| foreach *
[
| eval
<<FIELD>>=if(
'<<FIELD>>'=="#NULL#",
null(),
'<<FIELD>>'
)
]
Thank you for your inputs.. !! I am able to get the table as expected with the help of the query.
Cheers..!!