Dashboards & Visualizations
Highlighted

Splunking HTML Formatted Log Files

Contributor

We have a third-party application that uses HTML formatted logs; we cannot change this. The data we want to use is defined in a table. I cannot figure out a way to use field extractions to pull this data, but this is a weak area for me (for now). What would you suggest to pull this data from the logs?

Tags (3)
0 Karma
Highlighted

Re: Splunking HTML Formatted Log Files

Splunk Employee
Splunk Employee

Can you please post a sanitized example? That would certainly help us help you.

0 Karma
Highlighted

Re: Splunking HTML Formatted Log Files

Contributor

I have requested that info from the primary contact for this application group. Will post as soon as I get it.

0 Karma
Highlighted

Re: Splunking HTML Formatted Log Files

Contributor

Here is an example from the HTML formatted log. We want to use the data from the Headers to be the keyword and the data from the rows as its value.

<table width="100%" cellPadding="4" cellSpacing="0" align="right" style="table-layout:fixed;word-break:break-all;border-width:1pt">
<tr bgcolor="gray">
<td width="10%" style="color: Yellow"><b>Date<br>and Time</b></td>
<td width="20%" style="color: Yellow"><b>Thread</b></td>
<td width="8%" style="color: Yellow"><b>Login</b></td>
<td width="7%" style="color: Yellow"><b>IP</b></td>
<td width="5%" style="color: Yellow"><b>Type</b></td>
<td width="20%" style="color: Yellow"><b>Method</b></td>
<td width="30%" style="color: Yellow"><b>Message</b></td>
</tr>
<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.018</td><td>WebContainer : 2</td><td>N/A</td><td>N/A</td><td>ERR</td><td>CAbsServlet.doPost(333)</td><td>Invalid request: Remote host: 10.175.226.11, Meta Data: [Function Name: GetBugValue, Login Session ID: 1339572, Project Session ID: 1029005, Call ID: 28]. Error: The session authentication has failed..</td></tr>
<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.030</td><td>WebContainer : 2</td><td>N/A</td><td>N/A</td><td>ERR</td><td>CAbsServlet.doPost(353)</td><td>&nbsp<p>com.mercury.optane.core.CTdException<p>Messages:<br>The session authentication has failed.;<br><p>Stack Trace:<br>com.mercury.optane.core.CTdException: The session authentication has failed.<br>at com.mercury.td.tdserver.authentication.CLoginSessionDirectory.getItem(CLoginSessionDirectory.java:115)<br>at com.mercury.td.tdserver.authentication.CLoginSessionDirectory.getItem(CLoginSessionDirectory.java:94)<br>at com.mercury.td.web.CAbsServlet.assertRequestValidity(CAbsServlet.java:209)<br>at com.mercury.td.web.CAbsServlet.doPost(CAbsServlet.java:330)<br>at javax.servlet.http.HttpServlet.service(HttpServlet.java:763)<br>at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1213)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1154)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:145)<br>at com.hp.qc.core.utils.gzipfilter.GZIPFilter.doFilter(GZIPFilter.java:30)<br>at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:190)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:130)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain._doFilter(WebAppFilterChain.java:87)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:848)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:691)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:654)<br>at com.ibm.ws.wswebcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:526)<br>at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:90)<br>at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:764)<br>at com.ibm.ws.wswebcontainer.WebContainer.handleRequest(WebContainer.java:1478)<br>at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:133)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:457)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:515)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:300)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.ready(HttpInboundLink.java:271)<br>at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.sendToDiscriminators(NewConnectionInitialReadCallback.java:214)<br>at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.complete(NewConnectionInitialReadCallback.java:113)<br>at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)<br>at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)<br>at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)<br>at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:136)<br>at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:196)<br>at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:751)<br>at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:881)<br>at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1551)<br></td></tr>

NOTE: I tried to paste this code in so that it would display but the whole table does not display. To see the rendered code, you will need to copy and paste into a file to feed to your browser. If anyone knows how to make the whole table display here, I'd like to know the way to make it so.

Highlighted

Re: Splunking HTML Formatted Log Files

Splunk Employee
Splunk Employee

Sorry it took me so long to get back to this!

props.conf:

[your_sourcetype]
KV_MODE=none
SHOULD_LINEMERGE=True
BREAK_ONLY_BEFORE=^\<table
DATETIME_CONFIG=CURRENT
REPORT-12312=headers,row,values

transforms.conf:

[headers]
REGEX=\<td.*?Yellow\"\>\<b\>(.*?)\<\/b\>\<\/td\>
FORMAT=field::$1
MV_ADD=true
REPEAT_MATCH=True

[row]
REGEX=(?m)\<tr\sbgcolor\=\"tomato\"\>(.*)\<\/tr\>
FORMAT=row::$1

[values]
SOURCE_KEY=row
REGEX=\<td\>(.*?)\<\/td\>
FORMAT=value::$1
MV_ADD=true
REPEAT_MATCH=True

This search will yield a multi-valued field called 'key_val' where the first value will be:

"Date<br>and Time,Jan 18<br>18:48:36.018"

sourcetype=your_sourcetype | eval key_val=mvzip(field,value) 

View solution in original post

Highlighted

Re: Splunking HTML Formatted Log Files

Contributor

I really appreciate your time on this. I am busy right now with a migration of my Splunk environments to a new platform and will get back to this as either time allows (I may have a delay between the TEST and PROD migrations) or after these are complete. Reading over this, I see this just might do the trick. Again, thank you for sharing your expertise.

0 Karma
Highlighted

Re: Splunking HTML Formatted Log Files

Splunk Employee
Splunk Employee

No problem, we are here to help! BTW, you should be able to use the replace search command to get rid of or swap out the <br> with spaces.

0 Karma