Splunking HTML Formatted Log Files

rgcurry
Contributor

We have a third-party application that writes HTML-formatted logs, and we cannot change that. The data we want is laid out in a table. I cannot figure out how to use field extractions to pull this data out, but this is a weak area for me (for now). What would you suggest for pulling this data from the logs?

araitz
Splunk Employee

Sorry it took me so long to get back to this!

props.conf:

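[your_sourcetype]
# No automatic key=value extraction; the REPORT transforms below define the fields
KV_MODE=none
# Merge lines into multi-line events, starting a new event only at a line that begins with <table
SHOULD_LINEMERGE=True
BREAK_ONLY_BEFORE=^\<table
# Use the current system time as the event timestamp instead of parsing one from the data
DATETIME_CONFIG=CURRENT
# Search-time transforms, applied in this order; [values] reads the "row" field created by [row]
REPORT-12312=headers,row,values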
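To make the event-breaking settings more concrete: Splunk splits the raw data into lines and, with SHOULD_LINEMERGE=True, merges them back into events, starting a new event only when a line matches BREAK_ONLY_BEFORE. The Python sketch below is a simplified model of that rule for illustration only; it ignores other merge limits such as MAX_EVENTS as well as timestamp handling, and the function name merge_events and the abbreviated log lines are hypothetical.

```python
import re

# Same pattern as BREAK_ONLY_BEFORE in props.conf; "\<" is simply a literal "<".
BREAK_ONLY_BEFORE = re.compile(r'^\<table')


def merge_events(lines):
    """Simplified model of line merging (hypothetical helper): start a new event
    only at a line matching BREAK_ONLY_BEFORE, otherwise append to the current one."""
    events = []
    current = []
    for line in lines:
        if BREAK_ONLY_BEFORE.search(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


# Hypothetical, abbreviated log lines: two tables, each spanning several lines.
log_lines = [
    '<table width="100%">',
    '<tr bgcolor="gray">',
    '<td width="20%" style="color: Yellow"><b>Thread</b></td>',
    '</tr>',
    '<tr bgcolor="tomato"><td>WebContainer : 2</td></tr>',
    '<table width="100%">',
    '<tr bgcolor="tomato"><td>WebContainer : 2</td></tr>',
]

events = merge_events(log_lines)
print(len(events))  # 2
```

Each resulting event holds one whole table, which is what lets the transforms below see the header row and the data rows together.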

transforms.conf:

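[headers]
# Each yellow header cell's text becomes a value of the multivalue field "field"
REGEX=\<td.*?Yellow\"\>\<b\>(.*?)\<\/b\>\<\/td\>
FORMAT=field::$1
MV_ADD=true
REPEAT_MATCH=True

[row]
# The cells of the tomato-colored data row (between <tr bgcolor="tomato"> and </tr>) go into "row"
REGEX=(?m)\<tr\sbgcolor\=\"tomato\"\>(.*)\<\/tr\>
FORMAT=row::$1

[values]
# Run against the "row" field instead of _raw and keep every <td> cell as a value of "value"
SOURCE_KEY=row
REGEX=\<td\>(.*?)\<\/td\>
FORMAT=value::$1
MV_ADD=true
REPEAT_MATCH=True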
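If you want to sanity-check the three regexes before deploying them, the sketch below applies the same patterns in Python to a copy of the sample table posted in this thread (header row plus the first data row, with the table attributes shortened). It only approximates the search-time behavior: headers and values keep every match, as MV_ADD=true does for a multivalue field, while row keeps only the first match because that stanza has no MV_ADD. Python's re module stands in for Splunk's PCRE here; for these particular patterns they should behave the same, but treat that as an assumption. The function name extract_fields is hypothetical.

```python
import re

# Regexes copied from transforms.conf. A backslash before a non-alphanumeric
# character such as "<", "/", "=" or '"' is a literal match in Python's re.
HEADERS_RE = re.compile(r'\<td.*?Yellow\"\>\<b\>(.*?)\<\/b\>\<\/td\>')
ROW_RE = re.compile(r'(?m)\<tr\sbgcolor\=\"tomato\"\>(.*)\<\/tr\>')
VALUES_RE = re.compile(r'\<td\>(.*?)\<\/td\>')

# Copy of the sample event from this thread (first data row only, attributes shortened).
SAMPLE_EVENT = """<table width="100%" cellPadding="4" cellSpacing="0" align="right">
<tr bgcolor="gray">
<td width="10%" style="color: Yellow"><b>Date<br>and Time</b></td>
<td width="20%" style="color: Yellow"><b>Thread</b></td>
<td width="8%" style="color: Yellow"><b>Login</b></td>
<td width="7%" style="color: Yellow"><b>IP</b></td>
<td width="5%" style="color: Yellow"><b>Type</b></td>
<td width="20%" style="color: Yellow"><b>Method</b></td>
<td width="30%" style="color: Yellow"><b>Message</b></td>
</tr>
<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.018</td><td>WebContainer : 2</td><td>N/A</td><td>N/A</td><td>ERR</td><td>CAbsServlet.doPost(333)</td><td>Invalid request: Remote host: 10.175.226.11, Meta Data: [Function Name: GetBugValue, Login Session ID: 1339572, Project Session ID: 1029005, Call ID: 28]. Error: The session authentication has failed..</td></tr>"""


def extract_fields(event):
    """Rough model of REPORT-12312=headers,row,values for one event (hypothetical helper)."""
    # [headers]: every match is kept (MV_ADD=true), giving the multivalue "field".
    field = HEADERS_RE.findall(event)
    # [row]: no MV_ADD, so only the first match is kept.
    match = ROW_RE.search(event)
    row = match.group(1) if match else ""
    # [values]: SOURCE_KEY=row, so this regex runs against the row text only.
    value = VALUES_RE.findall(row)
    return field, row, value


field, row, value = extract_fields(SAMPLE_EVENT)
print(field[0], "|", value[0])  # Date<br>and Time | Jan 18<br>18:48:36.018
```

Because "." does not match a newline here, ROW_RE only captures a data row that sits on a single line, which is how the first sample row is laid out.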

With those settings in place, this search builds a multivalue field called 'key_val' that pairs each header with its value:

sourcetype=your_sourcetype | eval key_val=mvzip(field,value)

The first value of key_val will be:

"Date<br>and Time,Jan 18<br>18:48:36.018"
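mvzip(field,value) works through the two multivalue fields in step, much like Python's zip, and joins each header with the value in the same position, using a comma unless a third delimiter argument is given. The Python sketch below models that pairing with the header texts and first-row values from the sample table in this thread; the function name mv_zip is hypothetical and only imitates mvzip, it is not Splunk's implementation.

```python
def mv_zip(first, second, delim=","):
    """Hypothetical model of SPL mvzip(X, Y[, delim]): join the i-th values of X and Y."""
    return [f"{a}{delim}{b}" for a, b in zip(first, second)]


# Header texts and first-row values taken from the sample table in this thread.
field = ["Date<br>and Time", "Thread", "Login", "IP", "Type", "Method", "Message"]
value = [
    "Jan 18<br>18:48:36.018",
    "WebContainer : 2",
    "N/A",
    "N/A",
    "ERR",
    "CAbsServlet.doPost(333)",
    "Invalid request: Remote host: 10.175.226.11, Meta Data: [Function Name: "
    "GetBugValue, Login Session ID: 1339572, Project Session ID: 1029005, "
    "Call ID: 28]. Error: The session authentication has failed..",
]

key_val = mv_zip(field, value)
print(key_val[0])  # Date<br>and Time,Jan 18<br>18:48:36.018
```

Since the Message text itself contains commas, passing a different delimiter (for example mvzip(field,value,"=")) may make the pairs easier to split apart later.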

araitz
Splunk Employee

No problem, we are here to help! By the way, you should be able to use the replace search command to remove the <br> tags or swap them out for spaces.
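In Splunk you would do this with replace as suggested; the Python sketch below only models the substitution itself, so the expected result is easy to see. It swaps every <br> tag for a single space in each value of a multivalue field. Matching <BR>, <br/> and <br /> as well is an assumption (HTML tag names are case-insensitive), and the function name br_to_space is hypothetical.

```python
import re

# Matches <br>, <BR>, <br/> and <br /> (case-insensitive).
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_space(values):
    """Replace every <br> tag with a single space in each value (hypothetical helper)."""
    return [BR_TAG.sub(" ", v) for v in values]


key_val = ["Date<br>and Time,Jan 18<br>18:48:36.018", "Type,ERR"]
print(br_to_space(key_val)[0])  # Date and Time,Jan 18 18:48:36.018
```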

rgcurry
Contributor

I really appreciate your time on this. I am busy right now migrating my Splunk environments to a new platform, and I will get back to this either as time allows (there may be a gap between the TEST and PROD migrations) or after the migrations are complete. Reading over this, I think it just might do the trick. Again, thank you for sharing your expertise.

rgcurry
Contributor

Here is an example from the HTML-formatted log. We want to use the text from the header cells as the key and the data from each row as its value.

<table width="100%" cellPadding="4" cellSpacing="0" align="right" style="table-layout:fixed;word-break:break-all;border-width:1pt">
<tr bgcolor="gray">
<td width="10%" style="color: Yellow"><b>Date<br>and Time</b></td>
<td width="20%" style="color: Yellow"><b>Thread</b></td>
<td width="8%" style="color: Yellow"><b>Login</b></td>
<td width="7%" style="color: Yellow"><b>IP</b></td>
<td width="5%" style="color: Yellow"><b>Type</b></td>
<td width="20%" style="color: Yellow"><b>Method</b></td>
<td width="30%" style="color: Yellow"><b>Message</b></td>
</tr>
<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.018</td><td>WebContainer : 2</td><td>N/A</td><td>N/A</td><td>ERR</td><td>CAbsServlet.doPost(333)</td><td>Invalid request: Remote host: 10.175.226.11, Meta Data: [Function Name: GetBugValue, Login Session ID: 1339572, Project Session ID: 1029005, Call ID: 28]. Error: The session authentication has failed..</td></tr>
<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.030</td><td>WebContainer : 2</td><td>N/A</td><td>N/A</td><td>ERR</td><td>CAbsServlet.doPost(353)</td><td>&nbsp<p>com.mercury.optane.core.CTdException<p>Messages:<br>The session authentication has failed.;<br><p>Stack Trace:<br>com.mercury.optane.core.CTdException: The session authentication has failed.<br>at com.mercury.td.tdserver.authentication.CLoginSessionDirectory.getItem(CLoginSessionDirectory.java:115)<br>at com.mercury.td.tdserver.authentication.CLoginSessionDirectory.getItem(CLoginSessionDirectory.java:94)<br>at com.mercury.td.web.CAbsServlet.assertRequestValidity(CAbsServlet.java:209)<br>at com.mercury.td.web.CAbsServlet.doPost(CAbsServlet.java:330)<br>at javax.servlet.http.HttpServlet.service(HttpServlet.java:763)<br>at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1213)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1154)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:145)<br>at com.hp.qc.core.utils.gzipfilter.GZIPFilter.doFilter(GZIPFilter.java:30)<br>at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:190)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:130)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain._doFilter(WebAppFilterChain.java:87)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:848)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:691)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:654)<br>at com.ibm.ws.wswebcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:526)<br>at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:90)<br>at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:764)<br>at com.ibm.ws.wswebcontainer.WebContainer.handleRequest(WebContainer.java:1478)<br>at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:133)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:457)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:515)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:300)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.ready(HttpInboundLink.java:271)<br>at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.sendToDiscriminators(NewConnectionInitialReadCallback.java:214)<br>at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.complete(NewConnectionInitialReadCallback.java:113)<br>at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)<br>at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)<br>at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)<br>at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:136)<br>at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:196)<br>at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:751)<br>at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:881)<br>at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1551)<br></td></tr>

NOTE: I tried to paste this code so that it would display, but the whole table does not render here. To see the rendered table, copy and paste it into a file and open that file in your browser. If anyone knows how to make the whole table display here, I'd like to know how.

rgcurry
Contributor

I have requested that information from the primary contact for this application group and will post it as soon as I get it.

araitz
Splunk Employee

Can you please post a sanitized example? That would certainly help us help you.