Splunking HTML Formatted Log Files

rgcurry
Contributor

We have a third-party application that writes HTML-formatted logs, and we cannot change this. The data we want is in an HTML table. I cannot figure out how to use field extractions to pull this data out, though field extractions are a weak area for me (for now). What would you suggest for extracting this data from the logs?

1 Solution

araitz
Splunk Employee

Sorry it took me so long to get back to this!

props.conf:

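[your_sourcetype]
# Turn off automatic key/value extraction; fields come from the transforms below
KV_MODE=none
# Merge lines into one event per table, starting a new event at each <table tag
SHOULD_LINEMERGE=True
BREAK_ONLY_BEFORE=^\<table
# Use the current (indexing) time as the event timestamp
DATETIME_CONFIG=CURRENT
# Search-time extractions, applied in the order listed
REPORT-12312=headers,row,values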

transforms.conf:

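[headers]
# Each bold header cell (yellow text) becomes one value of the multivalue field 'field'
REGEX=\<td.*?Yellow\"\>\<b\>(.*?)\<\/b\>\<\/td\>
FORMAT=field::$1
MV_ADD=true
REPEAT_MATCH=True

[row]
# Capture the contents of a data row (<tr bgcolor="tomato">) into the field 'row'
REGEX=(?m)\<tr\sbgcolor\=\"tomato\"\>(.*)\<\/tr\>
FORMAT=row::$1

[values]
# Run against the extracted 'row' field instead of _raw;
# each <td> cell becomes one value of the multivalue field 'value'
SOURCE_KEY=row
REGEX=\<td\>(.*?)\<\/td\>
FORMAT=value::$1
MV_ADD=true
REPEAT_MATCH=True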

The following search yields a multivalue field called 'key_val':

sourcetype=your_sourcetype | eval key_val=mvzip(field,value)

With the sample data, its first value will be:

"Date<br>and Time,Jan 18<br>18:48:36.018"
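
The three regular expressions and the mvzip step can be sanity-checked outside Splunk. What follows is only a rough Python sketch under some assumptions: Python's re module stands in for Splunk's PCRE engine, the sample event is an abbreviated copy of the table posted in this thread, and extract_key_val is a hypothetical helper name, not anything Splunk provides. It looks only at the first "tomato" row, which may or may not match how Splunk treats events containing several rows.

```python
import re

# Abbreviated sample event, based on the table posted in this thread.
EVENT = (
    '<table width="100%">\n'
    '<tr bgcolor="gray">\n'
    '<td width="10%" style="color: Yellow"><b>Date<br>and Time</b></td>\n'
    '<td width="20%" style="color: Yellow"><b>Thread</b></td>\n'
    '<td width="5%" style="color: Yellow"><b>Type</b></td>\n'
    '</tr>\n'
    '<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.018</td>'
    '<td>WebContainer : 2</td><td>ERR</td></tr>\n'
)

# Same patterns as the [headers], [row] and [values] stanzas.
HEADERS = re.compile(r'\<td.*?Yellow\"\>\<b\>(.*?)\<\/b\>\<\/td\>')
ROW = re.compile(r'(?m)\<tr\sbgcolor\=\"tomato\"\>(.*)\<\/tr\>')
VALUES = re.compile(r'\<td\>(.*?)\<\/td\>')


def extract_key_val(event):
    """Mimic the transforms plus mvzip(field, value) for one event."""
    fields = HEADERS.findall(event)        # like the multivalue 'field'
    row = ROW.search(event)                # first data row only
    values = VALUES.findall(row.group(1)) if row else []
    # mvzip pairs the nth header with the nth value, joined by a comma.
    return [f"{k},{v}" for k, v in zip(fields, values)]


if __name__ == "__main__":
    for pair in extract_key_val(EVENT):
        print(pair)
```

Running this against the abbreviated sample prints one "header,value" pair per column, the first being the "Date<br>and Time,Jan 18<br>18:48:36.018" value quoted above.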

araitz
Splunk Employee

No problem, we are here to help! BTW, you should be able to use the replace search command to get rid of or swap out the <br> with spaces.
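
To illustrate the result that cleanup is aiming for, here is a small sketch in Python rather than SPL. The helper name strip_br is hypothetical, and the pattern assumes the tags appear only as <br>, <br/> or <br /> in any letter case; the equivalent Splunk search syntax is not shown here.

```python
import re

# Matches <br>, <br/> and <br />, ignoring case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def strip_br(text, replacement=" "):
    """Replace every <br> tag in text with the given replacement string."""
    return BR_TAG.sub(replacement, text)


if __name__ == "__main__":
    key_val = ["Date<br>and Time,Jan 18<br>18:48:36.018"]
    print([strip_br(v) for v in key_val])
```

Passing an empty string as the replacement removes the tags instead of swapping them for spaces.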

rgcurry
Contributor

I really appreciate your time on this. I am busy right now with a migration of my Splunk environments to a new platform and will get back to this as either time allows (I may have a delay between the TEST and PROD migrations) or after these are complete. Reading over this, I see this just might do the trick. Again, thank you for sharing your expertise.

rgcurry
Contributor

Here is an example from the HTML-formatted log. We want to use the header cells as the field names and the cells in each data row as the corresponding values.

<table width="100%" cellPadding="4" cellSpacing="0" align="right" style="table-layout:fixed;word-break:break-all;border-width:1pt">
<tr bgcolor="gray">
<td width="10%" style="color: Yellow"><b>Date<br>and Time</b></td>
<td width="20%" style="color: Yellow"><b>Thread</b></td>
<td width="8%" style="color: Yellow"><b>Login</b></td>
<td width="7%" style="color: Yellow"><b>IP</b></td>
<td width="5%" style="color: Yellow"><b>Type</b></td>
<td width="20%" style="color: Yellow"><b>Method</b></td>
<td width="30%" style="color: Yellow"><b>Message</b></td>
</tr>
<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.018</td><td>WebContainer : 2</td><td>N/A</td><td>N/A</td><td>ERR</td><td>CAbsServlet.doPost(333)</td><td>Invalid request: Remote host: 10.175.226.11, Meta Data: [Function Name: GetBugValue, Login Session ID: 1339572, Project Session ID: 1029005, Call ID: 28]. Error: The session authentication has failed..</td></tr>
<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.030</td><td>WebContainer : 2</td><td>N/A</td><td>N/A</td><td>ERR</td><td>CAbsServlet.doPost(353)</td><td>&nbsp<p>com.mercury.optane.core.CTdException<p>Messages:<br>The session authentication has failed.;<br><p>Stack Trace:<br>com.mercury.optane.core.CTdException: The session authentication has failed.<br>at com.mercury.td.tdserver.authentication.CLoginSessionDirectory.getItem(CLoginSessionDirectory.java:115)<br>at com.mercury.td.tdserver.authentication.CLoginSessionDirectory.getItem(CLoginSessionDirectory.java:94)<br>at com.mercury.td.web.CAbsServlet.assertRequestValidity(CAbsServlet.java:209)<br>at com.mercury.td.web.CAbsServlet.doPost(CAbsServlet.java:330)<br>at javax.servlet.http.HttpServlet.service(HttpServlet.java:763)<br>at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1213)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1154)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:145)<br>at com.hp.qc.core.utils.gzipfilter.GZIPFilter.doFilter(GZIPFilter.java:30)<br>at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:190)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:130)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain._doFilter(WebAppFilterChain.java:87)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:848)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:691)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:654)<br>at com.ibm.ws.wswebcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:526)<br>at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:90)<br>at 
com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:764)<br>at com.ibm.ws.wswebcontainer.WebContainer.handleRequest(WebContainer.java:1478)<br>at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:133)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:457)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:515)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:300)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.ready(HttpInboundLink.java:271)<br>at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.sendToDiscriminators(NewConnectionInitialReadCallback.java:214)<br>at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.complete(NewConnectionInitialReadCallback.java:113)<br>at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)<br>at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)<br>at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)<br>at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:136)<br>at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:196)<br>at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:751)<br>at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:881)<br>at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1551)<br></td></tr>

NOTE: I tried to paste this code in so that it would display but the whole table does not display. To see the rendered code, you will need to copy and paste into a file to feed to your browser. If anyone knows how to make the whole table display here, I'd like to know the way to make it so.

rgcurry
Contributor

I have requested that info from the primary contact for this application group. Will post as soon as I get it.

araitz
Splunk Employee

Can you please post a sanitized example? That would certainly help us help you.
