Dashboards & Visualizations

Splunking HTML Formatted Log Files

rgcurry
Contributor

We have a third-party application that uses HTML formatted logs; we cannot change this. The data we want to use is defined in a table. I cannot figure out a way to use field extractions to pull this data, but this is a weak area for me (for now). What would you suggest to pull this data from the logs?

Tags (3)
0 Karma
1 Solution

araitz
Splunk Employee
Splunk Employee

Sorry it took me so long to get back to this!

props.conf:

[your_sourcetype]
KV_MODE=none
SHOULD_LINEMERGE=True
BREAK_ONLY_BEFORE=^\<table
DATETIME_CONFIG=CURRENT
REPORT-12312=headers,row,values

transforms.conf:

[headers]
REGEX=\<td.*?Yellow\"\>\<b\>(.*?)\<\/b\>\<\/td\>
FORMAT=field::$1
MV_ADD=true
REPEAT_MATCH=True

[row]
REGEX=(?m)\<tr\sbgcolor\=\"tomato\"\>(.*)\<\/tr\>
FORMAT=row::$1

[values]
SOURCE_KEY=row
REGEX=\<td\>(.*?)\<\/td\>
FORMAT=value::$1
MV_ADD=true
REPEAT_MATCH=True

This search will yield a multi-valued field called 'key_val' where the first value will be:

"Date<br>and Time,Jan 18<br>18:48:36.018"

sourcetype=your_sourcetype | eval key_val=mvzip(field,value) 

View solution in original post

araitz
Splunk Employee
Splunk Employee

Sorry it took me so long to get back to this!

props.conf:

[your_sourcetype]
KV_MODE=none
SHOULD_LINEMERGE=True
BREAK_ONLY_BEFORE=^\<table
DATETIME_CONFIG=CURRENT
REPORT-12312=headers,row,values

transforms.conf:

[headers]
REGEX=\<td.*?Yellow\"\>\<b\>(.*?)\<\/b\>\<\/td\>
FORMAT=field::$1
MV_ADD=true
REPEAT_MATCH=True

[row]
REGEX=(?m)\<tr\sbgcolor\=\"tomato\"\>(.*)\<\/tr\>
FORMAT=row::$1

[values]
SOURCE_KEY=row
REGEX=\<td\>(.*?)\<\/td\>
FORMAT=value::$1
MV_ADD=true
REPEAT_MATCH=True

This search will yield a multi-valued field called 'key_val' where the first value will be:

"Date<br>and Time,Jan 18<br>18:48:36.018"

sourcetype=your_sourcetype | eval key_val=mvzip(field,value) 

araitz
Splunk Employee
Splunk Employee

No problem, we are here to help! BTW, you should be able to use the replace search command to get rid of or swap out the <br> with spaces.

0 Karma

rgcurry
Contributor

I really appreciate your time on this. I am busy right now with a migration of my Splunk environments to a new platform and will get back to this as either time allows (I may have a delay between the TEST and PROD migrations) or after these are complete. Reading over this, I see this just might do the trick. Again, thank you for sharing your expertise.

0 Karma

rgcurry
Contributor

Here is an example from the HTML formatted log. We want to use the data from the Headers to be the keyword and the data from the rows as its value.

<table width="100%" cellPadding="4" cellSpacing="0" align="right" style="table-layout:fixed;word-break:break-all;border-width:1pt">
<tr bgcolor="gray">
<td width="10%" style="color: Yellow"><b>Date<br>and Time</b></td>
<td width="20%" style="color: Yellow"><b>Thread</b></td>
<td width="8%" style="color: Yellow"><b>Login</b></td>
<td width="7%" style="color: Yellow"><b>IP</b></td>
<td width="5%" style="color: Yellow"><b>Type</b></td>
<td width="20%" style="color: Yellow"><b>Method</b></td>
<td width="30%" style="color: Yellow"><b>Message</b></td>
</tr>
<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.018</td><td>WebContainer : 2</td><td>N/A</td><td>N/A</td><td>ERR</td><td>CAbsServlet.doPost(333)</td><td>Invalid request: Remote host: 10.175.226.11, Meta Data: [Function Name: GetBugValue, Login Session ID: 1339572, Project Session ID: 1029005, Call ID: 28]. Error: The session authentication has failed..</td></tr>
<tr bgcolor="tomato"><td>Jan 18<br>18:48:36.030</td><td>WebContainer : 2</td><td>N/A</td><td>N/A</td><td>ERR</td><td>CAbsServlet.doPost(353)</td><td>&nbsp<p>com.mercury.optane.core.CTdException<p>Messages:<br>The session authentication has failed.;<br><p>Stack Trace:<br>com.mercury.optane.core.CTdException: The session authentication has failed.<br>at com.mercury.td.tdserver.authentication.CLoginSessionDirectory.getItem(CLoginSessionDirectory.java:115)<br>at com.mercury.td.tdserver.authentication.CLoginSessionDirectory.getItem(CLoginSessionDirectory.java:94)<br>at com.mercury.td.web.CAbsServlet.assertRequestValidity(CAbsServlet.java:209)<br>at com.mercury.td.web.CAbsServlet.doPost(CAbsServlet.java:330)<br>at javax.servlet.http.HttpServlet.service(HttpServlet.java:763)<br>at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1213)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1154)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:145)<br>at com.hp.qc.core.utils.gzipfilter.GZIPFilter.doFilter(GZIPFilter.java:30)<br>at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:190)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:130)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterChain._doFilter(WebAppFilterChain.java:87)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:848)<br>at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:691)<br>at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:654)<br>at com.ibm.ws.wswebcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:526)<br>at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:90)<br>at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:764)<br>at com.ibm.ws.wswebcontainer.WebContainer.handleRequest(WebContainer.java:1478)<br>at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:133)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:457)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:515)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:300)<br>at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.ready(HttpInboundLink.java:271)<br>at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.sendToDiscriminators(NewConnectionInitialReadCallback.java:214)<br>at com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.complete(NewConnectionInitialReadCallback.java:113)<br>at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)<br>at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)<br>at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)<br>at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:136)<br>at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:196)<br>at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:751)<br>at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:881)<br>at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1551)<br></td></tr>

NOTE: I tried to paste this code in so that it would display but the whole table does not display. To see the rendered code, you will need to copy and paste into a file to feed to your browser. If anyone knows how to make the whole table display here, I'd like to know the way to make it so.

rgcurry
Contributor

I have requested that info from the primary contact for this application group. Will post as soon as I get it.

0 Karma

araitz
Splunk Employee
Splunk Employee

Can you please post a sanitized example? That would certainly help us help you.

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Can’t Make It to Boston? Stream .conf25 and Learn with Haya Husain

Boston may be buzzing this September with Splunk University and .conf25, but you don’t have to pack a bag to ...

Splunk Lantern’s Guide to The Most Popular .conf25 Sessions

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Unlock What’s Next: The Splunk Cloud Platform at .conf25

In just a few days, Boston will be buzzing as the Splunk team and thousands of community members come together ...