Splunk Search

Field extraction from html email

manderson7
Contributor

I've been banging my head against the wall trying to get this to work, and not succeeding, obviously. I have a 217-line email from my power company that I've ingested using the IMAP app, and I want to get fields out of it. The built-in field extractor will only show the first 15 lines of the email, so that's not helpful, and the regexes I've cobbled together aren't pulling out the data. The data that I want is:

 font face="Arial, Helvetica, sans-serif" size="2" color="" style="font-size: 13px">43 kWh </font

I want the "43 kWh" from that line, in a field named "previousday_energy_use".
The next line I want is:

 font face="Arial, Helvetica, sans-serif" size="2" color="#3e3e3e" style="font-size: 13px">$5</font>

From that line I want the "$5".

I'd appreciate any help here.
If I do a

rex field=_raw "(?<kWh>13px\">.*?<)"   

the field created just shows

13px"><

cpetterborg
SplunkTrust

Does this work for you?:

... | rex "13px\">(?P<previousday_energy_use>\d+\skWh)" | rex "13px\">(?P<cost>\$\d+)"
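For anyone who wants to sanity-check the two patterns outside Splunk, here is a minimal sketch in Python. It assumes the two sample lines exactly as pasted in the question, and the `extract` helper and variable names are made up for illustration. Splunk's rex uses PCRE while this uses Python's `re` module, but for these simple constructs (`\d`, `\s`, `\$`, named groups) the two should behave the same. The only change to the patterns is that the SPL string escape `\"` becomes a plain `"`.

```python
import re

# Sample lines copied from the question, as they appear in the post.
usage_line = 'font face="Arial, Helvetica, sans-serif" size="2" color="" style="font-size: 13px">43 kWh </font'
cost_line = 'font face="Arial, Helvetica, sans-serif" size="2" color="#3e3e3e" style="font-size: 13px">$5</font>'

# Same patterns as the two rex commands; only the quote escaping differs.
USAGE_RE = re.compile(r'13px">(?P<previousday_energy_use>\d+\skWh)')
COST_RE = re.compile(r'13px">(?P<cost>\$\d+)')


def extract(raw: str) -> dict:
    """Hypothetical helper: apply both patterns to one raw event, like two chained rex commands."""
    fields = {}
    for pattern in (USAGE_RE, COST_RE):
        match = pattern.search(raw)
        if match:
            fields.update(match.groupdict())
    return fields


# Stand-in for the raw event: both lines in a single multi-line string.
event = usage_line + "\n" + cost_line
print(extract(event))
# prints {'previousday_energy_use': '43 kWh', 'cost': '$5'}
```

Note that `\d+` only matches whole numbers, so a value such as "$5.25" would come back as "$5" with these patterns as written.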


manderson7
Contributor

Yes, thank you! That's it exactly. Forgot my backslashes, apparently.
