Getting Data In

Why doesn't PREAMBLE_REGEX work on my indexer?

a101755
Explorer

I want to index 'earthquake' data. Source is "https://earthquake.usgs.gov/fdsnws/event/1/query?format=xml&starttime=2014-01-01&endtime=2014-01-02&...".

My first step was to download the data and upload it once through Splunk Web.
After building the correct parameter set in the GUI, I built a props.conf for the indexers.
With the props.conf in the right place on the indexers, the result is different. It seems that the PREAMBLE_REGEX parameter doesn't work on my indexers.
Details:
Splunk Version 7.0.0
Splunk Build c8a78efdd40f

One search head, two indexers and two forwarders.
The earthquake data is on forwarder-one.
File monitoring of the earthquake data works fine.

props.conf on both indexers

[mg_earthquake_data]
BREAK_ONLY_BEFORE = </event>
DATETIME_CONFIG = 
NO_BINARY_CHECK = true
PREAMBLE_REGEX = <[?qe][x:v][mqe][lun][ at][vkP]
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3Q
TIME_PREFIX = <time><value>
category = Miscellaneous
disabled = false
pulldown_type = true
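
To see which lines the character classes in this PREAMBLE_REGEX can match, here is a minimal sketch outside Splunk. It uses Python's `re` module as a stand-in, which is not Splunk's regex engine, and it assumes the pattern is tested at the start of each line. The sample lines are shortened copies of the lines in this post, and `is_preamble` is a hypothetical helper name.

```python
import re

# Pattern copied from the props.conf stanza above.
PREAMBLE = re.compile(r"<[?qe][x:v][mqe][lun][ at][vkP]")

# Shortened copies of the three header lines and one event line.
lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<q:quakeml xmlns="http://quakeml.org/xmlns/bed/1.2">',
    '<eventParameters publicID="quakeml:earthquake.usgs.gov">',
    '<event catalog:datasource="nc" catalog:eventsource="nc">',
]


def is_preamble(line):
    """Return True if the line starts with text the pattern accepts."""
    return PREAMBLE.match(line) is not None


for line in lines:
    print(is_preamble(line), line[:30])
```

In this sketch the three header lines match and the `<event ...>` line does not, because the character after `<event` is a space and the last class only allows `v`, `k` or `P`.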

Example earthquake data: you can see that the first three lines appear as one event.

11/12/17
3:33:15.000 PM  
<?xml version="1.0" encoding="UTF-8"?>
<q:quakeml xmlns="http://quakeml.org/xmlns/bed/1.2" xmlns:anss="http://anss.org/xmlns/event/0.1" xmlns:catalog="http://anss.org/xmlns/catalog/0.1" xmlns:q="http://quakeml.org/xmlns/quakeml/1.2">
<eventParameters publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?format=xml&amp;starttime=2017-11-10T182126&amp;endtime=2017-11-12T153312&amp;minmagnitude=-0.9">

    host =  www 
    source =    /opt/www/earthquakeData 
    sourcetype =    mg_earthquake_data  

    11/12/17
2:26:05.650 PM  
<event catalog:datasource="nc" catalog:eventsource="nc" catalog:eventid="72923246" publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=nc72923246&amp;format=quakeml"><description><type>earthquake name</type><text>13km ESE of Mammoth Lakes, California</text></description><origin catalog:datasource="nc" catalog:dataid="nc72923246" catalog:eventsource="nc" catalog:eventid="72923246" publicID="quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml"><time><value>2017-11-12T14:26:05.650Z</value></time><longitude><value>-118.8258362</value></longitude><latitude><value>37.6068344</value></latitude><depth><value>3690</value><uncertainty>390</uncertainty></depth><originUncertainty><horizontalUncertainty>280</horizontalUncertainty><preferredDescription>horizontal uncertainty</preferredDescription></originUncertainty><quality><usedPhaseCount>21</usedPhaseCount><usedStationCount>21</usedStationCount><standardError>0.04</standardError><azimuthalGap>128</azimuthalGap><minimumDistance>0.02469</minimumDistance></quality><evaluationMode>automatic</evaluationMode><creationInfo><agencyID>NC</agencyID><creationTime>2017-11-12T14:27:41.430Z</creationTime><version>0</version></creationInfo></origin><magnitude catalog:datasource="nc" catalog:dataid="nc72923246" catalog:eventsource="nc" catalog:eventid="72923246" 
publicID="quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml#magnitude"><mag><value>2.04</value><uncertainty>0.16</uncertainty></mag><type>md</type><stationCount>18</stationCount><originID>quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml</originID><evaluationMode>automatic</evaluationMode><creationInfo><agencyID>NC</agencyID><creationTime>2017-11-12T14:27:41.430Z</creationTime></creationInfo></magnitude><preferredOriginID>quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml</preferredOriginID><preferredMagnitudeID>quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml#magnitude</preferredMagnitudeID><type>earthquake</type><creationInfo><agencyID>nc</agencyID><creationTime>2017-11-12T14:30:04.241Z</creationTime><version>0</version></creationInfo></event>

    host =  www 
    source =    /opt/www/earthquakeData 
    sourcetype =    mg_earthquake_data  

    11/12/17
2:07:44.660 PM  
<event catalog:datasource="ak" catalog:eventsource="ak" catalog:eventid="17185809" publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=ak17185809&amp;format=quakeml"><description><type>earthquake name</type><text>59km SSW of Deltana, Alaska</text></description><origin catalog:datasource="ak" catalog:dataid="AK17185809" catalog:eventsource="ak" catalog:eventid="17185809" publicID="quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml"><time><value>2017-11-12T14:07:44.660Z</value></time><longitude><value>-145.5431</value></longitude><latitude><value>63.3555</value></latitude><depth><value>0</value><uncertainty>300</uncertainty></depth><originUncertainty><horizontalUncertainty>0</horizontalUncertainty><preferredDescription>horizontal uncertainty</preferredDescription></originUncertainty><quality><usedPhaseCount>19</usedPhaseCount><standardError>0.82</standardError></quality><evaluationMode>automatic</evaluationMode><creationInfo><creationTime>2017-11-12T14:15:35.560Z</creationTime><version>1</version></creationInfo></origin><magnitude catalog:datasource="ak" catalog:dataid="AK17185809" catalog:eventsource="ak" catalog:eventid="17185809" 
publicID="quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml#magnitude"><mag><value>2.1</value></mag><type>ml</type><originID>quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml</originID><evaluationMode>automatic</evaluationMode><creationInfo><creationTime>2017-11-12T14:15:35.560Z</creationTime></creationInfo></magnitude><preferredOriginID>quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml</preferredOriginID><preferredMagnitudeID>quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml#magnitude</preferredMagnitudeID><type>earthquake</type><creationInfo><agencyID>ak</agencyID><creationTime>2017-11-12T14:15:35.560Z</creationTime><version>1</version></creationInfo></event>

Please tell me where the failure is. I'm not very interested in alternatives to PREAMBLE_REGEX.
The other parameters, for example timestamp extraction and event breaking, work fine.
I want to understand the difference between the GUI-generated props.conf and the mechanism on the Splunk indexer.
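
For readers who want to check the timestamp part by hand, the following is a rough Python sketch of what TIME_PREFIX and TIME_FORMAT describe: find `<time><value>`, then parse the text after it. It is not how Splunk implements timestamp extraction. The `extract_timestamp` helper is hypothetical, Python's `%f` stands in for Splunk's `%3Q`, and the trailing `Z` is assumed to mean UTC.

```python
import re
from datetime import datetime, timezone

# Same prefix as TIME_PREFIX in the stanza; the capture group mirrors
# TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3Q (three digits of subseconds).
TIMESTAMP = re.compile(
    r"<time><value>(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})"
)

# Fragment taken from the first sample event in this post.
sample = "<time><value>2017-11-12T14:26:05.650Z</value></time>"


def extract_timestamp(event):
    """Return the first origin time in the event as an aware datetime, or None."""
    found = TIMESTAMP.search(event)
    if found is None:
        return None
    parsed = datetime.strptime(found.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    # Assumption: the trailing "Z" in the data means UTC.
    return parsed.replace(tzinfo=timezone.utc)


print(extract_timestamp(sample))
```

The parsed value corresponds to the 2:26:05.650 PM shown on the first earthquake event above.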

0 Karma

a101755
Explorer

The answer is ...
... a mistake in my regex. The escape character in front of < was missing(!)
So the right syntax is

PREAMBLE_REGEX=\<[?qe][x:v][mqe][lun][ at][vkP]

But the regex is interpreted differently by the GUI data import and by the entry in props.conf!
In the GUI the \ isn't needed.
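
As a side check outside Splunk, the two spellings can be compared in a generic regex engine. The sketch below uses Python's `re` module, which is only a stand-in and says nothing about how Splunk reads props.conf. The sample lines are shortened copies of the lines from the question, and `same_result` is a hypothetical helper.

```python
import re

# The two spellings discussed in this thread.
UNESCAPED = re.compile(r"<[?qe][x:v][mqe][lun][ at][vkP]")
ESCAPED = re.compile(r"\<[?qe][x:v][mqe][lun][ at][vkP]")

# Shortened copies of the header lines and one event line.
samples = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<q:quakeml xmlns="http://quakeml.org/xmlns/bed/1.2">',
    '<eventParameters publicID="quakeml:earthquake.usgs.gov">',
    '<event catalog:datasource="nc">',
]


def same_result(line):
    """True if both spellings agree on whether the line matches."""
    return bool(UNESCAPED.match(line)) == bool(ESCAPED.match(line))


results = [same_result(line) for line in samples]
print(results)
```

In Python both spellings agree on every sample line, since `\<` is just a literal `<` there. If the same holds for Splunk, the restart described in Addition 2 or the location of the props.conf may also have played a part in the fix.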

Addition 1
For those who want to know how the right idea came to me:
After many hours of intensive work, study and use of btool, my eye was caught by the fact that, for example, the default entry for TIME_PREFIX is often

TIME_PREFIX = \[

So I had the idea that the same mechanism applies to my regex!

Addition 2
I organized my work into separate trials. Between these trials I clean the index mg_earthquake to start from a clean state.
After I had found the solution, I ran a counter-trial with the wrong regex (without the escape character). Then I put the right solution back, but forgot the cleaning. To my surprise, the right solution didn't work!
After restarting the indexer (cleaning the index is not possible while Splunk is running), IT WORKS!
One explanation, for me, is that there is some communication between forwarder and indexer, and the indexer holds this information until restart.

Addition 3
Many thanks to MuS, who inspired me to continue this (for me) hard work!

0 Karma

a101755
Explorer

Addition 4
The props.conf is on forwarder-one!

0 Karma

a101755
Explorer

So, I tried MuS's hint, but now nothing is indexed.
I have to think about it, but I have no idea at the moment.

0 Karma

MuS
Legend

I found an even easier one: use the csv option, https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2014-01-01&endtime=2014-01-02&..., and Splunk uses sourcetype=csv on its own.

Also remember that data, once indexed, will not be re-indexed by Splunk.

0 Karma

a101755
Explorer

Hello MuS, thank you again for your contribution!
At the moment I am studying the concept of Splunk's data pipeline, and I found a couple of hints that parsing on universal forwarders is limited to the parameter INDEXED_EXTRACTIONS (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Configurationparametersandthedatapipeline#). I will verify this today and post another comment about it.
In the end, if nothing works, I will follow your hint about csv 🙂

0 Karma

a101755
Explorer

Hello, because I was busy the last few days, I can only now continue with this thread.
I tried the whole props.conf on forwarder-one, but it doesn't work.
The first, second and third events were still indexed.
1.

2.

3.

I modified the props.conf again and tried it only on the forwarder, and then on the indexers. Of course, I restarted the forwarder and the indexers after every modification.
But the result was always the same.

The new props.conf is now:
[mg_earthquake_data]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%3Q
TIME_PREFIX=
TZ=UTC
PREAMBLE_REGEX=<[?qe][x:v][mqe][lun][ at][vkP]

Now I will try MuS's hint with INDEXED_EXTRACTIONS = TSV and report my experience.

0 Karma

a101755
Explorer

Sorry, because of the escape sequences the 3 events don't appear.
1.
<?xml version="1.0" encoding="UTF-8"?>
2.

3.

0 Karma

MuS
Legend

Can you provide more detail on what exactly is not working?
You are using PREAMBLE_REGEX, which is an input-time setting according to the docs:

* This feature and all of its settings apply at input time, when data is
  first read by Splunk.  The setting is used on a Splunk system that has
  configured inputs acquiring the data.

So the answer could be to move the props.conf setting to forwarder-one, which reads the data.
It worked in the UI because in that case the Splunk instance behind the UI was the first one reading the file.

cheers, MuS

0 Karma

a101755
Explorer

First of all: thanks to MuS for the comment. I hope he (or she) puts me on the right track.

I tried moving ONLY the PREAMBLE_REGEX, in a separate props.conf, to the forwarder.
In the props.conf on the indexers I commented this parameter out.
But the result is disappointing. Many earthquake events now appear in one event. So I think it wasn't a good idea to split the parameters from props.conf between the two indexers and the one forwarder.

Tomorrow I'll try moving the whole props.conf from the indexers to the one forwarder.

My expectation is that only the earthquake data will appear as the result of a search. The 'three lines above' shouldn't appear, in my opinion. They aren't earthquake data, but metadata for it.
I mean the data, beginning with

0 Karma

MuS
Legend

Why not use the tab separated API endpoint https://earthquake.usgs.gov/fdsnws/event/1/query?format=text&starttime=2014-01-01&endtime=2014-01-02... and then configure the props.conf on the forwarder to use:

 INDEXED_EXTRACTIONS = TSV

Nothing else should be needed besides that; even the timestamp should be discovered by default.

cheers, MuS

0 Karma