I want to index 'earthquake' data. Source is "https://earthquake.usgs.gov/fdsnws/event/1/query?format=xml&starttime=2014-01-01&endtime=2014-01-02&...".
My first step was to download the data and upload it once through Splunk Web.
After working out the correct set of parameters in the GUI, I built a props.conf for the indexers.
With the props.conf in the right place on the indexers, the result is different. It seems that the parameter PREAMBLE_REGEX doesn't work on my indexers.
Details:
Splunk Version 7.0.0
Splunk Build c8a78efdd40f
Search head, two indexers and two forwarders.
Earthquake data on forwarder-one
File monitoring of the earthquake data works fine
props.conf on both indexers
[mg_earthquake_data]
BREAK_ONLY_BEFORE = </event>
DATETIME_CONFIG =
NO_BINARY_CHECK = true
PREAMBLE_REGEX = <[?qe][x:v][mqe][lun][ at][vkP]
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3Q
TIME_PREFIX = <time><value>
category = Miscellaneous
disabled = false
pulldown_type = true
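As a rough illustration of what TIME_PREFIX and TIME_FORMAT in the stanza above are meant to do, here is a minimal Python sketch (not Splunk code). It looks for the text that follows `<time><value>` and parses it with a strptime pattern. The helper name extract_event_time is made up for this example, the sample event is a shortened copy of one event from the data, and Python's %f only approximates Splunk's %3Q subsecond specifier.

```python
import re
from datetime import datetime, timezone

# Shortened copy of one event from the example data in this thread.
SAMPLE_EVENT = (
    '<event catalog:eventid="72923246"><origin>'
    '<time><value>2017-11-12T14:26:05.650Z</value></time>'
    '</origin></event>'
)

TIME_PREFIX = r"<time><value>"           # same value as in props.conf
PY_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"  # assumption: %f stands in for %3Q


def extract_event_time(raw: str) -> datetime:
    """Hypothetical helper: find the prefix, then parse what follows it."""
    match = re.search(TIME_PREFIX + r"([0-9T:.\-]+)", raw)
    if match is None:
        raise ValueError("TIME_PREFIX not found in event")
    # The trailing 'Z' is left out by the character class; treat the value as UTC.
    parsed = datetime.strptime(match.group(1), PY_TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


event_time = extract_event_time(SAMPLE_EVENT)
print(event_time.isoformat())
```

The point of the sketch is only that the prefix anchors the search and the format string describes what comes right after it. Splunk's own timestamp processor has more rules than this.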
Example earthquake data: you can see that the first three lines appear as one event.
11/12/17
3:33:15.000 PM
<?xml version="1.0" encoding="UTF-8"?>
<q:quakeml xmlns="http://quakeml.org/xmlns/bed/1.2" xmlns:anss="http://anss.org/xmlns/event/0.1" xmlns:catalog="http://anss.org/xmlns/catalog/0.1" xmlns:q="http://quakeml.org/xmlns/quakeml/1.2">
<eventParameters publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?format=xml&starttime=2017-11-10T182126&endtime=2017-11-12T153312&minmagnitude=-0.9">
host = www
source = /opt/www/earthquakeData
sourcetype = mg_earthquake_data
11/12/17
2:26:05.650 PM
<event catalog:datasource="nc" catalog:eventsource="nc" catalog:eventid="72923246" publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=nc72923246&format=quakeml"><description><type>earthquake name</type><text>13km ESE of Mammoth Lakes, California</text></description><origin catalog:datasource="nc" catalog:dataid="nc72923246" catalog:eventsource="nc" catalog:eventid="72923246" publicID="quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml"><time><value>2017-11-12T14:26:05.650Z</value></time><longitude><value>-118.8258362</value></longitude><latitude><value>37.6068344</value></latitude><depth><value>3690</value><uncertainty>390</uncertainty></depth><originUncertainty><horizontalUncertainty>280</horizontalUncertainty><preferredDescription>horizontal uncertainty</preferredDescription></originUncertainty><quality><usedPhaseCount>21</usedPhaseCount><usedStationCount>21</usedStationCount><standardError>0.04</standardError><azimuthalGap>128</azimuthalGap><minimumDistance>0.02469</minimumDistance></quality><evaluationMode>automatic</evaluationMode><creationInfo><agencyID>NC</agencyID><creationTime>2017-11-12T14:27:41.430Z</creationTime><version>0</version></creationInfo></origin><magnitude catalog:datasource="nc" catalog:dataid="nc72923246" catalog:eventsource="nc" catalog:eventid="72923246" 
publicID="quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml#magnitude"><mag><value>2.04</value><uncertainty>0.16</uncertainty></mag><type>md</type><stationCount>18</stationCount><originID>quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml</originID><evaluationMode>automatic</evaluationMode><creationInfo><agencyID>NC</agencyID><creationTime>2017-11-12T14:27:41.430Z</creationTime></creationInfo></magnitude><preferredOriginID>quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml</preferredOriginID><preferredMagnitudeID>quakeml:earthquake.usgs.gov/archive/product/origin/nc72923246/nc/1510496861430/product.xml#magnitude</preferredMagnitudeID><type>earthquake</type><creationInfo><agencyID>nc</agencyID><creationTime>2017-11-12T14:30:04.241Z</creationTime><version>0</version></creationInfo></event>
host = www
source = /opt/www/earthquakeData
sourcetype = mg_earthquake_data
11/12/17
2:07:44.660 PM
<event catalog:datasource="ak" catalog:eventsource="ak" catalog:eventid="17185809" publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=ak17185809&format=quakeml"><description><type>earthquake name</type><text>59km SSW of Deltana, Alaska</text></description><origin catalog:datasource="ak" catalog:dataid="AK17185809" catalog:eventsource="ak" catalog:eventid="17185809" publicID="quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml"><time><value>2017-11-12T14:07:44.660Z</value></time><longitude><value>-145.5431</value></longitude><latitude><value>63.3555</value></latitude><depth><value>0</value><uncertainty>300</uncertainty></depth><originUncertainty><horizontalUncertainty>0</horizontalUncertainty><preferredDescription>horizontal uncertainty</preferredDescription></originUncertainty><quality><usedPhaseCount>19</usedPhaseCount><standardError>0.82</standardError></quality><evaluationMode>automatic</evaluationMode><creationInfo><creationTime>2017-11-12T14:15:35.560Z</creationTime><version>1</version></creationInfo></origin><magnitude catalog:datasource="ak" catalog:dataid="AK17185809" catalog:eventsource="ak" catalog:eventid="17185809" 
publicID="quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml#magnitude"><mag><value>2.1</value></mag><type>ml</type><originID>quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml</originID><evaluationMode>automatic</evaluationMode><creationInfo><creationTime>2017-11-12T14:15:35.560Z</creationTime></creationInfo></magnitude><preferredOriginID>quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml</preferredOriginID><preferredMagnitudeID>quakeml:earthquake.usgs.gov/archive/product/origin/AK17185809/ak/1510496135560/product.xml#magnitude</preferredMagnitudeID><type>earthquake</type><creationInfo><agencyID>ak</agencyID><creationTime>2017-11-12T14:15:35.560Z</creationTime><version>1</version></creationInfo></event>
Please tell me where the error is. I'm not very interested in alternatives to PREAMBLE_REGEX.
The other parameters, for example time extraction and event breaking, work fine.
I want to understand the difference between the GUI-generated props.conf and the mechanism on the Splunk indexer.
The answer is ...
... a mistake in my regex. The escape character in front of < was missing (!)
So the right syntax is
PREAMBLE_REGEX=\<[?qe][x:v][mqe][lun][ at][vkP]
But the regex is interpreted differently by the GUI data import and by the entry in props.conf!
In the GUI the \ isn't needed.
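One way to check which lines the character classes can match, independent of Splunk, is a small Python test. This is only a sketch: Python's re module is not the regex engine Splunk uses, the sample lines are shortened copies of the data above, the helper name is_preamble is made up, and matching only at the start of each line is an assumption about how the preamble check is applied.

```python
import re

UNESCAPED = r"<[?qe][x:v][mqe][lun][ at][vkP]"
ESCAPED = r"\<[?qe][x:v][mqe][lun][ at][vkP]"

# Shortened copies of lines from the example data in this thread.
LINES = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<q:quakeml xmlns="http://quakeml.org/xmlns/bed/1.2">',
    '<eventParameters publicID="...">',
    '<event catalog:datasource="nc" catalog:eventid="72923246">',
]


def is_preamble(pattern: str, line: str) -> bool:
    """Hypothetical helper: True if the pattern matches at the start of the line."""
    return re.match(pattern, line) is not None


unescaped_hits = [is_preamble(UNESCAPED, line) for line in LINES]
escaped_hits = [is_preamble(ESCAPED, line) for line in LINES]

for line, hit in zip(LINES, unescaped_hits):
    print("preamble" if hit else "data    ", line[:40])
```

In Python's re, the escaped and unescaped forms classify these lines identically: the three header lines match and the `<event ...>` line does not. That says nothing definite about Splunk itself, but it is a quick way to rule the character classes in or out as the cause.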
Addition 1
For those who want to know how I arrived at the right idea:
After many hours of intensive work, study and use of btool, my eye was caught by the fact that the default entry for TIME_PREFIX, for example, is often
TIME_PREFIX = \[
So I had the idea that the same mechanism applies to my regex!
Addition 2
I organized my work into separate trials. Between trials I cleaned the index mg_earthquake to get a well-defined starting state.
After I had found the solution, I ran a counter-test with the wrong regex (without the escape character). Then I put the right solution back, but forgot the cleaning. To my surprise, the right solution didn't work!
After restarting the indexer (cleaning the index is not possible while Splunk is running), IT WORKS!
One explanation, to my mind, is that there is some communication between forwarder and indexer, and the indexer holds this information until it is restarted.
Addition 3
Many thanks to MuS, who inspired me to continue this (for me) hard work!
Addition 4
The props.conf is on forwarder-one!
So I tried out MuS's hint, but now nothing was indexed.
So I have to think about it, but I have no idea at the moment.
I found an even easier one: use the csv option of the endpoint,
https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2014-01-01&endtime=2014-01-02&...
and Splunk uses sourcetype=csv on its own.
Also remember that data, once indexed, will not be re-indexed by Splunk.
Hello MuS, thank you again for your contribution!
At the moment I am studying the concept of Splunk's data pipeline, and I have found a couple of hints that parsing on universal forwarders is limited to the parameter INDEXED_EXTRACTIONS (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Configurationparametersandthedatapipeline#). I will verify this today and post another comment about it.
In the end, if nothing else works, I will follow your hint to use csv 🙂
Hello, because I was busy the last few days, I am only now able to continue with this thread.
I tried out the whole props.conf on forwarder-one, but it doesn't work.
The first, second and third events were still indexed.
1.
2.
3.
I modified the props.conf again and tried it first only on the forwarder and then on the indexers. Of course I restarted the forwarder and the indexers after every modification.
But the result was always the same.
The new props.conf is now:
[mg_earthquake_data]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%3Q
TIME_PREFIX=
TZ=UTC
PREAMBLE_REGEX=<[?qe][x:v][mqe][lun][ at][vkP]
Now I will try MuS's hint with INDEXED_EXTRACTIONS = TSV and report how it goes.
Sorry, because of the escape sequences the 3 events don't appear.
1.
<?xml version="1.0" encoding="UTF-8"?>
2.
3.
Can you provide more detail on what exactly is not working?
You are using PREAMBLE_REGEX, which is an input-time setting according to the docs:
* This feature and all of its settings apply at input time, when data is
first read by Splunk. The setting is used on a Splunk system that has
configured inputs acquiring the data.
So the answer could be to move the props.conf setting to forwarder-one, which reads the data.
It did work in the UI because in this case the UI Splunk instance was the first one reading the file.
cheers, MuS
First of all: thanks to MuS for the comment. I hope he (or she) puts me on the right track.
I tried moving ONLY the PREAMBLE_REGEX, in a separate props.conf, to the forwarder.
In the props.conf on the indexers I commented this parameter out.
But the result is disappointing. Many earthquake events now appear in one event. So I think it wasn't a good idea to split the props.conf parameters between the two indexers and the one forwarder.
Tomorrow I'll try to move the whole props.conf from the indexers to the one forwarder.
My expectation is that only the earthquake data will appear in the search results. The 'three lines above' shouldn't appear, in my opinion. They aren't earthquake data, but metadata for it.
I mean the data, beginning with
Why not use the tab-separated API endpoint https://earthquake.usgs.gov/fdsnws/event/1/query?format=text&starttime=2014-01-01&endtime=2014-01-02...
and then configure the props.conf on the forwarder to use:
INDEXED_EXTRACTIONS = TSV
Nothing else should be needed besides that; even the timestamp should be discovered by default.
cheers, MuS
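To picture what header-based field extraction amounts to for a delimited text response like the one suggested above, here is a minimal Python sketch, not Splunk's implementation. The column names, the helper name parse_delimited and the row layout are assumptions for illustration (the values are taken from the first event earlier in the thread). The delimiter is a parameter because the separator the endpoint actually uses should be checked against a real response.

```python
import csv
import io

# Hypothetical header and one row; values come from the first example event,
# the column names and their order are made up for this sketch.
SAMPLE = (
    "EventID\tTime\tLatitude\tLongitude\tMagnitude\n"
    "nc72923246\t2017-11-12T14:26:05.650Z\t37.6068344\t-118.8258362\t2.04\n"
)


def parse_delimited(text: str, delimiter: str = "\t") -> list:
    """Hypothetical helper: use the first line as field names for each row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


rows = parse_delimited(SAMPLE)
for row in rows:
    print(row["EventID"], row["Time"], row["Magnitude"])
```

With this kind of input there is no preamble to skip: the first line names the fields and every following line is one event.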