Getting Data In

Why is one indexed field only giving me a multivalue containing "none" in addition to the field value?

andrewtrobec
Motivator

Hello,

I am receiving cloud data from AWS via HEC in JSON format but I am having trouble getting the "timestamp" field to index properly.  Here is a simplified sample JSON:

{
  metric_name: UnHealthyHostCount
  namespace: AWS/ApplicationELB
  timestamp: 1678204380000
}

In order to index I created the following sourcetype which has been replicated to HF, IDX cluster, and SH:

[aws:sourcetype]
SHOULD_LINEMERGE = false
TRUNCATE = 8388608
TIME_PREFIX = \"timestamp\"\s*\:\s*\"
TIME_FORMAT = %s%3N
TZ = UTC
MAX_TIMESTAMP_LOOKAHEAD = 40
KV_MODE = json

The event data gets indexed without issue, but I noticed that the "timestamp" field seems to be indexed as a multivalue containing the epoch as above, but also the value "none". I thought it had to do with indexed extractions, but it is the only field that displays this behaviour. Here is the table:

screenshot.png

Any ideas on how to get the data indexed without the "none" value?

Thank you and best regards,

Andrew

 

0 Karma
1 Solution

Tom_Lundie
Contributor

The _time field not extracting properly is going to be an independent problem from the none value in your search. The _time extraction occurs independently of field extractions, it purely looks at the _raw event data.

Starting with the _time problem, can you try the following props, at a minimum this will need to be set on the HF which is running the HEC collector.

 

[sourcetype]
TIME_PREFIX = "timestamp":\s*
TIME_FORMAT = %s%3N
TZ = UTC
MAX_TIMESTAMP_LOOKAHEAD = 40

 

As for the "none" mvfield issue, there must be a field extraction running somewhere that's extracting the timestamp field, as well as your JSON field extraction.

Let's start by working out if timestamp is an indexed field. We can check the tsidx file using the following search:

 

| tstats values(timestamp) where index=my_index sourcetype=my_sourcetype source=my_source by index

 

If that search returns a values(timestamp) field then your timestamp field is being extracted at index_time, you need to focus your debugging efforts on the HF running the HEC collector, you're looking for TRANSFORMS-xxxx or INGEST_EVAL entries in props. If that search returns nothing then this is a search_time field extraction and you need to focus on the SH. Here, you're looking for EVAL-xxxx, REPORT-xxxx or FIELDALIAS-xxxx.

 

/opt/splunk/bin/splunk btool props list my_sourcetype

 

Look through the output of this command, are any of the field-extractions running that were mentioned above?

If not, maybe this is a source/host based extraction, look through the output of the following commands looking for a reference to your data (note this could be a regex based stanza, so there is no perfect way to search through here.)

 

/opt/splunk/bin/splunk props list source::
/opt/splunk/bin/splunk props list host::

 

View solution in original post

Tom_Lundie
Contributor

Hi Andrew,

Firstly, from what you have shared so far, there is no reason to suspect that Splunk will be extracting the timestamp field separately. Can you make sure you've shared all of your relevant props.conf / transforms.conf entries and can you also please share an obfuscated sample of the entire JSON without removing any of the JSON syntax?

Secondly, are you sure that your search is not doing anything it shouldn't be? Can you share the search that you're using to generate this table?

And finally, your timestamp extraction is not working looking at:

 

TIME_PREFIX = \"timestamp\"\s*\:\s*\"

 

 Are you sure you need that last \", typically JSON numbers are not surrounded by quotes.

0 Karma

andrewtrobec
Motivator

@Tom_Lundie Hello Tom, thank you for the response.

You've hit the nail on the head, the badly associated _time is the problem I'm trying to solve.  When I started to investigate I noticed that the table produced a multivalue timestamp value where one value was "none".  That's what I assumed is causing the problem which is why I'm trying to figure out how to get rid of it.

As requested:

Entire JSON

{
   account_id: 1234567
   dimensions: {
     LoadBalancer: app/ingress/9c6692fe5eb2b0b8
     TargetGroup: targetgroup/report-89f6283838/917587a99b43effd
}
   metric_name: UnHealthyHostCount
   metric_stream_name: Cert
   namespace: AWS/ApplicationELB
   region: ca-central-1
   timestamp: 1678214700000
   unit: Count
   value: {
      count: 3
      max: 0
      min: 0
      sum: 0
   }
}

props.conf

[sourcetype]
SHOULD_LINEMERGE = false
TRUNCATE = 8388608
TIME_PREFIX = \"timestamp\"\s*\:\s*\"
TIME_FORMAT = %s%3N
TZ = UTC
MAX_TIMESTAMP_LOOKAHEAD = 40
KV_MODE = json
 
Search
 
index=index_name metric_name=UnHealthyHostCount namespace=AWS/ApplicationELB
| table _time timestamp namespace metric_name
 
I don't have any additional config files, but I can tell you that the sourcetype is a slightly modified version of one of the sourcetypes contained in the Splunk Add-on for Amazon Web Services TA.
 
If it helps, I'm running Splunk Enterprise 8.2.6 with HF pointed to IDX cluster (3) and a single SH.  The sourcetype is located on all components in a specific sourcetypes app (it only contains sourcetypes that I deploy wherever I need them).
 
Thanks for taking the time.
 
Regards,
 
Andrew
0 Karma

Tom_Lundie
Contributor

The _time field not extracting properly is going to be an independent problem from the none value in your search. The _time extraction occurs independently of field extractions, it purely looks at the _raw event data.

Starting with the _time problem, can you try the following props, at a minimum this will need to be set on the HF which is running the HEC collector.

 

[sourcetype]
TIME_PREFIX = "timestamp":\s*
TIME_FORMAT = %s%3N
TZ = UTC
MAX_TIMESTAMP_LOOKAHEAD = 40

 

As for the "none" mvfield issue, there must be a field extraction running somewhere that's extracting the timestamp field, as well as your JSON field extraction.

Let's start by working out if timestamp is an indexed field. We can check the tsidx file using the following search:

 

| tstats values(timestamp) where index=my_index sourcetype=my_sourcetype source=my_source by index

 

If that search returns a values(timestamp) field then your timestamp field is being extracted at index_time, you need to focus your debugging efforts on the HF running the HEC collector, you're looking for TRANSFORMS-xxxx or INGEST_EVAL entries in props. If that search returns nothing then this is a search_time field extraction and you need to focus on the SH. Here, you're looking for EVAL-xxxx, REPORT-xxxx or FIELDALIAS-xxxx.

 

/opt/splunk/bin/splunk btool props list my_sourcetype

 

Look through the output of this command, are any of the field-extractions running that were mentioned above?

If not, maybe this is a source/host based extraction, look through the output of the following commands looking for a reference to your data (note this could be a regex based stanza, so there is no perfect way to search through here.)

 

/opt/splunk/bin/splunk props list source::
/opt/splunk/bin/splunk props list host::

 

andrewtrobec
Motivator

@Tom_Lundie Thank you again for the response, I appreciate you taking the time.

I updated the sourcetype, deployed it, both problems solved: timestamp is no longer multivalue with "none" value and _time is associated correctly to the epochms value in the payload!

Before making the change I ran the tstats search you provided and it returned value "none".  After I made the change and ran it again the value was null.  Then I tabled the results as before and the multivalue is no longer there (I also added timestamp_human and index_time to provide additional understanding):

screenshot.png

Thank you so much for the suggestions.  Accepting your response as the answer, specifically the sourcetype.

Best regards,

Andrew

Tom_Lundie
Contributor

@andrewtrobec you're very welcome, glad to hear this helped.


Interesting, there must've been a rogue index_time field extraction that caused this. Starting afresh with a new sourcetype makes sense.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

MAX_TIMESTAMP_LOOKAHEAD seems also way too low at 40 bytes. And you should not need backslashes to escape quotes.

andrewtrobec
Motivator

Thanks for the input @PickleRick 

Tags (1)
0 Karma
Get Updates on the Splunk Community!

Federated Search for Amazon S3 | Key Use Cases to Streamline Compliance Workflows

Modern business operations are supported by data compliance. As regulations evolve, organizations must ...

New Dates, New City: Save the Date for .conf25!

Wake up, babe! New .conf25 dates AND location just dropped!! That's right, this year, .conf25 is taking place ...

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud  In today’s fast-paced digital ...