How to parse and index a litmonth in another language? (e.g. "ago" (agosto) is August in Spanish)

MacaVergara
New Member

The date I'm trying to index is in a field inside of each row within a log, and looks like this:

Time Field
ago 31,2015 02:01:18 PM

"ago" is short for agosto (August). In other words, "ago" is the litmonth for August in Spanish.

Whenever I try to index a litmonth that is not in English, Splunk doesn't recognize it.

I tried to configure the datetime.xml like this:

<define name="_litmonth2"  extract="litmonth">
     <text><![CDATA[(?<![\d\w])(ene|feb|mar|abr|may|jun|jul|ago|sep|oct|nov|dic)[a-z,\.;]*]]></text>
</define>

<define name="_otherdate" extract="litmonth, ignored_sep, day, zone, ignored_sep2, year">
     <text><![CDATA[(?<!\w|\d[:\.\-])]]></text>
     <use name="_litmonth2"/> 
         <text><![CDATA[([/\- ]) {0,2}]]></text>
         <use name="_day"/>
         <text><![CDATA[(?!:) {0,2}(?:\d\d:\d\d:\d\d(?:[\.\,]\d+)? {0,2}]]></text>
     <use name="_zone"/>
         <text><![CDATA[)?((?:\2|,) {0,2}]]></text>
         <use name="_year"/> 
         <text><![CDATA[)?(?!/|\w|\.\d)]]></text>
</define>

<timePatterns>
      <use name="_time"/>
      <use name="_hmtime"/>
      <use name="_hmtime"/>
      <use name="_dottime"/>
      <use name="_combdatetime"/>
      <use name="_utcepoch"/>
      <use name="_combdatetime2"/>
</timePatterns>
<datePatterns>
      <use name="_otherdate"/>
</datePatterns>

This does catch the litmonth "ago" at indexing time and reads it as month number 8:

[screenshot]

But once the event is indexed, it shows this:

[screenshot]

As you can see, it is not the same date as before.

Any clues?

benjamin1337
Engager

You could try extracting the correct month into a new field (at index time) and then mapping that field onto _time (at search time).

So in props.conf, you would have:

[testsourcetype]
TRANSFORMS-correctMonths = fixJanuar, fixFebruar, ...
EVAL-_time = strptime(correct_date,"%d.%m.%Y")

And in transforms.conf you would have:

[fixJanuar]
REGEX = (\d+)\. Januar (\d+)
FORMAT = correct_date::$1.01.$2
WRITE_META = true

[fixFebruar]
REGEX = (\d+)\. Februar (\d+)
FORMAT = correct_date::$1.02.$2
WRITE_META = true

...

This works fine for German timestamps in a format like: 03. Januar 2015

For Spanish, it needs to be adapted slightly (and extended to take the time of day into account).
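To make the "map the month name to a number, then parse" idea concrete for the Spanish sample from the question, here is a minimal Python sketch of the logic only. It is not Splunk configuration: the function name `parse_spanish_timestamp` and the `TIMESTAMP` pattern are hypothetical, and the pattern assumes the exact layout `ago 31,2015 02:01:18 PM`.

```python
import re
from datetime import datetime

# Spanish month abbreviations in calendar order (same list as the regex in the question).
SPANISH_MONTHS = ["ene", "feb", "mar", "abr", "may", "jun",
                  "jul", "ago", "sep", "oct", "nov", "dic"]

# Hypothetical pattern for "ago 31,2015 02:01:18 PM"; adjust it to the real log layout.
TIMESTAMP = re.compile(
    r"\b(" + "|".join(SPANISH_MONTHS) + r")[a-z.]*\s+"  # month, abbreviated or spelled out
    r"(\d{1,2}),\s*(\d{4})\s+"                           # day and year
    r"(\d{1,2}):(\d{2}):(\d{2})\s+(AM|PM)",              # 12-hour clock with AM/PM
    re.IGNORECASE,
)

def parse_spanish_timestamp(text):
    """Return a datetime for the first Spanish-style timestamp in text, or None."""
    m = TIMESTAMP.search(text)
    if m is None:
        return None
    name, day, year, hour, minute, second, meridiem = m.groups()
    month = SPANISH_MONTHS.index(name.lower()) + 1  # "ago" -> 8
    hour = int(hour) % 12                           # 12 AM -> 0; 12 PM gets 12 added below
    if meridiem.upper() == "PM":
        hour += 12
    return datetime(int(year), month, int(day), hour, int(minute), int(second))

print(parse_spanish_timestamp("ago 31,2015 02:01:18 PM"))
```

The AM/PM conversion is done by hand rather than with `%p` in `strptime`, because `%p` depends on the process locale. In Splunk itself the same mapping would be expressed with one transform per month, as in the German example.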

Best, Benjamin

jeffland
SplunkTrust

Just to quickly leave a note here for anyone googling this: you can use INGEST_EVAL in transforms.conf and target _time with := to overwrite it. This is even better than a search-time EVAL- setting because the event is stored with the correct timestamp, meaning you'll find your events where you expect them.

LukeMcfly3
Explorer

Thank you very much, Benjamin. This does in fact work. The timestamp isn't recognized in the preview, but as soon as the event is indexed, the correct timestamp is shown.

jmallorquin
Builder

Hi,

Maybe an easy way to correct this problem is to set 12 SEDCMD entries, one for each month:

SEDCMD-AGOSTO = s/^ago/8/
SEDCMD-SEPTIEMBRE = s/^sep/9/
.....

LukeMcfly3
Explorer

Unfortunately, timestamp extraction takes place before SEDCMD is applied.

jmallorquin
Builder

Thanks,

I answered without testing, and now I can confirm that you are right.

Have you tried using a heavy forwarder (HF) to make this change?

Regards,

LukeMcfly3
Explorer

Hi MacaVergara,

Have you found a solution for this?
I'm stuck on the same problem. I replicated your solution for German months:

<datetime>

<define name="_year" extract="year">
    <text><![CDATA[(20\d\d|19\d\d|[901]\d(?!\d))]]></text>
</define>

<define name="_day"  extract="day">
    <text><![CDATA[(0?[1-9]|[12]\d|3[01])]]></text> 
</define>


 <define name="_litmonth2"  extract="litmonth">
      <text><![CDATA[(?<![\d\w])(Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)[a-z,\.;]*]]></text>
 </define>


<define name="_irequestdate"  extract="day, litmonth, year">
     <text><![CDATA[(?<![\d\w])]]></text>
     <use name="_day"/>
     <text><![CDATA[\. ]]></text>
     <use name="_litmonth2"/>
     <text><![CDATA[ ]]></text>
         <use name="_year"/>     
</define>


<datePatterns>
      <use name="_irequestdate"/>
</datePatterns>

</datetime>

But I don't even get as far as seeing the timestamp being recognized correctly.
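One way to narrow this down is to test the regex fragments on their own, outside Splunk. The sketch below joins the `_day`, `_litmonth2` and `_year` fragments in the order `_irequestdate` uses them and runs the result against a sample date. It uses Python's `re` module, which is close to but not identical to the regex engine Splunk uses, so treat a match here only as a hint. The helper name `match_irequestdate` is hypothetical.

```python
import re

# Fragments copied from the datetime.xml above.
DAY = r"(0?[1-9]|[12]\d|3[01])"
YEAR = r"(20\d\d|19\d\d|[901]\d(?!\d))"
MONTH_NAMES = ["Januar", "Februar", "März", "April", "Mai", "Juni",
               "Juli", "August", "September", "Oktober", "November", "Dezember"]
LITMONTH2 = r"(?<![\d\w])(" + "|".join(MONTH_NAMES) + r")[a-z,\.;]*"

# _irequestdate: day, ". ", month name, " ", year, in that order.
IREQUESTDATE = re.compile(r"(?<![\d\w])" + DAY + r"\. " + LITMONTH2 + " " + YEAR)

def match_irequestdate(text):
    """Return (day, month, year) for the first match in text, or None."""
    m = IREQUESTDATE.search(text)
    if m is None:
        return None
    day, name, year = m.groups()
    return int(day), MONTH_NAMES.index(name) + 1, int(year)

print(match_irequestdate("03. Januar 2015"))
```

If the combined pattern matches the sample here but Splunk still does not recognize the date, the regex itself is probably not the issue. The cause is then more likely in how the file is loaded or applied, for example whether DATETIME_CONFIG in props.conf points at the custom file.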

Jeremiah
Motivator

Just to clarify, it catches the correct date in the preview, but not when you actually index the data?

LukeMcfly3
Explorer

Unfortunately, it doesn't even recognize the correct date format in the preview.
