How to parse and index a litmonth in another language? (e.g. "ago" (agosto) is August in Spanish)

MacaVergara
New Member

The date I'm trying to index is in a field inside of each row within a log, and looks like this:

Time Field
ago 31,2015 02:01:18 PM

"ago" is short for agosto (August in English). In other words, "ago" is the litmonth for August in Spanish.

Whenever I try to index a litmonth that is not in English, Splunk does not recognize it.

I tried to configure datetime.xml like this:

<define name="_litmonth2"  extract="litmonth">
     <text><![CDATA[(?<![\d\w])(ene|feb|mar|abr|may|jun|jul|ago|sep|oct|nov|dic)[a-z,\.;]*]]></text>
</define>

<define name="_otherdate" extract="litmonth, ignored_sep, day, zone, ignored_sep2, year">
     <text><![CDATA[(?<!\w|\d[:\.\-])]]></text>
     <use name="_litmonth2"/> 
         <text><![CDATA[([/\- ]) {0,2}]]></text>
         <use name="_day"/>
         <text><![CDATA[(?!:) {0,2}(?:\d\d:\d\d:\d\d(?:[\.\,]\d+)? {0,2}]]></text>
     <use name="_zone"/>
         <text><![CDATA[)?((?:\2|,) {0,2}]]></text>
         <use name="_year"/> 
         <text><![CDATA[)?(?!/|\w|\.\d)]]></text>
</define>

<timePatterns>
      <use name="_time"/>
      <use name="_hmtime"/>
      <use name="_hmtime"/>
      <use name="_dottime"/>
      <use name="_combdatetime"/>
      <use name="_utcepoch"/>
      <use name="_combdatetime2"/>
</timePatterns>
<datePatterns>
      <use name="_otherdate"/>
</datePatterns>

This does catch the litmonth "ago" as month number 8 at indexing time:

(screenshot)

But once the data is indexed, it shows this:

(screenshot)

As you can see, it is not the same date as before.

Any clues?


benjamin1337
Engager

You could try extracting the correct month into a new field (at index time) and then mapping that field onto _time (at search time).

So in props.conf, you would have:

[testsourcetype]
TRANSFORMS-correctMonths = fixJanuar, fixFebruar, ...
EVAL-_time = strptime(correct_date,"%d.%m.%Y")

And in transforms.conf you would have:

[fixJanuar]
REGEX = (\d+)\. Januar (\d+)
FORMAT = correct_date::$1.01.$2
WRITE_META = true

[fixFebruar]
REGEX = (\d+)\. Februar (\d+)
FORMAT = correct_date::$1.02.$2
WRITE_META = true

...

This works fine for German timestamps in a format like: 03. Januar 2015

For Spanish, it needs to be adapted slightly (and extended to take the time of day into account).
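
As an untested sketch of such an adaptation for the Spanish sample from the question (ago 31,2015 02:01:18 PM): the stanza name fixAgosto, the underscore separators, and the strptime format string are assumptions made for illustration. The underscores are there only so that the indexed value contains no spaces. One stanza per month would be needed, each with its own month number.

```ini
# transforms.conf (hypothetical stanza; repeat per month with its own number)
[fixAgosto]
REGEX = ago (\d+),(\d+) (\d+:\d+:\d+) ([AP]M)
FORMAT = correct_date::$1.08.$2_$3_$4
WRITE_META = true

# props.conf
[testsourcetype]
TRANSFORMS-correctMonths = fixAgosto
EVAL-_time = strptime(correct_date,"%d.%m.%Y_%I:%M:%S_%p")
```

For the sample row, correct_date would hold 31.08.2015_02:01:18_PM, which the format string above is meant to parse.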

Best, Benjamin

jeffland
SplunkTrust

Just to quickly leave a note here for anyone googling this: use INGEST_EVAL in transforms.conf and target _time with := to overwrite it. This is even better than a search-time EVAL, because the event is stored with the correct timestamp, meaning you'll find your events where you expect them.
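
A rough, untested sketch of that idea for the Spanish sample from the question, covering only "ago": the stanza name, the TRANSFORMS class name, and the regex are assumptions, and the other eleven months would need an analogous mapping. The coalesce() keeps the originally extracted _time whenever the pattern does not match.

```ini
# transforms.conf (hypothetical stanza name)
[spanish_agosto_time]
INGEST_EVAL = _time:=coalesce(strptime(replace(_raw, "(?s)^.*?\bago (\d{1,2}),(\d{4}) (\d{2}:\d{2}:\d{2} [AP]M).*$", "08 \1,\2 \3"), "%m %d,%Y %I:%M:%S %p"), _time)

# props.conf
[testsourcetype]
TRANSFORMS-fixSpanishTime = spanish_agosto_time
```

The replace() call rewrites the whole event text to just "08 31,2015 02:01:18 PM" for the sample row, so that strptime() only sees a numeric month.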


LukeMcfly3
Explorer

Thank you very much, Benjamin. This does in fact work. The timestamp won't be recognized in the preview, but as soon as the data is indexed the correct timestamp is shown.


jmallorquin
Builder

Hi,

Maybe an easy way to correct this problem is to set 12 SEDCMD entries, one for each possible month:

SEDCMD-AGOSTO = s/^ago/8/
SEDCMD-SEPTIEMBRE = s/^sep/9/
.....

LukeMcfly3
Explorer

Unfortunately, timestamp extraction takes place before SEDCMD is applied.


jmallorquin
Builder

Thanks,

I answered without testing, and now I can confirm that you are right.

Have you tried using a heavy forwarder (HF) to make this change?

Regards,


LukeMcfly3
Explorer

Hi MacaVergara,

Have you found a solution for this?
I'm stuck on the same problem. I replicated your solution for German months:

<datetime>

<define name="_year" extract="year">
    <text><![CDATA[(20\d\d|19\d\d|[901]\d(?!\d))]]></text>
</define>

<define name="_day"  extract="day">
    <text><![CDATA[(0?[1-9]|[12]\d|3[01])]]></text> 
</define>


 <define name="_litmonth2"  extract="litmonth">
      <text><![CDATA[(?<![\d\w])(Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)[a-z,\.;]*]]></text>
 </define>


<define name="_irequestdate"  extract="day, litmonth, year">
     <text><![CDATA[(?<![\d\w])]]></text>
     <use name="_day"/>
     <text><![CDATA[\. ]]></text>
     <use name="_litmonth2"/>
     <text><![CDATA[ ]]></text>
         <use name="_year"/>     
</define>


<datePatterns>
      <use name="_irequestdate"/>
</datePatterns>

</datetime>

But I don't even get as far as seeing the timestamp being recognized correctly.

Jeremiah
Motivator

Just to clarify, it catches the correct date in the preview, but not when you actually index the data?


LukeMcfly3
Explorer

Unfortunately, it doesn't even recognize the correct date format in the preview.
