Getting Data In

How to parse and index a litmonth in another language? (e.g. "ago" (agosto) is August in Spanish)

MacaVergara
New Member

The date I'm trying to index is in a field inside of each row within a log, and looks like this:

Time Field
ago 31,2015 02:01:18 PM

"ago" stands for agosto (August in English). In other words, "ago" is the litmonth for August in Spanish.

Whenever I try to index a litmonth in a language other than English, Splunk doesn't recognize it.

I tried to configure datetime.xml like this:

<define name="_litmonth2"  extract="litmonth">
     <text><![CDATA[(?<![\d\w])(ene|feb|mar|abr|may|jun|jul|ago|sep|oct|nov|dic)[a-z,\.;]*]]></text>
</define>

<define name="_otherdate" extract="litmonth, ignored_sep, day, zone, ignored_sep2, year">
     <text><![CDATA[(?<!\w|\d[:\.\-])]]></text>
     <use name="_litmonth2"/>
     <text><![CDATA[([/\- ]) {0,2}]]></text>
     <use name="_day"/>
     <text><![CDATA[(?!:) {0,2}(?:\d\d:\d\d:\d\d(?:[\.\,]\d+)? {0,2}]]></text>
     <use name="_zone"/>
     <text><![CDATA[)?((?:\2|,) {0,2}]]></text>
     <use name="_year"/>
     <text><![CDATA[)?(?!/|\w|\.\d)]]></text>
</define>

<timePatterns>
      <use name="_time"/>
      <use name="_hmtime"/>
      <use name="_hmtime"/>
      <use name="_dottime"/>
      <use name="_combdatetime"/>
      <use name="_utcepoch"/>
      <use name="_combdatetime2"/>
</timePatterns>
<datePatterns>
      <use name="_otherdate"/>
</datePatterns>
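
For completeness: a custom datetime.xml only takes effect if the sourcetype points at it through DATETIME_CONFIG in props.conf. A minimal sketch, assuming a hypothetical sourcetype name and app directory (neither is taken from my actual setup); the path is relative to $SPLUNK_HOME:

```
# props.conf (sketch; stanza name and file path are assumptions)
[my_spanish_logs]
DATETIME_CONFIG = /etc/apps/my_app/local/datetime_es.xml
```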

This configuration does catch the litmonth "ago" at indexing time and treats it as month number 8:

alt text

But once the data is indexed, it shows this:

alt text

As you can see, it is not the same date as before.

Any clues?

benjamin1337
Engager

You could try extracting the correct month into a new field (at index time) and then mapping that field onto _time (at search time).

So in props.conf, you would have:

[testsourcetype]
TRANSFORMS-correctMonths = fixJanuar, fixFebruar, ...
EVAL-_time = strptime(correct_date,"%d.%m.%Y")

And in transforms.conf you would have:

[fixJanuar]
REGEX = (\d+)\. Januar (\d+)
FORMAT = correct_date::$1.01.$2
WRITE_META = true

[fixFebruar]
REGEX = (\d+)\. Februar (\d+)
FORMAT = correct_date::$1.02.$2
WRITE_META = true

...

This works fine for German timestamps in a format like: 03. Januar 2015

For Spanish, it needs to be adapted slightly (also to take the time of day into consideration).
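
As a rough, untested sketch of that adaptation for the Spanish sample from the question (ago 31,2015 02:01:18 PM): the stanza name fixAgosto and the "T" separator in correct_date are made up for illustration, and only one month is shown. The other eleven stanzas would follow the same shape.

```
# props.conf
[testsourcetype]
TRANSFORMS-correctMonths = fixAgosto
# Parse the rebuilt value, e.g. 31.08.2015T02:01:18PM
EVAL-_time = strptime(correct_date, "%d.%m.%YT%I:%M:%S%p")

# transforms.conf
[fixAgosto]
# "ago 31,2015 02:01:18 PM" -> correct_date::31.08.2015T02:01:18PM
# The value is built without spaces so it stays a single indexed token.
REGEX = \bago (\d+),(\d{4}) (\d\d:\d\d:\d\d) ([AP]M)
FORMAT = correct_date::$1.08.$2T$3$4
WRITE_META = true
```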

Best, Benjamin

jeffland
SplunkTrust

Just a quick note for anyone googling this: you can use INGEST_EVAL in transforms.conf and target _time with := to overwrite it. That is even better than a search-time EVAL, because the event is stored with the correct timestamp, so you will find your events where you expect them.
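
A minimal sketch of that idea, reusing the correct_date field from the answer above. The stanza name setTimeFromCorrectDate is hypothetical, and the sketch assumes that the indexed field written by the earlier transforms is visible to INGEST_EVAL, so treat it as a starting point rather than a tested config:

```
# transforms.conf
[setTimeFromCorrectDate]
# := overwrites the existing _time; coalesce keeps the old value if parsing fails
INGEST_EVAL = _time := coalesce(strptime(correct_date, "%d.%m.%Y"), _time)

# props.conf: run it after the transforms that create correct_date
[testsourcetype]
TRANSFORMS-correctMonths = fixJanuar, fixFebruar, setTimeFromCorrectDate
```

With this in place, the search-time EVAL-_time line should no longer be needed, since _time is already corrected when the event is written.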

LukeMcfly3
Explorer

Thank you very much, Benjamin. This does in fact work. The timestamp isn't recognized in the preview, but as soon as the data is indexed, the correct timestamp is shown.

jmallorquin
Builder

Hi,

Maybe an easy way to fix this problem is to set 12 SEDCMD entries, one for each possible month:

SEDCMD-AGOSTO = s/^ago/8/
SEDCMD-SEPTIEMBRE = s/^sep/9/
.....

LukeMcfly3
Explorer

Unfortunately, timestamp extraction takes place before SEDCMD is applied.

jmallorquin
Builder

Thanks,

I answered without testing, and now I can confirm that you are right.

Have you tried using a heavy forwarder (HF) to make this change?

Regards,

LukeMcfly3
Explorer

Hi MacaVergara,

Have you found a solution for this?
I'm stuck on the same problem. I replicated your solution for German months:

<datetime>

<define name="_year" extract="year">
    <text><![CDATA[(20\d\d|19\d\d|[901]\d(?!\d))]]></text>
</define>

<define name="_day"  extract="day">
    <text><![CDATA[(0?[1-9]|[12]\d|3[01])]]></text> 
</define>


<define name="_litmonth2"  extract="litmonth">
    <text><![CDATA[(?<![\d\w])(Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)[a-z,\.;]*]]></text>
</define>

<define name="_irequestdate"  extract="day, litmonth, year">
    <text><![CDATA[(?<![\d\w])]]></text>
    <use name="_day"/>
    <text><![CDATA[\. ]]></text>
    <use name="_litmonth2"/>
    <text><![CDATA[ ]]></text>
    <use name="_year"/>
</define>


<datePatterns>
      <use name="_irequestdate"/>
</datePatterns>

</datetime>

But I don't even get as far as seeing the timestamp being recognized correctly.

Jeremiah
Motivator

Just to clarify, it catches the correct date in the preview, but not when you actually index the data?

LukeMcfly3
Explorer

Unfortunately, it doesn't even recognize the correct date format in the preview.
