Why is my WS_FTP XML Log not parsing correctly?

smakovits
Explorer

I am trying to import a WS_FTP log, but I cannot get the log data to parse correctly. Either no fields are extracted, or I end up with hundreds of entries per event because the events do not appear to break properly.

Sample log data:

 <?xml version="1.0" encoding="utf-8" ?>
    <log>
      <entry>
        <log_time>20170214-10:59:58</log_time>
        <description><![CDATA[Authentication succeed]]></description>
        <service>COM_API</service>
        <sessionid>00000001</sessionid>
        <type>0</type>    <severity>1</severity>
        <user>test</user>
        <host>ftp</host>
        <cmd>Login</cmd>
        <sguid>278AA2E9-04A9-4484-9EAC-DF1EACBDF372</sguid>
      </entry>
      <entry>
        <log_time>20170214-11:01:39</log_time>
        <description><![CDATA[Created user test on host ftp]]></description>
        <service>COM_API</service>
        <sessionid>00000001</sessionid>
        <type>0</type>    <severity>1</severity>
        <user>test</user>
        <host>ftp</host>
        <cmd>CreateUser</cmd>
        <sguid>278AA2E9-04A9-4484-9EAC-DF1EACBDF372</sguid>
      </entry>
      <entry>
        <log_time>20170214-11:01:39</log_time>
        <description><![CDATA[User test sysadmin set to TRUE on host ftp]]></description>
        <service>COM_API</service>
        <sessionid>00000001</sessionid>
        <type>0</type>    <severity>1</severity>
        <user>test</user>
        <host>ftp</host>
        <cmd>SetSysAdmin</cmd>
        <sguid>278AA2E9-04A9-4484-9EAC-DF1EACBDF372</sguid>
      </entry>
    </log>

Based on some posts, I created this props.conf:

 [WS_FTP]
    TIME_PREFIX = \<log_time\>
    TIME_FORMAT = %Y\%m\%d-%H:%M:%S
    SHOULD_LINEMERGE = false
    LINE_BREAKER = \>\s*(?=\<entry\>)
    REPORT-xmlext = xml-extr

and a transforms.conf:

[xml-extr]
REGEX = <([^>]+)>([^<]*)<\/\1>
FORMAT = $1::$2
MV_ADD = true
REPEAT_MATCH = true

I need each entry listed with its associated data, as opposed to what I am getting now, where there is an event for: , 278AA2E9-04A9-4484-9EAC-DF1EACBDF372, etc.

It seems to be right there, but still something is not working. I have tried without the transforms and only that props.conf, but that too yields similar results, so any help in getting each "entry" properly extracted would be much appreciated.

coltwanger
Contributor

Try using KV_MODE=xml in props.conf and remove your transforms.conf. This appears to be working fine for me:

[WS_FTP]
BREAK_ONLY_BEFORE = <entry>
DATETIME_CONFIG = 
NO_BINARY_CHECK = true
TIME_FORMAT = %Y%m%d-%H:%M:%S
TIME_PREFIX = \<log_time\>
KV_MODE = xml

You'll end up with fields prefixed with the "entry" label, like "entry.description", "entry.sguid", etc. You'll need to adjust the line breaking if you want to get rid of the "entry." prefix.

Edit: Using your LINE_BREAKER, kind of like this:

    [WS_FTP]
    DATETIME_CONFIG = 
    KV_MODE = xml
    LINE_BREAKER = \>\s*(?=\<entry\>)
    NO_BINARY_CHECK = true
    TIME_FORMAT = %Y%m%d-%H:%M:%S
    TIME_PREFIX = \<log_time\>

smakovits
Explorer

Awesome work, the LINE_BREAKER works. Before you posted that, I also figured out how to do it with FIELDALIAS, then using EVAL to null out the original fields, like this:

#[WS_FTP]
#BREAK_ONLY_BEFORE = <entry>
#DATETIME_CONFIG = 
#NO_BINARY_CHECK = true
#TIME_FORMAT = %Y%m%d-%H:%M:%S
#TIME_PREFIX = \<log_time\>
#FIELDALIAS-rootfields = entry.log_time as Time entry.description as Description entry.user as User entry.cmd as Command entry.service as Service entry.severity as Severity entry.sguid as SGUID entry.host as Host entry.sessionid as Session_ID entry.type as Type
#EVAL-entry.log_time = null
#EVAL-entry.description = null
#EVAL-entry.user = null
#EVAL-entry.cmd = null
#EVAL-entry.service = null
#EVAL-entry.severity = null
#EVAL-entry.sguid = null
#EVAL-entry.host = null
#EVAL-entry.sessionid = null
#EVAL-entry.type = null
#KV_MODE = xml

In the end, both seem to work, but the key was the main extraction, so thanks a million.

coltwanger
Contributor

Very cool! I've run into this with a few logs and this might help me clean them up a bit.

Thanks!

smakovits
Explorer

Super awesome, this worked. Fields extracted as you noted.

I added the field alias below, but then I ended up with both the aliased and the original fields. Not sure if I will keep it or just add the search-time rename as noted below.

FIELDALIAS-rootfields = entry.log_time as Time entry.description as Description entry.user as User entry.cmd as Command entry.service as Service entry.severity as Severity entry.sguid as SGUID

For the record, I want to make sure I put this into the correct props.conf file. I added it under system as there is no app for ws_ftp to add it to. Or should it go under the search app?

coltwanger
Contributor

If you use the LINE_BREAKER from my edit, it will automatically remove the text you are breaking the events on, which in turn removes the nested "entry.$field$" format (because that element no longer exists in the event).

System should be fine. I personally create a new app on a per-system basis for organizational purposes, but in search or system should be fine. If you place it in search, it's possible you won't be able to use the extractions within other apps -- in that case you'll need to share the objects in the search app globally.
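
As a rough sketch of the per-system app layout described above: the app name ws_ftp_props is hypothetical, and the paths assume a standard $SPLUNK_HOME install. The local.meta stanza shows one way to share the app's props.conf settings globally so that other apps can use the extractions.

```
# $SPLUNK_HOME/etc/apps/ws_ftp_props/local/props.conf
# (ws_ftp_props is a made-up app name for illustration)
[WS_FTP]
DATETIME_CONFIG =
KV_MODE = xml
LINE_BREAKER = \>\s*(?=\<entry\>)
NO_BINARY_CHECK = true
TIME_FORMAT = %Y%m%d-%H:%M:%S
TIME_PREFIX = \<log_time\>

# $SPLUNK_HOME/etc/apps/ws_ftp_props/metadata/local.meta
# Export this app's props.conf settings to all apps
[props]
export = system
```

Line-breaking and timestamp settings generally apply only to data indexed after the change, so already-indexed events will likely keep their old boundaries.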

gvmorley
Contributor

If you want to get rid of all of the entry prefixes in the field names, you can always do:

| rename entry.* AS *

Should strip off the prefix.
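
For illustration, a fuller search might look like the sketch below. It assumes the data is indexed with sourcetype WS_FTP (taken from the props.conf stanza name in this thread) and that the entry.* fields produced by KV_MODE = xml are present; the columns in the table command are only examples.

```
sourcetype=WS_FTP
| rename entry.* AS *
| table _time user cmd severity description
```

One thing to watch for: the sample data contains a <host> element, so after the wildcard rename, entry.host would likely overwrite Splunk's default host field in the search results.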

smakovits
Explorer

I was curious, is there any way to make the search-time rename permanent? I did the field alias, but then I end up with both entry.description and Description.

I was looking at the docs, but nothing was super obvious. Hoping this too is something simple.
