Hello,
While parsing the logs, I'm trying to extract fields, but at some point I get the following message: "The extraction failed. If you are extracting multiple fields, try removing one or more fields. Start with extractions that are embedded within longer text strings." I get the same message even when I highlight only the fields that it fails to extract.
Could this issue be related to the limits.conf configuration file?
No, limits.conf has nothing to do with it. The "graphical" extractor can guess the right settings only in relatively basic cases (and gets them right in even fewer). If your events are properly formatted XML, you should instead use KV_MODE=xml in your sourcetype settings.
@PickleRick Could you share some documentation on this? Or is it enough to add this to the sourcetype configuration in the inputs configuration file, if I'm not mistaken?
No. Inputs (inputs.conf) are one thing; props for the sourcetype (props.conf) are another. Where to put the setting depends on your installation architecture. I strongly suspect you have an all-in-one installation, so unless you're using a heavy forwarder (HF) to ingest this data, it should be enough to add a KV_MODE parameter with a value of xml to your sourcetype definition.
@PickleRick No, the installation architecture is a distributed, non-clustered deployment, and I do not use an HF.
Then KV_MODE must be defined on the search head, since it is applied at search time.
We need more information.
How exactly are you trying to extract the fields? Have you tried any other ways to extract them?
Please share a sanitized example event and indicate which fields you wish to extract.
Hello @richgalloway
Here is an example (truncated). When I try to extract the event ID, the user 'bob', and the time, I can't extract all of them at once. Moreover, the extraction doesn't work on all events, so I tried doing it manually, and then I get a different error.
{
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Service Control Manager' Guid='{555908d1-a6d7-4695-8e1e-26931d2012f4}' EventSourceName='Service Control Manager'/><EventID Qualifiers='16384'>7036</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x8080000000000000</Keywords><TimeCreated SystemTime='2014-04-24T18:38:37.868683300Z'/><EventRecordID>412598</EventRecordID><Correlation/><Execution ProcessID='192' ThreadID='210980'/><Channel>System</Channel> <Computer>TEST</Computer><Security/></System><EventData><Data Name='SubjectUserSid'>S18</Data><Data Name='SubjectUserName'>BOB</Data><Data Name='SubjectDomainName'>GOZ</Data><Data Name='SubjectLogonId'>x0</Data><Data Name='TargetUserSid'>s20</Data><Data Name='TargetUserName'>BOBT</Data><Data Name='TargetDomainName'>TESTTGT</Data><Data Name='TargetLogonId'>x0</Data><Data Name='LogonType'>x</Data><Data
}
We still don't know *how* you are trying to extract fields. The erex command and the extraction wizard struggle with complex events, so consider using one of the commands suggested by @yuanliu.
If you only need a few fields, you may have some luck using the rex command to extract them.
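To make the rex suggestion concrete, here is a minimal Python sketch (an illustration outside Splunk, not rex itself) that pulls only the three fields asked about, the event ID, the subject user and the creation time, out of a trimmed excerpt of the event above. The excerpt is closed off by hand because the posted event is truncated, and the names `PATTERNS` and `extract_fields` are hypothetical. Python writes named groups as `(?P<name>...)`, whereas SPL's rex uses `(?<name>...)`.

```python
import re

# Trimmed, closed-off excerpt of the posted event (hypothetical test input).
SAMPLE = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System>"
    "<EventID Qualifiers='16384'>7036</EventID>"
    "<TimeCreated SystemTime='2014-04-24T18:38:37.868683300Z'/>"
    "<Computer>TEST</Computer></System><EventData>"
    "<Data Name='SubjectUserName'>BOB</Data>"
    "<Data Name='TargetUserName'>BOBT</Data></EventData></Event>"
)

# One pattern per field, each anchored on the surrounding markup so it
# only matches the intended element.
PATTERNS = {
    "EventID": re.compile(r"<EventID[^>]*>(?P<value>\d+)</EventID>"),
    "TimeCreated": re.compile(r"<TimeCreated SystemTime='(?P<value>[^']+)'"),
    "SubjectUserName": re.compile(
        r"<Data Name='SubjectUserName'>(?P<value>[^<]*)</Data>"
    ),
}


def extract_fields(raw: str) -> dict[str, str]:
    """Return the fields whose pattern matches; missing fields are skipped."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(raw)
        if match:
            fields[name] = match.group("value")
    return fields


print(extract_fields(SAMPLE))
```

Each pattern depends on incidental details such as the attribute quote style and element order, which is why this approach stays fragile on structured events.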
@richgalloway I am trying to extract them using a regex. I select the event, choose Action, then Extract Fields, and select regular expression as the extraction method.
This is structured XML data. Splunk's field extraction tool is either regex- or delimiter-based, and neither will be robust here. I recommend that you forget about the extraction tool and just run spath or xmlkv at search time.
@yuanliu Yes, I had to convert them to XML so that I could extract the fields I needed. The logs are in French, and I was having issues parsing them.
This is confusing. Could you explain what you mean by "convert them"? Do you mean the raw events are not in XML? In that case, could you share some raw events? Also, French should not be a problem for Splunk as long as it is encoded in UTF-8 or another compatible scheme.
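The advantage of parsing the XML as XML can be shown outside Splunk. The Python sketch below is an illustration only, not how spath or xmlkv are implemented: it parses a trimmed, closed-off excerpt of the posted event with a real XML parser, so every `<Data Name='...'>` element becomes a field regardless of its position or quoting. The flat field names it produces (and the helper name `xml_fields`) are my own choice and are not necessarily the names spath would give you.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <Event> element of the sample.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed, closed-off excerpt of the posted event (the original is truncated).
SAMPLE = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System>"
    "<EventID Qualifiers='16384'>7036</EventID>"
    "<TimeCreated SystemTime='2014-04-24T18:38:37.868683300Z'/>"
    "<Computer>TEST</Computer></System><EventData>"
    "<Data Name='SubjectUserName'>BOB</Data>"
    "<Data Name='TargetUserName'>BOBT</Data></EventData></Event>"
)


def xml_fields(raw: str) -> dict[str, str]:
    """Flatten the parts of a Windows XML event that are usually wanted."""
    root = ET.fromstring(raw)
    fields = {}
    # Simple System children: the element text becomes the value.
    for tag in ("EventID", "Computer"):
        node = root.find(f"e:System/e:{tag}", NS)
        if node is not None and node.text:
            fields[tag] = node.text
    # The timestamp lives in an attribute, not in element text.
    created = root.find("e:System/e:TimeCreated", NS)
    if created is not None:
        fields["TimeCreated"] = created.get("SystemTime", "")
    # Each <Data Name='X'>value</Data> becomes field X, whatever its position.
    for data in root.iterfind("e:EventData/e:Data", NS):
        name = data.get("Name")
        if name:
            fields[name] = data.text or ""
    return fields


print(xml_fields(SAMPLE))
```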
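A small Python illustration (not Splunk code) of why the language itself is not the issue while the encoding can be: the French keyword from the sample survives a round trip when it is written and read with the same charset, and only a mismatch garbles the accented letter. Using cp1252 as the "wrong" charset is an assumption for the demo; if a source is not UTF-8, its encoding has to be declared to Splunk (for example with the CHARSET setting in props.conf).

```python
# French text as it appears in the sample event.
TEXT = "Succès de l'audit"


def round_trip(text: str, write_as: str, read_as: str) -> str:
    """Encode with one charset and decode with another, replacing bad bytes."""
    return text.encode(write_as).decode(read_as, errors="replace")


ok = round_trip(TEXT, "utf-8", "utf-8")
garbled = round_trip(TEXT, "cp1252", "utf-8")
print(ok)       # same text as TEXT
print(garbled)  # "è" is replaced by the replacement character U+FFFD
```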
@yuanliu Please find below an example of logs generated in French, which cause issues during field extraction. This is why I converted them to XML, to see whether that would resolve the language problem. Do you have any other solutions to this issue?
04/29/2014 02:50:23 PM LogName=Security SourceName=Microsoft Windows security auditing. EventCode=4672 EventType=0 Type=Information ComputerName=sacreblue TaskCategory=Ouverture de session spéciale OpCode=Informations RecordNumber=2746 Keywords=Succès de l'audit Message=Privilèges spéciaux attribués à la nouvelle ouverture de session. Sujet : ID de sécurité : AUTORITE NT\Système Nom du compte : Système Domaine du compte : AUTORITE NT ID d'ouverture de session : 0x3e7 Privilèges : SeAssignPrimaryTokenPrivilege SeTcbPrivilege SeSecurityPrivilege SeTakeOwnershipPrivilege SeLoadDriverPrivilege SeBackupPrivilege SeRestorePrivilege SeDebugPrivilege SeAuditPrivilege SeSystemEnvironmentPrivilege SeImpersonatePrivilege
OK. This is a Windows event. The normal approach for this kind of event is to ingest it as XML using the renderXml=true setting in your input(s) and to parse it with TA_windows (the Splunk Add-on for Microsoft Windows).
Yes, the data is organized in key-value pairs. What makes it unusual is that it uses two different delimiters, "=" and ":", and it does not quote the values. So I am not sure automatic extraction is feasible. At search time, however, you can simply do
| kv pairdelim="
" kvdelim="=:"
Your sample data will give the following fields:
field name | field value |
ComputerName | sacreblue |
Domaine_du_compte | AUTORITE NT |
EventCode | 4672 |
EventType | 0 |
ID de sécurité | AUTORITE NT\Système |
ID_d_ouverture_de_session | 0x3e7 |
Keywords | Succès de l'audit |
LogName | Security |
Message | Privilèges spéciaux attribués à la nouvelle ouverture de session. |
Nom_du_compte | Système |
OpCode | Informations |
Privilèges | SeAssignPrimaryTokenPrivilege |
RecordNumber | 2746 |
SourceName | Microsoft Windows security auditing. |
TaskCategory | Ouverture de session spéciale |
Type | Information |
Here is an emulation that you can play with and compare against your real data (append the kv command above to it):
| makeresults
| eval _raw = "04/29/2014 02:50:23 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4672
EventType=0
Type=Information
ComputerName=sacreblue
TaskCategory=Ouverture de session spéciale
OpCode=Informations
RecordNumber=2746
Keywords=Succès de l'audit
Message=Privilèges spéciaux attribués à la nouvelle ouverture de session.
Sujet :
ID de sécurité : AUTORITE NT\Système
Nom du compte : Système
Domaine du compte : AUTORITE NT
ID d'ouverture de session : 0x3e7
Privilèges : SeAssignPrimaryTokenPrivilege
SeTcbPrivilege
SeSecurityPrivilege
SeTakeOwnershipPrivilege
SeLoadDriverPrivilege
SeBackupPrivilege
SeRestorePrivilege
SeDebugPrivilege
SeAuditPrivilege
SeSystemEnvironmentPrivilege
SeImpersonatePrivilege"
``` data emulation above ```
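For anyone without a Splunk instance at hand, here is a rough Python approximation of what kv does with a newline pair delimiter and kvdelim="=:". It is an illustration only: the key rule (a key must start with a letter and ends at the first "=" or ":"), keeping the first value of a repeated key, and the helper name `kv_pairs` are my assumptions, not Splunk's documented behaviour. Unlike the table above, it does not rename keys (e.g. to Domaine_du_compte).

```python
import re

# A subset of the emulated _raw above; raw string so "\S" stays literal.
RAW = r"""04/29/2014 02:50:23 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4672
TaskCategory=Ouverture de session spéciale
Message=Privilèges spéciaux attribués à la nouvelle ouverture de session.
Sujet :
ID de sécurité : AUTORITE NT\Système
Nom du compte : Système
ID d'ouverture de session : 0x3e7
Privilèges : SeAssignPrimaryTokenPrivilege
SeTcbPrivilege
SeSecurityPrivilege"""

# Key: starts with a letter, stops before the first "=" or ":".
# Value: the rest of the line after the delimiter and any spaces.
PAIR = re.compile(r"^(?P<key>[^\W\d][^=:]*?)\s*[=:]\s*(?P<value>.*)$")


def kv_pairs(raw: str) -> dict[str, str]:
    """Split on newlines, then on the first '=' or ':' of each line."""
    fields = {}
    for line in raw.split("\n"):
        match = PAIR.match(line.strip())
        # Lines without a delimiter, or with an empty value, are skipped.
        if match and match.group("value"):
            fields.setdefault(match.group("key"), match.group("value"))
    return fields


for key, value in kv_pairs(RAW).items():
    print(f"{key} | {value}")
```

As in the table above, only the first privilege is attached to "Privilèges", because the following lines contain no delimiter.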
I still have a question about your conversion to XML. Do you mean that you use an external tool to convert the raw text into XML before ingesting it into Splunk? If you have that option, why not convert the raw text into JSON, for which Splunk has better support?
Hello @yuanliu,
Yes, but I often encounter events like this one (just an example):
01/01/2014 11:10:38 AM LogName=Security EventCode=4625 EventType=0 ComputerName=TestY SourceName=Microsoft Windows security auditing. Type=Information RecordNumber=2746 Keywords=Échec de l’audit TaskCategory=Ouverture de session OpCode=Informations Message= Echec d'ouverture de session d'un compte. Sujet : ID de sécurité : S-0 Nom du compte : - Domaine du compte : - ID d’ouverture de session : 0x0 Type d’ouverture de session : 3 Compte pour lequel l’ouverture de session a échoué : ID de sécurité : S-0 Nom du compte : Albert Domaine du compte : -
When I try to display the logs in a statistics table, it shows one event with the user "-" and another event with the user "Albert", even though it is a single event. This happens because the account name is extracted both from the "Sujet" (Subject) section and from the "Compte pour lequel l’ouverture de session a échoué" (Account For Which Logon Failed) section.
Regarding your question about the conversion to XML: no, I didn't use an external tool. I just modified the input configuration by adding 'renderXml=1'.
Yes, if Windows offers the renderXml option, using it is better than plain text. Either way, you need to parse the events with a search command.
As for this event, you do need to use its semantics to present such data. When you say the message is in French, do you mean you have difficulty understanding the language? If so, seek assistance with that. This is a failed account logon (an audit failure). The account of significance is Albert, the one listed after the verb "a échoué" (failed). Maybe set up an extraction anchored after that verb, like
| rex "Compte pour lequel l’ouverture de session .+ : ID de sécurité : (?<securityID>\S+)\s+Nom du compte : (?<accountName>\S+)\s+Domaine du compte : (?<accountDomain>\S+)"
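The same regex can be checked quickly in Python against the Message part of the 4625 sample above (an illustration outside Splunk; the names `NAIVE` and `TARGET` are hypothetical, and the named groups are rewritten in Python's `(?P<name>...)` syntax). It shows that a generic "Nom du compte" pattern returns two values for the one event, while anchoring after "a échoué" returns only the target account.

```python
import re

# Message part of the 4625 sample from this thread, as a single line.
EVENT = (
    "Message= Echec d'ouverture de session d'un compte. Sujet : "
    "ID de sécurité : S-0 Nom du compte : - Domaine du compte : - "
    "ID d’ouverture de session : 0x0 Type d’ouverture de session : 3 "
    "Compte pour lequel l’ouverture de session a échoué : "
    "ID de sécurité : S-0 Nom du compte : Albert Domaine du compte : -"
)

# Generic pattern: matches "Nom du compte" in both sections.
NAIVE = re.compile(r"Nom du compte : (\S+)")

# The rex above, translated to Python's named-group syntax.
TARGET = re.compile(
    r"Compte pour lequel l’ouverture de session .+ : "
    r"ID de sécurité : (?P<securityID>\S+)"
    r"\s+Nom du compte : (?P<accountName>\S+)"
    r"\s+Domaine du compte : (?P<accountDomain>\S+)"
)

print(NAIVE.findall(EVENT))  # ['-', 'Albert']
match = TARGET.search(EVENT)
print(match.groupdict() if match else None)
```

Note that the rex relies on the typographic apostrophe (’) exactly as it appears in the event; a straight apostrophe in the pattern would not match.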
However, if you see two separate events in Splunk when the original event is a single one, there may be a line-breaking problem. Fix that first. (XML can make line breaking more robust.)