Field Extractions Not Working (Match Limit Exceed)

willadams
Contributor

I am getting the following regular expression failure when trying to extract fields from a newly defined index and sourcetype. For example, this is a log entry that came through as syslog from an agent (SNARE). The event is from Windows Sysmon and looks like this:

Aug 2 14:08:41 workstation1.domain.com.au EventCaptureApplication(Desktop) - MSWinEventLog#011 - 4#011Microsoft-Windows-Sysmon/Operational#01110466#011Thu Aug 02 14:08:40 2018#0113#011Microsoft-Windows-Sysmon#011NT AUTHORITY\SYSTEM#011N/A#011Information#011WORKSTATION1.domain.com.au#011Network connection detected (rule: NetworkConnect)#011#011Network connection detected: UtcTime: 2018-08-02 06:08:38.893 ProcessGuid: {51A22926-9BE5-1C60-0000-00214F113784} ProcessId: 6789 Image: C:\Windows\SysWOW64\SearchProtocolHost.exe User: DOMAIN\USER1 Protocol: tcp Initiated: true SourceIsIpv6: false SourceIp: 192.168.1.2 SourceHostname: WORKSTATION1.domain.com.au SourcePort: 60815 SourcePortName: DestinationIsIpv6: false DestinationIp: 9.9.9.9 DestinationHostname: DestinationPort: 1234

However, when I try to extract just the information I want, I get this error:

Error in 'rex' command: regex="(?ms)^(?:[^ \n]* ){4}(?P[^ ]+)(?:[^#\n]#){2}\d+(?P[^#]+)(?:[^#\n]#){4}\d+(?P[^#]+)(?:[^#\n]*#){5}\d+(?P\w+\s+\w+\s+\w+)" has exceeded configured match_limit, consider raising the value in limits.conf

If I manually go through and extract each field one at a time, I can get as far as "(rule" in the above example before the regular expression fails again.

I thought this might be an extraction limit, but the default maximum is 10240 characters and there is no way the above would reach maxcolumns either. The event above totals 796 characters (including spaces).

I am trying to see if there is a way I can write this in regex (though admittedly I am not very good with regex).


FrankVl
Ultra Champion

Looks like you're passing this through rsyslog? You may want to set $EscapeControlCharactersOnReceive off to ensure those #011s are printed properly as tabs. Not sure if that will resolve your entire issue, but it should make parsing easier I guess.

Also: when you are posting code here, please properly format it as code, by either wrapping it in backticks: ` for short pieces, or using the 101010 button in the editor toolbar to post larger segments of code. Otherwise certain characters disappear, which makes it very difficult to help you adjust your regexes.

Have you checked out the Splunk_TA_Windows? I believe Snare support was dropped from the latest version, but previous versions had some basic support for this snare syslog format. You may want to take inspiration from that. Snippet from that TA's transforms.conf:

# FIELDS/DELIMS extraction for tab delimitted snare data
[raw_kv_for_tab_snare]
DELIMS = "\t"
FIELDS = snare_host,snare_log_type,snare_criticality,LogName,RecordNumber,end_time,EventCode,SourceName,User,SidType,Type,ComputerName,CategoryString,DataString,Message,snare_checksum

# Message extraction for tab delimitted snare data
[Message_kv_for_tab_snare]
SOURCE_KEY = Message
REGEX = (\w[\w ]+)[:](((\x20\x20\x20\x20\x09)|(\x20\x20\x20\x20)|(\x20\x20\x20)|($))|(((\x20){1,3}(.*?))?((\x20\x20\x20\x20\x09)|(\x20\x20\x20\x20)|(\x20\x20\x20)|($))))
FORMAT = $1::$2
MV_ADD = true

Note: these tab-delimited extractions will only work if you fix those #011 codes as I mentioned at the start of my answer.
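
To illustrate what the delimiter-based stanza above does, here is a rough Python sketch (not Splunk configuration) that restores the #011 escapes to tabs and splits a shortened copy of the sample event from the question. The column names are my own guesses for this particular event layout, loosely borrowed from the FIELDS list; the header of this event does not line up exactly with the first three TA fields, so treat the mapping as an assumption.

```python
# Sketch only: mimics a DELIMS = "\t" / FIELDS = ... style split in plain Python.
# RAW is a shortened copy of the sample event from the question.
RAW = (
    "Aug 2 14:08:41 workstation1.domain.com.au EventCaptureApplication(Desktop) - MSWinEventLog"
    "#011 - 4#011Microsoft-Windows-Sysmon/Operational#01110466"
    "#011Thu Aug 02 14:08:40 2018#0113#011Microsoft-Windows-Sysmon"
    "#011NT AUTHORITY\\SYSTEM#011N/A#011Information"
    "#011WORKSTATION1.domain.com.au"
    "#011Network connection detected (rule: NetworkConnect)#011"
    "#011Network connection detected: UtcTime: 2018-08-02 06:08:38.893 ProcessId: 6789"
)

# Hypothetical column names for this event layout (an assumption, not the TA's list).
FIELD_NAMES = [
    "header", "criticality", "LogName", "RecordNumber", "end_time",
    "EventCode", "SourceName", "User", "SidType", "Type",
    "ComputerName", "CategoryString", "DataString", "Message",
]


def split_snare(raw):
    """Turn the escaped '#011' sequences back into tabs, then split on tabs."""
    restored = raw.replace("#011", "\t")
    return dict(zip(FIELD_NAMES, restored.split("\t")))


fields = split_snare(RAW)
print(fields["EventCode"], fields["SourceName"], fields["User"])
```

Once the separators are real tabs, each column can be addressed by position instead of by one long regex.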

willadams
Contributor

In my notes under my original post I have removed the escaped characters reference so the log is just normal "tabs". I did try to create a new sourcetype based on the Windows Snare syslog sourcetype we have in our current version of Splunk, but that didn't seem to pick anything up. I tried adding the regex from the original TA, which got a tick when I entered it manually, so I might explore this option a little further and see if it gives me anything more (so far it hasn't).

willadams
Contributor

Not sure how I can use the field extractor with that format line though...

FrankVl
Ultra Champion

You can't really use this code in the field extractor. If you want to re-use the code from the windows TA, you need to abandon using the GUI field extractor and work directly from the props.conf and transforms.conf files.

The TA uses a combination of TAB delimited extractions and then applies a regex to extract the specific fields and their values from the Message part.
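
As a rough illustration of that second stage, here is a Python sketch (not the TA's regex) that pulls "Key: value" pairs out of a flattened Sysmon message like the one in the question. It assumes keys are single words followed by a colon and whitespace, and that the heading ends at the first ": ". Values that themselves contain such a pattern (for example inside a CommandLine) would be split incorrectly, so treat it as a starting point only.

```python
import re

# Shortened Message portion of the sample event from the question.
MESSAGE = (
    r"Network connection detected: UtcTime: 2018-08-02 06:08:38.893 "
    r"ProcessId: 6789 Image: C:\Windows\SysWOW64\SearchProtocolHost.exe "
    r"User: DOMAIN\USER1 Protocol: tcp SourcePortName: "
    r"DestinationIsIpv6: false DestinationIp: 9.9.9.9"
)

# A key is a word at the start or after whitespace, followed by ':' and then
# whitespace or end of string. "C:\..." and "06:08:38" do not qualify.
KEY = re.compile(r"(?<!\S)(\w+):(?=\s|$)")


def parse_message(message):
    """Split a flattened 'Heading: Key: value Key: value ...' message into a dict."""
    _heading, _, body = message.partition(": ")  # drop the heading (assumption)
    matches = list(KEY.finditer(body))
    result = {}
    for i, match in enumerate(matches):
        # A value runs from the end of its key to the start of the next key.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        result[match.group(1)] = body[match.end():end].strip()
    return result


fields = parse_message(MESSAGE)
print(fields["UtcTime"], fields["Image"], fields["DestinationIp"])
```

Empty values such as SourcePortName come back as empty strings rather than swallowing the next key.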

If you would rather have help working with the field extractor GUI, then please share details of what you are trying there, posting it as properly formatted code and/or screenshots from the field extractor GUI (masking any sensitive bits where needed).

Personally I prefer writing regular expressions myself, rather than trying to have Splunk generate them for me. The field extractor gui can be useful for testing them, but tools like regex101.com provide a lot more feedback on your regex.

493669
Super Champion

Use ...|rex max_match=0 "<YOUR regex>"
max_match controls the number of times the regex is matched. Use 0 for unlimited matches.
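
For anyone unsure what max_match changes, here is a loose Python analogy: the default behaviour resembles taking only the first match, while max_match=0 resembles collecting every match into a multivalue field. Python's re module is only a stand-in here, and the sample text and pattern are made up for illustration. Note that max_match is a different setting from the match_limit in limits.conf that the error message mentions.

```python
import re

# Illustrative text and pattern only; not taken from a real search.
TEXT = "SourceIp: 192.168.1.2 DestinationIp: 9.9.9.9"
IP = re.compile(r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})")

# Roughly what a single-match extraction returns: the first hit only.
first_ip = IP.search(TEXT).group("ip")

# Roughly what an unlimited-match extraction returns: every hit, in order.
all_ips = [match.group("ip") for match in IP.finditer(TEXT)]

print(first_ip)
print(all_ips)
```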

willadams
Contributor

I was trying to use the built-in regex mechanisms within Splunk. The log I am getting is effectively a tab-delimited "syslog" event. For example, this is another event:

Aug 6 08:14:10 Wks1.domain.com Snare[Enterprise] for Windows (Desktop Only) - MSWinEventLog - 4 Microsoft-Windows-Sysmon/Operational 31214 Mon Aug 06 08:14:10 2018 1 Microsoft-Windows-Sysmon NT AUTHORITY\SYSTEM N/A Information Wks1.domain.com Process Create (rule: ProcessCreate) Process Create: UtcTime: 2018-08-06 00:14:10.428 ProcessGuid: {C0D7122D-92D2-5B67-0000-00100DAE8600} ProcessId: 11644 Image: C:\Windows\SysWOW64\cscript.exe CommandLine: "C:\WINDOWS\system32\cscript.exe" Gathestuff.bat CurrentDirectory: C:\WINDOWS\ccmcache\6s\ User: Domain\user1 LogonGuid: {V0D7888G-92FF-5N67-1234-0070589L6400} LogonId: 0x649a58 TerminalSessionId: 1 IntegrityLevel: Medium Hashes: MD5=ABBB8B9B89BB900879097BB3409ULKJ98,SHA256=490590865904506906594OIOGUOUGJUOHOUP5959590590JKKKGJGUOI88787KJ9 ParentProcessGuid: {NLLKJBJKLBLJ8998870HLNLLJK98009LKH980} ParentProcessId: 11752 ParentImage: C:\Windows\CCM\Ccm32BitLauncher.exe ParentCommandLine: "C:\WINDOWS\CCM\Ccm32BitLauncher.exe" 1 5440 496

If I use the normal Splunk regex extraction or even the Delimited Extraction, it fails. The regex (selection mode) errors out with: "The extraction failed. If you are extracting multiple fields, try removing one or more fields. Start with extractions that are embedded within longer text strings."

I will try to write a regex for this data and see if I can extract it using a more limited regex combined with the "rex max_match=0" option.

Annoyingly I have 3 other index/sourcetypes to go through and I hope I don't have to customise each one.....

willadams
Contributor

Okay, stuck now. The main reason seems to be the way the log is generated and received by the index. The events are effectively sent with the following information:

Date/Time
System
Event Count
Event ID
Source
Username
User Type
Return Code
String

The big issue here is that, looking at the log I posted above, the bulk of my query is trying to break up the "String", which is effectively the text from "31214" to the end. I also can find no way (yet) to break this up into fields I can use.

Any suggestions welcome

493669
Super Champion

Can you please share your raw event and the expected outcome you need to extract, so it is easier to understand, and if possible share your regex query? Use the 101010 button (Ctrl+K) after pasting raw data or a query so that no special characters get escaped.

willadams
Contributor

Note that the log I posted is effectively the raw event. The raw log looks the same as what the index sees.

willadams
Contributor

I also tried to extract fields one by one, and that seems to work up to a point before it fails. I was able to extract 2 fields, and the attempt on the 3rd field failed.

willadams
Contributor

I am literally trying to break the string into multiple fields so that I can search on each field. So in the example above, the line "Mon Aug 06 08:14:10 2018 1 Microsoft-Windows-Sysmon NT AUTHORITY\SYSTEM" contains:

1) The EventType being "Microsoft-Windows-Sysmon"
2) The EventID being "1"
3) The user being "NT AUTHORITY\SYSTEM"
4) The full UTC time and date being "Mon Aug 06 08:14:10 2018"

There is more and more information that I need to extract. This particular log is "SYSMON".

493669
Super Champion

For time format extraction you can refer to this:
http://docs.splunk.com/Documentation/Splunk/7.0.2/Data/Configuretimestamprecognition
If you have 3 different sourcetypes whose log formats are not the same, you can configure index-time extraction; refer to this:
http://docs.splunk.com/Documentation/SplunkCloud/7.0.3/Data/Configureindex-timefieldextraction
If the format is the same, you can use search-time extraction using props.conf.
Here is the regex:

...|rex "^.{25}(?<EventID>\d+)\s(?<EventType>\S+)\s(?<user>\S+)"
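
One caveat with \S+ for the user: it stops at the first space, so a value like NT AUTHORITY\SYSTEM would be cut short. Below is a Python sketch comparing the fixed-width pattern with a variant anchored on tab separators. It assumes the fields really are tab-separated, as described earlier in the thread. The sample line and the TABBED pattern are illustrative only, and Python needs (?P<name>...) where rex accepts (?<name>...).

```python
import re

# Illustrative line built from the fields quoted in the thread, with tabs restored.
LINE = "Mon Aug 06 08:14:10 2018\t1\tMicrosoft-Windows-Sysmon\tNT AUTHORITY\\SYSTEM\tN/A"

# The fixed-width pattern from the post, rewritten with Python group syntax.
FIXED = re.compile(r"^.{25}(?P<EventID>\d+)\s(?P<EventType>\S+)\s(?P<user>\S+)")

# Hypothetical alternative: let the tabs mark the field boundaries, so values
# may contain spaces and no character count has to be hard-coded.
TABBED = re.compile(
    r"^(?P<time>[^\t]+)\t(?P<EventID>\d+)\t(?P<EventType>[^\t]+)\t(?P<user>[^\t]+)"
)

fixed = FIXED.match(LINE).groupdict()
tabbed = TABBED.match(LINE).groupdict()

print(fixed["user"])   # stops at the space inside the user name
print(tabbed["user"])  # keeps the whole tab-delimited value
```

Negated classes such as [^\t]+ also tend to backtrack less than broad patterns, which may help with match_limit errors, though that depends on the rest of the expression.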

willadams
Contributor

Thanks, I can see your regex and will try that in the field extractor. I just have to adjust the regex because the machine names in my post are obfuscated (the real ones look more like wks1234.this.domain.com).

willadams
Contributor

When I tried to expand this using

^.{93}(?<EventID>\d+) the tab shows that it is able to extract this. However, when I extend it to include \s(?<EventType>\S+), i.e. ^.{93}(?<EventID>\d+)\s(?<EventType>\S+), EventType shows nothing in the resulting tabs.

Also, the "Events" tab that shows what the expression is doing has everything marked with an "x" and no ticks with this extension.

Maybe I need to change my data source, or get the logging application (SNARE) to send in multi-line format instead of the single-line format used here.
