
Why am I unable to search my extracted fields?

hmasten
Explorer

I'm trying to ingest AirWatch syslog events, but not all fields are searchable: only those written as Field=Value in the message are. The syslog events contain two different key-value formats, Field: Value and Field=Value. In the first half of the message, each value runs straight into the next field name, which I believe is the part Splunk is having trouble with.
In my example, the first key-value pair is Event Type: Console. With my field extractions in place, the fields and values appear correct; however, searching for EventType=Console returns no results.

I've implemented the regex extractions as inline extractions, both as a single field extraction with one capture group per field and as a separate extraction for each field. I also tried delimiters via the field extraction wizard: that lets me search normally, but the values include the next field name, just as in the logs, so it would need a regex or something further to tell Splunk where each value ends. I figure I'm missing something here; I'm just not sure whether using regex for field extractions this way is correct, or whether some other piece is required when using regex for search-time field extractions.

Mar  9 13:52:33 10.0.2.24  March 09 19:52:33 AirWatch  AirWatch Syslog Details are as follows Event Type: ConsoleEvent: DeviceDataModfiedUser: domain\JoeUser Source: ServerEvent Module: DashboardEvent Category: DeviceEvent Data: Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx

My greedy regex:

EXTRACT-Device = Device=(?P<Device>[^ ]+);
EXTRACT-DeviceData = DeviceData=(?P<DeviceData>[^ ]+);
EXTRACT-Event = Event:\s+(?P<Event>\w+)User
EXTRACT-EventCategory = Category:\s+(?P<EventCategory>\w+)Event
EXTRACT-EventModule = Module:\s+(?P<EventModule>\w+)Event
EXTRACT-EventType = Event Type:\s(?P<EventType>\S+)Event
EXTRACT-User = User:\s+(?P<User>[^ ]+)Event
EXTRACT-EventSource = Source:\s+(?P<EventSource>\w+)Event
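To see which of these patterns actually match the sample event, they can be run outside Splunk. The sketch below is a minimal Python check, assuming Python's `re` module behaves like Splunk's PCRE for these simple patterns; it applies each EXTRACT pattern on its own, the way separate EXTRACT- lines are applied. The helper name `extract` is hypothetical.

```python
import re

# Sample event from the question (scrubbed: no "Event" right after the user name).
SAMPLE = (
    r"Mar  9 13:52:33 10.0.2.24  March 09 19:52:33 AirWatch  AirWatch Syslog Details "
    r"are as follows Event Type: ConsoleEvent: DeviceDataModfiedUser: domain\JoeUser "
    r"Source: ServerEvent Module: DashboardEvent Category: DeviceEvent Data: "
    r"Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx"
)

# The EXTRACT- patterns from the props.conf above, unchanged.
ASKER_EXTRACTS = {
    "Device": r"Device=(?P<Device>[^ ]+);",
    "DeviceData": r"DeviceData=(?P<DeviceData>[^ ]+);",
    "Event": r"Event:\s+(?P<Event>\w+)User",
    "EventCategory": r"Category:\s+(?P<EventCategory>\w+)Event",
    "EventModule": r"Module:\s+(?P<EventModule>\w+)Event",
    "EventType": r"Event Type:\s(?P<EventType>\S+)Event",
    "User": r"User:\s+(?P<User>[^ ]+)Event",
    "EventSource": r"Source:\s+(?P<EventSource>\w+)Event",
}


def extract(patterns, raw):
    """Apply each pattern on its own, like separate EXTRACT- lines.

    Returns {field: value}; the value is None when the pattern does not match.
    """
    results = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, raw)
        results[field] = match.group(field) if match else None
    return results


if __name__ == "__main__":
    for field, value in extract(ASKER_EXTRACTS, SAMPLE).items():
        print(f"{field:14} {value!r}")
```

On this sample, Device and User come back as None: `[^ ]+` stops at the space in "User iPhone", so the required `;` never follows, and the scrubbed user name is followed by a space rather than "Event". The other six fields match (EventType is Console, for example).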

hmasten
Explorer

I finally fixed it with a fields.conf in etc/system/local that sets INDEXED_VALUE = false.

I had been putting fields.conf in the local folder of my app, NOT in etc/system/local. That's the answer.
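A likely explanation for why INDEXED_VALUE = false fixes this: by default (INDEXED_VALUE = true) Splunk assumes a field's value also appears in the raw text as an indexed term, so a search such as EventType=Console also requires the term "Console" to be in the index before the search-time extraction is applied. In these AirWatch events the value is glued to the next key name, so the indexed term is "ConsoleEvent", not "Console", and the lookup finds nothing; values written as Field=Value (such as OwnerGroup) are separate terms, which matches what the question describes. Setting INDEXED_VALUE = false in a field's [<field>] stanza in fields.conf tells Splunk the value is not in the raw text, so that lookup is skipped. The Python sketch below approximates the segmentation step; the breaker sets are a simplified, assumed subset of Splunk's defaults (segmenters.conf has the real lists), and `index_terms` and `term_lookup_finds` are hypothetical helpers.

```python
import re

# Simplified, assumed breaker sets: a small subset of Splunk's default major and
# minor breakers, enough to show the effect on these events.
MAJOR_BREAKERS = r"[\s;,]+"
MINOR_BREAKERS = r"[-=:./\\]+"

SAMPLE_1 = (
    r"Mar  9 13:52:33 10.0.2.24  March 09 19:52:33 AirWatch  AirWatch Syslog Details "
    r"are as follows Event Type: ConsoleEvent: DeviceDataModfiedUser: domain\JoeUser "
    r"Source: ServerEvent Module: DashboardEvent Category: DeviceEvent Data: "
    r"Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx"
)
SAMPLE_2 = (
    "Mar 9 13:52:32 10.0.2.24 March 09 19:52:32 AirWatch AirWatch Syslog Details "
    "are as follows Event Type: DeviceEvent: RemoveProfileRequestedUser: sysadminEvent "
    "Source: ServerEvent Module: DashboardEvent Category: CommandEvent Data: "
    "Profile=iOS Visual Privacy Webclip"
)


def index_terms(raw):
    """Lower-cased terms a simplified segmenter would write to the index."""
    terms = set()
    for major in re.split(MAJOR_BREAKERS, raw):
        if major:
            terms.add(major.lower())
            terms.update(m.lower() for m in re.split(MINOR_BREAKERS, major) if m)
    return terms


def term_lookup_finds(value, raw):
    """Would a lookup of this single-term value (INDEXED_VALUE = true) hit the event?"""
    return value.lower() in index_terms(raw)


if __name__ == "__main__":
    for value, raw in [("Console", SAMPLE_1), ("OwnerGroup", SAMPLE_1), ("sysadmin", SAMPLE_2)]:
        print(f"{value:12} {term_lookup_finds(value, raw)}")
```

On these samples the simplified lookup misses Console and sysadmin but finds OwnerGroup.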

gvmorley
Contributor

Hi,

One approach you could try is to rule out the 'unknown' characters problem by trying your extractions as rex commands first.

I tried this with:

| makeresults | fields - _time
| eval _raw="Mar  9 13:52:33 10.0.2.24  March 09 19:52:33 AirWatch  AirWatch Syslog Details are as follows Event Type: ConsoleEvent: DeviceDataModfiedUser: domain\JoeUser Source: ServerEvent Module: DashboardEvent Category: DeviceEvent Data: Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx"
| rex "Device=(?P<Device>[^;]+);"
| rex "DeviceData=(?P<DeviceData>[^;]+);"
| rex "Event:\s+(?P<Event>\w+)User"
| rex "Category:\s+(?P<EventCategory>\w+)Event"
| rex "Module:\s+(?P<EventModule>\w+)Event"
| rex "Event\sType:\s+(?P<EventType>\w+)Event"
| rex "User:\s+(?P<User>[^\s]+)"
| rex "Source:\s+(?P<EventSource>\w+)Event"
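The same rex patterns can be checked offline. This is a minimal Python sketch that runs them with the `re` module (assumed to behave like Splunk's PCRE for these patterns) against the sample event; `extract` is a hypothetical helper that applies each pattern separately, like one rex per field.

```python
import re

# Sample event from the question (scrubbed: no "Event" right after the user name).
SAMPLE = (
    r"Mar  9 13:52:33 10.0.2.24  March 09 19:52:33 AirWatch  AirWatch Syslog Details "
    r"are as follows Event Type: ConsoleEvent: DeviceDataModfiedUser: domain\JoeUser "
    r"Source: ServerEvent Module: DashboardEvent Category: DeviceEvent Data: "
    r"Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx"
)

# The rex patterns from the search above, unchanged.
GVMORLEY_REX = {
    "Device": r"Device=(?P<Device>[^;]+);",
    "DeviceData": r"DeviceData=(?P<DeviceData>[^;]+);",
    "Event": r"Event:\s+(?P<Event>\w+)User",
    "EventCategory": r"Category:\s+(?P<EventCategory>\w+)Event",
    "EventModule": r"Module:\s+(?P<EventModule>\w+)Event",
    "EventType": r"Event\sType:\s+(?P<EventType>\w+)Event",
    "User": r"User:\s+(?P<User>[^\s]+)",
    "EventSource": r"Source:\s+(?P<EventSource>\w+)Event",
}


def extract(patterns, raw):
    """Apply each pattern on its own; None means the pattern did not match."""
    results = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, raw)
        results[field] = match.group(field) if match else None
    return results


if __name__ == "__main__":
    for field, value in extract(GVMORLEY_REX, SAMPLE).items():
        print(f"{field:14} {value!r}")
```

All eight fields match on this sample; Device now keeps its spaces because `[^;]+` only stops at the semicolon.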

A number of your regex statements were slightly off, so I've modified them. This now returns:

[Screenshot of the search results]

I'm not 100% sure it would be the cause, but I'd avoid using a literal space character in the regex, especially when you put it in props.conf. I always go with \s, just to be explicit.

For example, I'd avoid this:

EXTRACT-EventType = Event Type:\s+(?P<EventType>\S+)Event

And go instead with:

EXTRACT-EventType = Event\sType:\s+(?P<EventType>\w+)Event

It should be fine either way; it's more just force of habit for me.

If the rex commands work but then don't when you move them over to props.conf, you can check the config for that sourcetype with btool.

For example, if the sourcetype for your data was 'extract-test' you could run the command:

./splunk btool props list extract-test --debug

This will give you all of the props config for that sourcetype and which file the config is coming from. It would look a bit like this:

/Splunk/etc/system/local/props.conf   [extract-test]
/Splunk/etc/system/default/props.conf ANNOTATE_PUNCT = True
/Splunk/etc/system/default/props.conf AUTO_KV_JSON = true
/Splunk/etc/system/default/props.conf BREAK_ONLY_BEFORE = 
/Splunk/etc/system/default/props.conf BREAK_ONLY_BEFORE_DATE = True
/Splunk/etc/system/default/props.conf CHARSET = UTF-8
/Splunk/etc/system/default/props.conf DATETIME_CONFIG = /etc/datetime.xml
/Splunk/etc/system/local/props.conf   EXTRACT-Device = Device=(?P<Device>[^;]+);
/Splunk/etc/system/local/props.conf   EXTRACT-DeviceData = DeviceData=(?P<DeviceData>[^;]+);
/Splunk/etc/system/local/props.conf   EXTRACT-Event = Event:\s+(?P<Event>\w+)User
/Splunk/etc/system/local/props.conf   EXTRACT-EventCategory = Category:\s+(?P<EventCategory>\w+)Event
/Splunk/etc/system/local/props.conf   EXTRACT-EventModule = Module:\s+(?P<EventModule>\w+)Event
/Splunk/etc/system/local/props.conf   EXTRACT-EventSource = Source:\s+(?P<EventSource>\w+)Event
/Splunk/etc/system/local/props.conf   EXTRACT-EventType = Event\sType:\s+(?P<EventType>\w+)Event
/Splunk/etc/system/local/props.conf   EXTRACT-User = User:\s+(?P<User>[^\s]+)
/Splunk/etc/system/default/props.conf HEADER_MODE = 
/Splunk/etc/system/default/props.conf LEARN_MODEL = true
/Splunk/etc/system/default/props.conf LEARN_SOURCETYPE = true
/Splunk/etc/system/default/props.conf LINE_BREAKER_LOOKBEHIND = 100
/Splunk/etc/system/default/props.conf MATCH_LIMIT = 100000
/Splunk/etc/system/default/props.conf MAX_DAYS_AGO = 2000
/Splunk/etc/system/default/props.conf MAX_DAYS_HENCE = 2
/Splunk/etc/system/default/props.conf MAX_DIFF_SECS_AGO = 3600
/Splunk/etc/system/default/props.conf MAX_DIFF_SECS_HENCE = 604800
/Splunk/etc/system/default/props.conf MAX_EVENTS = 256
/Splunk/etc/system/default/props.conf MAX_TIMESTAMP_LOOKAHEAD = 128
/Splunk/etc/system/default/props.conf MUST_BREAK_AFTER = 
/Splunk/etc/system/default/props.conf MUST_NOT_BREAK_AFTER = 
/Splunk/etc/system/default/props.conf MUST_NOT_BREAK_BEFORE = 
/Splunk/etc/system/default/props.conf SEGMENTATION = indexing
/Splunk/etc/system/default/props.conf SEGMENTATION-all = full
/Splunk/etc/system/default/props.conf SEGMENTATION-inner = inner
/Splunk/etc/system/default/props.conf SEGMENTATION-outer = outer
/Splunk/etc/system/default/props.conf SEGMENTATION-raw = none
/Splunk/etc/system/default/props.conf SEGMENTATION-standard = standard
/Splunk/etc/system/default/props.conf SHOULD_LINEMERGE = True
/Splunk/etc/system/default/props.conf TRANSFORMS = 
/Splunk/etc/system/default/props.conf TRUNCATE = 10000
/Splunk/etc/system/default/props.conf detect_trailing_nulls = false
/Splunk/etc/system/default/props.conf maxDist = 100
/Splunk/etc/system/default/props.conf priority = 
/Splunk/etc/system/default/props.conf sourcetype =
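When the btool output gets long, a small script can list just the EXTRACT- settings and the file each one comes from, which makes it easy to check that the extractions are being picked up from the file you expect. This is a minimal Python sketch that assumes the `--debug` layout shown above (file path, whitespace, then `name = value`) and file paths without spaces; `parse_btool_debug` and `sources_for_prefix` are hypothetical helpers, not part of Splunk.

```python
from collections import namedtuple

# One setting from `splunk btool ... --debug` output.
Setting = namedtuple("Setting", ["source", "name", "value"])

# A few lines copied from the output above; replace with your own btool output.
BTOOL_OUTPUT = r"""
/Splunk/etc/system/local/props.conf   [extract-test]
/Splunk/etc/system/default/props.conf ANNOTATE_PUNCT = True
/Splunk/etc/system/local/props.conf   EXTRACT-Device = Device=(?P<Device>[^;]+);
/Splunk/etc/system/local/props.conf   EXTRACT-User = User:\s+(?P<User>[^\s]+)
/Splunk/etc/system/default/props.conf TRUNCATE = 10000
"""


def parse_btool_debug(text):
    """Parse lines of the form '<file>  <name> = <value>'.

    Assumes the file path contains no spaces; stanza headers are skipped.
    """
    settings = []
    for line in text.splitlines():
        source, _, rest = line.strip().partition(" ")
        rest = rest.strip()
        if not rest or rest.startswith("["):
            continue  # blank line or '[stanza]' header
        name, _, value = rest.partition("=")  # split on the first '=' only
        settings.append(Setting(source, name.strip(), value.strip()))
    return settings


def sources_for_prefix(settings, prefix="EXTRACT-"):
    """Map each setting whose name starts with prefix to the file it came from."""
    return {s.name: s.source for s in settings if s.name.startswith(prefix)}


if __name__ == "__main__":
    for name, source in sources_for_prefix(parse_btool_debug(BTOOL_OUTPUT)).items():
        print(f"{name:20} {source}")
```

To use it on a real system, save the output of the btool command to a file and pass its contents to `parse_btool_debug`.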

Have a go with some of the tweaked regular expressions via rex first, then move them into props.conf, restart, then check with btool and see where you're at.

Your approach is fine and should work, there's just going to be a small niggle tripping you up somewhere!

With the regex and props.conf above, I think I get the result you're looking for in my test:

[Screenshot of the search results]


hmasten
Explorer

Thanks for the input. I've been testing this out with REX commands in search, but the results are still different from what I get with props.conf and transforms.conf.

I'm currently using the following regex pattern in transforms.conf, with field aliases for Event and User, since the pattern below cuts off the first letter of those field names. Regardless, I am still unable to search for some values. In this example, searching for domain\JoeUser has been working with my regex, but in another set of events where User=sysadmin it doesn't, and those make up 99% of my logs. It's strange that the behavior differs when I'd expect the same outcome.

(?<_KEY_1>\w+)(:\s)(?<_VAL_1>[a-zA-Z0-9\\]+)(U|E)
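Running that pattern over the sysadmin sample event posted below shows where the shortened names come from: the value class includes capital letters, so each value match backs off to the last U or E before the next colon or space, and the `(U|E)` group then consumes that letter, which is usually the first letter of the next key name. A minimal Python sketch, assuming the transform is applied to every match in the event (as `finditer` does here) and with `(?<name>...)` written in Python's `(?P<name>...)` form; `key_value_pairs` is a hypothetical helper.

```python
import re

# The transforms.conf REGEX above, with (?<name>...) written as Python's (?P<name>...).
KV_REGEX = re.compile(r"(?P<_KEY_1>\w+)(:\s)(?P<_VAL_1>[a-zA-Z0-9\\]+)(U|E)")

# Second sample event from this thread (the sysadmin one).
SAMPLE_2 = (
    "Mar 9 13:52:32 10.0.2.24 March 09 19:52:32 AirWatch AirWatch Syslog Details "
    "are as follows Event Type: DeviceEvent: RemoveProfileRequestedUser: sysadminEvent "
    "Source: ServerEvent Module: DashboardEvent Category: CommandEvent Data: "
    "Profile=iOS Visual Privacy Webclip"
)


def key_value_pairs(raw):
    """Return (key, value) pairs in the order the REGEX finds them."""
    return [(m.group("_KEY_1"), m.group("_VAL_1")) for m in KV_REGEX.finditer(raw)]


if __name__ == "__main__":
    for key, value in key_value_pairs(SAMPLE_2):
        print(f"{key:10} {value}")
```

Type, Source, Module and Category come out intact, while Event and User appear as vent and ser; the Event Data section produces no pair at all.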

I just tested with your props.conf and noticed the User field was grabbing the string "Event" that follows the user value. When I scrubbed the data before pasting it into Answers, I removed that trailing "Event" from the user value. Here's another example with no sensitive data:

Mar 9 13:52:32 10.0.2.24 March 09 19:52:32 AirWatch AirWatch Syslog Details are as follows Event Type: DeviceEvent: RemoveProfileRequestedUser: sysadminEvent Source: ServerEvent Module: DashboardEvent Category: CommandEvent Data: Profile=iOS Visual Privacy Webclip

Notice there are no spaces from AirWatch separating one value from the next field/key. This is the default format in the AirWatch syslog settings:
{Event Type}{Event}{User}{Event Source}{Event Module}{Event Category}{Event Data}

I added a trailing Event to your EXTRACT-User line:
EXTRACT-User = User:\s+(?P<User>[^\s]+)Event

Now adding User=sysadmin to the search gives me no results, when I should have results.
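To confirm that the edited EXTRACT-User pattern itself is fine, both versions can be run against the sysadmin sample above. A minimal Python sketch, assuming `re` matches these patterns the same way PCRE does; `user_value` is a hypothetical helper.

```python
import re

SAMPLE_2 = (
    "Mar 9 13:52:32 10.0.2.24 March 09 19:52:32 AirWatch AirWatch Syslog Details "
    "are as follows Event Type: DeviceEvent: RemoveProfileRequestedUser: sysadminEvent "
    "Source: ServerEvent Module: DashboardEvent Category: CommandEvent Data: "
    "Profile=iOS Visual Privacy Webclip"
)

# gvmorley's EXTRACT-User, and the same pattern with "Event" added after the group.
OPEN_ENDED = r"User:\s+(?P<User>[^\s]+)"
ANCHORED = r"User:\s+(?P<User>[^\s]+)Event"


def user_value(pattern, raw):
    """Return the captured User value, or None if the pattern does not match."""
    match = re.search(pattern, raw)
    return match.group("User") if match else None


if __name__ == "__main__":
    print(user_value(OPEN_ENDED, SAMPLE_2))
    print(user_value(ANCHORED, SAMPLE_2))
```

The original pattern captures sysadminEvent and the anchored one captures sysadmin, so the extraction is correct; the empty search result points at something other than the regex.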


gvmorley
Contributor

Hi,

Looks like the logging format for these events is a bit of a pain!

I'm not 100% sure your:

(?<_KEY_1>\w+)(:\s)(?<_VAL_1>[a-zA-Z0-9\\]+)(U|E)

is going to work for you. What if one of the values you're trying to capture (like a username) ends in a U or an E?

I also noticed that the Keys in the 'Event Data' are different in the two event samples. You're also going to want to cater for this and the associated spaces. The default Splunk Key/Value extractor is only going to get you:

Profile=iOS

Whereas you probably want:

Profile=iOS Visual Privacy Webclip

Ultimately, you may just need to keep refining your regexes until you cater for 100% of your data. I get that this is pretty dull...

And whilst I'd always go with 'keep it simple' wherever possible, you may end up with something slightly more complex.

For example, I don't like it, but this below works for both of your sample data events:

| makeresults 
| eval _raw="Mar  9 13:52:33 10.0.2.24  March 09 19:52:33 AirWatch  AirWatch Syslog Details are as follows Event Type: ConsoleEvent: DeviceDataModfiedUser: domain\JoeUserEvent Source: ServerEvent Module: DashboardEvent Category: DeviceEvent Data: Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx" 
| append 
    [| makeresults 
    | eval _raw="Mar 9 13:52:32 10.0.2.24 March 09 19:52:32 AirWatch AirWatch Syslog Details are as follows Event Type: DeviceEvent: RemoveProfileRequestedUser: sysadminEvent Source: ServerEvent Module: DashboardEvent Category: CommandEvent Data: Profile=iOS Visual Privacy Webclip"]
| fields - _time 
| rex "Event\sType:\s(?<EventType>[^\s]+)Event:\s(?<Event>[^\s]+)User:\s(?<User>[^\s]+)Event\sSource:\s(?<EventSource>[^\s]+)Event\sModule:\s(?<EventModule>[^\s]+)Event\sCategory:\s(?<EventCategory>[^\s]+)Event\sData:\s(?<EventData>(?:(?:Device=(?<Device>[^;\r\n]+);?)|(?:DeviceData=(?<DeviceData>[^;\r\n]+);?)|(?:LoginSessionID=(?<LoginSessionID>[^;\r\n]+);?)|(?:Profile=(?<Profile>[^;\r\n]+);?))*)"
| table EventType Event User EventSource EventModule EventCategory EventData Device DeviceData LoginSessionID Profile

You can play and test with it here: https://regex101.com/r/GEpAOP/1/

That is, you may need to expand the (?<EventData>) grouping to cater for different keys in that section.
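The combined pattern can also be checked offline against both sample events. This minimal Python sketch assumes `re` behaves like PCRE for this pattern, with the `(?<name>...)` groups rewritten as `(?P<name>...)`; the second sample is the sysadmin event as the asker posted it, and `parse` is a hypothetical helper.

```python
import re

# gvmorley's combined rex, split across lines; PCRE (?<name>...) groups are
# written as Python (?P<name>...) groups.
COMBINED = re.compile(
    r"Event\sType:\s(?P<EventType>[^\s]+)Event:\s(?P<Event>[^\s]+)"
    r"User:\s(?P<User>[^\s]+)Event\sSource:\s(?P<EventSource>[^\s]+)"
    r"Event\sModule:\s(?P<EventModule>[^\s]+)"
    r"Event\sCategory:\s(?P<EventCategory>[^\s]+)"
    r"Event\sData:\s(?P<EventData>(?:"
    r"(?:Device=(?P<Device>[^;\r\n]+);?)"
    r"|(?:DeviceData=(?P<DeviceData>[^;\r\n]+);?)"
    r"|(?:LoginSessionID=(?P<LoginSessionID>[^;\r\n]+);?)"
    r"|(?:Profile=(?P<Profile>[^;\r\n]+);?)"
    r")*)"
)

SAMPLE_1 = (
    r"Mar  9 13:52:33 10.0.2.24  March 09 19:52:33 AirWatch  AirWatch Syslog Details "
    r"are as follows Event Type: ConsoleEvent: DeviceDataModfiedUser: domain\JoeUserEvent "
    r"Source: ServerEvent Module: DashboardEvent Category: DeviceEvent Data: "
    r"Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx"
)
SAMPLE_2 = (
    "Mar 9 13:52:32 10.0.2.24 March 09 19:52:32 AirWatch AirWatch Syslog Details "
    "are as follows Event Type: DeviceEvent: RemoveProfileRequestedUser: sysadminEvent "
    "Source: ServerEvent Module: DashboardEvent Category: CommandEvent Data: "
    "Profile=iOS Visual Privacy Webclip"
)


def parse(raw):
    """Return every named group (None if absent), or {} if the event does not match."""
    match = COMBINED.search(raw)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    for raw in (SAMPLE_1, SAMPLE_2):
        print(parse(raw))
```

Both events parse. Keys that do not occur in an event (Profile in the first, Device in the second) come back as None, and each new Event Data key needs one more alternative inside the EventData group.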

Good luck - I hope you find a much simpler way!


hmasten
Explorer

Again, REX gives different results than the EXTRACTs in props.conf and the REGEX patterns in transforms.conf. I understand you're using it for testing, and I appreciate the effort, but for some strange reason it's not the same in practice. I can extract the fields using your patterns all day, but I'm not able to search these events on the fields extracted using props and transforms.

I think I'm going to remove all the field extractions and add REX to all my saved searches on this data. Extractions aside, the whole point is to search on those extracted fields, which is my issue with Splunk at the moment, and I've been able to search a field once it's REXed.

I'm pretty sure you should be able to search an extracted field, so this has got to be a bug.


hmasten
Explorer

So I just tested using only REX: search works, and after fixing some typos I have the results in an email alert. As I expected, REX will do to get these alerts, and this is my workaround.


gvmorley
Contributor

Hi,

I'm pleased to hear that you've got a workaround.

But you should still be able to get the EXTRACTs working; there's nothing wrong with your approach.

As such, it might be worth raising a Support Case with Splunk to try and get to the bottom of what's going on.

Just in case there is a bug, or something else is happening within your setup: you don't want to come across it down the line!


somesoni2
Revered Legend

You can set up delimiter-based extraction by updating props.conf and transforms.conf. See this post:
https://www.splunk.com/blog/2008/02/12/delimiter-based-key-value-pair-extraction/

props.conf (on search heads)

[yoursourcetype]
REPORT-colonfields = colon_delimited_fields

transforms.conf (on search heads)

[colon_delimited_fields]
DELIMS = " ", ":"

DalJeanis
Legend

Seems like there must be some special characters in there that are not appearing in your question. Perhaps tab characters?

Event Type: ConsoleEvent: DeviceDataModfiedUser: domain\JoeUser Source: ServerEvent Module: DashboardEvent Category: DeviceEvent Data: Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx

parses to my eyes as

Event Type: (null)
ConsoleEvent: (null)
DeviceDataModfiedUser: domain\JoeUser 
Source: ServerEvent 
Module: DashboardEvent 
Category: DeviceEvent 
Data: Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx

It seems likely that there is a special character or whitespace before User: and between Console and Event:. Putting single spaces in those spots would result in this parse...

 Event Type: Console
 Event:  DeviceDataModfied
 User: domain\JoeUser 
 Source: ServerEvent 
 Module: DashboardEvent 
 Category: DeviceEvent 
 Data: Device=User iPhone iOS 10.2.1 X0XX;DeviceData=OwnerGroup;LoginSessionID=xx1xxx0xxxxx

Also, I'm not sure whether the system would index "Event Type" or just "Type".
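The parse above can be reproduced mechanically, since AirWatch always emits the same key names: insert a break wherever a key name is glued to the preceding value, then split each line on ': '. This minimal Python sketch takes its key list from the AirWatch field order quoted earlier in the thread, assumes no value ever contains one of those key names followed by ': ', and uses a hypothetical `split_fields` helper. Rewriting the stored events this way would be an index-time change, which is the trade-off raised in the next reply.

```python
import re

# Key names in the order AirWatch writes them (from the thread). The longer
# "Event ..." names come before plain "Event" so the alternation prefers them.
KEY_AHEAD = r"(?=(?:Event Source|Event Module|Event Category|Event Data|Event|User):\s)"

# Zero-width match just after a non-space character and just before a key name,
# i.e. exactly where a value runs into the next key.
BOUNDARY = re.compile(r"(?<=\S)" + KEY_AHEAD)

SAMPLE_2 = (
    "Mar 9 13:52:32 10.0.2.24 March 09 19:52:32 AirWatch AirWatch Syslog Details "
    "are as follows Event Type: DeviceEvent: RemoveProfileRequestedUser: sysadminEvent "
    "Source: ServerEvent Module: DashboardEvent Category: CommandEvent Data: "
    "Profile=iOS Visual Privacy Webclip"
)


def split_fields(raw):
    """Put each 'Key: Value' pair on its own line, then parse the lines."""
    start = raw.find("Event Type: ")  # skip the syslog header
    body = raw[start:] if start >= 0 else raw
    fields = {}
    for line in BOUNDARY.sub("\n", body).splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    for key, value in split_fields(SAMPLE_2).items():
        print(f"{key}: {value}")
```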


hmasten
Explorer

So I would format the logs at ingestion? And then I would be indexing fields, correct?

I was hoping to avoid index-time operations if possible, but if the log is garbage I can see how I'm limited to begin with. I agree that if there were spaces we could use a delimiter to parse it, no problem.
