Single field not always extracted, but appears when piping into "extract"

spock_yh
Path Finder

I have set up a search-time field extraction that pulls several fields out of a URL in a log file.

My problem is that one of these fields is extracted for some events but not for others, for no apparent reason. Here are two such examples. The first manages to extract the field, the second doesn't:

1.1.9.1 - [20/Mar/2011:17:39:37 -0700] 15625 "some.web.site" GET "/myaccount/videos/B004CZXC54.flv" "" 307 - "medusa" "-" "Python-urllib/2.6" "2.2.2.2"

1.1.9.1 - [20/Mar/2011:18:10:45 -0700] 0 "some.web.site" GET "/myaccount/videos/B003QMJAXM.flv" "" 307 - "medusa" "-" "Python-urllib/2.6" "2.2.2.2"

The field I'm trying to extract is the one corresponding to the "myaccount" part. As you can see, the two events are extremely similar, yet only one of them shows the field.

The odd thing about this is that:

* If I pipe my search into | extract reload=T, I can see the missing field for all results.
* There are a number of fields after this missing field (for the "videos" part, "B003QM.." part, "flv" part, etc.) that are extracted fine.

The original regular expression was quite complex but I stripped it down to something simple that still shows the problem:

 /(?<medusa_account_alias>[^/]+)/(?<medusa_restype>videos|images)

The problem field is the medusa_account_alias field. The fields following it seem to be extracted ok.
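As a sanity check outside Splunk, the stripped-down pattern can be run against the two sample events. This is only a sketch: Python's re module is not the regex engine Splunk uses, the named groups have to be written as (?P<name>...) instead of (?<name>...), and extract_fields is a hypothetical helper name. If both events match here, the pattern itself is unlikely to be the reason the field goes missing.

```python
import re

# Same pattern as in the post, with Python's (?P<name>...) spelling
# for named groups instead of the PCRE-style (?<name>...).
PATTERN = re.compile(
    r"/(?P<medusa_account_alias>[^/]+)/(?P<medusa_restype>videos|images)"
)

# The two sample events from the post, copied verbatim.
EVENTS = [
    '1.1.9.1 - [20/Mar/2011:17:39:37 -0700] 15625 "some.web.site" GET '
    '"/myaccount/videos/B004CZXC54.flv" "" 307 - "medusa" "-" '
    '"Python-urllib/2.6" "2.2.2.2"',
    '1.1.9.1 - [20/Mar/2011:18:10:45 -0700] 0 "some.web.site" GET '
    '"/myaccount/videos/B003QMJAXM.flv" "" 307 - "medusa" "-" '
    '"Python-urllib/2.6" "2.2.2.2"',
]


def extract_fields(raw):
    """Return the named groups of the first match, or {} if nothing matches."""
    match = PATTERN.search(raw)
    return match.groupdict() if match else {}


for raw in EVENTS:
    # Prints {'medusa_account_alias': 'myaccount', 'medusa_restype': 'videos'}
    # for each of the two events.
    print(extract_fields(raw))
```

Both events yield the same two fields, which is consistent with the observation that | extract reload=T finds the field for all results.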

Any ideas would be greatly appreciated. Is this some kind of bug in Splunk, or am I missing something?


Ledion_Bitincka
Splunk Employee

Can you please provide the props.conf/transforms.conf stanzas that are responsible for performing the extractions and field aliasing?

spock_yh
Path Finder

The problem is caused by a field alias I have defined.

What I want is to have medusa_account_alias filled either from the above regex, or from another field ("accountId") extracted for another format of the log row. I used an alias from accountId to medusa_account_alias, and this caused the problem.

How else can I achieve this, that is, have a single field that gets filled in either of two disjoint cases?

Also, this doesn't explain why Splunk's behavior was so arbitrary: why would it generate medusa_account_alias for one event and not for the other?
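The "one field, two disjoint sources" requirement amounts to a first-non-null merge: keep the regex-extracted medusa_account_alias when it exists, otherwise fall back to accountId. The sketch below only illustrates that logic in Python. It does not model how Splunk applies field aliases, and resolve_account_alias is a hypothetical name. Splunk's eval command offers a coalesce function with the same first-non-null semantics, which may be a starting point for replacing the alias.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def resolve_account_alias(fields):
    """Fill medusa_account_alias from the regex field or, failing that, accountId.

    `fields` is a dict of already-extracted field values for one event.
    A new dict is returned; the input is left unchanged.
    """
    merged = dict(fields)
    value = coalesce(fields.get("medusa_account_alias"), fields.get("accountId"))
    if value is not None:
        merged["medusa_account_alias"] = value
    return merged


# Case 1: URL-style event, the regex already produced the field.
print(resolve_account_alias({"medusa_account_alias": "myaccount"}))
# Case 2: the other log format, where only accountId was extracted.
print(resolve_account_alias({"accountId": "acct42"}))
```

Unlike an unconditional overwrite, this merge never discards a value that the regex has already extracted, and it leaves the field absent when neither source is present.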
