Splunk Search

Transform, but only when matching this RE

serialmonkey
Path Finder

I get lots of data from various systems via syslog. One of my systems sends me data that looks like this:

HEADERTEXT: name=value;name=value;name=value.......

I have a generic transform written to extract the name/value pairs. The problem is that I have other data that looks like this:

SOMEOTHERHEADER: http://www.blah.com/servlet?name=value;name=value

What I am finding is that the name/value extraction from my first transform is being applied to data from the second as well. What I would like to do is somehow say in props.conf:

"Only apply this stanza if this RE is matched". I would then put the RE as "HEADERTEXT".

Does anyone have any pointers on whether something like this is possible? I can't put HEADERTEXT in the RE in transforms.conf, as the RE is applied repeatedly to extract multiple key/value pairs.

Here are some samples, plus my matching REs from transforms.conf. As you can see, the User-Agent in the first example (DATA1) actually causes the data to match both REGEX1 and REGEX2, so the data gets tagged with both sourcetypes.

DATA1

Aug 2 21:54:32 10.1.2.3 tmm[1853]: Rule syslog_http : HTTP,10.1.2.4:5804,vs_https_oursite,4.4.4.3:49788,oururl.com,/somepath,10.1.2.5:7001,302,2,http://somewhere.gov/,GET,'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; aff-kingsoft-ciba; staticlogin:product=cboxf09&act=login&info=ZmlsZW5hbWU9UG93ZXJ3b; SE 2.X)',''

REGEX1

tmm[\d+]: Rule syslog_http <(?:HTTP_(?:RESPONSE|REQUEST)|LB_FAILED)>: (?:HTTP|HTTP-ERROR|LB-ERROR),([\d|.]+):([\d]+),([\w]+),([\d|.]+):([\d]+),([\w\d:.-]+),([^,?]+)(\?[^,]),(?:([\d|.]+):([\d]+))?,([\d]+),([\d]),([^,]),([^,]),'([^,])','([^,]*)'

DATA2

Aug 2 01:30:01 10.120.17.247 user:01:30:02.019 INFO SummaryData - SUMMARY:name1=value1;name2=value2;name3=value3;

REGEX2

([_a-z]+)=([^;]+);

1 Solution

serialmonkey
Path Finder

Thanks for the answer.

I can't split the data based on host, unfortunately.

What I have currently is two entries in my props.conf, both with a [syslog] stanza representing the syslog input type (as you guessed). Both of these link to separate TRANSFORMS.

The problem I have is that in some cases I have data which matches both transforms. What I see in this case when I search for this data is that the sourcetype attribute actually appears three times on that search result (once for sourcetype=syslog, and additionally for the other two transforms).

One of my regexes is a generic name=value style regex; the other is more specific. I guess what I can do is add a component to the specific regex that stops it from matching the name=value style data. It's messy, but it should work.
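
A rough sketch of what a tightened regex could look like, assuming the transform in question is the one that assigns the sourcetype to the name=value data. The stanza name is hypothetical, and the `0-9` in the key character class is an addition so that keys like `name1` in the DATA2 sample qualify.

```ini
# transforms.conf (stanza name is made up for illustration)
[summary_classifier]
# Before: fires on any "key=value;" anywhere in the event, which includes
# the "product=...;" fragment inside DATA1's User-Agent string.
# REGEX = ([_a-z]+)=([^;]+);
# After: the first pair must directly follow the SUMMARY: header from DATA2.
REGEX = SUMMARY:[_a-z0-9]+=[^;]+;
```

Because a classifying regex only needs to match once per event, it can include the header even though the repeating extraction regex cannot.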

0 Karma

serialmonkey
Path Finder

In the end, I just had to work more complexity into my REs. Unfortunately, the generic name=value style REs can match more than you intend, especially if you push lots of data in via the same input (e.g. syslog).

0 Karma

serialmonkey
Path Finder

samples added above

0 Karma

Genti
Splunk Employee

Please show some sample data and the two regexes from props/transforms.

0 Karma

hexx
Splunk Employee

Your problem seems to be that both of your field extractions are applied using the same spec, which I imagine is the syslog sourcetype.

Take a look at your props.conf to find out what spec is being used for that extraction.

http://www.splunk.com/base/Documentation/latest/Knowledge/Createandmaintainsearch-timefieldextractio...

[<spec>]
EXTRACT-<class> = <regex>

* <spec> can be:
      o <sourcetype>, the source type of an event.
      o host::<host>, where <host> is the host for an event.
      o source::<source>, where <source> is the source for an event. 

If it is indeed based on the <sourcetype> spec, then you have two options:

  • If each set of events (HEADERTEXT, SOMEOTHERHEADER) comes from a different host, you could simply define one field extraction for each, with host::<host> as the discriminating spec.

  • If you are unable to differentiate by host, you may have to stick with <sourcetype> as the discriminating spec, in which case I suggest setting up a regex-based sourcetype override to assign a custom sourcetype to each type of event.

http://www.splunk.com/base/Documentation/latest/Admin/Advancedsourcetypeoverrides#Configuration

Then, you can re-write your extractions to be specific to the sourcetypes you just defined. Make sure that the sourcetype assignment happens before the field extraction in transforms.conf.
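
A minimal sketch of that second option, assuming the two event types can be told apart by the literal text `tmm[...]: Rule syslog_http` and `SummaryData - SUMMARY:` seen in the samples posted in this thread. All stanza names and the sourcetype names `tmm_http` and `summary_kv` are made up for illustration.

```ini
# props.conf
# Index time: both override transforms hang off the incoming syslog sourcetype.
[syslog]
TRANSFORMS-set_sourcetype = set_st_tmm_http, set_st_summary_kv

# Search time: the generic pair extraction is scoped to its own sourcetype.
[summary_kv]
REPORT-summary_fields = summary_kv_pairs

# transforms.conf
[set_st_tmm_http]
REGEX = tmm\[\d+\]: Rule syslog_http
FORMAT = sourcetype::tmm_http
DEST_KEY = MetaData:Sourcetype

[set_st_summary_kv]
REGEX = SummaryData - SUMMARY:
FORMAT = sourcetype::summary_kv
DEST_KEY = MetaData:Sourcetype

[summary_kv_pairs]
# The generic pair regex from the question, now run only on summary_kv events.
REGEX = ([_a-z]+)=([^;]+);
FORMAT = $1::$2
```

Since the sourcetype is rewritten at index time, only events indexed after the change pick up the new sourcetypes, and the override has to be configured on the instance that parses the data.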
