Transform, but only when matching this RE

Path Finder

I get lots of data from various systems via syslog. One of my systems sends me data that looks like this

HEADERTEXT: name=value;name=value;name=value.......

I have a generic transform written to extract the name, value pairs. The problem is, I have other data that looks like this

SOMEOTHERHEADER: http://www.blah.com/servlet?name=value;name=value

What I am finding is that the name/value extraction from my first transform is being applied to data from the second source as well. What I would like to do is somehow say in props.conf:

"Only apply this stanza if this RE is matched". I would then set the RE to "HEADERTEXT".

Does anyone have any pointers on whether something like this is possible? I can't put HEADERTEXT in the RE in transforms.conf, as it's a recursive RE for extracting multiple key/value pairs.

Here are some samples, plus my matching REs from transforms.conf. As you can see, the User-Agent in the first example (DATA1) causes the data to match both REGEX1 and REGEX2, so the event is tagged with both sourcetypes.

DATA1

Aug 2 21:54:32 10.1.2.3 tmm[1853]: Rule syslog_http : HTTP,10.1.2.4:5804,vs_https_oursite,4.4.4.3:49788,oururl.com,/somepath,10.1.2.5:7001,302,2,http://somewhere.gov/,GET,'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; aff-kingsoft-ciba; staticlogin:product=cboxf09&act=login&info=ZmlsZW5hbWU9UG93ZXJ3b; SE 2.X)',''

REGEX1

tmm[\d+]: Rule syslog_http <(?:HTTP_(?:RESPONSE|REQUEST)|LB_FAILED)>: (?:HTTP|HTTP-ERROR|LB-ERROR),([\d|.]+):([\d]+),([\w]+),([\d|.]+):([\d]+),([\w\d:.-]+),([^,?]+)(\?[^,]),(?:([\d|.]+):([\d]+))?,([\d]+),([\d]),([^,]),([^,]),'([^,])','([^,]*)'

DATA2

Aug 2 01:30:01 10.120.17.247 user:01:30:02.019 INFO SummaryData - SUMMARY:name1=value1;name2=value2;name3=value3;

REGEX2

([_a-z]+)=([^;]+);

1 Solution

Path Finder

Thanks for the answer.

Unfortunately, I can't split the data based on host.

What I have currently is two entries in my props.conf, both under a [syslog] stanza representing the syslog input type (as you guessed). Each of these links to a separate TRANSFORMS entry.

The problem I have is that in some cases I have data which matches both transforms. What I see in this case when I search for this data is that the sourcetype attribute actually appears three times on that search result (once for sourcetype=syslog, and additionally for the other two transforms).

One of my regexes is a name=value style regex. The other is more concrete. I guess what I can do is add a component to my concrete regex that stops it from matching the name=value style data. It's messy, but it should work.

Path Finder

In the end, I just had to work more complexity into my REs. Unfortunately, the funky name=value style REs can match more than you intend, especially if you push lots of data in via the same input (i.e. syslog).

Path Finder

samples added above

Splunk Employee

show some sample data and the two regexes in props/transforms.

Splunk Employee

Your problem seems to be that both of your field extractions are applied using the same spec, which I imagine is the syslog sourcetype.

Take a look at your props.conf to find out what spec is being used for that extraction.

http://www.splunk.com/base/Documentation/latest/Knowledge/Createandmaintainsearch-timefieldextractio...

[<spec>]
EXTRACT-<class> = <regex>

* <spec> can be:
      o <sourcetype>, the source type of an event.
      o host::<host>, where <host> is the host for an event.
      o source::<source>, where <source> is the source for an event. 
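
As a rough illustration of the three spec forms listed above, a props.conf might contain stanzas like the following. The host value, source path, extraction class name, and field name are placeholders chosen for this sketch, not taken from a real configuration.

```ini
# props.conf (sketch; every stanza value below is hypothetical)

# <sourcetype> spec: applies to every event with sourcetype=syslog
[syslog]
EXTRACT-summary_body = SUMMARY:(?<summary_body>.+)

# host::<host> spec: applies only to events whose host field has this value
[host::10.120.17.247]
EXTRACT-summary_body = SUMMARY:(?<summary_body>.+)

# source::<source> spec: applies only to events read from this source
[source::/var/log/app/summary.log]
EXTRACT-summary_body = SUMMARY:(?<summary_body>.+)
```

Only the stanza header changes between the three forms; the extraction itself is the same. In practice you would pick whichever single spec isolates the events you care about.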

If it is indeed based on the [sourcetype] spec, then you have two options:

  • If each set of events (HEADERTEXT, SOMEOTHERHEADER) comes from a different host, then you could simply define one field extraction for each, with [host::<host>] as the discriminating spec.

  • If you are unable to differentiate by host, then you may have to stick with [sourcetype] as the discriminating spec, in which case I suggest that you set up a regex-based sourcetype override in order to assign a custom sourcetype to each type of event.

http://www.splunk.com/base/Documentation/latest/Admin/Advancedsourcetypeoverrides#Configuration

Then, you can re-write your extractions to be specific to the sourcetypes you just defined. Make sure that the sourcetype assignment happens before the field extraction in transforms.conf.
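
A minimal sketch of that two-step setup, using the DATA2 sample from the question, might look like the following. The sourcetype name summary_kv and the stanza names set_sourcetype_summary and summary_kv_pairs are made up for illustration, and the header regex is a guess based on the single DATA2 sample, so adjust it to whatever reliably identifies your events.

```ini
# props.conf (sketch; stanza and class names are hypothetical)

# Index time: events arriving as syslog are checked for the SUMMARY header
[syslog]
TRANSFORMS-set_summary_sourcetype = set_sourcetype_summary

# Search time: the name=value extraction runs only on the rewritten sourcetype
[summary_kv]
REPORT-summary_kv = summary_kv_pairs
```

```ini
# transforms.conf (sketch; stanza names are hypothetical)

# Rewrite the sourcetype when the header is present. This regex only has to
# match once per event, so it can safely include the header text.
[set_sourcetype_summary]
REGEX = SummaryData - SUMMARY:
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::summary_kv

# The poster's REGEX2, unchanged. With FORMAT = $1::$2 at search time, the
# first group is used as the field name and the second as its value, and the
# match is repeated across the event.
[summary_kv_pairs]
REGEX = ([_a-z]+)=([^;]+);
FORMAT = $1::$2
```

Because the key/value transform is attached to [summary_kv] rather than [syslog], it should no longer touch events like DATA1 whose User-Agent happens to contain name=value; text. The sourcetype rewrite is an index-time setting, so it would apply only to data indexed after the change.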