Splunk Search

Help with regex for index time extraction

dhavamanis
Builder

Can you please help us with the REGEX to extract "varnishncsa" from the log below at index time, so that we can assign _MetaData:Index? Also, please provide some more info on how I can work out a regex myself if any other extraction is needed.

varnishncsa bal-8079 1.1.1.1 - - [20/Aug/2014:20:42:48 +0000] "HEAD http://test.com/test HTTP/1.1" 200 0 "-" "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15" 5.049910307 miss miss request_id="v-8be63a04-28aa-11e4-9a2d-22000a1e84a4" "-"

1 Solution

kristian_kolb
Ultra Champion

You should probably have a look at this section of the docs:

http://docs.splunk.com/Documentation/Splunk/6.1.3/Data/Advancedsourcetypeoverrides

It deals with sourcetypes and hosts, but you could just as easily use the method to rewrite the index.

If your events normally go to the index 'blah' and you just want to re-route the 'varnishncsa' events, you'd do it like this:

inputs.conf

[monitor:///path/to/file]
index=blah
sourcetype=bob

props.conf

[bob]
TRANSFORMS-chidx = reroute_varnish

transforms.conf

[reroute_varnish]
REGEX = ^varnishncsa
FORMAT = varnishncsa
DEST_KEY = _MetaData:Index
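
As for working out a regex yourself: one approach is to run the pattern against a sample event outside Splunk before deploying it. The sketch below uses Python's re module as a stand-in. Splunk uses PCRE, but for a simple anchored literal like this one the two should behave the same. The function name route_index and its default_index argument are made up for illustration; they only mimic what the transform above is meant to do.

```python
import re

# Sample event from the question, copied verbatim.
EVENT = (
    'varnishncsa bal-8079 1.1.1.1 - - [20/Aug/2014:20:42:48 +0000] '
    '"HEAD http://test.com/test HTTP/1.1" 200 0 "-" '
    '"User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) '
    'Gecko/20080623 Firefox/2.0.0.15" 5.049910307 miss miss '
    'request_id="v-8be63a04-28aa-11e4-9a2d-22000a1e84a4" "-"'
)

# Same pattern as REGEX in [reroute_varnish].
REROUTE = re.compile(r"^varnishncsa")


def route_index(event, default_index="blah"):
    """Hypothetical helper: matching events go to 'varnishncsa' (the FORMAT
    value), everything else keeps the index set in inputs.conf."""
    if REROUTE.search(event):
        return "varnishncsa"
    return default_index


print(route_index(EVENT))                         # varnishncsa
print(route_index("sshd[123]: session opened"))   # blah
```

If the pattern does not match your real events here, it will not match in transforms.conf either, so this is a cheap first check.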

Or you could do it dynamically, i.e. route each event to the index named by its first word/string. Just make sure that those indexes actually exist first; they will not be created dynamically:

props.conf

[bob]
TRANSFORMS-chidx = dyn_idx

transforms.conf

[dyn_idx]
REGEX = ^(\S+)\s*
FORMAT = $1
DEST_KEY = _MetaData:Index

I haven't tried the last alternative, since it can be slightly unpredictable.
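
To see where the unpredictability comes from, it helps to look at what ^(\S+)\s* actually captures. The sketch below is Python rather than Splunk, and EXISTING_INDEXES is a made-up set standing in for whatever indexes are defined on your indexers. The second sample line, with a syslog-style prefix, is also hypothetical. Any first token that is not in the set would name an index that does not exist.

```python
import re

# Same pattern as REGEX in [dyn_idx]; group 1 is what FORMAT = $1 would use.
DYN_IDX = re.compile(r"^(\S+)\s*")

# Hypothetical: the indexes that have actually been created.
EXISTING_INDEXES = {"blah", "varnishncsa"}


def first_token(event):
    """Return the first whitespace-delimited token, or None if there is none."""
    match = DYN_IDX.search(event)
    return match.group(1) if match else None


def target_exists(event):
    """True only if the captured token names one of the known indexes."""
    token = first_token(event)
    return token is not None and token in EXISTING_INDEXES


samples = [
    "varnishncsa bal-8079 1.1.1.1 - - [20/Aug/2014:20:42:48 +0000]",
    "Aug 20 20:42:48 web01 varnishncsa bal-8079 1.1.1.1",  # hypothetical prefix
    "",
]
for sample in samples:
    print(repr(first_token(sample)), target_exists(sample))
```

Running a batch of real events through a check like this shows quickly whether the first token is reliable enough to be used as an index name.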


UPDATE:

So if the data is coming in via syslog, I guess you should set the sourcetype and index in inputs.conf:

[udp://514]
sourcetype = access_combined_wcookie
index = blah
# connection_host can be ip or dns; see the docs.
connection_host = ip
no_appending_timestamp = true

This is assuming that you don't have other types of data coming in via that port. If you do, configure Splunk to listen on a dedicated port just for this traffic (if you can configure your web servers to send to e.g. udp:10514).

In props.conf you call the transform:

[access_combined_wcookie]
TRANSFORMS-set_index = awc_change_index

and in transforms.conf you define a matching [awc_change_index] stanza with the REGEX, FORMAT and DEST_KEY as discussed above.
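
The name after TRANSFORMS-set_index has to match a stanza name in transforms.conf, which is easy to get wrong when copying snippets around. Below is a rough Python sketch that checks this using the standard configparser module. The TRANSFORMS text is an assumption: it reuses the REGEX, FORMAT and DEST_KEY values from reroute_varnish under the new stanza name, which is what "as discussed above" appears to mean. The helper missing_transforms is made up for illustration, and configparser only approximates how Splunk reads .conf files.

```python
import configparser

PROPS = """
[access_combined_wcookie]
TRANSFORMS-set_index = awc_change_index
"""

# Assumed stanza: values copied from [reroute_varnish], renamed to match props.conf.
TRANSFORMS = """
[awc_change_index]
REGEX = ^varnishncsa
FORMAT = varnishncsa
DEST_KEY = _MetaData:Index
"""


def parse(text):
    """Parse .conf-style text; keep key case and disable % interpolation."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read_string(text)
    return parser


def missing_transforms(props_text, transforms_text):
    """List (props stanza, transform name) pairs with no matching stanza."""
    props = parse(props_text)
    transforms = parse(transforms_text)
    missing = []
    for stanza in props.sections():
        for key, value in props[stanza].items():
            if not key.startswith("TRANSFORMS-"):
                continue
            # A TRANSFORMS- setting may list several names separated by commas.
            for name in value.split(","):
                name = name.strip()
                if name and not transforms.has_section(name):
                    missing.append((stanza, name))
    return missing


print(missing_transforms(PROPS, TRANSFORMS))  # []
```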

See http://docs.splunk.com/Documentation/Splunk/6.1.3/Admin/Inputsconf

/K

kristian_kolb
Ultra Champion

see update above. /k


dhavamanis
Builder

Thank you so much. Can you please tell us how I can assign another sourcetype in the same config? The access logs are coming in syslog format, and we need to use the appropriate sourcetype (access_combined_wcookie) to get the field values extracted automatically.
