Help with regex for index time extraction

dhavamanis
Builder

Can you please help us with a REGEX that matches "varnishncsa" in the log below at index time, so that we can assign _MetaData:Index? Also, please provide some more info on how I can figure out a regex myself if any other extraction is needed.

varnishncsa bal-8079 1.1.1.1 - - [20/Aug/2014:20:42:48 +0000] "HEAD http://test.com/test HTTP/1.1" 200 0 "-" "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15" 5.049910307 miss miss request_id="v-8be63a04-28aa-11e4-9a2d-22000a1e84a4" "-"

1 Solution

kristian_kolb
Ultra Champion

You should probably have a look at this section of the docs;

http://docs.splunk.com/Documentation/Splunk/6.1.3/Data/Advancedsourcetypeoverrides

It deals with sourcetypes and hosts, but you could just as easily use the method to rewrite the index.

If your events normally go to the index 'blah' and you just want to re-route the 'varnishncsa' events, you'd do it like this;

inputs.conf

[monitor:///path/to/file]
index=blah
sourcetype=bob

props.conf

[bob]
TRANSFORMS-chidx = reroute_varnish

transforms.conf

[reroute_varnish]
REGEX = ^varnishncsa
FORMAT = varnishncsa
DEST_KEY = _MetaData:Index
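
On the "how can I figure out a regex myself" part of the question: one option is to try the pattern against a sample event before putting it into transforms.conf. The following is only a minimal sketch in Python. Splunk uses a PCRE engine and Python's re module is a different one, so this is a sanity check rather than proof, although a simple anchored pattern like this one should behave the same in both. The function name would_reroute is made up for illustration, and the sample event is the one from the question, shortened.

```python
import re

# Pattern copied from the [reroute_varnish] stanza above.
REROUTE_REGEX = re.compile(r"^varnishncsa")

# Sample event from the question, shortened for readability.
SAMPLE_EVENT = (
    'varnishncsa bal-8079 1.1.1.1 - - [20/Aug/2014:20:42:48 +0000] '
    '"HEAD http://test.com/test HTTP/1.1" 200 0'
)


def would_reroute(event: str) -> bool:
    """Return True if the pattern matches the raw event text.

    Hypothetical helper: search() is used because the pattern carries
    its own ^ anchor, so nothing else needs to pin it to the start.
    """
    return REROUTE_REGEX.search(event) is not None


if __name__ == "__main__":
    # An event that starts with the literal string matches ...
    print(would_reroute(SAMPLE_EVENT))
    # ... while an event that merely contains it later on does not.
    print(would_reroute("web01 varnishncsa bal-8079 1.1.1.1"))
```

If the second case should also be re-routed, the ^ anchor would have to go, at the cost of matching the string anywhere in the event.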

Or you could do it dynamically, i.e. re-route all events to an index that matches the first word/string in each event. Just make sure that the indexes actually exist first - they will not be dynamically created;

props.conf

[bob]
TRANSFORMS-chidx = dyn_idx

transforms.conf

[dyn_idx]
REGEX = ^(\S+)\s*
FORMAT = $1
DEST_KEY = _MetaData:Index

Haven't tried the last alternative, since that can be slightly unpredictable.
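
To see where the unpredictability comes from, it can help to look at what the capture group would actually contain for a few events. The sketch below is an illustration only: captured_index and looks_like_index_name are hypothetical helpers, the second sample event (with a syslog priority prefix) is invented, and the naming rule (lowercase letters, digits, underscores and hyphens, not starting with an underscore or hyphen) is my assumption about Splunk's index naming restrictions and should be checked against the docs for your version.

```python
import re
from typing import Optional

# Pattern copied from the [dyn_idx] stanza above.
DYN_IDX_REGEX = re.compile(r"^(\S+)\s*")

# Assumed approximation of the rules for user-defined index names;
# verify against the Splunk documentation before relying on it.
INDEX_NAME_RULE = re.compile(r"[a-z0-9][a-z0-9_-]*")


def captured_index(event: str) -> Optional[str]:
    """Return the text that FORMAT = $1 would refer to, or None if no match."""
    match = DYN_IDX_REGEX.search(event)
    return match.group(1) if match else None


def looks_like_index_name(name: Optional[str]) -> bool:
    """Check a captured token against the assumed naming rule."""
    return name is not None and INDEX_NAME_RULE.fullmatch(name) is not None


if __name__ == "__main__":
    events = [
        # Start of the event from the question.
        "varnishncsa bal-8079 1.1.1.1 - -",
        # Invented example: a syslog priority prefix in front of the event.
        "<134>Aug 20 20:42:48 web01 varnishncsa bal-8079",
    ]
    for event in events:
        token = captured_index(event)
        print(repr(token), looks_like_index_name(token))
```

Any first token that is not the name of an existing index is a problem for the dynamic approach, which is one reason to prefer the static transform when only a handful of indexes are involved.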


UPDATE:

So if the data is coming in via syslog, I guess you should set the sourcetype in inputs.conf;

[udp://514]
sourcetype=access_combined_wcookie
index=blah
# connection_host can be set to ip or dns; see the inputs.conf docs
connection_host = ip
no_appending_timestamp = true

This assumes that you don't have other types of data coming in via that port. If you do, configure Splunk to listen on a dedicated port just for this traffic (provided you can configure your web servers to send to e.g. udp:10514).

In props.conf you call for transformation;

[access_combined_wcookie]
TRANSFORMS-set_index = awc_change_index

and in transforms.conf you define a stanza with that name, [awc_change_index], containing the REGEX, FORMAT and DEST_KEY as discussed above.

See http://docs.splunk.com/Documentation/Splunk/6.1.3/Admin/Inputsconf

/K

kristian_kolb
Ultra Champion

see update above. /k

dhavamanis
Builder

Thank you so much. Can you please tell us how I can assign another sourcetype in the same config? The access logs arrive in syslog format, and we need to use the appropriate sourcetype (access_combined_wcookie) so that the field values are extracted automatically.
