
Index time field extraction: regexp issue

Super_Knulps
Explorer

Hello,

Since I often search for a specific expression across a large set of events, I would like to index it.

Every instance that I am running has a name in the following format:
instance-name.generic-name.subdomain.domain.com

In this expression, only domain.com is static and will never change.
I would like to extract generic-name for all of my events.

props.conf

[generic-name]
TRANSFORMS-generic-name = generic-name

transforms.conf

[generic-name]

REGEX = (?<instancename>[^\.]+)\.(?<gname>[^\.]+)\.(?<subdomain>[^\.]+)\.(?<domain>[^\.]+)\.

fields.conf

[gname]
INDEXED = True
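One way to check the regex on its own, outside Splunk, is a quick Python test against the sample name from the question. This is a minimal sketch: Python's re module writes named groups as (?P<name>...), whereas the (?<name>...) form in the REGEX above is PCRE syntax, so the groups are rewritten accordingly. If it matches, the problem is more likely in the .conf files than in the regex itself.

```python
import re

# The REGEX from transforms.conf, with the PCRE-style (?<name>...) groups
# rewritten as Python's (?P<name>...) groups.
PATTERN = re.compile(
    r"(?P<instancename>[^\.]+)\.(?P<gname>[^\.]+)\."
    r"(?P<subdomain>[^\.]+)\.(?P<domain>[^\.]+)\."
)

# Sample value in the format described in the question.
sample = "instance-name.generic-name.subdomain.domain.com"

match = PATTERN.search(sample)
if match:
    # A match here suggests the regex is not the reason nothing gets indexed.
    print(match.group("gname"))
else:
    print("no match")
```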

I am not getting anything in my Splunk dashboard. Is that caused by my configuration files or by my regular expression?
Thank you in advance for your help.

Update: I have tried all of the regexes below, and there are still no results. I don't receive any data in my sourcetype.

rsennett_splunk
Splunk Employee

I've decided to add a totally separate answer here because, if I'm right, your regex is fine (it was just the markup bug we're dealing with now that confused everyone), but your transforms.conf syntax is off:
Create an indexed field:

[extracted-gname]
REGEX =  whatevercomesbeforeit [^\.]+\.(?<gname>[^\.]+)\.[^\.]+\..+
FORMAT = gname::$1

[extracting-from-host]
SOURCE_KEY = MetaData:Host
REGEX =   [^\.]+\.(?<gname>[^\.]+)\.[^\.]+\..+
FORMAT = gname::$1

[indexed-gname]
REGEX =  whatevercomesbeforeit [^\.]+\.(?<gname>[^\.]+)\.[^\.]+\..+
FORMAT = gname::$1
WRITE_META = true
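A rough Python model of the regex-and-FORMAT step in the [extracting-from-host] stanza: run REGEX against the host value and build the gname::<value> pair that FORMAT = gname::$1 describes. Because gname is the only capture group, $1 refers to it (named groups are also numbered). This simulates only the matching step, not Splunk's indexing pipeline; writing the pair to the index is what WRITE_META = true asks Splunk to do. Whether the MetaData:Host value carries a host:: prefix is treated here as an assumption, so both forms are tried; the leading [^\.]+ absorbs the prefix either way.

```python
import re
from typing import Optional

# REGEX from the [extracting-from-host] stanza; gname is the only group, so it is $1.
HOST_REGEX = re.compile(r"[^\.]+\.(?P<gname>[^\.]+)\.[^\.]+\..+")


def apply_format(source_value: str) -> Optional[str]:
    """Return the 'gname::<value>' string that FORMAT = gname::$1 describes, or None."""
    m = HOST_REGEX.search(source_value)
    if m is None:
        return None
    return "gname::" + m.group(1)


# Assumption: the host key may or may not carry a 'host::' prefix; try both.
for value in ("instance-name.generic-name.subdomain.domain.com",
              "host::instance-name.generic-name.subdomain.domain.com"):
    print(apply_format(value))
```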

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

Super_Knulps
Explorer

Thank you for your answer.
Your regex looks good and is easy to understand, but it may be slower because of the multiple extractions.
Anyway, I still receive no data when I try to use yours. Am I missing something else?

woodcock
Esteemed Legend

You need to swap the forward slashes for backslashes (stinking broken Markdown). It does work; I tested it. It is important to include the other portions of the pattern (you don't necessarily have to capture them into fields), because otherwise your single capture group will match things you do not intend.
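The effect described above can be reproduced with a small Python comparison. The event text and the web01/payments/eu labels are made up for illustration. A lone capture between two dots grabs the first dotted fragment it finds, here part of an IP address, while a pattern that spells out all four labels and pins the static domain.com as a literal lands on the intended label.

```python
import re

# Hypothetical raw event: an IP address appears before the instance name.
event = "src=10.1.2.3 host=web01.payments.eu.domain.com status=200"

# Only the capture, with nothing around it to constrain where it matches.
loose = re.compile(r"\.(?P<gname>[^\.]+)\.")

# All four labels spelled out, with the static domain.com as a literal anchor.
# Whitespace is excluded so a match cannot span two tokens of the event.
anchored = re.compile(
    r"(?P<instance>[^\.\s]+)\.(?P<gname>[^\.\s]+)\.(?P<subdomain>[^\.\s]+)\.domain\.com"
)

print(loose.search(event).group("gname"))     # a fragment of the IP address
print(anchored.search(event).group("gname"))  # the generic-name position
```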

Super_Knulps
Explorer

Okay, thank you. Both of your regexes (woodcock's and rsennett_splunk's) match what I want, which is perfect.
However, I still don't get anything in the dashboard. The sourcetype looks fine in the license usage. I have updated my first post with your regexes, so it is all up to date.

woodcock
Esteemed Legend

Post your dashboard XML.

Super_Knulps
Explorer

I am just using the search sourcetype=generic-name gname=foo in my Splunk app.

woodcock
Esteemed Legend

This is probably the problem:
http://blogs.splunk.com/2011/10/07/cannot-search-based-on-an-extracted-field/

Try this search instead:
sourcetype=generic-name gname=* | search gname="foo"

Super_Knulps
Explorer

My problem is different: in the case from the link you gave me, sourcetype=generic-name gname=* should still return results, which is not what happens for me. I literally get nothing.

woodcock
Esteemed Legend

Your statement is incorrect. Did you run the exact search that I gave you (even though you think it is silly)? Did it give results? The problem in the link causes searches to give 0 results. It is a very nuanced thing, trust me. Just run this and tell me what you get:
sourcetype=generic-name gname=* | search gname="foo"

rsennett_splunk
Splunk Employee

I'm pretty sure that's only a problem if he's running Splunk 4.2 and earlier...

woodcock
Esteemed Legend

This "problem" (it is not actually a problem, it is a deliberate design compromise) exists for all versions of Splunk.

rsennett_splunk
Splunk Employee

I think there is some confusion about exactly what your problem is.
Your question says... you often search for an expression like:
instance-name.generic-name.subdomain.domain.com
I think some folks here have assumed that this is found in the host field. I didn't get that from what you've said.

Also, you're giving us the text and we're giving you legitimate working regexes and still you're getting nothing. So it would be a good idea if you posted a couple of events that contain the values you're looking for so we can see what might be going wrong.

Also, see my edited answer, which addresses your transforms.conf syntax.

martin_mueller
SplunkTrust

OP's first comment under the question suggests the value is in the host field, in which case any transforms.conf REGEX applied to _raw is pointless.

That said, I'd still start with search-time extractions and only move to indexed fields if there's a performance hit left to address.

rsennett_splunk
Splunk Employee

Ah, I missed that, and I totally agree. I've amended my second answer in case it's just a matter of the missing SOURCE_KEY, although that would only return the value if the full structure is present, and without an anchor it would perform horribly.

rsennett_splunk
Splunk Employee

See my answer. You were missing an actual extraction: your capturing group surrounded only the field name, so nothing was being captured. You were also matching only a single "anything that is not a dot" character, because you were missing the +, which means "every character that is not a dot, until you hit the dot". Whether you capture all of the parts or put literals in for the domain and subdomain doesn't matter, as long as you are actually capturing something. As for "slower": as long as the regex keeps moving forward (and is not doing lookbacks), speed isn't an issue.
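The two mistakes described above can be illustrated in Python. The original regex from the question is no longer shown, so the two broken patterns below are hypothetical reconstructions of the mistakes as described, not the exact text that was posted.

```python
import re

sample = "instance-name.generic-name.subdomain.domain.com"

# Hypothetical broken form 1: the group wraps only the name, so it captures "".
empty_group = re.compile(r"[^\.]+\.(?P<gname>)[^\.]+\.")

# Hypothetical broken form 2: no "+", so [^\.] matches exactly one character.
single_char = re.compile(r"[^\.]+\.(?P<gname>[^\.])")

# Working form: the group wraps [^\.]+, every non-dot character up to the next dot.
working = re.compile(r"[^\.]+\.(?P<gname>[^\.]+)\.")

for name, pattern in (("empty_group", empty_group),
                      ("single_char", single_char),
                      ("working", working)):
    print(name, repr(pattern.search(sample).group("gname")))
```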

martin_mueller
SplunkTrust

Are you sure you need indexed extractions here?

What happens when you run this search:

index=foo sourcetype=generic-name gname=some-gname

Is the scanCount in the job inspector higher than the resultCount?

Super_Knulps
Explorer

Thank you for your answer. Yes, I am pretty sure that I need indexed extractions here, since I am running the equivalent of gname=foo in every single search I do. Anyway, I will compare the performance before and after my change.

When I run this:
index=foo sourcetype=generic-name gname=some-gname
I get "No Results Found", even with only sourcetype=generic-name or only gname=some-gname.

scanCount=0 resultCount=0.

I am wondering whether the host is part of the data. Is the host something I can extract from? Or maybe it is just my regex.
