Why can't values of fields from field extractions be searched as expected?

bitnapper · ‎12-13-2022

Hi there,

I created multiple field extractions, extracting values from different sourcetypes into the same field:

sourcetype0: "field0":"(?<geolocation_code>.{7})
sourcetype1: "field1":"(?<geolocation_code>.{7})
sourcetype2: "field2":"(?<geolocation_code>.{7})

They are populated as expected, all looking like ABCDEF0, CDEFGH3 or ZDEGFH9. But when I search for geolocation_code=ABCDEF0, I get zero hits, even though the fields pane on the left shows plenty of those values. Searching for geolocation_code!=ABCDEF0, on the other hand, works exactly as intended. Also, geolocation_code=ABCDEF0* gives the result I expected from geolocation_code=ABCDEF0, even though the field contains exactly the value I'm looking for. I don't understand what is happening here, or why it happens only with this extraction and not with others.

bitnapper · ‎12-15-2022

The data is indeed JSON. But I want another field containing just the first 7 characters of field0, to use for cross-referencing. When I extract it with "field0":"(?<geolocation_code>.{7}), I can only search it as geolocation_code=ABCDEF1*, not as geolocation_code=ABCDEF1.

{"info":"text","host":"SOURCE00033","@timestamp":"2022-12-07T09:33:01.000Z","user":"UserName@domain","device":"DVD/CD-ROM","ctime":"2022-12-07T09:33:01.000Z","id":737112781,"field0":"ABCDEF1ABCDEF01.domain.local"}

yuanliu · ‎12-15-2022

geolocation_code is a new field, is that correct? If you only want the first 7 characters, you can do so easily with substr (or any number of other methods).

| eval geolocation_code = substr(coalesce(field0, field1, field2), 1, 7)

bitnapper · ‎12-15-2022

And substr works in field extractions? But why does the search only work with an asterisk? I'd love to understand what I got wrong about how Splunk processes this regex, to avoid repeating the mistake in the future. As far as I understand it, this should not happen, but obviously it does, so there must be something I misunderstood.

yuanliu · ‎12-15-2022

substr only works in SPL. What I was saying is that there is no need to do this in transforms (or worse, in index-time extraction). Generally, you should do things at search time, and it is bad practice to use regex on raw events with structured data like JSON. (Use regex on fields extracted from JSON instead.)

As I said earlier, without real knowledge of the actual data, it is not possible to know why you need that asterisk. Using substr can actually help you diagnose, by supplying an independent data point that does not rely on your automatic extraction, which is generally more difficult to diagnose.
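
A minimal sketch of that diagnostic, assuming the JSON field field0 is already extracted at search time (your_index, geo_check, same and len_extracted are placeholder names, not from the original post). It compares the automatically extracted field with a substr-derived one and reports the length of the extracted value, so stray characters or a missing extraction would show up:

index=your_index sourcetype=sourcetype0
| eval geo_check = substr(field0, 1, 7)
| eval same = if(geolocation_code == geo_check, "yes", "no")
| eval len_extracted = coalesce(len(geolocation_code), 0)
| stats count by same, len_extracted

If every event lands in same=yes with len_extracted=7, the extracted value itself is most likely fine, and the problem lies in how the search is evaluated rather than in the regex.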

bitnapper · ‎12-15-2022

So with substr I have no such effect.
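
One possible explanation, offered here as an assumption rather than something confirmed in this thread: for a search-time field, Splunk by default treats field=value as "the term value is in the index AND the field equals value". In the sample event the indexed term would be ABCDEF1ABCDEF01, not ABCDEF1, so the exact-match search finds no candidate events, while ABCDEF1* matches the longer term and != needs no term lookup. A sketch of a check for this, with your_index as a placeholder, is to retrieve the events first and filter afterwards with where:

index=your_index sourcetype=sourcetype0
| where geolocation_code == "ABCDEF1"

If this returns the events that geolocation_code=ABCDEF1 in the base search misses, the extraction is fine and the term lookup is the likely cause. The INDEXED_VALUE setting in fields.conf is documented for fields whose values do not appear as whole terms in the raw event.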

yuanliu · ‎12-14-2022

Without seeing actual data, I cannot tell why your instance has that behavior. However, the snippet you showed suggests that your data is actually JSON. If this is correct, using regex for extraction is counterproductive. You should have "field0", "field1", etc., already. Just use coalesce.

| eval geolocation_code = coalesce(field0, field1, field2)

If the raw events are not JSON but part of it is, aim to extract the entire JSON object, then run spath on it.
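
As a rough sketch of that last suggestion, assuming the JSON object is the only brace-delimited part of the event (json_payload is a hypothetical field name, not from the original post):

| rex field=_raw "(?<json_payload>\{.*\})"
| spath input=json_payload
| eval geolocation_code = substr(coalesce(field0, field1, field2), 1, 7)

Here rex isolates the JSON object, spath extracts its keys as fields, and the final eval derives the 7-character code from whichever of field0, field1 or field2 is present.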
