Hi there,
I created multiple field extractions, extracting values from different sourcetypes into the same field:
sourcetype0: "field0":"(?<geolocation_code>.{7})
sourcetype1: "field1":"(?<geolocation_code>.{7})
sourcetype2: "field2":"(?<geolocation_code>.{7})
They are populated as expected, all looking like ABCDEF0, CDEFGH3 or ZDEGFH9. But when I search for geolocation_code=ABCDEF0 I get zero hits, even though the preview in the fields pane on the left shows plenty of those values. Using geolocation_code!=ABCDEF0, on the other hand, works exactly as intended. Also, geolocation_code=ABCDEF0* gives the result I expected from geolocation_code=ABCDEF0, even though the field contains exactly the value I'm looking for. I don't really understand what is happening here, or why it happens only with this extraction and not with others.
The data is indeed JSON. But I want another field containing just the first 7 characters of field0, so I can use it for cross-referencing. When I extract it with "field0":"(?<geolocation_code>.{7}) I can only search it as geolocation_code=ABCDEF1*, not as geolocation_code=ABCDEF1.
{"info":"text","host":"SOURCE00033","@timestamp":"2022-12-07T09:33:01.000Z","user":"UserName@domain","device":"DVD/CD-ROM","ctime":"2022-12-07T09:33:01.000Z","id":737112781,"field0":"ABCDEF1ABCDEF01.domain.local"}
And substring works in field extractions? But why does the search only work with an asterisk? I'd love to understand what I misunderstood about how Splunk processes this regex, so I can avoid repeating the mistake in the future. As far as I understand it, this should not happen, but obviously it does, so there must be something I got wrong.
substr only works in SPL. What I was saying is that there is no need to do this in transforms (or worse, index time extraction). Generally, you should do things at search time, and it is a bad practice to use regex on raw events with structured data like JSON. (Use regex on fields extracted from JSON instead.)
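A minimal sketch of what doing this at search time could look like, assuming the events are valid JSON and that sourcetype0 and field0 are the names from the question. The spath call is only needed if field0 is not already extracted automatically.

```spl
sourcetype=sourcetype0
| spath
| eval geolocation_code = substr(field0, 1, 7)
| search geolocation_code="ABCDEF1"
```

substr is 1-based in SPL, so substr(field0, 1, 7) returns the first 7 characters. If a regex is preferred, it can be applied to the extracted field rather than to the raw event, for example: | rex field=field0 "^(?<geolocation_code>.{7})"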
As I said earlier, without knowing the actual data, it is not possible to tell why you need that asterisk. Using substr can actually help you diagnose the problem, because it supplies an independent data point that does not rely on your automatic extraction, and automatic extractions are generally more difficult to diagnose.
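One way to get that independent data point is to compare the automatically extracted value with a substr of the JSON field in the same search. This is only a sketch: sourcetype0 and field0 come from the question, while code_from_substr, code_len and codes_match are hypothetical names introduced here for illustration.

```spl
sourcetype=sourcetype0
| spath
| eval code_from_substr = substr(field0, 1, 7)
| eval code_len = len(geolocation_code)
| eval codes_match = if(geolocation_code == code_from_substr, "yes", "no")
| fillnull value="n/a" code_len
| stats count by code_len, codes_match
```

If every row shows code_len=7 and codes_match=yes, the extracted values themselves are fine, and the difference lies in how the search term is matched rather than in what the regex captured.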
So with substr I have no such effect.
Without seeing the actual data, I cannot tell why your instance has that behavior. However, the snippet you showed suggests that your data is actually JSON. If this is correct, using regex for extraction is counterproductive. You should have "field0", "field1", etc., already. Just use coalesce.
| eval geolocation_code = coalesce(field0, field1, field2)
If the raw events are not JSON but part of it is, aim to extract the entire JSON object, then run spath on it.
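For the mixed case, a rough sketch might look like the following. It assumes each event contains a single JSON object that can be captured from the first opening brace to the last closing brace. json_payload is a hypothetical field name, and the substr call reflects the 7-character requirement from the question.

```spl
| rex field=_raw "(?<json_payload>\{.*\})"
| spath input=json_payload
| eval geolocation_code = substr(coalesce(field0, field1, field2), 1, 7)
```

The regex here is deliberately crude. If events can carry more than one JSON object, or braces appear in the surrounding text, the capture pattern would need to be tightened for the actual data.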