I created an extracted field called remote_user. For some dates my search returns the field value correctly, but the same search for other dates does not. I checked the events, and the extracted field is malformed on the dates that have the problem. The remote_user value should look like "CompanyName John_doe". On the days when the search works, remote_user shows "CompanyName John_doe". On the dates when it does not work, the field shows only "CompanyName". How can the same extracted field behave differently on different dates? Any suggestions?
Hi @gnshah12345 ,
You can use the following regular expression to extract the required "remote_user" field.
\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}\s\-\s(?<remote_user>.+)\[
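To check what this pattern captures outside Splunk, here is a minimal Python sketch run against a shortened copy of the sample event posted in this thread. Two things are assumptions on my part: Python's re module needs the (?P<name>...) spelling for named groups, and FIRST_WORD is a hypothetical pattern, not the original poster's extraction (which was not shared), included only to show one way a pattern can return just the first word of the value.

```python
import re

# Shortened copy of the sample event from this thread
# (everything after the status and byte count is dropped).
EVENT = (
    'May 3 11:26:01 linux_1 request-instance SoftCert 10.10.20.30 - '
    'Brew Bar John Doe_123456_UE [03/May/2023:11:25:55.509 -0400] '
    '"GET /rest/BROk305031.xml?ink=202305031525554263206 HTTP/1.1" 404 196'
)

# The suggested pattern, rewritten only to use Python's (?P<name>...) syntax.
SUGGESTED = re.compile(
    r'\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}\s\-\s(?P<remote_user>.+)\['
)

# Hypothetical pattern (an assumption, not the poster's regex):
# \S+ stops at the first whitespace, so only the first word is captured.
FIRST_WORD = re.compile(
    r'\d{1,3}(?:\.\d{1,3}){3}\s-\s(?P<remote_user>\S+)'
)


def extract(pattern, event):
    """Return the remote_user capture, or None when the pattern does not match."""
    match = pattern.search(event)
    return match.group('remote_user') if match else None


full = extract(SUGGESTED, EVENT)
short = extract(FIRST_WORD, EVENT)

print(repr(full))   # 'Brew Bar John Doe_123456_UE ' (note the trailing space)
print(repr(short))  # 'Brew'
```

The greedy .+ runs up to the last "[" in the event, so the capture keeps the space before the bracket; trimming the value afterwards, or ending the pattern with \s\[ and a lazy .+?, would avoid that.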
Kindly upvote if you found this helpful.
Hi @gnshah12345 ,
If the field extraction is based on a user-provided regex, please share it in your reply along with some sample data. That will help in finding the actual cause.
Thanks!
I used a regular expression for the field extraction.
Below is a sample event. The extracted field is highlighted.
May 3 11:26:01 linux_1 request-instance SoftCert 10.10.20.30 - Brew Bar John Doe_123456_UE [03/May/2023:11:25:55.509 -0400] "GET /rest/BROk305031.xml?ink=202305031525554263206 HTTP/1.1" 404 196 36580 1 25135 brew.bar.com /rest 749 "OU=123456+CN= Brew Bar John Doe,OU=ny,O=Brew Bar Joint,C=us" cc045c0a-e9a9-11ed-a6e5-0050568916c1 "x509: TLSV12: 30" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
The question doesn't seem to be related to dates, unless you can show two different raw events: one for which your regex works as desired and one for which it does not. Additionally, unless you post the regex itself, there is no way to diagnose the problem.
But ultimately, what is the significance of the string preceding the bracketed date, namely "Brew Bar John Doe_123456_UE"? According to your description, the value you want is "Brew Bar John Doe". If your description is accurate, this is the value of the CN attribute in the embedded LDAP node, except that the embedded string contains a nonstandard delimiter ("+" instead of a comma) and some inconvenient spacing, both of which can be fixed easily.
Instead of trying to reinvent the regex, I suggest that you use Splunk-supported extractions where applicable. They are more robust. In your case, the log contains a segment in NCSA/Apache access log format. Splunk ships with the access-request and access-extractions extractions for this format. For example,
| rex mode=sed "s/\+/,/g s/= */=/g" ``` handle little quirks in data ```
| extract access-request ``` but this is robust ```
This will give you
C | CN | O | OU | file | ink | method | root | uri | uri_domain | uri_path | uri_query | version |
us | Brew Bar John Doe | Brew Bar Joint | 123456 | BROk305031.xml | 202305031525554263206 | GET | rest | /rest/BROk305031.xml?ink=202305031525554263206 | | /rest/BROk305031.xml | ink=202305031525554263206 | HTTP/1.1 |
Similarly, with access-extractions,
| rex mode=sed "s/\+/,/g s/= */=/g"
| extract access-extractions
This will give you
C | CN | O | OU | ink |
us | Brew Bar John Doe | Brew Bar Joint | 123456 | 202305031525554263206 |
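For readers who want to see what the two sed expressions do to the embedded DN string, here is a rough Python sketch. It only imitates the substitutions on that one quoted string (in Splunk, rex mode=sed with no field argument rewrites _raw), and parse_dn is a hypothetical helper, not something Splunk provides. Keeping the first value for a repeated key such as OU is an assumption chosen to match the tables above.

```python
import re

# The quoted DN string from the sample event in this thread.
DN = "OU=123456+CN= Brew Bar John Doe,OU=ny,O=Brew Bar Joint,C=us"


def normalize(text):
    """Imitate the two sed expressions: s/\\+/,/g then s/= */=/g."""
    text = re.sub(r"\+", ",", text)    # "+" delimiter becomes ","
    return re.sub(r"= *", "=", text)   # drop spaces after "="


def parse_dn(text):
    """Hypothetical helper: split key=value pairs, keeping the first value per key."""
    fields = {}
    for pair in text.split(","):
        key, _, value = pair.partition("=")
        fields.setdefault(key.strip(), value.strip())
    return fields


normalized = normalize(DN)
fields = parse_dn(normalized)

print(normalized)    # OU=123456,CN=Brew Bar John Doe,OU=ny,O=Brew Bar Joint,C=us
print(fields["CN"])  # Brew Bar John Doe
```

After normalization every attribute is a plain comma-separated key=value pair, which is the shape that key/value extraction handles without a custom regex.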