Hello,
I have a data source with a dynamic structure: the position of the comma-separated field/value pairs changes for some of the events. A few sample events and the extraction I used are given below. My extraction works for event one, but not for the other two events, because the field/value positions change there. Is there any way to use a single field extraction to handle this? Any help will be highly appreciated. Thank you so much.
Timestamp:(?P<TIME_STAMP>.+), Type:(?P<TYPE>.+), EType:(?P<EType>.+), TCode:(?P<TCode>.+), EventId: (?P<EventId>.+), Id: (?P<Id>.+), SAddress: (?P<SAddress>.+), System: (?P<System>.+), SId: (?P<SId>.+), eSignCode: (?P<eSignCode>.+), RCode: (?P<RCode>.+), Error: (?P<Error>.+)
2022-10-12 06:42:36.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:39.591Z, eSignCode: 3012, Type: REGT, EType: ESIGN, TCode: 23005, EventId: GET_SIGN, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, RCode: 000, Error: nullm
2022-10-12 06:42:30.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:30.591Z, Type: REGT, TCode: 23305, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, eSignCode: 3012, EventId: GET_SIGN, Error: nullm
2022-10-14 06:42:26.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:26.591Z, Type: REGT, TCode: 23015, EventId: GET_SIGN, RCode: 010, Id: 12045, SAddress: 35.168.40.65, System: EIVES, SId: =/=S()A.b(X(-yJrV/98do)f(Q_)tca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, EventId: GET_SIGN, Error: nullm
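To see why a single pattern breaks, here is a quick local check in Python. This is only a sketch, assuming you can test regexes outside Splunk; Python's re accepts the same (?P<name>...) group syntax. It uses the extraction above with the spaces removed from the group names and applies it to the second and third sample events. Because every key must appear, in exactly the pattern's order, the match fails as soon as a key is missing or moved.

```python
import re

# The question's single extraction, with the spaces removed from the group names.
# Every key must be present, in exactly this order, for the pattern to match.
ORDERED = re.compile(
    r"Timestamp:(?P<TIME_STAMP>.+), Type:(?P<TYPE>.+), EType:(?P<EType>.+), "
    r"TCode:(?P<TCode>.+), EventId: (?P<EventId>.+), Id: (?P<Id>.+), "
    r"SAddress: (?P<SAddress>.+), System: (?P<System>.+), SId: (?P<SId>.+), "
    r"eSignCode: (?P<eSignCode>.+), RCode: (?P<RCode>.+), Error: (?P<Error>.+)"
)

# Sample events 2 and 3 from the question (neither contains "EType:").
EVENTS = [
    "2022-10-12 06:42:30.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:30.591Z, Type: REGT, TCode: 23305, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, eSignCode: 3012, EventId: GET_SIGN, Error: nullm",
    "2022-10-14 06:42:26.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:26.591Z, Type: REGT, TCode: 23015, EventId: GET_SIGN, RCode: 010, Id: 12045, SAddress: 35.168.40.65, System: EIVES, SId: =/=S()A.b(X(-yJrV/98do)f(Q_)tca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, EventId: GET_SIGN, Error: nullm",
]

matches = [ORDERED.search(event) for event in EVENTS]
for event, match in zip(EVENTS, matches):
    print("matched" if match else "no match", "->", event[:60])

# Group names with embedded spaces, e.g. (?P< System >...), are not valid syntax.
try:
    re.compile(r"System: (?P< System >.+)")
    bad_name_rejected = False
except re.error:
    bad_name_rejected = True
print("group name with spaces rejected:", bad_name_rejected)
```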
Hello all,
Thank you so much for your quick responses, but I cannot use any of them in the inline field extraction available in Splunk Web.
This is confusing. Why can't field-by-field extraction (as @jdunlea suggested) be used in inline field extraction? You just enter them one by one.
For the timestamp, enter
"\bTimestamp:\s*(?P<TIME_STAMP>[^,]+)"
Similarly, enter
"\bType:\s*(?P<TYPE>[^,]+)"
"\bEType:\s*(?P<EType>[^,]+)"
"\bTCode:\s*(?P<TCode>[^,]+)"
"\bEventId:\s*(?P<EventId>[^,]+)"
"\bId:\s*(?P<Id>[^,]+)"
"\bSAddress:\s*(?P<SAddress>[^,]+)"
"\bSystem:\s*(?P<System>[^,]+)"
"\bSId:\s*(?P<SId>[^,]+)"
"\beSignCode:\s*(?P<eSignCode>[^,]+)"
"\bRCode:\s*(?P<RCode>[^,]+)"
and
"\bError:\s*(?P<Error>[^,]+)"
The leading \b (word boundary) keeps Id: from matching inside EventId: or SId:, and Type: from matching inside EType:.
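As a rough local check of the field-by-field idea, here is a Python sketch that applies one small pattern per key to all three sample events. It is not how Splunk runs inline extractions internally; it only shows that independent patterns do not care about key order. I have put a \b word boundary in front of each key (my addition) so that Id: cannot match inside EventId: or SId:, and Type: cannot match inside EType:.

```python
import re

# One pattern per key, applied independently, so key order does not matter.
# \b prevents "Id:" matching inside "EventId:"/"SId:" and "Type:" inside "EType:".
FIELD_PATTERNS = [
    r"\bTimestamp:\s*(?P<TIME_STAMP>[^,]+)",
    r"\bType:\s*(?P<TYPE>[^,]+)",
    r"\bEType:\s*(?P<EType>[^,]+)",
    r"\bTCode:\s*(?P<TCode>[^,]+)",
    r"\bEventId:\s*(?P<EventId>[^,]+)",
    r"\bId:\s*(?P<Id>[^,]+)",
    r"\bSAddress:\s*(?P<SAddress>[^,]+)",
    r"\bSystem:\s*(?P<System>[^,]+)",
    r"\bSId:\s*(?P<SId>[^,]+)",
    r"\beSignCode:\s*(?P<eSignCode>[^,]+)",
    r"\bRCode:\s*(?P<RCode>[^,]+)",
    r"\bError:\s*(?P<Error>[^,]+)",
]


def extract_fields(event: str) -> dict:
    """Run every pattern; keys that are absent from the event are simply skipped."""
    fields = {}
    for pattern in FIELD_PATTERNS:
        match = re.search(pattern, event)
        if match:
            fields.update(match.groupdict())
    return fields


# The three sample events from the question.
EVENTS = [
    "2022-10-12 06:42:36.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:39.591Z, eSignCode: 3012, Type: REGT, EType: ESIGN, TCode: 23005, EventId: GET_SIGN, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, RCode: 000, Error: nullm",
    "2022-10-12 06:42:30.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:30.591Z, Type: REGT, TCode: 23305, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, eSignCode: 3012, EventId: GET_SIGN, Error: nullm",
    "2022-10-14 06:42:26.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:26.591Z, Type: REGT, TCode: 23015, EventId: GET_SIGN, RCode: 010, Id: 12045, SAddress: 35.168.40.65, System: EIVES, SId: =/=S()A.b(X(-yJrV/98do)f(Q_)tca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, EventId: GET_SIGN, Error: nullm",
]

for event in EVENTS:
    print(extract_fields(event))
```

Each event yields only the keys it actually contains, which is the behavior you get from entering the extractions one by one.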
Hello,
Yes, we can use that approach and go field by field. But sometimes the source fields are created dynamically, and in that case we don't know the field/value pairs in advance; also, we would need to create around 10 to 12 separate extractions. How would we address that? Thank you again.
That is where @johnhuang's suggestion comes into play.
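| extract pairdelim=",",kvdelim=":"
This goes in the search line, of course; it is not automatic. Also, it doesn't work for the Timestamp field, because both the log header in front of it and the timestamp value itself contain colons, which collide with kvdelim=":". If your developers refuse to maintain an agreed-upon log format (yes, I know that happens), you are left with few choices.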
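To illustrate the Timestamp problem, here is a simplified Python model of delimiter-based key/value splitting. It is an assumption, not Splunk's actual extract implementation: it splits the event at every "," and then splits each pair at its first ":". Splunk's own handling of ambiguous pairs may differ, but the model shows why the colon-laden first pair cannot produce a clean Timestamp field while all the other keys come out fine, whatever their order.

```python
def naive_kv(event: str, pairdelim: str = ",", kvdelim: str = ":") -> dict:
    """Simplified model: split into pairs, then split each pair at its FIRST kvdelim."""
    fields = {}
    for pair in event.split(pairdelim):
        key, sep, value = pair.partition(kvdelim)
        if sep:  # skip pairs that contain no kvdelim at all
            fields[key.strip()] = value.strip()
    return fields


# Sample event 1 from the question.
EVENT = "2022-10-12 06:42:36.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:39.591Z, eSignCode: 3012, Type: REGT, EType: ESIGN, TCode: 23005, EventId: GET_SIGN, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, RCode: 000, Error: nullm"

fields = naive_kv(EVENT)
print(sorted(fields))
# The first pair is the header plus the timestamp. Its first colon sits inside
# "06:42:36.591", so the key comes out as "2022-10-12 06" instead of "Timestamp".
print(repr(fields.get("Timestamp")))
```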
Speaking of log format, the existing format is regular enough that they could have simply used "=" instead of ":" and you would have no problem of this sort. It may be worth exerting any influence that you can.
In the meantime, you can put either johnhuang's or jdunlea's solution in a macro and insert it wherever needed.
Hello @yuanliu,
Thank you so much again, and that sounds good to me. I have one more question: is there any way we can use props and transforms configurations to implement this extraction?
If there is any, I haven't found it. (And not for lack of trying.) You can still extract individual fields automatically as jdunlea suggested.
Looks like a good use case for kv extraction:
| makeresults
| eval _raw="2022-10-12 06:42:36.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:39.591Z, eSignCode: 3012, Type: REGT, EType: ESIGN, TCode: 23005, EventId: GET_SIGN, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, RCode: 000, Error: nullm"
| append [| makeresults
| eval _raw="2022-10-12 06:42:30.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:30.591Z, Type: REGT, TCode: 23305, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, eSignCode: 3012, EventId: GET_SIGN, Error: nullm"]
| append [| makeresults
| eval _raw=" 2022-10-14 06:42:26.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:26.591Z, Type: REGT, TCode: 23015, EventId: GET_SIGN, RCode: 010, Id: 12045, SAddress: 35.168.40.65, System: EIVES, SId: =/=S()A.b(X(-yJrV/98do)f(Q_)tca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, EventId: GET_SIGN, Error: nullm"]
| extract pairdelim=",",kvdelim=":"
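For comparison, here is a Python sketch of a different technique from extract: a single regex that captures the key name as well as the value, so unknown keys are picked up automatically. It assumes (my assumption, not something the log format guarantees) that every key is a single word of \w characters introduced either by the ", " pair separator or by the " - " that ends the log header. Because a key must follow one of those separators, the colons inside the header time and the timestamp value are never treated as key delimiters.

```python
import re

# Hypothetical dynamic-key pattern: group 1 is the key, group 2 the value.
# A key must follow ", " or the " - " that ends the log header.
KV = re.compile(r"(?:,\s*|\s-\s)(\w+):\s*([^,]+)")


def dynamic_kv(event: str) -> dict:
    """Return every key/value pair, whatever the key names and their order."""
    return dict(KV.findall(event))


# The three sample events from the question.
EVENTS = [
    "2022-10-12 06:42:36.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:39.591Z, eSignCode: 3012, Type: REGT, EType: ESIGN, TCode: 23005, EventId: GET_SIGN, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, RCode: 000, Error: nullm",
    "2022-10-12 06:42:30.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:30.591Z, Type: REGT, TCode: 23305, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, eSignCode: 3012, EventId: GET_SIGN, Error: nullm",
    "2022-10-14 06:42:26.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:26.591Z, Type: REGT, TCode: 23015, EventId: GET_SIGN, RCode: 010, Id: 12045, SAddress: 35.168.40.65, System: EIVES, SId: =/=S()A.b(X(-yJrV/98do)f(Q_)tca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, EventId: GET_SIGN, Error: nullm",
]

for event in EVENTS:
    print(dynamic_kv(event))
```

If a key repeats within one event (EventId in the third sample), the later value wins in this sketch.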
I would recommend breaking up your rex statement into a few different regexes. That way, each regex can anchor on the key immediately preceding the value you want to extract, so the order of the fields no longer matters.
For example:
| rex field=_raw "\bType:\s*(?P<TYPE>[^,]+)"
| rex field=_raw "\bEType:\s*(?P<EType>[^,]+)"
| rex field=_raw "\bTCode:\s*(?P<TCode>[^,]+)"
etc.
Alternatively, you can construct 2 or 3 large regexes that accommodate the different event structures you have, and in each regex give the fields slightly different names.
For example, regex 1 would extract TCode1 and regex 2 would extract TCode2.
Then you can use the eval command with the coalesce function to merge these fields into TCode later on.
For example:
| eval TCode=coalesce(TCode1,TCode2)
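A minimal Python sketch of the coalesce approach, for checking the logic locally. The two "layout" regexes are hypothetical and deliberately small (one expects TCode followed by EventId, the other TCode followed by Id); real ones would cover a whole event structure. The coalesce helper mirrors what SPL's coalesce() does, returning the first argument that is not null.

```python
import re

# Hypothetical layout regexes; each gives its capture a slightly different name.
LAYOUT_1 = re.compile(r"TCode: (?P<TCode1>[^,]+), EventId: ")  # TCode then EventId
LAYOUT_2 = re.compile(r"TCode: (?P<TCode2>[^,]+), Id: ")  # TCode then Id


def coalesce(*values):
    """Mirror of SPL coalesce(): the first argument that is not null (None)."""
    return next((value for value in values if value is not None), None)


def extract_tcode(event: str) -> dict:
    fields = {"TCode1": None, "TCode2": None}
    for layout in (LAYOUT_1, LAYOUT_2):
        match = layout.search(event)
        if match:
            fields.update(match.groupdict())
    # Equivalent of: | eval TCode=coalesce(TCode1,TCode2)
    fields["TCode"] = coalesce(fields["TCode1"], fields["TCode2"])
    return fields


# The three sample events from the question.
EVENTS = [
    "2022-10-12 06:42:36.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:39.591Z, eSignCode: 3012, Type: REGT, EType: ESIGN, TCode: 23005, EventId: GET_SIGN, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, RCode: 000, Error: nullm",
    "2022-10-12 06:42:30.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:30.591Z, Type: REGT, TCode: 23305, Id: 12045, SAddress: 35.168.40.67, System: EIVES, SId: =/=S()A.b(X(-yJrV/+do)f(Q_)uca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, eSignCode: 3012, EventId: GET_SIGN, Error: nullm",
    "2022-10-14 06:42:26.591 { INFO } [default task-79] - Timestamp: 2022-10-12T11:42:26.591Z, Type: REGT, TCode: 23015, EventId: GET_SIGN, RCode: 010, Id: 12045, SAddress: 35.168.40.65, System: EIVES, SId: =/=S()A.b(X(-yJrV/98do)f(Q_)tca-/6+o_v.k|39OYc+Fh_=YOX-iDA++===, EventId: GET_SIGN, Error: nullm",
]

for event in EVENTS:
    print(extract_tcode(event))
```

Note that SPL field names are case-sensitive, so the names passed to coalesce must match the capture names exactly.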