When I have a single row of values within the tags:
<employee details="ename;position;branch" department="XYZ">AA;systems engineer;seattle
</employee>
then I'm able to parse the data as required. But when there are multiple rows (multiple sets of values), as in the example posted above:
<employee details="ename;position;branch" department="XYZ">BB;Lead;seattle
CC;Tech Lead;Redmond
</employee>
then I'm having difficulty parsing the data.
Here is the example that worked for me.
Data:
10:26:10 PST 16 Nov 2015
<employee details="ename;position;branch" department="XYZ">AA;systems engineer;seattle
</employee>
1:26:10 PST 16 Nov 2015
<employee details="ename;position;branch" department="XYZ">BB;Lead;seattle
</employee>
6:26:10 PST 16 Nov 2015
<employee details="ename;position;branch" department="XYZ">DD;data architect;annapolis
</employee>
props.conf
[employee]
# combines multiple lines into a single event
SHOULD_LINEMERGE = true
# divides the data into events
MUST_BREAK_AFTER = </employee>
NO_BINARY_CHECK = true
disabled = false
pulldown_type = true
# transform stanza name
REPORT-employee = emp
transforms.conf
[emp]
# regular expression for capturing the data within the tags
REGEX = <employee details="ename;position;branch" department="XYZ">(.*?)</employee>
# format of the event
FORMAT = details::$1
# multivalued field
MV_ADD = true
REPEAT_MATCH = true
CSV data formatting
Splunk Query:
index=main sourcetype=employee | eval data=split(details, ";") | eval name=mvindex(data, 0) | eval position=mvindex(data, -2) | eval branch=mvindex(data, -1) | table data, name, position, branch
Output:
ename  position          branch
AA     systems engineer  seattle
BB     Lead              seattle
DD     data architect    annapolis
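
To make the multi-row problem concrete, here is a minimal sketch in Python (not Splunk) of the same capture-then-split logic, extended so that each line inside the tags becomes its own record. The helper name parse_employees and the regular expression are assumptions for illustration only; they model the idea and are not how Splunk itself processes the event.

```python
import re

# Hypothetical helper, not part of Splunk: models the transform's capture
# plus the search-time split, extended to handle several rows per tag.
EMPLOYEE_RE = re.compile(
    r'<employee details="(?P<fields>[^"]*)"[^>]*>(?P<body>.*?)</employee>',
    re.DOTALL,  # let the captured body span line breaks
)


def parse_employees(text):
    """Return one dict per data row found inside <employee> tags."""
    records = []
    for match in EMPLOYEE_RE.finditer(text):
        # Field names come from the details attribute, e.g. ename;position;branch
        names = match.group("fields").split(";")
        # Split the body into rows first, then split each row on ";"
        for line in match.group("body").splitlines():
            line = line.strip()
            if not line:
                continue
            values = [value.strip() for value in line.split(";")]
            records.append(dict(zip(names, values)))
    return records


sample = """<employee details="ename;position;branch" department="XYZ">BB;Lead;seattle
CC;Tech Lead;Redmond
</employee>"""

for record in parse_employees(sample):
    print(record)
```

The point of the sketch is the order of operations: the captured value is split into rows before it is split on ";". With a single split on ";", the second row's name ends up glued to the first row's branch, which is why the query works for one row but not for several.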