I'm trying to extract XML fields from a report which is about 70-80 lines (maybe more). I receive the whole report as a single event because breaking it would make the report lose its meaning. I have been researching and trying out various means of field extraction for this report but nothing has worked out so far. If anyone can help me out with this, it'd be great.
I tried xmlkv, spath, xpath, and manual regex field extraction. When I try manual field extraction or xmlkv, it matches only the last occurrence of the tag. For example, consider the following code sample:
When I use regex for field extraction, or xmlkv, for say the field level, I get only the last value (Low). Also, spath by default extracts fields from only the first 5000 characters. I understand this can be changed in limits.conf, but I don't know how many characters my report will contain, so I don't know what to set the value to. When I try spath like so:
whatever_search|spath output=host path=objects.object.ip|top host
the field host contains the whole XML report and not just the field I'm looking for. Can someone please suggest an alternative or a solution to this? I have no option but to use XML for this.
Kristian, I just wanted to say thanks for the tip. I've been able to successfully use this method to do field extractions in some xml logs I'm working with.
Did you find a solution?
Have you looked at MV_ADD=true in order to get more than the last value?
Basically, you need to make the following changes/additions:
in props.conf
[your_xml_sourcetype]
REPORT-gettin_da_levels = da_level
in transforms.conf
[da_level]
REGEX = <level>([^<]+)<
FORMAT = myLevel::$1
MV_ADD = True
Hope this helps,
Kristian
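To see why MV_ADD matters, the behaviour can be imitated outside Splunk. The sketch below is plain Python, not Splunk code, and the sample event and function names are made up for illustration (the real report layout is not shown in this thread). It applies the same regex as the transforms.conf stanza above and contrasts keeping only the last match with keeping every match, which is roughly what a multivalue field gives you.

```python
import re

# Same pattern as the REGEX line in the transforms.conf stanza.
LEVEL_RE = re.compile(r"<level>([^<]+)<")

# Hypothetical event, invented for this sketch.
SAMPLE_EVENT = """
<objects>
  <object><ip>192.168.0.1</ip><level>High</level></object>
  <object><ip>192.168.0.2</ip><level>Medium</level></object>
  <object><ip>192.168.0.3</ip><level>Low</level></object>
</objects>
"""


def last_level(event):
    """Keep only the final match, like the single value seen in the question."""
    matches = LEVEL_RE.findall(event)
    return matches[-1] if matches else None


def all_levels(event):
    """Keep every match, analogous to a multivalue field."""
    return LEVEL_RE.findall(event)


print(last_level(SAMPLE_EVENT))
print(all_levels(SAMPLE_EVENT))
```

The regex itself is unchanged between the two functions; the only difference is whether earlier matches are thrown away or accumulated.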
You're most welcome 🙂
Kristian: Thank you very much for your help. Yours is the first solution that worked for me. Really appreciate all the help. Thank you!
Sheela: Glad it worked so far. As for context, I don't know. I'm not all that familiar with working with XML files. I guess you have tried the xpath and spath commands, which supposedly do this kind of thing. Sorry...
tb5821: I know that extract(kv) has an mv_add option which can be used inline, however I don't think it'll work here.
Do you have to make changes to config files, or is there a way to do it only via search?
Thanks Kristian! That worked like a charm. Can you also tell me how I can do the same thing while maintaining the level of nesting in the XML? The reports I have are about 200 lines and deeply nested. Do you have any suggestions on how I can extract fields so they make sense in their context?
For example, in the XML above, one host (192.168.X.Y) can have level high while another host (192.168.X.X) can have level low. Will I be able to extract such context-sensitive information?
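To make the question concrete, here is a minimal sketch of what "keeping the context" means, written in plain Python rather than as a Splunk search. It assumes a layout of objects > object with ip and level as direct children. The objects.object.ip part comes from the spath path in the question, but the position of level, the sample addresses and the function name are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical report; the real structure is not shown in the thread.
SAMPLE_REPORT = """
<objects>
  <object><ip>192.168.0.1</ip><level>High</level></object>
  <object><ip>192.168.0.2</ip><level>Low</level></object>
</objects>
"""


def levels_by_host(xml_text):
    """Map each host IP to the level found inside the same <object> element."""
    root = ET.fromstring(xml_text.strip())
    pairs = {}
    for obj in root.iter("object"):
        # Read both values from the same parent so they stay paired.
        ip = obj.findtext("ip")
        level = obj.findtext("level")
        if ip is not None and level is not None:
            pairs[ip.strip()] = level.strip()
    return pairs


print(levels_by_host(SAMPLE_REPORT))
```

The point of the sketch is that the pairing is done per parent element. Two independent multivalue fields (one for ip, one for level) would lose that link whenever an object is missing one of the two tags.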
I think I'm having a similar issue over here: http://splunk-base.splunk.com/answers/45039/regex-text
I get the entire line, not just the data between the two fields.