
Help indexing XML CDRs

jerrad
Path Finder

I have been struggling to get these XML CDRs to index correctly in Splunk without missing some data from the events.

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
<startDate>2009/11/10 06:54:51</startDate>
<callingNumber>xxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>v3631:9216=4;v5535:44=xxxxxxxxxxxxxx;v5535:48=0;v5535:24=3;v5535:7=xxxxxxxxxxxxxx;</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxx:StdFile:flatfile-12549597153198</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>
<recId>cdma_8461599e2356401240238057235696109</recId>
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
<startDate>2009/11/10 06:54:51</startDate>
<callingNumber>xxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>v3631:9216=4;v5535:44=xxxxxxxxxxxxxx;v5535:48=0;v5535:24=3;v5535:7=xxxxxxxxxxxxxx;</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxx:StdFile:flatfile-12549597153198</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>

I would really like to create one event per record, from <record> through to </record>, and then move on to the next event. However, here and there I get events that break in the wrong place, so one event may show

<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>

then the next event will show

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>

Instead of

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
..........
</record>

My props.conf

[aaaacct]
BREAK_ONLY_BEFORE=<recId>
MAX_EVENTS=200000
TIME_PREFIX = (?m)<startDate>

Does anyone have any suggestions on how to approach this problem?

Thanks

Jerrad


ziegfried
Influencer

I'd suggest using a custom LINE_BREAKER instead of the BREAK_ONLY_BEFORE option:

[aaaacct]
SHOULD_LINEMERGE = false
TRUNCATE = 500000
LINE_BREAKER = </record>(\s*)<record>
TIME_PREFIX = (?m)<startDate>
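
For anyone curious how this carves the data into events, here is a rough simulation in Python (not Splunk itself; the two sample records below are invented) of the rule that the previous event ends at the start of the first capture group and the next event starts at its end:

import re

# Rough simulation only -- Splunk applies this internally at parse time.
# Shows how LINE_BREAKER = </record>(\s*)<record> carves a raw buffer into events.
breaker = re.compile(r"</record>(\s*)<record>")

raw = ("<record>\n<recId>cdma_111</recId>\n<startDate>2009/11/10 06:54:51</startDate>\n</record>"
       "<record>\n<recId>cdma_222</recId>\n<startDate>2009/11/10 06:55:12</startDate>\n</record>")

events, pos = [], 0
for m in breaker.finditer(raw):
    events.append(raw[pos:m.start(1)])  # previous event ends at the start of group 1
    pos = m.end(1)                      # next event starts at the end of group 1
events.append(raw[pos:])

for ev in events:
    print(repr(ev))
# Both events run from <record> through their matching </record>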

ziegfried
Influencer

It matches because of the *-quantifier: \s* matches zero or more whitespace characters, so it still matches when there is nothing at all between the tags.
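
A quick way to see this (a Python sketch purely for illustration; the data snippet is made up):

import re

# \s* also matches the empty string, so the LINE_BREAKER regex still fires
# when </record> and <record> are directly adjacent.
breaker = re.compile(r"</record>(\s*)<record>")

m = breaker.search("<seqNum>0</seqNum>\n</record><record>\n<recId>cdma_123</recId>")
print(m is not None)     # True
print(repr(m.group(1)))  # '' -- the capture group matched zero characters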


jerrad
Path Finder

This worked perfectly. The way I read the LINE_BREAKER docs confused me at first, but I think the magic lies in:

Wherever the regex matches, the start of the first matching group is considered the end of the
previous event, and the end of the first matching group is considered the start of the next event.

However, I'm not sure how this group matched, since there isn't a space, tab, or line break between </record> and <record>.


ziegfried
Influencer

Yes, you're right. I've corrected it in the answer above.


gkanapathy
Splunk Employee

Instead of MAX_EVENTS, when you use SHOULD_LINEMERGE = false you need to raise the event size limit by increasing TRUNCATE, e.g. TRUNCATE = 500000. MAX_EVENTS will have no effect, since when you don't merge lines the maximum number of lines merged per event is 1 (or zero, as the case may be).
