Getting Data In

Help indexing XML CDRs

jerrad
Path Finder

I have been struggling to get these XML CDRs to index correctly in Splunk without missing some data from the events.

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
<startDate>2009/11/10 06:54:51</startDate>
<callingNumber>xxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>v3631:9216=4;v5535:44=xxxxxxxxxxxxxx;v5535:48=0;v5535:24=3;v5535:7=xxxxxxxxxxxxxx;</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxx:StdFile:flatfile-12549597153198</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>
<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
<startDate>2009/11/10 06:54:51</startDate>
<callingNumber>xxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>v3631:9216=4;v5535:44=xxxxxxxxxxxxxx;v5535:48=0;v5535:24=3;v5535:7=xxxxxxxxxxxxxx;</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxx:StdFile:flatfile-12549597153198</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>

I would really like to create an event that contains <record> thru to </record> and move on to the next event, however I get events that only contain two lines here and there so one event may show

<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>

then the next event will show

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>

Instead of

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
..........
</record>

My props.conf

[aaaacct]
BREAK_ONLY_BEFORE=<recId>
MAX_EVENTS=200000
TIME_PREFIX = (?m)<startDate>

Does anyone have any suggestions on how to approach this problem?

Thanks

Jerrad

Tags (1)
0 Karma
1 Solution

ziegfried
Influencer

I'd suggest using a custom LINE_BREAKER instead of the BREAK_ONLY_BEFORE option:

[aaaacct]
SHOULD_LINEMERGE = false
TRUNCATE = 500000
LINE_BREAKER = </record>(\s*)<record>
TIME_PREFIX = (?m)<startDate>

View solution in original post

ziegfried
Influencer

I'd suggest using a custom LINE_BREAKER instead of the BREAK_ONLY_BEFORE option:

[aaaacct]
SHOULD_LINEMERGE = false
TRUNCATE = 500000
LINE_BREAKER = </record>(\s*)<record>
TIME_PREFIX = (?m)<startDate>

ziegfried
Influencer

It matches because of the use of the *-quantifier. It matches 0 or more whitespaces - hence it matches if there's nothing between the tags.

0 Karma

jerrad
Path Finder

This worked perfectly, the way that I read the LINE_BREAKER docs confused me, but I think the magic lies in

Wherever the regex matches, the start of the first matching group is considered the end of the
previous event, and the end of the first matching group is considered the start of the next event.

However I'm not sure how this group matched since there isn't a space, tab or linebreak between and

0 Karma

ziegfried
Influencer

Yes, your're right. Corrected it in the answer above.

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

Instead of MAX_EVENTS, when you use SHOULD_LINEMERGE = false, you need to increase the event limit by increasing TRUNCATE, e.g., TRUNCATE = 500000. MAX_EVENTS will have no effect since when you don't merge lines, the maximum number of lines merged is 1. (Or zero or whatever.)

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Observability Simplified: Combining User Experience, Application Performance & ...

Tech Talk Observability Simplified: Combining User Experience, Application Performance & Network ...

Event Series May & June: From Network Visibility to Service Intelligence

Unifying the Network: Moving from Alert Noise to Service Intelligence with Splunk ITSI In today’s hybrid ...

Global Splunk User Group Events: May + June 2026

Your Splunk Community Awaits: Discover Upcoming User Group Events Worldwide    Staying ahead in the fast-paced ...