
Help indexing XML CDRs

jerrad
Path Finder

I have been struggling to get these XML CDRs to index correctly in Splunk; events keep coming through with some of the data missing.

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>
<startDate>2009/11/10 06:54:51</startDate>
<callingNumber>xxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>v3631:9216=4;v5535:44=xxxxxxxxxxxxxx;v5535:48=0;v5535:24=3;v5535:7=xxxxxxxxxxxxxx;</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxx:StdFile:flatfile-12549597153198</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>

I would really like to create an event that spans from <record> through </record> and then move on to the next event. However, here and there the breaks land in the wrong place (sometimes an event contains only two lines), so one event may show

<created>Tue Nov 10 07:01:37 2009</created>
<userid>xxxxxxxxxxxxxx</userid>
<domain>xxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxx</radIP>
<userIP>xxxxxxxxxxxxxx</userIP>
<delta>44</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>18630</bytesIn>
<bytesOut>14050</bytesOut>
<packetsIn>47</packetsIn>
<packetsOut>45</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>0</proxyAcctIPAddr>
<proxyAcctAck>0</proxyAcctAck>
<termCause>1</termCause>
<clientIPAddr>xxxxxxxxxxxxxx</clientIPAddr>
<entityID>xxxxxxxxxxxxxx</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>F</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxx</clientID>
<sessionID>cdma_3553142430988069998</sessionID>
<nasID>xxxxxxxxxxxxxx</nasID>
<nasVendor>v</nasVendor>
<nasModel>xxxxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxxxxxxxx</nasPort>
<billingID>xxxxxxxxxxxxxx</billingID>

then the next event will show

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>

Instead of

<record>
<recId>cdma_8461599e2356401240238057235696109</recId>
..........
</record>

My props.conf

[aaaacct]
BREAK_ONLY_BEFORE=<recId>
MAX_EVENTS=200000
TIME_PREFIX = (?m)<startDate>

Does anyone have any suggestions on how to approach this problem?

Thanks

Jerrad

1 Solution

ziegfried
Influencer

I'd suggest using a custom LINE_BREAKER instead of the BREAK_ONLY_BEFORE option:

[aaaacct]
SHOULD_LINEMERGE = false
TRUNCATE = 500000
LINE_BREAKER = </record>(\s*)<record>
TIME_PREFIX = (?m)<startDate>
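
For readers who want to see the boundary semantics in action, here is a minimal Python sketch (illustration only, with made-up sample strings; this is not Splunk's actual pipeline): wherever the regex matches, the start of the first capture group marks the end of the previous event, and the end of that group marks the start of the next one.

import re

# Minimal sketch (illustration only, not Splunk internals) of how the
# first capture group in LINE_BREAKER defines event boundaries.
LINE_BREAKER = re.compile(r"</record>(\s*)<record>")

# Two back-to-back records with no whitespace between them, as in the CDR file.
raw = ("<record><recId>cdma_1</recId></record>"
       "<record><recId>cdma_2</recId></record>")

events = []
pos = 0
for m in LINE_BREAKER.finditer(raw):
    # The previous event ends where the capture group starts,
    # i.e. right after "</record>"...
    events.append(raw[pos:m.start(1)])
    # ...and the next event begins where the capture group ends,
    # i.e. at the following "<record>".
    pos = m.end(1)
events.append(raw[pos:])

for event in events:
    print(event)
# <record><recId>cdma_1</recId></record>
# <record><recId>cdma_2</recId></record>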



ziegfried
Influencer

It matches because of the * quantifier: \s* matches zero or more whitespace characters, so the expression also matches when there is nothing between the tags.
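
A quick way to convince yourself, using Python's re module purely for illustration (\s* carries the same meaning in Splunk's regex engine):

import re

# (\s*) also matches the empty string, so the breaker fires even when
# </record> is immediately followed by <record> with nothing in between.
m = re.search(r"</record>(\s*)<record>", "</record><record>")
print(m is not None, repr(m.group(1)))  # prints: True ''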


jerrad
Path Finder

This worked perfectly. The way I read the LINE_BREAKER docs confused me, but I think the magic lies in this part:

Wherever the regex matches, the start of the first matching group is considered the end of the previous event, and the end of the first matching group is considered the start of the next event.

However, I'm not sure how this group matched, since there isn't a space, tab, or line break between </record> and <record>.


ziegfried
Influencer

Yes, you're right. I've corrected it in the answer above.


gkanapathy
Splunk Employee
Splunk Employee

When you use SHOULD_LINEMERGE = false, you need to raise the event size limit by increasing TRUNCATE, e.g., TRUNCATE = 500000, instead of using MAX_EVENTS. MAX_EVENTS has no effect in that mode, since when you don't merge lines the maximum number of lines merged is one.
