
Parse XML files that are larger than 900 KB

SplunkCSIT
Communicator

I have a lot of XML files that are larger than 900 KB, and I only want to index the first few tags of each file. If an XML file is under 20 KB, the transforms below work, but if the file is over 50 KB, they do not. Is there an alternative? The only one I can think of is to write a batch script that removes the body and content tags from the XML file before it goes through Splunk, but I don't really know how to write a batch script. Any easier suggestions? Thanks.

props.conf

[xmlFilter1]

KV_MODE = xml
BREAK_ONLY_BEFORE = <xml>
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = 1
pulldown_type = 1
TRUNCATE = 1000000000000
MAX_EVENTS = 1000000000000
TRANSFORMS-test1 = body, content

transforms.conf

[body]
LOOKAHEAD = 1000000000000
SOURCE_KEY=_raw
REGEX=(.*?)\<body\>.*?\</body\>(.*)
DEST_KEY=_raw
FORMAT=$1<body>####</body>$2

[content]

LOOKAHEAD = 1000000000000
SOURCE_KEY=_raw
REGEX=(.*?)\<content\>.*?\</content\>(.*)
DEST_KEY=_raw
FORMAT=$1<content>*******</content>$2
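The preprocessing idea from the question does not need a batch file; a short Python script can do it. Below is a minimal sketch, assuming the elements are literally named body and content with no XML namespace (namespaced tags would show up as "{uri}body") and that each file fits in memory, which a 900 KB file does. The names mask_tree, mask_xml_string and mask_xml_file are hypothetical, and the placeholders are the ones from the FORMAT lines above. The masked copy would go to a directory Splunk monitors instead of the original.

```python
import sys
import xml.etree.ElementTree as ET

# Element name -> placeholder text (taken from the FORMAT strings above).
# Assumption: tags are plain "body" and "content" with no namespace.
MASKED_TAGS = {"body": "####", "content": "*******"}


def mask_tree(root: ET.Element) -> None:
    """Replace the contents of every masked element in place."""
    # Snapshot the elements first so removing children does not disturb iteration.
    for elem in list(root.iter()):
        placeholder = MASKED_TAGS.get(elem.tag)
        if placeholder is not None:
            # Drop nested elements (and their tail text) inside the masked element.
            for child in list(elem):
                elem.remove(child)
            elem.text = placeholder


def mask_xml_string(xml_text: str) -> str:
    """Return the XML with masked elements emptied out."""
    root = ET.fromstring(xml_text)
    mask_tree(root)
    return ET.tostring(root, encoding="unicode")


def mask_xml_file(in_path: str, out_path: str) -> None:
    """Write a masked copy of in_path to out_path (hypothetical paths)."""
    tree = ET.parse(in_path)
    mask_tree(tree.getroot())
    tree.write(out_path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python mask_xml.py input.xml output.xml
    mask_xml_file(sys.argv[1], sys.argv[2])
```

Parsing the XML instead of using a regex means nested markup and line breaks inside body or content are handled without any LOOKAHEAD or TRUNCATE tuning.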

SplunkCSIT
Communicator

Hi,
You can ignore the XML above; this is the sample XML. I wish to mask the content of the <body> and <content> tags. What is the recommended regex? Thanks.

123 456 Not to be forward to indexer 333 not to be forwardthis is to be validfsffggetewrwerwerwewff
1.tetette. 2rerere ererererer3.erererefr 23/4/2014 12:23:232 23/4/2014 12:24:232
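One thing worth checking in the regex itself: in PCRE (which Splunk uses for REGEX) and in Python's re, "." does not match a newline unless the (?s) flag is set, so <body>.*?</body> cannot match a body that spans several lines. The sketch below tests the [body] pattern offline, assuming Python's re handles lazy quantifiers and (?s) the same way PCRE does for this pattern. apply_transform is a hypothetical helper that imitates FORMAT=$1<body>####</body>$2 with DEST_KEY=_raw, and the sample event is made up.

```python
import re
from typing import Optional

# Pattern from the [body] stanza. The "\<" and "\>" escapes are dropped because
# they only match a literal "<" or ">" anyway.
BODY_AS_WRITTEN = r"(.*?)<body>.*?</body>(.*)"
# Same pattern with the (?s) flag, so "." also matches newline characters.
BODY_DOTALL = r"(?s)(.*?)<body>.*?</body>(.*)"


def apply_transform(pattern: str, raw: str, placeholder: str = "####") -> Optional[str]:
    """Imitate REGEX + FORMAT=$1<body>####</body>$2 writing to _raw.

    Returns the rewritten event, or None when the pattern does not match
    (the event would then be left unchanged).
    """
    m = re.search(pattern, raw)
    if m is None:
        return None
    return f"{m.group(1)}<body>{placeholder}</body>{m.group(2)}"


if __name__ == "__main__":
    # Hypothetical event whose <body> spans two lines.
    event = "<doc>\n<id>123</id>\n<body>line one\nline two</body>\n<tail>ok</tail>\n</doc>"
    print(apply_transform(BODY_AS_WRITTEN, event))  # prints None: no match without (?s)
    print(apply_transform(BODY_DOTALL, event))
```

The [content] stanza would need the same (?s) prefix. This only checks the pattern; it does not reproduce Splunk's line breaking or the LOOKAHEAD limit, which still decide how much of the event the regex gets to see.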


lguinn2
Legend

One suggestion: in props.conf, set TRUNCATE = 0, which disables line truncation entirely.


lguinn2
Legend

It would help if we could see a bit of the file (you could remove the content and just show the tags with a little gibberish between them).
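Producing such a sample can be automated. A minimal sketch using only the Python standard library: it keeps every tag and attribute name but replaces non-blank text and attribute values with filler, so the structure can be shared without the real content. The skeleton function and the "xxx" filler are hypothetical choices; XML comments are not kept because ElementTree's default parser skips them.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Union


def skeleton(xml_text: Union[str, bytes], filler: str = "xxx") -> str:
    """Keep every tag and attribute name, replace all non-blank text with filler."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Text directly inside the element.
        if elem.text and elem.text.strip():
            elem.text = filler
        # Text that follows the element's closing tag (inside its parent).
        if elem.tail and elem.tail.strip():
            elem.tail = filler
        # Attribute values can also carry sensitive data.
        for name in elem.attrib:
            elem.attrib[name] = filler
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python skeleton.py input.xml  (reads bytes so any encoding declaration is honored)
    with open(sys.argv[1], "rb") as f:
        print(skeleton(f.read()))
```

Whitespace-only text is left alone, so the original indentation survives and the output stays readable.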
