Getting Data In

Issues with XML Linebreaker

avoelk
Communicator

Hello fellow splunkers,

Right now I'm working through the 7 labs for SE II, which have to be completed before I can take the final accreditation quiz. I've finished 5 of them so far but am totally lost with lab 6. The instructions are:

- events should begin with <Interceptor> and end with </Interceptor> (so Linebreaking is needed)

- Extract (at search time) all fields and values in between the Interceptor lines and throw away any of the header lines before the first <Interceptor> and the line after the very last </Interceptor>

- Use the ActionDate and ActionTime field as the timestamp

- have Splunk auto extract the fields and values

 

How they say I'll know I've done it:

- I'll have x amount of events and the fields broken out using SPATH notation

- the correct timestamp

- no text before the first and after the last Interceptor

 

What I have so far:

- I'm able to extract ActionDate and ActionTime to create a new timestamp

- I'm able to linebreak with LINE_BREAKER = \<Interceptor\>()
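
For anyone who wants to check their own result against the lab goals, here is a rough Python sketch of the target outcome. It only imitates the desired split with Python's re module; it is not how Splunk evaluates LINE_BREAKER, and the two-event sample is a shortened, assumed stand-in for the lab file.

```python
import re

# Shortened sample in the same shape as the lab data (two events only).
data = (
    '<?xml version="1.0" encoding="UTF-8" ?><dataroot>'
    '<Interceptor><Outcome>Interdiction</Outcome>'
    '<ActionDate>2013-04-24</ActionDate><ActionTime>00:07:00</ActionTime></Interceptor>'
    '<Interceptor><Outcome>Interdiction</Outcome>'
    '<ActionDate>2013-04-26</ActionDate><ActionTime>00:23:00</ActionTime></Interceptor>'
    '</dataroot>'
)

# Non-greedy match from each opening tag to its closing tag; the header
# before the first event and the footer after the last one are not matched.
events = re.findall(r'<Interceptor>.*?</Interceptor>', data, flags=re.DOTALL)

for event in events:
    print(event)
```

Each printed line is one event that begins with <Interceptor> and ends with </Interceptor>, which is the shape the lab asks for.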

 

My Issue:

- After I get the linebreak working, I save the new sourcetype and then try to change it further for the remaining tasks, like extracting the timestamp or deleting the header text. But when I change ANYTHING, it disregards the LINE_BREAKER setting and goes back to being one huge event, and I can't do anything about it.

- Even if I could linebreak and extract everything as stated, I don't really understand what they mean by "broken out using SPATH notation". Do they mean via SPL? Because they clearly stated that Splunk should "auto extract the fields and values".

How the data looks:

<?xml version="1.0" encoding="UTF-8" ?><dataroot><Interceptor><AttackCoords>-80.33100097073213,25.10742916222947</AttackCoords><Outcome>Interdiction</Outcome><Infiltrators>23</Infiltrators><Enforcer>Ironwood</Enforcer><ActionDate>2013-04-24</ActionDate><ActionTime>00:07:00</ActionTime><RecordNotes></RecordNotes><NumEscaped>0</NumEscaped><LaunchCoords>-80.23429525620114,24.08680387475695</LaunchCoords><AttackVessel>Rustic</AttackVessel></Interceptor><Interceptor><AttackCoords>-80.14622349209523,24.53605142362535</AttackCoords><Outcome>Interdiction</Outcome><Infiltrators>6</Infiltrators><Enforcer>Cunningham</Enforcer><ActionDate>2013-04-26</ActionDate><ActionTime>00:23:00</ActionTime><RecordNotes></RecordNotes><NumEscaped>0</NumEscaped><LaunchCoords></LaunchCoords><AttackVessel>Raft</AttackVessel></Interceptor><Interceptor><AttackCoords>-80.75496221688965,24.72483828554483</AttackCoords><Outcome>Interdiction</Outcome><Infiltrators>11</Infiltrators><Enforcer>Forthright</Enforcer><ActionDate>2013-05-15</ActionDate><ActionTime>23:35:00</ActionTime><RecordNotes></RecordNotes><NumEscaped>0</NumEscaped><LaunchCoords>-79.65932674368925,23.70743135623052</LaunchCoords><AttackVessel>Rustic</AttackVessel></Interceptor><Interceptor><AttackCoords>-80.32020594311533,25.02156920297054</AttackCoords><Outcome>Interdiction</Outcome><Infiltrators>6</Infiltrators><Enforcer>Pompano</Enforcer><ActionDate>2013-02-25</ActionDate><ActionTime>15:35:00</ActionTime><RecordNotes></RecordNotes><NumEscaped>0</NumEscaped><LaunchCoords></LaunchCoords><AttackVessel>Raft</AttackVessel></Interceptor><Interceptor><AttackCoords>-80.15149489716094,24.57412215015249</AttackCoords><Outcome>Interdiction</Outcome><Infiltrators>6</Infiltrators><Enforcer>Tripoteur</Enforcer><ActionDate>2013-04-13</ActionDate><ActionTime>15:40:00</ActionTime><RecordNotes></RecordNotes><NumEscaped>0</NumEscaped><LaunchCoords>-79.65999190070923,23.73619147168514</LaunchCoords><AttackVessel>Raft</AttackVessel></Interceptor></dataroot>

I hope someone can help me understand how to proceed here.

EDIT:

In lab 4 the input data was almost the same; the only difference is that in lab 6 it has no line breaks whatsoever. Here is my props.conf from lab 4:

[dreamcrusher]
DATETIME_CONFIG = 
FIELD_HEADER_REGEX = <Interceptor>
LINE_BREAKER = \<Interceptor\>
MAX_DAYS_AGO = 4000
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
REPORT-actiondate = actiondate
EVAL-_time = strptime(ActionDate +" " + ActionTime,"%Y-%m-%d %H:%M:%S")

and my transforms.conf:

#[actiondate]
#REGEX = \<ActionDate\>(?P<ActionDate>\d+-\d+-\d+)\<\/ActionDate\>\s*\<ActionTime\>(?P<ActionTime>\d+:\d+:\d+)
#FORMAT = $1::$2

 


avoelk
Communicator

Alright, so apparently the GUI is sometimes buggy when you try to change a sourcetype. To do more than just the linebreak, especially the deletion of the header, I did this:

Since it's one huge single-line event with no breaks, FIELD_HEADER_REGEX doesn't work here. What I did was:

props.conf

TRANSFORMS-t1 = extraction

transforms.conf

[extraction]
REGEX = \<\?xml\sversion="\d.\d"\sencoding="UTF-8"\s\?\>\<dataroot\>
DEST_KEY = queue
FORMAT = nullQueue

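This matches the XML declaration and the opening <dataroot> tag, i.e. everything before the first actual event. Keep in mind that routing to nullQueue discards the whole event that matches the regex, not just the matched text, so the header has to end up in an event of its own. For me this got rid of the <dataroot> text at both the beginning AND the end.

To extract the necessary ActionDate and ActionTime and combine them into a new timestamp, I did the following: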
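
To see exactly what the header pattern matches, it can be run outside Splunk. This is a minimal Python sketch, assuming Python's re module treats this pattern the same way Splunk's regex engine does; the header and footer strings are copied from the sample data.

```python
import re

# Pattern copied from the [extraction] stanza; Python's re is used here
# as a stand-in for Splunk's regex engine (an assumption).
header_re = re.compile(r'\<\?xml\sversion="\d.\d"\sencoding="UTF-8"\s\?\>\<dataroot\>')

header = '<?xml version="1.0" encoding="UTF-8" ?><dataroot>'
footer = '</dataroot>'

print(bool(header_re.search(header)))  # True
print(bool(header_re.search(footer)))  # False
```

On its own the pattern matches the opening header but not the closing </dataroot> tag, so it's worth confirming in the search results that the trailing text is really gone.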

props.conf

REPORT-actiondate = actiondate
EVAL-_time = strptime(ActionDate +" " + ActionTime,"%Y-%m-%d %H:%M:%S")

transforms.conf

[actiondate]
REGEX = \<ActionDate\>(?P<ActionDate>\d+-\d+-\d+)\<\/ActionDate\>\s*\<ActionTime\>(?P<ActionTime>\d+:\d+:\d+)
FORMAT = $1::$2
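
The regex and the time format can be checked together outside Splunk. This is a small Python sketch, assuming Python's re and datetime.strptime behave like their Splunk counterparts for this input; the event string is a shortened copy of the first sample event. One known difference: Splunk's strptime returns epoch seconds, while Python returns a datetime object.

```python
import re
from datetime import datetime

# Same pattern as the [actiondate] stanza.
pattern = re.compile(
    r'\<ActionDate\>(?P<ActionDate>\d+-\d+-\d+)\<\/ActionDate\>'
    r'\s*\<ActionTime\>(?P<ActionTime>\d+:\d+:\d+)'
)

# Shortened copy of the first event from the sample data.
event = (
    '<Interceptor><Enforcer>Ironwood</Enforcer>'
    '<ActionDate>2013-04-24</ActionDate><ActionTime>00:07:00</ActionTime>'
    '</Interceptor>'
)

# The named groups become the field names, as in the transform.
fields = pattern.search(event).groupdict()

# Same concatenation and format string as the EVAL-_time line.
ts = datetime.strptime(
    fields["ActionDate"] + " " + fields["ActionTime"], "%Y-%m-%d %H:%M:%S"
)

print(fields)
print(ts.isoformat())
```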

 

I still don't understand the part about "broken out using SPATH notation", so I figured I'd do it with, well, spath via SPL:

index=myindex sourcetype=mysourcetype |spath input=_raw path=

 

That did it for me.
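
As far as I understand it, spath names each field by its path inside the XML, so for these events the names should come out dotted, like Interceptor.ActionDate. Treat that naming as an assumption and compare it with your own search results. Here is a rough Python imitation of that flattening, using a shortened copy of one sample event:

```python
import xml.etree.ElementTree as ET

# Shortened copy of one event from the sample data.
event = (
    '<Interceptor><Outcome>Interdiction</Outcome>'
    '<Enforcer>Ironwood</Enforcer>'
    '<ActionDate>2013-04-24</ActionDate><ActionTime>00:07:00</ActionTime>'
    '<RecordNotes></RecordNotes></Interceptor>'
)

root = ET.fromstring(event)

# Build dotted "parent.child" names; empty elements become empty strings.
fields = {f"{root.tag}.{child.tag}": (child.text or "") for child in root}

for name, value in fields.items():
    print(f"{name}={value}")
```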

 
