How do I fix my field extraction to account for whitespace in some paths?

tkw03
Communicator

Hello,

I have some data in a .txt file that I'm writing field extractions for. The extractions work fine, except that some of the paths contain spaces, which throws off the rest of the extractions.

For example, this works just fine:

Type      AppliesTo  Path                                            Snap  Hard    Soft  Adv     Used    
---------------------------------------------------------------------------------------------------------
directory DEFAULT    /place/here2/test                                  No    1.00G   -     990.00M 12      

However, this does not:

Type      AppliesTo  Path                                            Snap  Hard    Soft  Adv     Used    
---------------------------------------------------------------------------------------------------------

directory DEFAULT    /place/here/fileservers/host16/App Management No    100.00G -     98.00G  90.073G 

Because of the space in the path ("App Management"), the extractions after the Path field don't work.
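A rough Python illustration of the symptom, assuming the whitespace delimiter treats every run of spaces as a separator (this is plain str.split(), not Splunk's own delimiter handling): the header has eight names, but the problem row splits into nine tokens, so every value after "App" slides one column to the left and the last value is lost. The function name pair_by_whitespace is hypothetical.

```python
# Header and problem row copied from the sample above.
HEADER = "Type      AppliesTo  Path                                            Snap  Hard    Soft  Adv     Used    "
ROW = "directory DEFAULT    /place/here/fileservers/host16/App Management No    100.00G -     98.00G  90.073G "


def pair_by_whitespace(header: str, row: str) -> dict:
    """Pair header names with row values by splitting both on runs of whitespace."""
    names = header.split()
    values = row.split()
    # zip() stops at the shorter list, so the surplus value is silently dropped.
    return dict(zip(names, values))


print(len(HEADER.split()), "names vs", len(ROW.split()), "values")
for name, value in pair_by_whitespace(HEADER, ROW).items():
    print(f"{name:<10} {value}")
```

The first sample row has no space in its path, so it splits into exactly eight tokens and lines up, which matches what you're seeing.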

Here are my props.conf settings:

[ storage:data ]
CHARSET=UTF-8
DATETIME_CONFIG=CURRENT
FIELD_DELIMITER=whitespace
HEADER_FIELD_LINE_NUMBER=1
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=null
SEDCMD-removeDash=s/---------------------------------------------------------------------------------------------------------//g
SEDCMD-removeDash2=s/^-.*$//g
SHOULD_LINEMERGE=false
disabled=false
pulldown_type=true

I suppose the issue is using whitespace as the delimiter, but if I don't use it, I don't get any field extractions at all. Any ideas?

atownson
Explorer

Give the configuration below a shot. You'll need to check the line breaking (LINE_BREAKER) to verify that events are broken properly, and you'll need to list every possible value of the 'Type' field, separated by pipes, in the regular expression (EXTRACT). I've listed 'directory' and 'file'. This should give you the correct search-time field extractions.

[storage:data]
CHARSET=UTF-8
DATETIME_CONFIG=CURRENT
LINE_BREAKER=([\r\n]+) *Type +
NO_BINARY_CHECK=null
SHOULD_LINEMERGE=false
disabled=false
pulldown_type=true
# Lazy .+? keeps the column padding before Snap out of the Path value
EXTRACT-data=^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+?) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) *$
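One way to sanity-check EXTRACT-data outside Splunk is to run the same pattern in Python against the sample rows. This is a sketch, not how Splunk evaluates it: Python's re module spells named groups (?P<name>...) instead of PCRE's (?<name>...), the Path group uses a lazy .+? so it stops before the padding spaces (a greedy .+ would keep all but one of them), and it assumes each data row is its own event. The extract helper is hypothetical.

```python
import re

# EXTRACT-data rewritten for Python's re module. The trailing " *$" anchor pins
# Snap, Hard, Soft, Adv and Used to the last five space-free tokens; the lazy
# .+? lets Path keep its internal space but not the padding before Snap.
EXTRACT_DATA = re.compile(
    r"^ *(?P<Type>directory|file) +(?P<AppliesTo>[^ ]+) +(?P<Path>.+?)"
    r" +(?P<Snap>[^ ]+) +(?P<Hard>[^ ]+) +(?P<Soft>[^ ]+)"
    r" +(?P<Adv>[^ ]+) +(?P<Used>[^ ]+) *$"
)

# Sample rows from the question.
ROWS = [
    "directory DEFAULT    /place/here2/test                                  No    1.00G   -     990.00M 12      ",
    "directory DEFAULT    /place/here/fileservers/host16/App Management No    100.00G -     98.00G  90.073G ",
]


def extract(row: str):
    """Return the named fields as a dict, or None if the row does not match."""
    match = EXTRACT_DATA.match(row)
    return match.groupdict() if match else None


for row in ROWS:
    print(extract(row))
```

Because all five trailing columns are required, a row with fewer than five values after the path (like the shorter records posted later in this thread) returns None rather than a partial result.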

For a clustered environment:

props.conf on indexers:

[storage:data]
CHARSET=UTF-8
DATETIME_CONFIG=CURRENT
LINE_BREAKER=([\r\n]+) *Type +
NO_BINARY_CHECK=null
SHOULD_LINEMERGE=false
disabled=false
pulldown_type=true

props.conf on search heads:

[storage:data]
# Lazy .+? keeps the column padding before Snap out of the Path value
EXTRACT-data=^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+?) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) *$
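To check the line breaking as suggested above, a rough Python emulation of LINE_BREAKER can help. It follows the documented rule that the text matched by the first capture group is discarded, the previous event ends where that group starts, and the next event begins where it ends; it ignores everything else Splunk does at index time (truncation, timestamping, SEDCMD). The two-table RAW string and the break_events helper are hypothetical, since the real file layout isn't shown.

```python
import re

# Hypothetical file contents: two tables, each with a header, a dash line and one row.
RAW = (
    "Type      AppliesTo  Path                Snap  Hard    Soft  Adv     Used\n"
    "--------------------------------------------------------------------------\n"
    "directory DEFAULT    /place/here2/test   No    1.00G   -     990.00M 12\n"
    "Type      AppliesTo  Path                Snap  Hard    Soft  Adv     Used\n"
    "--------------------------------------------------------------------------\n"
    "directory DEFAULT    /place/here/fileservers/host16/App Management No    100.00G -     98.00G  90.073G\n"
)


def break_events(raw: str, breaker: re.Pattern) -> list:
    """Split raw text the way LINE_BREAKER is documented: an event ends where
    capture group 1 starts, the next begins where it ends, and the text of
    group 1 itself is dropped. Empty pieces are discarded."""
    events = []
    start = 0
    for match in breaker.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return [event for event in events if event]


answer_events = break_events(RAW, re.compile(r"([\r\n]+) *Type +"))
per_line_events = break_events(RAW, re.compile(r"([\r\n]+)"))

print(len(answer_events), "events with the answer's LINE_BREAKER")
print(len(per_line_events), "events with the original ([\\r\\n]+)")
```

With the answer's breaker, each event is a whole table starting at its Type header line, while the original ([\r\n]+) yields one event per line. That is worth checking against EXTRACT-data: if Splunk applies the pattern without a multiline flag (PCRE's default), ^ and $ anchor to the start and end of the whole event, so the pattern would only match an event that is a single data row.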

tkw03
Communicator

Question: if a field doesn't exist in a log record, is there a way to force that field to extract as nothing, i.e. to be blank?

Sometimes I have a record like this:
directory DEFAULT /ifs/home/home/T/TLO11 No 1.00G 12

And sometimes it's like this:
directory DEFAULT /ifs/home/departments/o56/Dev No 1.00G 921.60M 2.55M

0 Karma

_Tom
Explorer

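If you want the field to be kept with an empty value, set "KEEP_EMPTY_VALS = true" in the extraction's stanza in transforms.conf (it applies to transforms referenced with REPORT- in props.conf, not to inline EXTRACT- settings).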