Splunk Search

How do I edit my regular expression to extract the file path from my sample data?

bworrellZP
Communicator

Hello,

I'm trying to set up a field extraction to get the file path from a log source. The raw data looks like this:

file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll 

I set up a field extraction that looks like this: file_path: (?P[A-Z]:\\[A-Za-z\\0-9\s]+....)

Testing looks okay, but when we go to the field in a search, it comes up like this.

\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll

How do I adjust the regex to drop the leading \\?\ prefix?

Also, how do I adjust it for longer or shorter paths?

0 Karma

bworrellZP
Communicator

somesoni2,

Any suggestions on how to adjust for the (x86) issue?

Thanks

0 Karma

gokadroid
Motivator

Depending on whether your data contains file_path= or file_path:, try the regex below, which saves the path in a field called actualPath:

your query to return events
| rex "file_path(\=|\:)\s*(?<deleteThis>[^\w]+)(?<actualPath>[\S]+)"
| table deleteThis, actualPath
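
For anyone who wants to check this pattern outside Splunk, here is a minimal Python sketch. It assumes Python's `re` module treats this pattern the same way Splunk's regex engine does, and it rewrites the named groups as `(?P<name>...)` because Python does not accept `(?<name>...)`. The `extract` helper is hypothetical, and the quoted sample is abbreviated from the (x86) event posted elsewhere in this thread.

```python
import re

# gokadroid's pattern, with named groups in Python's (?P<name>...) spelling.
PATTERN = re.compile(
    r"file_path(\=|\:)\s*(?P<deleteThis>[^\w]+)(?P<actualPath>[\S]+)"
)

def extract(event):
    """Return (deleteThis, actualPath), or None when the pattern does not match."""
    m = PATTERN.search(event)
    return (m.group("deleteThis"), m.group("actualPath")) if m else None

# Unquoted sample from the original question.
unquoted = r"file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll"
# Quoted sample, abbreviated from the (x86) event in this thread.
quoted = r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe" sha256=a7'

print(extract(unquoted))
print(extract(quoted))
```

On the unquoted sample the prefix lands in deleteThis and the full path in actualPath. On the quoted (x86) value, `[\S]+` stops at the first space, so only `C:\Program` is captured.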

See extraction here

0 Karma

somesoni2
SplunkTrust

Give this a try

your base search | rex "file_path=.+(?P<filepath>[A-Z]:(\\\)[A-z0-9\._\s-]+)"

See this run-anywhere sample search:

| gentimes start=-1 | eval _raw="file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll" | table _raw | rex "file_path=.+(?P<filepath>[A-Z]:(\\\)[A-z0-9\._\s-]+)"
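
A rough Python check of the same pattern, in case it helps with the (x86) question raised in this thread. One assumption is labeled in the code: the `(\\\)` inside the quoted rex string is taken to reach the regex engine as a single literal backslash. The `filepath` helper is hypothetical.

```python
import re

# ASSUMPTION: "(\\\)" in the rex string is seen by the regex engine as (\\),
# i.e. one literal backslash after the drive letter and colon.
PATTERN = re.compile(r"file_path=.+(?P<filepath>[A-Z]:(\\)[A-z0-9\._\s-]+)")

def filepath(event):
    """Return the captured filepath group, or None when nothing matches."""
    m = PATTERN.search(event)
    return m.group("filepath") if m else None

sample = r"file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll"
x86 = r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe"'

print(repr(filepath(sample)))
print(repr(filepath(x86)))
```

The range `[A-z]` also covers the ASCII characters between `Z` and `a`, including the backslash, which is why the path separators match. Parentheses are not in the class, so the capture ends just before `(x86)`.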
0 Karma

somesoni2
SplunkTrust

Let's try this regex (in conf files):

file_path=.*(?P<filepath>[A-Z]:[^\.]+\.\w+)\"

With the rex command:

your base search | rex "file_path=.*(?P<filepath>[A-Z]:[^\.]+\.\w+)\""
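
A small Python sketch of what this pattern accepts and rejects, assuming Python's `re` behaves like Splunk's engine here. The `filepath` helper is hypothetical, and the quoted sample is abbreviated from the (x86) event in this thread.

```python
import re

# The conf-file form of the pattern: drive letter, everything up to the
# first dot, an extension, then a closing double quote.
PATTERN = re.compile(r'file_path=.*(?P<filepath>[A-Z]:[^\.]+\.\w+)\"')

def filepath(event):
    """Return the captured filepath group, or None when nothing matches."""
    m = PATTERN.search(event)
    return m.group("filepath") if m else None

quoted = r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe" sha256=a7'
unquoted = r"file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll"

print(filepath(quoted))
print(filepath(unquoted))
```

The quoted value is captured in full, including `(x86)`. The unquoted sample returns nothing: it has no closing quote, and `[^\.]+` cannot cross the dot in the `nsf9A28.tmp` directory name. Either reason alone is enough to block the match.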
0 Karma

bworrellZP
Communicator

This works on some events, but breaks some of the ones that worked with the other regex.

I think it's because several message types fill in this field.

Is there a way to do multiple extractions on one field, give them two different names, and then combine them in a table?

0 Karma

somesoni2
SplunkTrust

For the paths where the new regex ( file_path=.*(?P<filepath>[A-Z]:[^\.]+\.\w+)\" ) is not working, do they have a file name or just a folder name? Do you have a sample for which it didn't work?

0 Karma

bworrellZP
Communicator

alt text

Here is a screenshot; you can see the "filepath" dir compared to the "file_path" which we are extracting.

Some of the failing ones worked with the previous extraction. I'm not sure why, though.

0 Karma

somesoni2
SplunkTrust

I guess the picture upload is broken.

0 Karma

bworrellZP
Communicator

Trying again
alt text

Here is a direct link https://1drv.ms/i/s!AjeD4bQcnKDim1X7MYMgE4LLAAtl

0 Karma

somesoni2
SplunkTrust

Hope this one fixes everything.

file_path=[^A-Za-z]*(?P<filepath>[^\"]+)

With the rex command:

 your base search | rex "file_path=[^A-Za-z]*(?P<filepath>[^\"]+)"
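
One thing to watch with this pattern is an empty file_path value. The Python sketch below assumes Python's `re` matches the way Splunk does for this pattern; the `filepath` helper is hypothetical, and the events are shortened versions of samples posted in this thread.

```python
import re

# Skip any leading non-letters, then capture everything up to the next quote.
PATTERN = re.compile(r'file_path=[^A-Za-z]*(?P<filepath>[^\"]+)')

def filepath(event):
    """Return the captured filepath group, or None when nothing matches."""
    m = PATTERN.search(event)
    return m.group("filepath") if m else None

prefixed = r'file_path="\\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07961b79.003" sha256=1b parent_fname=""'
scan = 'file_path="Flash Scan" sha256="" file_size=0'
empty = 'file_path="" sha256=dd file_size=0 parent_fname=""'

for event in (prefixed, scan, empty):
    print(repr(filepath(event)))
```

The first two events give the expected values. For the empty value, `[^A-Za-z]*` consumes both quotes and the following space, so the capture runs into the next fields and returns `sha256=dd file_size=0 parent_fname=`.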
0 Karma

somesoni2
SplunkTrust

This regex works for both types of entries.

file_path=.*(?P<filepath>[A-Z]:[^\"]+)

See the run-anywhere sample:

| gentimes start=-1 | eval name="rec_type=125 rec_type_simple=\"MALWARE EVENT\" event_sec=1481920232 agent_uuid=771335d1-1070-43a5-aba6-d5d2d6eb06e7 cloud=\"US Cloud\" type=1090519054 subtype=34 detector=SHA detection=W32.A78962E3EB-100.SBX.VIOC agent_user=ejones@ZOTECNET file_name=TBNotifier.exe file_path=\"C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe\" sha256=a78962e3ebde2876ba49ba646207c622e7dd4e66b0222108be06b6c49de5ab22 file_size=1928776 file_type=HTML file_ts=1478726223 parent_fname=\"\" parent_sha256=\"\" event_description=\"\" sensor=0 instance_id=0 connection_id=1017 connection_sec=1481920231 direction=0 src_ip=10.0.0.118 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri=\"\" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0##rec_type=125 rec_type_simple=\"MALWARE EVENT\" event_sec=1481920232 agent_uuid=771335d1-1070-43a5-aba6-d5d2d6eb06e7 cloud=\"US Cloud\" type=1090519054 subtype=Execute detector=SHA detection=W32.A78962E3EB-100.SBX.VIOC agent_user=\"SYSTEM@NT AUTHORITY\" file_name=TBNotifier.exe file_path=\"\\?\C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe\" sha256=a78962e3ebde2876ba49ba646207c622e7dd4e66b0222108be06b6c49de5ab22 file_size=1928776 file_type=HTML file_ts=1478726223 parent_fname=apnmcp.exe parent_sha256=b69749726c16e54fc2ec448748dba5136c412ee5a70443b559db89406ba811cb event_description=\"\" sensor=0 instance_id=0 connection_id=1016 connection_sec=1481920231 direction=0 src_ip=10.0.0.118 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri=\"\" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0" | table name | makemv name delim="##" | mvexpand name | rename name as _raw | rex 
"file_path=.*(?P<filepath>[A-Z]:[^\"]+)"
0 Karma

bworrellZP
Communicator

alt text

So close. Not sure why some are not getting the correct dir.

0 Karma

somesoni2
SplunkTrust

Can you share samples of logs where the field is not being extracted properly?

0 Karma

bworrellZP
Communicator
    rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484593326 agent_uuid=2c57a94e-6758-4ef2-9598-dda4ba314c2a cloud="US Cloud" type=553648143 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="\\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07961b79.003" sha256=1b89b0631d931d2f8cfe42ffb0a932cf3035c79700bb8f77c2de824defe114b2 file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Detection ID: 6376279778734899202" sensor=0 instance_id=0 connection_id=77 connection_sec=1484593334 direction=0 src_ip=192.168.1.106 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484593326 agent_uuid=2c57a94e-6758-4ef2-9598-dda4ba314c2a cloud="US Cloud" type=553648143 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07955824.002" sha256=90f5cd7d989973f12e6c494f6e25f60ef2822d81506b209c8a431c2a76687fca file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Detection ID: 6376279778734899201" sensor=0 instance_id=0 connection_id=76 connection_sec=1484593333 direction=0 src_ip=192.168.1.106 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484326304 agent_uuid=50a7aa9b-4a62-440a-bbfb-d30183df85f6 cloud="US Cloud" type=554696715 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="Flash Scan" sha256="" file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Scan ID: 193784, scanned directories: 0, scanned files: 3715, scanned processes: 111" sensor=0 instance_id=0 connection_id=71 connection_sec=1484326305 direction=0 src_ip=:: dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484314921 agent_uuid=09582dbf-1a4c-476a-9114-85765a6f8da1 cloud="US Cloud" type=2164260880 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="" sha256=dd21fcb1dbd5ff927b3ded134f9f7081bddf9aad6d46508cef9a4add93d7c581 file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Detection ID: 6375084042659823622, error code: 3221225524" sensor=0 instance_id=0 connection_id=67 connection_sec=1484314906 direction=0 src_ip=10.0.0.61 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484314921 agent_uuid=09582dbf-1a4c-476a-9114-85765a6f8da1 cloud="US Cloud" type=553648143 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d26da2d1378c61.001" sha256=dd21fcb1dbd5ff927b3ded134f9f7081bddf9aad6d46508cef9a4add93d7c581 file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Detection ID: 6375084038364856322" sensor=0 instance_id=0 connection_id=62 connection_sec=1484314905 direction=0 src_ip=10.0.0.61 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0
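
A possible explanation for these samples, checked in Python rather than Splunk, so treat it as a guess and not something confirmed in this thread. With the pattern file_path=.*(?P<filepath>[A-Z]:[^\"]+), the greedy `.*` prefers the last uppercase letter followed by a colon, and "Detection ID:" supplies one. The `anchored` pattern is a hypothetical alternative that assumes the value is always enclosed in double quotes; the event is shortened from the first sample above.

```python
import re

# Pattern from the earlier reply: greedy .* backtracks from the end of the event.
GREEDY = re.compile(r'file_path=.*(?P<filepath>[A-Z]:[^\"]+)')

# Hypothetical alternative: stay inside the quotes of file_path and drop an
# optional \\?\ (or \?\) prefix. ASSUMES the value is always double-quoted.
ANCHORED = re.compile(r'file_path="(?:\\+\?\\)?(?P<filepath>[^"]*)"')

def greedy(event):
    m = GREEDY.search(event)
    return m.group("filepath") if m else None

def anchored(event):
    m = ANCHORED.search(event)
    return m.group("filepath") if m else None

event = (
    r'file_path="\\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07961b79.003" '
    r'sha256=1b event_description="Detection ID: 6376279778734899202" sensor=0'
)

print(greedy(event))    # taken from event_description, not from file_path
print(anchored(event))  # the quarantine path without the prefix
```

In this sketch the greedy pattern returns `D: 6376279778734899202`, while the anchored one returns the quarantine path. The anchored form also returns `Flash Scan` for that event type and an empty string for file_path="".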

0 Karma

bworrellZP
Communicator

Were these enough, or do I need to get some more?

0 Karma

bworrellZP
Communicator

Oh, I noticed one thing when testing, which I am not sure about. Some directories contain (x86), for example. Your rex was perfect for the rest, but it dropped that part. My attempts to get it back when it's there did not work. Is that due to the ()? Is it a rex issue?

0 Karma

somesoni2
SplunkTrust

Can you provide some sample entries where it's failing?

0 Karma

bworrellZP
Communicator

Here is the raw event where the (x86) appears.

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1481920232 agent_uuid=771335d1-1070-43a5-aba6-d5d2d6eb06e7 cloud="US Cloud" type=1090519054 subtype=34 detector=SHA detection=W32.A78962E3EB-100.SBX.VIOC agent_user=ejones@ZOTECNET file_name=TBNotifier.exe file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe" sha256=a78962e3ebde2876ba49ba646207c622e7dd4e66b0222108be06b6c49de5ab22 file_size=1928776 file_type=HTML file_ts=1478726223 parent_fname="" parent_sha256="" event_description="" sensor=0 instance_id=0 connection_id=1017 connection_sec=1481920231 direction=0 src_ip=10.0.0.118 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

Here is what it shows.

file_path
C:\Program Files

Thanks

0 Karma

somesoni2
SplunkTrust

If the value of file_path is always enclosed in double quotes, try this:

your base search | rex "file_path=.*(?P<filepath>[A-Z]:(\\\)[^\"]+)"
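
A quick Python check of this version against two event shapes from the thread. As before, it assumes `(\\\)` in the rex string reaches the regex engine as one literal backslash, and that Python's `re` matches the way Splunk does here. The `filepath` helper is hypothetical and the events are shortened.

```python
import re

# ASSUMPTION: "(\\\)" in the rex string is seen by the regex engine as (\\).
PATTERN = re.compile(r'file_path=.*(?P<filepath>[A-Z]:(\\)[^\"]+)')

def filepath(event):
    """Return the captured filepath group, or None when nothing matches."""
    m = PATTERN.search(event)
    return m.group("filepath") if m else None

x86 = r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe" sha256=a7 event_description=""'
detection = (
    r'file_path="\\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07961b79.003" '
    r'event_description="Detection ID: 6376279778734899202"'
)
scan = 'file_path="Flash Scan" sha256=""'

for event in (x86, detection, scan):
    print(filepath(event))
```

Because `[^\"]+` accepts anything except a double quote, the parentheses in `(x86)` are kept. Requiring a backslash right after the colon also keeps "Detection ID: ..." from matching. Values that are not drive paths, such as `Flash Scan`, produce no filepath at all with this pattern.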
0 Karma

bworrellZP
Communicator

This is my search string:

sourcetype=cisco:sourcefire rec_type_simple="MALWARE EVENT" | rex "file_path=.*(?P<filepath>[A-Z]:(\\\)[^\"]+)" | stats count by file_path

0 Karma