How do I edit my regular expression to extract the file path from my sample data?

Communicator

Hello,

I'm trying to set up a field extraction to get the file path from a log source. The raw data looks like this:

file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll 

I set up a field extraction that looks like this:

file_path: (?P[A-Z]:\\[A-Za-z\\0-9\s]+....)

Testing looks okay, but when we look at the field in a search, it comes up like this:

\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll

How do I adjust it to drop the \\?\ prefix?

Also, how do I adjust for longer or shorter paths?

Communicator

somesoni2,

Any suggestions on how to handle the (x86) issue?

Thanks

Motivator

Depending on whether your data contains file_path= or file_path:, try the regex below to save the path in a field called actualPath:

your query to return events
| rex "file_path(\=|\:)\s*(?<deleteThis>[^\w]+)(?<actualPath>[\S]+)"
| table deleteThis, actualPath
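
To see what this rex captures, the pattern can be replayed outside Splunk. The sketch below assumes Python's re module behaves like Splunk's PCRE engine for these constructs (Python needs the (?P<name>...) spelling for named groups); the helper name extract is made up for illustration, and the samples are values posted in this thread.

```python
import re

# Same pattern as the rex above; (?<name>...) is spelled (?P<name>...) for Python's re.
PATTERN = re.compile(r'file_path(\=|\:)\s*(?P<deleteThis>[^\w]+)(?P<actualPath>[\S]+)')


def extract(raw):
    """Hypothetical helper: return (deleteThis, actualPath) from the first match, or None."""
    m = PATTERN.search(raw)
    return (m.group('deleteThis'), m.group('actualPath')) if m else None


# Values posted in this thread.
unquoted = r'file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll'
quoted_x86 = r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe"'

# The leading run of non-word characters (\\?\ here) lands in deleteThis.
print(extract(unquoted))
# [\S]+ stops at the first space, so this path is cut short at "C:\Program".
print(extract(quoted_x86))
```

With the first post's value, the \\?\ prefix ends up in deleteThis as intended. Because [\S]+ cannot cross whitespace, a value containing spaces, such as C:\Program Files (x86)\..., is cut at the first space.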

Revered Legend

Give this a try

your base search | rex "file_path=.+(?P<filepath>[A-Z]:(\\\)[A-z0-9\._\s-]+)"

See this run-anywhere sample search:

| gentimes start=-1 | eval _raw="file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll" | table _raw | rex "file_path=.+(?P<filepath>[A-Z]:(\\\)[A-z0-9\._\s-]+)"
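
A quick check of this pattern, replayed with Python's re module as a stand-in for Splunk's PCRE engine (an assumption). It also assumes SPL's string unescaping turns the (\\\) in the quoted rex argument into (\\), a group that matches one literal backslash. Two details of the character class explain the results: the range A-z runs from ASCII 0x41 to 0x7A, which also covers [ \ ] ^ _ and the backtick, so the class happens to match the backslash separators in the path; and it contains no parentheses, so the capture ends just before (x86). The helper name filepath is hypothetical.

```python
import re

# Regex as the rex engine is assumed to receive it after SPL unescapes "\\\)" to "\\)".
PATTERN = re.compile(r'file_path=.+(?P<filepath>[A-Z]:(\\)[A-z0-9\._\s-]+)')


def filepath(raw):
    """Hypothetical helper: return the filepath group from the first match, or None."""
    m = PATTERN.search(raw)
    return m.group('filepath') if m else None


unquoted = r'file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll'
quoted_x86 = r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe"'

# The \\?\ prefix is skipped because the group must start at a drive letter.
print(repr(filepath(unquoted)))
# "(" is not in the class, so the match ends at "C:\Program Files " (trailing space included).
print(repr(filepath(quoted_x86)))
```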

Revered Legend

Let's try this regex (for conf files):

file_path=.*(?P<filepath>[A-Z]:[^\.]+\.\w+)\"

With the rex command:

your base search | rex "file_path=.*(?P<filepath>[A-Z]:[^\.]+\.\w+)\""
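
A sketch replaying this pattern with Python's re module (assumed to behave like Splunk's PCRE here; filepath is a hypothetical helper name) on values posted in this thread. [^\.]+ stops at the first dot after the drive letter, so that dot must start the file extension, and the extension must be followed directly by a closing double quote. The first post's unquoted value fails both conditions (its first dot belongs to the folder nsf9A28.tmp, and there is no closing quote), and a non-path value such as Flash Scan has no drive letter to anchor on.

```python
import re

# Conf-file form of the pattern; \" is a literal double quote in the regex.
PATTERN = re.compile(r'file_path=.*(?P<filepath>[A-Z]:[^\.]+\.\w+)\"')


def filepath(raw):
    """Hypothetical helper: return the filepath group from the first match, or None."""
    m = PATTERN.search(raw)
    return m.group('filepath') if m else None


quarantine = r'file_path="\\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07961b79.003"'
quoted_x86 = r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe"'
unquoted = r'file_path=\\?\C:\Windows\Temp\nsf9A28.tmp\System.dll'
not_a_path = 'file_path="Flash Scan"'

for raw in (quarantine, quoted_x86, unquoted, not_a_path):
    # The first dot after the drive letter must start the extension, followed by a quote.
    print(repr(filepath(raw)))
```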

Communicator

This works on some events, but breaks some of the ones that worked with the other regex.

I think it's because several message types populate this field.

Is there a way to run multiple extractions on one field, give them two different names, and then combine them in a table?

Revered Legend

For the paths where it's not working (with the new regex file_path=.*(?P<filepath>[A-Z]:[^\.]+\.\w+)\"), do they have a file name or just a folder name? Can you share a sample for which it didn't work?

Communicator

Here is a screenshot. You can see the "filepath" directory values, as compared to the "file_path" field, which we are extracting.

Some of the ones that are failing worked with the previous extraction. I'm not sure why, though.

Revered Legend

I guess the picture upload is broken.

Communicator

Trying again

Here is a direct link https://1drv.ms/i/s!AjeD4bQcnKDim1X7MYMgE4LLAAtl

Revered Legend

Hope this one fixes everything.

file_path=[^A-Za-z]*(?P<filepath>[^\"]+)

With the rex command:

 your base search | rex "file_path=[^A-Za-z]*(?P<filepath>[^\"]+)"
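
A minimal sketch of this pattern in Python's re module, assuming it behaves like Splunk's PCRE for these constructs (the \" escape is not needed in a Python raw string; filepath is a hypothetical helper). [^A-Za-z]* skips the opening quote and any \\?\ prefix up to the first letter, and [^\"]+ then runs to the closing quote, so spaces and (x86) are kept. One caveat shows up with the empty file_path="" values in the posted samples: the skip runs past both quotes and the following space, so the capture starts at the next field.

```python
import re

# Same pattern as the rex above; in Python the quote needs no escaping inside [^"].
PATTERN = re.compile(r'file_path=[^A-Za-z]*(?P<filepath>[^"]+)')


def filepath(raw):
    """Hypothetical helper: return the filepath group from the first match, or None."""
    m = PATTERN.search(raw)
    return m.group('filepath') if m else None


quarantine = r'file_path="\\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07961b79.003"'
quoted_x86 = r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe"'
# Shortened excerpt of a posted event whose file_path is empty.
empty = 'file_path="" sha256=dd21fcb1dbd5ff927b3ded134f9f7081bddf9aad6d46508cef9a4add93d7c581 file_size=0'

for raw in (quarantine, quoted_x86, empty):
    # For the empty value, the capture begins at "sha256=" instead of a path.
    print(repr(filepath(raw)))
```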

Revered Legend

This regex works for both types of entries:

file_path=.*(?P<filepath>[A-Z]:[^\"]+)

See the run-anywhere sample:

| gentimes start=-1 | eval name="rec_type=125 rec_type_simple=\"MALWARE EVENT\" event_sec=1481920232 agent_uuid=771335d1-1070-43a5-aba6-d5d2d6eb06e7 cloud=\"US Cloud\" type=1090519054 subtype=34 detector=SHA detection=W32.A78962E3EB-100.SBX.VIOC agent_user=ejones@ZOTECNET file_name=TBNotifier.exe file_path=\"C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe\" sha256=a78962e3ebde2876ba49ba646207c622e7dd4e66b0222108be06b6c49de5ab22 file_size=1928776 file_type=HTML file_ts=1478726223 parent_fname=\"\" parent_sha256=\"\" event_description=\"\" sensor=0 instance_id=0 connection_id=1017 connection_sec=1481920231 direction=0 src_ip=10.0.0.118 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri=\"\" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0##rec_type=125 rec_type_simple=\"MALWARE EVENT\" event_sec=1481920232 agent_uuid=771335d1-1070-43a5-aba6-d5d2d6eb06e7 cloud=\"US Cloud\" type=1090519054 subtype=Execute detector=SHA detection=W32.A78962E3EB-100.SBX.VIOC agent_user=\"SYSTEM@NT AUTHORITY\" file_name=TBNotifier.exe file_path=\"\\?\C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe\" sha256=a78962e3ebde2876ba49ba646207c622e7dd4e66b0222108be06b6c49de5ab22 file_size=1928776 file_type=HTML file_ts=1478726223 parent_fname=apnmcp.exe parent_sha256=b69749726c16e54fc2ec448748dba5136c412ee5a70443b559db89406ba811cb event_description=\"\" sensor=0 instance_id=0 connection_id=1016 connection_sec=1481920231 direction=0 src_ip=10.0.0.118 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri=\"\" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0" | table name | makemv name delim="##" | mvexpand name | rename name as _raw | rex 
"file_path=.*(?P<filepath>[A-Z]:[^\"]+)"

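Replaying this pattern on the file_path values written in the run-anywhere search, with Python's re module as a stand-in for Splunk's PCRE engine (an assumption; filepath is a hypothetical helper), shows that both entries yield the same path with (x86) intact. Because .* is greedy, the capture starts at the last capital letter followed by a colon that can begin a match; in these two values that is the drive letter.

```python
import re

PATTERN = re.compile(r'file_path=.*(?P<filepath>[A-Z]:[^"]+)')


def filepath(raw):
    """Hypothetical helper: return the filepath group from the first match, or None."""
    m = PATTERN.search(raw)
    return m.group('filepath') if m else None


# file_path values written in the run-anywhere search.
plain = r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe"'
prefixed = r'file_path="\\?\C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe"'

for raw in (plain, prefixed):
    # [^"]+ runs to the closing quote, so spaces and "(x86)" stay in the capture.
    print(filepath(raw))
```
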
Communicator

alt text

So close. I'm not sure why some are not getting the correct directory.

Revered Legend

Can you share samples of the logs where the field is not being extracted properly?

Communicator
    rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484593326 agent_uuid=2c57a94e-6758-4ef2-9598-dda4ba314c2a cloud="US Cloud" type=553648143 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="\\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07961b79.003" sha256=1b89b0631d931d2f8cfe42ffb0a932cf3035c79700bb8f77c2de824defe114b2 file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Detection ID: 6376279778734899202" sensor=0 instance_id=0 connection_id=77 connection_sec=1484593334 direction=0 src_ip=192.168.1.106 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484593326 agent_uuid=2c57a94e-6758-4ef2-9598-dda4ba314c2a cloud="US Cloud" type=553648143 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07955824.002" sha256=90f5cd7d989973f12e6c494f6e25f60ef2822d81506b209c8a431c2a76687fca file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Detection ID: 6376279778734899201" sensor=0 instance_id=0 connection_id=76 connection_sec=1484593333 direction=0 src_ip=192.168.1.106 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484326304 agent_uuid=50a7aa9b-4a62-440a-bbfb-d30183df85f6 cloud="US Cloud" type=554696715 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="Flash Scan" sha256="" file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Scan ID: 193784, scanned directories: 0, scanned files: 3715, scanned processes: 111" sensor=0 instance_id=0 connection_id=71 connection_sec=1484326305 direction=0 src_ip=:: dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484314921 agent_uuid=09582dbf-1a4c-476a-9114-85765a6f8da1 cloud="US Cloud" type=2164260880 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="" sha256=dd21fcb1dbd5ff927b3ded134f9f7081bddf9aad6d46508cef9a4add93d7c581 file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Detection ID: 6375084042659823622, error code: 3221225524" sensor=0 instance_id=0 connection_id=67 connection_sec=1484314906 direction=0 src_ip=10.0.0.61 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1484314921 agent_uuid=09582dbf-1a4c-476a-9114-85765a6f8da1 cloud="US Cloud" type=553648143 subtype=0 detector=0 detection="" agent_user="" file_name="" file_path="\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d26da2d1378c61.001" sha256=dd21fcb1dbd5ff927b3ded134f9f7081bddf9aad6d46508cef9a4add93d7c581 file_size=0 file_type=0 file_ts=0 parent_fname="" parent_sha256="" event_description="Detection ID: 6375084038364856322" sensor=0 instance_id=0 connection_id=62 connection_sec=1484314905 direction=0 src_ip=10.0.0.61 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0
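
These samples can be checked against two of the patterns from this thread with the sketch below, which uses Python's re module as a stand-in for Splunk's PCRE engine (an assumption) on a shortened excerpt of the first sample (other fields removed). With file_path=.*(?P<filepath>[A-Z]:[^\"]+), the greedy .* backtracks to the last capital letter followed by a colon, which here is the D: in "Detection ID:", so the capture is the detection number instead of the path. Requiring a backslash after the colon, as the [A-Z]:(\\) variant does, skips that position. The names ANY_COLON, DRIVE_BACKSLASH and filepath are hypothetical.

```python
import re

# Two candidate patterns discussed in this thread.
ANY_COLON = re.compile(r'file_path=.*(?P<filepath>[A-Z]:[^"]+)')
DRIVE_BACKSLASH = re.compile(r'file_path=.*(?P<filepath>[A-Z]:(\\)[^"]+)')

# Shortened excerpt of the first sample event (other fields removed).
event = (r'file_path="\\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07961b79.003" '
         r'event_description="Detection ID: 6376279778734899202" sensor=0')


def filepath(pattern, raw):
    """Hypothetical helper: return the filepath group from the first match, or None."""
    m = pattern.search(raw)
    return m.group('filepath') if m else None


# Greedy .* backtracks to the right-most capital letter plus colon: the "D:" in "Detection ID:".
print(repr(filepath(ANY_COLON, event)))
# Only "C:\" has a backslash after the colon, so the drive path is captured.
print(repr(filepath(DRIVE_BACKSLASH, event)))
```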

Communicator

Are these enough, or do I need to get more?

Communicator

Oh, I noticed one thing when testing that I'm not sure about. Some directories contain (x86), for example. While your rex was perfect for the rest, it dropped that part. My attempts to keep it when it's present did not work. Is that due to the parentheses? Is it a rex issue?

Revered Legend

Can you provide some sample entries where it's failing?

Communicator

Here is the raw event where the (x86) appears:

rec_type=125 rec_type_simple="MALWARE EVENT" event_sec=1481920232 agent_uuid=771335d1-1070-43a5-aba6-d5d2d6eb06e7 cloud="US Cloud" type=1090519054 subtype=34 detector=SHA detection=W32.A78962E3EB-100.SBX.VIOC agent_user=ejones@ZOTECNET file_name=TBNotifier.exe file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe" sha256=a78962e3ebde2876ba49ba646207c622e7dd4e66b0222108be06b6c49de5ab22 file_size=1928776 file_type=HTML file_ts=1478726223 parent_fname="" parent_sha256="" event_description="" sensor=0 instance_id=0 connection_id=1017 connection_sec=1481920231 direction=0 src_ip=10.0.0.118 dest_ip=:: app_proto=0 agent_user=0 file_policy=00000000-0000-0000-0000-000000000000 disposition=0 retro_disposition=0 uri="" src_port=0 dest_port=0 src_ip_country=0 dest_ip_country=0 web_app=0 client_app=0 file_action=0 ip_proto=0 threat_score=0 num_ioc=0

Here is what it shows.

file_path
C:\Program Files

Thanks

Revered Legend

If the value of file_path is always enclosed in double quotes, try this:

your base search | rex "file_path=.*(?P<filepath>[A-Z]:(\\\)[^\"]+)"
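
A minimal sketch of this rex replayed in Python's re module, assuming it behaves like Splunk's PCRE here and that SPL unescapes the (\\\) in the quoted argument to (\\), a group matching one literal backslash (filepath is a hypothetical helper). The group must start with a drive letter, colon and backslash, and [^\"]+ then runs to the closing quote, so the \\?\ prefix is dropped and (x86) is kept.

```python
import re

# Pattern as the rex engine is assumed to receive it: "\\\)" in SPL becomes "\\)".
PATTERN = re.compile(r'file_path=.*(?P<filepath>[A-Z]:(\\)[^"]+)')


def filepath(raw):
    """Hypothetical helper: return the filepath group from the first match, or None."""
    m = PATTERN.search(raw)
    return m.group('filepath') if m else None


# Excerpt of the (x86) event above, plus a quoted value with the \\?\ prefix from this thread.
x86_event = (r'file_name=TBNotifier.exe '
             r'file_path="C:\Program Files (x86)\AskPartnerNetwork\Toolbar\Updater\TBNotifier.exe" '
             r'sha256=a78962e3ebde2876ba49ba646207c622e7dd4e66b0222108be06b6c49de5ab22')
prefixed = r'file_path="\\?\C:\Program Files\Sourcefire\fireAMP\Quarantine\qrt01d2702b07961b79.003"'

print(filepath(x86_event))
print(filepath(prefixed))
```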

Communicator

This is my search string:

sourcetype=cisco:sourcefire rec_type_simple="MALWARE EVENT" | rex "file_path=.*(?P<filepath>[A-Z]:(\\\)[^\"]+)" | stats count by file_path
