I need to extract the source and target path fields from logged command lines for Aspera SCP (ascp), part of the IBM Aspera file transfer service. The command lines are logged in events such as these:
C:\Program Files\Aspera\Enterprise Server\bin\ascp.exe -T -Q -d -l 50000 -m 25000 -k 2 -i C:\Users\asperaadmin\.ssh\asperaweb_id_dsa.openssh -O 33001 -P 33001 --ignore-host-key --mode=send --user=xferuser --host=ats-aws-us-whatever.com Z:\Content\metadata\somefile.xml /we-shall-anonimyze-this-one-too-b2c09898392f
C:\Program Files\Aspera\Enterprise Server\bin\ascp.exe -T -Q -d -l 300000 -m 10000 -k 2 -O 33001 -P 22 --ignore-host-key Z:\Source Files\Content\image.jpg username@target.host.net:/target/path/directory
Note that some flags take a filename argument, which may contain spaces; the source and target filenames may contain spaces as well.
The commands follow the syntax given in the Aspera Command Reference:
ascp options [[user@]srcHost:]source_file1[,source_file2,...] [[user@]destHost:]target_path
Questions:
P.S. I tried writing several rex statements to extract the ascp filename, skip most flags, then capture one source filename and finally the target user, host and path for the two command lines above, and got stuck: my rex statements are stepping on each other, so they do not seem to be the best mechanism. The code below isn't working properly.
| rex field=event_message "^(?P<program_path_win>\w\:\\\.*?\\\(?P<program_module_win>[^\\\]+\.\S+))\s+(?P<event_msg_tail>.+)$"
| rex field=event_msg_tail ".+\s+(?P<file_path_win>\w\:\\\.*?\\\(?P<file_name_win>[^\\\]+\.\S+))\s+(?P<Destination>((?P<peer_userID>.+)\@)(?P<peer_host>\S+))\:|s+(?P<peer_dir>.*)?$"
| rex field=event_msg_tail "--host=(?P<peer_host>\S+)\s+"
| rex field=event_msg_tail "--user=(?P<peer_user>\S+)\s+"
| rex field=event_msg_tail ".+\s+(?P<file_path_win>\w\:\\\.*?\\\(?P<file_name_win>[^\\\]+\.\S+))\s+(?P<Destination>((?P<peer_userID>.+)\@)(?P<peer_host>\S+))\:|s+(?P<peer_dir>.*)?$"
| rex field=event_msg_tail "-i\s+(?P<private_key_file>\w\:\\\.+?)\s+(?:-\w+\s+|--\w+=|$)"
| rex field=event_msg_tail ".+\s+(?P<file_path_win>\w\:\\\.*?\\\(?P<file_name_win>[^\\\]+\.\S+))\s+(?P<peer_dir>.*)?$"
| eval peer_dir = coalesce(peer_dir, "")
| eval peer_host = coalesce(peer_host, "")
| eval peer_user = coalesce(peer_user, peer_userID, "")
| eval agent = coalesce(program_module_win, "")
| table _time host log_level component agent peer_host peer_user peer_userID peer_dir
Thanks!
P.S. I inadvertently posted this to the wrong section - to "Dashboards & Visualizations" rather than "search" - but don't see an option to move the post. Would someone please kindly move it - or tell me how?
index=_internal | head 1 | fields _raw | eval _raw="C:\Program Files\Aspera\Enterprise Server\bin\ascp.exe -T -Q -d -l 50000 -m 25000 -k 2 -i C:\Users\asperaadmin\.ssh\asperaweb_id_dsa.openssh -O 33001 -P 33001 --ignore-host-key --mode=send --user=xferuser --host=ats-aws-us-whatever.com Z:\Content\metadata\somefile.xml /we-shall-anonimyze-this-one-too-b2c09898392f"
| appendpipe [eval _raw="C:\Program Files\Aspera\Enterprise Server\bin\ascp.exe -T -Q -d -l 300000 -m 10000 -k 2 -O 33001 -P 22 --ignore-host-key Z:\Source Files\Content\image.jpg username@target.host.net:/target/path/directory"]
| rex "ascp.exe.*\s--\S+\s(?P<source_files>[A-Z]\:.*)\s(?P<target_path>.*)$"
| rex field=source_files "((?<user>[^@]+(?=@))(?:@))?(?<source_host>[^:]+(?=\:\/))?.*"
| rex field=target_path "((?<user>[^@]+(?=@))(?:@))?(?<target_host>[^:]+(?=\:))?.*"
Regular expressions can't work well without accurate samples, right?
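The last two rex lines above try to peel the optional user@host: prefix off a path. As a way to check that step outside Splunk, here is a minimal Python sketch of the `[[user@]host:]path` form from the command reference. The names `ENDPOINT` and `parse_endpoint` are made up for illustration, and the rule "a host has at least two characters and no slashes" is an assumption added so that a Windows drive letter such as `Z:` is not mistaken for a host.

```python
import re

# Optional "user@" and "host:" prefix, followed by the path.
# Assumption: a host has >= 2 characters and contains no slash or backslash,
# so "Z:\..." is treated as a local Windows path rather than host "Z".
ENDPOINT = re.compile(
    r"^(?:(?:(?P<user>[^@:\s]+)@)?(?P<host>[^@:\s/\\]{2,}):)?(?P<path>.+)$"
)

def parse_endpoint(endpoint):
    """Split '[[user@]host:]path' into a dict; user and host may be None."""
    m = ENDPOINT.match(endpoint)
    return m.groupdict() if m else None

if __name__ == "__main__":
    for sample in (
        r"username@target.host.net:/target/path/directory",
        r"Z:\Source Files\Content\image.jpg",
        r"/we-shall-anonimyze-this-one-too-b2c09898392f",
    ):
        print(parse_endpoint(sample))
```

The pattern uses only named groups and character classes, so it should be portable to rex, though the backslash escaping inside an SPL string would need to be checked separately.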
Well played 🙂 Removing "--ignore-host-key" trips it, though: the "-flag" and "--option" arguments aren't required parts of the command. E.g. the SPL should work for these two as well:
ascp.exe -T -Q -l 300000 Z:\Source Files\Content\image2.jpg /target/path/directory2
... and something like:
ascp.exe -T -Q -l 300000 -i C:\Users\asperaadmin\.ssh\asperaweb_id_dsa.openssh Z:\Source Files\Content\image2.jpg /target/path/directory2
I.e. the only certainty is this:
ascp options [[user@]srcHost:]source_file1[,source_file2,...] [[user@]destHost:]target_path
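One way to avoid depending on any particular flag is to anchor on the end of the line instead. The Python sketch below is a prototype under stated assumptions, not a general ascp parser: it assumes a single local Windows source that starts with a drive letter (`X:\`), a target that contains no whitespace, and an upload direction as in all the samples in this thread. `ASCP_TAIL` and `split_source_target` are hypothetical names.

```python
import re

# The greedy leading ".*" pushes the source start to the LAST
# "<whitespace><drive letter>:\" in the line, which skips earlier flag
# arguments such as "-i C:\Users\...". The target is the final token.
# Assumptions: local Windows source path, target without whitespace.
ASCP_TAIL = re.compile(r"^.*\s(?P<source>[A-Za-z]:\\.*)\s(?P<target>\S+)$")

def split_source_target(command_line):
    """Return (source, target) from an ascp command line, or None if no match."""
    m = ASCP_TAIL.match(command_line.strip())
    if m is None:
        return None
    return m.group("source"), m.group("target")

if __name__ == "__main__":
    samples = [
        r"ascp.exe -T -Q -l 300000 Z:\Source Files\Content\image2.jpg /target/path/directory2",
        r"ascp.exe -T -Q -l 300000 -i C:\Users\asperaadmin\.ssh\asperaweb_id_dsa.openssh "
        r"Z:\Source Files\Content\image2.jpg /target/path/directory2",
    ]
    for line in samples:
        print(split_source_target(line))
```

Spaces inside the source path are tolerated because only the final whitespace-delimited token is taken as the target. A flag argument with a drive-letter path that comes after the source, or a target containing spaces, would break this heuristic.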