I would like to automatically extract fields using props.conf.
Given events containing patterns like the ones below, I want to extract each file name. The attach_filename array contains one or two file names.
How can I extract all file names?
"attach_filename":["image.png","GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent"]
"attach_filename":["image.png","Office2016_Patcher_For_OSX.torrent"]
"attach_filename":["image.png"]
"attach_filename":["Saccharomyces_cerevisiae_patent.docx"]
The extracted values should be stored in a field named file_name:
file_name: image.png, Saccharomyces_cerevisiae_patent.docx, GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent, Office2016_Patcher_For_OSX.torrent
My first reaction is: regex is the wrong solution. This looks like part of a JSON document. Treating structured data as a text string is asking for trouble down the road. Can you share raw events? (Anonymize as needed.)
Or, if this is a developer's joke, and you only have this string in a field, let's call it field1, you can still use Splunk's JSON capability to extract data. It's much more robust. Something like this:
| eval field1 = "{" . field1 . "}"
| spath input=field1
Your mock data will give
attach_filename{} | field1 |
image.png GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent | {"attach_filename":["image.png","GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent"]} |
image.png Office2016_Patcher_For_OSX.torrent | {"attach_filename":["image.png","Office2016_Patcher_For_OSX.torrent"]} |
image.png | {"attach_filename":["image.png"]} |
Saccharomyces_cerevisiae_patent.docx | {"attach_filename":["Saccharomyces_cerevisiae_patent.docx"]} |
Here is an emulation you can play with and compare with real data, if your developers really play such a joke.
| makeresults
| fields - _*
| eval field1 = split("\"attach_filename\":[\"image.png\",\"GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent\"]
\"attach_filename\":[\"image.png\",\"Office2016_Patcher_For_OSX.torrent\"]
\"attach_filename\":[\"image.png\"]
\"attach_filename\":[\"Saccharomyces_cerevisiae_patent.docx\"]", "
")
| mvexpand field1
``` data emulation ```
You're right. I am trying to extract fields from JSON data.
I am using the botsv2 data set, sourcetype "stream:smtp".
This is the _raw data returned by the search index="botsv2" sourcetype="stream:smtp":
{"endtime":"2017-08-31T22:56:56.070751Z","timestamp":"2017-08-31T22:56:56.070751Z","ack_packets_in":0,"ack_packets_out":0,"bytes":72,"bytes_in":0,"bytes_out":72,"capture_hostname":"matar","client_rtt":0,"client_rtt_packets":0,"client_rtt_sum":0,"data_packets_in":0,"data_packets_out":1,"dest_ip":"172.31.38.181","dest_mac":"06:6A:51:FA:0A:B0","dest_port":25,"duplicate_packets_in":0,"duplicate_packets_out":0,"flow_id":"b6b9eb1b-e8e1-4cec-ab3c-f7223adc490a","greeting":"ip-172-31-38-181.us-west-2.compute.internal ESMTP Postfix (Ubuntu)","missing_packets_in":0,"missing_packets_out":0,"network_interface":"eth0","packets_in":0,"packets_out":1,"protocol_stack":"ip:tcp:smtp","reply_time":0,"request_ack_time":0,"request_time":0,"response_ack_time":24624,"response_code":220,"response_time":0,"sender_server":"ip-172-31-38-181.us-west-2.compute.internal","server_agent":"ESMTP Postfix (Ubuntu)","server_response":"220 ip-172-31-38-181.us-west-2.compute.internal ESMTP Postfix (Ubuntu)","server_rtt":0,"server_rtt_packets":0,"server_rtt_sum":0,"src_ip":"104.47.34.68","src_mac":"06:E3:CC:18:AA:33","src_port":37952,"time_taken":0,"transport":"tcp"}
I have one more question. Why are the results of index="botsv2" sourcetype="stream:smtp" different from the results of index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"? The field I want to extract exists in the results of the second search.
Search: index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"
{"endtime":"2017-08-30T15:08:00.075698Z","timestamp":"2017-08-30T15:07:59.774655Z","ack_packets_in":0,"ack_packets_out":31,"attach_disposition":["attachment"],"attach_filename":["Saccharomyces_cerevisiae_patent.docx"],"attach_size":[142540],"attach_size_decoded":[104162],"attach_transfer_encoding":["base64"],"attach_type":["application/vnd.openxmlformats-officedocument.wordprocessingml.document"],"bytes":155976,"bytes_in":155939,"bytes_out":37,"capture_hostname":"matar","client_rtt":0,"client_rtt_packets":0,"client_rtt_sum":0,"content":["DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r\n d=jacobsmythe111.onmicrosoft.com; s=selector1-froth-ly;\r\n h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;
So, you already have attach_filename{} extracted by Splunk. No need for extra work. Is this correct?
To answer your question about two searches, when you add an additional filter, you SHOULD expect the result to change. It is obvious that not all events have that attach_filename{} field populated. If you do
index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"
you only select those events with this field. Without attach_filename{}="*", you pick up every event, including those that do not have attach_filename{}.
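To see the split for yourself, a search along these lines may help. It is only a sketch: has_attachment is a made-up field name used for illustration, and the counts will depend on your data.

```
index="botsv2" sourcetype="stream:smtp"
| eval has_attachment = if(isnotnull('attach_filename{}'), "yes", "no")
| stats count by has_attachment
```

The single quotes around attach_filename{} inside eval tell Splunk to treat it as a field name rather than a string literal. The "yes" row should match the event count of the filtered search, and the two rows together should add up to the unfiltered one.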
Then, how do I change the field name from attach_filename{} to file_name?
rename is your friend.
| rename attach_filename{} as file_name
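As a rough sketch of how that fits into a full search, assuming you want one row per distinct file name (the target name file_name comes from your question):

```
index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"
| rename attach_filename{} as file_name
| stats count by file_name
```

Because stats splits on each value of a multivalue field, events with two attachments contribute one count to each of their file names.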
I can change the field name with rename,
but what I wanted was for the field to be available under the desired name
when I search with just
index=botsv2 sourcetype="stream:smtp"
Previously, to get file_name, I had to search with:
index=botsv2 sourcetype="stream:smtp" attach_filename{}="*"
I took a hint from your reply and solved it a different way.
Since attach_filename{} was already extracted by Splunk,
I created a lookup file using spath and configured it as an automatic lookup.
Now the field is extracted and displayed with just index=botsv2 sourcetype="stream:smtp".
I really appreciate your help. Thank you 🙂
Splunk also has a field alias feature. That is a more appropriate solution if you really want to search by a different name. An extra lookup is clunky and also adds compute cost.
Go to Settings -> Fields -> Field aliases.
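If you would rather manage it in props.conf, as in your original question, the equivalent is roughly the fragment below. Treat it as a sketch I have not tested against this data set: the stanza name assumes the alias is scoped to the stream:smtp sourcetype, the class name after FIELDALIAS- is arbitrary, and the quoting of the source field (because of the braces) is an assumption worth verifying on your instance.

```
# props.conf on the search head (search-time setting)
[stream:smtp]
# "file_name" after FIELDALIAS- is just a label chosen for illustration
FIELDALIAS-file_name = "attach_filename{}" AS file_name
```

With an alias in place, both attach_filename{} and file_name remain searchable; the original field is not removed.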