Fields using regular expressions

silverKi
Path Finder

I would like to extract fields automatically using props.conf.
My events contain a pattern like the one below, and I want to extract every file name from it. The "attach_filename":[...] array holds one or two file names.
How can I extract all of the file names?

 

"attach_filename":["image.png","GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent"]
"attach_filename":["image.png","Office2016_Patcher_For_OSX.torrent"]
"attach_filename":["image.png"]
"attach_filename":["Saccharomyces_cerevisiae_patent.docx"]

 

The extracted values should be stored in a field named file_name:

file_name: image.png, Saccharomyces_cerevisiae_patent.docx, GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent, Office2016_Patcher_For_OSX.torrent

 

yuanliu
SplunkTrust

My first reaction is that regex is the wrong solution. This looks like part of a JSON document, and treating structured data as a text string is asking for trouble down the road. Can you share the raw events? (Anonymize as needed.)

Or, if this is a developer's joke and you really only have this string in a field (let's call it field1), you can still use Splunk's JSON capability to extract the data, which is much more robust. Wrap the string in braces so that it becomes a valid JSON object, then run spath on it:

 

| eval field1 = "{" . field1 . "}"
| spath input=field1

 

Your mock data will give the following (multivalue attach_filename{} entries are shown comma-separated):

| attach_filename{} | field1 |
| --- | --- |
| image.png, GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent | {"attach_filename":["image.png","GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent"]} |
| image.png, Office2016_Patcher_For_OSX.torrent | {"attach_filename":["image.png","Office2016_Patcher_For_OSX.torrent"]} |
| image.png | {"attach_filename":["image.png"]} |
| Saccharomyces_cerevisiae_patent.docx | {"attach_filename":["Saccharomyces_cerevisiae_patent.docx"]} |

Here is an emulation you can play with and compare with real data, if your developers really play such a joke.

 

| makeresults
| fields - _*
| eval field1 = split("\"attach_filename\":[\"image.png\",\"GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent\"]
\"attach_filename\":[\"image.png\",\"Office2016_Patcher_For_OSX.torrent\"]
\"attach_filename\":[\"image.png\"]
\"attach_filename\":[\"Saccharomyces_cerevisiae_patent.docx\"]", "
")
| mvexpand field1
``` data emulation ```

 


silverKi
Path Finder

You're right. I am trying to extract fields from JSON data.

I am using the botsv2 dataset, in the "stream:smtp" sourcetype.

This is the _raw data returned by the search index="botsv2" sourcetype="stream:smtp":


{"endtime":"2017-08-31T22:56:56.070751Z","timestamp":"2017-08-31T22:56:56.070751Z","ack_packets_in":0,"ack_packets_out":0,"bytes":72,"bytes_in":0,"bytes_out":72,"capture_hostname":"matar","client_rtt":0,"client_rtt_packets":0,"client_rtt_sum":0,"data_packets_in":0,"data_packets_out":1,"dest_ip":"172.31.38.181","dest_mac":"06:6A:51:FA:0A:B0","dest_port":25,"duplicate_packets_in":0,"duplicate_packets_out":0,"flow_id":"b6b9eb1b-e8e1-4cec-ab3c-f7223adc490a","greeting":"ip-172-31-38-181.us-west-2.compute.internal ESMTP Postfix (Ubuntu)","missing_packets_in":0,"missing_packets_out":0,"network_interface":"eth0","packets_in":0,"packets_out":1,"protocol_stack":"ip:tcp:smtp","reply_time":0,"request_ack_time":0,"request_time":0,"response_ack_time":24624,"response_code":220,"response_time":0,"sender_server":"ip-172-31-38-181.us-west-2.compute.internal","server_agent":"ESMTP Postfix (Ubuntu)","server_response":"220 ip-172-31-38-181.us-west-2.compute.internal ESMTP Postfix (Ubuntu)","server_rtt":0,"server_rtt_packets":0,"server_rtt_sum":0,"src_ip":"104.47.34.68","src_mac":"06:E3:CC:18:AA:33","src_port":37952,"time_taken":0,"transport":"tcp"}

I have one more question. Why are the results of index=botsv2 sourcetype="stream:smtp" different from the results of index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"? The field I want to extract appears in the results of the second search.

Search tried: index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"


{"endtime":"2017-08-30T15:08:00.075698Z","timestamp":"2017-08-30T15:07:59.774655Z","ack_packets_in":0,"ack_packets_out":31,"attach_disposition":["attachment"],"attach_filename":["Saccharomyces_cerevisiae_patent.docx"],"attach_size":[142540],"attach_size_decoded":[104162],"attach_transfer_encoding":["base64"],"attach_type":["application/vnd.openxmlformats-officedocument.wordprocessingml.document"],"bytes":155976,"bytes_in":155939,"bytes_out":37,"capture_hostname":"matar","client_rtt":0,"client_rtt_packets":0,"client_rtt_sum":0,"content":["DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r\n d=jacobsmythe111.onmicrosoft.com; s=selector1-froth-ly;\r\n h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;
 

 


yuanliu
SplunkTrust

So, you already have attach_filename{} extracted by Splunk.  No need for extra work.  Is this correct?

To answer your question about the two searches: when you add an additional filter, you SHOULD expect the result to change. Evidently, not all events have the attach_filename{} field populated. If you run

index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"

you select only those events that have this field. Without attach_filename{}="*", you pick up every event, including those that do not have attach_filename{}.
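
One way to see the difference is to count how many events carry the field at all. This is only a sketch, assuming the automatic JSON extraction shown in this thread is in effect; has_attachment is a made-up field name used for illustration.

```spl
index="botsv2" sourcetype="stream:smtp"
| eval has_attachment = if(isnotnull('attach_filename{}'), "yes", "no")
| stats count by has_attachment
```

The single quotes around attach_filename{} are there because eval expects field names containing special characters to be quoted that way. The events counted under "yes" should be the ones the filtered search returns.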


silverKi
Path Finder
Then, how do I change the field name from attach_filename{} to file_name?

yuanliu
SplunkTrust

 

Then, how do I change the field name from attach_filename{} to file_name?

rename is your friend.

| rename attach_filename{} as file_name
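
Put together with the search from earlier in the thread, a minimal sketch for listing each attachment name could look like the following. It assumes attach_filename{} is a multivalue field, in which case stats should produce one row per individual file name.

```spl
index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"
| rename attach_filename{} as file_name
| stats count by file_name
```

Note that this rename applies only to the search it is typed into. It does not make file_name available in other searches.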

 


silverKi
Path Finder

The field name can be changed with rename,
but what I wanted was for the field to be available under the desired name
when I search with just

index=botsv2 sourcetype="stream:smtp"

Before, in order to extract file_name, I had to search with

index=botsv2 sourcetype="stream:smtp" attach_filename{}="*"

I took a hint from your words and solved it in a different way.

Taking the hint that attach_filename{} was already extracted by Splunk,
I created a lookup file using "spath" and set it up as an automatic lookup.

Now the field is extracted and displayed with just index=botsv2 sourcetype="stream:smtp".

I really appreciate your help. Thank you 🙂


yuanliu
SplunkTrust

Splunk also has a field alias feature. That is a more appropriate solution if you really do want to search by a different name. An extra lookup is clunky and also adds compute cost.

Go to Settings -> Fields -> Field aliases.  
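
Since the original question asked about props.conf, the same alias can presumably be written there as a search-time setting. Treat this as a sketch rather than a tested configuration: the class name attach_file_name is made up for illustration, and the double quotes around the brace-suffixed field name are an assumption worth verifying on your instance.

```ini
# props.conf on the search head (search-time setting)
# "attach_file_name" is a hypothetical class name; any unique name should do.
# Quoting the source field is an assumption, because its name contains {}.
[stream:smtp]
FIELDALIAS-attach_file_name = "attach_filename{}" AS file_name
```

Field aliases are applied after automatic key-value extraction in Splunk's search-time sequence, so the alias should see the JSON-extracted attach_filename{} field.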
