Fields using regular expressions

silverKi
Path Finder

I would like to extract fields automatically using props.conf.
Given events like the ones below, I want to extract each file name. The attach_filename array contains one or two file names.
How can I extract all of the file names?

 

"attach_filename":["image.png","GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent"]
"attach_filename":["image.png","Office2016_Patcher_For_OSX.torrent"]
"attach_filename":["image.png"]
"attach_filename":["Saccharomyces_cerevisiae_patent.docx"]

 

The extracted values should be stored in a field named file_name:

file_name : image.png, Saccharomyces_cerevisiae_patent.docx, GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent, Office2016_Patcher_For_OSX.torrent

 


yuanliu
SplunkTrust

My first reaction is: regex is the wrong solution.  This looks like part of a JSON document.  Treating structured data as a text string is asking for trouble down the road.  Can you share raw events? (Anonymize as needed.)

Or, if this is a developer's joke and you only have this string in a field (let's call it field1), you can still use Splunk's JSON capability to extract the data.  It's much more robust.  Something like this:

 

| eval field1 = "{" . field1 . "}"
| spath input=field1

 

Your mock data will give

attach_filename{}  |  field1
image.png, GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent  |  {"attach_filename":["image.png","GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent"]}
image.png, Office2016_Patcher_For_OSX.torrent  |  {"attach_filename":["image.png","Office2016_Patcher_For_OSX.torrent"]}
image.png  |  {"attach_filename":["image.png"]}
Saccharomyces_cerevisiae_patent.docx  |  {"attach_filename":["Saccharomyces_cerevisiae_patent.docx"]}

Here is an emulation you can play with and compare with real data, if your developers really play such a joke.

 

| makeresults
| fields - _*
| eval field1 = split("\"attach_filename\":[\"image.png\",\"GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent\"]
\"attach_filename\":[\"image.png\",\"Office2016_Patcher_For_OSX.torrent\"]
\"attach_filename\":[\"image.png\"]
\"attach_filename\":[\"Saccharomyces_cerevisiae_patent.docx\"]", "
")
| mvexpand field1
``` data emulation ```
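
As a rough sketch, the emulation and the eval/spath step can be combined into one search you can paste into the search bar. This version uses only two of the sample rows and a ";" delimiter to keep the string short; both choices are assumptions made for brevity, not part of the original emulation.

```spl
| makeresults
| fields - _*
| eval field1 = split("\"attach_filename\":[\"image.png\",\"GoT.S7E2.BOTS.BOTS.BOTS.mkv.torrent\"];\"attach_filename\":[\"Saccharomyces_cerevisiae_patent.docx\"]", ";")
| mvexpand field1
``` wrap the fragment in braces so it becomes a valid JSON object ```
| eval field1 = "{" . field1 . "}"
``` let spath parse the JSON; the array shows up as the multivalue field attach_filename{} ```
| spath input=field1
| table field1 "attach_filename{}"
```

Each row should then carry a multivalue attach_filename{} field holding one entry per file name in that row's array.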

 


silverKi
Path Finder

You're right. I am trying to extract fields from JSON data.

I am using the botsv2 dataset, with the "stream:smtp" sourcetype.

This is the _raw data returned when I search index="botsv2" sourcetype="stream:smtp":


{"endtime":"2017-08-31T22:56:56.070751Z","timestamp":"2017-08-31T22:56:56.070751Z","ack_packets_in":0,"ack_packets_out":0,"bytes":72,"bytes_in":0,"bytes_out":72,"capture_hostname":"matar","client_rtt":0,"client_rtt_packets":0,"client_rtt_sum":0,"data_packets_in":0,"data_packets_out":1,"dest_ip":"172.31.38.181","dest_mac":"06:6A:51:FA:0A:B0","dest_port":25,"duplicate_packets_in":0,"duplicate_packets_out":0,"flow_id":"b6b9eb1b-e8e1-4cec-ab3c-f7223adc490a","greeting":"ip-172-31-38-181.us-west-2.compute.internal ESMTP Postfix (Ubuntu)","missing_packets_in":0,"missing_packets_out":0,"network_interface":"eth0","packets_in":0,"packets_out":1,"protocol_stack":"ip:tcp:smtp","reply_time":0,"request_ack_time":0,"request_time":0,"response_ack_time":24624,"response_code":220,"response_time":0,"sender_server":"ip-172-31-38-181.us-west-2.compute.internal","server_agent":"ESMTP Postfix (Ubuntu)","server_response":"220 ip-172-31-38-181.us-west-2.compute.internal ESMTP Postfix (Ubuntu)","server_rtt":0,"server_rtt_packets":0,"server_rtt_sum":0,"src_ip":"104.47.34.68","src_mac":"06:E3:CC:18:AA:33","src_port":37952,"time_taken":0,"transport":"tcp"}

I have one more question. Why are the results of index=botsv2 sourcetype="stream:smtp" different from the results of index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"? The field I want to extract appears only in the results of the second search.



This is the _raw data returned when I search index="botsv2" sourcetype="stream:smtp" attach_filename{}="*":


{"endtime":"2017-08-30T15:08:00.075698Z","timestamp":"2017-08-30T15:07:59.774655Z","ack_packets_in":0,"ack_packets_out":31,"attach_disposition":["attachment"],"attach_filename":["Saccharomyces_cerevisiae_patent.docx"],"attach_size":[142540],"attach_size_decoded":[104162],"attach_transfer_encoding":["base64"],"attach_type":["application/vnd.openxmlformats-officedocument.wordprocessingml.document"],"bytes":155976,"bytes_in":155939,"bytes_out":37,"capture_hostname":"matar","client_rtt":0,"client_rtt_packets":0,"client_rtt_sum":0,"content":["DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r\n d=jacobsmythe111.onmicrosoft.com; s=selector1-froth-ly;\r\n h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;
 

 


yuanliu
SplunkTrust

So, you already have attach_filename{} extracted by Splunk.  No need for extra work.  Is this correct?

To answer your question about two searches, when you add an additional filter, you SHOULD expect the result to change.  It is obvious that not all events have that attach_filename{} field populated.  If you do

index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"

you only select those events with this field.  Without attach_filename{}="*", you pick up every event, including those that do not have attach_filename{}.
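
To see the split for yourself, a search along these lines counts events with and without the field. It is a minimal sketch; has_attachment is a hypothetical field name introduced only for this illustration.

```spl
index="botsv2" sourcetype="stream:smtp"
``` single quotes are needed in eval because the field name contains braces ```
| eval has_attachment = if(isnotnull('attach_filename{}'), "yes", "no")
| stats count by has_attachment
```

The "yes" count should match the number of events returned when attach_filename{}="*" is added to the base search.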


silverKi
Path Finder
Then, how do I change the field name from attach_filename{} to file_name?

yuanliu
SplunkTrust

 

Then, how do I change the field name from attach_filename{} to file_name?


rename is your friend.

| rename attach_filename{} as file_name
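
Applied to the real data, the rename can be followed by a stats to list every distinct file name. This is only a sketch of one way to use the renamed field; the quotes around the original field name are a precaution because of the braces.

```spl
index="botsv2" sourcetype="stream:smtp" attach_filename{}="*"
| rename "attach_filename{}" as file_name
``` stats splits a multivalue field, so each file name gets its own row ```
| stats count by file_name
```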

 


silverKi
Path Finder

Yes, the field name can be changed with rename, but what I wanted was for the field to appear under the desired name when I search with just

index=botsv2 sourcetype="stream:smtp"

Previously, to get file_name I had to run this search instead:

index=botsv2 sourcetype="stream:smtp" attach_filename{}="*"

I took a hint from your answer and solved it in a different way.

Since attach_filename{} was already being extracted by Splunk, I created a lookup file using spath and configured it as an automatic lookup.

Now the field is extracted and displayed with just index=botsv2 sourcetype="stream:smtp".

I really appreciate your help. Thank you 🙂


yuanliu
SplunkTrust

You know there is a field alias feature in Splunk, too.  That is a more appropriate solution if you really do want to search by a different name.  An extra lookup is clunky and also adds compute cost.

Go to Settings -> Fields -> Field aliases.  
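
Since the original question asked about props.conf, here is roughly what the equivalent configuration might look like. Treat it as a sketch under assumptions: the class name attach_filename is arbitrary, the quotes around the source field are there because of the braces, and the behaviour with multivalue fields is worth verifying on your Splunk version before relying on it.

```ini
# props.conf on the search head (for example in an app's local directory)
[stream:smtp]
# search-time alias: expose the JSON array field under the name file_name
FIELDALIAS-attach_filename = "attach_filename{}" AS file_name
```

With an alias like this in place, file_name should be available from the plain index=botsv2 sourcetype="stream:smtp" search, without a lookup.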
