Splunk Search

Efficient way of extracting data from different log files

Mubarish
Path Finder

Hi

I have three log files that record file transmission information, each in a different format. The requirement is a single search query that extracts the file transmission information from all three log files. Sample events are shown below.

File 1 Success event
Source=A
2014-04-14 14:08:56,179 [xxxxx] SUCCESS: File successfully uploaded using SFTP. Filename was [xxxx.XLS]. File length was [312454]. Connected to host [xxx.com].
File 2 Success event
Source= B
14-08-28 13:28:38 [yyyy] SUCCESS: The FTP Server [yyyyy - FTP SERVER] uploaded file [yyyy.ACK] of length 792 bytes.

File 3 Success event
Source= C
2014-01-28 08:23:23,853 [24e524e5] SUCCESS: User NKI0005P downloaded or attempted to download file [zzz.xlsx] [.io.agents.imail.web.BaseHandler]

To extract the filename from all three files:
Solution 1
I can use the query
source=A OR source=B OR source=C | rex "(?:Filename was|uploaded file|download file) \[(?<FileName>\S+)\]"

In Solution 1, the search tries the alternatives "Filename was" and "uploaded file" against every event before it reaches "download file", which is the only alternative that can match events from File 3. This unnecessary regex matching hurts search performance.

How can I express the scenario below in a single search query?

source=A | rex "Filename was \[(?<FileName>\S+)\]" | table FileName
source=B | rex "uploaded file \[(?<FileName>\S+)\]" | table FileName
source=C | rex "download file \[(?<FileName>\S+)\]" | table FileName

How can I merge the above queries into a single query that extracts the field FileName from sources A, B and C, so that it looks for "Filename was" only in Source=A events, "uploaded file" only in Source=B events and "download file" only in Source=C events? Please suggest.

0 Karma
1 Solution

kristian_kolb
Ultra Champion

If I were you, I would put these regexes into EXTRACTs in props.conf under their respective source/sourcetype stanza, instead of doing it via rex in the search.

props.conf

[aaa]
EXTRACT-filename_a = Filename was \[(?<filename>[^\]]+)\]

[bbb]
EXTRACT-filename_b = uploaded file \[(?<filename>[^\]]+)\]

[ccc]
EXTRACT-filename_c = download file \[(?<filename>[^\]]+)\]

Then you can run searches like:

sourcetype=aaa OR sourcetype=bbb OR sourcetype=ccc | table filename

EDIT: clarification and typos fixed.


Prakashthanthon
New Member

I have a similar issue: I want to extract data using rex from two different sources. The field I want to extract is the same in both sources. I want to plot a graph comparing the count between the two.

Can anyone please suggest a way to achieve this without touching the props.conf file?

0 Karma

kristian_kolb
Ultra Champion

<spec> can be:
1. <sourcetype>, the source type of an event.
2. host::<host>, where <host> is the host, or host-matching pattern, for an event.
3. source::<source>, where <source> is the source, or source-matching pattern, for an event.
4. rule::<rulename>, where <rulename> is a unique name of a source type classification rule.
5. delayedrule::<delayedrulename>, where <delayedrulename> is a unique name of a delayed source type classification rule.
These are only considered as a last resort before generating a new source type based on the source seen.

0 Karma

kristian_kolb
Ultra Champion

[<spec>]
* This stanza enables properties for a given <spec>.
* A props.conf file can contain multiple stanzas for any number of different <spec>.
* Follow this stanza name with any number of the following attribute/value pairs, as appropriate for what you want to do.
* If you do not set an attribute for a given <spec>, the default is used.

0 Karma

kristian_kolb
Ultra Champion

For a sourcetype, the syntax is [insert_sourcetype_here]
For a source or host, the actual value will need to be preceded by source:: or host:: respectively, e.g. [source::c:\blah.log]
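
As a rough sketch of the source-based form, the stanzas from the solution could be keyed on source instead of sourcetype. The paths below are hypothetical placeholders, not taken from this thread; Windows paths may need their backslashes escaped, so check the props.conf docs before copying a real path in.

```
# props.conf (sketch; the source paths are hypothetical placeholders)
[source::/var/log/transfer/a.log]
EXTRACT-filename_a = Filename was \[(?<filename>[^\]]+)\]

[source::/var/log/transfer/b.log]
EXTRACT-filename_b = uploaded file \[(?<filename>[^\]]+)\]

[source::/var/log/transfer/c.log]
EXTRACT-filename_c = download file \[(?<filename>[^\]]+)\]
```

With stanzas like these, each regex should only be applied to events from the matching source.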

See the props.conf docs, which discuss this at the beginning of the page.

http://docs.splunk.com/Documentation/Splunk/6.1.3/Admin/propsconf

0 Karma

Mubarish
Path Finder

Can [aaa] here be either a sourcetype or a source log file?
For example, [C:\Data\xxxx\xxx.txt]

0 Karma

kristian_kolb
Ultra Champion

It will be even more difficult doing it inline in the search. The point of extracting fields on a per-sourcetype basis (through config files) is that the regexes only get applied to events of that sourcetype.
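
For comparison, an inline version that avoids props.conf might look roughly like the search below. This is only a sketch: A, B and C stand in for the real source values, and the intermediate field names filename_a, filename_b and filename_c are made up for illustration. Each rex still runs against every event regardless of source, which is the inefficiency the config-based approach avoids.

```
source=A OR source=B OR source=C
| rex "Filename was \[(?<filename_a>[^\]]+)\]"
| rex "uploaded file \[(?<filename_b>[^\]]+)\]"
| rex "download file \[(?<filename_c>[^\]]+)\]"
| eval filename=coalesce(filename_a, filename_b, filename_c)
| table source filename
```

The coalesce call picks whichever of the three intermediate fields was populated for a given event, so the result has a single filename column.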

This is one of the main reasons behind the concept of sourcetypes - events that share a common format are to be considered a sourcetype, because configurations like field extraction can be set on the sourcetype level, rather than host or source/path level - or a combination thereof.
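
On the concern about extracting many fields per event: a single EXTRACT can carry several named capture groups, so it does not have to be one line per field. A minimal sketch for the File 1 sample event, assuming the sourcetype is named aaa as in the solution, and with file_length and dest_host as hypothetical field names:

```
# props.conf (sketch; file_length and dest_host are hypothetical field names)
[aaa]
EXTRACT-transfer_a = Filename was \[(?<filename>[^\]]+)\]\. File length was \[(?<file_length>\d+)\]\. Connected to host \[(?<dest_host>[^\]]+)\]
```

Against the sample event this would be expected to yield filename=xxxx.XLS, file_length=312454 and dest_host=xxx.com.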

/k

0 Karma

Mubarish
Path Finder

OK, but the given events are just a sample. In the real scenario I need to extract 16 fields from each event. It will be difficult to add all those fields to props.conf.

0 Karma