Splunk Search

Extract fields or define eventtypes -- Which is better for large logs?

archananaveen
Explorer

Hi There,

I have huge logs, and there is no definite pattern in them. Should I sit down and define each and every eventtype, or just have the data stored in some fields instead?

Could you please guide me through the pros and cons of doing field extractions versus defining eventtypes?

Regards,

0 Karma

Richfez
SplunkTrust

Fields and eventtypes are not the same thing, and they don't do the same job.

Fields are what you get when you take raw data and parse it into discrete names and values. If the data is in anything remotely like key=value pairs, this happens automatically for most events. Otherwise you can define your own with delimited data, regex extractions, or other methods. In any case, fields are contained in your data and aren't separate from it.

For instance, perhaps you have a log file with key value pairs set up in a dumb way so they're not automatically extracted, let's pretend this: "abort; This is an abort message meaning the process stopped immediately". You may make fields out of this called "signal" (which equals abort) and "message" (which equals "This is ... " to the end). But they're part of the event already, you are just defining them with an extraction, rex, or whatever so they have a name. Again - they're ALREADY in there. It's not new data.
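To make the idea concrete outside of Splunk, here is a minimal Python sketch of that extraction. The pattern and the helper name `extract_fields` are assumptions made for illustration, not anything Splunk generates; the point is only that the fields are named slices of text already present in the event.

```python
import re

# The raw event from the example above.
raw_event = "abort; This is an abort message meaning the process stopped immediately"

# Assumed pattern: text before the first ";" is "signal", the rest is "message".
FIELD_PATTERN = re.compile(r"^(?P<signal>[^;]+);\s*(?P<message>.*)$")

def extract_fields(raw):
    """Return the named fields found in the raw text, or an empty dict if none."""
    match = FIELD_PATTERN.match(raw)
    return match.groupdict() if match else {}

fields = extract_fields(raw_event)
print(fields["signal"])
print(fields["message"])
```

Nothing is added to the event here: both values are substrings of the original line, just given names.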

Eventtypes are searches that define a type of event based on certain criteria and are thus "added" to the data. It's a way to semantically search for "Errors" even if the actual data never says it's an error, because it's an additional piece of information you provide/define on that event due to what's in the event. You do this with a search. The search to define an eventtype can be simple string searching, or it can be more complex (e.g. using fields), but either way, it's adding data to the event.

Using the same example, you could define an eventtype of "error" for events that have a signal of abort. You could also assign an eventtype of "error" if the events include the word error, too. So if you search for eventtype=error, you'll get not only events with the word error, but also those ones YOU defined as being an error because it contained a field signal that had a value of abort. That's exactly what we did with that example - it wouldn't match "error" because it doesn't have a string "error" in it, but it can match an eventtype of "error" because you added that data tag to the event.
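The same idea can be sketched in Python as a label computed from a predicate over the event. This is only an illustration under assumptions: the `EVENTTYPES` table, the `classify` helper, and the field pattern are hypothetical, and Splunk itself defines eventtypes as saved searches rather than code like this.

```python
import re

def extract_fields(raw):
    """Assumed extraction: 'signal' before the first ';', 'message' after it."""
    match = re.match(r"^(?P<signal>[^;]+);\s*(?P<message>.*)$", raw)
    return match.groupdict() if match else {}

# Hypothetical eventtype table: name -> predicate over (raw text, extracted fields).
EVENTTYPES = {
    "error": lambda raw, fields: fields.get("signal") == "abort"
    or "error" in raw.lower(),
}

def classify(raw):
    """Return the names of every eventtype whose predicate matches the event."""
    fields = extract_fields(raw)
    return [name for name, matches in EVENTTYPES.items() if matches(raw, fields)]

print(classify("abort; This is an abort message meaning the process stopped immediately"))
print(classify("disk error on volume"))
print(classify("ok; all good"))
```

The first event never contains the string "error", yet it still gets the label, because the label comes from the definition rather than from the raw text.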

So, your question had a pretty severe lack of real information about your events so I can only guess, but my guess (being what I do with 99% of our own data) would be to define BOTH fields and eventtypes.

archananaveen
Explorer

Thanks Rich. Below are my sample logs. The regex works fine with what I want: (?^[.])\s+(?.?<)+(?.?)>\s+(?.?)):\s+(?.*)$ but it doesn't work in SPL. Is there any tool available to convert a regex to the Splunk search language? I feel confident with eventtypes and fields now. Thank you so much for your support in my journey.

[DEBUG] 2015-06-09 20:00:37.630/619046.916 Oracle Coherence GE 1.2.1.0.1 (thread=Proxy:CRAZY-ExtendTcpProxyService:TcpAcceptorWorker:9, member=9): An exception occurred while processing a QueryRequest for Service=Proxy:CRAZY-ExtendTcpProxyService:TcpAcceptor: Portable(com.tangosol.util.WrapperException): (Wrapped: Failed request execution for CRAZY-PartitionedCacheService service on Member(Id=4, Timestamp=2017-06-02 16:03:51.182, Address=198.18.40.74:5701, MachineId=3591, Location=machine:dmpra02a0605,process:25350,member:1CRAZY-DUMMY-Partitioned-node1, Role=CoherenceServer) (Wrapped) unknown user type: 1502) unknown user type: 1502

[ERROR] 2015-10-20 06:35:44.357/88948.725 Oracle Coherence GE 1.2.1.2.0 (thread=Abandon, member=1): A worker thread "Worker:0 executing task "com.tangosol.util.fsm.NonBlockingFiniteStateMachine$Task@1745773c", did not respond to 8 interrupt requests. The execution was canceled. The thread is abandoned...

[WARN ] 2017-08-06 00:20:11.558/86627.452 Oracle Coherence GE 1.2.1.0.3 (thread=Worker:0, member=4): Exception '(Wrapped: Exception vetoed by "".) java.lang.RuntimeException: Failed to flush buffered commits' while sending journal records to participant MERCUAT2. Retry count 1
[WARN ] 2017-08-06 00:24:46.304/86902.198 Oracle Coherence GE 1.2.1.0.3 (thread=Recovery Thread, member=4): Attempting recovery of Guard{Daemon=Worker:0}
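For lines shaped like the samples above, a named-group regex along the following lines may be a starting point. It is a sketch, not the pattern from the post: every field name (`level`, `timestamp`, `elapsed`, `product`, `thread`, `member`, `message`) is an assumption chosen for illustration. Python spells named groups as `(?P<name>...)`, whereas Splunk's `rex` command and field extractions use `(?<name>...)`.

```python
import re

# Assumed layout: [LEVEL] timestamp/elapsed product (thread=..., member=N): message
COHERENCE_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\s*\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
    r"/(?P<elapsed>[\d.]+)\s+"
    r"(?P<product>.+?)\s+"
    r"\(thread=(?P<thread>.+?), member=(?P<member>\d+)\):\s+"
    r"(?P<message>.*)$"
)

# One of the sample lines from the post.
sample = (
    "[WARN ] 2017-08-06 00:24:46.304/86902.198 Oracle Coherence GE 1.2.1.0.3 "
    "(thread=Recovery Thread, member=4): Attempting recovery of Guard{Daemon=Worker:0}"
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not fit the layout."""
    match = COHERENCE_LINE.match(line)
    return match.groupdict() if match else None

parsed = parse_line(sample)
for name, value in parsed.items():
    print(f"{name} = {value}")
```

Testing a pattern like this against a handful of real lines first, then porting the group syntax, tends to be easier than debugging it inside a search.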

0 Karma

Richfez
SplunkTrust

Why not use the Field Extractor?

If you are unfamiliar with it, I would suggest testing either on a test system that has that data or, at the very least, in a new app: create the app, switch to that app's context, and build the extractions there. That way you have an additional "I can search in this app for my records and see if the extractions really do what I want" option before changing permissions on them and letting everyone and all the apps have them.

BTW - You'll want to use regular expression based extractions. Otherwise, just carefully follow the most excellent documentation. If you have specific questions, paste a screenshot (or carefully describe it) and we'll see what we can do!

Also note that after you've let the field extractor create whatever it's creating, you can see the regex it produces. It's not pretty, but it usually gets the job done, and in any case you can fix it up afterwards if you want.

0 Karma

archananaveen
Explorer

Yes, thank you. It's all so much easier than it seemed in the beginning.

0 Karma

Richfez
SplunkTrust

Speaking of which, if you'd like to provide some examples, I am sure we can help make this job easier for you and offer suggestions!

0 Karma

archananaveen
Explorer

Thanks Rich for your help. Below are the sample logs. I appreciate your input!

2014-05-11 18:24:05,957 WARN [Logger@9233091 3.3.1/389] [Coherence] log 2014-05-11 18:24:05.957 Oracle Coherence GE 3.3.1/389 (thread=PacketPublisher, member=2): A potential communication problem has been detected. A packet has failed to be delivered (or acknowledged) after 45 seconds, although other packets were acknowledged by the same cluster member (Member(Id=1, Timestamp=2014-05-11 18:17:48.519, Address=xxx.xxx.xxx.x:8090, MachineId=12345, Location=process:1234@CONFLUENCE01)) to this member (Member(Id=2, Timestamp=2014-05-11 18:23:16.19, Address=xxx.xxx.xxx.x:8090, MachineId=67891, Location=process:1234@CONFLUENCE02)) as recently as 0 seconds ago. It is possible that the packet size greater than 1468 is responsible; for example, some network equipment cannot handle packets larger than 1472 bytes (IPv4) or 1468 bytes (IPv6). Use the 'ping' command with the option to verify successful delivery of specifically sized packets. Other possible causes include network failure, poor thread scheduling (see FAQ if running on Windows), an extremely overloaded server, a server that is attempting to run its processes using swap space, and unreasonably lengthy GC times.

2014-05-11 18:13:49,218 WARN [Logger@9226875 3.3.1/389] [Coherence] log 2014-05-11 18:13:49.218 Oracle Coherence GE 3.3.1/389 (thread=PacketPublisher, member=2): Timeout while delivering a packet; the member appears to be alive, but exhibits long periods of unresponsiveness; removing Member(Id=1, Timestamp=2014-05-11 18:09:52.641, Address=xxx.xxx.xxx.x:8090, MachineId=41352, Location=process:1234@CONFLUENCE01)

2014-05-11 18:13:49,249 INFO [Cluster:EventDispatcher] [confluence.cluster.coherence.TangosolClusterManager] memberLeft Member has left cluster: Member(Id=1, Timestamp=2014-05-11 18:13:49.218, Address=xxx.xxx.xxx.x:8090, MachineId=12345, Location=process:1234@CONFLUENCE01) 2014-05-11 18:13:49,436 WARN [Logger@9226875 3.3.1/389] [Coherence] log 2014-05-11 18:13:49.436 Oracle Coherence GE 3.3.1/389 (thread=Cluster, member=2): The member formerly known as Member(Id=1, Timestamp=2014-05-11 18:13:49.218, Address=xxx.xxx.xxx.x:8090, MachineId=12345, Location=process:1234@CONFLUENCE01) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored.

0 Karma

Richfez
SplunkTrust

Did you paste 3 sample logs, or 4? (I see 4, three WARN and one INFO).

You should have a lot of fields extracted automatically, like Timestamp, thread, member, MachineId and so on. Is that right?
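As a rough illustration of what key=value extraction would pick up from one of those events, here is a small Python approximation. It is not Splunk's actual extraction logic; the pattern and the `extract_key_values` helper are assumptions for illustration only.

```python
import re

# Assumed rule: an identifier, "=", then a value running up to whitespace,
# a comma, or a parenthesis.
KV_PATTERN = re.compile(r"(?P<key>[A-Za-z_]\w*)=(?P<value>[^\s,()]+)")

def extract_key_values(raw):
    """Collect key=value pairs, keeping the first value seen for each key."""
    fields = {}
    for match in KV_PATTERN.finditer(raw):
        fields.setdefault(match.group("key"), match.group("value"))
    return fields

# Tail of the second sample event from the post.
sample = (
    "(thread=PacketPublisher, member=2): Timeout while delivering a packet; "
    "the member appears to be alive, but exhibits long periods of unresponsiveness; "
    "removing Member(Id=1, Timestamp=2014-05-11 18:09:52.641, "
    "Address=xxx.xxx.xxx.x:8090, MachineId=41352, "
    "Location=process:1234@CONFLUENCE01)"
)

fields = extract_key_values(sample)
for key, value in fields.items():
    print(f"{key} = {value}")
```

Because this naive pattern stops at whitespace, `Timestamp` captures only the date part; values containing spaces need a dedicated extraction.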

From either a view of "additional fields" or of eventtypes, what would you be looking for?

And, does it seem reasonable that all WARN seem to follow a similar pattern? If so, they seem to be like this:

(?<FieldA>WARN)\s+\[(?<FieldB>[^\]]*)\]\s+\[(?<FieldC>[^\]]*)\]\s+(?<FieldD>log)\s(?<FieldE>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3})\s(?<FieldF>.*):\s+(?<MessageText>.*)$

(You can see it in this saved search at regex101.com.)
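For anyone who wants to try the WARN pattern outside of regex101, here is a minimal Python port. It assumes only two changes: the named groups use Python's `(?P<name>...)` spelling, and the dot before the milliseconds is escaped so it matches a literal period. The sample is the second WARN event from the post with its message truncated for brevity.

```python
import re

# Python port of the WARN pattern; placeholder group names kept as posted.
WARN_PATTERN = re.compile(
    r"(?P<FieldA>WARN)\s+\[(?P<FieldB>[^\]]*)\]\s+\[(?P<FieldC>[^\]]*)\]\s+"
    r"(?P<FieldD>log)\s"
    r"(?P<FieldE>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3})\s"
    r"(?P<FieldF>.*):\s+(?P<MessageText>.*)$"
)

# Sample WARN event from the post, message truncated.
sample = (
    "2014-05-11 18:13:49,218 WARN [Logger@9226875 3.3.1/389] [Coherence] log "
    "2014-05-11 18:13:49.218 Oracle Coherence GE 3.3.1/389 "
    "(thread=PacketPublisher, member=2): Timeout while delivering a packet"
)

match = WARN_PATTERN.search(sample)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name} = {value}")
```

Because `FieldF` is greedy, it runs up to the last colon followed by whitespace, so a message that itself contains ": " would be split later than expected.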

INFO events seem to follow a different pattern, though it's not entirely dissimilar.

It would be great to know what, in those four cases, you'd want extracted as fields or tagged as eventtypes.

0 Karma