Splunk Search

Log field extraction: when should I do it?

Champion

Hello guys,
I have to extract around 30-40 fields from logs and monitor them. They are well formatted and can be extracted easily through regex. I am just not sure where I should do it.

While indexing the logs, or while searching? I mean, keeping an eye on performance.

Sample Data

[Date][PreciseTime][Time][Pid][Tid][SrcFile][Function][TransactionID][AgentName][Resource][User][Group][Realm][Domain][Directory][Policy][AgentType][Rule][ErrorValue][ReturnValue][ErrorString][IPAddr][IPPort][Result][Returns][CallDetail][Data][Message] 
[====][===========][====][===][===][=======][========][=============][=========][========][====][=====][=====][======][=========][][][][==========][===========][===========][======][======][======][=======][==========][====][=======]

where "===" -> the data. A field may or may not have a value.

| rex field=_raw "\[(?<DDD>[\d\/]+)\]\[(?<DDD1>[\d:\.]+)\]\[(?<DDD2>[\d\:]+)\]\[(?<DDD3>[\d]+)\]\[(?<DDD4>[\d]+)\]\[(?<DDD5>[A-Za-z_\.\:\d]+)\]"|table DDD,DDD1,DDD2,DDD3,DDD4,DDD5

I am still planning the regex for these fields as well. Is it okay to set this up at index time? And how do I match something inside the [] other than with [A-Za-z_.:\d], where I may miss some characters?

Any kind of suggestion is welcome.

Thank you

1 Solution

SplunkTrust

In almost every case you'll want search-time extractions: simple ones as EXTRACT-foo in props.conf, and more complex ones as REPORT-bar with a corresponding transforms.conf stanza [bar]. Only use indexed fields if you have a good reason to, such as field values that commonly occur elsewhere in the raw events, which kills search-time filtering performance.
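A minimal sketch of what those two options could look like in configuration, assuming a hypothetical sourcetype agent_log (the stanza, field, and extraction class names here are illustrative, not from the original post):

```ini
# props.conf -- search-time extractions for a hypothetical sourcetype
[agent_log]
# Simple inline extraction: EXTRACT-<class> = regex with named capture groups
EXTRACT-times = ^\[(?<Date>[^\]]*)\]\[(?<PreciseTime>[^\]]*)\]
# More complex extraction delegated to a transforms.conf stanza
REPORT-agent_fields = agent_fields

# transforms.conf
[agent_fields]
REGEX = ^\[[^\]]*\]\[[^\]]*\]\[[^\]]*\]\[(?<Pid>\d+)\]\[(?<Tid>\d+)\]
```

Both flavours run at search time, so changing a regex later never requires re-indexing the data.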

As for your character classes, consider using [^]]* for your data fields; it matches everything up to the closing square bracket, including empty values.
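A quick way to see the difference, sketched in Python with an invented sample line (Splunk's PCRE accepts [^]]* as written above; in Python's re module the bracket is clearer escaped as [^\]]*):

```python
import re

# Invented event line in the bracketed agent-log format from the question;
# the empty [] at the end are fields with no value.
event = "[2024/01/15][10:23:45.123][10:23:45][1234][5678][SmSess.cpp:42][][]"

# Narrow class from the question: silently skips any field containing a
# character it doesn't list (the '/' in the date) and all empty fields.
narrow = re.findall(r"\[([A-Za-z_\.\:\d]+)\]", event)

# Negated class: matches any run of characters up to the closing bracket,
# including an empty run, so every field comes back (empty ones as "").
broad = re.findall(r"\[([^\]]*)\]", event)

print(narrow)
print(broad)
```

With the narrow class, the date field and the two empty fields are dropped without any error, which is exactly how characters get "missed"; the negated class returns all eight fields.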



Champion

I need to produce stats from the fields extracted from the logs. As you suggested, I will go with search-time extraction, which seems more flexible, and if there is frequent use I will schedule the search. Thank you for your help.


SplunkTrust

Index-time field extractions will put some load on your indexer, yeah - but the bigger disadvantage I see is that you lose the flexibility of Splunk's schema-on-the-fly search-time extractions.

As for dashboards, those launch regular searches, so it doesn't matter much whether a search is on a dashboard or not. If you have a high number of users frequently loading the same dashboard with identical searches, you're often better off scheduling the searches behind the dashboard.
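As a sketch, a scheduled search behind a dashboard could be set up along these lines (the stanza name, index, sourcetype, and schedule are hypothetical):

```ini
# savedsearches.conf -- hypothetical scheduled search feeding a dashboard
[agent_error_stats]
search = index=main sourcetype=agent_log | stats count by ErrorString
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -24h
dispatch.latest_time = now
```

A Simple XML panel can then reference it with <search ref="agent_error_stats"/>, so every viewer reuses the scheduled result instead of dispatching a fresh search.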

What's best depends on the specifics of your case, though.


Champion

Thanks Martin. So if I do it at index time, will the load on the indexer be higher? And when the extraction happens at search time with every use, is that a good approach for dashboards? I have no intention of summarizing them, as they would just be a reference for 1-3 days.
