Solved: Log field extraction: when should I do it?

linu1988 · ‎04-05-2014

Hello guys,
I have to extract around 30 to 40 fields from logs and monitor them. They are well formatted and can be extracted easily with regex. My only concern is where I should do it.

At index time or at search time? I'm asking with performance in mind.

Sample Data

[Date][PreciseTime][Time][Pid][Tid][SrcFile][Function][TransactionID][AgentName][Resource][User][Group][Realm][Domain][Directory][Policy][AgentType][Rule][ErrorValue][ReturnValue][ErrorString][IPAddr][IPPort][Result][Returns][CallDetail][Data][Message] 
[====][===========][====][===][===][=======][========][=============][=========][========][====][=====][=====][======][=========][][][][==========][===========][===========][======][======][======][=======][==========][====][=======]

where "===" stands for the data. A field may or may not have a value.

| rex field=_raw "\[(?<DDD>[\d\/]+)\]\[(?<DDD1>[\d:\.]+)\]\[(?<DDD2>[\d\:]+)\]\[(?<DDD3>[\d]+)\]\[(?<DDD4>[\d]+)\]\[(?<DDD5>[A-Za-z_\.\:\d]+)\]"|table DDD,DDD1,DDD2,DDD3,DDD4,DDD5

I am still planning the regex for the remaining fields. Is it okay to set this up at index time? And what should I put inside the [] instead of [A-Za-z_.:\d], where I might miss some characters?

Any kind of suggestion is welcome.

Thank you

martin_mueller · ‎04-05-2014

In almost every case you'll want search time extractions, simple ones as EXTRACT-foo and more complex ones as REPORT-bar with a corresponding transforms.conf stanza [bar]. Only use indexed fields if you have a good reason to, such as values that commonly exist outside a field killing searchtime filtering performance.

As for your character classes, consider using [^]]* for your data fields to match until before the closing square bracket.
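
To illustrate the character-class suggestion, here is a minimal sketch in Python rather than Splunk's own regex engine. It assumes every field is wrapped in square brackets and never contains a literal "]". The sample line and its values are made up for illustration, and only the first six field names from the header are used. Note that Python writes named groups as (?P<name>...), whereas rex uses (?<name>...).

```python
import re

# First six field names from the sample header in the question.
FIELD_NAMES = ["Date", "PreciseTime", "Time", "Pid", "Tid", "SrcFile"]

# One bracketed field: capture everything up to, but not including, the next "]".
# Because of the "*" quantifier, an empty field such as "[]" still matches.
PATTERN = re.compile(
    "".join(r"\[(?P<%s>[^\]]*)\]" % name for name in FIELD_NAMES)
)


def extract(line):
    """Return a dict of the leading fields, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical log line: the Pid field is empty and SrcFile contains a hyphen,
# which the narrower class [A-Za-z_\.\:\d]+ would not have matched.
sample = "[04/05/2014][10:15:30.123][10:15:30][][5678][my-file.cpp:42]"
fields = extract(sample)
print(fields)
```

The same negated class should carry over to an EXTRACT- or REPORT- regex, since it makes no assumption about which characters a field may contain other than the closing bracket.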

linu1988 · ‎04-06-2014

I need to run stats on the fields extracted from the logs. As you suggested, I will go with search-time extraction, which seems flexible, and if a search is used frequently I will schedule it. Thank you for your help.

martin_mueller · ‎04-05-2014

Indextime field extractions will put some load on your indexer, yeah - but the bigger disadvantage I see is that you lose the flexibility of Splunk's schema-on-the-fly searchtime extractions.

As for dashboards, those launch regular searches so it doesn't matter much if a search is on a dashboard or not. If you have a high number of users frequently loading the same dashboard with identical searches you're often better off just scheduling the searches behind the dashboard.

What's best for your case depends on your case though.

linu1988 · ‎04-05-2014

Thanks Martin. So if I do it at index time, will the load on the indexer be higher? And if the extraction happens at search time on every use, is that a good approach for dashboards? I have no intention of summarizing the data, as it would only be kept for reference for 1-3 days.
