I am relatively new to the Splunk coding space, so bear with me regarding my inquiry.
Currently I am trying to create a table where each row has the _time, host, and a unique field extracted from the entry:
_Time Host Field-Type Field-Value
00:00 Unique_Host_1 F_Type_1 F_Type_1_Value
00:00 Unique_Host_1 F_Type_2 F_Type_2_Value
00:00 Unique_Host_1 F_Type_3 F_Type_3_Value
00:00 Unique_Host_2 F_Type_1 F_Type_1_Value
00:00 Unique_Host_2 F_Type_2 F_Type_2_Value
00:00 Unique_Host_2 F_Type_3 F_Type_3_Value
..
The data given for each server:
Field-Type=F_Type_1,.....,Section=F_Type_1_Value
Field-Type=F_Type_2,.....,Section=F_Type_2_Value
Filed-Type=F_Type_3,.....,Section=F_Type_3_Value
I have created 3 field extractions for F-Type Values:
(.|\n)*?\bF_Type_1.*?\b Section=(?<F_Type_1_Value>-?\d+)
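(For comparison, here is a rough Python sketch of what such a field extraction does. Python spells the named group `(?<name>...)` as `(?P<name>...)`, and the sample line and its numeric Section value are made up, since the `-?\d+` in the pattern expects a number.)

```python
import re

# Hypothetical sample line; -?\d+ in the Splunk pattern assumes a
# numeric Section value, so -42 stands in for a real reading.
raw = "Field-Type=F_Type_1,.....,Section=-42"

# Python spelling of the Splunk-style named group (?<F_Type_1_Value>-?\d+)
pattern = re.compile(r"\bF_Type_1\b.*?Section=(?P<F_Type_1_Value>-?\d+)")

match = pattern.search(raw)
if match:
    print(match.group("F_Type_1_Value"))  # -42
```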
This is what I have done so far for the table:
index="nothing" sourcetype="nothing" | stats first(F_Type_1) by host
I am not sure this is the best approach, and I can also refine the field extraction if needed.
Generally, my thought process follows:
Source
| Obtain first entries for all the hosts
| Extract fields values
| Create table
But I am currently hitting a roadblock in the syntax to create rows for each of the unique Field-Types and their values.

It depends what you mean by first. If you want the first event returned by the search, this is going to be the latest, as events are returned newest first; if you want the first event in time, then you could sort by _time first.
In both cases, you could then use dedup, which keeps the first event for each unique combination of field values; in your instance you want host and Field-Type:
| dedup host Field-Type
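(A minimal Python sketch of dedup semantics, under the assumption that events arrive newest-first, Splunk's default order, so keeping the first event seen per key keeps the latest one. The hosts and times are made up.)

```python
# Sketch of dedup: keep the first row seen for each unique key.
# Events are assumed newest-first, so "first seen" means "latest".
events = [
    {"host": "h1", "Field-Type": "F_Type_1", "_time": "00:02"},
    {"host": "h1", "Field-Type": "F_Type_1", "_time": "00:01"},  # duplicate key, dropped
    {"host": "h1", "Field-Type": "F_Type_2", "_time": "00:02"},
    {"host": "h2", "Field-Type": "F_Type_1", "_time": "00:02"},
]

def dedup(rows, *fields):
    seen, kept = set(), []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

for row in dedup(events, "host", "Field-Type"):
    print(row["host"], row["Field-Type"], row["_time"])
```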

Do you have some custom extraction in this sourcetype that is preventing Splunk from automatically extracting these fields? With the exception of a typo in your data sample (Filed-Type should be Field-Type, as in the other rows), the following is an emulation:
| makeresults | eval data = split("Field-Type=F_Type_1,.....,Section=F_Type_1_Value
Field-Type=F_Type_2,.....,Section=F_Type_2_Value
Field-Type=F_Type_3,.....,Section=F_Type_3_Value", "
")
| mvexpand data
| rename data AS _raw
| extract
``` data emulation above ```
Note that extract is implied in most sourcetypes.
Field_Type | Section | _raw | _time |
F_Type_1 | F_Type_1_Value | Field-Type=F_Type_1,.....,Section=F_Type_1_Value | 2024-02-13 16:15:12 |
F_Type_2 | F_Type_2_Value | Field-Type=F_Type_2,.....,Section=F_Type_2_Value | 2024-02-13 16:15:12 |
F_Type_3 | F_Type_3_Value | Field-Type=F_Type_3,.....,Section=F_Type_3_Value | 2024-02-13 16:15:12 |
Are you not getting fields Field_Type and Section (which in your illustration of desired results is just Field-Value)? There should be no regex needed. (Also, regex is not the best tool for this rigidly formatted data.)
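(Roughly what Splunk's automatic key=value extraction does, sketched in Python: split on the pair delimiter, then split each pair on the key/value delimiter. The delimiters are assumptions matching the sample data, and the field-name cleanup Splunk applies, e.g. `-` becoming `_` so the field shows up as Field_Type, is noted but omitted.)

```python
# Rough sketch of automatic key=value extraction.
# Splunk additionally cleans field names ("-" becomes "_"),
# which is why the extracted field appears as Field_Type.
def extract_kv(raw, pairdelim=",", kvdelim="="):
    fields = {}
    for pair in raw.split(pairdelim):
        if kvdelim in pair:
            key, _, value = pair.partition(kvdelim)
            fields[key.strip()] = value.strip()
    return fields

print(extract_kv("Field-Type=F_Type_1,.....,Section=F_Type_1_Value"))
```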
If you already get Field_Type and Section, the following will give you what you illustrated:
| sort host _time
| rename Field_Type as Field-Type, Section as Field-Value
| table _time host Field-Type Field-Value
For clarification, I am currently using the SplunkForwarder to monitor a custom log file which auto-updates every 10 seconds. This custom log file is monitored on multiple hosts.
After looking at my previous example, I realize I incorrectly stated the data format; this is the correct data structure displayed in Splunk:
TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value
There are 4 other sections, but for brevity, the only value I am getting from each component is the last section.
If there is a better way of structuring the data so Splunk can auto detect the new fields, rather than using regex extraction, that would be wonderful.
Splunk will be getting the latest entry for each host:
Host | Component | Value | Time |
Unique_Host_1 | F_Type_1 | F_Type_1_Section_5_Value | 00:00:00 |
Unique_Host_1 | F_Type_2 | F_Type_2_Section_5_Value | 00:00:00 |
Unique_Host_1 | F_Type_3 | F_Type_3_Section_5_Value | 00:00:00 |
Unique_Host_2 | F_Type_1 | F_Type_1_Section_5_Value | 00:00:00 |
Unique_Host_2 | F_Type_2 | F_Type_2_Section_5_Value | 00:00:00 |
Unique_Host_2 | F_Type_3 | F_Type_3_Section_5_Value | 00:00:00 |
.....
The Splunk table creation would be something like this:
index="hosts" sourcetype="logname"
| eval data=split("Field-Type=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Field-Type=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Field-Type=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value", "
")
| stats latest(data) by host
| mvexpand data
| rename Section_5 AS Value
| extract

There is now a conflict between the corrected mock data and the emulation pseudocode. The former seems to imply that Component contains what you want as Field-Type, but the latter directly uses Field-Type as the field name.
Let's take baby steps. First, can you confirm that your _raw events look like, or contain, something like the following emulation? In other words, is the mock data you give emulating _raw?
| makeresults
| eval data=split("Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value", "
")
| mvexpand data
| rename data AS _raw
``` emulation assuming Splunk "forgets" to extract ```
_raw | _time |
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value | 2024-02-14 11:10:02 |
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value | 2024-02-14 11:10:02 |
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value | 2024-02-14 11:10:02 |
(See how similar this is to my previous emulation? You can simply adopt the formula with the field names.) Whether you use a forwarder or some other mechanism to ingest data is not a factor in Splunk extraction. But if Splunk does NOT give Component and Section_5, you should dig deeper with your admin. Maybe post the props.conf that contains this source type. You can always run | extract with _raw. But it would be so much better if you don't have to.
TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value
Or, do you mean all these 3 (and more) lines form one single _raw event? In other words, does this emulation better resemble your _raw events?
| makeresults
| eval _raw="TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value"
_raw | _time |
TimeStamp Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value | 2024-02-14 11:20:05 |
I'll see if I can remove the time stamps in the raw data, since it is causing parsing issues.

You do not need to remove the timestamp per se. Just let us know whether the mock data is a single, multi-line event (emulation 2) or multiple events (emulation 1).
It appears no props.conf has been created; I'll talk more with the Admin.
As for the raw data, it's a single multi-line event:
TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value
But the emulation ignores that TimeStamp:
| makeresults
| eval data=split("Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value", "
")
| mvexpand data
| rename data AS _raw
``` emulation assuming Splunk "forgets" to extract ```
The one single _raw event would be the following:
| eval _raw="TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value"
My apologies, I didn't include the TimeStamp since it didn't appear important when evaluating the data.
Still trying to figure out the lingo for Splunk.

Multi-line explains why default Component and Section_5 do not contain all data. Do not worry about props.conf, then. This is what you can do:
| sort host _time
| eval data = split(_raw, "
")
| eval data = mvfilter(match(data, "^Component="))
| mvexpand data
| rename data AS _raw
| extract
| rename Section_5 AS Value
| table host Component Value _time
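(The pipeline above, sketched in Python: split the multi-line event into lines, keep only the `Component=` lines, parse each as key=value pairs, and rename Section_5 to Value. The sample event is hypothetical.)

```python
import re

# Hypothetical multi-line _raw event, as described in the thread
raw = """TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value"""

rows = []
# split(_raw, "\n") + mvfilter(match(data, "^Component=")) + mvexpand
for line in raw.splitlines():
    if not re.match(r"^Component=", line):
        continue
    # extract: key=value pairs with pairdelim="," kvdelim="="
    fields = dict(pair.split("=", 1) for pair in line.split(",") if "=" in pair)
    # rename Section_5 AS Value
    rows.append({"Component": fields["Component"], "Value": fields["Section_5"]})

for row in rows:
    print(row["Component"], row["Value"])
```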
This is an emulation you can play with and compare with real data:
| makeresults
| eval _raw="TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value"
``` data emulation above ```
The output is then
host | Component | Value | _time |
 | F_Type_1 | F_Type_1_Section_5_Value | 2024-02-14 21:28:25 |
 | F_Type_2 | F_Type_2_Section_5_Value | 2024-02-14 21:28:25 |
 | F_Type_3 | F_Type_3_Section_5_Value | 2024-02-14 21:28:25 |
Hope this helps
The emulation works wonderfully when doing it in my test environment; however, when running it on the search head, the "INTERESTING FIELDS" field names and their values override the extracted values:
host | Component | Value | _time |
 | F_Type_1 | F_Type_1_Section_5_Value | 2024-02-14 21:28:25 |
 | F_Type_1 | F_Type_1_Section_5_Value | 2024-02-14 21:28:25 |
 | F_Type_1 | F_Type_1_Section_5_Value | 2024-02-14 21:28:25 |
So I had to remove the auto-extracted field at the beginning.
Here is the final emulation on live data:
| fields - Section_5
| dedup host
| eval data = split(_raw, "
")
| eval data = mvfilter(match(data, "^Component="))
| mvexpand data
| rename data AS _raw
| extract pairdelim="," kvdelim="="
| rename Section_5 AS Value
| table host Component Value _time
Thank you so much for your help!
