Splunk Search

Creating a table with unique rows based upon unique fields

Ho_Wai_Yung
Explorer

I am relatively new to the Splunk coding space, so bear with me regarding my inquiry.

Currently I am trying to create a table where each row would have the _time, host, and a unique field extracted from the entry:

_time    Host            Field-Type    Field-Value
00:00    Unique_Host_1   F_Type_1      F_Type_1_Value
00:00    Unique_Host_1   F_Type_2      F_Type_2_Value
00:00    Unique_Host_1   F_Type_3      F_Type_3_Value
00:00    Unique_Host_2   F_Type_1      F_Type_1_Value
00:00    Unique_Host_2   F_Type_2      F_Type_2_Value
00:00    Unique_Host_2   F_Type_3      F_Type_3_Value

..

The data given for each server:

Field-Type=F_Type_1,.....,Section=F_Type_1_Value
Field-Type=F_Type_2,.....,Section=F_Type_2_Value
Filed-Type=F_Type_3,.....,Section=F_Type_3_Value

I have created 3 field extractions for the F_Type values:

(.|\n)*?\bF_Type_1.*?\b Section=(?<F_Type_1_Value>-?\d+)

This is what I have done so far for the table:

index="nothing" sourcetype="nothing" | stats first(F_Type_1) by host

I am not sure this is the best approach, and I can also refine the field extraction if needed.

Generally, my thought process follows:
Source
| Obtain first entries for all the hosts
| Extract fields values
| Create table
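
In skeleton SPL, that thought process would look something like this (the index, sourcetype, and extracted field names are placeholders):

index="nothing" sourcetype="nothing"
``` obtain first entries for all the hosts ```
| stats first(F_Type_1_Value) AS F_Type_1_Value first(F_Type_2_Value) AS F_Type_2_Value first(F_Type_3_Value) AS F_Type_3_Value by host
``` create table - but this gives one column per Field-Type, not one row ```
| table host F_Type_1_Value F_Type_2_Value F_Type_3_Value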


But I am currently hitting a roadblock in the syntax to create rows for each of the unique Field-Types and their values.

 

0 Karma
1 Solution

Ho_Wai_Yung
Explorer

The emulation works wonderfully in my test environment; however, when running the emulation on the search head, the "INTERESTING FIELDS" field names and their values override the extracted values:

host    Component    Value                       _time
        F_Type_1     F_Type_1_Section_5_Value    2024-02-14 21:28:25
        F_Type_1     F_Type_1_Section_5_Value    2024-02-14 21:28:25
        F_Type_1     F_Type_1_Section_5_Value    2024-02-14 21:28:25

 

So I had to remove the auto-extracted field at the beginning

Here is the final emulation in live data:

| fields - Section_5
| dedup host
| eval data = split(_raw, "
")
| eval data = mvfilter(match(data, "^Component="))
| mvexpand data
| rename data AS _raw
| extract pairdelim="," kvdelim="="
| rename Section_5 AS Value
| table host Component Value _time

 

Thank you so much for your help!

 


0 Karma

ITWhisperer
SplunkTrust

It depends what you mean by first - if you want the first event returned by the search, this will be the latest, as events are returned newest first - if you want the first event in time, then you could sort by _time first.

In both cases, you could then use dedup, which keeps the first event for each unique combination of field values; in your instance, you want host and field type:

| dedup Host Field-Type
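
For example, to lay the earliest event per host and field type out as in your illustration (field names assumed from your mock-up):

| sort 0 _time
| dedup host Field-Type
| table _time host Field-Type Field-Value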

 

0 Karma

yuanliu
SplunkTrust

Do you have some custom extraction in this sourcetype that is preventing Splunk from automatically extracting these fields?  With the exception of a typo in your data sample (Filed-Type should be Field-Type, as in the other rows), the following is an emulation:

 

| makeresults | eval data = split("Field-Type=F_Type_1,.....,Section=F_Type_1_Value
Field-Type=F_Type_2,.....,Section=F_Type_2_Value
Field-Type=F_Type_3,.....,Section=F_Type_3_Value", "
")
| mvexpand data
| rename data AS _raw
| extract
``` data emulation above ```

 

Note that extract is implied in most sourcetypes.

Field_Type    Section           _raw                                                _time
F_Type_1      F_Type_1_Value    Field-Type=F_Type_1,.....,Section=F_Type_1_Value    2024-02-13 16:15:12
F_Type_2      F_Type_2_Value    Field-Type=F_Type_2,.....,Section=F_Type_2_Value    2024-02-13 16:15:12
F_Type_3      F_Type_3_Value    Field-Type=F_Type_3,.....,Section=F_Type_3_Value    2024-02-13 16:15:12

Are you not getting fields Field_Type and Section (which in your illustration of desired results is just Field-Value)?  There should be no regex needed. (Also, regex is not the best tool for this rigidly formatted data.)

If you already get Field_Type and Section, the following will give you what you illustrated:

 

| sort host _time
| rename Field_Type AS "Field-Type", Section AS "Field-Value"
| table _time host Field-Type Field-Value

 

0 Karma

Ho_Wai_Yung
Explorer

For clarification, I am currently using the Splunk forwarder to monitor a custom log file which auto-updates every 10 seconds. This custom log file is monitored on multiple hosts.

After looking at my previous example, I realize I incorrectly stated the data format; this is the correct data structure displayed in Splunk:

TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value

There are 4 other sections, but for brevity, the only value I am getting from each component is the last section.

If there is a better way of structuring the data so Splunk can auto detect the new fields, rather than using regex extraction, that would be wonderful.

Splunk will be getting the latest entry for each host:

Host            Component   Value                      Time
Unique_Host_1   F_Type_1    F_Type_1_Section_5_Value   00:00:00
Unique_Host_1   F_Type_2    F_Type_2_Section_5_Value   00:00:00
Unique_Host_1   F_Type_3    F_Type_3_Section_5_Value   00:00:00
Unique_Host_2   F_Type_1    F_Type_1_Section_5_Value   00:00:00
Unique_Host_2   F_Type_2    F_Type_2_Section_5_Value   00:00:00
Unique_Host_2   F_Type_3    F_Type_3_Section_5_Value   00:00:00
.....

The Splunk table creation would be something like this:

index="hosts" sourcetype="logname"
| eval data=split("Field-Type=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Field-Type=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Field-Type=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value", "
")
| stats latest(data) AS data by host
| mvexpand data
| rename data AS _raw
| extract
| rename Section_5 AS Value

 

0 Karma

yuanliu
SplunkTrust

There is now a conflict between the corrected mock data and the emulation pseudocode.  The former seems to imply that Component contains what you want as Field-Type, but the latter directly uses Field-Type as the field name.

Let's take baby steps.  First, can you confirm that your _raw events look like, or contain, something like the following emulation? In other words, is the mock data you gave emulating _raw?

 

| makeresults
| eval data=split("Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value", "
")
| mvexpand data
| rename data AS _raw
``` emulation assuming Splunk "forgets" to extract ```

 

_raw                                                           _time
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value    2024-02-14 11:10:02
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value    2024-02-14 11:10:02
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value    2024-02-14 11:10:02

(See how similar this is to my previous emulation? You can simply adopt the formula with the field names.)  Whether you use a forwarder or some other mechanism to ingest data is not a factor in Splunk extraction.  But if Splunk does NOT give Component and Section_5, you should dig deeper with your admin.  Maybe post the props.conf that contains this source type.  You can always run | extract with _raw, but it would be so much better if you don't have to.
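
If the automatic extraction is not happening, a quick sanity check is to force key-value extraction on one of your lines explicitly (delimiters assumed from your samples):

| makeresults
| eval _raw="Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value"
| extract pairdelim="," kvdelim="="
| table Component Section_5
``` Component and Section_5 should populate if the delimiters match ```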


TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value


Or, do you mean that all these 3 (and more) lines form one single _raw event? In other words, does this emulation better resemble your _raw events?

 

| makeresults
| eval _raw="TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value"

 

_raw (single multi-line event)                                 _time
TimeStamp                                                      2024-02-14 11:20:05
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value
0 Karma

Ho_Wai_Yung
Explorer

I'll see if I can remove the timestamps in the raw data, since they are causing parsing issues.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

You do not need to remove the timestamp per se.  Just let us know whether the mock data is a single multi-line event (emulation 2) or multiple events (emulation 1).

0 Karma

Ho_Wai_Yung
Explorer

It appears no props.conf has been created; I'll talk more with the admin.

As for the raw data, it's a single multi-line event:

TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value

But the emulation is to ignore that TimeStamp:

| makeresults
| eval data=split("Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value", "
")
| mvexpand data
| rename data AS _raw
``` emulation assuming Splunk "forgets" to extract ```

 

0 Karma

Ho_Wai_Yung
Explorer

The one single _raw event would be the following:

eval _raw="TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value"

My apologies, I didn't include the TimeStamp since it didn't appear important when evaluating the data.

Still trying to figure out the lingo for Splunk.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Multi-line events explain why the default Component and Section_5 do not contain all the data.  Do not worry about props.conf, then.  This is what you can do:

 

| sort host _time
| eval data = split(_raw, "
")
| eval data = mvfilter(match(data, "^Component="))
| mvexpand data
| rename data AS _raw
| extract
| rename Section_5 AS Value
| table host Component Value _time

 

This is an emulation you can play with and compare with real data:

 

| makeresults
| eval _raw="TimeStamp
Component=F_Type_1,.....,Section_5=F_Type_1_Section_5_Value
Component=F_Type_2,.....,Section_5=F_Type_2_Section_5_Value
Component=F_Type_3,.....,Section_5=F_Type_3_Section_5_Value"
``` data emulation above ```

 

The output is then

host    Component    Value                       _time
        F_Type_1     F_Type_1_Section_5_Value    2024-02-14 21:28:25
        F_Type_2     F_Type_2_Section_5_Value    2024-02-14 21:28:25
        F_Type_3     F_Type_3_Section_5_Value    2024-02-14 21:28:25

Hope this helps

