Monitoring Splunk

waterfall saved search

MaryvonneMB
Path Finder

Hi all!

I've got a problem. It's a bit complex to explain, but I'll try to be as clear as possible.
First, I want to generate a file based on 2 different inputs. One of these files is big (about 400,000 lines) and the other one is about 100,000 lines. To generate my new file correctly I have to do 4 joins and then an append (to still keep the lines that don't match). So the result is a file which is the sum of these 2 inputs (about 500,000 lines).

After that, we want to index this file. Currently I create my file as a CSV based on these 2 inputs, which are CSVs too. Then I index my result file.
What my team wants is to use the 2 inputs (which for other reasons will already be indexed) to generate the result file and index it directly, so we can drop the intermediate result CSV.
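Schematically, what we are after is something like this (a rough sketch only; the staging index and sourcetype names are placeholders, not our real ones):

index=staging sourcetype=first_input
| join type=left max=1 fieldA [ search index=staging sourcetype=second_input | fields fieldC | eval fieldA=fieldC ]
| collect index="myindex" sourcetype="mysourcetype" source="mysource"

i.e. the base search and the subsearches read the indexed copies of the inputs, and collect writes the merged result straight into the target index.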

Here is my full search:

|inputlookup firstInputFile.csv
|rex field="fieldA" mode=sed "s/^0{10}//g" 
| rex field="fieldB" mode=sed "s/^0{10}//g" 

| join type=left max=1 "fieldA" [ 
|inputlookup secondInputFile.csv 
|table "fieldC"
|eval "fieldA"='fieldC' 
| eval "new_column"="X"
] 

| join type=left max=1 "fieldB" [
|inputlookup secondInputFile.csv 
|table "fieldC"
|eval "fieldB"='fieldC' 
|eval "new_column"="X"
]

| eval "new_column"=if('new_column'="X","X","Y")

| join type=left max=1 "fieldA" [
| inputlookup thirdInputFile.csv  | search "fieldD"!=""
| table some fields
| eval "fieldA"='fieldD'
]

| join type=left max=1 "fieldB" [
| inputlookup  thirdInputFile.csv  | search "fieldE"!=""
| table some fields
| eval "fieldB"='fieldE' 
] 

| join type=left max=1 "fieldB" [
| inputlookup  thirdInputFile.csv  |search "fieldD"!=""
| table some fields
| eval "fieldB"='fieldD'
]

| join type=left max=1 "fieldA" [
| inputlookup  thirdInputFile.csv  |search "fieldE"!=""
| table some fields
| eval "fieldA"='fieldE'
]

| append [
|inputlookup thirdInputFile.csv 
| rename "fieldD" as "fieldA" "fieldE" as "fieldB" 
| table some fields
]

| fields - some fields
| rex mode=sed field="fieldF" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/" 
| rex mode=sed field="fieldG" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/" 
| rex mode=sed field="fieldH" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/" 
| rex mode=sed field="fieldI" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldJ" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/" 
| rex mode=sed field="fieldK" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"

When I run this search and add | collect index="myindex" sourcetype="mysourcetype" source="mysource", the job dies during result finalization and Splunk disconnects. If I replace the collect with just a table *, it also crashes. It succeeds if I remove at least the append and one of the joins. So I thought it might be possible to cut this big search into pieces, producing temporary results that the next part of the search can consume. Since I want to automate this process, I have looked at saved searches, but I don't really see how they can solve my problem.
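For example, what I imagined is a first scheduled search that writes an intermediate lookup (the name intermediate_stage1.csv below is just a placeholder):

|inputlookup firstInputFile.csv
| join type=left max=1 fieldA [ |inputlookup secondInputFile.csv | table fieldC | eval fieldA=fieldC | eval new_column="X" ]
| join type=left max=1 fieldB [ |inputlookup secondInputFile.csv | table fieldC | eval fieldB=fieldC | eval new_column="X" ]
| outputlookup intermediate_stage1.csv

and then a second scheduled search, running after the first, that picks up where it left off (only a couple of the remaining joins shown, as a sketch):

|inputlookup intermediate_stage1.csv
| join type=left max=1 fieldA [ |inputlookup thirdInputFile.csv | search fieldD!="" | eval fieldA=fieldD ]
| append [ |inputlookup thirdInputFile.csv | rename fieldD AS fieldA fieldE AS fieldB ]
| collect index="myindex" sourcetype="mysourcetype" source="mysource"

But I am not sure how to chain the schedules reliably, which is why I am asking here.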

If anyone has an idea, thanks in advance!

1 Solution

woodcock
Esteemed Legend

Try this (EXACTLY as-is; translate your field names but do not change any commands):

|inputlookup firstInputFile.csv | eval DATASET="LeftJoin"
|appendpipe [|inputlookup secondInputFile.csv 
   | table fieldC
   | eval fieldA=fieldC 
   | eval new_column="X"
]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"

| appendpipe [|inputlookup secondInputFile.csv 
   | table fieldC
   | eval fieldB=fieldC
   | eval new_column="X"
]
| stats values(*) AS * BY fieldB
| search DATASET="LeftJoin"

| eval new_column=coalesce(new_column, "Y")

| appendpipe [| inputlookup thirdInputFile.csv
  | search "fieldD"!=""
  | eval fieldA=fieldD
  | table some fields
]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"

| appendpipe [| inputlookup  thirdInputFile.csv
  | search "fieldE"!=""
  | eval fieldB=fieldE
  | table some fields
]
| stats values(*) AS * BY fieldB
| search DATASET="LeftJoin"

| appendpipe [| inputlookup  thirdInputFile.csv
  |search "fieldD"!=""
  | eval fieldB=fieldD
  | table some fields
]
| stats values(*) AS * BY fieldB
| search DATASET="LeftJoin"

| appendpipe [| inputlookup  thirdInputFile.csv
  | search "fieldE"!=""
  | eval fieldA=fieldE
  | table some fields
]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"

| appendpipe [|inputlookup thirdInputFile.csv 
  | rename fieldD AS fieldA fieldE AS fieldB
  | table some fields
]

| fields - some fields
| rex mode=sed field=fieldF "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/" 
| rex mode=sed field=fieldG "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/" 
| rex mode=sed field=fieldH "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/" 
| rex mode=sed field=fieldI "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldJ "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/" 
| rex mode=sed field=fieldK "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
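The idea behind this pattern: instead of join, each lookup is appended onto the main result set with appendpipe, then stats values(*) AS * BY <key> merges all rows that share the key, and search DATASET="LeftJoin" throws away keys that only existed in the appended lookup. That gives the effect of a left join without join's subsearch limits. A minimal, generic sketch of the pattern (left.csv, right.csv, key, and right_key are placeholder names, not from the search above):

|inputlookup left.csv
| eval DATASET="LeftJoin"
| appendpipe [ |inputlookup right.csv | eval key=right_key ]
| stats values(*) AS * BY key
| search DATASET="LeftJoin"

Rows whose key only appears in right.csv never get DATASET, so the final search drops them; rows from left.csv always keep it, matched or not.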


MaryvonneMB
Path Finder

Thanks a lot, and sorry for my delayed answer. Using your search with the lookups as input, the job still ended as before. But I have now indexed my first and third inputs, and with your search the performance is much better (no more unexpected endings ^^).
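For anyone else reading: with indexed inputs, the appended lookups become append [ search ... ] subsearches, since a bare search command inside appendpipe only filters the existing results instead of fetching events. Roughly (the index and sourcetype names are placeholders):

index=staging sourcetype=first_input
| eval DATASET="LeftJoin"
| append [ search index=staging sourcetype=third_input fieldD!="" | eval fieldA=fieldD ]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"
| collect index="myindex" sourcetype="mysourcetype" source="mysource"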
