Hi all!
I've got a problem. It's a bit complex to explain, but I'll try to be as clear as possible.
First, I want to generate a file based on two different inputs. One of these files is big (about 400,000 lines) and the other is about 100,000 lines. To generate my new file correctly I have to do four joins and then an append (to still keep the lines that don't match). The result is a file which is the sum of these two inputs (about 500,000 lines).
After that, we want to index this file. Currently I create my result file as a CSV based on these two inputs, which are CSVs too. Later, I index the result file.
What my team wants is to use the two inputs (which, for other reasons, will already be indexed) to generate the result file and index it directly (so we can delete the intermediate result CSV).
Here is my full search:
|inputlookup firstInputFile.csv
|rex field="fieldA" mode=sed "s/^0{10}//g"
| rex field="fieldB" mode=sed "s/^0{10}//g"
| join type=left max=1 "fieldA" [
|inputlookup secondInputFile.csv
|table "fieldC"
|eval "fieldA"='fieldC'
| eval "new_column"="X"
]
| join type=left max=1 "fieldB" [
|inputlookup secondInputFile.csv
|table "fieldC"
|eval "fieldB"='fieldC'
|eval "new_column"="X"
]
| eval "new_column"=if('new_column'="X","X","Y")
| join type=left max=1 "fieldA" [
| inputlookup thirdInputFile.csv | search "fieldD"!=""
| table some fields
| eval "fieldA"='fieldD'
]
| join type=left max=1 "fieldB" [
| inputlookup thirdInputFile.csv | search "fieldE"!=""
| table some fields
| eval "fieldB"='fieldE'
]
| join type=left max=1 "fieldB" [
| inputlookup thirdInputFile.csv |search "fieldD"!=""
| table some fields
| eval "fieldB"='fieldD'
]
| join type=left max=1 "fieldA" [
| inputlookup thirdInputFile.csv |search "fieldE"!=""
| table some fields
| eval "fieldA"='fieldE'
]
| append [
|inputlookup thirdInputFile.csv
| rename "fieldD" as "fieldA" "fieldE" as "fieldB"
| table some fields
]
| fields - some fields
| rex mode=sed field="fieldF" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldG" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldH" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldI" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldJ" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldK" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
When I run this search and add |collect index="myindex" sourcetype="mysourcetype" source="mysource"
the job ends during result finalization and Splunk disconnects. If I replace the collect
with just a table *
it also crashes. It succeeds if I remove at least the append and one join. So I thought it might be possible to split this big search into parts, with temporary results consumed by the next part. Since I want to automate this process, I looked into saved searches, but I don't really see how they could solve my problem.
If anyone has an idea, thanks in advance!
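For example, I imagined splitting it into two scheduled searches chained through a temporary lookup, something like this (stagingResults.csv is just a placeholder name, and the subsearches are trimmed down to one join each):

```spl
| inputlookup firstInputFile.csv
| join type=left max=1 fieldA [| inputlookup secondInputFile.csv
    | eval fieldA=fieldC
    | eval new_column="X"]
| outputlookup stagingResults.csv
```

Then a second scheduled search would pick up where the first left off:

```spl
| inputlookup stagingResults.csv
| join type=left max=1 fieldB [| inputlookup secondInputFile.csv
    | eval fieldB=fieldC]
| collect index="myindex" sourcetype="mysourcetype" source="mysource"
```

But I'm not sure this is the right approach, or how to make sure the second search only runs after the first has finished.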
Try this (EXACTLY as-is; translate your field names but do not change any commands):
|inputlookup firstInputFile.csv | eval DATASET="LeftJoin"
|appendpipe [|inputlookup secondInputFile.csv
| table fieldC
| eval fieldA=fieldC
| eval new_column="X"
]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"
| appendpipe [|inputlookup secondInputFile.csv
| table fieldC
| eval fieldB=fieldC
| eval new_column="X"
]
| stats values(*) AS * BY fieldB
| search DATASET="LeftJoin"
| eval new_column=coalesce(new_column, "Y")
| appendpipe [| inputlookup thirdInputFile.csv
| search "fieldD"!=""
| eval fieldA=fieldD
| table some fields
]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"
| appendpipe [| inputlookup thirdInputFile.csv
| search "fieldE"!=""
| eval fieldB=fieldE
| table some fields
]
| stats values(*) AS * BY fieldB
| search DATASET="LeftJoin"
| appendpipe [| inputlookup thirdInputFile.csv
|search "fieldD"!=""
| eval fieldB=fieldD
| table some fields
]
| stats values(*) AS * BY fieldB
| search DATASET="LeftJoin"
| appendpipe [| inputlookup thirdInputFile.csv
| search "fieldE"!=""
| eval fieldA=fieldE
| table some fields
]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"
| appendpipe [|inputlookup thirdInputFile.csv
| rename fieldD AS fieldA
fieldE AS fieldB
| table some fields
]
| fields - some fields
| rex mode=sed field=fieldF "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldG "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldH "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldI "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldJ "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldK "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
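The pattern above, reduced to a single key so the idea is easier to see (field names are placeholders, as before):

```spl
| inputlookup firstInputFile.csv
| eval DATASET="LeftJoin"
| appendpipe [| inputlookup secondInputFile.csv
    | eval fieldA=fieldC
    | eval new_column="X"]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"
```

The stats values(*) AS * BY fieldA step collapses a main row and its matching lookup row into one event per key value, and the final search on DATASET="LeftJoin" drops the lookup-only rows. That gives you the effect of a left join without join's subsearch row and time limits, which is usually what kills searches at your data volumes.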
Thanks a lot, and sorry for my delayed answer. Using your search with the lookups as input, the job still ended as before. But after indexing my first and third inputs, your search performs much better (with no unexpected ending ^^).