Hi all!
I've got a problem. It's a bit complex to explain, but I'll try to be as clear as possible.
First, I want to generate a file based on two different inputs. One of these files is big (about 400,000 lines) and the other is about 100,000 lines. To generate my new file correctly I have to do four joins and then an append (to still keep the lines that don't match). The result is a file which is the sum of these two inputs (about 500,000 lines).
After that, we want to index this file. Currently I create my result file as a CSV based on these two inputs, which are CSVs too. Later, I index the result file.
What my team wants is to use the two inputs (which, for other reasons, will already be indexed) to generate the result file and index it directly (so we can delete the intermediate result CSV).
Here is my full search:
|inputlookup firstInputFile.csv
|rex field="fieldA" mode=sed "s/^0{10}//g"
| rex field="fieldB" mode=sed "s/^0{10}//g"
| join type=left max=1 "fieldA" [
|inputlookup secondInputFile.csv
|table "fieldC"
|eval "fieldA"='fieldC'
| eval "new_column"="X"
]
| join type=left max=1 "fieldB" [
|inputlookup secondInputFile.csv
|table "fieldC"
|eval "fieldB"='fieldC'
|eval "new_column"="X"
]
| eval "new_column"=if('new_column'="X","X","Y")
| join type=left max=1 "fieldA" [
| inputlookup thirdInputFile.csv | search "fieldD"!=""
| table some fields
| eval "fieldA"='fieldD'
]
| join type=left max=1 "fieldB" [
| inputlookup thirdInputFile.csv | search "fieldE"!=""
| table some fields
| eval "fieldB"='fieldE'
]
| join type=left max=1 "fieldB" [
| inputlookup thirdInputFile.csv |search "fieldD"!=""
| table some fields
| eval "fieldB"='fieldD'
]
| join type=left max=1 "fieldA" [
| inputlookup thirdInputFile.csv |search "fieldE"!=""
| table some fields
| eval "fieldA"='fieldE'
]
| append [
|inputlookup thirdInputFile.csv
| rename "fieldD" as "fieldA" "fieldE" as "fieldB"
| table some fields
]
| fields - some fields
| rex mode=sed field="fieldF" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldG" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldH" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldI" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldJ" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field="fieldK" "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
When I run this search and add |collect index="myindex" sourcetype="mysourcetype" source="mysource"
the job ends during result finalization and Splunk disconnects. If I replace the collect
with just a table *
it also crashes. It succeeds if I remove at least the append and one join. So I thought it might be possible to split this big search into parts, with temporary results consumed by the next part. Since I want to automate this process, I looked into saved searches, but I don't really see how they could solve my problem.
If anyone has an idea, thanks in advance!
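For example, I imagined splitting it into two scheduled searches chained through a temporary lookup, something like this (stagingResults.csv is just a placeholder name, and the subsearches are trimmed down to one join each):

```spl
| inputlookup firstInputFile.csv
| join type=left max=1 fieldA [| inputlookup secondInputFile.csv
    | eval fieldA=fieldC
    | eval new_column="X"]
| outputlookup stagingResults.csv
```

Then a second scheduled search would pick up where the first left off:

```spl
| inputlookup stagingResults.csv
| join type=left max=1 fieldB [| inputlookup secondInputFile.csv
    | eval fieldB=fieldC]
| collect index="myindex" sourcetype="mysourcetype" source="mysource"
```

But I'm not sure this is the right approach, or how to make sure the second search only runs after the first has finished.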
Try this (EXACTLY as-is; translate your field names but do not change any commands):
|inputlookup firstInputFile.csv | eval DATASET="LeftJoin"
|appendpipe [|inputlookup secondInputFile.csv
| table fieldC
| eval fieldA=fieldC
| eval new_column="X"
]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"
| appendpipe [|inputlookup secondInputFile.csv
| table fieldC
| eval fieldB=fieldC
| eval new_column="X"
]
| stats values(*) AS * BY fieldB
| search DATASET="LeftJoin"
| eval new_column=coalesce(new_column, "Y")
| appendpipe [| inputlookup thirdInputFile.csv
| search "fieldD"!=""
| eval fieldA=fieldD
| table some fields
]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"
| appendpipe [| inputlookup thirdInputFile.csv
| search "fieldE"!=""
| eval fieldB=fieldE
| table some fields
]
| stats values(*) AS * BY fieldB
| search DATASET="LeftJoin"
| appendpipe [| inputlookup thirdInputFile.csv
|search "fieldD"!=""
| eval fieldB=fieldD
| table some fields
]
| stats values(*) AS * BY fieldB
| search DATASET="LeftJoin"
| appendpipe [| inputlookup thirdInputFile.csv
| search "fieldE"!=""
| eval fieldA=fieldE
| table some fields
]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"
| appendpipe [|inputlookup thirdInputFile.csv
| rename fieldD AS fieldA
fieldE AS fieldB
| table some fields
]
| fields - some fields
| rex mode=sed field=fieldF "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldG "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldH "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldI "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldJ "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
| rex mode=sed field=fieldK "s/([0-9]{4})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])/\3\/\2\/\1/"
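The pattern above, reduced to a single key so the idea is easier to see (field names are placeholders, as before):

```spl
| inputlookup firstInputFile.csv
| eval DATASET="LeftJoin"
| appendpipe [| inputlookup secondInputFile.csv
    | eval fieldA=fieldC
    | eval new_column="X"]
| stats values(*) AS * BY fieldA
| search DATASET="LeftJoin"
```

The stats values(*) AS * BY fieldA step collapses a main row and its matching lookup row into one event per key value, and the final search on DATASET="LeftJoin" drops the lookup-only rows. That gives you the effect of a left join without join's subsearch row and time limits, which is usually what kills searches at your data volumes.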
Thanks a lot, and sorry for my delayed answer. Using your search with the lookups as input, the job still ended as before. But after indexing my first and third inputs, your search performs much better (with no unexpected ending ^^).