Hi,
I have a question: I have 3-4 CSV files from different IT reports that I need to analyze and prepare a combined dashboard.
Now my question is:
1. Should I do pre-analysis on some files before uploading to Splunk?
2. Or should I upload all files and then use Splunk search commands to do the analysis?
Pre-analysis means the following:
For the AV update file (computer name, virus definition installed, date of scan), pre-analysis involves categorizing systems as compliant or non-compliant based on the org's virus scan policy.
For the patch-installed report (computer name, patch name, patch required, patch installed status), pre-analysis involves categorizing each system for each patch as compliant or non-compliant based on org policy.
For the software installation report (computer name, software name, version, year), pre-analysis involves categorizing each software for each system as compliant or non-compliant based on org policy.
Further, I need to merge these pre-analyses and categorize systems as high compliant, medium compliant and warning based on the above 3 categorizations.
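Conceptually, the final roll-up I am after would be something like the following (the field names here are placeholders for illustration, not my real ones):
| eval compliant_count = if(AVCategory="Compliant",1,0) + if(PatchCategory="Compliant",1,0) + if(SWCategory="Compliant",1,0)
| eval FinalCategory = case(compliant_count=3, "High Compliant", compliant_count=2, "Medium Compliant", 1=1, "Warning")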
The problems I am facing are as below:
1. I upload all CSVs directly into Splunk and then do the necessary categorization on the individual CSV files, but when I try to merge the output fields (ComputerName, Categoristion_) from each file in Splunk, I am unable to generate any data.
2. However, if I perform the pre-analysis of 2 files, let's say AV and Patch, in Python, merge them in Python, and then upload the resulting CSV along with the 3rd CSV, I am able to generate the report for the final categorization.
However, I am expected to add more parameters going forward to generate a more complete categorization for dashboards.
Please, can anyone help? I can share my code as well if it's required.
Hi vikfnu,
still not much information in here, and also no hint of the search you're using ..... so I'm asking the magic glass ball and will try to answer what I reckon you might want to do ....
One thing first: upload the data as is into Splunk, no need to do any pre-analysis of any kind.
Upload your csv files (let's call them 1.csv to 4.csv) into an index, let's say it is called csv, and create a search that will return all the events; don't care about grouping or anything else for now - just get the events:
index=csv source=1.csv OR source=2.csv OR source=3.csv OR source=4.csv
this will just return all the events.
Now, we need to find some field that can be used to correlate the events. You mentioned ComputerName and Categoristion_ - I assume the latter has something after the _, so we ignore this one for now.
By adding a case() statement to the SPL we can get all variations of ComputerName from the events and use it:
| eval ComputerName = case(isnotnull(ComputerName), ComputerName, isnotnull(computername), computername, isnotnull(computerName), computerName, 1=1, "unknown")
this list needs to be extended so it matches all possible variations of the computer name field available in the events, and the last 1=1 is a catch-all that will show you unhandled cases 😉
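As an aside, when the variants only differ in spelling, the same normalisation can be sketched more compactly with SPL's coalesce() function, which returns the first non-null value in its argument list:
| eval ComputerName = coalesce(ComputerName, computername, computerName, "unknown")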
After that step you can correlate the events using a stats on ComputerName and _time:
| stats count values(*) AS * by ComputerName _time
Once this is done, you can add more SPL to munch the data in every possible way until you get your expected result. So here is everything from above as one SPL statement:
index=csv source=1.csv OR source=2.csv OR source=3.csv OR source=4.csv
| eval ComputerName = case(isnotnull(ComputerName), ComputerName, isnotnull(computername), computername, isnotnull(computerName), computerName, 1=1, "unknown")
| stats count values(*) AS * by ComputerName _time
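If the merge comes back empty for some computers, a quick way to see which events failed to correlate is to count the distinct sources per ComputerName - a debugging sketch built on the search above:
index=csv source=1.csv OR source=2.csv OR source=3.csv OR source=4.csv
| eval ComputerName = case(isnotnull(ComputerName), ComputerName, isnotnull(computername), computername, isnotnull(computerName), computerName, 1=1, "unknown")
| stats dc(source) AS sources_seen by ComputerName
| where sources_seen < 4
Any ComputerName listed here is missing from at least one of the four csv files.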
If this is not what you are after, then I need to bring my magic glass ball into service, or you need to provide more details on the events (anonymise any sensitive data!) and the SPL you used.
Hope this helps ...
cheers, MuS
Hi MuS
I have been trying to write a comment and reply, but I get a message that it will be moderated and then updated; even after 24 hours it's not appearing.
Data, let's say:

File1.csv
Fields: Field1, Field2, Field3, Field3
Operation: perform analysis on Field2 wrt Field1 to get results as compliant or non-compliant
Output fields: FIELD1, AnalysisField1

File2.csv
Fields: Field1, FA, FB, FC, FD, FE, FG
Operation: combine fields from File1 and File2
Output fields: Field1, Analysis Field1, FA, FB, FC, FD, FE, FG

File3.csv
Fields: Field1, fa, fb, fc, fd, fe, fg, fh
Operation 1: analyse based on fa, fb, fc to get results as compliant or non-compliant
Output fields: Field1, Analysis Field3, fd, fe, fg, fh
Operation 2: combine fields from File1.csv/File2.csv and File3.csv, and analyse the data to generate a combined compliance/non-compliance field
Output fields: Field1, JointAnalysisField1UField3, FA, FB, FC, FD, FE, FG, fd, fe, fg, fh

File4.csv
Fields: Field1, f1, f2, f3, f4, f5, f6
Operation 1: analyse File4.csv based on fields f1, f2 to get a compliance/non-compliance result field
Output fields: Field1, Analysis Field4, f3, f4, f5, f6
Operation 2: combine the analysed fields "JointAnalysisField1UField3" and "Analysis Field4" to generate the final analysis result of COMPLIANCE/NONCOMPLIANCE
Search Query
index= mine source=“File4.csv”
| table Field1 , f1 f2 f3 f4 f5 f6
| eval “ Analysis Field4”=case(like(f1,”Y”), "Compliant", like(f2,”Y”), "NonCompliant")
|join Field 1, Analysis Field4, f3 f4 f5 f6 type=outer
[
search (source=File1.csv or source=File2.csv) index=mine
|fields+Field1, Field2 Field3 Field3 FA FB FC FD FE FG
| eval AnalysisField1= case(like(field2,”Y”), "Compliant", like(field2,”N”), "NonCompliant")
| join Field 1 type=outer
[
search source=File3.csv index=mine
|eval Analysis Field3= case (like(fa,”Y”) AND (like(fb,”Y”) OR like(fb,”N”) ), "Compliant", like(fa,”N”) AND (like(fb,”Y”) OR like(fb,”N”) ),"NonCompliant”)
]
| fileds Field1, Analysis Field1, FA FB FC FD FE FG Analysis Field3, fd fe fg fh
| eval JointAnalysisField1UField3=case (like(“Analysis Field1”,”Compliant”) AND (like(“Analysis Field3”,”Compliant”) OR like(“Analysis Field3”,”NonCompliant”) ),
"Compliant”, like(“Analysis Field1”,”NonCompliant”) AND (like(“Analysis Field3”,”Compliant”) OR like(“Analysis Field3”,”NonCompliant”) ),"NonCompliant”)
]
| fields Field1, JointAnalysisField1UField3, FA FB FC FD FE FG fd fe fg fh Analysis Field4, f3 f4 f5 f6
| eval finalanalysis= if ( match( JointAnalysisField1UField3, “Compliant”) AND match (“Analysis Field1”,”Compliant”), “COMPLIANT”,
if(match( JointAnalysisField1UField3, “NonCompliant”) AND match (“Analysis Field1”,”NonCompliant”), “NONCOMPLIANT”, “UNDEFINED))
Note: the evals represent analysis steps; for brevity I have reduced long analysis statements to a very simple form.
What I observed:
1. The join should go from the log file with the higher number of events to the log file with fewer events.
My question: can I improve the search query to make better dashboards?
Also, this is just a POC; I am expected to increase the number of parameters in the report by including more logs from various other sources.
Can I continue to use this approach?
Hi vikfnu,
join ..... don't use join! It will break sooner rather than later, because of all the obvious and hidden limits of the sub search: http://docs.splunk.com/Documentation/Splunk/7.1.2/Search/Aboutsubsearches#Subsearch_performance_cons...
But back to your SPL, try this instead:
index=mine source=File4.csv OR source=File1.csv OR source=File2.csv OR source=File3.csv
| eval " Analysis Field4" = case(like(f1,"Y"), "Compliant", like(f2,"Y"), "NonCompliant")
| eval AnalysisField1 = case(like(field2,"Y"), "Compliant", like(field2,"N"), "NonCompliant")
| eval "Analysis Field3" = case (like(fa,"Y") AND (like(fb,"Y") OR like(fb,"N") ), "Compliant", like(fa,"N") AND (like(fb,"Y") OR like(fb,"N") ),"NonCompliant")
| eval JointAnalysisField1UField3=case (like("Analysis Field1”,”Compliant”) AND (like("Analysis Field3”,”Compliant”) OR like("Analysis Field3”,”NonCompliant”) ), "Compliant”, like("Analysis Field1”,”NonCompliant”) AND (like("Analysis Field3”,”Compliant”) OR like("Analysis Field3”,”NonCompliant”) ),"NonCompliant”)
| eval finalanalysis= if ( match( JointAnalysisField1UField3, "Compliant”) AND match ("Analysis Field1”,”Compliant”), "COMPLIANT”, if(match( JointAnalysisField1UField3, "NonCompliant”) AND match ("Analysis Field1”,”NonCompliant”), "NONCOMPLIANT”, "UNDEFINED))
| stats values(*) AS * by Field1
This will do the same as your multiple-join search: it gets all the various events from all 4 csv files, performs the 'analytics' on them, and shows the results by Field1.
To add more sources, sourcetypes, and/or indexes, simply expand the first base search by adding what you need.
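For example, the base search could grow like this (a sketch; File5.csv and the second index/sourcetype are invented names for illustration):
(index=mine source=File1.csv OR source=File2.csv OR source=File3.csv OR source=File4.csv OR source=File5.csv) OR (index=other_reports sourcetype=new_report)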
One thing I noticed: you used ” instead of " - this is a big difference in SPL, so be sure to use the right ". Also, use FA="Y" instead of like(FA,"Y"), and be consistent with field names. And please read all the answer posts you listed above again: you found the right advice, yet you still use join 😉
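Applied to the first eval, for instance, the direct comparison would read like this (the same logic, sketched with = instead of like()):
| eval AnalysisField1 = case(field2="Y", "Compliant", field2="N", "NonCompliant")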
cheers, MuS
Hi MuS
The situation has been sorted, but I am detailing the case for your reference below. Please do tell me if my final approach is good or can be improved further.
I have 4 files: AVstatus.csv, SystemInfo.csv, PatchStatus.csv, SWstatus.csv.
Fields as below:
AVstatus.csv: computer name, virus defn, lastscandate
SystemInfo.csv: computer name, user, div, location, system model, serialnumber
PatchStatus.csv: computer name, patchname, Updateclassification, installed, required, dateposted
SWstatus.csv: computername, SW, version, year
I needed to perform the analysis on each file individually and calculate a categorisation for AV, Patch and SW. Using the available info on the Splunk websites and other sources, I was able to perform the analysis on each individual file and get the final fields, but I was unable to merge them.
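For reference, the kind of merge needed here, done in the stats style from the first answer, might look roughly like this (a sketch only; the coalesce() of the differently spelled computer-name columns is an assumption based on the field lists above):
index=mine source=AVstatus.csv OR source=SystemInfo.csv OR source=PatchStatus.csv OR source=SWstatus.csv
| eval computername = coalesce(computername, 'computer name', "unknown")
| stats values(*) AS * by computername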
After going through the below answers:
https://answers.splunk.com/answers/521078/joining-fields-from-three-different-indexes-and-so.html
https://answers.splunk.com/answers/434939/joining-multiple-fields-of-two-searches-together-o.html
https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...
I tried doing the merge using join, append and appendcols, but nothing was working.
Finally, I figured that to join two searches using ComputerName as the primary key, I should put the file with fewer rows in the subsearch. Continuing thus, I have been able to get the final result.
The search is finally as below:
index= mine source="SWstatus.csv"
| rename as field
|table
| eval SWcompliance1=case(like(sw,"Y"), "Compliant", like(sw,"n"), "NonCompliant")
|join computername swcompliance type=outer [search (source=2.csv or source=3.csv) index mine
|fields+computername, div,location,user,model,virusdefn, lastscand
|
Can you please post the search again? It looks like some parts are missing. Also, please use either the 101 010 icon to mark it as code, or select the search and press Ctrl + K to mark it as code.
cheers, MuS