Getting Data In

What are the best practices for uploading CSV files and merging fields from more than two sources?

Explorer

Hi,
I have a question: I have 3-4 CSV files from different IT reports that I need to analyze in order to prepare a combined dashboard.
My question is:
1. Should I do pre-analysis on some files before uploading them to Splunk, or
2. should I upload all files as-is and then use Splunk search commands to do the analysis?

Pre-analysis means the following:
For the AV update file (computer name, virus defn installed, date of scan): pre-analysis involves categorizing systems as compliant or non-compliant based on the org virus scan policy.
For the Patch Installed report (computer name, patch name, patch required, patch installed status): pre-analysis involves categorizing each system, for each patch, as compliant or non-compliant based on org policy.
For the Software Installation report (computer name, software name, version, year): pre-analysis involves categorizing each software for each system as compliant or non-compliant based on org policy.

Further, I need to merge these pre-analysis results and categorize systems as High Compliant, Medium Compliant, or Warning based on the above three categorizations.
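The roll-up described above could be sketched like this in Python (the rule itself is a guess, since the actual org policy thresholds are not given in the thread):

```python
# Hypothetical roll-up rule: count how many of the three per-source checks
# came back "Compliant" and map that count to an overall tier.
def overall_tier(av, patch, sw):
    """av, patch, sw are each 'Compliant' or 'NonCompliant'."""
    compliant = [av, patch, sw].count("Compliant")
    if compliant == 3:
        return "High Compliant"
    if compliant == 2:
        return "Medium Compliant"
    return "Warning"
```

The same three-way mapping can be expressed in SPL with a single case() once the three per-source fields exist on one merged row.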

The problems I am facing are as below:
1. When I upload all CSVs directly into Splunk and do the necessary categorization on the individual CSV files, I am unable to generate any data once I try to merge the output fields (ComputerName, Categorisation_) from each file in Splunk.
2. However, if I perform the pre-analysis of two files, let's say AV and Patch, in Python, merge them in Python, and then upload the resulting CSV along with the 3rd CSV, I am able to generate the report for the final categorization.
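For reference, that Python pre-merge step needs only the standard library; a minimal sketch, assuming illustrative file and column names (the real report schemas are not shown in the thread):

```python
import csv

# Index the AV report by ComputerName, then emit one combined row per
# computer found in the patch report. Column names are illustrative.
def merge_on_computer(av_path, patch_path):
    with open(av_path, newline="") as f:
        av = {row["ComputerName"]: row for row in csv.DictReader(f)}
    merged = []
    with open(patch_path, newline="") as f:
        for row in csv.DictReader(f):
            # patch columns win on name clashes; computers missing from
            # the AV report merge against an empty dict
            merged.append({**av.get(row["ComputerName"], {}), **row})
    return merged
```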

However, I am expected to add more parameters going forward to generate more complete categorization for dashboards.

Please, anyone, help me. I can share my code as well if it's required.

0 Karma
1 Solution

SplunkTrust

Hi vikfnu,

There is still not much information in here, and also no hint of the search you're using, so I'm asking the magic glass ball and will try to answer what I reckon you might want to do.

One thing first: upload the data as-is into Splunk; there is no need to do any pre-analysis of any kind.

Upload your CSV files (let's call them 1.csv to 4.csv) into an index, let's say it is called csv, and create a search that will return all the events. Don't care about grouping or anything else for now - just get the events:

index=csv source=1.csv OR source=2.csv OR source=3.csv OR source=4.csv

This will just return all the events.

Now we need to find some field that can be used to correlate the events. You mentioned ComputerName and Categoristion_; I assume the latter has something after the _, so we ignore it for now.
By adding a case() statement to the SPL we can catch all variations of ComputerName in the events and use it:

| eval ComputerName = case(isnotnull(ComputerName), ComputerName, isnotnull(computername), computername, isnotnull(computerName), computerName, 1=1, "unknown")

This list needs to be extended until it matches every possible variation of the computer name field available in the events; the last 1=1 is a catch-all and will show you unhandled cases 😉
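The same fallback logic, sketched in Python for clarity (the variant list is illustrative and should be extended just like the case() above):

```python
# Pick the first field-name variant that is present in an event (here a
# dict), else fall back to "unknown" -- the 1=1 catch-all in the SPL.
VARIANTS = ["ComputerName", "computername", "computerName"]  # extend as needed

def normalized_computer_name(event):
    for name in VARIANTS:
        if event.get(name) is not None:
            return event[name]
    return "unknown"
```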

After that step you can correlate the events using a stats on ComputerName and _time:

| stats count values(*) AS * by ComputerName _time

Once this is done, you can add more SPL to munch the data in every possible way until you get your expected result. So here is everything from above as one SPL statement:

 index=csv source=1.csv OR source=2.csv OR source=3.csv OR source=4.csv
| eval ComputerName = case(isnotnull(ComputerName), ComputerName, isnotnull(computername), computername, isnotnull(computerName), computerName, 1=1, "unknown")
| stats count values(*) AS * by ComputerName _time
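For anyone who wants to reason about what the stats step does, here is a rough Python analogue (ignoring the _time split for brevity; field names are illustrative):

```python
from collections import defaultdict

# Rough analogue of `| stats count values(*) AS * by ComputerName`:
# group events by ComputerName, count them, and collect the distinct
# values of every other field into one merged record per computer.
def correlate(events):
    counts = defaultdict(int)
    fields = defaultdict(lambda: defaultdict(set))
    for ev in events:
        key = ev["ComputerName"]
        counts[key] += 1
        for name, value in ev.items():
            if name != "ComputerName":
                fields[key][name].add(value)
    return {key: {"count": counts[key],
                  **{name: sorted(vals) for name, vals in fields[key].items()}}
            for key in counts}
```

This is exactly why the approach replaces join: events from all four CSVs land in one pile, and the group-by key stitches them together.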

If this is not what you are after, then I need to bring my magic glass ball in for service, or you need to provide more details on the events (anonymise any sensitive data!) and the SPL you used.

Hope this helps ...

cheers, MuS


Explorer

Hi MuS,
I have been trying to write a comment and reply, but I get a message that it will be moderated and then updated; even after 24 hours it is not appearing.

Data, let's say:

File1.csv
Fields: Field1, Field2, Field3
Operation: perform analysis on Field2 w.r.t. Field1 to get results as Compliant or NonCompliant.
Output fields: Field1, AnalysisField1

File2.csv
Fields: Field1, FA, FB, FC, FD, FE, FG
Operation: combine fields from File1 and File2.
Output fields: Field1, AnalysisField1, FA, FB, FC, FD, FE, FG

File3.csv
Fields: Field1, fa, fb, fc, fd, fe, fg, fh
Operation 1: analyse based on fa, fb, fc to get results as Compliant or NonCompliant.
Output fields: Field1, Analysis Field3, fd, fe, fg, fh
Operation 2: combine fields from File1.csv/File2.csv and File3.csv, then analyse the data to generate a combined compliance/non-compliance field.
Output fields: Field1, JointAnalysisField1UField3, FA, FB, FC, FD, FE, FG, fd, fe, fg, fh

File4.csv
Fields: Field1, f1, f2, f3, f4, f5, f6
Operation 1: analyse File4.csv based on fields f1 and f2 to get a compliance/non-compliance result field.
Output fields: Field1, Analysis Field4, f3, f4, f5, f6
Operation 2: combine the analysed fields "JointAnalysisField1UField3" and "Analysis Field4" to generate the final analysis result of COMPLIANT/NONCOMPLIANT.

Search Query

index=mine source="File4.csv"
| table Field1, f1, f2, f3, f4, f5, f6
| eval "Analysis Field4" = case(like(f1,"Y"), "Compliant", like(f2,"Y"), "NonCompliant")
| join Field1 type=outer
    [ search index=mine (source=File1.csv OR source=File2.csv)
    | fields + Field1, Field2, Field3, FA, FB, FC, FD, FE, FG
    | eval AnalysisField1 = case(like(Field2,"Y"), "Compliant", like(Field2,"N"), "NonCompliant")
    | join Field1 type=outer
        [ search index=mine source=File3.csv
        | eval "Analysis Field3" = case(like(fa,"Y") AND (like(fb,"Y") OR like(fb,"N")), "Compliant", like(fa,"N") AND (like(fb,"Y") OR like(fb,"N")), "NonCompliant")
        ]
    | fields Field1, AnalysisField1, FA, FB, FC, FD, FE, FG, "Analysis Field3", fd, fe, fg, fh
    | eval JointAnalysisField1UField3 = case(like(AnalysisField1,"Compliant") AND (like('Analysis Field3',"Compliant") OR like('Analysis Field3',"NonCompliant")), "Compliant", like(AnalysisField1,"NonCompliant") AND (like('Analysis Field3',"Compliant") OR like('Analysis Field3',"NonCompliant")), "NonCompliant")
    ]
| fields Field1, JointAnalysisField1UField3, FA, FB, FC, FD, FE, FG, fd, fe, fg, fh, "Analysis Field4", f3, f4, f5, f6
| eval finalanalysis = if(match(JointAnalysisField1UField3, "Compliant") AND match('Analysis Field4', "Compliant"), "COMPLIANT", if(match(JointAnalysisField1UField3, "NonCompliant") AND match('Analysis Field4', "NonCompliant"), "NONCOMPLIANT", "UNDEFINED"))

Note: the eval statements represent analysis steps; for brevity I have reduced long analysis statements to a very simple form.

What I observed:
1. The join should go from the log file with the larger number of events to the log file with the smaller number of events.

My questions:
1. Can I improve the search query to make better dashboards?
2. Also, this is just a POC; I am expected to increase the number of parameters in the report by including more logs from various other sources. Can I continue to use this approach?

0 Karma

SplunkTrust

Hi vikfnu,

  1. Yes, you can improve this search by not using join!
  2. Nope; as long as you use join this will break sooner rather than later, because of all the obvious and hidden limits of subsearches: http://docs.splunk.com/Documentation/Splunk/7.1.2/Search/Aboutsubsearches#Subsearch_performance_cons...

But back to your SPL, try this instead:

index=mine source=File4.csv OR source=File1.csv OR source=File2.csv OR source=File3.csv
| eval "Analysis Field4" = case(like(f1,"Y"), "Compliant", like(f2,"Y"), "NonCompliant")
| eval AnalysisField1 = case(like(Field2,"Y"), "Compliant", like(Field2,"N"), "NonCompliant")
| eval "Analysis Field3" = case(like(fa,"Y") AND (like(fb,"Y") OR like(fb,"N")), "Compliant", like(fa,"N") AND (like(fb,"Y") OR like(fb,"N")), "NonCompliant")
| eval JointAnalysisField1UField3 = case(like(AnalysisField1,"Compliant") AND (like('Analysis Field3',"Compliant") OR like('Analysis Field3',"NonCompliant")), "Compliant", like(AnalysisField1,"NonCompliant") AND (like('Analysis Field3',"Compliant") OR like('Analysis Field3',"NonCompliant")), "NonCompliant")
| eval finalanalysis = if(match(JointAnalysisField1UField3, "Compliant") AND match('Analysis Field4', "Compliant"), "COMPLIANT", if(match(JointAnalysisField1UField3, "NonCompliant") AND match('Analysis Field4', "NonCompliant"), "NONCOMPLIANT", "UNDEFINED"))
| stats values(*) AS * by Field1

This will do the same as your multiple-join search: it gets all the events from all 4 CSV files, performs the 'analytics' on them, and shows the results by Field1.

To add more sources, sourcetypes, and/or indexes, simply expand the first base search by adding what you need.
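As a sanity check of the join-free pattern, a rough Python analogue (the compliance rules here are the thread's simplified placeholders, not a real policy):

```python
from collections import defaultdict

# Derive each per-source analysis field row by row (only where its input
# columns exist), then fold everything into one record per Field1 -- the
# analogue of `| stats values(*) AS * by Field1`.
def merge_by_field1(rows):
    out = defaultdict(dict)
    for row in rows:
        if "f1" in row:  # rows that came from File4.csv
            row["Analysis Field4"] = "Compliant" if row["f1"] == "Y" else "NonCompliant"
        if "fa" in row:  # rows that came from File3.csv
            row["Analysis Field3"] = "Compliant" if row["fa"] == "Y" else "NonCompliant"
        out[row["Field1"]].update(row)
    return dict(out)
```

Because every source contributes plain rows to one pile, adding a fifth report is just another `if` branch, with no new subsearch and no join limits to hit.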

One thing I noticed:

  • You use multiple variations of field names (upper and lower case, added spaces in front, and so on). Be consistent with field names.
  • You use ” instead of " - this is a big difference in SPL, so be sure to use the right ".
  • If a field value is the literal "Y", use FA="Y" instead of like(FA,"Y").

Also, please read all the answer posts you listed above again; you found the right advice, yet you still use join 😉

cheers, MuS

0 Karma


Explorer

Hi MuS,
The situation has been sorted out, but I am detailing the case below for your reference. Please do tell me if my final approach is good or can be improved further.
I have 4 files: AVstatus.csv, SystemInfo.csv, PatchStatus.csv, SWstatus.csv.
Fields as below:
AVstatus.csv: computer name, virus defn, lastscandate
SystemInfo.csv: computer name, user, div, location, system model, serialnumber
PatchStatus.csv: computer name, patchname, updateclassification, installed, required, dateposted
SWstatus.csv: computername, SW, version, year

I needed to perform analysis on each file individually and calculate a categorisation for AV, Patch, and SW.

Using the available info on the Splunk websites and other sources, I was able to perform the analysis on each individual file and get the final fields, but I was unable to merge them.

After going through the answers below:
https://answers.splunk.com/answers/521078/joining-fields-from-three-different-indexes-and-so.html
https://answers.splunk.com/answers/434939/joining-multiple-fields-of-two-searches-together-o.html
https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...

I tried doing the merge using join, append, and appendcols, but nothing was working.
Finally I figured out that to join two searches using ComputerName as the primary key, I should put the file with the smaller number of rows in the subsearch. Continuing thus, I have been able to get the final result.

The search is finally as below:

index= mine source="SWstatus.csv"
| rename as field
|table
| eval SWcompliance1=case(like(sw,"Y"), "Compliant", like(sw,"n"), "NonCompliant")
|join computername swcompliance type=outer [search (source=2.csv or source=3.csv) index mine
|fields+computername, div,location,user,model,virusdefn, lastscand
|

0 Karma



SplunkTrust

Can you please post the search again? It looks like some parts are missing. Also, please use the 101 010 icon to mark it as code, or select the search and press Ctrl + K to mark it as code.

cheers, MuS

0 Karma