Solved: How to query data on an index that is monitoring a...

aashish_122001 · ‎09-15-2014

Hi Experts,

Can anyone please tell me how to write a search query on index which is monitoring a directory having CSV files with 10 columns. Suppose Col1, col2,col3...col10. For CSV we can write as below

| inputcsv File_Name.csv | rename Col1as DOB, Col2 as Number_C, Col3 as "Day of Birth ", Col4 as "Month of Birth ", Col5 as "Day of Birth ", Col6 as "Birth year", Col7 as "First name",Col8 as Location,Col9 as "Country Code",Col10 as Year| eval Catg = case(Year < 1920,"Under 1920",Year >= 1920 AND Year <= 1980,"Between 1920-1980", Year > 1980, "Above 1980")

How to convert this query on an index.

Thanks in advance.

Regards,
Aashish Kumar

MuS · ‎09-15-2014

Hi aashish_122001,

if you indexed your csv, for example, into index=foo then simply do this:

index=foo | rename Col1as DOB, Col2 as Number_C, Col3 as "Day of Birth ", Col4 as "Month of Birth ", Col5 as "Day of Birth ", Col6 as "Birth year", Col7 as "First name",Col8 as Location,Col9 as "Country Code",Col10 as Year| eval Catg = case(Year < 1920,"Under 1920",Year >= 1920 AND Year <= 1980,"Between 1920-1980", Year > 1980, "Above 1980")

hope this helps ...

cheers, MuS

View solution in original post

MuS · ‎09-15-2014

Hi aashish_122001,

if you indexed your csv, for example, into index=foo then simply do this:

index=foo | rename Col1as DOB, Col2 as Number_C, Col3 as "Day of Birth ", Col4 as "Month of Birth ", Col5 as "Day of Birth ", Col6 as "Birth year", Col7 as "First name",Col8 as Location,Col9 as "Country Code",Col10 as Year| eval Catg = case(Year < 1920,"Under 1920",Year >= 1920 AND Year <= 1980,"Between 1920-1980", Year > 1980, "Above 1980")

hope this helps ...

cheers, MuS

aashish_122001 · ‎09-16-2014

can you please explain your answer...I will tell you the requirement tell me how we can use timestamp here...

I loaded a File1 on say 16 Sep
Id Name
1 John
2 Michael
3 Aashish

After that I loaded a File2 on 17 Sep
Id Name
3 Michael
4 Aashish
5 Stephen

Now, Index will show me 6 records with duplicates for ID : 3
Id Name
1 John
2 Michael
3 Aashish
3 Michael
4 Aashish
5 Stephen

How I can use timestamp here to resolve the issue..

pradeepkumarg · ‎09-16-2014

you can use ...| dedup ID if you want to restrict your search to recent values of the same ID

MuS · ‎09-16-2014

Looks like you don't have a timestamp in your events, therefore you are propably left with _indextime Which is a default field, it represents the time when your event where indexed.

aashish_122001 · ‎09-16-2014

Yes you are right..I tested it myself I got 100 record in first go & then 125 after loading 2nd file...
What I did to resolve is to Stop the server, Clean the data on index & restart the server...now it shows latest records..
Is there any better way to get the latest record without using tail command...because in my requirement, going forward there will be a large number of files in 1 directory & I have to report on top of them

MuS · ‎09-16-2014

Well, if your events have a time stamp and this will recognized by Splunk you can use the time stamp 😉

aashish_122001 · ‎09-16-2014

Thank you MuS..
It worked !!!

Suppose I load a file with 100 records starting from 1....100 now I load another file in same directory say, 25 records whose keys are already present but the measure value is changed in new file...
Now in indexed directory there are 2 files File 1 : 100 records , File 2 : 25 records
What will be the behavior of index in this case...
If it shows 125 records, then how can I display the latest 100 records ?

Thank You.

MuS · ‎09-16-2014

Splunk will only index new data if the CRC of the old data does not matches. In your example case it should/would be 125. To display only the latest 100 events, you can use | tail 100 in your search.

MuS · ‎09-16-2014

in addition, it also could be that the file will produce a file too small error in that case you have to change some settings like seekptr checksum or crcSalt - best thing to do: test and try it 😉

How to query data on an index that is monitoring a directory with CSV files?

Raise Your Skills at the .conf25 Builder Bar: Your Splunk Developer Destination

Hunt Smarter, Not Harder: Discover New SPL “Recipes” in Our Threat Hunting Webinar

Splunk ITSI & Correlated Network Visibility

Are you a member of the Splunk Community?

How to query data on an index that is monitoring a directory with CSV files?

Raise Your Skills at the .conf25 Builder Bar: Your Splunk Developer Destination

Hunt Smarter, Not Harder: Discover New SPL “Recipes” in Our Threat Hunting Webinar

Splunk ITSI & Correlated Network Visibility