Getting Data In
Highlighted

How to query data on an index that is monitoring a directory with CSV files?

Explorer

Hi Experts,

Can anyone please tell me how to write a search query on index which is monitoring a directory having CSV files with 10 columns. Suppose Col1, col2,col3...col10. For CSV we can write as below

| inputcsv FileName.csv | rename Col1as DOB, Col2 as NumberC, Col3 as "Day of Birth ", Col4 as "Month of Birth ", Col5 as "Day of Birth ", Col6 as "Birth year", Col7 as "First name",Col8 as Location,Col9 as "Country Code",Col10 as Year| eval Catg = case(Year < 1920,"Under 1920",Year >= 1920 AND Year <= 1980,"Between 1920-1980", Year > 1980, "Above 1980")

How to convert this query on an index.

Thanks in advance.

Regards,
Aashish Kumar

Tags (4)
0 Karma
Highlighted

Re: How to query data on an index that is monitoring a directory with CSV files?

SplunkTrust
SplunkTrust

Hi aashish_122001,

if you indexed your csv, for example, into index=foo then simply do this:

index=foo | rename Col1as DOB, Col2 as Number_C, Col3 as "Day of Birth ", Col4 as "Month of Birth ", Col5 as "Day of Birth ", Col6 as "Birth year", Col7 as "First name",Col8 as Location,Col9 as "Country Code",Col10 as Year| eval Catg = case(Year < 1920,"Under 1920",Year >= 1920 AND Year <= 1980,"Between 1920-1980", Year > 1980, "Above 1980")

hope this helps ...

cheers, MuS

View solution in original post

Highlighted

Re: How to query data on an index that is monitoring a directory with CSV files?

Explorer

Thank you MuS..
It worked !!!

Suppose I load a file with 100 records starting from 1....100 now I load another file in same directory say, 25 records whose keys are already present but the measure value is changed in new file...
Now in indexed directory there are 2 files File 1 : 100 records , File 2 : 25 records
What will be the behavior of index in this case...
If it shows 125 records, then how can I display the latest 100 records ?

Thank You.

0 Karma
Highlighted

Re: How to query data on an index that is monitoring a directory with CSV files?

SplunkTrust
SplunkTrust

Splunk will only index new data if the CRC of the old data does not matches. In your example case it should/would be 125. To display only the latest 100 events, you can use | tail 100 in your search.

Highlighted

Re: How to query data on an index that is monitoring a directory with CSV files?

SplunkTrust
SplunkTrust

in addition, it also could be that the file will produce a file too small error in that case you have to change some settings like seekptr checksum or crcSalt - best thing to do: test and try it 😉

0 Karma
Highlighted

Re: How to query data on an index that is monitoring a directory with CSV files?

Explorer

Yes you are right..I tested it myself I got 100 record in first go & then 125 after loading 2nd file...
What I did to resolve is to Stop the server, Clean the data on index & restart the server...now it shows latest records..
Is there any better way to get the latest record without using tail command...because in my requirement, going forward there will be a large number of files in 1 directory & I have to report on top of them

0 Karma
Highlighted

Re: How to query data on an index that is monitoring a directory with CSV files?

SplunkTrust
SplunkTrust

Well, if your events have a time stamp and this will recognized by Splunk you can use the time stamp 😉

0 Karma
Highlighted

Re: How to query data on an index that is monitoring a directory with CSV files?

Explorer

can you please explain your answer...I will tell you the requirement tell me how we can use timestamp here...

I loaded a File1 on say 16 Sep
Id Name
1 John
2 Michael
3 Aashish

After that I loaded a File2 on 17 Sep
Id Name
3 Michael
4 Aashish
5 Stephen

Now, Index will show me 6 records with duplicates for ID : 3
Id Name
1 John
2 Michael
3 Aashish
3 Michael
4 Aashish
5 Stephen

How I can use timestamp here to resolve the issue..

0 Karma
Highlighted

Re: How to query data on an index that is monitoring a directory with CSV files?

SplunkTrust
SplunkTrust

Looks like you don't have a timestamp in your events, therefore you are propably left with _indextime Which is a default field, it represents the time when your event where indexed.

0 Karma
Highlighted

Re: How to query data on an index that is monitoring a directory with CSV files?

Influencer

you can use ...| dedup ID if you want to restrict your search to recent values of the same ID

0 Karma