Hi Experts,
Can anyone please tell me how to write a search query on an index that is monitoring a directory of CSV files with 10 columns, say Col1, Col2, ... Col10. For a CSV file we can write it as below:
| inputcsv File_Name.csv | rename Col1 as DOB, Col2 as Number_C, Col3 as "Day of Birth", Col4 as "Month of Birth", Col5 as "Day of Birth", Col6 as "Birth year", Col7 as "First name", Col8 as Location, Col9 as "Country Code", Col10 as Year | eval Catg = case(Year < 1920, "Under 1920", Year >= 1920 AND Year <= 1980, "Between 1920-1980", Year > 1980, "Above 1980")
How do I convert this query to run against an index?
Thanks in advance.
Regards,
Aashish Kumar
Hi aashish_122001,
if you indexed your CSV into, for example, index=foo,
then simply do this:
index=foo | rename Col1 as DOB, Col2 as Number_C, Col3 as "Day of Birth", Col4 as "Month of Birth", Col5 as "Day of Birth", Col6 as "Birth year", Col7 as "First name", Col8 as Location, Col9 as "Country Code", Col10 as Year | eval Catg = case(Year < 1920, "Under 1920", Year >= 1920 AND Year <= 1980, "Between 1920-1980", Year > 1980, "Above 1980")
hope this helps ...
cheers, MuS
Can you please explain your answer? Let me describe the requirement, and you can tell me how we can use the timestamp here.
I loaded File1 on, say, 16 Sep:
Id Name
1 John
2 Michael
3 Aashish
After that I loaded File2 on 17 Sep:
Id Name
3 Michael
4 Aashish
5 Stephen
Now the index will show me 6 records, with duplicates for Id 3:
Id Name
1 John
2 Michael
3 Aashish
3 Michael
4 Aashish
5 Stephen
How can I use the timestamp here to resolve the issue?
You can use ... | dedup Id if you want to restrict your search to the most recent value for each Id.
It looks like you don't have a timestamp in your events, so you are probably left with _indextime,
which is a default field that records the time when each event was indexed.
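Putting those two pieces together: dedup keeps the first matching event it sees, so sorting on _indextime first ensures the most recently indexed event per key survives. A sketch, assuming the index is foo and the key field is Id:

```spl
index=foo
| sort 0 - _indextime
| dedup Id
```

The `0` on sort lifts the default 10,000-result limit; drop it if your result set is small.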
Yes, you are right. I tested it myself: I got 100 records in the first go, and then 125 after loading the second file.
What I did to resolve it was stop the server, clean the data on the index, and restart the server; now it shows the latest records.
Is there a better way to get the latest records without using the tail command? Going forward there will be a large number of files in one directory, and I have to report on top of them.
Well, if your events have a timestamp and it is recognized by Splunk, you can use that timestamp 😉
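When Splunk does parse a timestamp into _time, results come back newest-first by default, so dedup on its own keeps the latest event per key. A minimal sketch, again assuming index foo and key field Id:

```spl
index=foo
| dedup Id
| table Id Name
```

No explicit sort is needed here because the default reverse-chronological search order already puts the newest event first.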
Thank you MuS..
It worked !!!
Suppose I load a file with 100 records with keys 1...100, and then load another file into the same directory with, say, 25 records whose keys are already present but whose measure values have changed in the new file.
Now the indexed directory contains two files: File1 with 100 records and File2 with 25 records.
What will be the behavior of the index in this case?
If it shows 125 records, how can I display only the latest 100 records?
Thank You.
Splunk will only index new data if the CRC of the new file does not match that of data already indexed. In your example case it should/would be 125 events. To display only the latest 100 events, you can use | tail 100
in your search.
In addition, it could also be that the file produces a "file too small" error; in that case you have to change some settings, such as the seek pointer checksum or crcSalt. Best thing to do: test and try it 😉
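For reference, crcSalt is set per monitor stanza in inputs.conf. A minimal sketch, assuming the CSVs land in /data/csv and go to index foo (both path and index name are hypothetical):

```ini
# inputs.conf - mix the file's path into the CRC check so that
# renamed or re-delivered files with identical leading bytes
# are still treated as new data and re-read
[monitor:///data/csv]
index = foo
sourcetype = csv
crcSalt = <SOURCE>
```

`<SOURCE>` is a literal value, not a placeholder to fill in; it tells Splunk to include the source path in the checksum. If your files merely share identical header lines, raising initCrcLength is an alternative worth testing.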