Can anyone please tell me how to write a search query on an index that is monitoring a directory of CSV files with 10 columns, say Col1, Col2, ..., Col10? For a standalone CSV file we can write:
| inputcsv FileName.csv | rename Col1 as DOB, Col2 as NumberC, Col3 as "Day of Birth", Col4 as "Month of Birth", Col5 as "Day of Birth", Col6 as "Birth year", Col7 as "First name", Col8 as Location, Col9 as "Country Code", Col10 as Year | eval Catg = case(Year < 1920, "Under 1920", Year >= 1920 AND Year <= 1980, "Between 1920-1980", Year > 1980, "Above 1980")
How do I convert this query to run against an index?
Thanks in advance.
If you indexed your CSV into, for example, index=foo, then simply do this:
index=foo | rename Col1 as DOB, Col2 as Number_C, Col3 as "Day of Birth", Col4 as "Month of Birth", Col5 as "Day of Birth", Col6 as "Birth year", Col7 as "First name", Col8 as Location, Col9 as "Country Code", Col10 as Year | eval Catg = case(Year < 1920, "Under 1920", Year >= 1920 AND Year <= 1980, "Between 1920-1980", Year > 1980, "Above 1980")
Hope this helps ...
Thank you MuS, it worked!
Suppose I load a file with 100 records, keyed 1 to 100, and then load another file into the same directory with 25 records whose keys are already present but whose measure values have changed in the new file.
Now the indexed directory contains two files: File 1 with 100 records, File 2 with 25 records.
What will be the behavior of the index in this case?
If it shows 125 records, how can I display only the latest 100 records?
Splunk will only index new data if the CRC of the old data does not match. In your example it should/would be 125. To display only the latest 100 events, you can use | tail 100 in your search.
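For example, building on the index=foo search from earlier in the thread (the index name is just a placeholder here), the full search would be a sketch like:

```
index=foo
| tail 100
```

Note that tail operates on the results in their current order, so depending on how your events are sorted you may need head instead, or an explicit sort first; test it against your own data.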
In addition, the file could also produce a "file too small" error; in that case you have to change some settings such as the seek pointer checksum or crcSalt. Best thing to do: test and try it 😉
Yes, you are right. I tested it myself: I got 100 records on the first load and then 125 after loading the second file.
What I did to resolve it was to stop the server, clean the data on the index, and restart the server; now it shows the latest records.
Is there any better way to get the latest records without using the tail command? Going forward there will be a large number of files in one directory and I have to report on top of them.
Well, if your events have a timestamp and Splunk recognizes it, you can use that timestamp 😉
Can you please explain your answer? I will describe the requirement; tell me how we can use the timestamp here.
I loaded File1 on, say, 16 Sep.
After that I loaded File2 on 17 Sep.
Now the index shows me 6 records, with duplicates for ID 3.
How can I use the timestamp here to resolve the issue?
Looks like you don't have a timestamp in your events, therefore you are probably left with _indextime, a default field that represents the time when your events were indexed.
You can use ... | dedup ID if you want to restrict your search to the most recent value for each ID.
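Putting the two ideas together, a minimal sketch (assuming the data lives in index=foo and the key field is named ID, both placeholders) would order the events by _indextime before deduplicating, so that dedup keeps the most recently indexed event for each ID:

```
index=foo
| dedup ID sortby - _indextime
```

The sortby clause of dedup sorts the results before duplicates are removed; with - _indextime (descending) the surviving event per ID is the latest one indexed, which covers the case where File2 overwrote values from File1.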