Getting Data In

How to query data on an index that is monitoring a directory with CSV files?

aashish_122001
Explorer

Hi Experts,

Can anyone please tell me how to write a search query on an index that is monitoring a directory of CSV files with 10 columns, say Col1, Col2, Col3, ..., Col10? For a CSV file we can write the query below:

| inputcsv File_Name.csv | rename Col1 as DOB, Col2 as Number_C, Col3 as "Day of Birth", Col4 as "Month of Birth", Col5 as "Day of Birth", Col6 as "Birth year", Col7 as "First name", Col8 as Location, Col9 as "Country Code", Col10 as Year | eval Catg = case(Year < 1920, "Under 1920", Year >= 1920 AND Year <= 1980, "Between 1920-1980", Year > 1980, "Above 1980")

How do I convert this query to run against an index?

Thanks in advance.

Regards,
Aashish Kumar

1 Solution

MuS
Legend

Hi aashish_122001,

If you indexed your CSV into, for example, index=foo, then simply do this:

index=foo | rename Col1 as DOB, Col2 as Number_C, Col3 as "Day of Birth", Col4 as "Month of Birth", Col5 as "Day of Birth", Col6 as "Birth year", Col7 as "First name", Col8 as Location, Col9 as "Country Code", Col10 as Year | eval Catg = case(Year < 1920, "Under 1920", Year >= 1920 AND Year <= 1980, "Between 1920-1980", Year > 1980, "Above 1980")

hope this helps ...

cheers, MuS


aashish_122001
Explorer

Can you please explain your answer? I will describe the requirement; please tell me how we can use a timestamp here.

I loaded File1 on, say, 16 Sep:
Id Name
1 John
2 Michael
3 Aashish

After that I loaded File2 on 17 Sep:
Id Name
3 Michael
4 Aashish
5 Stephen

Now the index will show me 6 records, with duplicates for Id 3:
Id Name
1 John
2 Michael
3 Aashish
3 Michael
4 Aashish
5 Stephen

How can I use a timestamp here to resolve the issue?


pradeepkumarg
Influencer

You can use ... | dedup Id if you want to restrict your search to the most recent value for each Id.
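A minimal sketch of that idea, assuming the CSV was indexed into index=foo and the key field is called Id (both names are assumptions): Splunk returns events newest-first, so deduping on the key keeps only the most recently indexed record per Id.

index=foo
| sort 0 - _indextime
| dedup Id
| table Id Name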


MuS
Legend

Looks like you don't have a timestamp in your events, so you are probably left with _indextime, which is a default field; it represents the time when your events were indexed.
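For illustration, a hedged example (index=foo and the field names are assumptions) that exposes the index time as a readable field and keeps only events indexed in the last day:

index=foo
| eval indexed_at = strftime(_indextime, "%Y-%m-%d %H:%M:%S")
| where _indextime >= relative_time(now(), "-1d")
| table Id Name indexed_at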


aashish_122001
Explorer

Yes, you are right. I tested it myself: I got 100 records in the first go and then 125 after loading the 2nd file.
What I did to resolve it was to stop the server, clean the data on the index, and restart the server; now it shows the latest records.
Is there a better way to get the latest records without using the tail command? Going forward there will be a large number of files in one directory, and I have to report on top of them.
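For reference, that stop/clean/restart procedure can be run from $SPLUNK_HOME/bin; a sketch assuming the index is named foo (note that clean eventdata irreversibly deletes all events in that index):

./splunk stop
./splunk clean eventdata -index foo
./splunk start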


MuS
Legend

Well, if your events have a time stamp and it is recognized by Splunk, you can use the time stamp 😉
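A minimal props.conf sketch for timestamp recognition (the sourcetype my_csv and the format string are assumptions; match them to how the date actually appears in your files):

[my_csv]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19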


aashish_122001
Explorer

Thank you MuS, it worked!

Suppose I load a file with 100 records whose keys run from 1 to 100, and then I load another file into the same directory with, say, 25 records whose keys are already present but whose measure values have changed in the new file.
Now there are 2 files in the indexed directory: File 1 with 100 records, File 2 with 25 records.
What will the behavior of the index be in this case?
If it shows 125 records, how can I display the latest 100 records?

Thank You.


MuS
Legend

Splunk will only index new data if the CRC of the old data does not match. In your example case it should/would be 125. To display only the latest 100 events, you can use | tail 100 in your search.
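An alternative sketch that avoids hard-coding a count (index=foo and the fields Id/Name are assumptions): stats latest() collapses the 125 events into one row per key, always taking the newest value, so updated records replace their older versions in the output.

index=foo
| stats latest(Name) as Name by Id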

MuS
Legend

In addition, it could also be that the file produces a "file too small" error; in that case you have to change some settings, like the seek pointer checksum or crcSalt. Best thing to do: test and try it 😉
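A hedged inputs.conf sketch for the monitored directory (the path, index, and sourcetype are assumptions): crcSalt = <SOURCE> mixes the file path into the CRC so look-alike files are still indexed, and initCrcLength raises the number of bytes fed into the checksum.

[monitor:///opt/data/csv_dir]
index = foo
sourcetype = my_csv
crcSalt = <SOURCE>
initCrcLength = 1024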
