Getting Data In

CSV advice

a212830
Champion

Hi,

Hoping someone can help me with some CSV input questions. I have some CSV files, created by users, which are generated by a cron job (which I don't control). The files are all in the same directory and have a suffix of .txt. What is the best way to get this data into Splunk? I was thinking curl, but the system doesn't have it. Should I just set up a Splunk forwarder to monitor the files in that directory? The files do get cleaned up, so what happens when they get removed? Will that confuse Splunk?

My second question is how to get Splunk to auto-identify the columns. Sample data is below - is there a way to tell Splunk that the host is in column 2?

Timestamp=24-May-12 18:41:00,host=APF-US412-RH-Cpu-0,metric=CPU_Utilization,value=2.00000000,DURATION=336
Timestamp=24-May-12 18:46:36,host=APF-US412-RH-Cpu-0,metric=CPU_Utilization,value=2.00000000,DURATION=339
Timestamp=24-May-12 18:52:15,host=APF-US412-RH-Cpu-0,metric=CPU_Utilization,value=2.00000000,DURATION=338


Ayn
Legend
  1. Just have Splunk monitor the directory. It won't be confused by files being removed - once it has pulled the data into its own index, it no longer relies on the files being kept.
  2. Splunk automatically extracts key/value pairs that are in the form key=value. However, host is a special field in Splunk that specifies which host the log data originated from, so the host value from the log events is likely being overwritten by it. You could extract the field under another name, say myhost: ... | rex "host=(?<myhost>.+?),"
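The rex pattern above can be sanity-checked outside Splunk. Here is a minimal Python sketch using the same regular expression against one of the sample lines (the field name myhost comes from the answer above; everything else is just illustration):

```python
import re

# Same pattern as the rex command: capture the value of the "host" key
# into a group named "myhost", non-greedy up to the next comma.
pattern = re.compile(r"host=(?P<myhost>.+?),")

event = ("Timestamp=24-May-12 18:41:00,host=APF-US412-RH-Cpu-0,"
         "metric=CPU_Utilization,value=2.00000000,DURATION=336")

match = pattern.search(event)
print(match.group("myhost"))  # APF-US412-RH-Cpu-0
```

The non-greedy .+? matters here: a greedy .+ would run past the first comma and swallow the rest of the event.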

fk319
Builder

1) Monitor the files. Splunk is very good about not loading the same file twice, or, for that matter, about knowing where it left off in a file.

2) look at transforms.conf for:

[extract_csv]

DELIMS = ","

FIELDS = "field1", "field2", "field3"
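For the transforms.conf stanza above to take effect, it has to be referenced from props.conf, and the directory itself can be picked up with a monitor stanza in inputs.conf. A rough sketch, where the path and the sourcetype name my_csv_metrics are placeholders for your own values:

inputs.conf:

[monitor:///path/to/csvdir/*.txt]

sourcetype = my_csv_metrics

props.conf:

[my_csv_metrics]

REPORT-extract_csv = extract_csv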

Ayn
Legend

Yes, #2 is definitely related to #1. By default, Splunk breaks incoming data into new events whenever it sees a valid timestamp, so if it doesn't find one it won't break the data into separate events. This section in the docs covers configuring timestamp recognition: http://docs.splunk.com/Documentation/Splunk/4.3.2/Data/Configuretimestamprecognition
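For the sample data in this thread, timestamp recognition and line breaking could be configured in props.conf along these lines (the sourcetype name my_csv_metrics is a placeholder; TIME_FORMAT uses strptime codes matching "24-May-12 18:41:00"):

[my_csv_metrics]

SHOULD_LINEMERGE = false

TIME_PREFIX = ^Timestamp=

TIME_FORMAT = %d-%b-%y %H:%M:%S

MAX_TIMESTAMP_LOOKAHEAD = 20

SHOULD_LINEMERGE = false makes each line its own event, which addresses #2; TIME_PREFIX and TIME_FORMAT point Splunk at the timestamp at the start of each line, which addresses #1.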

a212830
Champion

Thanks. I see data coming in, but a couple of things aren't right:

1) It's not picking up the timestamp in the file as the timestamp to use.
2) It's not generating unique events for each line - bunches of them are indexed as one event. (Maybe related to #1?)

Any ideas?
