Same sourcetype, different data sets

willadams
Contributor

I have one sourcetype, configured with no timestamp, that I use for multiple CSV files. For now I have used this sourcetype because it does the parsing I need, and I have been able to use source to find the data I specifically want. The problem is that as more CSV files get pulled in, the column names change, so I would need to write a separate sourcetype for each source: if the columns differ, my fields and their values are no longer aligned.

For example, I have 2 CSVs that share the same sourcetype (csv:notimestamp) but have different sources (csv:users and csv:workstations). The files are configured to be monitored and are pulled in daily, and I use the source to differentiate my data.

In csv:workstations the field "IPv4Address" is correct, but when I search the csv:users source, the same field shows the user's phone number, for example. This makes sense: the columns in each file are in a specific order, which explains why the value ends up in that field.

What I would like to know is whether there is a better way, short of copying the csv:notimestamp sourcetype into a new sourcetype called "csv:workstations" and another called "csv:users". My concern is scaling over time: each new CSV that uses the same stanza configuration would need its own sourcetype, which means an update to the indexers for each new CSV file.

Is there a way to avoid the above behaviour by referencing the source within SPL, or am I forced to create a sourcetype per CSV type?


richgalloway
SplunkTrust

CSVs are supposed to get the field names from the header row.  If that's not happening then perhaps the files are not true CSVs or there is something wrong with the settings.  Please share the props.conf stanza.
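
As a rough illustration of the difference (a Python sketch, not how Splunk itself parses the files; the sample columns and values are hypothetical), compare naming fields from each file's own header row with applying one fixed list of names by position:

```python
import csv
import io

# Hypothetical sample files; column names and values are made up for illustration.
workstations_csv = "Name,IPv4Address\nWS001,10.0.0.5\n"
users_csv = "Name,Phone\njdoe,555-0100\n"


def parse_with_header(text):
    """Name each value after the column header found in the file's first row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_with_fixed_names(text, names):
    """Skip the header row and apply one fixed list of names by position."""
    rows = list(csv.reader(io.StringIO(text)))[1:]
    return [dict(zip(names, row)) for row in rows]


# Header-driven: each file keeps its own field names.
header_based = parse_with_header(users_csv)

# Position-driven: the workstation column names are reused for the users file.
position_based = parse_with_fixed_names(users_csv, ["Name", "IPv4Address"])

print(header_based)    # the phone number stays under "Phone"
print(position_based)  # the phone number lands under "IPv4Address"
```

If header-based naming is working, two sources should be able to share one sourcetype and still get their own field names. The second result matches the symptom described in the question.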


willadams
Contributor

Here is the configuration.

Sourcetype:

[csv:notimestamp]
INDEXED_EXTRACTIONS = csv
KV_MODE = none
MAX_TIMESTAMP_LOOKAHEAD = 1
TIME_FORMAT =
SHOULD_LINEMERGE = False
TRUNCATE = 10000
category = Structured
description = Comma-separated value format with no timestamps.
pulldown_type = true

Inputs for the file monitor:

[monitor://D:\computerlist*.csv]
disabled = false
index = csvdata
source = csv:workstations
sourcetype = csv:notimestamp
crcSalt = <SOURCE>
initCrcLength = 8192

[monitor://D:\userlist*.csv]
disabled = false
index = csvdata
source = csv:users
sourcetype = csv:notimestamp
crcSalt = <SOURCE>
initCrcLength = 8192

Note: we also pull in other sources from segregated servers, and those sources use the exact same sourcetype to parse the data.