Getting Data In

Why is props.conf using index time as _time instead of the "Created" time?

colinmchugo
Explorer

Hi,

I have a big problem to solve. I am making an API call to Redmine to pull data, which is then ingested into Splunk.

My issue is that Splunk is using the time the data is pulled by the API/cron job as _time, rather than the "Created" time from the data. So when I search for, say, all issues for March 2018, I get the issues that were pulled in that month, not the ones created in it.

I am looking for a way to fix this, since it results in the wrong results being displayed. I know that props.conf is one way of potentially dealing with it, but that looks tricky, and I am wondering if there is another way around this?

The cron job produces a CSV which is pulled into Splunk; it consists of about 30 comma-separated columns. The field that I want to be _time is about 20 columns in (as seen in an Excel spreadsheet), and an example value would be "04/03/2018 12:33 PM".
There are no other unique identifiers, and there are other date/time fields, such as "Updated" and "Start date", in addition to the "Created" column.

So how can I use props.conf (or something else) to get this data, which is scheduled to be pulled into Splunk every morning, indexed with the correct timestamp?

Thanks a million; this is a big one to solve, so I really appreciate any support.

Colin


s2_splunk
Splunk Employee

The easiest approach is to trust INDEXED_EXTRACTIONS=csv to do the right thing. You should get what you are looking for with props.conf settings like the following:

[gdotcsv]
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = Created

You can make it easier on Splunk if you also specify a valid TIME_FORMAT setting based on the Created date format in the data.
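For example, if the Created values look like "04/03/2018 12:33 PM", that could be something along these lines (a sketch; whether the leading field is the month or the day is ambiguous from a single sample, so verify against your data):

TIME_FORMAT = %m/%d/%Y %I:%M %p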

Note that some of these props.conf settings will have to be available to the universal forwarder that processes this file as well as on the indexer(s), so deploy to both.
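For example (the app name here is illustrative, assuming you package the settings in an app):

$SPLUNK_HOME/etc/apps/redmine_csv/local/props.conf   # deployed to the UF and to the indexer(s)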


colinmchugo
Explorer

There is something unique: the two fields immediately before the "Created" column are always ",0,0," i.e. each of the two preceding columns contains a 0. So maybe we could use a regex to capture this? Thanks.
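Something like this, perhaps (just a sketch, assuming ",0,0," does not appear earlier in the line, and using the non-structured timestamp settings rather than INDEXED_EXTRACTIONS):

[gdotcsv]
# Anchor timestamp recognition on the ",0,0," that precedes "Created".
TIME_PREFIX = ,0,0,
# Guessed from "04/03/2018 12:33 PM"; adjust if the day comes first.
TIME_FORMAT = %m/%d/%Y %I:%M %p
MAX_TIMESTAMP_LOOKAHEAD = 25
SHOULD_LINEMERGE = false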


s2_splunk
Splunk Employee

Does your CSV file have a header row?
Did you review the documentation here that explains how to identify header rows and use specific column(s) for your timestamp value?

It is best practice to always explicitly configure your sourcetypes using props.conf settings, rather than relying on Splunk to guess what you want. Doing so documents your sourcetypes for your own and other people's benefit, and results in the most efficient index-time processing possible.

colinmchugo
Explorer

Thanks, I read the documentation and I am still trying to figure out how to get it working for my scenario. The example from the documentation is pasted below. My CSV, called "g.csv", also has "," as the field delimiter. There are a number of date columns, such as "Updated", "Start date", "Created", "Due date" and "Closed"; these all come in a date-and-time format.
I am not sure what they are looking for with HEADER_FIELD_LINE_NUMBER, as the header is on row one; and if they mean tabs, it is 29 tabs until the comma before the "Created" column.

Where I am lost is where you specify that you want the "Created" column to be brought in as the indexed _time so it's searchable. I had been trying a strftime function, but that was not viable: it was not accurate when searching the data, e.g. for all issues for February. It gave me back all the issues pulled in February, not all the issues created in February. Thanks again, I really appreciate the assistance.

I can't add an attachment as I don't have enough karma points (there was me thinking I had loads of karma 🙂), so I'll paste it below, thanks.

C.

,Tracker,Status,Priority,Subject,Assignee,Updated,Environment,Category,IR Shift,Normalized Detection Source,Private,IPs,Risk,Hostname,Dept,Country,Office Location,User Action,Project,Parent task,Target version,Start date,Due date,Author,Estimated time,Total estimated time,Spent time,Total spent time,Created,Closed,% Done,Related issues,owner-email,Username,Hash,Remediation Actions

123,issue,False Positive,P3,Test1,Colin ,4/3/18 13:40,Corp ,Test1,Europe,None,No,8.8.8.8,,DSA00001,Sales,IE,Dublin,None,IR,,,4/3/18,,REST API User,,,0,0,4/3/18 12:01,4/3/18 13:42,0,,,A.Watts,,None
124,issue,In Progress,P3,Test2,Colin ,4/3/18 13:25,Corp ,Test2,Off-Shift,None,No,1.1.1.1.,,,Marketing,US,Washington,None,IR,,,4/3/18,,REST API User,,,0,0,4/3/18 9:12,,0,,,B.Wayne,,None
125,issue,Resolved,P3,Test3,Niall,4/3/18 13:32,Corp ,Test3,US,None,No,8.8.8.6,,,Customer,AU,New York,None,IR,,,4/2/18,,REST API User,,,0,0,4/3/18 7:43,4/3/18 13:31,0,,,S.Costello,,None

Example from Documentation

[CSVWithFewHeaderFieldsWithoutAnyValues]
FIELD_DELIMITER=,

[VeryLargeCSVFile]
FIELD_DELIMITER=,

[UselessLongHeaderToBeIgnored]
HEADER_FIELD_LINE_NUMBER=35
TIMESTAMP_FIELDS=Date,Time,TimeZone
FIELD_DELIMITER=\s
FIELD_QUOTE="

[HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim]
FIELD_DELIMITER=,
HEADER_FIELD_DELIMITER=\s
FIELD_QUOTE="

[ExtractCorrectHeaders]
FIELD_HEADER_REGEX=Ignore_This_Stuff:\s(.*)
FIELD_DELIMITER=,
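Based on that, I think the stanza for my g.csv would look something like the following, though I am not sure I have the TIME_FORMAT right (the pasted rows use e.g. "4/3/18 12:01" rather than the "04/03/2018 12:33 PM" I quoted earlier):

[gdotcsv]
INDEXED_EXTRACTIONS = csv
FIELD_DELIMITER = ,
HEADER_FIELD_LINE_NUMBER = 1
TIMESTAMP_FIELDS = Created
# Guessed from the sample rows, e.g. "4/3/18 12:01"
TIME_FORMAT = %m/%d/%y %H:%M
KV_MODE = none
SHOULD_LINEMERGE = false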
