Splunk Search

How to extract the date and time from a single column in a CSV file as separate fields?

ashabc
Contributor

I have a CSV file with headers which have date and time stamp fields in a single column. I want to extract date and time as separate fields. e.g. on 1st row of dataset the DeclaredDate is "26/12/2008 10:30" I want to extract fields DeclareDate as 26/12/2008 in date format and 10:30 as DeclareTime in time format.

If I import data using default CSV input type, the DeclareDate is extracted as timestamp for the event. However, I want both DeclaredDate as well as RevokedDate to be extracted in time format..

Can someone give me some clue? I do not have background in regex.

Sample data below:

Id,Name,DeclaredDate,RevokedDate,RegionId,LgaId,Area,AgencyId,ReferenceNo
1,Mt Mologone S44,26/12/2008 10:30,29/12/2008 12:00,4,800,Bland District,NULL,NULL
2,Gulp Road,6/01/2009 15:00,18/01/2009 20:00,1,8350,"Wingecarribee District, nm",NULL,NULL
3,Beaumont,15/01/2009 14:00,16/01/2009 17:00,1,4000,Hornsby Local Goverment Area,NULL,NULL
0 Karma
1 Solution

woodcock
Esteemed Legend

Like this:

... | rex field=DeclaredDate "(?<DeclareDate>\S+)\s+(?<DeclareTime>.*)" | rex field=RevokedDate "(?<RevokeDate>\S+)\s+(?<RevokeTime>.*)"

View solution in original post

landen99
Motivator
| rex field=DeclaredDate "(?<DeclareDate>\S+)\s+(?<DeclareTime>.*)" | rex field=RevokedDate "(?<RevokeDate>\S+)\s+(?<RevokeTime>.*)" | eval DeclaredDate=DeclaredDate." GMT" | eval RevokedDate=RevokedDate." GMT" | eval DeclaredDateEpoch=strptime(DeclaredDate, "%m/%e/%Y %H:%M %Z") | eval RevokedDateEpoch=strptime(RevokedDate, "%m/%e/%Y %H:%M %Z") | eval duration_sec = RevokedDateEpoch - DeclaredDateEpoch | eval duration=tostring(durationSeconds,"duration")

Time zone is an integral part of creating epoch time. "GMT" is replaced with the respective time zones. The difference on times of the timezones will not care what the timezone is set to, but other operations will care.

ashabc
Contributor

Thank you both Woddcock and Landen. Both works!

0 Karma

woodcock
Esteemed Legend

Like this:

... | rex field=DeclaredDate "(?<DeclareDate>\S+)\s+(?<DeclareTime>.*)" | rex field=RevokedDate "(?<RevokeDate>\S+)\s+(?<RevokeTime>.*)"

View solution in original post

woodcock
Esteemed Legend

@landen99, I encountered a bug editing the comments in this post (when deleting a duplicate comment of yours) and it deleted the last few comments of yours on accident. I am working to have it restored.

0 Karma

landen99
Motivator

No problem, Woodcock. I was just trying to be helpful with a solution which was already on the right track. I'll just compile all of my ideas and corrections to your solution into my own solution. I did not intend to "wander" to topics that you were not interested in discussing. I do think that they are fully relevant to the question, though.

PS: I was not aware that users could delete each other's comments. I've never done that before or seen that option.

0 Karma

landen99
Motivator

you don't need "[" or "]".

\S+ works just as well by itself
0 Karma

ashabc
Contributor

Thanks Woodcock. It certainly works.
One other question in relation to this. If I now want the duration between DeclareDate and RevokeDate, how can I get that. I tried the example in the url below, but seems not working.

http://answers.splunk.com/answers/39463/calculate-time-difference-between-2-fields-sum-and-group-by-...

I tried this:

| convert timeformat="%d/%m/%yT%H:%M" mktime(RevokedDate) mktime(DeclaredDate) | eval duration=RevokedDate-DeclaredDate

0 Karma

landen99
Motivator

This is where you do my solution with eval strftime, strptime, subtraction and convert duration.

0 Karma

landen99
Motivator

Oh, and time zones may be a factor, because it is going to Unix time to be subtracted.

0 Karma

woodcock
Esteemed Legend

For duration TZ should not be a factor because both times should be the same TZ and we don't care what it is because we are subtracting them; do it like this:

... | rex field=DeclaredDate "(?<DeclareDate>\S+)\s+(?<DeclareTime>.*)" | rex field=RevokedDate "(?<RevokeDate>\S+)\s+(?<RevokeTime>.*)" | DeclaredDateEpoch=strptime(DeclaredDate, "%m/%d/%Y %H:%M") | RevokedDateEpoch=strptime(RevokedDate, "%m/%d/%Y %H:%M") | durationSeconds = RevokedDateEpoch - DeclaredDateEpoch
0 Karma

landen99
Motivator

You Rex date and time separately, so they will not work. Also, the timezones may not be the same, as in the case with bit9 source. Lastly, duration in sec can be made pretty with tostring(sec, "duration").

0 Karma

woodcock
Esteemed Legend

He asked for 2 things: to separate the TimeDates and to calculate the difference. My answer is in 2 parts that both use the original field. Your comment about tostring is fair, and I thought of it, but this Q&A has been wondering very much so I decided to stick to the main path.

0 Karma

woodcock
Esteemed Legend

True. I will update answer and delete comments.

0 Karma

landen99
Motivator

Otherwise, a very elegant answer, I must say.

I considered giving the eval strftime, and convert or fieldformat strptime approach instead of this one based on rex, but the author never mentioned the need for sorting or filtering the time fields or formatting it from Unix time to various formats or between different time zones, so I feel your rex answer meets the question perfectly.

0 Karma
.conf21 Now Fully Virtual!
Register for FREE Today!

We've made .conf21 totally virtual and totally FREE! Our completely online experience will run from 10/19 through 10/20 with some additional events, too!