I'm dealing with bash_history files in the following format. I would like to extract the timestamp and use that as the event timestamp, but I'm having some issues doing so.
#1579207583
whoami
#1579207584
cd /var/log
#1579207590
cat messages
#1579207595
id
#1579207598
exit
I'm using the following thread as reference: https://answers.splunk.com/answers/60015/splunking-bash-history.html
[bash_history]
BREAK_ONLY_BEFORE = #(?=\d+)
MAX_TIMESTAMP_LOOKAHEAD = 11
SHOULD_LINEMERGE = true
TIME_FORMAT = %s
TIME_PREFIX = #
We've changed a number of variables (set TIME_PREFIX = ^#, set MAX_TIMESTAMP_LOOKAHEAD to a higher value, etc.), but nothing seems to be working correctly.
The events do break in the correct place (#), and they do merge, so we get "groups" of events like:
#1579207583
whoami
However, the timestamp for the event isn't set to that value. All events are set to the date/time that history was written on, so everything for any given session is the same.
That props.conf configuration -appears- correct, and our sourcetype is named bash_history (we've also tried source::/root/.bash_history, without success). I'm not sure where we are going wrong, but any suggestions would be welcome.
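To make the intended result concrete, here is a minimal Python sketch (not Splunk configuration) of what the event breaking and timestamp extraction should produce for the sample above. The names `SAMPLE` and `parse_history` are made up for illustration, and the sketch assumes every timestamp line is exactly `#` followed by digits, so a shell comment that happens to look like that would be misread as a timestamp.

```python
import re
from datetime import datetime, timezone

# Sample taken from the bash_history excerpt in the question.
SAMPLE = """#1579207583
whoami
#1579207584
cd /var/log
#1579207590
cat messages
#1579207595
id
#1579207598
exit
"""

def parse_history(text):
    """Group each '#<epoch>' line with the command line(s) that follow it."""
    groups = []
    for line in text.splitlines():
        match = re.fullmatch(r"#(\d+)", line)
        if match:
            # A new timestamp line starts a new event.
            groups.append([int(match.group(1)), []])
        elif groups:
            # Anything else belongs to the most recent timestamp.
            groups[-1][1].append(line)
    return [
        (datetime.fromtimestamp(epoch, tz=timezone.utc), "\n".join(lines))
        for epoch, lines in groups
    ]

events = parse_history(SAMPLE)
for when, command in events:
    print(when.isoformat(), command)
```

Each event should carry its own time (UTC here for reproducibility), rather than all five sharing the time the history file was written.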
I figured it out. The "default/props.conf" in Splunk_TA_nix contains several lines that affect the timestamp. I copied these to "local/props.conf" and unset them (didn't provide a value), and now it's working. Final props.conf looks like...
[bash_history]
BREAK_ONLY_BEFORE = #(?=\d+)
MAX_TIMESTAMP_LOOKAHEAD = 10
SHOULD_LINEMERGE = true
TIME_FORMAT = %s
TIME_PREFIX = ^#
EVENT_BREAKER_ENABLE =
DATETIME_CONFIG =
I also added a field extraction for the command itself:
^#\d+\s+(?P<command>.+)
TL;DR - It was working from the beginning, but other values in default were affecting the final result.
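For anyone who wants to sanity-check the field extraction outside Splunk, the same pattern can be tried in Python, which also accepts the `(?P<command>...)` named-group syntax. This is only a quick sketch against one merged event from the sample; `COMMAND_RE` and `event` are illustrative names, and Python's `re` is not the regex engine Splunk uses.

```python
import re

# The extraction regex from the post above, unchanged.
COMMAND_RE = re.compile(r"^#\d+\s+(?P<command>.+)")

# One merged event as described in the question: timestamp line plus command.
event = "#1579207583\nwhoami"

match = COMMAND_RE.search(event)
print(match.group("command"))
```

Note that `.` does not match a newline by default, so for a multi-line command this pattern would capture only the first line.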
Never use the BREAK_* settings. Try this:
[bash_history]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+[\s#]*)
TIME_PREFIX = ^
TIME_FORMAT = %s
MAX_TIMESTAMP_LOOKAHEAD = 10
But that is probably not your problem. If you are sure that your settings are correct, it must be something else. If you are doing a sourcetype override/overwrite, you must use the ORIGINAL value, NOT the new value. You must deploy your settings to the first full instance(s) of Splunk that handle the events (usually either the HF tier if you use one, or else your Indexer tier) UNLESS you are using HEC's JSON endpoint (it gets pre-cooked) or INDEXED_EXTRACTIONS (configs go on the UF in that case), then restart all Splunk instances there. When (re)evaluating, you must send in new events (old events will stay broken), then test using _index_earliest=-5m to be absolutely certain that you are only examining the newly indexed events.
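As a rough way to see where the LINE_BREAKER pattern above cuts the sample data, the same regex can be run in Python. This is only an approximation: it assumes the text matched by the first capture group is what gets dropped at each break, it uses Python's `re` rather than Splunk's engine, and it ignores the rest of the parsing pipeline. `BREAK_RE` and `sample` are illustrative names.

```python
import re

# Regex from the LINE_BREAKER setting above, written without the capture
# group, because re.split would otherwise also return the separators.
BREAK_RE = re.compile(r"[\r\n]+[\s#]*")

# First two entries of the sample from the question.
sample = "#1579207583\nwhoami\n#1579207584\ncd /var/log"

pieces = BREAK_RE.split(sample)
for piece in pieces:
    print(repr(piece))
```

In this sketch the pattern breaks at every newline, so each timestamp and its command land in separate pieces, and every timestamp after the first loses its leading `#`.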
If it is still possible to change the host configuration, I'd suggest setting the variable HISTTIMEFORMAT to '%F %T ', which will not only make any time extraction work unnecessary but also make the history human readable. For example, on CentOS you can add the following to /etc/profile (or some other bash config file):
HISTTIMEFORMAT='%F %T '
The bash_history then looks like this:
999 2020-01-17 11:30:27 ping 192.168.1.2
1000 2020-01-17 11:30:30 history
1001 2020-01-17 11:30:40 set|grep FORMAT
1002 2020-01-17 11:30:44 man bash
1003 2020-01-17 11:31:12 export HISTTIMEFORMAT='%F %T '
1004 2020-01-17 11:31:13 history
Don't miss the space before the final quote!
That's actually exactly what's in place. However, the internal log format is always timestamped with the #epoch timestamp. The behavior is described here: https://unix.stackexchange.com/questions/214322/write-bash-history-to-a-file-with-a-timestamp
In other words, if you cat the raw log, regardless of HISTTIMEFORMAT, you get the #epoch lines. Since Splunk is reading the raw log, that is what it gets.
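To illustrate the difference between the stored form and the displayed form, here is a small Python sketch that renders one raw `#epoch` line the way a '%F %T ' format would show it. The function name `render_history_line` is hypothetical, `%Y-%m-%d %H:%M:%S ` is used as the portable spelling of '%F %T ', and the time is rendered in UTC for reproducibility, whereas bash would display local time.

```python
from datetime import datetime, timezone

def render_history_line(epoch_line, command, fmt="%Y-%m-%d %H:%M:%S "):
    """Format a raw '#<epoch>' line plus its command like a timestamped history entry."""
    epoch = int(epoch_line.lstrip("#"))
    when = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return when.strftime(fmt) + command

rendered = render_history_line("#1579207583", "whoami")
print(rendered)
```

The file on disk keeps only the epoch form; the readable form is produced at display time.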
| makeresults
| eval _raw="#1579207583
whoami
#1579207584
cd /var/log
#1579207590
cat messages
#1579207595
id
#1579207598
exit"
`comment("this is sample you provide")`
| rex max_match=100 "(?:#)(?<time>\w+)"
| rex max_match=100 "(?m)^(?=[^#])(?<command>.+)$"
| eval tmp=mvzip(time,command)
| stats count by tmp
| eval _time=mvindex(split(tmp,","),0), command=mvindex(split(tmp,","),1)
| table _time command
If props.conf doesn't work, you can extract it with this query.
Try
# props.conf
[bash_history]
# define event breaking behavior
LINE_BREAKER = ([\r\n]+)\#\d+
SHOULD_LINEMERGE = false
# define time parsing behavior
TIME_PREFIX = #
TIME_FORMAT = %s
MAX_TIMESTAMP_LOOKAHEAD = 12
No luck, it appears to be line breaking at the correct place, as my original props.conf did. However, it's still not parsing the timestamp.
I wonder whether replacing your entire props config as posted with just the below would cover both the line breaking and the timestamping. Maybe test and let me know?
[bash_history]
LINE_BREAKER = (^\#)\d+
No luck, it's breaking... weird. So one event comes in as
hi this is a text
#1579273320
exit
And the previous one as:
1579273315
(the timestamp minus the #). It appears to alternate like this. Neither appears to actually use this as the timestamp for the event, though.
Where did you place your props.conf?
It was deployed from the deployment server within the Splunk_TA_nix app to the UFs (so /opt/splunk/etc/deployment-apps/Splunk_TA_nix/local/)
Can you check the errors and warnings you are receiving for date/time parsing on the receiving Splunk instance?
After looking in a few logs where I would expect an error to be (if there was one), I did a grep of -all- logs in /opt/splunk/var/log/splunk/ for "bash" and found nothing. Is there a specific log and/or keyword you know to check for?