I have a log source with a terrible timestamping scheme. The first line contains the startup date/time, and each event in the log is marked with a seconds.millis offset from that time (left space-padded).
I know how I would go about this if I wanted to ingest the whole file in one go after it's closed (use python to make a copy with calculated times replacing the offset), but I'm interested in ingesting as the log's written. I have no idea where to start for that.
Does anyone have a ready solution for something like this? If not, I'd appreciate some guidance toward a useful direction to research.
0.012 2020-02-05 17:35:53; Factorio 0.18.3 (build 49258, win64, alpha)
0.013 Operating system: Windows 10 (build 18363)
0.013 Program arguments: "E:\GOG\Factorio\bin\x64\factorio.exe" "--force-opengl"
0.013 Read data path: E:/GOG/Factorio/data
0.013 Write data path: C:/Users/Nate/AppData/Roaming/Factorio [626596/1907076MB]
0.013 Binaries path: E:/GOG/Factorio/bin
0.032 System info: [CPU: AMD Ryzen 7 2700X Eight-Core Processor, 16 cores, RAM: 6553/16310 MB, page: 9579/24406 MB, virtual: 4251/134217727 MB, extended virtual: 0 MB]
0.032 Display options: [FullScreen: 1] [VSync: 1] [UIScale: custom (150.0%)] [Native DPI: 1] [Screen: 255] [Special: lmw] [Lang: en]
0.036 Available displays: 1
0.036 [0]: \\.\DISPLAY1 - NVIDIA GeForce GTX 1070 {0x05, [0,0], 2560x1440, 32bit, 59Hz}
0.684 Initialised OpenGL:[0] GeForce GTX 1070/PCIe/SSE2; driver: 3.3.0 NVIDIA 432.00
0.684 [Extensions] s3tc:yes; KHR_debug:yes; ARB_clear_texture:yes, ARB_copy_image:yes
0.684 [Version] 3.3
0.684 Verbose GraphicsInterfaceOpenGL.cpp:896: [Caps] Tex:32768, TexArr:2048, TexBufSz:131072kB; TexUnits:192; UboSz:64kB
0.684 Graphics settings preset: very-high
0.684 Dedicated video memory size 8192 MB
1.095 Verbose PipelineStateObject.cpp:85: Time to load shaders: 0.411126 seconds.
1.149 Desktop composition is active.
1.150 Graphics options: [Graphics quality: high] [Video memory usage: all] [Light scale: 100%] [DXT: high-quality] [Color: 32bit]
1.150 [Max threads (load/render): 32/16] [Max texture size: 0] [Tex.Stream.: 0] [Rotation quality: normal] [Other: STDC] [B:0,C:0,S:100]
You can create a script input that will remember where it left off and print the difference.
Below is an example script
import hashlib
from datetime import datetime, timedelta
from itertools import islice
from re import search
log_file="sample.log"
try:
with open(hashlib.sha256(log_file.encode()).hexdigest()) as state:
last_position=int(state.readline())
except IOError:
last_position=0
data_pattern = r"^(\d+\.\d+) (.+)$"
date_pattern = r"^\d+\.\d+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
with open(log_file, 'r') as f:
data=search(date_pattern, f.readline())
datetime_object = datetime.strptime(data.group(1), r'%Y-%m-%d %H:%M:%S')
f.seek(last_position)
for line in f:
data = search(data_pattern, line)
if data is not None:
print('{} {}'.format(
datetime_object+timedelta(seconds=float(data.group(1))),
data.group(2)))
last_position=f.tell()
try:
with open(hashlib.sha256(log_file.encode()).hexdigest(),'w') as state:
state.write('{}'.format(last_position))
except IOError:
print("Error writing state file")
Example output
2020-02-05 17:35:53.012000 2020-02-05 17:35:53; Factorio 0.18.3 (build 49258, win64, alpha)
2020-02-05 17:35:53.013000 Operating system: Windows 10 (build 18363)
2020-02-05 17:35:53.013000 Program arguments: "E:\GOG\Factorio\bin\x64\factorio.exe" "--force-opengl"
2020-02-05 17:35:53.013000 Read data path: E:/GOG/Factorio/data
2020-02-05 17:35:53.013000 Write data path: C:/Users/Nate/AppData/Roaming/Factorio [626596/1907076MB]
2020-02-05 17:35:53.013000 Binaries path: E:/GOG/Factorio/bin
2020-02-05 17:35:53.032000 System info: [CPU: AMD Ryzen 7 2700X Eight-Core Processor, 16 cores, RAM: 6553/16310 MB, page: 9579/24406 MB, virtual: 4251/134217727 MB, extended virtual: 0 MB]
2020-02-05 17:35:53.032000 Display options: [FullScreen: 1] [VSync: 1] [UIScale: custom (150.0%)] [Native DPI: 1] [Screen: 255] [Special: lmw] [Lang: en]
2020-02-05 17:35:53.036000 Available displays: 1
2020-02-05 17:35:53.036000 [0]: \\.\DISPLAY1 - NVIDIA GeForce GTX 1070 {0x05, [0,0], 2560x1440, 32bit, 59Hz}
2020-02-05 17:35:53.684000 Initialised OpenGL:[0] GeForce GTX 1070/PCIe/SSE2; driver: 3.3.0 NVIDIA 432.00
2020-02-05 17:35:53.684000 [Extensions] s3tc:yes; KHR_debug:yes; ARB_clear_texture:yes, ARB_copy_image:yes
2020-02-05 17:35:53.684000 [Version] 3.3
2020-02-05 17:35:53.684000 Verbose GraphicsInterfaceOpenGL.cpp:896: [Caps] Tex:32768, TexArr:2048, TexBufSz:131072kB; TexUnits:192; UboSz:64kB
2020-02-05 17:35:53.684000 Graphics settings preset: very-high
2020-02-05 17:35:53.684000 Dedicated video memory size 8192 MB
2020-02-05 17:35:54.095000 Verbose PipelineStateObject.cpp:85: Time to load shaders: 0.411126 seconds.
2020-02-05 17:35:54.149000 Desktop composition is active.
2020-02-05 17:35:54.150000 Graphics options: [Graphics quality: high] [Video memory usage: all] [Light scale: 100%] [DXT: high-quality] [Color: 32bit]
2020-02-05 17:35:54.150000 [Max threads (load/render): 32/16] [Max texture size: 0] [Tex.Stream.: 0] [Rotation quality: normal] [Other: STDC] [B:0,C:0,S:100]
The idea of the script is to be expandable so that with slight modifications you can modify the script to receive parameters and keep state in a convenient location
You can create a script input that will remember where it left off and print the difference.
Below is an example script
import hashlib
from datetime import datetime, timedelta
from itertools import islice
from re import search
log_file="sample.log"
try:
with open(hashlib.sha256(log_file.encode()).hexdigest()) as state:
last_position=int(state.readline())
except IOError:
last_position=0
data_pattern = r"^(\d+\.\d+) (.+)$"
date_pattern = r"^\d+\.\d+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
with open(log_file, 'r') as f:
data=search(date_pattern, f.readline())
datetime_object = datetime.strptime(data.group(1), r'%Y-%m-%d %H:%M:%S')
f.seek(last_position)
for line in f:
data = search(data_pattern, line)
if data is not None:
print('{} {}'.format(
datetime_object+timedelta(seconds=float(data.group(1))),
data.group(2)))
last_position=f.tell()
try:
with open(hashlib.sha256(log_file.encode()).hexdigest(),'w') as state:
state.write('{}'.format(last_position))
except IOError:
print("Error writing state file")
Example output
2020-02-05 17:35:53.012000 2020-02-05 17:35:53; Factorio 0.18.3 (build 49258, win64, alpha)
2020-02-05 17:35:53.013000 Operating system: Windows 10 (build 18363)
2020-02-05 17:35:53.013000 Program arguments: "E:\GOG\Factorio\bin\x64\factorio.exe" "--force-opengl"
2020-02-05 17:35:53.013000 Read data path: E:/GOG/Factorio/data
2020-02-05 17:35:53.013000 Write data path: C:/Users/Nate/AppData/Roaming/Factorio [626596/1907076MB]
2020-02-05 17:35:53.013000 Binaries path: E:/GOG/Factorio/bin
2020-02-05 17:35:53.032000 System info: [CPU: AMD Ryzen 7 2700X Eight-Core Processor, 16 cores, RAM: 6553/16310 MB, page: 9579/24406 MB, virtual: 4251/134217727 MB, extended virtual: 0 MB]
2020-02-05 17:35:53.032000 Display options: [FullScreen: 1] [VSync: 1] [UIScale: custom (150.0%)] [Native DPI: 1] [Screen: 255] [Special: lmw] [Lang: en]
2020-02-05 17:35:53.036000 Available displays: 1
2020-02-05 17:35:53.036000 [0]: \\.\DISPLAY1 - NVIDIA GeForce GTX 1070 {0x05, [0,0], 2560x1440, 32bit, 59Hz}
2020-02-05 17:35:53.684000 Initialised OpenGL:[0] GeForce GTX 1070/PCIe/SSE2; driver: 3.3.0 NVIDIA 432.00
2020-02-05 17:35:53.684000 [Extensions] s3tc:yes; KHR_debug:yes; ARB_clear_texture:yes, ARB_copy_image:yes
2020-02-05 17:35:53.684000 [Version] 3.3
2020-02-05 17:35:53.684000 Verbose GraphicsInterfaceOpenGL.cpp:896: [Caps] Tex:32768, TexArr:2048, TexBufSz:131072kB; TexUnits:192; UboSz:64kB
2020-02-05 17:35:53.684000 Graphics settings preset: very-high
2020-02-05 17:35:53.684000 Dedicated video memory size 8192 MB
2020-02-05 17:35:54.095000 Verbose PipelineStateObject.cpp:85: Time to load shaders: 0.411126 seconds.
2020-02-05 17:35:54.149000 Desktop composition is active.
2020-02-05 17:35:54.150000 Graphics options: [Graphics quality: high] [Video memory usage: all] [Light scale: 100%] [DXT: high-quality] [Color: 32bit]
2020-02-05 17:35:54.150000 [Max threads (load/render): 32/16] [Max texture size: 0] [Tex.Stream.: 0] [Rotation quality: normal] [Other: STDC] [B:0,C:0,S:100]
The idea of the script is to be expandable so that with slight modifications you can modify the script to receive parameters and keep state in a convenient location
With some tweaks, I believe this will do exactly what I want. Nicely done!
Namely, I want to have checkpointing done per log, not file, so I'll use the timestamp rather than filename. I will also need to tweak it to accept filenames and clean up checkpoints for non-existent filenames.
This is a great start, and I'm happy to give you credit in this project. Thanks!
Hi @nplamondon,
just curious if you have any usable results from your little Factorio ingest approach or if I need to re-invent the wheel myself 🙂
Would be glad to hear from you.
Kind regards,
Bojan
You need date in your log to extract timestamp. If offset doesn't matter much in timestamp then you can extract timestamp from first event and Splunk applies same timestamp to all other events.
Below are configurations in props.conf:
[ test ]
SHOULD_LINEMERGE=false
CHARSET=UTF-8
TIME_FORMAT=%S.%3N %Y-%m-%d %H:%M:%S
Example query of calculating step_time by adding step_offset_ms to the _time field
source="graphics.log" host="splunk_lab" sourcetype="display_log"
| rex "(?<step_offset_ms>\d+(\.)?\d+)\s(?<step_description>.+)"
| eval epoch_time=_time
| eval step_time=epoch_time+step_offset_ms
| convert ctime(step_time)
| table _time step_offset_ms step_time step_description
| sort _time step_time