Getting Data In

Calculate event time, given a startup time and an offset per event?

nplamondon
SplunkTrust
SplunkTrust

I have a log source with a terrible timestamping scheme. The first line contains the startup date/time, and each event in the log is marked with a seconds.millis offset from that time (left space-padded).

I know how I would go about this if I wanted to ingest the whole file in one go after it's closed (use python to make a copy with calculated times replacing the offset), but I'm interested in ingesting as the log's written. I have no idea where to start for that.

Does anyone have a ready solution for something like this? If not, I'd appreciate some guidance toward a useful direction to research.

   0.012 2020-02-05 17:35:53; Factorio 0.18.3 (build 49258, win64, alpha)
   0.013 Operating system: Windows 10 (build 18363)
   0.013 Program arguments: "E:\GOG\Factorio\bin\x64\factorio.exe" "--force-opengl"
   0.013 Read data path: E:/GOG/Factorio/data
   0.013 Write data path: C:/Users/Nate/AppData/Roaming/Factorio [626596/1907076MB]
   0.013 Binaries path: E:/GOG/Factorio/bin
   0.032 System info: [CPU: AMD Ryzen 7 2700X Eight-Core Processor, 16 cores, RAM: 6553/16310 MB, page: 9579/24406 MB, virtual: 4251/134217727 MB, extended virtual: 0 MB]
   0.032 Display options: [FullScreen: 1] [VSync: 1] [UIScale: custom (150.0%)] [Native DPI: 1] [Screen: 255] [Special: lmw] [Lang: en]
   0.036 Available displays: 1
   0.036  [0]: \\.\DISPLAY1 - NVIDIA GeForce GTX 1070 {0x05, [0,0], 2560x1440, 32bit, 59Hz}
   0.684 Initialised OpenGL:[0] GeForce GTX 1070/PCIe/SSE2; driver: 3.3.0 NVIDIA 432.00
   0.684   [Extensions] s3tc:yes; KHR_debug:yes; ARB_clear_texture:yes, ARB_copy_image:yes
   0.684   [Version] 3.3
   0.684 Verbose GraphicsInterfaceOpenGL.cpp:896: [Caps] Tex:32768, TexArr:2048, TexBufSz:131072kB; TexUnits:192; UboSz:64kB
   0.684 Graphics settings preset: very-high
   0.684   Dedicated video memory size 8192 MB
   1.095 Verbose PipelineStateObject.cpp:85: Time to load shaders: 0.411126 seconds.
   1.149 Desktop composition is active.
   1.150 Graphics options: [Graphics quality: high] [Video memory usage: all] [Light scale: 100%] [DXT: high-quality] [Color: 32bit]
   1.150                   [Max threads (load/render): 32/16] [Max texture size: 0] [Tex.Stream.: 0] [Rotation quality: normal] [Other: STDC] [B:0,C:0,S:100]
Tags (1)
0 Karma
1 Solution

bkresoja_2
Engager

You can create a script input that will remember where it left off and print the difference.
Below is an example script

import hashlib
from datetime import datetime, timedelta
from itertools import islice
from re import search

log_file="sample.log"

try:
    with open(hashlib.sha256(log_file.encode()).hexdigest()) as state:
        last_position=int(state.readline())
except IOError:
    last_position=0

data_pattern = r"^(\d+\.\d+) (.+)$"
date_pattern = r"^\d+\.\d+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
with open(log_file, 'r') as f:
    data=search(date_pattern, f.readline())
    datetime_object = datetime.strptime(data.group(1), r'%Y-%m-%d %H:%M:%S')
    f.seek(last_position)
    for line in f:
        data = search(data_pattern, line)
        if data is not None:
            print('{} {}'.format(
                datetime_object+timedelta(seconds=float(data.group(1))),
                data.group(2)))
    last_position=f.tell()
try:
    with open(hashlib.sha256(log_file.encode()).hexdigest(),'w') as state:
        state.write('{}'.format(last_position))
except IOError:
    print("Error writing state file")

Example output

2020-02-05 17:35:53.012000 2020-02-05 17:35:53; Factorio 0.18.3 (build 49258, win64, alpha)
2020-02-05 17:35:53.013000 Operating system: Windows 10 (build 18363)
2020-02-05 17:35:53.013000 Program arguments: "E:\GOG\Factorio\bin\x64\factorio.exe" "--force-opengl"
2020-02-05 17:35:53.013000 Read data path: E:/GOG/Factorio/data
2020-02-05 17:35:53.013000 Write data path: C:/Users/Nate/AppData/Roaming/Factorio [626596/1907076MB]
2020-02-05 17:35:53.013000 Binaries path: E:/GOG/Factorio/bin
2020-02-05 17:35:53.032000 System info: [CPU: AMD Ryzen 7 2700X Eight-Core Processor, 16 cores, RAM: 6553/16310 MB, page: 9579/24406 MB, virtual: 4251/134217727 MB, extended virtual: 0 MB]
2020-02-05 17:35:53.032000 Display options: [FullScreen: 1] [VSync: 1] [UIScale: custom (150.0%)] [Native DPI: 1] [Screen: 255] [Special: lmw] [Lang: en]
2020-02-05 17:35:53.036000 Available displays: 1
2020-02-05 17:35:53.036000  [0]: \\.\DISPLAY1 - NVIDIA GeForce GTX 1070 {0x05, [0,0], 2560x1440, 32bit, 59Hz}
2020-02-05 17:35:53.684000 Initialised OpenGL:[0] GeForce GTX 1070/PCIe/SSE2; driver: 3.3.0 NVIDIA 432.00
2020-02-05 17:35:53.684000   [Extensions] s3tc:yes; KHR_debug:yes; ARB_clear_texture:yes, ARB_copy_image:yes
2020-02-05 17:35:53.684000   [Version] 3.3
2020-02-05 17:35:53.684000 Verbose GraphicsInterfaceOpenGL.cpp:896: [Caps] Tex:32768, TexArr:2048, TexBufSz:131072kB; TexUnits:192; UboSz:64kB
2020-02-05 17:35:53.684000 Graphics settings preset: very-high
2020-02-05 17:35:53.684000   Dedicated video memory size 8192 MB
2020-02-05 17:35:54.095000 Verbose PipelineStateObject.cpp:85: Time to load shaders: 0.411126 seconds.
2020-02-05 17:35:54.149000 Desktop composition is active.
2020-02-05 17:35:54.150000 Graphics options: [Graphics quality: high] [Video memory usage: all] [Light scale: 100%] [DXT: high-quality] [Color: 32bit]
2020-02-05 17:35:54.150000                   [Max threads (load/render): 32/16] [Max texture size: 0] [Tex.Stream.: 0] [Rotation quality: normal] [Other: STDC] [B:0,C:0,S:100]

The idea of the script is to be expandable so that with slight modifications you can modify the script to receive parameters and keep state in a convenient location

View solution in original post

0 Karma

bkresoja_2
Engager

You can create a script input that will remember where it left off and print the difference.
Below is an example script

import hashlib
from datetime import datetime, timedelta
from itertools import islice
from re import search

log_file="sample.log"

try:
    with open(hashlib.sha256(log_file.encode()).hexdigest()) as state:
        last_position=int(state.readline())
except IOError:
    last_position=0

data_pattern = r"^(\d+\.\d+) (.+)$"
date_pattern = r"^\d+\.\d+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
with open(log_file, 'r') as f:
    data=search(date_pattern, f.readline())
    datetime_object = datetime.strptime(data.group(1), r'%Y-%m-%d %H:%M:%S')
    f.seek(last_position)
    for line in f:
        data = search(data_pattern, line)
        if data is not None:
            print('{} {}'.format(
                datetime_object+timedelta(seconds=float(data.group(1))),
                data.group(2)))
    last_position=f.tell()
try:
    with open(hashlib.sha256(log_file.encode()).hexdigest(),'w') as state:
        state.write('{}'.format(last_position))
except IOError:
    print("Error writing state file")

Example output

2020-02-05 17:35:53.012000 2020-02-05 17:35:53; Factorio 0.18.3 (build 49258, win64, alpha)
2020-02-05 17:35:53.013000 Operating system: Windows 10 (build 18363)
2020-02-05 17:35:53.013000 Program arguments: "E:\GOG\Factorio\bin\x64\factorio.exe" "--force-opengl"
2020-02-05 17:35:53.013000 Read data path: E:/GOG/Factorio/data
2020-02-05 17:35:53.013000 Write data path: C:/Users/Nate/AppData/Roaming/Factorio [626596/1907076MB]
2020-02-05 17:35:53.013000 Binaries path: E:/GOG/Factorio/bin
2020-02-05 17:35:53.032000 System info: [CPU: AMD Ryzen 7 2700X Eight-Core Processor, 16 cores, RAM: 6553/16310 MB, page: 9579/24406 MB, virtual: 4251/134217727 MB, extended virtual: 0 MB]
2020-02-05 17:35:53.032000 Display options: [FullScreen: 1] [VSync: 1] [UIScale: custom (150.0%)] [Native DPI: 1] [Screen: 255] [Special: lmw] [Lang: en]
2020-02-05 17:35:53.036000 Available displays: 1
2020-02-05 17:35:53.036000  [0]: \\.\DISPLAY1 - NVIDIA GeForce GTX 1070 {0x05, [0,0], 2560x1440, 32bit, 59Hz}
2020-02-05 17:35:53.684000 Initialised OpenGL:[0] GeForce GTX 1070/PCIe/SSE2; driver: 3.3.0 NVIDIA 432.00
2020-02-05 17:35:53.684000   [Extensions] s3tc:yes; KHR_debug:yes; ARB_clear_texture:yes, ARB_copy_image:yes
2020-02-05 17:35:53.684000   [Version] 3.3
2020-02-05 17:35:53.684000 Verbose GraphicsInterfaceOpenGL.cpp:896: [Caps] Tex:32768, TexArr:2048, TexBufSz:131072kB; TexUnits:192; UboSz:64kB
2020-02-05 17:35:53.684000 Graphics settings preset: very-high
2020-02-05 17:35:53.684000   Dedicated video memory size 8192 MB
2020-02-05 17:35:54.095000 Verbose PipelineStateObject.cpp:85: Time to load shaders: 0.411126 seconds.
2020-02-05 17:35:54.149000 Desktop composition is active.
2020-02-05 17:35:54.150000 Graphics options: [Graphics quality: high] [Video memory usage: all] [Light scale: 100%] [DXT: high-quality] [Color: 32bit]
2020-02-05 17:35:54.150000                   [Max threads (load/render): 32/16] [Max texture size: 0] [Tex.Stream.: 0] [Rotation quality: normal] [Other: STDC] [B:0,C:0,S:100]

The idea of the script is to be expandable so that with slight modifications you can modify the script to receive parameters and keep state in a convenient location

0 Karma

nplamondon
SplunkTrust
SplunkTrust

With some tweaks, I believe this will do exactly what I want. Nicely done!

Namely, I want to have checkpointing done per log, not file, so I'll use the timestamp rather than filename. I will also need to tweak it to accept filenames and clean up checkpoints for non-existent filenames.

This is a great start, and I'm happy to give you credit in this project. Thanks!

0 Karma

bojanjanisch
New Member

Hi @nplamondon,

just curious if you have any usable results from your little Factorio ingest approach or if I need to re-invent the wheel myself 🙂

Would be glad to hear from you.

Kind regards,
Bojan

0 Karma

manjunathmeti
SplunkTrust
SplunkTrust

You need date in your log to extract timestamp. If offset doesn't matter much in timestamp then you can extract timestamp from first event and Splunk applies same timestamp to all other events.alt text

Below are configurations in props.conf:

[ test ]
SHOULD_LINEMERGE=false
CHARSET=UTF-8
TIME_FORMAT=%S.%3N %Y-%m-%d %H:%M:%S
0 Karma

bandit
Motivator

Example query of calculating step_time by adding step_offset_ms to the _time field

source="graphics.log" host="splunk_lab" sourcetype="display_log" 
| rex "(?<step_offset_ms>\d+(\.)?\d+)\s(?<step_description>.+)" 
| eval epoch_time=_time 
| eval step_time=epoch_time+step_offset_ms 
| convert ctime(step_time) 
| table _time step_offset_ms step_time step_description 
| sort _time step_time

alt text

0 Karma
Get Updates on the Splunk Community!

Observability Highlights | November 2022 Newsletter

 November 2022Observability CloudEnd Of Support Extension for SignalFx Smart AgentSplunk is extending the End ...

Avoid Certificate Expiry Issues in Splunk Enterprise with Certificate Assist

This blog post is part 2 of 4 of a series on Splunk Assist. Click the links below to see the other ...

Using Machine Learning for Hunting Security Threats

REGISTER NOW Seeing the exponential hike in global cyber threat spectrum, organizations are now striving more ...