Hi guys,

I am running a POC to import our Parquet files into Splunk. I have managed to write a Python script that extracts the events (the raw logs) into a DataFrame (df), and a second Python script that sends those logs over the syslog protocol to a heavy forwarder (HF) and then on to the indexer.

I am using the syslog method because I have many log types, and with [udp://portnumber] inputs I can ingest multiple log types at once, each into a different sourcetype. However, when I do this I cannot retain the original datetime of the raw event. Splunk takes the datetime at which I sent the event instead.

I am using Python because all of these Parquet files are stored in an S3 container, so it is easier for me to loop through the directory and extract the files.

Can anyone help me get the original timestamp of the logs? Or is there a more effective way of doing this?

Sample log from Splunk after indexing:

Nov 10 09:45:50 <190>2023-09-01T16:59:12Z server1 server2 %NGIPS-6-430002: DeviceUUID: xxx-xxx-xxx

Here is my code to push the events via syslog:

import logging
import logging.handlers
import socket
from IPython.display import clear_output
# Create your logger. Please note that this logger is different from the ArcSight logger.
#my_loggerudp = logging.getLogger('MyLoggerUDP')
#my_loggertcp = logging.getLogger('MyLoggerTCP')
#We will pass the message as INFO
#Define SyslogHandler
#handlertcp = logging.handlers.SysLogHandler(address = ('localhost',1026), socktype=socket.SOCK_STREAM)
handlerudp = logging.handlers.SysLogHandler(address = ('localhost',1025), socktype=socket.SOCK_DGRAM)
# X.X.X.X = IP address of the syslog collector (Connector Appliance, Loggers, etc.)
# 514 = syslog port. Specify the port you have defined; the syslog default is 514.
#Example: We will pass values from a List
event = df["event"]
count = len(event)
#for x in range(2):
for x in event:
    clear_output(wait=True)
    count -= 1
    print(f"logs left to be transmitted: {count}")
    print(x)
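A minimal sketch of the sending step, not a definitive implementation. It assumes each row of df["event"] is a string that already begins with its own ISO 8601 timestamp, as in the sample log above. The names `ISO_TS`, `original_timestamp` and `send_events` are hypothetical; the host and port are taken from the handler in the code above. It checks which timestamp the raw event carries and sends the event bytes unchanged over a plain UDP socket, so nothing is prepended on the Python side. On the Splunk side, `TIME_PREFIX` and `TIME_FORMAT` in props.conf and `no_appending_timestamp` in inputs.conf may be worth checking; whether they resolve this case is an assumption and has not been tested here.

```python
import re
import socket
from datetime import datetime, timezone

# Matches an ISO 8601 UTC timestamp such as 2023-09-01T16:59:12Z.
ISO_TS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def original_timestamp(raw_event):
    """Return the first ISO 8601 UTC timestamp found in raw_event, or None."""
    match = ISO_TS.search(raw_event)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def send_events(events, host="localhost", port=1025):
    """Send each event unchanged as one UDP datagram; return the number sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for raw_event in events:
            sock.sendto(raw_event.encode("utf-8"), (host, port))
            sent += 1
    return sent


# Sample event text taken from the log line in the post.
sample = "2023-09-01T16:59:12Z server1 server2 %NGIPS-6-430002: DeviceUUID: xxx-xxx-xxx"
print(original_timestamp(sample).isoformat())
```

With the DataFrame from the post, the call would be something like `send_events(df["event"])`. Comparing `original_timestamp(x)` with the `_time` Splunk assigns to the same event is a quick way to see whether timestamp extraction is picking up the event's own time or the arrival time.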