Importing logs from Parquet into Splunk

w344423 — Fri, 10 Nov 2023 13:06:14 GMT

Hi guys, I am doing a POC to import our Parquet files into Splunk. I have managed to write a Python script that extracts the events (the raw logs) into a DataFrame (df).

I also wrote a Python script that pumps the logs over the syslog protocol to a heavy forwarder (HF) and then on to the indexer. I am using the syslog method because I have many log types, and with [udp://portnumber] inputs I can ingest multiple log types at once, each going to a different sourcetype.

However, when I do this I cannot retain the original datetime of the raw event; Splunk takes the datetime at which I sent the event instead. Secondly, I am using Python because all these Parquet files are stored in an S3 bucket, so it is easier for me to loop through the directory and extract the files.

Can anyone help me get the original timestamp of the logs? Or is there a more effective way of doing this?

Sample log from Splunk after indexing:

- Nov 10 09:45:50 127.0.0.1 <190>2023-09-01T16:59:12Z server1 server2 %NGIPS-6-430002: DeviceUUID: xxx-xxx-xxx

Here's my code to push the events via syslog:

import logging
import logging.handlers
import socket
from IPython.display import clear_output

# Create your logger. Please note that this logger is different from the ArcSight logger.
my_loggerudp = logging.getLogger('MyLoggerUDP')
#my_loggertcp = logging.getLogger('MyLoggerTCP')

# We will pass the message as INFO
my_loggerudp.setLevel(logging.INFO)

# Define SyslogHandler
# TCP
#handlertcp = logging.handlers.SysLogHandler(address=('localhost', 1026), socktype=socket.SOCK_STREAM)
# UDP
handlerudp = logging.handlers.SysLogHandler(address=('localhost', 1025), socktype=socket.SOCK_DGRAM)
# X.X.X.X = IP address of the syslog collector (Connector Appliance, Loggers etc.)
# 514 = syslog port; specify the port you have defined (the default for syslog is 514)

my_loggerudp.addHandler(handlerudp)
#my_loggertcp.addHandler(handlertcp)

# Example: we will pass values from a list
# (df is the DataFrame built from the Parquet files by the earlier script)
event = df["event"]
count = len(event)

#for x in range(2):
for x in event:
    clear_output(wait=True)
    my_loggerudp.info(x)
    my_loggerudp.handlers[0].flush()
    count -= 1
    print(f"logs left to be transmit {count}")
    print(x)

Re: Importing logs from Parquet into Splunk

richgalloway — Fri, 10 Nov 2023 18:07:12 GMT

IMO, syslog should be the onboarding choice of last resort. There are too many syslog "standards" and issues always arise (like yours).

Since you're building your own ingestion program, consider sending the data to Splunk using HTTP Event Collector (HEC). See "To Add Data Directly to an Index" at https://dev.splunk.com/enterprise/docs/devtools/python/sdk-python/howtousesplunkpython/howtogetdatapython
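A minimal sketch of what the HEC route could look like, assuming the raw event carries an ISO 8601 UTC timestamp like the one in your sample (2023-09-01T16:59:12Z). The URL, token, and the helper names event_epoch, build_payload, and send_events are placeholders made up for illustration, not anything from your environment. The point is that HEC lets you set the event time explicitly through the "time" field (epoch seconds), so the original timestamp is kept instead of the send time.

```python
import json
import re
import urllib.request
from datetime import datetime, timezone

# Placeholder values (assumptions): replace with your own HEC endpoint and token.
HEC_URL = "https://localhost:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

# Matches an ISO 8601 UTC timestamp such as 2023-09-01T16:59:12Z.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def event_epoch(raw):
    """Return the event's own timestamp as epoch seconds, or None if not found."""
    match = TS_PATTERN.search(raw)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def build_payload(raw, sourcetype):
    """Build one HEC event object; 'time' is set only when a timestamp was found."""
    payload = {"event": raw, "sourcetype": sourcetype}
    epoch = event_epoch(raw)
    if epoch is not None:
        payload["time"] = epoch
    return payload


def send_events(raw_events, sourcetype):
    """POST a batch of events to HEC as newline-separated JSON objects."""
    body = "\n".join(json.dumps(build_payload(e, sourcetype)) for e in raw_events)
    request = urllib.request.Request(
        HEC_URL,
        data=body.encode("utf-8"),
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With your df, a call would look something like send_events(df["event"], "your:sourcetype"), one call per log type so each gets its own sourcetype. Note that a HEC endpoint with a self-signed certificate will fail the default TLS verification in urllib, so you would need to trust that certificate first.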
