<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Importing logs from Parquet into Splunk in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Importing-logs-from-Parquet-into-Splunk/m-p/668270#M112014</link>
    <description>&lt;P&gt;IMO, syslog should be the onboarding choice of last resort.&amp;nbsp; There are too many syslog "standards" and issues always arise (like yours).&lt;/P&gt;&lt;P&gt;Since you're building your own ingestion program, consider sending the data to Splunk using HTTP Event Collector (HEC).&amp;nbsp; See "To Add Data Directly to an Index" at &lt;A href="https://dev.splunk.com/enterprise/docs/devtools/python/sdk-python/howtousesplunkpython/howtogetdatapython" target="_blank"&gt;https://dev.splunk.com/enterprise/docs/devtools/python/sdk-python/howtousesplunkpython/howtogetdatapython&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 10 Nov 2023 18:07:12 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2023-11-10T18:07:12Z</dc:date>
    <item>
      <title>Importing logs from Parquet into Splunk</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Importing-logs-from-Parquet-into-Splunk/m-p/668194#M112003</link>
      <description>&lt;P&gt;Hi guys, I am running a POC to import our Parquet files into Splunk. I have managed to write a Python script that extracts the events (the raw logs) into a df.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also wrote a Python script that pumps the logs over the syslog protocol to a heavy forwarder (HF) and then on to the indexer. I am using the syslog method because I have many log types, and with [udp://portnumber] inputs I can ingest several log types at once, each into a different sourcetype.&lt;/P&gt;&lt;P&gt;However, when I do this I am not able to retain the original datetime of the raw event; the event takes the datetime of the moment I sent it instead. Secondly, I am using Python because all these Parquet files are stored in an S3 container, so it is easier for me to loop through the directory and extract the files.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone help me keep the original timestamp of the logs? Or is there a more effective way of doing this?&lt;/P&gt;&lt;P&gt;Sample log from Splunk after indexing:&lt;/P&gt;&lt;P&gt;-&amp;nbsp;Nov 10 09:45:50 127.0.0.1 &amp;lt;190&amp;gt;2023-09-01T16:59:12Z server1 server2 %NGIPS-6-430002: DeviceUUID: xxx-xxx-xxx&lt;/P&gt;&lt;P&gt;Here's my code to push the events via syslog.&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import logging
import logging.handlers
import socket
from IPython.display import clear_output


#Create your logger. Please note that this logger is different from the ArcSight logger.
my_loggerudp = logging.getLogger('MyLoggerUDP')
#my_loggertcp = logging.getLogger('MyLoggerTCP')

#We will pass the message as INFO
my_loggerudp.setLevel(logging.INFO)

#Define SyslogHandler

#TCP
#handlertcp = logging.handlers.SysLogHandler(address = ('localhost',1026), socktype=socket.SOCK_STREAM)

#UDP
handlerudp = logging.handlers.SysLogHandler(address = ('localhost',1025), socktype=socket.SOCK_DGRAM)

#address = (host, port) of the syslog collector (Connector Appliance, Loggers, etc.); here it is localhost:1025

#The default syslog port is 514; specify whichever port you have defined on the receiving input
my_loggerudp.addHandler(handlerudp)
#my_loggertcp.addHandler(handlertcp)

#Example: We will pass values from a List

event = df["event"]
count = len(event)
#for x in range(2):
for x in event:
    clear_output(wait=True)
    my_loggerudp.info(x)
    my_loggerudp.handlers[0].flush()
    count -= 1
    print(f"logs left to be transmitted: {count}")
    print(x)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Nov 2023 13:06:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Importing-logs-from-Parquet-into-Splunk/m-p/668194#M112003</guid>
      <dc:creator>w344423</dc:creator>
      <dc:date>2023-11-10T13:06:14Z</dc:date>
    </item>
    <item>
      <title>Re: Importing logs from Parquet into Splunk</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Importing-logs-from-Parquet-into-Splunk/m-p/668270#M112014</link>
      <description>&lt;P&gt;IMO, syslog should be the onboarding choice of last resort.&amp;nbsp; There are too many syslog "standards" and issues always arise (like yours).&lt;/P&gt;&lt;P&gt;Since you're building your own ingestion program, consider sending the data to Splunk using HTTP Event Collector (HEC).&amp;nbsp; See "To Add Data Directly to an Index" at &lt;A href="https://dev.splunk.com/enterprise/docs/devtools/python/sdk-python/howtousesplunkpython/howtogetdatapython" target="_blank"&gt;https://dev.splunk.com/enterprise/docs/devtools/python/sdk-python/howtousesplunkpython/howtogetdatapython&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Nov 2023 18:07:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Importing-logs-from-Parquet-into-Splunk/m-p/668270#M112014</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2023-11-10T18:07:12Z</dc:date>
    </item>
  </channel>
</rss>

