<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Indexing Multi-File Formats inside Zip Files Using Upload/Oneshot Method and AutoSourcetyping in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Multi-File-Formats-inside-Zip-Files-Using-Upload/m-p/399932#M71208</link>
    <description>&lt;P&gt;Hello Everyone&lt;/P&gt;

&lt;P&gt;For endpoint security analysis, we gather logs from machines using tools that generate archives containing many files in different formats, such as &lt;CODE&gt;XML, JSON, SQLite, txt, log, evt, evtx, bin, etc.&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;My aim is to index all of this data manually (using the &lt;CODE&gt;Web Upload&lt;/CODE&gt; method or &lt;CODE&gt;CLI oneshot&lt;/CODE&gt;) so the team can use Splunk's search capabilities to simplify the analysis process.&lt;/P&gt;

&lt;P&gt;To do this, I did the following:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;I created sourcetypes for a sample of the file types inside these archives (text logs and XML).&lt;/LI&gt;
&lt;LI&gt;I created &lt;CODE&gt;[source::]&lt;/CODE&gt; stanzas for the files inside the archives, to assign these sourcetypes automatically.&lt;/LI&gt;
&lt;/UL&gt;
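
&lt;P&gt;As an illustration of the approach in the list above, here is a minimal sketch that assigns sourcetypes purely by file extension, assuming the extension alone is enough to tell the formats apart. In &lt;CODE&gt;[source::]&lt;/CODE&gt; stanzas the pattern must match the whole source value: &lt;CODE&gt;...&lt;/CODE&gt; matches any characters (including directory separators), &lt;CODE&gt;*&lt;/CODE&gt; matches any characters except a directory separator, &lt;CODE&gt;.&lt;/CODE&gt; is a literal period, and the rest is treated as a regular expression. The stanzas are illustrative only, not the configuration used in this post.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Sketch: map files to sourcetypes by extension, at any depth inside the archive.
# "...." is the "..." wildcard followed by a literal "."
[source::....xml]
sourcetype = KYPD_XML

[source::....(bin|log|txt)]
sourcetype = KYPD_TXTLog
&lt;/CODE&gt;&lt;/PRE&gt;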

&lt;P&gt;After I uploaded the zip file, some file types were indexed successfully, while others were assigned the sourcetype &lt;CODE&gt;"unknown1"&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;One of the successfully indexed file types was &lt;CODE&gt;"*.bin"&lt;/CODE&gt; (a plain-text log file with timestamped lines). Its source field was:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;KYPD_GSD_OFFICE_2018_04_22_16_06.zip:.\ThisIsaSample Folder Logs/Documents and Settings/All Users/Application Data/setupdownloader.1524402598.bdinstall.bin
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;One of the unsuccessful ones was &lt;CODE&gt;"*.xml"&lt;/CODE&gt; (a typical XML file with no timestamps). Its source field was:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;KYPD_GSD_OFFICE_2018_04_22_16_06.zip:.\ThisIsaSample Folder Logs/output.xml
&lt;/CODE&gt;&lt;/PRE&gt;
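
&lt;P&gt;Because a &lt;CODE&gt;[source::]&lt;/CODE&gt; pattern has to match the entire source value, one hedged option (a sketch, not tested against this data) is a stanza written against the output.xml source shown above that starts with &lt;CODE&gt;...&lt;/CODE&gt;, so it still matches if a directory path precedes the archive name, and that reuses the &lt;CODE&gt;[\\\/]&lt;/CODE&gt; class from the stanzas in this post to accept either separator.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Sketch: output.xml directly under "ThisIsaSample Folder Logs" in any KYPD_*.zip
# Leading "..." tolerates any path before the archive name;
# [\\\/] accepts a backslash or a forward slash.
[source::...KYPD_*.zip:.[\\\/]ThisIsaSample Folder Logs[\\\/]output.xml]
sourcetype = KYPD_XML
priority = 100
&lt;/CODE&gt;&lt;/PRE&gt;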

&lt;P&gt;I tried the following &lt;CODE&gt;props.conf&lt;/CODE&gt; settings in the &lt;CODE&gt;system/local&lt;/CODE&gt; folder:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[KYPD_XML]
LINE_BREAKER = (&amp;lt;\?\w++.*\?&amp;gt;)|&amp;lt;\/\w+&amp;gt;(\s*)|\/&amp;gt;(\s*)|&amp;lt;\w+&amp;gt;(\s*)&amp;lt;\w+
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = MachineLogs
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = true
MAX_EVENTS = 9999999
TRUNCATE = 0
KV_MODE = none
BREAK_ONLY_BEFORE = (?!)
#LEARN_MODEL = false
#=================================================================
[source::(?i)KYPD_[\w\-]+_[\d_]+[.]zip[:].[\\\/]ThisIsaSample Folder Logs[\\\/]output.xml]
sourcetype = KYPD_XML
priority = 100
#=================================================================
[KYPD_TXTLog]
DATETIME_CONFIG = CURRENT
category = MachineLogs
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
TRUNCATE = 0
LEARN_MODEL = false
LEARN_SOURCETYPE = false
#=================================================================
[source::(?i)KYPD_[\w\-]+_[\d_]+[.]zip[:].[\\\/]ThisIsaSample Folder Logs...(.txt|.bdx|.log|.log.1|.bin|.dbg|.dbg.old|_debug.txt(.old)?)]
sourcetype = KYPD_TXTLog
priority = 99
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I also tried the following, but the XML file "output.xml" still gets "unknown1" as its sourcetype:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[KYPD_XML]
LINE_BREAKER = (&amp;lt;\?\w++.*\?&amp;gt;)|&amp;lt;\/\w+&amp;gt;(\s*)|\/&amp;gt;(\s*)|&amp;lt;\w+&amp;gt;(\s*)&amp;lt;\w+
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = MachineLogs
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = true
MAX_EVENTS = 9999999
TRUNCATE = 0
KV_MODE = none
BREAK_ONLY_BEFORE = (?!)
#LEARN_MODEL = false
#=================================================================
[source::…output.xml] # also tried ([source::*output.xml], [source::*.zip[:][.]…output.xml], [source::*.zip[:][.]\\…/output.xml] and [source::*.zip[:][.]\\ThisIsaSample Folder Logs/output.xml] )
sourcetype = KYPD_XML
priority = 100
&lt;/CODE&gt;&lt;/PRE&gt;
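
&lt;P&gt;Two details in the stanza header above may be worth checking, offered as assumptions rather than a confirmed diagnosis: the &lt;CODE&gt;…&lt;/CODE&gt; character is a single Unicode ellipsis, not the three-period &lt;CODE&gt;...&lt;/CODE&gt; wildcard, and Splunk .conf files generally treat &lt;CODE&gt;#&lt;/CODE&gt; as a comment only at the start of a line, so text after the closing bracket may not be ignored. A sketch of the same stanza with both adjusted:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Comment kept on its own line; "..." below is three ASCII periods.
[source::...output.xml]
sourcetype = KYPD_XML
priority = 100
&lt;/CODE&gt;&lt;/PRE&gt;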

&lt;P&gt;To be honest, I'm about to give up on the whole idea.&lt;/P&gt;

&lt;P&gt;I have tried many things, but I cannot understand why the XML file is not getting its sourcetype assigned automatically.&lt;/P&gt;

&lt;P&gt;I'd appreciate it if you could tell me whether I'm missing something here.&lt;/P&gt;</description>
    <pubDate>Wed, 21 Nov 2018 13:30:08 GMT</pubDate>
    <dc:creator>averlie_lina</dc:creator>
    <dc:date>2018-11-21T13:30:08Z</dc:date>
    <item>
      <title>Indexing Multi-File Formats inside Zip Files Using Upload/Oneshot Method and AutoSourcetyping</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Multi-File-Formats-inside-Zip-Files-Using-Upload/m-p/399932#M71208</link>
      <pubDate>Wed, 21 Nov 2018 13:30:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-Multi-File-Formats-inside-Zip-Files-Using-Upload/m-p/399932#M71208</guid>
      <dc:creator>averlie_lina</dc:creator>
      <dc:date>2018-11-21T13:30:08Z</dc:date>
    </item>
    <item>
      <title>Re: Indexing Multi-File Formats inside Zip Files Using Upload/Oneshot Method and AutoSourcetyping</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Indexing-Multi-File-Formats-inside-Zip-Files-Using-Upload/m-p/399933#M71209</link>
      <description>&lt;P&gt;Hello All&lt;/P&gt;

&lt;P&gt;I tried extracting the files and creating a monitor stanza for them, and that worked.&lt;/P&gt;

&lt;P&gt;However, the upload method is much more convenient in our case.&lt;/P&gt;
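
&lt;P&gt;For reference, a monitor stanza for extracted files might look like the sketch below. The directory path and index name are hypothetical placeholders, and the whitelist assumes the XML files can be identified by their .xml extension; this is not necessarily the stanza used here.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# inputs.conf sketch. The path and index name are hypothetical placeholders.
[monitor:///opt/kypd/extracted]
# Only pick up XML files (regex matched against the full file path)
whitelist = \.xml$
sourcetype = KYPD_XML
index = kypd_endpoint
&lt;/CODE&gt;&lt;/PRE&gt;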

&lt;P&gt;I also noticed that re-uploading previously uploaded files does not index any new data, even after deleting and recreating the index or deleting the events with the &lt;CODE&gt;| delete&lt;/CODE&gt; search command.&lt;/P&gt;

&lt;P&gt;After some reading, I learned that Splunk does not re-index files it has already seen (it computes a CRC over the first 256 bytes of each file), and that &lt;CODE&gt;crcSalt&lt;/CODE&gt; can be configured in &lt;CODE&gt;inputs.conf&lt;/CODE&gt; to change this behavior.&lt;/P&gt;

&lt;P&gt;But I'm not sure whether this works for files indexed through web upload or the CLI (&lt;CODE&gt;oneshot&lt;/CODE&gt;).&lt;/P&gt;
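
&lt;P&gt;For a monitored input, the &lt;CODE&gt;crcSalt&lt;/CODE&gt; setting mentioned above is usually written as in the sketch below; &lt;CODE&gt;&amp;lt;SOURCE&amp;gt;&lt;/CODE&gt; is a literal value that adds the full source path to the CRC. The monitored path is a hypothetical placeholder, and whether any of this applies to web uploads or oneshot is left open here. Note that salting with the path alone does not make Splunk re-read the same file at the same path.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# inputs.conf sketch. The monitored path is a hypothetical placeholder.
[monitor:///opt/kypd/extracted]
# Add the full source path to the CRC, so files whose first bytes are
# identical but whose paths differ are treated as different files.
crcSalt = &amp;lt;SOURCE&amp;gt;
# Number of leading bytes used for the initial CRC (256 is the default).
initCrcLength = 256
&lt;/CODE&gt;&lt;/PRE&gt;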

&lt;P&gt;The other thing I came across (I'm not sure it is right, since I found it in the forums) is that Splunk caches the automatically assigned sourcetype of a file for 5 minutes and will not recalculate it before that time elapses.&lt;/P&gt;

&lt;P&gt;I spent hours modifying the &lt;CODE&gt;props.conf&lt;/CODE&gt; &lt;CODE&gt;[source::]&lt;/CODE&gt; stanzas, repeatedly deleting the indexed events (and also deleting and recreating the entire index), and re-indexing the same archive files, with no luck. I suspect this may be the reason.&lt;/P&gt;

&lt;P&gt;I'd appreciate it if someone could correct me if I'm wrong, or suggest a solution.&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 13 Dec 2018 12:28:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Indexing-Multi-File-Formats-inside-Zip-Files-Using-Upload/m-p/399933#M71209</guid>
      <dc:creator>averlie_lina</dc:creator>
      <dc:date>2018-12-13T12:28:55Z</dc:date>
    </item>
  </channel>
</rss>

