<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: issues with escaped quotes and index extractions with regex in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496035#M194622</link>
    <description>&lt;P&gt;Thanks to4kawa, but if you don't mind me asking, how would I use this? I see how well it works in the search window, but how would I set this up for ongoing use? For example, I want to create an app or source type that does this each time. Any hints, videos, or articles that would help me figure this out would be appreciated.&lt;/P&gt;</description>
    <pubDate>Sun, 26 Jan 2020 00:23:54 GMT</pubDate>
    <dc:creator>thadfield</dc:creator>
    <dc:date>2020-01-26T00:23:54Z</dc:date>
    <item>
      <title>issues with escaped quotes and index extractions with regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496033#M194620</link>
      <description>&lt;P&gt;OK, so I am trying to pull some fields from the following log file entry:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;"127.0.0.1",11/21/2019 8:19:49 PM,11/21/2019 8:19:49 PM,"\CS\Projects\Sample\Development Environment",10429,"Config","Info","7016943","local:{d597da58-6b69-4a9a-b494-0e97e49a43b8}","31C6E90FC53FAAE9B1273378DB1FF34D2338195D","0","0","SIGNING_AUDIT","745","{""Algorithm"":""SHA256"",""CommandLine"":""\""C:\\Program Files\\Microsoft Office\\Root\\Office16\\WINWORD.EXE\"" \/n \""C:\\Users\\tb\\Documents\\Evaluation Guide Supplement.docx"",""Executable"":""C:\\Program Files\\Microsoft Office\\Root\\Office16\\WINWORD.EXE"",""ExecutableHash"":""A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905"",""ExecutableSigner"":""CN=Microsoft Corporation, O=Microsoft Corporation, L=Redmond, S=Washington, C=US"",""ExecutableSize"":1951728,""Key"":""31C6E90FC53FAAE9B1273378DB1FF34D2338195D"",""Machine"":""07WKSWIN150536"",""PlaintextBase64"":""DslN3Fo9lTUEJZkwGdYQ1uua+9zkVsji9nZJD3M1qV4="",""PrefixedUniversal"":""local:{d597da58-6b69-4a9a-b494-0e97e49a43b8}"",""WindowsUser"":""ad\\tb""}","CS - Signing Successful","A signing request with key 31C6E90FC53FAAE9B1273378DB1FF34D2338195D from user tb@redacted.com was successfully completed. 
    Code Signing Audit record:
      Key: 31C6E90FC53FAAE9B1273378DB1FF34D2338195D
      Artifact: {0E, C9, 4D, DC, 5A, 3D, 95, 35, 04, 25, 99, 30, 19, D6, 10, D6, EB, 9A, FB, DC, E4, 56, C8, E2, F6, 76, 49, 0F, 73, 35, A9, 5E}
      Hashing Algorithm: SHA256
      Machine: 07WKSWIN150536
      Remote Account: tony.hadfield
      Authenticated User: tb@redacted.com  Command: ""C:\Program Files\Microsoft Office\Root\Office16\WINWORD.EXE"" /n ""C:\Users\tb\Documents\Evaluation Guide Supplement.docx
      Application Hash: A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905
    "
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The regex I am using in my transforms.conf works fine on regex101.com:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(?:\"\")(\w+)(?:\"\":)(\"\".*?(?&amp;lt;!\\)\"\")
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Here is my transforms.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[MyStringValues]
    REGEX = (?:\"\")(\w+)(?:\"\":)(?:\"\")(.*?)(?&amp;lt;!\\\\)(?:\"\")
    FORMAT = $1::$2
    REPEAT_MATCH = true
    WRITE_META = true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And my props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[myCustomType]
    KV_MODE = none
    NO_BINARY_CHECK = true
    SHOULD_LINEMERGE = true
    category = custom
    pulldown_type = true
    TRANSFORMS-MyCustomType = MyStringValues
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The issue I am having is that the matches only partially work. It is pulling out a bunch of stuff unrelated to my regex and destroying my regex results. Here is what gets pulled into the index:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Algorithm = SHA256C=US =  CommandLine = \Corporation, =   Corporation, =  Executable = C:\ProgramExecutableHash = A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905ExecutableSigner = CN=MicrosoftFiles\Microsoft =  Key = 31C6E90FC53FAAE9B1273378DB1FF34D2338195DL=Redmond, =  Machine = 07WKSWIN150536O=Microsoft =  Office\Root\Office16\WINWORD.EXE =  PlaintextBase64 = DslN3Fo9lTUEJZkwGdYQ1uua+9zkVsji9nZJD3M1qV4=PrefixedUniv
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Notice that it is pulling out a bunch of garbage "= " values. It seems completely confused by the escaped quotes within the file paths. Any idea what I am doing wrong?&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jan 2020 01:04:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496033#M194620</guid>
      <dc:creator>thadfield</dc:creator>
      <dc:date>2020-01-25T01:04:46Z</dc:date>
    </item>
    <item>
      <title>Re: issues with escaped quotes and index extractions with regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496034#M194621</link>
      <description>&lt;P&gt;UPDATE:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults 
| eval _raw="\"127.0.0.1\",11/21/2019 8:19:49 PM,11/21/2019 8:19:49 PM,\"\\CS\\Projects\\Sample\\Development Environment\",10429,\"Config\",\"Info\",\"7016943\",\"local:{d597da58-6b69-4a9a-b494-0e97e49a43b8}\",\"31C6E90FC53FAAE9B1273378DB1FF34D2338195D\",\"0\",\"0\",\"SIGNING_AUDIT\",\"745\",\"{\"\"Algorithm\"\":\"\"SHA256\"\",\"\"CommandLine\"\":\"\"\\\"\"C:\\\\Program Files\\\\Microsoft Office\\\\Root\\\\Office16\\\\WINWORD.EXE\\\"\" \\/n \\\"\"C:\\\\Users\\\\tb\\\\Documents\\\\Evaluation Guide Supplement.docx\"\",\"\"Executable\"\":\"\"C:\\\\Program Files\\\\Microsoft Office\\\\Root\\\\Office16\\\\WINWORD.EXE\"\",\"\"ExecutableHash\"\":\"\"A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905\"\",\"\"ExecutableSigner\"\":\"\"CN=Microsoft Corporation, O=Microsoft Corporation, L=Redmond, S=Washington, C=US\"\",\"\"ExecutableSize\"\":1951728,\"\"Key\"\":\"\"31C6E90FC53FAAE9B1273378DB1FF34D2338195D\"\",\"\"Machine\"\":\"\"07WKSWIN150536\"\",\"\"PlaintextBase64\"\":\"\"DslN3Fo9lTUEJZkwGdYQ1uua+9zkVsji9nZJD3M1qV4=\"\",\"\"PrefixedUniversal\"\":\"\"local:{d597da58-6b69-4a9a-b494-0e97e49a43b8}\"\",\"\"WindowsUser\"\":\"\"ad\\\\tb\"\"}\",\"CS - Signing Successful\",\"A signing request with key 31C6E90FC53FAAE9B1273378DB1FF34D2338195D from user tb@redacted.com was successfully completed. 
     Code Signing Audit record:
       Key: 31C6E90FC53FAAE9B1273378DB1FF34D2338195D
       Artifact: {0E, C9, 4D, DC, 5A, 3D, 95, 35, 04, 25, 99, 30, 19, D6, 10, D6, EB, 9A, FB, DC, E4, 56, C8, E2, F6, 76, 49, 0F, 73, 35, A9, 5E}
       Hashing Algorithm: SHA256
       Machine: 07WKSWIN150536
       Remote Account: tony.hadfield
       Authenticated User: tb@redacted.com
       Command: \"\"C:\\Program Files\\Microsoft Office\\Root\\Office16\\WINWORD.EXE\"\" /n \"\"C:\\Users\\tb\\Documents\\Evaluation Guide Supplement.docx
       Application Hash: A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905
     \"" 
| rex "(?s)(?&amp;lt;json&amp;gt;\"{\".+?\"}\"),(?&amp;lt;message&amp;gt;.+)" 
| eval json=trim(replace(json,"\"\"","\""),"\"") 
| spath input=json 
| rex "^(?&amp;lt;clientip&amp;gt;[^,]+),(?&amp;lt;ctime&amp;gt;[^,]+),(?&amp;lt;atime&amp;gt;[^,]+),(?&amp;lt;project&amp;gt;[^,]+)"
| appendpipe 
    [eval message=split(message,"
    ")
    | mvexpand message
    | rex max_match=20 field=message "(?im)\s+(?&amp;lt;fieldname&amp;gt;[A-Z].+): (?&amp;lt;unit&amp;gt;.+$)"
    | eval {fieldname}=unit
    | stats values(*) as *
    | fields - fieldname unit]
| selfjoin Machine
| fields - _raw _time json message
&lt;/CODE&gt;&lt;/PRE&gt;
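
&lt;P&gt;For readers who want to see the idea behind the replace/spath steps outside Splunk, here is a minimal Python sketch. It is not part of the search above and it is not Splunk code. It assumes the event is one CSV record whose JSON payload has its inner quotes doubled. The sample line is shortened and hypothetical, and the helper name extract_json_fields is made up for illustration.&lt;/P&gt;

```python
import csv
import io
import json

# Shortened, hypothetical sample in the same shape as the log entry:
# CSV fields, with the JSON payload quoted and its inner quotes doubled.
SAMPLE = r'''"127.0.0.1","SIGNING_AUDIT","{""Algorithm"":""SHA256"",""CommandLine"":""\""C:\\Program Files\\WINWORD.EXE\"" \/n"",""ExecutableSize"":1951728,""WindowsUser"":""ad\\tb""}","CS - Signing Successful"'''


def extract_json_fields(line):
    """Return the key/value pairs of the first CSV field that holds a JSON object."""
    # csv.reader collapses each doubled quote ("") back into a single quote.
    row = next(csv.reader(io.StringIO(line)))
    for field in row:
        if field.startswith("{") and field.endswith("}"):
            # json.loads resolves the backslash escapes inside the values.
            return json.loads(field)
    return {}


fields = extract_json_fields(SAMPLE)
for name, value in fields.items():
    print(name, "=", value)
```

&lt;P&gt;The point of the sketch is the order of operations: undo the CSV quote doubling first, then let a JSON parser deal with the backslash escapes, rather than trying to handle both in a single regex.&lt;/P&gt;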

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; [MyStringValues]
 REGEX = (?:\"\")(\w+)(?:\"\":)(\d+|((?:\"\")(.+?)(?:\"\")))(?:,|})
 FORMAT = $1::$4
 REPEAT_MATCH = true
 WRITE_META = true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;&lt;A href="https://regex101.com/r/P613Br/1"&gt;https://regex101.com/r/P613Br/1&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;I tried many approaches, but eventually came to the conclusion that it was better to extract the fields in &lt;EM&gt;transforms.conf&lt;/EM&gt;.&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;spath&lt;/CODE&gt; is useful for extracting fields at search time.&lt;BR /&gt;
So, instead of doing it in &lt;CODE&gt;transforms.conf&lt;/CODE&gt;,&lt;BR /&gt;
you can also run my query on a schedule and write the results to a summary index with &lt;CODE&gt;collect&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/8.0.1/SearchReference/Collect"&gt;collect&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jan 2020 22:21:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496034#M194621</guid>
      <dc:creator>to4kawa</dc:creator>
      <dc:date>2020-01-25T22:21:47Z</dc:date>
    </item>
    <item>
      <title>Re: issues with escaped quotes and index extractions with regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496035#M194622</link>
      <description>&lt;P&gt;Thanks to4kawa, but if you don't mind me asking, how would I use this? I see how well it works in the search window, but how would I set this up for ongoing use? For example, I want to create an app or source type that does this each time. Any hints, videos, or articles that would help me figure this out would be appreciated.&lt;/P&gt;</description>
      <pubDate>Sun, 26 Jan 2020 00:23:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496035#M194622</guid>
      <dc:creator>thadfield</dc:creator>
      <dc:date>2020-01-26T00:23:54Z</dc:date>
    </item>
    <item>
      <title>Re: issues with escaped quotes and index extractions with regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496036#M194623</link>
      <description>&lt;P&gt;Thanks to4kawa, this looks fantastic and is exactly the type of output I was hoping to see. How would you take this same approach at ingestion or index time? Any pointers to a video or tutorial would be appreciated; I am pretty new at this... &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 26 Jan 2020 00:35:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496036#M194623</guid>
      <dc:creator>thadfield</dc:creator>
      <dc:date>2020-01-26T00:35:57Z</dc:date>
    </item>
    <item>
      <title>Re: issues with escaped quotes and index extractions with regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496037#M194624</link>
      <description>&lt;P&gt;@thadfield&lt;BR /&gt;
I have amended my answer. Please take a look.&lt;/P&gt;</description>
      <pubDate>Sun, 26 Jan 2020 03:17:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496037#M194624</guid>
      <dc:creator>to4kawa</dc:creator>
      <dc:date>2020-01-26T03:17:29Z</dc:date>
    </item>
    <item>
      <title>Re: issues with escaped quotes and index extractions with regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496038#M194625</link>
      <description>&lt;P&gt;Have you read the &lt;CODE&gt;collect&lt;/CODE&gt; docs?&lt;BR /&gt;
Please output the results to a summary index using &lt;EM&gt;Reports&lt;/EM&gt;.&lt;BR /&gt;
Your dashboard can then search &lt;CODE&gt;index=your_summary_index&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;cf.&lt;BR /&gt;
 &lt;A href="https://www.youtube.com/watch?v=joZ3jokt9qs"&gt;Splunk Knowledge Object: Detail discussion on Summary Index@youtube&lt;/A&gt;&lt;BR /&gt;
 &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.0.1/Knowledge/Usesummaryindexing"&gt;Use summary indexing@Splunk&amp;gt;docs&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jan 2020 03:17:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496038#M194625</guid>
      <dc:creator>to4kawa</dc:creator>
      <dc:date>2020-01-27T03:17:25Z</dc:date>
    </item>
    <item>
      <title>Re: issues with escaped quotes and index extractions with regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496039#M194626</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Here is my regex approach:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(?:\"\")(\w+)(?:\"\":)(\"\"[\w\W]+?\"\")(?:,|})
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Note: It will not capture values that are not quoted (e.g. ExecutableSize"":1951728). For those values I would write a separate extraction.&lt;/P&gt;
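
&lt;P&gt;As a rough illustration of covering both cases in one pass without lookbehind, here is a small Python sketch. It uses Python's re module, not Splunk, and has not been tested in transforms.conf. The payload is a shortened, hypothetical fragment, and the names PAIR and extract_pairs are made up for this example. The pattern treats a backslash followed by a doubled quote as an escaped quote inside the value, and falls back to a bare number when the value is not quoted.&lt;/P&gt;

```python
import re

# Shortened, hypothetical fragment of the doubled-quote JSON payload.
PAYLOAD = r'''{""Algorithm"":""SHA256"",""CommandLine"":""\""C:\\Program Files\\WINWORD.EXE\"" \/n"",""ExecutableSize"":1951728,""WindowsUser"":""ad\\tb""}'''

# Group 1: key. Group 2: quoted value. Group 3: bare number.
# Inside a quoted value, a backslash followed by "" is an escaped quote,
# any other backslash pair is an escape, and everything else is literal.
PAIR = re.compile(r'""(\w+)"":(?:""((?:\\""|\\.|[^"\\])*)""|(\d+))')


def extract_pairs(text):
    """Map each key to its raw quoted value or, failing that, its number."""
    pairs = {}
    for key, quoted, number in PAIR.findall(text):
        pairs[key] = quoted if quoted else number
    return pairs


pairs = extract_pairs(PAYLOAD)
for key, value in pairs.items():
    print(key, "=", value)
```

&lt;P&gt;The values come back still escaped (doubled quotes and backslashes intact), which is also what a regex-only extraction would store.&lt;/P&gt;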

&lt;P&gt;I have had bad experiences with lookahead/lookbehind in Splunk regex before.&lt;/P&gt;

&lt;P&gt;BR,&lt;BR /&gt;
Marko P.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jan 2020 08:55:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/issues-with-escaped-quotes-and-index-extrations-with-regex/m-p/496039#M194626</guid>
      <dc:creator>plaftaric</dc:creator>
      <dc:date>2020-01-27T08:55:57Z</dc:date>
    </item>
  </channel>
</rss>

