
issues with escaped quotes and index extractions with regex

New Member

ok, so I am trying to pull some fields from the following log file entry:

"127.0.0.1",11/21/2019 8:19:49 PM,11/21/2019 8:19:49 PM,"\CS\Projects\Sample\Development Environment",10429,"Config","Info","7016943","local:{d597da58-6b69-4a9a-b494-0e97e49a43b8}","31C6E90FC53FAAE9B1273378DB1FF34D2338195D","0","0","SIGNING_AUDIT","745","{""Algorithm"":""SHA256"",""CommandLine"":""\""C:\\Program Files\\Microsoft Office\\Root\\Office16\\WINWORD.EXE\"" \/n \""C:\\Users\\tb\\Documents\\Evaluation Guide Supplement.docx"",""Executable"":""C:\\Program Files\\Microsoft Office\\Root\\Office16\\WINWORD.EXE"",""ExecutableHash"":""A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905"",""ExecutableSigner"":""CN=Microsoft Corporation, O=Microsoft Corporation, L=Redmond, S=Washington, C=US"",""ExecutableSize"":1951728,""Key"":""31C6E90FC53FAAE9B1273378DB1FF34D2338195D"",""Machine"":""07WKSWIN150536"",""PlaintextBase64"":""DslN3Fo9lTUEJZkwGdYQ1uua+9zkVsji9nZJD3M1qV4="",""PrefixedUniversal"":""local:{d597da58-6b69-4a9a-b494-0e97e49a43b8}"",""WindowsUser"":""ad\\tb""}","CS - Signing Successful","A signing request with key 31C6E90FC53FAAE9B1273378DB1FF34D2338195D from user tb@redacted.com was successfully completed. 
    Code Signing Audit record:
      Key: 31C6E90FC53FAAE9B1273378DB1FF34D2338195D
      Artifact: {0E, C9, 4D, DC, 5A, 3D, 95, 35, 04, 25, 99, 30, 19, D6, 10, D6, EB, 9A, FB, DC, E4, 56, C8, E2, F6, 76, 49, 0F, 73, 35, A9, 5E}
      Hashing Algorithm: SHA256
      Machine: 07WKSWIN150536
      Remote Account: tony.hadfield
      Authenticated User: tb@redacted.com  Command: ""C:\Program Files\Microsoft Office\Root\Office16\WINWORD.EXE"" /n ""C:\Users\tb\Documents\Evaluation Guide Supplement.docx
      Application Hash: A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905
    "

The regex I am using in my transforms.conf works fine on regex101.com:

(?:\"\")(\w+)(?:\"\":)(\"\".*?(?<!\\)\"\")

Here is my transforms.conf:

[MyStringValues]
    REGEX = (?:\"\")(\w+)(?:\"\":)(?:\"\")(.*?)(?<!\\\\)(?:\"\")
    FORMAT = $1::$2
    REPEAT_MATCH = true
    WRITE_META = true

And my props.conf:

[myCustomType]
    KV_MODE = none
    NO_BINARY_CHECK = true
    SHOULD_LINEMERGE = true
    category = custom
    pulldown_type = true
    TRANSFORMS-MyCustomType = MyStringValues

The issue I am having is that the matches are only partially working. It is pulling out a bunch of stuff not related to my regex and destroying my regex results. Here is what is pulled out into the index:

Algorithm = SHA256C=US =  CommandLine = \Corporation, =   Corporation, =  Executable = C:\ProgramExecutableHash = A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905ExecutableSigner = CN=MicrosoftFiles\Microsoft =  Key = 31C6E90FC53FAAE9B1273378DB1FF34D2338195DL=Redmond, =  Machine = 07WKSWIN150536O=Microsoft =  Office\Root\Office16\WINWORD.EXE =  PlaintextBase64 = DslN3Fo9lTUEJZkwGdYQ1uua+9zkVsji9nZJD3M1qV4=PrefixedUniv

Notice it's pulling a bunch of "= " garbage values. It's completely confused by the escaped quotes within the file paths. Any ideas about what I am doing wrong?

Explorer

Hi,

Here is my regex approach:

(?:\"\")(\w+)(?:\"\":)(\"\"[\w\W]+?\"\")(?:,|})

Note: It will not capture values that are not quoted (e.g. ExecutableSize"":1951728). For those values I would write a separate extraction.
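To see what this pattern does and does not pick up, here is a minimal Python sketch. It runs outside Splunk, so it only checks the regex itself, not how transforms.conf applies it. The `payload` string is a shortened, hypothetical stand-in for the JSON column of the event above, still in its raw form with every quote doubled.

```python
import re

# Shortened, hypothetical stand-in for the JSON column of the event,
# kept in raw CSV form (every quote doubled, backslashes as logged).
payload = (
    r'{""Algorithm"":""SHA256"",'
    r'""CommandLine"":""\""C:\\Program Files\\WINWORD.EXE\"" \/n \""C:\\Users\\tb\\Doc.docx"",'
    r'""ExecutableSigner"":""CN=Microsoft Corporation, O=Microsoft Corporation"",'
    r'""ExecutableSize"":1951728,'
    r'""WindowsUser"":""ad\\tb""}'
)

# The pattern from the post, copied as-is. A value only ends at a doubled
# quote that is followed by "," or "}", so no lookbehind is needed.
pattern = re.compile(r'(?:\"\")(\w+)(?:\"\":)(\"\"[\w\W]+?\"\")(?:,|})')

pairs = {m.group(1): m.group(2) for m in pattern.finditer(payload)}

for key, value in pairs.items():
    print(key, "=>", value)

# The unquoted number is skipped, as the note above says.
print("ExecutableSize" in pairs)
```

The captured values still carry their surrounding doubled quotes, because the second group includes them.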

I have had bad experiences with lookahead/lookbehind in Splunk regexes before.

BR,
Marko P.

New Member

Thanks to4kawa, but if you don't mind me asking, how would I use this? I see how well it works in the search window, but how would I set this up for ongoing use? For example, I want to create an app or source type that does this each time. How would this be used? Any hints or videos/articles to get this figured out would be appreciated.

Ultra Champion

Have you read the collect docs?
Please output the results to a summary index using Reports.
Your dashboard can then search index=your_summary_index.

cf.
Splunk Knowledge Object: Detail discussion on Summary Index (YouTube)
Use summary indexing (Splunk docs)

Ultra Champion

UPDATE:

| makeresults 
| eval _raw="\"127.0.0.1\",11/21/2019 8:19:49 PM,11/21/2019 8:19:49 PM,\"\\CS\\Projects\\Sample\\Development Environment\",10429,\"Config\",\"Info\",\"7016943\",\"local:{d597da58-6b69-4a9a-b494-0e97e49a43b8}\",\"31C6E90FC53FAAE9B1273378DB1FF34D2338195D\",\"0\",\"0\",\"SIGNING_AUDIT\",\"745\",\"{\"\"Algorithm\"\":\"\"SHA256\"\",\"\"CommandLine\"\":\"\"\\\"\"C:\\\\Program Files\\\\Microsoft Office\\\\Root\\\\Office16\\\\WINWORD.EXE\\\"\" \\/n \\\"\"C:\\\\Users\\\\tb\\\\Documents\\\\Evaluation Guide Supplement.docx\"\",\"\"Executable\"\":\"\"C:\\\\Program Files\\\\Microsoft Office\\\\Root\\\\Office16\\\\WINWORD.EXE\"\",\"\"ExecutableHash\"\":\"\"A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905\"\",\"\"ExecutableSigner\"\":\"\"CN=Microsoft Corporation, O=Microsoft Corporation, L=Redmond, S=Washington, C=US\"\",\"\"ExecutableSize\"\":1951728,\"\"Key\"\":\"\"31C6E90FC53FAAE9B1273378DB1FF34D2338195D\"\",\"\"Machine\"\":\"\"07WKSWIN150536\"\",\"\"PlaintextBase64\"\":\"\"DslN3Fo9lTUEJZkwGdYQ1uua+9zkVsji9nZJD3M1qV4=\"\",\"\"PrefixedUniversal\"\":\"\"local:{d597da58-6b69-4a9a-b494-0e97e49a43b8}\"\",\"\"WindowsUser\"\":\"\"ad\\\\tb\"\"}\",\"CS - Signing Successful\",\"A signing request with key 31C6E90FC53FAAE9B1273378DB1FF34D2338195D from user tb@redacted.com was successfully completed. 
     Code Signing Audit record:
       Key: 31C6E90FC53FAAE9B1273378DB1FF34D2338195D
       Artifact: {0E, C9, 4D, DC, 5A, 3D, 95, 35, 04, 25, 99, 30, 19, D6, 10, D6, EB, 9A, FB, DC, E4, 56, C8, E2, F6, 76, 49, 0F, 73, 35, A9, 5E}
       Hashing Algorithm: SHA256
       Machine: 07WKSWIN150536
       Remote Account: tony.hadfield
       Authenticated User: tb@redacted.com
       Command: \"\"C:\\Program Files\\Microsoft Office\\Root\\Office16\\WINWORD.EXE\"\" /n \"\"C:\\Users\\tb\\Documents\\Evaluation Guide Supplement.docx
       Application Hash: A5EE905C1E7372904AF2BFD2695337B1214440D0DB89033D26BD070360838905
     \"" 
| rex "(?s)(?<json>\"{\".+?\"}\"),(?<message>.+)" 
| eval json=trim(replace(json,"\"\"","\""),"\"") 
| spath input=json 
| rex "^(?<clientip>[^,]+),(?<ctime>[^,]+),(?<atime>[^,]+),(?<project>[^,]+)"
| appendpipe 
    [eval message=split(message,"
    ")
    | mvexpand message
    | rex max_match=20 field=message "(?im)\s+(?<fieldname>[A-Z].+): (?<unit>.+$)"
    | eval {fieldname}=unit
    | stats values(*) as *
    | fields - fieldname unit]
| selfjoin Machine
| fields - _raw _time json message
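
The idea behind the search above (isolate the JSON column, turn each doubled quote back into a single quote, parse the result as JSON, then split the trailing message into "Name: value" pairs) can be checked outside Splunk. This is a minimal Python sketch, not a Splunk configuration. The `event` string is a shortened, hypothetical version of the log entry, and the column positions `row[3]` and `row[5]` only apply to that shortened sample.

```python
import csv
import io
import json
import re

# Shortened, hypothetical version of the event: fewer columns and keys,
# same quoting style (CSV with doubled quotes, multi-line last column).
event = r'''"127.0.0.1",11/21/2019 8:19:49 PM,"Config","{""Algorithm"":""SHA256"",""CommandLine"":""\""C:\\Program Files\\WINWORD.EXE\"" \/n \""C:\\Users\\tb\\Doc.docx"",""ExecutableSize"":1951728,""WindowsUser"":""ad\\tb""}","CS - Signing Successful","A signing request was completed.
    Code Signing Audit record:
      Machine: 07WKSWIN150536
    "'''

# The csv module undoes the quote doubling and keeps the embedded newlines
# of the last column, which plays the role of the replace() step in the search.
row = next(csv.reader(io.StringIO(event, newline="")))
json_column, message = row[3], row[5]

# Once the quotes are single again, the column is ordinary JSON (the spath step).
fields = json.loads(json_column)

# Indented "Name: value" lines of the message, like the rex inside appendpipe.
audit = dict(re.findall(r"^[ \t]+([A-Z][^:\n]*): (.+)$", message, flags=re.MULTILINE))

print(fields["CommandLine"])
print(fields["ExecutableSize"])
print(audit)
```

Parsing the column as JSON also handles the unquoted number and the escaped quotes inside the file paths, which is why this route avoids the regex problems discussed earlier in the thread.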

transforms.conf

 [MyStringValues]
 REGEX = (?:\"\")(\w+)(?:\"\":)(\d+|((?:\"\")(.+?)(?:\"\")))(?:,|})
 FORMAT = $1::$4
 REPEAT_MATCH = true
 WRITE_META = true

https://regex101.com/r/P613Br/1
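
A quick way to see which capture group holds which part of a match is to run the same pattern in Python. This is only a sketch of the regex behaviour under Python's `re` module; how Splunk fills `$4` in FORMAT when that group did not take part in the match is not verified here. The `payload` string is a shortened, hypothetical stand-in for the JSON column of the event.

```python
import re

# Shortened, hypothetical stand-in for the JSON column (quotes doubled).
payload = (
    r'{""Algorithm"":""SHA256"",'
    r'""CommandLine"":""\""C:\\Program Files\\WINWORD.EXE\"" \/n \""C:\\Users\\tb\\Doc.docx"",'
    r'""ExecutableSize"":1951728,'
    r'""WindowsUser"":""ad\\tb""}'
)

# The REGEX value from the stanza above, copied as-is.
pattern = re.compile(
    r'(?:\"\")(\w+)(?:\"\":)(\d+|((?:\"\")(.+?)(?:\"\")))(?:,|})'
)

group4 = {}     # what $4 would refer to: the text between the doubled quotes
extracted = {}  # group 4 when it matched, otherwise group 2 (the bare number)

for m in pattern.finditer(payload):
    key = m.group(1)
    group4[key] = m.group(4)
    extracted[key] = m.group(4) if m.group(4) is not None else m.group(2)

for key in extracted:
    print(key, "| group 4:", group4[key], "| fallback:", extracted[key])
```

In this sketch group 4 is unset for the numeric `ExecutableSize`, whose digits land in group 2 instead, so it is worth checking in Splunk whether that field comes through with `FORMAT = $1::$4`.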

I tried a lot, but eventually came to the conclusion that it was better to extract the fields in transforms.conf.

spath is useful for extracting fields at search time, so instead of doing it in transforms.conf, you can also run my query and write the results to a summary index with collect.

New Member

Thanks to4kawa, this looks fantastic and is exactly the type of output I was hoping to see. How would you take this same approach for doing this at time of ingestion or index? Any pointers to either video or tutorial, I am pretty new at this... 🙂

Ultra Champion

@thadfield
I amended my answer, please confirm.
