Help with SEDCMD

dfurtaw
Path Finder

Hi guys,

I have done my due diligence scouring internet forums and docs, but I can't figure out how to tailor the sed command I'm using in this query so that it removes everything after the host entry. This log comes from a mainframe system and is broken up in a funky manner. I want to keep the {host 88.10.30.7} entry but remove everything after it. Can someone help me with the sed command and explain how to apply this "wildcard" to everything in the log, not just the line in question?

Log:

Feb 23 11:45:51.469604 host-qa log {start 1614098751.47494} {addr xx,xx.xx.194} {port 85488} {method GET} {url /statictest.html} {bytes 399} {status 200} {end 1614098751.461997} {host 88.10.30.7} Feb 23 11:45:51.469604 nstop-qa CommonLog: Feb 23 11:45:51.469604 nstop-qa 10.13.80.7 - - [23/Feb/2021:11:45:51 -0500] "GET /statictest.html HTTP/1.0" 200 399 Feb 23 11:45:51.470248 host-qa ' Feb 23 11:45:51.470248 host-qa 

Query:

basesearch

| rex mode=sed field=_raw "s/{host.*//g"

Results after I run this query:

Feb 23 11:45:51.469604 host-qa log {start 1614098751.47494} {addr xx,xx.xx.194} {port 85488} {method GET} {url /statictest.html} {bytes 399} {status 200} {end 1614098751.461997} Feb 23 11:45:51.469604 nstop-qa CommonLog: Feb 23 11:45:51.469604 nstop-qa 10.13.80.7 - - [23/Feb/2021:11:45:51 -0500] "GET /statictest.html HTTP/1.0" 200 399 Feb 23 11:45:51.470248 host-qa ' Feb 23 11:45:51.470248 host-qa 

tscroggins
Influencer

To keep everything up to and including {host n.n.n.n}, try:

| rex mode=sed field=_raw "s/(.*{host \d+\.\d+\.\d+\.\d+}).*/\1/g"
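To see what this substitution does to the sample event from the question, here is a minimal sketch in Python. It assumes that Python's re module is a fair stand-in for the PCRE engine behind Splunk's rex command for these constructs, and that the event is a single line. The \1 in the replacement refers to group 1, just as in the sed expression.

```python
import re

# Sample event from the question (assumed to be a single line).
raw = (
    "Feb 23 11:45:51.469604 host-qa log {start 1614098751.47494} "
    "{addr xx,xx.xx.194} {port 85488} {method GET} {url /statictest.html} "
    "{bytes 399} {status 200} {end 1614098751.461997} {host 88.10.30.7} "
    "Feb 23 11:45:51.469604 nstop-qa CommonLog: Feb 23 11:45:51.469604 "
    "nstop-qa 10.13.80.7 - - [23/Feb/2021:11:45:51 -0500] "
    "\"GET /statictest.html HTTP/1.0\" 200 399 "
    "Feb 23 11:45:51.470248 host-qa ' Feb 23 11:45:51.470248 host-qa"
)

# Equivalent of s/(.*{host \d+\.\d+\.\d+\.\d+}).*/\1/g:
# group 1 captures everything up to and including "{host n.n.n.n}",
# and the trailing .* consumes the rest of the event, which is discarded.
pattern = r"(.*{host \d+\.\d+\.\d+\.\d+}).*"
result = re.sub(pattern, r"\1", raw)
print(result)
```

Because the first .* is greedy, group 1 extends to the last {host n.n.n.n} on the line; with only one host entry in the event, that makes no difference.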

effem2
Path Finder

Hi dfurtaw,

To do what you asked, you could use the following (the makeresults and eval lines are not needed in your own search; they only generate a sample event):

| makeresults 
| eval _raw="Feb 23 11:45:51.469604 host-qa log {start 1614098751.47494} {addr xx,xx.xx.194} {port 85488} {method GET} {url /statictest.html} {bytes 399} {status 200} {end 1614098751.461997} {host 88.10.30.7} Feb 23 11:45:51.469604 nstop-qa CommonLog: Feb 23 11:45:51.469604 nstop-qa 10.13.80.7 - - [23/Feb/2021:11:45:51 -0500] \"GET /statictest.html HTTP/1.0\" 200 399 Feb 23 11:45:51.470248 host-qa ' Feb 23 11:45:51.470248 host-qa" 
| rex mode=sed field=_raw "s/.*({host\s(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)}).*/\1/g"

What are we doing here?

mode=sed
-> Run the rex command in stream editor (sed) mode.
-> Good sed documentation (GNU sed): https://www.gnu.org/software/sed/manual/sed.html
field=_raw -> Run the following sed expression on the _raw field of the event.
"s/.*({host\s(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)}).*/\1/g"
First, s is the sed substitution command, of the form s/regex/replacement/flags. Everything the regex matches is replaced with the second part of the expression (the replacement).

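In your case we match everything before and after the host entry in order to discard it.
See: the .* at the start and at the end of the expression.
Next up: ({host\s(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)})
This is the capture group "()". It matches the curly braces together with "host", a whitespace character (\s) and a valid IPv4 address:
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} -> the first three octets, each followed by a dot, like "169."
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) -> the last octet of the IP address
Each octet alternative matches 250 to 255, 200 to 249, or 0 to 199 (leading zeros allowed), so a value such as 256 is rejected.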
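A quick way to convince yourself that the octet pattern only accepts 0 to 255 is to test it on its own. This Python sketch (Python's re standing in for Splunk's regex engine, which is an assumption) builds the same IPv4 pattern and checks a few candidate strings; is_ipv4 is a hypothetical helper name. It uses fullmatch because inside the rex expression the address is likewise bounded by {host\s on the left and } on the right.

```python
import re

# The octet sub-pattern from the answer: 250-255, 200-249, or 0-199.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Three octets, each followed by a dot, then a final octet.
IPV4 = r"(?:" + OCTET + r"\.){3}" + OCTET


def is_ipv4(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches the IPv4 pattern."""
    return re.fullmatch(IPV4, text) is not None


for candidate in ["88.10.30.7", "169.254.0.1", "255.255.255.255", "256.1.1.1", "1.2.3"]:
    print(candidate, is_ipv4(candidate))
```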

What is (?: ?
A question mark followed by a colon at the start of a group makes it non-capturing: it still groups the alternatives, but the match is not stored or numbered. This is important so that the replacement refers to the right group.
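To illustrate the numbering, this Python sketch (Python's re as a stand-in for Splunk's PCRE-based engine, an assumption; the octets are simplified to \d+ for brevity) compares a pattern whose inner group is non-capturing with the same pattern written with plain parentheses. Only plain parentheses get a number, and a repeated capturing group keeps just its last repetition.

```python
import re

text = "{host 88.10.30.7}"

# Inner group is non-capturing: only the outer (...) is numbered (group 1).
non_capturing = r"({host\s(?:\d+\.){3}\d+})"
# Same pattern with a plain inner group: it becomes group 2 and,
# because it is repeated {3} times, it holds only the last repetition.
capturing = r"({host\s(\d+\.){3}\d+})"

m1 = re.search(non_capturing, text)
m2 = re.search(capturing, text)
print(m1.groups())  # ('{host 88.10.30.7}',)
print(m2.groups())  # ('{host 88.10.30.7}', '30.')
```

In both versions the outer group opens first, so it is group 1; the (?:...) form simply keeps the extra, unwanted group out of the result.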

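Each capturing group is implicitly numbered by the position of its opening parenthesis, counting from the left and starting at 1. We can then use that number in the replacement.
Third part of the expression:

/\1/g -> Replace the whole match with the text captured by group 1 (\1). The g flag makes the substitution global, so it applies to every match in the field, not only the first.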
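Putting the pieces together, this Python sketch applies the full expression from this answer to the sample event (Python's re standing in for Splunk's engine, and a single-line event, are both assumptions). re.sub replaces every match by default, which corresponds to the g flag; since the leading and trailing .* already cover the whole line, there is only one match here anyway.

```python
import re

# Sample event from the question (assumed to be a single line).
raw = (
    "Feb 23 11:45:51.469604 host-qa log {start 1614098751.47494} "
    "{addr xx,xx.xx.194} {port 85488} {method GET} {url /statictest.html} "
    "{bytes 399} {status 200} {end 1614098751.461997} {host 88.10.30.7} "
    "Feb 23 11:45:51.469604 nstop-qa CommonLog: Feb 23 11:45:51.469604 "
    "nstop-qa 10.13.80.7 - - [23/Feb/2021:11:45:51 -0500] "
    "\"GET /statictest.html HTTP/1.0\" 200 399 "
    "Feb 23 11:45:51.470248 host-qa ' Feb 23 11:45:51.470248 host-qa"
)

OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# s/.*({host\s...}).*/\1/g: the .* on both sides is discarded and
# group 1 keeps only the "{host n.n.n.n}" entry.
pattern = r".*({host\s(?:" + OCTET + r"\.){3}" + OCTET + r"}).*"
result = re.sub(pattern, r"\1", raw)
print(result)  # {host 88.10.30.7}
```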

Conclusion 1:
Match the whole event, put the part you want to keep into a capturing group, and replace everything with a reference to that group.
------------------------
If you only care about the IP address itself, you could instead extract it into the host field. That way _raw stays intact for other queries.
The regex is similar to the previous example; we just don't have to match the whole event:

| makeresults 
| eval _raw="Feb 23 11:45:51.469604 host-qa log {start 1614098751.47494} {addr xx,xx.xx.194} {port 85488} {method GET} {url /statictest.html} {bytes 399} {status 200} {end 1614098751.461997} {host 88.10.30.7} Feb 23 11:45:51.469604 nstop-qa CommonLog: Feb 23 11:45:51.469604 nstop-qa 10.13.80.7 - - [23/Feb/2021:11:45:51 -0500] \"GET /statictest.html HTTP/1.0\" 200 399 Feb 23 11:45:51.470248 host-qa ' Feb 23 11:45:51.470248 host-qa" 
| rex field=_raw "{host\s(?<host>(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))}"
| timechart count by host

Now we capture only the IP address, since "{host " sits outside the capturing group.
Additionally, (?<host>...) is a named capturing group: rex writes the captured value (the IP in this case) into a field called host, overwriting the existing host field.
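The extraction variant can be sketched the same way in Python. Note that Python's re writes a named group as (?P<name>...), whereas the rex command above uses (?<host>...); otherwise the pattern is the same. The sketch only reads the match and never modifies the raw string, which mirrors how rex without mode=sed leaves _raw intact.

```python
import re

# Sample event from the question (assumed to be a single line).
raw = (
    "Feb 23 11:45:51.469604 host-qa log {start 1614098751.47494} "
    "{addr xx,xx.xx.194} {port 85488} {method GET} {url /statictest.html} "
    "{bytes 399} {status 200} {end 1614098751.461997} {host 88.10.30.7} "
    "Feb 23 11:45:51.469604 nstop-qa CommonLog: Feb 23 11:45:51.469604 "
    "nstop-qa 10.13.80.7 - - [23/Feb/2021:11:45:51 -0500] "
    "\"GET /statictest.html HTTP/1.0\" 200 399 "
    "Feb 23 11:45:51.470248 host-qa ' Feb 23 11:45:51.470248 host-qa"
)

OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Python spells the named group (?P<host>...); the SPL query uses (?<host>...).
pattern = r"{host\s(?P<host>(?:" + OCTET + r"\.){3}" + OCTET + r")}"

match = re.search(pattern, raw)
host = match.group("host") if match else None
print(host)  # 88.10.30.7
```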

Conclusion 2:

Named capturing groups let you create new fields to work with further down the search pipeline.
For more help with regex, and for easier testing, I highly recommend regex101.com.

If you found this posting helpful please mark as solution and upvote. 

isoutamo
SplunkTrust

Hi,

can you try:

| rex mode=sed field=_raw "s/\{host.*//g"

or, if that doesn't help, try two backslashes (\\) before the {.
r. Ismo
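A minimal sketch, again using Python's re as a stand-in for Splunk's regex engine (an assumption), that checks two things: whether escaping the brace changes the match, and what happens if the raw event contains line breaks. The multi-line sample is hypothetical: it assumes the mainframe event is split into lines right after the host entry, which would be consistent with the result in the question, where only {host 88.10.30.7} disappeared. PCRE also supports the (?s) inline flag, but that it behaves identically inside rex mode=sed is an assumption worth testing on your own data.

```python
import re

# Hypothetical multi-line event: assumes a line break right after the
# host entry (not confirmed in the thread).
raw = (
    "{status 200} {end 1614098751.461997} {host 88.10.30.7}\n"
    "Feb 23 11:45:51.469604 nstop-qa CommonLog:\n"
    "Feb 23 11:45:51.470248 host-qa '\n"
)

# A "{" that cannot start a quantifier is treated as a literal,
# so the escaped and unescaped versions give the same result.
unescaped = re.sub(r"{host.*", "", raw)
escaped = re.sub(r"\{host.*", "", raw)

# "." does not match "\n" by default, so ".*" stops at the end of the
# line and the following lines survive.
print(repr(unescaped))

# The (?s) inline flag (DOTALL) lets "." match newlines as well.
dotall = re.sub(r"(?s)\{host.*", "", raw)
print(repr(dotall))  # '{status 200} {end 1614098751.461997} '
```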
