Splunk Search

dedup not giving any results.

khandelwaly
Explorer

I am not getting any results back when I use dedup.

search query:

index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2"  
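| rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\""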
| search NOT user_name=- |search NOT user_name=test|dedup session_id

search data:
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"
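
One way to see why nothing comes back is to try the pattern against the sample events outside Splunk. The sketch below is only an approximation under stated assumptions: it uses Python's re module (which needs (?P<name>...) instead of (?<name>...)), it takes the group names from the query as reposted further down the thread, and it substitutes the hypothetical addresses 10.0.0.1 and 10.0.0.2 for the masked x.x.x.x. It suggests that the three \d+ groups cannot match 0.0200 or -, so the pattern fails as a whole and session_id is never extracted.

```python
import re

# Python port of the rex pattern from the question. Group names come from the
# later repost of the query; Python's re requires the (?P<name>...) form.
ORIGINAL = re.compile(
    r'(?P<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<req_url>[^"]+)" '
    r'(?P<http_status_code>\d+) (?P<resp_size>\d+) (?P<req_time>\d+) '
    r'"(?P<referer>[^"]+)" "(?P<req_agent>[^"]+)" "(?P<session_id>[^"]+)"'
)

# Sample events from the question. The numeric IPs are hypothetical stand-ins
# for the masked x.x.x.x values.
LINE_WITH_NUMBERS = (
    '10.0.0.1 estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] '
    '"GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"'
)
LINE_WITH_DASHES = (
    '10.0.0.2 sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] '
    '"GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" '
    '"el1z6d"'
)


def extracted_session_id(line):
    """Return the session_id the pattern extracts, or None if it does not match."""
    match = ORIGINAL.search(line)
    return match.group("session_id") if match else None


def is_all_digits(token):
    """True if the whole token is digits, which is what the numeric groups require."""
    return re.fullmatch(r"\d+", token) is not None


if __name__ == "__main__":
    for line in (LINE_WITH_NUMBERS, LINE_WITH_DASHES):
        print(extracted_session_id(line))
    for token in ("200", "5925", "0.0200", "-"):
        print(token, is_all_digits(token))
```

If the same holds in Splunk, every event reaches dedup without a session_id field, and dedup drops such events by default, which would explain the empty result.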

nickhills
Ultra Champion

Try this regex instead:

| rex "(?<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>[^\s]+) (?<resp_size>[^\s]+) (?<req_time>[^\s]+) (?P<referer>[^\s]+) \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\""

Note: this will only work if your IP address is numeric; it won't match x.x.x.x.
https://regex101.com/r/8OGefw/1
In the regex101 test I used \w+ instead of \d+ for the IP.
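
The difference between the two IP patterns can be checked with a short Python sketch. It is illustrative only: the helper name matches_ip and the address 192.0.2.10 are made up for the example, and Python's re engine stands in for Splunk's PCRE.

```python
import re

# \d matches digits only; \w also matches letters and underscores.
NUMERIC_IP = re.compile(r"\d+\.\d+\.\d+\.\d+")
WORD_IP = re.compile(r"\w+\.\w+\.\w+\.\w+")


def matches_ip(pattern, text):
    """True if the compiled pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ("x.x.x.x", "192.0.2.10"):
        print(text, matches_ip(NUMERIC_IP, text), matches_ip(WORD_IP, text))
```

So the \w+ variant is only needed when testing against the masked sample; real events with numeric addresses should match the \d+ form.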

If my comment helps, please give it a thumbs up!

khandelwaly
Explorer

It did not work 😞

nickhills
Ultra Champion

What results do you see if you leave off the |dedup?
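
Why this check is useful: by default, dedup discards events in which the dedup field is missing (keepempty defaults to false), so if rex extracts nothing, dedup has nothing to keep. The following is a toy Python model of that behaviour, not Splunk code; the dedup function and the event dictionaries are invented for illustration.

```python
def dedup(events, field):
    """Keep the first event for each distinct value of `field`.

    Events that lack the field are dropped, modelling dedup's default
    keepempty=false behaviour.
    """
    seen = set()
    kept = []
    for event in events:
        value = event.get(field)
        if value is None:
            continue  # field missing: the event is discarded
        if value not in seen:
            seen.add(value)
            kept.append(event)
    return kept


# Events as they would look if rex failed to match (no extracted fields).
UNEXTRACTED = [{"_raw": "event 1"}, {"_raw": "event 2"}, {"_raw": "event 3"}]

# Events as they would look if rex had extracted session_id.
EXTRACTED = [
    {"_raw": "event 1", "session_id": "gbhaxl"},
    {"_raw": "event 2", "session_id": "el1z6d"},
    {"_raw": "event 3", "session_id": "gbhaxl"},
]

if __name__ == "__main__":
    print(len(dedup(UNEXTRACTED, "session_id")))
    print([e["session_id"] for e in dedup(EXTRACTED, "session_id")])
```

Seeing raw events without the |dedup but none with it therefore points at the field extraction rather than at dedup itself.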

khandelwaly
Explorer
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

nickhills
Ultra Champion

1.) Are you replacing the IP addresses with x.x.x.x, or do the actual results look like that?
2.) What results do you get if you add |table ipaddr req_id user_name session_id?

khandelwaly
Explorer

Yes, the real events contain actual IP addresses; I replaced them with x.x.x.x here.
I figured out that the referer value is coming back empty: "https://google.com/6492" is not picked up when I add the results to a table.

khandelwaly
Explorer

Can you help me get the correct regex?

nickhills
Ultra Champion

If there is no referrer, what does it leave in its place: a space, - or ""?

khandelwaly
Explorer

There will always be a referer.

nickhills
Ultra Champion

This works for me:
(?<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>[^\s]+) (?<resp_size>[^\s]+) (?<req_time>[^\s]+) (?P<referer>[^\s]+) \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"

https://regex101.com/r/8OGefw/2
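
For anyone who wants to check this pattern without Splunk or regex101, here is a minimal Python sketch. Assumptions: Python's re module needs (?P<name>...) throughout, so the (?<name>...) groups are rewritten; the IPs 10.0.0.1 and 10.0.0.2 are hypothetical replacements for the masked x.x.x.x; and the helpers extract and unique_sessions are invented for the example. With those caveats, the relaxed [^\s]+ groups accept both 0.0200 and -, and all three sample events yield a session_id.

```python
import re

# Python port of the relaxed pattern: status, size, time and referer are
# matched as "anything that is not whitespace".
RELAXED = re.compile(
    r'(?P<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<req_url>[^"]+)" '
    r'(?P<http_status_code>[^\s]+) (?P<resp_size>[^\s]+) (?P<req_time>[^\s]+) '
    r'(?P<referer>[^\s]+) "(?P<req_agent>[^"]+)" "(?P<session_id>[^"]+)"'
)

# The first and third sample events are identical; IPs are hypothetical.
CHROME_LINE = (
    '10.0.0.1 estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] '
    '"GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"'
)
FIREFOX_LINE = (
    '10.0.0.2 sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] '
    '"GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" '
    '"el1z6d"'
)
SAMPLE_LINES = [CHROME_LINE, FIREFOX_LINE, CHROME_LINE]


def extract(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = RELAXED.search(line)
    return match.groupdict() if match else None


def unique_sessions(lines):
    """Session IDs in first-seen order, skipping lines that do not match."""
    seen = []
    for line in lines:
        fields = extract(line)
        if fields and fields["session_id"] not in seen:
            seen.append(fields["session_id"])
    return seen


if __name__ == "__main__":
    print(unique_sessions(SAMPLE_LINES))
    print(extract(CHROME_LINE)["referer"])
```

One side effect worth knowing: because referer is captured with [^\s]+, its value keeps the surrounding double quotes. If that matters, they can be trimmed afterwards or the group can be written as \"(?P<referer>[^\"]+)\" again.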

khandelwaly
Explorer

Thanks, that worked.

nickhills
Ultra Champion

Great, I'll update my answer.

nickhills
Ultra Champion

Hi @khandelwaly, if this solved your issue, please accept the answer.

khandelwaly
Explorer

like this

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

nickhills
Ultra Champion

1.) Where is the session_id?
2.) Where is the user_name?
3.) What is that regex supposed to do?

khandelwaly
Explorer

splunk query
index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2" | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- |search NOT user_name=test_monitor |dedup session_id

richgalloway
SplunkTrust

We can't see the fields extracted by rex because formatting was not preserved by the system. Please update your question.

---
If this reply helps you, Karma would be appreciated.

khandelwaly
Explorer

splunk query

index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2"  | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- |search NOT user_name=test_monitor |dedup session_id

khandelwaly
Explorer
index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2"  | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- |search NOT user_name=test_monitor |dedup session_id