Splunk Search

dedup not giving any results.

khandelwaly
Explorer

I am not getting any results back when I use dedup.

search query:

index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2"
| rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\""
| search NOT user_name=- | search NOT user_name=test | dedup session_id

search data:
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"


nickhills
Ultra Champion

Try this regex instead:

| rex "(?<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>[^\s]+) (?<resp_size>[^\s]+) (?<req_time>[^\s]+) (?P<referer>[^\s]+) \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\""

Note: this will only work if your IP address is numeric; it won't work with x.x.x.x.
https://regex101.com/r/8OGefw/1
In the regex101 test I used \w+ instead of \d+ for the ip
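
To make the note about masked addresses concrete, here is a minimal Python sketch. Python's `re` module is only a stand-in for Splunk's rex here, the address 10.1.2.3 is a made-up example, and the helper name `extract_ip` is hypothetical. It suggests that the `\d+` form matches a real dotted address but not the x.x.x.x placeholder, while the `\w+` form used for the regex101 test matches the placeholder too.

```python
import re

# IP sub-pattern from the rex above, plus the \w+ variant used for the regex101 test.
# Python's re needs (?P<name>...) for named groups, so (?<ipaddr>...) is rewritten.
NUMERIC_IP = re.compile(r"^(?P<ipaddr>\d+\.\d+\.\d+\.\d+) ")
MASKED_IP = re.compile(r"^(?P<ipaddr>\w+\.\w+\.\w+\.\w+) ")

# Start of a sample event as posted (masked), and the same event with a
# hypothetical numeric address substituted for illustration.
masked_line = "x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800]"
real_line = "10.1.2.3 estsdfasf dsfads [06/Feb/2020:08:13:23 -0800]"


def extract_ip(pattern, line):
    """Return the captured ipaddr, or None when the pattern does not match."""
    match = pattern.search(line)
    return match.group("ipaddr") if match else None


print(extract_ip(NUMERIC_IP, masked_line))  # None
print(extract_ip(NUMERIC_IP, real_line))    # 10.1.2.3
print(extract_ip(MASKED_IP, masked_line))   # x.x.x.x
```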

If my comment helps, please give it a thumbs up!

khandelwaly
Explorer

It did not work 😞


nickhills
Ultra Champion

What results do you see if you leave off the | dedup?


khandelwaly
Explorer
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"
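
Since the events do come back without dedup, a likely explanation is that session_id is never extracted by the rex: with its default keepempty=false, dedup drops every event in which the named field is missing. The Python sketch below only mimics that behaviour to make the effect visible. The function `dedup` and the dict-shaped events are illustrative assumptions, not Splunk code.

```python
def dedup(events, field, keepempty=False):
    """Keep the first event for each distinct value of `field`.

    Events that lack `field` are dropped unless keepempty is True,
    which mirrors the default behaviour of SPL's dedup command.
    """
    seen = set()
    kept = []
    for event in events:
        value = event.get(field)
        if value is None:
            if keepempty:
                kept.append(event)
            continue
        if value not in seen:
            seen.add(value)
            kept.append(event)
    return kept


# If the rex does not match, no event carries a session_id field.
unextracted = [{"_raw": "event 1"}, {"_raw": "event 2"}, {"_raw": "event 3"}]
# If the rex matches, the three sample events carry these session ids.
extracted = [{"session_id": "gbhaxl"}, {"session_id": "el1z6d"}, {"session_id": "gbhaxl"}]

print(len(dedup(unextracted, "session_id")))  # 0
print(len(dedup(extracted, "session_id")))    # 2
```

In SPL, adding `| table session_id` before the dedup is one way to check whether the field is populated at all.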

nickhills
Ultra Champion

1.) Are you replacing the IP addresses with x.x.x.x, or do the actual results look like that?
2.) What results do you get if you add | table ipaddr req_id user_name session_id?


khandelwaly
Explorer

Yes, x.x.x.x stands in for the actual IP.
I figured out that the referer value is coming back empty; that is, "https://google.com/6492" is not being picked up when I add the results to a table.


khandelwaly
Explorer

Can you help me get the correct regex?


nickhills
Ultra Champion

If there is no referrer, what does it leave in its place: a space, -, or ""?


khandelwaly
Explorer

There will always be a referer.


nickhills
Ultra Champion

This works for me:
(?<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>[^\s]+) (?<resp_size>[^\s]+) (?<req_time>[^\s]+) (?P<referer>[^\s]+) \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"

https://regex101.com/r/8OGefw/2
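
As a rough check of why this version matches where the original did not, here is a Python sketch that runs both patterns against two of the sample events. It rests on some assumptions: Python's `re` stands in for rex, named groups are written as `(?P<name>...)`, the IP part uses `\w+` so that the masked x.x.x.x samples can match, the user agent is shortened to "Mozilla/5.0", and the helper `session_id` is hypothetical. The original pattern appears to fail because `\d+` cannot match a request time such as 0.0200 or a `-` placeholder, so no session_id is extracted.

```python
import re

# Pattern from the question: \d+ for the three numeric fields, quoted referer.
ORIGINAL = re.compile(
    r'(?P<ipaddr>\w+\.\w+\.\w+\.\w+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<req_url>[^"]+)" '
    r'(?P<http_status_code>\d+) (?P<resp_size>\d+) (?P<req_time>\d+) '
    r'"(?P<referer>[^"]+)" "(?P<req_agent>[^"]+)" "(?P<session_id>[^"]+)"'
)
# Pattern from this answer: [^\s]+ for the numeric fields and the referer.
REVISED = re.compile(
    r'(?P<ipaddr>\w+\.\w+\.\w+\.\w+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<req_url>[^"]+)" '
    r'(?P<http_status_code>[^\s]+) (?P<resp_size>[^\s]+) (?P<req_time>[^\s]+) '
    r'(?P<referer>[^\s]+) "(?P<req_agent>[^"]+)" "(?P<session_id>[^"]+)"'
)

# Two of the posted sample events, with the user agent shortened for readability.
line_decimal = (
    'x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] '
    '"GET https://google.com HTTP/1.1" 200 5925 0.0200 '
    '"https://google.com/6492" "Mozilla/5.0" "gbhaxl"'
)
line_dashes = (
    'x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] '
    '"GET https://tests.com/generate HTTP/1.1" - - - '
    '"https://tests.com/34490" "Mozilla/5.0" "el1z6d"'
)


def session_id(pattern, line):
    """Return the extracted session_id, or None when the pattern does not match."""
    match = pattern.search(line)
    return match.group("session_id") if match else None


for line in (line_decimal, line_dashes):
    print(session_id(ORIGINAL, line), session_id(REVISED, line))
```

Note that with `[^\s]+` the referer capture keeps its surrounding double quotes. If that is unwanted, keeping the quoted form `\"(?P<referer>[^\"]+)\"` for the referer while using `[^\s]+` for the three numeric fields should also match these samples.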


khandelwaly
Explorer

Thanks, that worked.


nickhills
Ultra Champion

Great, I'll update my answer.


nickhills
Ultra Champion

Hi @khandelwaly, if this solved your issue, please accept the answer.


khandelwaly
Explorer

Like this:

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"


nickhills
Ultra Champion

1.) Where is the session_id?
2.) Where is the user_name?
3.) What is that regex supposed to do?


khandelwaly
Explorer

Splunk query:
index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2" | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- | search NOT user_name=test_monitor | dedup session_id


richgalloway
SplunkTrust

We can't see the fields extracted by rex because formatting was not preserved by the system. Please update your question.

---
If this reply helps you, Karma would be appreciated.

khandelwaly
Explorer

Splunk query:

index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2" | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- | search NOT user_name=test_monitor | dedup session_id

khandelwaly
Explorer
index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2"  | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- |search NOT user_name=test_monitor |dedup session_id