Splunk Search

dedup not giving any results.

khandelwaly
Explorer

I am not getting any results back when I use dedup.

search query:

index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2"  
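| rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\""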
| search NOT user_name=- |search NOT user_name=test|dedup session_id

search data:
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

0 Karma

nickhills
Ultra Champion

Try this regex instead:

| rex "(?<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>[^\s]+) (?<resp_size>[^\s]+) (?<req_time>[^\s]+) (?P<referer>[^\s]+) \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\""

Note: it will only work if your IP address is numeric - it won't work with x.x.x.x
https://regex101.com/r/8OGefw/1
In the regex101 test I used \w+ instead of \d+ for the IP.
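
A quick way to see what changed between the two patterns is to run both against the sample lines outside Splunk. The sketch below uses Python's re module rather than Splunk's rex, so every group is written as (?P<name>...). The IP 10.0.0.1 is a made-up stand-in for the masked x.x.x.x, and the user-agent strings are shortened. Under those assumptions, the original \d+ fields cannot match a req_time of 0.0200 or the "-" placeholders, while the [^\s]+ version matches both lines.

```python
import re

# Sample lines adapted from the thread. Assumptions: 10.0.0.1 replaces the
# masked "x.x.x.x", and the user-agent strings are shortened for readability.
SAMPLES = [
    '10.0.0.1 estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] '
    '"GET https://google.com HTTP/1.1" 200 5925 0.0200 '
    '"https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" "gbhaxl"',
    '10.0.0.1 sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] '
    '"GET https://tests.com/generate HTTP/1.1" - - - '
    '"https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0)" "el1z6d"',
]

# Parts shared by both patterns (Python needs (?P<name>...) for named groups).
PREFIX = (r'(?P<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) '
          r'\[(?P<timestamp>[^\]]+)\] "(?P<req_url>[^"]+)" ')
SUFFIX = r' "(?P<req_agent>[^"]+)" "(?P<session_id>[^"]+)"'

# Original question: status, size and time must be plain digits.
STRICT = re.compile(
    PREFIX
    + r'(?P<http_status_code>\d+) (?P<resp_size>\d+) (?P<req_time>\d+) "(?P<referer>[^"]+)"'
    + SUFFIX
)

# Suggested answer: any run of non-whitespace is accepted for those fields.
RELAXED = re.compile(
    PREFIX
    + r'(?P<http_status_code>[^\s]+) (?P<resp_size>[^\s]+) (?P<req_time>[^\s]+) (?P<referer>[^\s]+)'
    + SUFFIX
)


def extract(pattern, line):
    """Return the named groups as a dict, or None when the line does not match."""
    match = pattern.search(line)
    return match.groupdict() if match else None


strict_hits = [extract(STRICT, line) for line in SAMPLES]
relaxed_hits = [extract(RELAXED, line) for line in SAMPLES]
session_ids = [hit["session_id"] for hit in relaxed_hits if hit]

print(strict_hits)
print(session_ids)
```

If the pattern matches no event, session_id is never extracted, which would be consistent with dedup session_id returning nothing.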

If my comment helps, please give it a thumbs up!
0 Karma

khandelwaly
Explorer

It did not work 😞

0 Karma

nickhills
Ultra Champion

What results do you see if you leave off the |dedup?
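
This check is informative because, by default, Splunk's dedup drops events in which the named field is missing (the keepempty option changes that), so a rex that extracts nothing leaves dedup with nothing to return. A minimal Python sketch of that behaviour follows. The function and the sample events are hypothetical stand-ins, not Splunk code.

```python
def dedup(events, field, keepempty=False):
    """Keep the first event for each distinct value of `field`.

    Events that lack the field are dropped unless keepempty is True,
    which mirrors the documented default of Splunk's dedup command.
    """
    seen = set()
    kept = []
    for event in events:
        value = event.get(field)
        if value is None:
            if keepempty:
                kept.append(event)
            continue
        if value not in seen:
            seen.add(value)
            kept.append(event)
    return kept


# Hypothetical events: the first list models a rex that matched nothing,
# the second models a rex that extracted session_id from every event.
no_extraction = [{"_raw": "event 1"}, {"_raw": "event 2"}, {"_raw": "event 3"}]
with_extraction = [
    {"_raw": "event 1", "session_id": "gbhaxl"},
    {"_raw": "event 2", "session_id": "el1z6d"},
    {"_raw": "event 3", "session_id": "gbhaxl"},
]

empty_result = dedup(no_extraction, "session_id")
unique_sessions = [e["session_id"] for e in dedup(with_extraction, "session_id")]
kept_with_flag = dedup(no_extraction, "session_id", keepempty=True)

print(empty_result)
print(unique_sessions)
print(len(kept_with_flag))
```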

0 Karma

khandelwaly
Explorer
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"
0 Karma

nickhills
Ultra Champion

1.) Are you replacing the IP addresses with x.x.x.x, or do the actual results look like that?
2.) What results do you get when you add |table ipaddr req_id user_name session_id?

0 Karma

khandelwaly
Explorer

Yes, the x.x.x.x values stand in for the actual IPs.
I figured out that the referer value is coming back empty: "https://google.com/6492" is not picked up when I add the results to a table.

0 Karma

khandelwaly
Explorer

Can you help me get the correct regex?

0 Karma

nickhills
Ultra Champion

If there is no referrer, what does it leave in its place: a space, -, or ""?

0 Karma

khandelwaly
Explorer

There will always be a referer.

0 Karma

nickhills
Ultra Champion

This works for me:
(?<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>[^\s]+) (?<resp_size>[^\s]+) (?<req_time>[^\s]+) (?P<referer>[^\s]+) \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"

https://regex101.com/r/8OGefw/2
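
One detail worth checking with this pattern: because referer is captured as [^\s]+, the extracted value includes the surrounding double quotes. The Python sketch below compares that with a quoted variant on the tail of the first sample line. The user agent is shortened, and the quoted variant is only an illustrative alternative, not the pattern used in the answer.

```python
import re

# Tail of the first sample line from the thread (user agent shortened).
TAIL = '0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" "gbhaxl"'

# As in the answer: referer is any run of non-whitespace, quotes included.
UNQUOTED = re.compile(
    r'(?P<req_time>[^\s]+) (?P<referer>[^\s]+) "(?P<req_agent>[^"]+)" "(?P<session_id>[^"]+)"'
)

# Illustrative variant: the quotes sit outside the capture group.
QUOTED = re.compile(
    r'(?P<req_time>[^\s]+) "(?P<referer>[^"]+)" "(?P<req_agent>[^"]+)" "(?P<session_id>[^"]+)"'
)

with_quotes = UNQUOTED.search(TAIL).group("referer")
without_quotes = QUOTED.search(TAIL).group("referer")

print(with_quotes)
print(without_quotes)
```

Either form extracts session_id, so the choice only matters if the referer value itself is used later in the search.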

0 Karma

khandelwaly
Explorer

Thanks, that worked.

0 Karma

nickhills
Ultra Champion

Great - I'll update my answer.

0 Karma

nickhills
Ultra Champion

Hi @khandelwaly, if this solved your issue, please accept the answer.

0 Karma

khandelwaly
Explorer

Like this:

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"

x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"

0 Karma

nickhills
Ultra Champion

1.) Where is the session_id?
2.) Where is the user_name?
3.) What is that regex supposed to do?

0 Karma

khandelwaly
Explorer

splunk query
index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2" | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- |search NOT user_name=test_monitor |dedup session_id

0 Karma

richgalloway
SplunkTrust

We can't see the fields extracted by rex because formatting was not preserved by the system. Please update your question.

---
If this reply helps you, Karma would be appreciated.
0 Karma

khandelwaly
Explorer

splunk query

index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2"  | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- |search NOT user_name=test_monitor |dedup session_id
0 Karma

khandelwaly
Explorer
index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2"  | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- |search NOT user_name=test_monitor |dedup session_id
0 Karma