I am not getting any results back when I use dedup.
Search query:
index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2"
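| rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\""
| search NOT user_name=- | search NOT user_name=test | dedup session_id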
Sample data:
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"
x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"
Try this regex instead:
| rex "(?<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>[^\s]+) (?<resp_size>[^\s]+) (?<req_time>[^\s]+) (?P<referer>[^\s]+) \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\""
Note: this only works if your IP addresses are numeric; it won't match the literal x.x.x.x placeholder.
https://regex101.com/r/8OGefw/1
In the regex101 test I used \w+ instead of \d+ for the IP so that it would match the x.x.x.x placeholder.
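A rough way to check the answer's pattern outside Splunk is the Python sketch below. It is only an approximation: Python's `re` module is not Splunk's PCRE engine and does not accept the `(?<name>...)` group syntax, so every group is written as `(?P<name>...)`. The names `FIELDS`, `NUMERIC_IP`, `WORD_IP`, `LINE_200` and `LINE_DASH`, and the IPs 10.0.0.1 and 10.0.0.2, are hypothetical stand-ins for the masked x.x.x.x values. It shows that the `\d+` version handles both the decimal request time and the `- - -` line, fails on the placeholder, and that the `\w+` variant from the regex101 test matches the placeholder.

```python
import re

# Answer's rex pattern after the IP; Splunk's (?<name>...) groups are rewritten
# as (?P<name>...) because Python's re module only supports the P form.
FIELDS = (
    r' (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\]'
    r' "(?P<req_url>[^"]+)" (?P<http_status_code>[^\s]+) (?P<resp_size>[^\s]+)'
    r' (?P<req_time>[^\s]+) (?P<referer>[^\s]+) "(?P<req_agent>[^"]+)"'
    r' "(?P<session_id>[^"]+)"'
)
NUMERIC_IP = re.compile(r'(?P<ipaddr>\d+\.\d+\.\d+\.\d+)' + FIELDS)
# Variant from the regex101 test, so the x.x.x.x placeholder also matches.
WORD_IP = re.compile(r'(?P<ipaddr>\w+\.\w+\.\w+\.\w+)' + FIELDS)

# Sample events from the question; 10.0.0.1 / 10.0.0.2 are hypothetical IPs.
LINE_200 = (
    '10.0.0.1 estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] '
    '"GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"'
)
LINE_DASH = (
    '10.0.0.2 sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] '
    '"GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"'
)

for line in (LINE_200, LINE_DASH):
    m = NUMERIC_IP.search(line)
    print(m.group('http_status_code'), m.group('req_time'), m.group('session_id'))

masked = LINE_200.replace('10.0.0.1', 'x.x.x.x')
print(NUMERIC_IP.search(masked))  # no match: \d+ needs digits in the IP
print(WORD_IP.search(masked).group('session_id'))
```

With `[^\s]+`, the referer value keeps its surrounding double quotes (`"https://google.com/6492"`).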
It didn't work 😞
What results do you see if you leave off the | dedup?
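Leaving off `| dedup` is a useful check because of how `dedup` treats events that lack the field. As I understand Splunk's documentation, `dedup` keeps the first event for each value and, with the default `keepempty=false`, discards events that have no value for the field at all. So if `rex` fails to extract `session_id`, `dedup session_id` returns nothing even though the base search found events. The `dedup` function below is a hypothetical Python model of that behaviour, not Splunk code, and the sample events are made up.

```python
from typing import Dict, List, Optional

Event = Dict[str, str]


def dedup(events: List[Event], field: str, keepempty: bool = False) -> List[Event]:
    """Hypothetical model of Splunk's `dedup <field>`: keep the first event per value."""
    seen = set()
    kept: List[Event] = []
    for event in events:
        value: Optional[str] = event.get(field)
        if value is None:
            # Default keepempty=false: events without the field are discarded.
            if keepempty:
                kept.append(event)
            continue
        if value not in seen:
            seen.add(value)
            kept.append(event)
    return kept


# Extraction worked: three events, two distinct session IDs.
extracted = [{"session_id": "gbhaxl"}, {"session_id": "el1z6d"}, {"session_id": "gbhaxl"}]
# Extraction failed: the events exist, but none has a session_id field.
not_extracted = [{"_raw": "event 1"}, {"_raw": "event 2"}, {"_raw": "event 3"}]

print(len(dedup(extracted, "session_id")))
print(len(dedup(not_extracted, "session_id")))
```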
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"
x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"
1.) Are you replacing the IP addresses with x.x.x.x, or do the actual results look like that?
2.) What results do you get when you add | table ipaddr req_id user_name session_id?
Yes, I'm replacing the actual IPs with x.x.x.x.
I figured out that the referer value is coming back empty: "https://google.com/6492" is not being picked up when I add the fields to a table.
Can you help me get the correct regex?
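The referer may not be the only field affected. A Python sketch of the question's regex suggests that the whole pattern fails on both sample lines: `(?<req_time>\d+)` cannot match `0.0200`, and `\d+` cannot match the `- - -` placeholders, so no field at all is extracted. This is an approximation under stated assumptions: Python's `re` stands in for Splunk's PCRE, the groups are rewritten as `(?P<name>...)`, and `build`, `ORIGINAL`, `FIXED`, `LINE_200`, `LINE_DASH` and the IPs 10.0.0.1/10.0.0.2 are hypothetical. Swapping only the three numeric groups for `\S+` is enough for both lines to match.

```python
import re


def build(num):
    """Question's rex pattern, using `num` for status code, response size and request time."""
    return re.compile(
        r'(?P<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+)'
        r' \[(?P<timestamp>[^\]]+)\] "(?P<req_url>[^"]+)"'
        rf' (?P<http_status_code>{num}) (?P<resp_size>{num}) (?P<req_time>{num})'
        r' "(?P<referer>[^"]+)" "(?P<req_agent>[^"]+)" "(?P<session_id>[^"]+)"'
    )


ORIGINAL = build(r'\d+')  # as in the question: fails on 0.0200 and on "-"
FIXED = build(r'\S+')     # any run of non-space characters

# Sample events from the question; 10.0.0.1 / 10.0.0.2 are hypothetical IPs.
LINE_200 = (
    '10.0.0.1 estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] '
    '"GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"'
)
LINE_DASH = (
    '10.0.0.2 sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] '
    '"GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"'
)

for line in (LINE_200, LINE_DASH):
    print(ORIGINAL.search(line), FIXED.search(line).group('referer'))
```

Because this variant keeps the question's quoted `"(?P<referer>[^"]+)"` group, the referer comes back without its surrounding quotes.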
If there is no referrer, what does it leave in its place: a space, -, or ""?
There will always be a referer.
This works for me:
(?<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>[^\s]+) (?<resp_size>[^\s]+) (?<req_time>[^\s]+) (?P<referer>[^\s]+) \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"
https://regex101.com/r/8OGefw/2
Thanks, that worked.
Great, I'll update my answer.
Hi @khandelwaly, if this solved your issue, please accept the answer.
Like this:
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"
x.x.x.x sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] "GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"
x.x.x.x estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] "GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"
1.) Where is the session_id?
2.) Where is the user_name?
3.) What is that regex supposed to do?
Splunk query:
index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2" | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- | search NOT user_name=test_monitor | dedup session_id
We can't see the fields extracted by rex because the formatting was not preserved by the system. Please update your question.
Splunk query:
index=prdidx sourcetype="OUTPUT" source="http-access.log" NOT "ELB-HealthChecker/2" | rex "(?<ipaddr>((\d+)\.(\d+)\.(\d+)\.(\d+))) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+) \[(?P<timestamp>[^\]]+)\] \"(?P<req_url>[^\"]+)\" (?<http_status_code>\d+) (?<resp_size>\d+) (?<req_time>\d+) \"(?P<referer>[^\"]+)\" \"(?P<req_agent>[^\"]+)\" \"(?P<session_id>[^\"]+)\"" | search NOT user_name=- |search NOT user_name=test_monitor |dedup session_id
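To see why the whole query can come back empty, here is a rough Python model of the pipeline after `rex`, using the regex from the answer. It encodes two assumptions about Splunk semantics as I understand them: `NOT user_name=-` also keeps events that have no `user_name` field at all, and `dedup session_id` (default `keepempty=false`) drops events with no `session_id`. So when `rex` extracts nothing, the two `search` filters let every event through and `dedup` removes them all. `PATTERN`, `run_pipeline`, `LINE_200`, `LINE_DASH` and the IPs 10.0.0.1/10.0.0.2 are hypothetical; this is a sketch, not Splunk's implementation.

```python
import re

# Answer's rex pattern; (?<name>...) groups rewritten as (?P<name>...) for Python's re.
PATTERN = re.compile(
    r'(?P<ipaddr>\d+\.\d+\.\d+\.\d+) (?P<req_id>[^ ]+) (?P<user_name>[^ ]+)'
    r' \[(?P<timestamp>[^\]]+)\] "(?P<req_url>[^"]+)" (?P<http_status_code>[^\s]+)'
    r' (?P<resp_size>[^\s]+) (?P<req_time>[^\s]+) (?P<referer>[^\s]+)'
    r' "(?P<req_agent>[^"]+)" "(?P<session_id>[^"]+)"'
)


def run_pipeline(lines):
    """Hypothetical model of: rex | search NOT user_name=- | search NOT user_name=test_monitor | dedup session_id"""
    events = []
    for line in lines:
        m = PATTERN.search(line)
        # rex only adds fields when the pattern matches; the event is kept either way.
        events.append({"_raw": line, **m.groupdict()} if m else {"_raw": line})
    # NOT user_name=... also keeps events that have no user_name field at all.
    events = [e for e in events if e.get("user_name") not in ("-", "test_monitor")]
    # dedup session_id with keepempty=false: first event per value, events without the field dropped.
    seen, result = set(), []
    for e in events:
        sid = e.get("session_id")
        if sid is not None and sid not in seen:
            seen.add(sid)
            result.append(e)
    return result


# Sample events from the question; 10.0.0.1 / 10.0.0.2 are hypothetical IPs.
LINE_200 = (
    '10.0.0.1 estsdfasf dsfads [06/Feb/2020:08:13:23 -0800] '
    '"GET https://google.com HTTP/1.1" 200 5925 0.0200 "https://google.com/6492" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" "gbhaxl"'
)
LINE_DASH = (
    '10.0.0.2 sadfkanfadskf lsds [06/Feb/2020:08:13:23 -0800] '
    '"GET https://tests.com/generate HTTP/1.1" - - - "https://tests.com/34490" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" "el1z6d"'
)
sample = [LINE_200, LINE_DASH, LINE_200]

print([e["session_id"] for e in run_pipeline(sample)])
# If the IP does not match \d+ (e.g. the x.x.x.x placeholder), nothing is extracted.
masked = [re.sub(r'^\S+', 'x.x.x.x', line) for line in sample]
print(run_pipeline(masked))
```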