Hi there,
I want to build a query with strings from the lookup table. I have the list of domains in the look up table that I want to use in the query to match as hostname
base query:
index=* host=1 OR host=1 "HTTP/1.1 30*" "abc" | eval hostname=mvindex(split(....))
lookup
[| inputlookup domain.csv
| fields domain
Domain in the lookup are the hostname in the base search.
Can someone advise on this, please?
This should return results:
index=* host=1 OR host=1 "HTTP/1.1 30*" "abc"
| eval hostname=mvindex(split(....))
| search
[| inputlookup domain.csv
| eval hostname=domain
| table hostname
| format]
You need to make the name of the field that contains the data you want match the name of the field it will be running that search against. The format command will then format the results of the lookup into SPL that can be executed on a search line.
Here is the sample log event and the lookup
First query for hostname from the logs
index=web host=1 OR host=2 "*HTTP/1.1 3*" "*abc*"
| eval hostname=mvindex(split(mvindex(split(_raw,"&HOSTNAME="),1),"&AUTHNLEVEL="),0)
| eval hostname=trim(replace(hostname,"www.",""))
10.154.248.238 - - [03/May/2019:06:49:35.367 -0500] GET /xxx/rest/services/public/abc?ERROR_CODE=0x00000000&ERROR_TEXT=BA0521I%20%20%20Successful%20completion&HOSTNAME=www.yyy.us&AUTHNLEVEL= HTTP/1.1 307 - https://www.yyy.us/logout Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) userid=- sec=0 usec=13508 fwd=,- route=route0
Lookup table
| inputlookup domain.csv
domain
yyy.us
yyy.at
yyy.ae
So in this example if I see yyy.us in the both the result I am expecting this when I combine the query in one.
Can you run this search and post the results for just those three domains you have listed and post the results?
| inputlookup domain.csv
| eval hostname=domain
| table hostname
| format
It should return results that look like this: ( ( hostname="yyy.us" ) OR ( hostname="yyy.at" ) OR ( hostname="yyy.ae" ) )
I got this to work with a run anywhere example using the data you provided. Here's the query that shows your main query and the results:
| makeresults count=1
| eval _raw="10.154.248.238 - - [03/May/2019:06:49:35.367 -0500] GET /xxx/rest/services/public/abc?ERROR_CODE=0x00000000&ERROR_TEXT=BA0521I%20%20%20Successful%20completion&HOSTNAME=www.yyy.us&AUTHNLEVEL= HTTP/1.1 307 - https://www.yyy.us/logout Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) userid=- sec=0 usec=13508 fwd=,- route=route0"
| rex field=_raw "\[(?<timestamp>[^\]]+)"
| eval _time=strptime(timestamp, "%d/%b/%Y:%H:%M:%S.%3N%z")
| fields - timestamp
| eval hostname=mvindex(split(mvindex(split(_raw,"&HOSTNAME="),1),"&AUTHNLEVEL="),0)
| eval hostname=trim(replace(hostname,"www.",""))
Your results should look like this:
_raw
10.154.248.238 - - [03/May/2019:06:49:35.367 -0500] GET /xxx/rest/services/public/abc?ERROR_CODE=0x00000000&ERROR_TEXT=BA0521I%20%20%20Successful%20completion&HOSTNAME=www.yyy.us&AUTHNLEVEL= HTTP/1.1 307 - https://www.yyy.us/logout Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) userid=- sec=0 usec=13508 fwd=,- route=route0
_time
2019-05-03 07:49:35.367
hostname
yyy.us
This run anywhere query mimics your lookup file and produces the results I would expect with the format command:
| makeresults count=1
| fields - _time
| eval domain="yyy.us~yyy.at~yyy.ae"
| makemv domain delim="~"
| mvexpand domain
| eval hostname=domain
| table hostname
| format
You should see this under the search header: ( ( hostname="yyy.us" ) OR ( hostname="yyy.at" ) OR ( hostname="yyy.ae" ) )
Combining them into a single search that will return any hostname that exists in that lookup:
| makeresults count=1
| eval _raw="10.154.248.238 - - [03/May/2019:06:49:35.367 -0500] GET /xxx/rest/services/public/abc?ERROR_CODE=0x00000000&ERROR_TEXT=BA0521I%20%20%20Successful%20completion&HOSTNAME=www.yyy.us&AUTHNLEVEL= HTTP/1.1 307 - https://www.yyy.us/logout Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) userid=- sec=0 usec=13508 fwd=,- route=route0"
| rex field=_raw "\[(?<timestamp>[^\]]+)"
| eval _time=strptime(timestamp, "%d/%b/%Y:%H:%M:%S.%3N%z")
| fields - timestamp
| eval hostname=mvindex(split(mvindex(split(_raw,"&HOSTNAME="),1),"&AUTHNLEVEL="),0)
| eval hostname=trim(replace(hostname,"www.",""))
| search [| makeresults count=1
| fields - _time
| eval domain="yyy.us~yyy.at~yyy.ae"
| makemv domain delim="~"
| mvexpand domain
| eval hostname=domain
| table hostname
| format]
You can change the data after HOSTNAME=www to confirm that the filter is functioning with any three of the domains in the lookup as well as see it return no results if you make it something not in that lookup.
This should return results:
index=* host=1 OR host=1 "HTTP/1.1 30*" "abc"
| eval hostname=mvindex(split(....))
| search
[| inputlookup domain.csv
| eval hostname=domain
| table hostname
| format]
You need to make the name of the field that contains the data you want match the name of the field it will be running that search against. The format command will then format the results of the lookup into SPL that can be executed on a search line.
I had to change the field name in the lookup from sf-domain to whitelistdomain and it worked. It seems "-" in the field name was the culprit which was not evaluating the host name. I had not given the actual domain names and the actual lookup details for security reason. My bad, we could have identified the issue earlier otherwise.
Thanks a lot for your help!!!!!!!
You're welcome! Just as an FYI you can fix the hyphen causing the issue with either using a rename command with double quotes around the field name | rename "sf-domain" as hostname
or use a single quote around the field name when doing an eval | eval hostname='sf-domain'
Yup, sure. Thanks for all your help.
If I run the these 2 subsearch separately, I see the hostname value return from first search also returns from the look up from the second search, however, when run the search together as above, it doesn't return the data. Did it work for anybody for such scenario?
Thanks for the input.
If could provide an example of the events that you are looking at (with any private information removed) form your main query and some examples of the events in your lookup file I can give you a more details query using a run anywhere example.
Any more suggesstions please?
Your lookup should look like this
index=* host=1 OR host=1 "HTTP/1.1 30*" "abc" | eval hostname=mvindex(split(....))
| lookup domain.csv domain AS host OUTPUT <Column name in csv you want to output>
So if you had another column name that was titled URL, your lookup would look like this
index=* host=1 OR host=1 "HTTP/1.1 30*" "abc" | eval hostname=mvindex(split(....))
| lookup domain.csv domain AS host OUTPUT URL
I tried both as suggested below but doesn't work as expected.
This returns no result
| inputlookup domain.csv
| rename domain AS hostname
| join hostname [| search
index=web host=1 OR host=2 sourcetype="apache:access" "*HTTP/1.1 5*" "*abc*"
| eval hostname=mvindex(split(mvindex(split(_raw,"xxxx="),1),"&yyyyy"),0) | stats count by hostname ]
This returns result even the domain value is not in the lookup.
[| inputlookup domain.csv | stats values(domain) as query ]
index=web host=1 OR host=2 sourcetype="apache:access" "*HTTP/1.1 5*" "*abc*" | eval hostname=mvindex(split(mvindex(split(_raw,"xxx"),1),"yyyyy"),0)
Why are you renaming it hostname? It has to join on a common field from the lookup to the events in Splunk. host
is the default field in Splunk. Keep it as host
and it will work
If I'm reading his original post correctly, the domain lookup file equates to the hostname eval he is extracting in his search, not the host that the access log originated from.
May be the labels I used are confusing.
base query example:
index=* host=1 OR host=1 "HTTP/1.1 30*" "abc" | eval hostname=mvindex(split(....))
Here the derived hostname value could be abc.com, def.com etc.
In the look up table for domain column I have listed abc.com, def.com etc.
So my need is that the search should return the result if the derived hostname matches with domain in the lookup. To follow your approach I would need to add another column in the lookup something relevant to it otherwise. Please suggest.
Thats correct. whatever derived as hostname from logs, match it with domain in the lookup table.
Try something like this
| inputlookup domain.csv
| rename domain AS host
| join host
[| search index=* host=1 OR host=1 "HTTP/1.1 30*" "abc" | stats count by host | fields + host ]
So you want to iterate through a lookup table and see what matches from the lookup and whats in Splunk?
Thanks for the input. Actually I have single column look up table with column name 'domain', no other column for host.
To avoid the list of domains in the base query that looks ugly, I want to keep the fix list of domains in the table and I can add new domains as and when required in the table. However, the hostname derived from the logs in the base query can be any value based on the client domain. I want to filter the requests in the apache logs which have the hostname exist in lookup table.
I'm not sure I understand. Your lookup table only has a single column? Lookups are used to enrich your data in Splunk, here's an example of how it works
You have a lookup with 2 columns, first column is labeled host
and second column is labeled domain
. There is no domain fields in Splunk and you want to map the domain to each host. Since your hostnames in Splunk match the hostnames in the lookup, you can link those hostnames together to map the domain field. The logic looks like this
| lookup domain.csv <field that matches your lookup to date> OUTPUT <field that you have in lookup but not logs>
| lookup domain.csv host OUTPUT domain