I'm trying to create a search that checks the proxy logs for any URL hits that match a static list of URLs in a CSV file. The problem I'm running into is that the URLs in the static list are formatted like domainabc.com, while the results from the logs include the full host name, such as www.domainabc.com, sites.domainabc.com, etc. My initial thought was to use a search like the one below, with an eval statement to trim the URL, but the subdomains vary in length, so ltrim won't work as far as I can tell. Is there a better way to build this search?
index="proxy" | fields date time site | eval url=site | table date time url | join type=inner url [| inputcsv list.csv | fields item1 item2 | eval url=item1 | table url item2]
If the data in the URL field is always in the format you mention, you can do the following. Run this query:

index="proxy" | fields date time site | rex field=site "\w+\.(?<url>.+)" | table date time url
Once you validate that the rex is working, you can add it to props.conf manually, or from Splunk Web -> Settings -> Fields -> Field extractions, with appropriate sharing permissions (this is recommended).
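As a sketch, the inline rex above could be made into a permanent field extraction in props.conf roughly like this (the sourcetype name proxy_logs is an assumption; substitute your own):

```
# props.conf -- sourcetype name "proxy_logs" is an assumption
[proxy_logs]
# Strip the first host label (www, sites, page, ...) and
# capture the remainder of the host into a field named url
EXTRACT-base_url = \w+\.(?<url>.+) in site
```

The trailing "in site" tells Splunk to run the regex against the extracted site field rather than against _raw.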
Can you post some sample logs from "index=proxy" and list.csv?
You can set up a field extraction to get the base-domain portion of the URL from index=proxy, and then do a join/lookup with list.csv to find the matches (you'll need to create a lookup table from your CSV file first).
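For the lookup route, a minimal sketch might look like this (assuming list.csv has a header row with columns item1 and item2, a lookup definition named list_lookup has been created over it, and the inline rex stands in for the permanent field extraction):

```
index="proxy"
| rex field=site "\w+\.(?<base>.+)"
| lookup list_lookup item1 AS base OUTPUT item2
| where isnotnull(item2)
| table date time site item2
```

A lookup is usually preferable to join here: join runs a subsearch subject to result limits, while lookup matches row by row against the table.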
Unfortunately I cannot post the actual logs. The data from list.csv looks like:
123.com
abc.com
google.com
while the data in the field from index=proxy looks like:
www.123.com
sites.123.com
page.123.com
www.abc.com
aa.google.com
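Given samples like these, another option (a sketch, not from the thread) is Splunk's wildcard lookup matching, which avoids the rex entirely: store the list entries as *.123.com, *.abc.com, etc., and declare the lookup with a WILDCARD match type in transforms.conf (the lookup name list_lookup and field name item1 are assumptions):

```
# transforms.conf -- lookup name and field names are assumptions
[list_lookup]
filename = list.csv
# item1 holds patterns such as *.123.com; WILDCARD makes
# sites.123.com, www.123.com, etc. match that entry directly
match_type = WILDCARD(item1)
```

You could then match the raw site field without any trimming, e.g. | lookup list_lookup item1 AS site OUTPUT item2. Note that a bare entry like 123.com would not match www.123.com under WILDCARD; the entries need the leading *. pattern.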