Hello. Given that Splunk is good at indexing and querying data, I'm thinking of using it for website search. I have some questions about how to install it and how to go about doing this.
Firstly, we use shared hosting (we do have root access to the main hosting admin account, if needed) and want to install Splunk once, then set up the indexing so that each website has its own separately queryable index. Is this possible? Can we install Splunk on the main server behind the scenes and set up indexing based on the URLs of the shared sites, so that each index is confined to its own website?
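To make that concrete, this is roughly what I imagine for the per-site setup (an untested sketch; the index names, paths and sourcetype are placeholders I made up, not anything we have running):

# indexes.conf -- one index per website
[web_site_a]
homePath   = $SPLUNK_DB/web_site_a/db
coldPath   = $SPLUNK_DB/web_site_a/colddb
thawedPath = $SPLUNK_DB/web_site_a/thaweddb

[web_site_b]
homePath   = $SPLUNK_DB/web_site_b/db
coldPath   = $SPLUNK_DB/web_site_b/colddb
thawedPath = $SPLUNK_DB/web_site_b/thaweddb

# inputs.conf -- route each site's content into its own index
[monitor:///var/www/site-a/public_html]
index = web_site_a
sourcetype = site_content

[monitor:///var/www/site-b/public_html]
index = web_site_b
sourcetype = site_content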
Secondly, in terms of both indexing and querying, it's important for us to present search results to the public. This is doable, right? Splunk doesn't necessarily need to sit behind password-protected accounts?
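What I picture on the public side is our PHP front end proxying the visitor's query to Splunk's REST API, so visitors never see a Splunk login. A rough, untested sketch (the "search_public" user, the index name and the default management port 8089 are my assumptions):

<?php
// Rough sketch: forward a visitor's query to Splunk's REST API from the server side.
// Splunk credentials stay in our PHP code; the visitor only sees rendered results.
$q   = preg_replace('/["|\\\\]/', ' ', $_GET['q'] ?? ''); // crude sanitising, just for the sketch
$spl = 'search index=web_site_a "' . $q . '" | head 20';

$ch = curl_init('https://localhost:8089/services/search/jobs/export');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query([
        'search'      => $spl,
        'output_mode' => 'json',
    ]),
    CURLOPT_USERPWD        => 'search_public:CHANGEME', // low-privilege Splunk user
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_SSL_VERIFYPEER => false,                    // Splunk's default self-signed cert
]);
$response = curl_exec($ch);
curl_close($ch);

// The export endpoint streams one JSON object per line; each has a "result" to render.
foreach (array_filter(explode("\n", (string) $response)) as $line) {
    $row = json_decode($line, true);
    if (isset($row['result'])) {
        echo htmlspecialchars($row['result']['_raw'] ?? ''), "<br>\n";
    }
}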
Thirdly, based on the URL structure, we need to present results by site category. This should be easy to do with the search syntax, which we can build on the fly in our code (we mainly use PHP on our websites) to query the index and show the results?
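For the category part, I'm imagining the PHP code just appends a path filter to the search string, something along these lines (a field like uri_path is my guess at how we'd tag crawled pages ourselves, not something Splunk provides out of the box):

search index=web_site_a uri_path="/products/*" "visitor search terms"
| table title uri_path _raw
| head 20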
Fourthly, some of our websites have PDF files. We want to be able to show these PDF files in the search results, but in a separate sidebar.
Are there any guides on installing Splunk this way and making it available to the public as website search? Would love to hear from people who may have already done this.
Thanks!
Splunk is not good at document search, which is what most website search is. For example, stemming, synonyms, phrase matching, proximity ranking, and relevance ranking on anything other than time are not native to Splunk's indexing. Nor is text extraction from non-text sources such as Word or PDF documents, and you will probably also spend a disproportionate amount of time extracting only the relevant parts of your HTML.