A petabyte is more than 1000 terabytes. Several terabytes are challenging but doable, but a thousand of them is quite different. Note that if I'm being generous (and assuming data input problems are solved), then simply doing the indexing would require more than 2000 indexer nodes. A more realistic number is probably closer to 10,000 nodes.
So I would say no, not in practice, not today, not without a few (proposed and probably coming someday) architectural enhancements. Problems that would have to be addressed include:
That's a start.
I would say maybe with reservations.
The number of boxes you would need would be somewhere on the order of 5000-10000 dual Quad cores. Maybe with super high end machines you could get that number into the low thousands.
Your storage needs would probably have to come from local disk rather than SAN seeing as you would need a separate SAN for every 4-5 days of storage. ( Compression of ~50% of raw + the index)
I don't think the current distributed search architecture would like the number of Indexers you would have to connect to to run a search across all boxes but we could theoretically have multiple tiers.
A couple of other questions would obviously come up and help with even getting an idea of feasibility:
How long would you want to store the data?
How fast would you want results?
Do you have to keep all of the data, or could a big chunk of the data be summarized?
What type of searches do you think you would use? Reports? Needle in a haystack?
This would be a crazy hard problem to solve but I think Splunk would have a decent chance of making it happen depending on the use cases and availability to a serious amount of hardware. It would be fun to test. Got a lab we can use? 🙂