Amazon searches inside 120,000 books, 20 TB of data

In its latest issue, Fast Company magazine describes the database behind Amazon’s Search Inside the Book feature. It covers more than 120,000 books, each of which had to be digitally scanned and indexed, a huge logistical challenge that came at great cost. The database takes up 20 terabytes, which Amazon founder Jeff Bezos says is about 20 times larger than the biggest database that existed anywhere when Amazon was founded. But a large-scale launch was the only way to find out whether the feature would catch on with Amazon’s 43 million active customer accounts.