Data+Warehouse

Definition
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains data from different sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

[[image:Data-Warehouse.jpg width="445" height="321"]]
(http://qmetrix.com.au/wp-content/uploads/2014/01/Data-Warehouse.jpg)

media type="youtube" key="zTs5zjSXnvs" width="560" height="315" A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to :
 * Congregate data from multiple sources into a single database so a single query engine can be used to present data.
 * Maintain data history.
 * Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
 * Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
 * Present the organization's information consistently.
 * Restructure the data so that it makes sense to the business users.
 * Add value to operational business applications, notably customer relationships management(CRM) systems.

Specific examples
Facebook has a unique storage scalability challenges when it comes to their data warehouse. Their warehouse stores upwards of 300 PB of Hive data, with an incoming daily rate of about 600 TB. In the last year, the warehouse has seen a 3x growth in the amount of data stored. Given this growth trajectory, storage efficiency is and will continue to be a focus for their warehouse infrastructure.

A lot of different components come together to provide a comprehensive platform for processing data at Facebook. This infrastructure is used for various different types of jobs each having different requirements – some more interactive in nature as compared to others, some requiring a predictable execution time as compared to others that require the ability to experiment and tweak. This infrastructure also needs to scale with the tremendous amount of data growth. By leveraging and developing a lot of open source technologies we have been able to meet the demands placed on our infrastructure and we are working on many other enhancements to it in order to service those demands even more and in order to evolve this infrastructure to support new use cases and query patterns.

Contributor:
====This section provides additional examples with images and perhaps videos that further define the concept. Label images/videos Figure 1, Figure 2, etc with a title and place the APA citation at the bottom of the page.====

Resources

 * Inmon, W. H. (1996). The data warehouse and data mining. //Communications of the ACM //, //39 //(11), 49-50.
 * English, L. P. (1999). //Improving data warehouse and business information quality //. J. Wiley & Sons.