This file is from Wikimedia Commons and may be used by other projects.
The description on its file description page there is shown below.
Summary
Description: Hadoop and Beyond. An overview of Analytics infrastructure.webm
English: In this tech talk we will present the analytics infrastructure that we have recently rolled out in production. By now probably everybody knows that Wikimedia hosts an instance of Hadoop, from which we are going to extract pageview data in the near future. But how exactly does the data get there?
We will go over the path that webrequest log data takes from Varnish to Kafka (a distributed log buffer) to Hadoop, and the challenges of deploying this Java-based infrastructure in production. We will also talk about how we can query the data with Hive, an SQL-like interface; how you can set up this stack on Vagrant to play with; and, last but not least, how we recently used Hive to provide GLAM folks with image view stats: Commons:GLAMwiki Toolset Project/NARA analytics pilot
You are free:
to share – to copy, distribute and transmit the work
to remix – to adapt the work
Under the following conditions:
attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
Creative Commons Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0