(Our campus in Harrow, London with exceptionally great weather recently)
This month I am participating in the Science to Data Science (S2DS) program, which aims to integrate highly skilled ex-academics into industries that want to solve challenging problems through data analysis. In other words, it is a summer school that helps scientists transition to data science. This first class is a highly diverse group of eighty-five PhDs whose research backgrounds range from the subatomic Higgs boson, through how information is transmitted in our brains, to the origins and evolution of the Universe. For a statistical summary of the diversity see the S2DS official blog. Here you can read a recent article about the initiative in business-cloud.
(… ready … set … go!)
Yesterday’s session consisted of lectures and formalities. The first speaker, John Hall from the main sponsor, KPMG, gave a few examples of what his company is looking for in data scientists. The aspect I found most interesting was exploring methods to identify bankers who go rogue. Besides that, he reminded us, probably without realizing it, that finance guys LOVE to talk in acronyms: FT, MI, FS, BI, to name a few. I personally do not recall hearing so many acronyms since my army service years.
He was followed by Hetan Shah, the Executive Director of the Royal Statistical Society, who introduced the organization. I haven’t looked too thoroughly into their various activities, but one initiative I like is a pro-bono group that aims to help journalists better appreciate statistics so they can be more critical in their articles about advances in science. In a post of theirs they explain the need to convert journalists from “That’s exciting!”-ists to “Is that exciting?”-ists.
The tutorial sessions started with Ole Moeller-Nilsson’s introduction to Good Coding Practices. He did an excellent job of covering the basics that are useful for coders of all levels, from newbies to the more experienced who still have bad habits. The most important message he conveyed, I think, is that one writes code for an organization/company, not for oneself. In other words:
Code belongs to the team, not the individual.
Another quote worth mentioning is “A function should do only one thing!”. He also talked about the importance of testing code, refactoring (improving the structure of existing code without changing what it does), code reviews and the concept of waterfall vs. agile development.
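To make the “one thing per function” advice concrete, here is a minimal sketch of my own (it is not taken from the tutorial, and all function names are hypothetical). Instead of one big function that parses, filters and averages, each step gets its own small function, which also makes each step easy to test:

```python
from statistics import mean


def parse_measurements(lines):
    """Convert raw text lines into floats, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def remove_outliers(values, limit):
    """Keep only values whose absolute value is at most `limit`."""
    return [v for v in values if abs(v) <= limit]


def summarize(values):
    """Return the mean of the values, or None for an empty list."""
    return mean(values) if values else None


def test_pipeline():
    """A tiny test: each step can be checked on its own."""
    values = parse_measurements(["1.0", "", "3.0", "100.0"])
    assert values == [1.0, 3.0, 100.0]
    assert remove_outliers(values, limit=10) == [1.0, 3.0]
    assert summarize([1.0, 3.0]) == 2.0
    assert summarize([]) is None


if __name__ == "__main__":
    test_pipeline()
    print("all tests passed")
```

Because every function has a single job, a teammate can reuse or replace one step without having to read the others, which fits the idea that code belongs to the team.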
(Helga and Daniel laughing at my agony of realizing that we might need to use a Windows computer in our project. Fortunately for my group, ConnectomeX, we can use our own laptops.)
In the last tutorial of the day, Chris Harris from Hortonworks gave an introduction to Hadoop. We learned that while Hadoop 1.0 could be described as an implementation of Google’s well-known MapReduce algorithm, which distributes tasks over many machines and then collects the results, the current Hadoop 2.0 is more like an open-source operating system for the analysis of large data sets. The lectures were a bit overwhelming with information, but it was good to get a sense of the complicated structure and relationships between the different functions and packages, which look like a chaotic zoo containing elephants, Giraphs, PIGs (which eat anything) and a bee HIVE.
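For readers who have not met MapReduce before, here is a rough single-machine sketch of the idea using the classic word-count example. It is plain Python and does not use any Hadoop API; the function names are my own, and the loop over documents stands in for work that a real cluster would spread over many machines:

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values collected for one key."""
    return key, sum(values)


def word_count(documents):
    # On a cluster each document (or chunk) could be mapped on a different
    # machine; here the "machines" are just iterations of a loop.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the elephant and the pig", "the bee hive"]
    print(word_count(docs))
```

The map and reduce functions never share state, which is what allows the work to be split across machines and the partial results to be collected at the end.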
(Wining and dining at 4 Hamilton Place by Green Park.)
The first day concluded with drinks and dinner at 4 Hamilton Place by Green Park. While wining and dining, most of us got to meet our project mentors, and we competed for team glory with Legos.
Overall it was a fun way to start the program, and I’m looking forward to a fun, data-intensive month.
To follow the 2014 class of S2DS on Twitter see #S2DS14.
Also, Chris Wallace is writing about his experiences in the program on his blog.
(Hellen from team ConnectomeX building our Lego model in the inter-group competition.)