Data Science Job Hunting

After the S2DS workshop I searched for a job as a data scientist in London, which, as the cliché goes, is truly a full-time job. As I transition from the academic world to the private sector, I have found the effort a positive experience. The interview process helped me better understand what I actually want to do as a data scientist. Here I list my personal experiences and conclusions. — SPOILER ALERT — I was offered a data science role at SCL Elections, and commence my new job on November 24th! For more generic advice on how to prepare for a data science interview I recommend reading Jessica Kirkpatrick’s summary.

I followed a colleague’s suggestion

Commence by accepting any job interview and after gaining experience be more selective.

The logic behind this is to improve your salesmanship for when you interview for a job that you really want. Make the common mistakes early so that you are ready when it really counts. This led me to interview with companies of various shapes and sizes, which, in turn, gave me a better feel for what is out there and what I should expect from projects, work environments and salaries.

I realized, for example, that at this stage I would prefer to work in a company that understands that data analysis is a group effort. Data management, statistics and machine learning are best discussed with like-minded people. For example, Tesco (a major retail conglomerate in the UK and abroad) is building an excellent data science team of PhDs. Besides a competitive salary, they encourage promoting data science within the company and support self-development (e.g., one day a month to stay at home and study). On the other extreme, I met with the co-founder of BorrowMyDoggy (an application that connects dog owners and dog lovers), who was looking for a sole go-to data person.

The saying that “Your CV gets your foot through the door” can be paraphrased as “Your CV (and LinkedIn profile) gets you over the first of many hurdles.” Before meeting employers in person, most medium-to-large companies will first conduct various screenings: on the phone and via take-home tests.

The phone/Skype calls are normally 30-60 minutes. I interviewed with: Google, Netflix, Facebook, Bookings, Tesco, SkimLinks (affiliate marketing technology), MustardSystems (sports analysis for predicting gambling results) and an NGO. These normally involve questions on basic statistics (e.g., Poisson, Gaussian and Binomial distributions), coding (e.g., Tesco asked me to code the Fibonacci series efficiently) and data analysis (e.g., SQL questions and case studies). Both on the phone and in person, their questions sample from the Data Scientist Venn Diagram.
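To give a flavor of the coding questions (a hypothetical sketch of my own, not Tesco’s actual exercise): the naive recursive Fibonacci recomputes the same subproblems in exponential time, whereas carrying the last two values along makes it linear:

```python
def fib(n):
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1).

    The naive recursion fib(n) = fib(n-1) + fib(n-2) takes exponential
    time; this iterative version runs in O(n) with O(1) memory.
    """
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(10))  # 55
```

Interviewers often follow up by asking about the complexity of each version, so it is worth being able to state the O(n) vs. exponential trade-off explicitly.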

Take-home exams involved data analysis, a coding test or both. Both MustardSystems and OpenSignal (crowdsourced wireless coverage mapping) had me perform 90-minute coding tests via Codility, which is an excellent platform for testing coding and algorithm skills.

Analysis assignments varied by the company and the job details. I was mostly given a week or so to return a report. Tesco had me choose a database of my preference and use it to answer a question of my choice that would be relevant for their company; I presented the results during my interview at their assessment center. Both SCL Elections and Mustard Systems gave me simulated data and had me perform an analysis involving machine learning predictions. WorkDigital (which provides web data services) had me solve a statistics question and code a hash table algorithm on one day’s notice. Perhaps the most interesting assignment was from TicTrac, which involved API calls and analysis of the listening trends of LastFM (a cloud music site) users.

As previously mentioned, these assignments and interviews are time consuming and filled much of the two months I was searching. Overall I found them a positive experience for realizing my strong points and the aspects that need improvement. In particular I felt that I need to improve on two fronts: algorithms, and my online profile. Both are worthy of their own entries, so hopefully I will write about them in the future.

This also might be worthy of its own entry, but I found it super useful for my job search, so I will say a few words here: meetups! If you live in a big city, there are probably many interesting meetups you can benefit from attending. These provide the perfect environment to mingle, discuss new ideas and meet potential employers. If you’re an introvert, you might consider going a bit out of your comfort zone for the sake of learning cool new things, or at least getting some free pizza and drinks. Most meetups have volunteered lectures with some mingling, and a few are solely for networking. By attending two to three meetups a week, I got a few job interviews (including with SCL Elections, with which I signed), learned about Hadoop technology (and, for the fun of it, 3D printing techniques), learned about the Internet-of-Things industry, the startup scene, and various languages/technologies like Python, Neo4J, R and Julia, learned about visualization technologies, got useful tips for developing a personal project of mine, volunteered at the launch party of a social media startup, and even won a raffle for free tickets to Strata+Hadoop Barcelona (a Big Data conference)! In London I recommend attending PyData (for everything Python), BigData London (for updates on data systems in London) and LondonR (for updates on the R language).

Most of the above is probably relevant for data science jobs worldwide. Below are a few insights that are more specific to London; I haven’t looked into all of the UK, just London.

I guess I shouldn’t have been surprised, but the expected salaries in London are slightly lower than those in the USA. The expected salary for a junior data scientist is £30-45k per year at most companies, whereas at banks or large companies one can expect over £55k per year. The upside, compared to the USA, is a better work-life balance. This has been one of my main considerations, and I have heard the same from several Americans I have met here. For example, everyone gets three weeks off in the UK, which is more than the norm across the pond.

In London a lot of hiring is done through headhunters. Data science is a very broad term and tends to confuse people, especially people whose skills lie outside the Data Scientist Venn Diagram. This results in many clueless headhunters. (One anonymous quote is that “80% of the recruiters of data scientists are clueless.”) There are a few, however, who actually listen, try to get a grasp of what it’s all about, and do a decent, consistent job, but the emphasis is that they appear to be few. I can recommend the recruitment firms Hydrogen (ask for Matthew), Forsythgroup (ask for Brett) and Campbell North (ask for Lewis), which were quite thorough at getting me decent job interviews. As for the clueless ones, from them I learned patience, and what to expect from conversations with non-data/statistics people in working environments.

Besides headhunters I used online job-ad sites like LinkedIn and CWJobs (which has a nice feature where you can filter ads by employers or recruiters).

If you find this useful, or have some job-searching experience to share, feel free to drop a line. Good luck job hunting — or better yet, may you avoid spending too much time at it!



On the train for another meeting/interview.  We’re on a road to … somewhere? 


S2DS – the kickoff


(Our campus in Harrow, London with exceptionally great weather recently)

This month I am participating in the Science to Data Science program, which aims to integrate highly skilled ex-academics into industries that seek to solve challenging problems using data analysis. In other words, it is a summer school to help scientists transition to data science. This first class contains a highly diverse group of eighty-five PhDs, with research backgrounds ranging from the subatomic Higgs boson, to how information is transmitted through our brains, to the origins and evolution of the Universe. For a statistical summary of the diversity see the S2DS official blog. You can also read a recent article about the initiative in business-cloud.


(… ready … set … go!)

Yesterday’s session contained lectures and formalities. The first speaker, John Hall from the main sponsor company, KPMG, gave a few examples of what his company is looking for in data scientists. The aspect I found most interesting was exploring methods to identify bankers who go rogue. Besides that, he reminded us, probably without realizing it, that finance people LOVE to talk in acronyms: FT, MI, FS, BI, to name a few. I personally do not recall hearing so many acronyms since my army service years.

He was followed by Hetan Shah, the Executive Director of the Royal Statistical Society, who introduced the organization. I haven’t looked too thoroughly into their various activities, but one initiative I like is a pro-bono group that aims to help journalists better appreciate statistics, so they can be more critical in their articles about advances in science. In a post of theirs they explain the need to convert journalists from “That’s exciting!”-ists to “Is that exciting?”-ists.

The tutorial sessions started with Ole Moeller-Nilsson‘s introduction to good coding practices. He did an excellent job of covering the basics that are useful for coders of all levels, from newbies to the more experienced who still have bad habits. The most important message he conveyed, I think, is that one writes code for an organization/company, not for oneself. In other words:

Code belongs to the team, not the individual.

Another mention-worthy quote is “A function should do only one thing!” He also talked about the importance of testing code, refactoring (improving code without changing its behavior), code reviews, and the concept of waterfall vs. agile development.
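A minimal sketch of the “one thing” rule (my own hypothetical example, not one from the talk): rather than a single function that parses, cleans and averages a data string, give each step its own small function:

```python
def parse_values(text):
    # One job: turn a comma-separated string into floats.
    return [float(x) for x in text.split(",")]

def drop_negatives(values):
    # One job: discard invalid (negative) readings.
    return [v for v in values if v >= 0]

def mean(values):
    # One job: compute the average.
    return sum(values) / len(values)

print(mean(drop_negatives(parse_values("1.0,-2.0,3.0"))))  # 2.0
```

Each piece is now trivial to test, reuse and review on its own, which is exactly the point of writing code for a team rather than for oneself.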


(Helga and Daniel laughing at my agony of realizing that we might need to use a Windows computer in our project. Fortunately for my group, ConnectomeX, we can use our own laptops.)

In the last tutorial of the day, Chris Harris from Hortonworks gave an introduction to Hadoop. We learned that while Hadoop 1.0 could be described as an open-source implementation of Google’s well-known MapReduce algorithm, which distributes tasks over many machines and then collects the results, the current Hadoop 2.0 is more of an operating system for the analysis of large data sets. The lectures were a bit overwhelming with information, but it was good to get a sense of the complicated structure and relationships between the different tools, which appear like a chaotic zoo (containing elephants, Giraphs, Pigs, which eat anything, as well as a bee Hive).
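To get a feel for the MapReduce idea itself (a toy sketch in plain Python, not actual Hadoop code): the map phase emits key-value pairs, which are then grouped by key and reduced. The classic example is counting words:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

lines = ["the elephant and the pig", "the hive"]
print(reduce_phase(map_phase(lines)))
# {'the': 3, 'elephant': 1, 'and': 1, 'pig': 1, 'hive': 1}
```

In real Hadoop the pairs would be partitioned across many machines before reducing; the point is that both phases parallelize naturally over chunks of the data.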


(Wining and dining at 4 Hamilton Place by Green Park.)

The first day concluded with drinks and dinner at 4 Hamilton Place by Green Park. While wining and dining, most of us got to meet our project mentors, and we competed for team glory with Legos.

Overall it was a fun way to start the program, and I’m looking forward to a fun, data-intensive month.

To follow the 2014 class of S2DS on Twitter see #S2DS14.

Also, Chris Wallace is writing about his experiences in the program on his blog.


(Hellen from team ConnectomeX building our Lego model in the inter-group competition).

Preparation for a transition to data-science


My PhD advisor, Professor Michael Blanton, once gave me advice on how to choose which postdoc offer to take: “Go where the data are …”. There now appears to be an interesting shift in the world of data. Academia used to be where most people went to get interesting data to analyze, but with the massive amounts of data now collected online and from mobile devices, it appears that industry is where a lot of the action is heading (e.g., health care, municipalities, Facebook and so on). This trend is resulting in a massive migration of brains out of academia. Or, rephrasing Bob Marley: Exodus! Movement of Da[ta] people.

Often I am asked by non-scientists: “But what does data science have to do with your studies in astronomy and physics?” Well, as a PhD student and later a postdoc doing analysis in cosmology, I jokingly described myself not as a true physicist who contemplates equations of physical processes in depth, but rather as a number cruncher, or a data analyst. Now that I have decided to transition to the glorious (?) world of data science, I have been doing research on the topic to prepare myself. Here I describe the initial steps and resources that I feel are helping me prepare for non-academic interviews. This is by no means an exhaustive or generic list. I highly recommend reading Jessica Kirkpatrick’s excellent articles on transitioning to data science. I also recommend reading the White Paper of Insight Data Science.

(1) Learning from others – There is a big migration of academics to data science, for various professional and personal reasons (job security, flexibility in choice of location, good salaries, high demand, new challenges, to name a few). Feel free to talk to colleagues in your department and worldwide about their preparations, experiences and opinions. Also, if you have some experience on the topic, share it with your academic colleagues: they are probably wondering what a transition involves, and might be embarrassed to ask because of the negative connotations some academic environments attach to the matter.

(2) Drew Conway uses a Venn diagram in this nice overview to describe all the aspects a data scientist should be knowledgeable of. One need not be an expert in everything data related, but should be aware of what is out there and then go learn what they find interesting/useful. E.g., I quickly realized that I need to be a better Bayesian (hence the name of this blog), so I started brushing up on my statistics and computational skills by taking Coursera courses and dabbling in Kaggle competitions, as described below. The photo above is my new notebook, full of notes from various statistics and machine learning courses. To keep a record of new syntax from the various languages I have been learning, I find electronic notebooks like Evernote useful; I also find the IPython Notebook handy for notes on Python and Julia syntax.

(3) Convert your academic CV to a professional resume. Like anything else you have written, there is an art to it: you now have to be more concise, and focused on how a company/organization would benefit from hiring you. Your resume should be short (one two-sided page), as opposed to a long list of all your contributions at conferences worldwide. When you build your resume, look at others’ to get a feel for what potential employers are looking for. For example, you can find my first resume here.

(4) Programming languages: A few languages that are highly sought after these days are Python, R and SQL. The great thing about these high-level languages is that they are open source and have large communities supporting them. I am very glad that when I started my postdoc I decided to ditch the astronomer’s curse of IDL (if you’re not an astronomer, don’t ask …) in favor of the much more fun Python. R is very popular amongst statisticians, but I have recently been pointed to Python packages such as pandas, scikit-learn and Bokeh, which let you do statistics, machine learning and plotting all in-house in Python. SQL is basic for database manipulation and data extraction, so it is important to put on your to-do list, and eventually on your resume. One more word of advice: if you are still stuck with editors like Emacs and vi, you might want to look into more user-friendly ones like TextWrangler and Sublime.
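For practicing SQL, Python’s built-in sqlite3 module gives you a full SQL engine with no server to install. A toy example with made-up data, of the kind of aggregation interviewers like to ask about:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("tea", 3.0), ("tea", 2.5), ("milk", 1.2)],
)
# A typical interview-style query: total sales per product.
rows = conn.execute(
    "SELECT product, SUM(amount) FROM sales "
    "GROUP BY GROUP_CONCAT_PLACEHOLDER".replace(
        "GROUP_CONCAT_PLACEHOLDER", "product ORDER BY product")
).fetchall()
print(rows)  # [('milk', 1.2), ('tea', 5.5)]
```

GROUP BY, JOINs and window functions cover a large fraction of the SQL questions I encountered, and sqlite3 is enough to drill all of them.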

(5) MOOCs – Massive Open Online Courses. Even though we still live in an age where airlines treat economy flyers like cattle and country borders still exist, I am glad to live in an era where one can learn from top universities FOR FREE. Various online sites (Udacity, edX, Khan Academy) specialize in connecting hungry minds with professors who teach anything from music and history to astronomy and neural networks. I personally have a Coursera account and have completed five courses, in which I did all the homework assignments and weekly quizzes; one even had a midterm and a final exam, which I prepared for like an undergrad. This is the place to admit that in my first non-academic interview in 14 years, which happened to be with Google, I felt great that I understood my interviewer’s questions thanks to the various machine learning and statistics courses I’ve taken. As Coursera claims, the level of the courses is not such that one can go out and do research in the field, but rather such that one can immediately implement what one has learned. For example, the now-classical Machine Learning course by Andrew Ng (a Coursera co-founder) provides not only the theory behind the state of the art of the field, but also actual Matlab code that can be applied. I am currently auditing, for free, the Probabilistic Graphical Models course given by another Coursera co-founder, Daphne Koller.

OK, so you have just learned (or are preparing to learn) a whole lot of statistics, and possibly a programming language or two. In a job interview you will need to show experience. In your academic research you probably worked on long projects, but for industry you’ll need to show that you can be effective in short-term projects. That’s where the following two entries might be useful.

(6) Online competitions: Kaggle serves as a bridge between organizations that have questions about data they provide and data scientists who are eager to try out new tricks. The competitions are timed, from a few weeks to over a year, with prizes that can be cash, kudos or even a job. It hosts competitions for companies in high tech, car insurance and airlines, as well as for academics such as the Large Hadron Collider and astronomers. It is newbie friendly, with forums that are open for discussing methods. Kaggle also has tutorial competitions, e.g., the Titanic competition, in which you can practice techniques on actual passenger data to predict the survivors.
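As a flavor of the Titanic exercise (with made-up records of my own, not the real Kaggle data): the classic starting point is a baseline model that predicts survival from sex alone, whose accuracy you then try to beat with machine learning:

```python
# Hypothetical toy records for illustration; the real competition
# provides hundreds of passengers with age, class, fare, and so on.
passengers = [
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 0},
    {"sex": "male",   "survived": 0},
    {"sex": "male",   "survived": 1},
    {"sex": "male",   "survived": 0},
]

def gender_baseline(passenger):
    # Predict survival from sex alone: women survive, men do not.
    return 1 if passenger["sex"] == "female" else 0

accuracy = sum(gender_baseline(p) == p["survived"]
               for p in passengers) / len(passengers)
print(accuracy)
```

Having a dumb baseline like this is a useful habit in general: any fancier model that cannot beat it is not learning anything from the other features.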

(7) Seminars, workshops, bootcamps, retreats. Various recruitment companies are starting to ride the exodus wave and help place scientists (and some engineers, too) in companies that are looking to hire data scientists. The first that I am aware of is Insight Data Science, which has sessions in the San Francisco Bay Area and in New York City. I failed in my first attempt to get accepted, for the January 2014 class (I think I botched part of my interview because I did not prepare a working sample of my research), but got accepted to the NYC August 2014 session. I learned that the acceptance rate went from 1/15 in the first session to 1/20 in the second, and that they have a 100% job placement rate. I did, however, decline their generous offer (a $5,000 fellowship) in order to participate in a similar program in London called Science to Data Science. Their formats are slightly different, and will be the focus of a future blog entry. Other workshops that I am aware of: The Data Incubator (NYC), ASI (London) and the Data Science Retreat (Berlin). Note that because of the high demand these are very competitive to get into, so one should put a lot of effort into the applications and interview preparation, just like for any job.

Please keep in mind that there are probably many other avenues to assist a transition to data science; these are merely the steps that I looked into. Throughout this blog in August and September 2014 I will share my experiences in the Science to Data Science program, as well as my search for a job in London.


A big part of manipulating and understanding data is creating visuals. For the past few years I’ve been using the standard Python libraries (matplotlib.pyplot), but yesterday I was recommended a library called D3 that embeds data nicely into HTML.

This is a JavaScript library that eases presenting and manipulating data in documents, and it claims to be consistent across browsers (*most* of them, except some old versions of Internet Explorer). Here you can see a nice demo.

Scott Murray from USF has an excellent tutorial to get one started. It is detailed, with easy-to-follow, step-by-step instructions.

This is just a short blurb to mention the existence of D3 for those who are not aware of it, not an expert’s reference.

Hopefully, with time, I’ll have more to show using D3.

July 24th 2014:

In the spirit of everything Python, I came across a library called Bokeh that provides tools to make D3.js-like plots. See their gallery here for a few examples.


July 27th 2014:

I was just made aware of Mike Bostock’s public notebooks. Thanks John Whitmore!

Data and teenage sex

The objective of this blog is to share my insights and experiences in trying to be a better Bayesian. Yes, yes, frequentist and Bayesian analyses should agree, provided the data are significant enough, but … c’mon, our brains are empirically based, with no prior model to explain our surroundings, and incorrect assumptions are (mostly) corrected for. Hence my preference.
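As a small taste of what being a Bayesian means in practice (a toy sketch of my own): update a flat prior on a coin’s bias after observing some flips, evaluated on a simple grid:

```python
def posterior_grid(heads, tails, grid_size=101):
    """Posterior over a coin's bias p on a grid, from a flat prior.

    Bayes' rule says posterior is proportional to likelihood times
    prior; with a uniform prior, the posterior is just the normalized
    binomial likelihood p**heads * (1 - p)**tails.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    likelihood = [p ** heads * (1 - p) ** tails for p in grid]
    norm = sum(likelihood)
    return grid, [l / norm for l in likelihood]

grid, post = posterior_grid(heads=7, tails=3)
mode = grid[max(range(len(post)), key=post.__getitem__)]
print(mode)  # posterior peaks at the empirical frequency, 0.7
```

With more data the posterior narrows around the true bias, which is the sense in which frequentist and Bayesian answers converge once the data are significant enough.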

I recently finished an eight-year stint as a cosmologist, which was a fun introduction to Bayesian concepts, and am now heading to the Wild West of Big Data analysis. More about the transition in upcoming posts. By Wild West I am referring to the fact that our world is becoming ever more digitized, and everyone wants a piece of the action. I heard a brilliant quote about how Big Data is like teenage sex:

Everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it …

(If anyone knows the source of this analogy, please let me know; I noticed it on Facebook.)

I will do my best here to describe insights into the world of the Big Data insight-seekers, through my experiences, discussions and random learnings. Being a music buff and amateur photographer, I will relate these to music I like and photographs that I take. You might have already noticed the not-so-sophisticated connection between the name of this blog and Bruce Springsteen’s song Better Days from his 1992 album Lucky Town.

Now for some Bayesian fun. Which animal does this paw belong to? (No number crunching required, just your natural machine learning brain.)