Farewell, Twitter!

Dear friends,

The time has come for me to leave the Twitter flock.

If you are the sort of person who finds such things amazing, it is quite amazing that — in all of space and in all of time — a path of my life’s journey crossed with yours. And that too while together building a company that has surely put a little dent in the universe. I have a great sense of pride and gratitude to have been a small part of this. I hope and wish for you that a corner of your heart feels that way too. And I have full faith that you all are going to make this dent bigger.

As I write this, so many memories flash across my mind, almost making me relive the feelings and emotions I have experienced these last five years. The ones that will endure, though, will be about all of you. Perhaps that is what it is all about when all is said and done: to work with people that you come to respect, and to give your best so that hopefully they come to respect you too.

What am I going to do next? I am looking forward to the next phase. I don’t need a break, but I am going to take one. It is not going to be short either. It will involve reading and traveling and thinking. Definitely some recreational coding and perhaps a bit of writing too.

I sincerely wish you the very best in Twitter and in life. I hope that our paths will cross again sometime in the future.

See you

Pankaj

A critique of the NYTimes article about Google’s big data study on interviewing

This New York Times article has been doing the rounds on Twitter recently. Titled “In Head-Hunting, Big Data May Not Be Such a Big Deal”, it is a condensed and edited interview with Laszlo Bock, SVP of people operations at Google. It is clear to me that the title, together with the apparent nuggets of wisdom in the article, forms just the sort of potent combination that invites endless tweets and retweets from the San Francisco/Silicon Valley echo chamber. Well done, Adam Bryant.

However, I find enough issues with this article that I feel compelled to note them. An article mentioning a big data study at a company as venerated for big data as Google naturally invites wide-ranging conclusions. As I explain below, drawing them would be wrong.

First of all, I read the article three times, but could not find anything emphatic in its body that warrants the rather strong claim in the title that “Big Data may not be such a big deal”. On the contrary, Bock clearly says that just giving managers and leaders a visible set of measurements was enough for them to strive to improve. Yes, there is mention of the (organizational) context being important and the need for human insight, but even the most ardent fans of Big Data have never been heard to advocate jettisoning those. In another place, the article says “I think this will be a constraint to how big the data can get”. I cannot help but wonder whether the title was bolted on by some “social media expert” at NYTimes.

Then there are other questions that the article and the correlation study it mentions raise.

Bock mentions that “the proportion of people without any college education at Google has increased over time as well. So we have teams where you have 14 percent of the team made up of people who’ve never gone to college.” Elsewhere, he says that the average team size is 6. Putting the two together, 14% of 6 is about 0.84, i.e., less than one person on the average team has not gone to college. In other words, Bock is very likely talking about bigger teams. It is not mentioned whether those teams are all engineering/product or include other departments such as support, which apparently form a large part of Google. Nor does the article say how many such teams there are, which makes this a bizarre piece of data to cite.

The article cites a study at Google to detect correlation between interview types and success at Google after 2-3 years. It concludes that things like GPA, test scores, puzzles, brain teasers have no correlation and the only thing that correlates well is “structured behavioral interviewing”. At the very least, this deserves more explanation. Was the study done over all of Google, or only engineering or some other parts of the organization? Does the lack of correlation hold if you restrict to either top 20% or bottom 20%? Should everyone basically just do “structured behavioral interviews”? Now, I am myself a fan of behavioral interviewing (btw, here is a link if you want to read some examples of behavioral interview questions) and include a few such questions in my interviews of candidates at Twitter, but I can’t imagine the whole interview panel of an engineering or product position being of this sort.

Turning to the article’s juicy assertion that GPAs are not predictive of future success at Google, consider this from the article: “Google famously used to ask everyone for a transcript and G.P.A.’s and test scores, but we don’t anymore, unless you’re just a few years out of school.” So, Google still does ask for those if you are “just a few years out of school”? And if a candidate is experienced, I don’t know of anyone in Silicon Valley at least who would go by transcripts and GPAs, so the fact that Google also doesn’t is odd to mention.

My biggest gripe with the article is that, since GPAs and puzzles are apparently not predictive in their study, it leaves us at best back at square one. We are not told what is predictive, so what are we to rely on? A lot of people would jump to the conclusion that those should not be included in an interview at all. But doing that would be wrong, simply because who is to say that whatever replaces them will be more predictive…

To be sure, I am myself not a fan of going by GPAs and performance on brainteasers in interviews. I am not even casting doubts on that study at Google and in fact the results of the study look quite plausible to me, simply because in Data Analysis, not finding predictive variables is rather common. I am just distressed at the suggestive tone of the article. It invites wide-ranging prescriptions for how to and how not to interview. New York Times: I am used to better from you.

——

PS: BTW, in my own constant search for the answer to the question “what sort of interview should I do”, here is a recent book by Nolan Bushnell (the founder of Atari and supposed father of the video-game industry, and apparently someone who launched the career of Steve Jobs). Though it seems a bit as if Bushnell might be trying to ride the Steve Jobs’ popularity wave, it still presents many out-of-the-box ideas on ways to go about interviewing, especially if you are looking for creative and exceptional engineers (as I am at Twitter).

A note to engineering managers/management everywhere on the value of opening up the work of your teams

Thanks to conferences and candidate interviews for recruiting, I find myself talking to a fair number of folks from other tech companies who are either engineers or in engineering management. Last week, a candidate was talking about how they did a particular project at his company (a mature public tech company). It was then that it suddenly occurred to me that I had heard similar stories many times before.

I now realize that there is a rather pervasive project pattern out there – that of coming up with what can only be called “yet another solution” to a commonly occurring problem, but never talking openly about the chosen path. Some examples of what I would call “commonly occurring problems” are: a spell-checker, typeahead on a search engine, reverse-proxy, load balancer, a recommendation engine, a method for generation of unique ids in a sharded store, etc. I have no issues at all with coming up with one’s own proprietary, home-baked solutions – it is indeed the case that company situations, availability of technologies and product needs are subtly different enough to often require a diversity of solutions.

At first, just the presence of endless varieties of home-baked solutions to the same problem used to baffle me. I now admit that this might be just because it stands in contrast to our software stack at Twitter, which, by virtue of our deep belief in the value of open-source, is heavily built on top of both external and internal open-source technologies.

Developing one’s own solutions is of course fine, but I have found more and more the pattern that whenever someone describes their proprietary technology or software, no part of which has ever been made open in any form, it very quickly begins to sound like an unprincipled mish-mash of things hacked together because of company and project-specific circumstances. Worse, those circumstances often turn out to be ones that the participant later finds really difficult to articulate. So, what we end up with is that the work done was an unfortunate waste of time in the global sense of the advancement of the industry. Neither the design choices made, nor the lessons learned, nor any innovations made (small or big) are now available to be built further on. Worse for the company, lack of an unbiased source of review means that the work was likely sub-par.

I understand that there is a need for “secret sauce” and proprietary algorithms and projects that cannot be talked about openly. But it is almost never the case in my experience that no part of the project is worth open-sourcing, talking about in a workshop, or writing up in a blog post or paper. To give just a couple of examples, Twitter’s Revenue team developed and open-sourced Scalding, and the Search/Relevance team has openly talked about projects like Earlybird and Cassovary. All of these are key technologies from bigger, proprietary projects. Likewise, Facebook has open-sourced and/or published many of their key technologies (e.g., the “Learning Relevance” SIGIR paper that describes a key mechanism it uses in ad-targeting, a variety of interesting stats about its friendship graph, and more). The same holds for LinkedIn and Google over the years.

Therefore, I am convinced that if a project has not publicly opened any part of its work, the quality of that work is very likely inferior. This is less damning a verdict on the engineers themselves than on their management chain. More often than not, it is the engineering management that is prone to think that doing the work needed (polishing, documenting, referencing previous work, explaining design choices etc.) is a waste of engineering time and without benefit to the company. This is plainly wrong. Openly talking about your work puts it under scrutiny and forces you to step up your quality and strive for being the best globally. Otherwise, it is too easy for a team to believe that it is under some unique circumstances and constraints such that its solution needed to be uniquely the way it ended up being implemented by the team. Promising yourself to make your choices and solutions open is a sure-shot way to guard against this trap.

Personally, I emphasize to the engineers in my team that they absolutely need to open up their work – whether via open-sourcing, publishing a paper, or simply giving a company-wide tech-talk if their work is too sensitive to be published outside the company. It will be great if engineering management everywhere makes this a priority and an integral part of their own responsibility. I am convinced that doing so will help the individual, the company as well as the relevant industry as a whole.

PandoDaily is wrong about the reasons why Indians are supposedly not using Twitter much

I just came across this blog post by Sarah Lacy (@sarahcuda) on PandoDaily titled “Why aren’t more Indians using Twitter?”

I can’t comment on the stats she reports from Semiocast as I happen to be an employee of Twitter and Twitter does not publish country-wise numbers. But those stats are beside the point. Whatever they are, Sarah’s reasons have got to be incorrect because of the data presented in Facebook’s S-1 filing. To quote the relevant paragraph from Facebook’s S-1 [emphasis mine]:

As of December 31, 2011, we had 845 million MAUs, an increase of 39% from December 31, 2010. We experienced growth across different geographies, with users in Brazil and India representing a key source of growth. We had 161 million MAUs in the United States as of December 31, 2011, an increase of 16% from the prior year. We had 37 million MAUs in Brazil as of December 31, 2011, an increase of 268% from the prior year. Additionally, we had 46 million MAUs in India as of December 31, 2011, an increase of 132% from the prior year.

In other words, Facebook seems to be doing great in India. In fact, by some estimates, India is the number 2 country for Facebook after the US.

All four reasons mentioned by Sarah in her blog post (small online population, dysfunctional democracy, English not really being that common, not so rich middle class) are not specific to Twitter, and if they were right, Facebook’s stats in India would not have been what they are. As simple as that.

Understanding why (and when) any network gets popular in a certain region is a very difficult question. Social scientists (and more recently CS researchers) have grappled with it from many angles, and without much success. It would have been more interesting if @sarahcuda had blogged her thoughts about those nuances instead of using such a broad brush. (To be fair, she does express her surprise at the stats even given her “reasons”, but the thrust of the blog post is about coming up with generic reasons to explain the reported stats.)

Groupthink (and Centralized Bathrooms with Knowledge Spillovers)

Two recent articles have appeared in respected publications on Groupthink: first is an NYTimes op-ed piece by Susan Cain on January 15th, and second is an article by Jonah Lehrer in the January 30th issue of New Yorker. (see here for a scanned PDF of the full article if you are not a subscriber.) Both authors have upcoming books on group creativity[4,5]. I was intrigued enough to do further readings based on the works cited in these articles. This blog post summarizes and links to those works for anyone interested in digging deeper.

[As an aside, after reading the underlying papers and studies cited in these articles, I was actually disappointed with the liberal oversimplifications that appeared to have been made by these pop writers, routinely papering over the disclaimers and conditions mentioned in the original papers. But I will leave that for another blog post.]

Jonah Lehrer in the New Yorker

The traditional brainstorming mantra prescribes forgoing judgment and criticism during a brainstorming session so as to maximise the quantity of free-flowing ideas and to avoid any distraction from judgment. The word “brainstorm” and this mantra were put forth by Alex Osborn in his 1948 book “Your Creative Power.” However, it never stood up to empirical observations! Keith Sawyer, a psychologist at Washington University, is cited by Lehrer as summarizing the findings thus: “Decades of research have consistently shown that brainstorming groups think of far fewer ideas than the same number of people who work alone and later pool their ideas.” Ouch!

Lehrer does not advocate solitary work though. The fact is that it takes teams to produce substantive works. He refers to a study by Prof. Ben Jones which observed that the impact (quantified by the number of citations) of multi-author papers and patents (and especially across Tier-1 universities) is much higher than that of single-author ones.

How then should groups maximize creativity?

To develop the answer, Lehrer cites the studies of psychology Prof. Charlan Nemeth at Berkeley and sociologist Brian Uzzi at Northwestern.

Nemeth observes that the groups doing traditional brainstorming consistently underperformed those groups that were given the instruction to engage in debate during brainstorming. Lehrer quotes her: “Maybe debate is going to be less pleasant, but it will always be more productive.” Also “Authentic dissent can be difficult, but it’s always invigorating. It wakes us right up.” I personally observed the waking up part recently after receiving some unexpected but deserved criticism from a respected colleague.

Uzzi has studied for a long time the ideal composition of a team with respect to “Q” – a quantification by him of a team’s existing density of connections and mutual familiarity. Studying the composition of Broadway musical teams, he found that having worked together correlates positively with the success of the musical. But he also found rather surprisingly that a team being too familiar with each other was bad for its work! In other words, the best shows were produced by networks with an intermediate level of social intimacy. Lehrer says: “A team at the bliss point i.e., with the ideal level of Q between 2.4 and 2.6 was 3x more likely to be successful than one with Q lower than 1.4 or higher than 3.2.”

Centralized Bathrooms and Knowledge Spillovers

Lehrer then turns his attention to the organization of physical space to maximize productivity. He cites a study by Isaac Kohane from Harvard Medical School on the physical distance between collaborating biomedical researchers within Harvard. Kohane observed that the most cited papers were consistently by collaborators working within ten metres of each other, while the least cited ones tended to emerge from collaborators who were a kilometre or more apart. [This study was not done on a global framework, so should not mislead us to think that across-university collaborations are less impactful. On the contrary, this study by Ben Jones indicates that across-university collaborations have been consistently on the rise and much more impactful than solo. Disappointingly, Lehrer omits the disclaimers in Kohane's original paper as well as the across-university collaborative research.]

So there are two needs that must be reconciled: (1) collaborators need to work physically close together, and (2) collaborators need to be exposed to a moderate amount of diverse, novel perspectives. Lehrer talks about Steve Jobs’s deliberate attempts to create an environment at Pixar that encouraged chance interactions between people from diverse teams, creating various opportunities for “Knowledge Spillovers.” Finally, Lehrer talks about “Building 20” at MIT that was referred to as “the magical incubator”. Apparently, this building was under-designed and happened to be so “poorly structured” that scientists were forced to mingle. For example, the wings were oddly ordered, a large horizontal layout encouraged more chance interactions and individual scientists were free to reorganize their space as they needed. The lesson, Lehrer says, is that “the most creative spaces are those that hurl us together. It is the human friction that makes the sparks.”

Susan Cain in the New York Times

Susan Cain writes about “The Rise of the New Groupthink” in the January 13th op-ed. She first cites studies by psychologists Mihaly Csikszentmihalyi and Gregory Feist (and quotes many famous historical personalities) according to whom the most spectacularly creative people in many fields are often introverted, and people produce more and better quality work when alone than in groups. Note that this is in direct contrast to what Keith Sawyer says in his book “Group Genius”, namely that innovation comes from collaboration. Interestingly, Lehrer cited Sawyer while debunking the traditional theory of brainstorming.

About office plans, she writes: “Studies show that open-plan offices make workers hostile, insecure and distracted. They’re also more likely to suffer from high blood pressure, stress, the flu and exhaustion. And people whose work is interrupted make 50 percent more mistakes and take twice as long to finish it.”

Similar to Lehrer, Cain writes that brainstorming doesn’t work, partly because people instinctively or unconsciously mimic each other. She even cites the Emory University neuroscientist Gregory Berns, who found that when we take a stance different from the group’s, we activate the amygdala, a small organ in the brain associated with the fear of rejection.

However, according to Cain, electronic brainstorming works well – perhaps “because the screen protects us from too much groupthink.” This is surprising and counterintuitive to me.

Cain clarifies that she is not suggesting that teams should be abolished and everybody should be a loner, but that office spaces need to be designed such that people can have “casual, cafe like discussions but also be able to easily disappear into personal and private spaces”.

In my humble opinion, Cain might be mixing up individual and team productivity and creativity. Indeed, individuals are most productive when working in solitude, more so when we are talking about exceptional geniuses. However, many studies have shown by now that collaborative groups can do far more and better work than individuals working in isolation. An organization is after all not the sum of each individual’s achievements; it is the sum of each team’s achievements. I guess Cain’s book [4] throws more light on her argument.

Further Reading

[1] Group creativity: music, theater and collaboration. R. Keith Sawyer (2003).

[2] Group Creativity: Innovation through Collaboration. Paul B. Paulus and Bernard A. Nijstad (2003).

[3] Managing innovation: when less is more. Charlan Nemeth (1997).
Nemeth cites previous studies about “visionary” companies that state that such companies develop a cult-like sense of belonging and similarity in thinking that grows more similar over time. While this promotes cohesion and shared goals, the environment needed for creativity is directly the opposite (unless the creativity comes directly from the CEO). Nemeth suggests organizations should not just be tolerant, but especially welcoming of minority views. Interestingly, minority views create value independent of whether they are right or wrong, perhaps simply by breaking conformity and discouraging complacency.

[4] Quiet: The Power of Introverts in a World That Can’t Stop Talking. Susan Cain (2012).

[5] Imagine: How Creativity Works. Jonah Lehrer (coming up in March 2012).

Uses of Twitter as an alert system

I was recently asked by an acquaintance if there is a public compendium of the uses of Twitter as a warning or alert system. I asked around within Twitter and several colleagues gave many useful pointers and tidbits. Since I could not find any public compendium, I thought I would create one from these responses. Please note that there is nothing official about this compendium, and it is most likely incomplete right now. Also, it is likely to get stale quickly, though I would appreciate comments and help from anyone reading to keep this as inclusive as possible.

One important thing to note at the outset is that Twitter should not be solely relied upon as an emergency alert system. Twitter is still a new platform, and while it aims for high reliability, it is of course not (yet) 100% reliable. Hence, it should at best be used in addition to other warning systems.

Some illustrative Twitter accounts

Here are some examples of Twitter accounts used to provide alerts or warnings.

Other applications of Twitter

Other studies of use of Twitter and social media in crises

  • A report by Red Cross in August 2010. This was produced after a summit held by Red Cross, called “Emergency Social Data Summit”.
  • Analysing tweets was suggested to have been a quicker way of detecting and tracking the deadly cholera outbreak in Haiti than traditional methods, according to a study reported here.
  • Computer science researchers have systematically analyzed the problem of event detection using tweets as sensors. For instance, check out this publication titled “Earthquake shakes Twitter users: real-time event detection by social sensors.”

Work by Twitter itself in the area

Twitter, the company, has an “Ads for Good” program that gives away a quarter of a million dollars every year in pro-bono ads. 10K per month of these are given as both pre-emptive and post emergency critical tweets.

In 2010, Twitter partnered with one of Haiti’s leading wireless carriers, Voila, to allow users to get SMS updates from the @kwawouj Twitter account.

Weber-Fechner and power/lognormal laws

The Weber-Fechner relations (from the 19th century) state that the neural representations in the brain of sensory stimuli, objects and time perception vary logarithmically with the intensity of the stimuli, the number of objects and the length of the time interval respectively.

Said mathematically, dP (change in perception) is proportional to dS/S (change in stimulus divided by stimulus magnitude), i.e., dP = k dS/S for some constant k. Integrating gives P = k ln S + C, so P ~ O(ln S).
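To make the logarithmic relation concrete, here is a minimal Python sketch. The function name, the constant k and the reference stimulus S0 (the level at which perception is taken to be zero) are assumptions chosen purely for illustration; they are not part of the Weber-Fechner statement above.

```python
import math

def perceived_intensity(stimulus, k=1.0, reference=1.0):
    """Integrated form of dP = k * dS / S, i.e. P = k * ln(S / S0).

    k and reference (S0) are hypothetical constants for illustration.
    """
    return k * math.log(stimulus / reference)

# Doubling the stimulus adds the same perceived amount at any scale.
small_step = perceived_intensity(20) - perceived_intensity(10)
large_step = perceived_intensity(2000) - perceived_intensity(1000)
print(small_step, large_step)  # both are approximately ln 2
```

The point of the sketch is that equal ratios of stimulus, not equal differences, produce equal changes in perception.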

Some examples to illustrate what this means:

  • The higher the image resolution, the larger the further increase needed to make a difference perceptible at all.
  • The perceived costs/benefits of information production are proportional to the log of the amount of information that already exists (which is why we feel much more excited to work in green areas than in well-trodden ones).

It also so happens that maximization of information content (entropy) under the above condition generates a power law distribution of the frequency and size of information when the information is dependent on one dimension (such as images that depend on resolution), or a log-normal distribution when the information is dependent on two dimensions (e.g., video/audio that depend on both resolution and time). This is explained in this paper.

Interestingly, if it were just the economic costs involved (and not the neurophysiological), then the distributions would be expected to be exponential, not power or lognormal.

Random walks in symmetric social networks

Most symmetric social networks (such as Facebook, LinkedIn, IM) can be viewed either as directed graphs with bidirectional edges, or more simply as undirected graphs in which two connected nodes are joined by an undirected edge. A simple fact about random walks on such networks is the following:

Fact: A random walk on a finite, connected undirected graph with M edges has a stationary distribution π defined by \pi_i = deg(i)/2M. This distribution is unique because the graph is connected; if the graph is also aperiodic, the walk converges to it from any starting distribution. An analogous result holds when the graph is weighted: if we define deg(i) = \sum_{j \leftrightarrow i} w_{ij} and W = \sum_i deg(i), then \pi_i = deg(i)/W.

This is interesting because instead of computing a very expensive random walk “simulation” (or doing matrix computations) to calculate the stationary distribution, we have a simple, closed formula. For example, the nodes can be “ranked” simply by their degree!
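As a quick sanity check of the closed formula, the following Python sketch compares deg(i)/2M against the result of repeatedly applying the transition probabilities on a tiny graph. The graph (a triangle with one extra node hanging off it), the function names and the number of steps are all made up for illustration; the graph is chosen to be connected and aperiodic so that the iteration settles down.

```python
from collections import defaultdict

def degree_stationary(edges):
    """Closed form: pi_i = deg(i) / 2M for an undirected, unweighted graph."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    two_m = 2 * len(edges)
    return {node: d / two_m for node, d in deg.items()}

def iterated_stationary(edges, steps=200):
    """Apply pi <- pi P repeatedly, with p_ij = 1/deg(i), starting from uniform."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    pi = {node: 1 / len(nbrs) for node in nbrs}
    for _ in range(steps):
        nxt = dict.fromkeys(nbrs, 0.0)
        for node, mass in pi.items():
            share = mass / len(nbrs[node])  # mass leaving along each edge
            for neighbor in nbrs[node]:
                nxt[neighbor] += share
        pi = nxt
    return pi

# Hypothetical example graph: triangle 0-1-2 plus a pendant node 3 attached to 2.
EDGES = [(0, 1), (1, 2), (2, 0), (2, 3)]
closed = degree_stationary(EDGES)
iterated = iterated_stationary(EDGES)
for node in sorted(closed):
    print(node, closed[node], round(iterated[node], 6))
```

The closed form needs a single pass over the edges, whereas the iteration touches every edge at every step.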

Background

A ‘random walk’ is a useful construct in graph algorithms. It is typically defined on a directed graph where the walk moves from a state i to a state j with probability p_{ij}. One such move or transition is sometimes called a step of the walk. Such a random walk is also known as a (homogeneous) Markov chain, which is a chain of random variables \{X_j\} such that Prob(walk is in node i after step j) = Prob(X_j = i) = \sum_{k} Prob(X_{j-1}=k) \cdot p_{ki}.

Random walks have many applications in real-life networks. For example, the PageRank algorithm for scoring influence on the web and certain graph-based social recommendation algorithms are essentially random walk algorithms.

A good background on random walks and Markov chains is in [1].

Proof

The proof is very simple. A distribution π is stationary if \pi P = \pi, where P = (p_{ij}) is the stochastic matrix defining the transition probability of going from state i to state j. Now, with a uniform random walk, each edge out of i is taken with the same probability, i.e., p_{ij} = 1/deg(i). So, taking \pi_i = deg(i)/2M and summing over the deg(j) neighbors k of j,
(\pi P)_j = \sum_{k \leftrightarrow j} \pi_k \cdot p_{kj} = \sum_{k \leftrightarrow j} deg(k)/2M \cdot 1/deg(k) = \sum_{k \leftrightarrow j} 1/2M = deg(j)/2M = \pi_j.

A finite, connected graph is irreducible (every state can be reached from any other state), so the stationary vector is unique: it is the normalized left eigenvector corresponding to the largest eigenvalue ( = 1 ) of the stochastic matrix P [1]. If the graph is aperiodic as well, the walk converges to this vector from any starting distribution.

Real-life networks are certainly finite. They may not be connected or aperiodic, in which case care must be taken to artificially convert the graphs to be so. An example of an undirected graph with period 2 is simply the following graph: two states, 0 and 1, joined by a single edge.

Note that this can be made aperiodic by adding a self-edge from state 0 (or state 1) to itself.
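The effect of periodicity can be seen in a few lines of Python. This is only a sketch under stated assumptions: it takes the simplest period-2 graph (two states, 0 and 1, joined by one edge), and it assumes the added self-edge at state 0 counts as one of two equally likely moves out of state 0. The names P_PERIODIC, P_SELF_LOOP, step and run are invented for the example.

```python
def step(pi, P):
    """One step of the walk: returns the row vector pi P."""
    n = len(P)
    return [sum(pi[k] * P[k][j] for k in range(n)) for j in range(n)]

def run(pi, P, steps):
    """Apply `steps` transitions to the starting distribution pi."""
    for _ in range(steps):
        pi = step(pi, P)
    return pi

# Two states joined by a single edge: the walk must alternate, so period 2.
P_PERIODIC = [[0.0, 1.0],
              [1.0, 0.0]]

# Same graph with a self-edge at state 0 (assumed to be chosen half the time).
P_SELF_LOOP = [[0.5, 0.5],
               [1.0, 0.0]]

start = [1.0, 0.0]  # the walk starts in state 0
print(run(start, P_PERIODIC, 10), run(start, P_PERIODIC, 11))  # keeps flipping
print(run(start, P_SELF_LOOP, 100))  # settles near [2/3, 1/3]
```

In the periodic case [1/2, 1/2] is still stationary, but a walk started in one state never converges to it. With the self-edge, the degrees become 2 and 1, and the walk converges to deg(i) divided by the total degree of 3.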

Weighted Graphs

It turns out that the above-mentioned fact remains true even when the undirected graph is weighted, i.e., each edge ij has a weight w_{ij}. The transition matrix is then defined by p_{ij} = w_{ij}/W_i where W_i = \sum_{j \leftrightarrow i}w_{ij}. Also, let W = \sum_i W_i. The stationary vector is then defined by \pi_i = W_i/W. The proof is very similar to the above.
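The weighted case can be checked the same way. The sketch below computes \pi_i = W_i/W for a small weighted triangle and verifies that one step of the walk leaves it unchanged. The edge weights, node labels and function names are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

def node_weights(weighted_edges):
    """W_i: total weight of the edges incident on node i."""
    totals = defaultdict(float)
    for u, v, w in weighted_edges:
        totals[u] += w
        totals[v] += w
    return totals

def weighted_stationary(weighted_edges):
    """Closed form: pi_i = W_i / W, where W is the sum of all W_i."""
    totals = node_weights(weighted_edges)
    grand_total = sum(totals.values())
    return {node: t / grand_total for node, t in totals.items()}

def one_step(pi, weighted_edges):
    """Return pi P for the walk with p_ij = w_ij / W_i."""
    totals = node_weights(weighted_edges)
    nxt = dict.fromkeys(pi, 0.0)
    for u, v, w in weighted_edges:
        nxt[v] += pi[u] * w / totals[u]  # mass moving u -> v
        nxt[u] += pi[v] * w / totals[v]  # mass moving v -> u
    return nxt

# Hypothetical weighted triangle.
WEIGHTED_EDGES = [("a", "b", 2.0), ("b", "c", 1.0), ("a", "c", 3.0)]
pi = weighted_stationary(WEIGHTED_EDGES)
after = one_step(pi, WEIGHTED_EDGES)
print(pi)
print(after)  # matches pi up to floating-point rounding
```

Setting every weight to 1 recovers the unweighted formula deg(i)/2M.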

References

[1] Class notes from Prof. Bob Gallager from MIT