this didn't stop baseball teams from spending a century overindexing on visually obvious criteria such as height and race.
I was reminded of this the other day when I saw a thread on Twitter (which I don't want to link to because I don't want to make this about the person or the thread in particular — the opinion is one I frequently see from people who are successful). In the thread, a very successful person talks about how they got started, saying that they were able to talk their way into an elite institution despite being unqualified, and uses this story to conclude that elite gatekeepers are basically just scouting for talent and that you just need to show people that you have talent.
I've heard this kind of story from other successful people, who tend to come to bimodal conclusions on what it all means. Some conclude that the world correctly recognized their talent and that this is how the world works; talent gets recognized and rewarded. Others conclude that the world is fairly random with respect to talent being rewarded and that they got lucky to get rewarded for their talent when many other people with similar talents who used similar strategies were passed over2.
Another time I was reminded of old baseball scouting reports was when I heard about how a friend of mine who's now an engineering professor at a top Canadian university got there. Let's call her Jane. When Jane was an undergrad at the university she's now a professor at, she was sometimes helpfully asked "are you lost?" when she was on campus. Sometimes this was because, as a woman, she didn't look like she was in the right place when she was in an engineering building. Other times, it was because she looked and talked like someone from rural Canada. Once, a security guard thought she was a homeless person who had wandered onto campus. After a few years, she picked up the right clothes and mannerisms to pass as "the right kind of person", with help from her college friends, who explained to her how one is supposed to talk and dress. But when she was younger, people's first impression was that she was an admin assistant, and now their first impression is that she's a professor's wife because they don't expect a woman to be a professor in her department. She's been fairly successful, but it's taken a lot more work than it would've for someone who looked the part.
Another thing that reminded me of how funny baseball scouting reports are is a conversation I had with Ben Kuhn a while back.
Me: it's weird how tall so many of the men at my level (senior staff engineer) are at big tech companies. In recent memory, I think I've only been in a meeting with one man who's shorter than me at that level or above. I'm only 1" shorter than U.S. average! And the guy who's shorter than me has worked remotely for at least a decade, so I don't know if people really register his height. And people seem to be even taller on the management track. If I look at the VPs I've been in meetings with, they must all be at least 6' tall.
Ben: Maybe I could be a VP at a big tech company. I'm 6' tall!
Me: Oh, I guess I didn't know how tall 6' tall is. The VPs I'm in meetings with are noticeably taller than you. They're probably at least 6'2"?
Ben: Wow, that's really tall for a minimum. 6'2" is 96%-ile for U.S. adult male
When I've discussed this with successful people who work in big companies of various sorts (tech companies, consulting companies, etc.), men who would be considered tall by normal standards, 6' or 6'1", tell me that they're frequently the shortest man in the room during important meetings. 6'1" is just below the median height of a baseball player. There's something a bit odd about height seeming more correlated with success as a consultant or a programmer than in baseball, where height directly conveys an advantage. One possible explanation is a halo effect, where positive associations with tall or authoritative-seeming people contribute to their success.
When I've seen this discussed online, someone will point out that this is because height and cognitive performance are correlated. But if we look at the literature on IQ, the correlation isn't strong enough to explain something like this. We can also observe this if we look at fields where people's mental acuity is directly tested by something other than an IQ test, such as chess, where most top players are around average height, with some outliers in both directions. Even without looking at the data in detail, this should be expected because the correlation between height and IQ is weak, with much of the correlation due to the relationship at the low end3, and the correlation between IQ and performance in various mental tasks is also weak (some people will say that it's strong by social science standards, but that's very weak in terms of actual explanatory power even at the population level, and it's even weaker at the individual level). And then if we look at chess in particular, we can see that the correlation is weak, as expected.
Since the correlation is weak, and there are many more people around average height than not, we should expect that most top chess players are around average height. If we look at the most dominant chess players in recent history, Carlsen, Anand, and Kasparov, they're 5'8", 5'8", and 5'9", respectively (if you look at different sources, they'll claim heights of plus or minus a couple inches, but still with a pretty normal range; people often exaggerate heights; if you look at people who try to do real comparisons either via photos or in person, measured heights are often lower than what people claim their own height is4).
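The statistical claim here, that a weak correlation plus many more average-height people implies top performers should mostly be around average height, can be checked with a quick Monte Carlo sketch (all numbers below are assumptions for illustration, not measured values):

```python
import random
import statistics

random.seed(0)

N = 100_000
r = 0.2  # assumed weak height/performance correlation, for illustration

# Draw height and performance as correlated standard normals:
# perf = r * height + sqrt(1 - r^2) * independent noise
heights = []
perfs = []
for _ in range(N):
    h = random.gauss(0, 1)
    p = r * h + (1 - r**2) ** 0.5 * random.gauss(0, 1)
    heights.append(h)
    perfs.append(p)

# Mean height (in standard deviations above average) of the top 0.1% of performers
top = sorted(range(N), key=lambda i: perfs[i], reverse=True)[: N // 1000]
mean_top_height = statistics.mean(heights[i] for i in top)
print(round(mean_top_height, 2))
```

With r = 0.2, the top performers average well under one standard deviation above mean height, so most of them fall within the ordinary range, consistent with what we observe among top chess players.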
It's a bit more difficult to find heights of go and shogi players, but it seems like the absolute top modern players from this list I could find heights for (Lee Sedol, Yoshiharu Habu) are roughly in the normal range, with there being some outliers in both directions among elite players who aren't among the best of all time, as with chess.
If it were the case that height or other factors in appearance were very strongly correlated with mental performance, we would expect to see a much stronger correlation between height and performance in activities that relatively directly measure mental performance, like chess, than we do between height and career success, but it's the other way around, which seems to indicate that the halo effect from height is stronger than any underlying benefits that are correlated with height.
If we look at activities where there's a fair amount of gatekeeping before people are allowed to really show their skills but where performance can be measured fairly accurately and where hiring better employees has an immediate, measurable, direct impact on company performance, such as baseball and hockey, we can see that people went with their gut instinct over data for decades after there were public discussions about how data-driven approaches found large holes in people's intuition.
If we then look at programming, where it's somewhere between extremely difficult and impossible to accurately measure individual performance, and where the impact of individual performance on company success is much less direct than in sports, how accurate should we expect talent assessment to be?
The pessimistic view is that it seems implausible that we should expect that talent assessment is better than in sports, where it took decades of there being fairly accurate and rigorous public write-ups of performance assessments for companies to take talent assessment seriously. With programming, talent assessment isn't even far enough along that anyone can write up accurate evaluations of people across the industry, so we haven't even started the decades long process of companies fighting to keep evaluating people based on personal opinions instead of accurate measurements.
Jobs have something equivalent to old school baseball scouting reports at multiple levels. At the hiring stage, there are multiple levels of filters that encode people's biases. A classic study on this is Marianne Bertrand and Sendhil Mullainathan's paper, which found that "white sounding" names on resumes got more callbacks for interviews than "black sounding" names and that having a "white sounding" name on a resume increased the returns to having better credentials on the resume. Since then, many variants of this study have been done, e.g., resumes with white sounding names do better than resumes with Asian sounding names, professors with white sounding names on their CVs are evaluated as having better interpersonal skills than professors with black and Asian sounding names on their CVs, etc.
The literature on promotions and leveling is much weaker, but I and other folks who are in highly selected environments that effectively require multiple rounds of screening, each against more and more highly selected folks, such as VPs, senior (as in "senior staff"+) ICs, professors at elite universities, etc., have observed that filtering on height is as severe as or more severe than in baseball but less severe than in basketball.
That's curious when, in mental endeavors where the "promotion" criteria are directly selected by performance, such as in chess, height appears to only be very weakly correlated to success. A major issue in the literature on this is that, in general, social scientists look at averages. In a lot of the studies, they simply produce a correlation coefficient. If you're lucky, they may produce a graph where, for each height, they produce an average of something or other. That's the simplest thing to do but this only provides a very coarse understanding of what's going on.
Because I like knowing how things tick, including organizations and people's opinions, I've (informally, verbally) polled a lot of engineers about what they thought about other engineers. What I found was that there was a lot of clustering of opinions, resulting in clusters of folks that had rough agreement about who did excellent work. Within each cluster, people would often disagree about the ranking of engineers, but they would generally agree on who was "good to excellent".
One cluster was (in my opinion; this could, of course, also just be my own biases) people who were looking at the output people produced and were judging people based on that. Another cluster was of people who were looking at some combination of height and confidence and were judging people based on that. This one was a mystery to me for a long time (I've been asking people questions like this and collating the data out of habit, long before I had the idea to write this post and, until I recognized the pattern, I found it odd that so many people who have good technical judgment, as evidenced by their ability to do good work and make comments showing good technical judgment, highly evaluated so many people who so frequently said blatantly incorrect things and produced poorly working or even non-working systems). Another cluster was around credentials, such as what school someone went to or what the person was leveled at or what prestigious companies they'd worked for. People could have judgment from multiple clusters, e.g., some folks would praise both people who did excellent technical work as well as people who are tall and confident. At higher levels, where it becomes more difficult to judge people's work, relatively fewer people based their judgment on people's output.
When I did this evaluation collating exercise at the startup I worked at, there was basically only one cluster and it was based on people's output, with fairly broad consensus about who the top engineers were, but I haven't seen that at any of the large companies I've worked for. I'm not going to say that means evaluation at that startup was fair (perhaps all of us were falling prey to the same biases), but at least we weren't falling prey to the most obvious biases.
Back to big companies: if we look at what it would take to reform the promotion system, it seems difficult because many individual engineers are biased. Some companies have committees handle promotions in order to reduce bias, but the major inputs to the system still carry strong biases: the committee uses, as input, recommendations from people, many of whom give those biases more weight than their technical judgment. Even if we, hypothetically, introduced a system that identified whose judgments were highly correlated with factors that aren't directly relevant to performance and gave those recommendations no weight, people's opinions often limit the work that someone can do. A complaint I've heard from some folks who are junior is that they can't get promoted because their work doesn't fulfill promo criteria; when they ask to be allowed to do work that could get them promoted, they're told they're too junior for that kind of work. They're generally stuck at their level until they find a manager who believes in their potential enough to give them work that could plausibly result in a promo if they do a good job. Another factor that interacts with this is that it's easier to transfer to a team where high-impact work is available if you're doing well and/or have high "promo velocity", i.e., are getting promoted frequently, and harder if you're doing poorly or even just have low promo velocity without doing particularly poorly. At higher levels, it's uncommon to not be able to do high-impact work, but it's also very difficult to separate the impact of individual performance from the impact of bias because a lot of performance is about who you can influence, which means trying to influence people who are biased if you need influence at scale, which you generally do to get promoted at higher levels. The nested, multi-level impact of bias makes it difficult to change the system in a way that would remove bias's effect.
Although it's easy to be pessimistic when looking at the system as a whole, it's also easy to be optimistic when looking at what one can do as an individual. It's pretty easy to do what Bill Wight (the scout known for recommending "funny looking" baseball players) did and ignore what other people incorrectly think is important5. I worked for a company that did this which had, by far, the best engineering team of any company I've ever worked for. They did this by ignoring the criteria other companies cared about, e.g., hiring people from non-elite schools instead of focusing on pedigree, not ruling people out for not having practiced solving abstract problems on a whiteboard that people don't solve in practice at work, not having cultural fit criteria that weren't related to job performance (they did care that people were self-directed and would function effectively when given a high degree of independence), etc.6
Thanks to Reforge - Engineering Programs and Flatirons Development for helping to make this post possible by sponsoring me at the Major Sponsor tier.
Also, thanks to Peter Bhat Harkins, Yossi Kreinin, Pam Wolf, Laurie Tratt, Leah Hanson, Kate Meyer, Heath Borders, Leo T M, Valentin Hartmann, Sam El-Borai, Vaibhav Sagar, Nat Welch, Michael Malis, Ori Berstein, and Malte Skarupke for comments/corrections/discussion.
This post used height as a running example because it's something that's both easy to observe to be correlated with success in men and has been studied across a number of fields. I would guess that social class markers / mannerisms, as in the Jane example from this post, have at least as much impact. For example, a number of people have pointed out to me that the tall, successful people they're surrounded by say things with very high confidence (often incorrect things, but said confidently) and also have mannerisms that convey confidence and authority.
Other physical factors also seem to have a large impact. There's a fairly large literature on how much the halo effect causes people who are generally attractive to be rated more highly on a variety of dimensions, e.g., morality. There's a famous ask metafilter (reddit before there was reddit) answer to a question that's something like "how can you tell someone is bad?" and the most favorited answer (I hope for ironic reasons, although the answerer seemed genuine) is that they have bad teeth. Of course, in the U.S., having bad teeth is a marker of childhood financial poverty, not impoverished moral character. And, of course, gender is another dimension that people appear to filter on for reasons unrelated to talent or competence.
Another is just random luck. To go back to the baseball example, one of the few negative scouting reports on Chipper Jones came from a scout who said
Was not aggressive w/bat. Did not drive ball from either side. Displayed non-chalant attitude at all times. He was a disappointment to me. In the 8 games he managed to collect only 1 hit and hit very few balls well. Showed slap-type swing from L.side . . . 2 av. tools
Another scout, who saw him on more typical days, correctly noted
Definite ML prospect . . . ML tools or better in all areas . . . due to outstanding instincts, ability, and knowledge of game. Superstar potential.
Another similarly noted:
This boy has all the tools. Has good power and good basic approach at the plate with bat speed. Excellent make up and work-habits. Best prospect in Florida in the past 7 years I have been scouting . . . This boy must be considered for our [1st round draft] pick. Does everything well and with ease.
There's a lot of variance in performance. If you judge performance by watching someone for a short period of time, you're going to get wildly different judgments depending on when you watch them.
If you read the blind orchestra audition study that everybody cites, the study itself seems poor quality and unconvincing, but it also seems true that blind auditions were concomitant with an increase in orchestras hiring people who didn't look like what people expected musicians to look like. Blind auditions, where possible, seem like something good to try.
As noted previously, a professor remarked that doing hiring over zoom accidentally made height much less noticeable than normal and resulted in at least one university department hiring a number of professors who are markedly less tall than professors who were previously hired.
Me on how tech interviews don't even act as an effective filter for the main thing they nominally filter for.
Me on how prestige-focused tech hiring is.
@ArtiKel on Cowen and Gross's book on talent and on funding people over projects. A question I've had for a long time is whether the less-mainstream programs that convey prestige via some kind of talent selection process (Thiel Fellowship, grants from folks like Tyler Cowen, Patrick Collison, Scott Alexander, etc.) are less biased than traditional selection processes or just differently biased. The book doesn't appear to really answer this question, but it's food for thought. And BTW, I view these alternative processes as highly valuable even if they're not better and, actually, even if they're somewhat worse, because their existence gives the world a wider portfolio of options for talent spotting. But, even so, I would like to know if the alternative processes are better than traditional processes.
Alexey Guzey on where talent comes from.
An anonymous person on talent misallocation.
Thomas Ptacek on actually attempting to look at relevant signals when hiring in tech.
Me on the use of sleight of hand in an analogy meant to explain the importance of IQ and talent, where the sleight of hand is designed to make it seem like IQ is more important than it actually is.
Jessica Nordell on trans experiences demonstrating differences between how men and women are treated.
The Moneyball book, of course. Although, for the real nerdy details, I'd recommend reading the old baseballthinkfactory archives from back when the site was called "baseball primer". Fans were, in real time, calling out who would be successful, generally with greater success than the baseball teams of the era. The site died off as baseball teams started taking stats seriously, leaving fan analysis in the dust since teams have access to much better fine-grained data as well as more time to spend on serious analysis than hobbyists, but it was interesting to watch hobbyists completely dominate the profession using basic data analysis techniques.
Estimates range from 0 to 0.3, with Teasdale et al. finding that the correlation decreased over time (speculated to be due to better nutrition) and Teasdale et al. finding that the correlation was significantly stronger than average in the bottom tail (bottom 2% of height) and significantly weaker than average in the top tail (top 2% of height), indicating that much of the overall correlation comes from factors that cause both reduced height and reduced IQ.
In general, a correlation coefficient of x explains x^2 of the variance. So even if the correlation were not weaker at the high end and we had a correlation coefficient of 0.3, that would only explain 0.3^2 = 0.09 of the variance, i.e., 1 - 0.09 = 0.91 of the variance would be due to other factors.
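The footnote's arithmetic can be sketched directly (a minimal illustration of the r-squared rule, with 0.3 as the assumed upper-end estimate of the correlation):

```python
def variance_explained(r: float) -> float:
    """Fraction of variance explained by a linear relationship
    with correlation coefficient r."""
    return r ** 2

r = 0.3  # upper end of reported height/IQ correlation estimates
explained = variance_explained(r)
print(f"{explained:.2f} explained, {1 - explained:.2f} left to other factors")
# prints: 0.09 explained, 0.91 left to other factors
```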
On the other side of the table (what one can do when being assessed), I've noticed that, at work, unless people are familiar with my work, they generally ignore me in group interactions like meetings. Historically, things that have gotten people to stop ignoring me include: doing an unreasonably large amount of high-impact work in a short period of time (while not working long hours), often solving a problem that people thought was impossible to solve in the timeframe, which made my work very difficult not to notice; having a person who appears more authoritative than me get the attention of the room and ask people to listen to me; and finding groups (teams or orgs) that care more about the idea than the source of the idea. More recently, some things that have worked are writing this blog and using mediums where a lot of the cues that people use as proxies for competence aren't there (Slack and, to a lesser extent, video calls).
In some cases, the pandemic has accidentally caused this to happen in some dimensions. For example, a friend of mine mentioned to me that their university department did video interviews during the pandemic and, for the first time, hired a number of professors who weren't strikingly tall.
When at a company that has biases in hiring and promo, it's still possible to go scouting for talent in a way that's independent of the company's normal criteria. One method that's worked well for me is to hire interns, since the hiring criteria for interns tends to be less strict. Once someone is hired as an intern, if their work is great and you know how to sell it, it's easy to get them hired full-time.
For example, at Twitter, I hired two interns to my team. One, as an intern, wrote the kernel patch that solved the container throttling problem (at the margin, worth hundreds of millions of dollars a year) and has gone on to do great, high-impact work as a full-time employee. The other, as an intern, built out across-the-fleet profiling, a problem many full-time staff+ engineers had wanted to solve but that no one had solved, and is joining Twitter as a full-time employee this fall. In both cases, the person was overlooked by other companies for silly reasons. In the former case, there was a funny combination of reasons other companies weren't interested in hiring them for a job that utilized their skillset, including location / time zone (Australia). From talking to them, it was clear they had deep knowledge about computer performance that would be very rare even in an engineer with a decade of "systems" experience. There were jobs available to them in Australia, but teams doing performance work at the other big tech companies weren't really interested in taking on an intern in Australia. For the kind of expertise this person had, I was happy to shift my schedule a bit later for a while until they ramped up, and it turned out that they were highly independent and didn't really need guidance to ramp up (we talked a bit about problems they could work on, including the aforementioned container throttling problem, and then they came back with proposed approaches and solved the problem). In the latter case, they were a student who was very early in their university studies. The most desirable employers often want students who have more classwork under their belt, so we were able to hire them without much competition.
Waiting until a student has a lot of classes under their belt might be a good strategy on average, but this particular intern candidate had written some code that was good for someone with that level of experience and they'd shown a lot of initiative (they reverse engineered the server protocol for a dying game in order to reimplement a server so that they could fix issues that were killing the game), which is a much stronger positive signal than you'll get out of interviewing almost any 3rd year student who's looking for an internship.
Of course, you can't always get signal on a valuable skill, but if you're actively scouting for people, you don't need to always get signal. If you occasionally get a reliable signal and can hire underrated people who you have good signal on, that's still valuable! For Twitter, in three intern seasons, I hired two interns, the first of whom has already made "staff" and the second of whom should get there very quickly based on their skills as well as the impact of their work. In terms of ROI, spending maybe 30 hours a year on the lookout for folks who had very obvious signals indicating they were likely to be highly effective was one of the most valuable things I did for the company. The ROI would go way down if the industry as a whole ever started using effective signals when hiring but, for the reasons discussed in the body of this post, I expect progress to be slow enough that we don't really see the amount of change that would make this kind of work low ROI in my lifetime.