Bald.

The (yet to be identified) link between hair and performance in the NBA

Motivation: wait, what?

The idea behind making this website was to goof around a bit with a new set of skills that I recently acquired and have been itching to put to the test. There is nothing really special about hair that would otherwise warrant a serious investigation into its influence on basketball skills. I just like basketball, and the NBA website is a good resource for player stats.

Game plan

The plan is to get player images, quantify the hair on their heads somehow and see whether this hair value correlates with any of their game stats. The road to fame from there is fairly obvious. Well, sort of.
  1. Web scraping: use Beautiful Soup to get player stats and images from the official NBA website
  2. Image analysis: identify hair (with PIL and possibly Scikit-learn) and quantify it
  3. Exploration: look for correlations between player stats and hair value using Pandas
  4. Prediction: use parametric and non-parametric fitting with SciPy and PyQt to make predictions based on any correlations that were found
  5. Plotting: make figures using Matplotlib
  6. Web design: make this website

Images

Getting the images was pretty simple really - Beautiful Soup did most of the heavy lifting. I got the soup, identified the links to player pages and used httplib to get the headshots (I only looked at the list of active players for simplicity). I then converted each image to greyscale with PIL, since all I really care about is the contrast between hair features and the face.

Figures: list of players; Carlos Boozer, color image; Carlos Boozer, greyscale image
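To make the link-extraction step concrete, here is a minimal sketch using Beautiful Soup on an inline HTML snippet. The markup, the `/players/` URL prefix and the `player_links` helper are all assumptions made for illustration; the real NBA pages are structured differently, and downloading the headshots is left out.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the player index page; the real markup differs.
SAMPLE_HTML = """
<html><body>
  <a href="/players/carlos_boozer">Carlos Boozer</a>
  <a href="/players/jordan_hill">Jordan Hill</a>
  <a href="/teams/some_team">Some Team</a>
</body></html>
"""


def player_links(html, prefix="/players/"):
    """Return (name, href) pairs for anchors whose href starts with `prefix`."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
        if a["href"].startswith(prefix)
    ]


links = player_links(SAMPLE_HTML)
for name, href in links:
    print(name, href)
```

Filtering on the link prefix is just one simple way to separate player pages from everything else; a CSS class or table position would work as well if the page provides one.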

Hair detection

This part was a little tricky. Initially, I thought that I could train a neural network to identify hair in the images, but the small sample size and the simplistic approach that I followed (not worrying about other facial features) made this unfeasible. Instead, I took a more empirical route. I smoothed the images using a median filter (with a kernel size of 5 pixels) and then identified contiguous regions in the resulting array. For this I only considered pixels whose value was above some threshold that I determined by trial and error. Finally, a given region was assumed to be "hair" if it extended spatially as far as the top of the head.
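The steps above can be sketched roughly as follows with NumPy and `scipy.ndimage`. This is an illustration under stated assumptions, not the original code: the `detect_hair` helper, the `top_row` argument (the image row taken to be the top of the head), the threshold of 100 and the use of a pixel count as the "hair value" are all hypothetical, and the image is a synthetic array rather than a real headshot.

```python
import numpy as np
from scipy import ndimage


def detect_hair(grey, threshold, top_row, kernel=5):
    """Return a boolean mask of regions that reach up to `top_row`.

    grey      : 2-D greyscale array
    threshold : pixels above this value are candidate hair (assumed value)
    top_row   : row index taken as the top of the head (assumed to be known)
    """
    # Smooth with a median filter, as described in the text.
    smoothed = ndimage.median_filter(grey, size=kernel)
    # Label contiguous regions of above-threshold pixels.
    labels, n_regions = ndimage.label(smoothed > threshold)
    hair_mask = np.zeros(grey.shape, dtype=bool)
    for region_id in range(1, n_regions + 1):
        region = labels == region_id
        rows = np.nonzero(region.any(axis=1))[0]
        # Keep the region only if it extends as far up as the top of the head.
        if rows.min() <= top_row:
            hair_mask |= region
    return hair_mask


# Synthetic 20x20 "headshot": one blob near the top, one lower down.
image = np.zeros((20, 20), dtype=np.uint8)
image[2:7, 5:15] = 200    # reaches the top of the head
image[12:16, 8:12] = 200  # lower blob, e.g. a deep shadow

hair_mask = detect_hair(image, threshold=100, top_row=3)
hair_value = int(hair_mask.sum())  # assumed definition: number of hair pixels
print(hair_value)
```

The lower blob survives the smoothing and thresholding but is rejected because it never reaches `top_row`, which is the role the "top of the head" criterion plays in the description above.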

The slider below shows the images, stats and detected hair for the first 30 players on the list. Overall, hair detection is pretty good, with only a few misidentified cases (mostly due to deep shadows):

Results

Not too surprisingly, the data show no correlation between the derived hair value and any of the other stats. This is pretty obvious from the figure below (made with matplotlib), which shows hair value as a function of a few key stats. For reference (and for somewhat fancier plotting), the player with the highest hair value on the list, Jordan Hill, is also noted:

Results figure
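A correlation check of this kind takes only a few lines of pandas. The sketch below runs on synthetic, randomly generated numbers, not the real player data, and the column names are made up for illustration; it only shows the mechanics of `DataFrame.corr`.

```python
import numpy as np
import pandas as pd

# Synthetic data only: these are NOT real player stats.
rng = np.random.default_rng(0)
n = 200
minutes = rng.uniform(5, 40, n)
stats = pd.DataFrame({
    "minutes": minutes,
    # Built to correlate with minutes, purely for demonstration.
    "points": 0.4 * minutes + rng.normal(0, 1.0, n),
    # Drawn independently of everything else.
    "hair_value": rng.uniform(0, 1, n),
})

# Pairwise Pearson correlation coefficients between all columns.
corr = stats.corr()

# Correlation of the hair value with each of the other stats.
hair_corr = corr["hair_value"].drop("hair_value")
print(hair_corr)
```

By construction, the synthetic hair column is independent of the other columns, so its coefficients come out close to zero, while `minutes` and `points` are strongly correlated.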

 

Non-parametric fitting and unexpected finds

It is actually pretty remarkable how many of the stats correlate linearly with one another, and more importantly, with time on the court. The following figure (made with pandas) shows this fairly clearly:
Plot of everything against everything

There are, however, a couple of correlations that are not best fitted by a simple line. It seems that players who are on the court the longest also score more points per minute than players who play fewer minutes. Those same players also commit more turnovers per minute than their peers. The two following figures show this clearly. The curves are two types of non-parametric fits to the data.
These correlations are of course driven in large part by a selection effect - players stay longer on the court if they are on average more efficient in scoring points. On the other hand, such players are more likely to be tired and make mistakes, thus losing the ball more frequently:

Figures: minutes vs. points; minutes vs. turnovers
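The text does not say which two non-parametric fits were used, so as a generic illustration here is a Nadaraya-Watson kernel smoother written in plain NumPy. The `kernel_smooth` helper, the Gaussian kernel, the bandwidth and the synthetic data are all assumptions for this sketch; none of it comes from the original analysis.

```python
import numpy as np


def kernel_smooth(x, y, x_eval, bandwidth):
    """Nadaraya-Watson estimate of y at x_eval using a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    # Scaled distance between every evaluation point and every data point.
    diffs = (x_eval[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    # Weighted average of y, with weights falling off with distance in x.
    return (weights @ y) / weights.sum(axis=1)


# Synthetic data only: a gently curved trend plus noise, NOT real stats.
rng = np.random.default_rng(1)
minutes = rng.uniform(5, 40, 300)
points_per_min = 0.2 + 0.0004 * minutes ** 2 + rng.normal(0, 0.02, 300)

grid = np.linspace(10, 35, 6)
fit = kernel_smooth(minutes, points_per_min, grid, bandwidth=3.0)
print(np.round(fit, 3))
```

No functional form is imposed here: the curve is a local weighted average, and the bandwidth controls how smooth it is. A smaller bandwidth follows the data more closely at the cost of a noisier curve.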

Conclusion

Well, you should certainly cut your hair, but only if you'd like to. Just don't expect it to make you better (or worse) at basketball, at least not based on any evidence from here...

Thoughts, ideas or other complaints? Please let me know at: