Brief analysis of the Gawker password dump

UPDATE 1: We’ve updated our analysis with approximately 200k additional cracked passwords.

UPDATE 2: We’ve launched a site that allows you to easily check if your username or email address was included in the Gawker password dump: http://didigetgawkered.com

UPDATE 3: Due to popular demand, we’ve posted the top 250 most common cracked passwords.

If you haven’t heard yet, the Gawker Media network, which includes popular websites such as Lifehacker, Gizmodo, Jezebel, io9, Jalopnik, Kotaku, Deadspin, Fleshbot, and of course Gawker, was compromised yesterday. The hacker group Gnosis posted a torrent containing a full dump of Gawker’s source code as well as the entire user database consisting of ~1.3 million usernames, email addresses, and DES-based crypt(3) password hashes. While this dump is not nearly on the scale of the RockYou incident, it is certainly a serious exposure.

As a two-factor authentication provider, we see situations like the Gawker hack as key illustrations of why strong authentication is a necessity. While users may not care about an attacker having access to their Gawker account, reusing the same password across websites and services poses a much bigger threat. Services that lack strong secondary authentication and whose users reuse passwords (which, let’s be honest, most users probably do) face the greatest risk. Attackers will undoubtedly be testing the cracked passwords against both personal and corporate services such as email accounts, online banking sites, and VPN remote access logins.

As it’s not very often that we get a glimpse into the human psychology of password selection, let’s dig deeper into the password dump!

John the Ripper

The de facto tool for cracking password hashes is John the Ripper (also known as JtR), written by Solar Designer. If possible, I’d highly recommend using the available patches for JtR that parallelize the cracking process using OpenMP. I ran our cracking session on an 8-core Xeon box:

b0x ~ # uname -a
Linux b0x 2.6.36-gentoo Sat Dec 4 20:11:03 EST 2010 x86_64 Intel(R) Xeon(R) CPU X5460 @ 3.16GHz GenuineIntel GNU/Linux

This puppy can crank out a decent number of cracks/second:

b0x ~ # john -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 20465K c/s real, 2562K c/s virtual
Only one salt: 16003K c/s real, 1999K c/s virtual

I’d also recommend using a comprehensive wordlist to assist in the cracking process. I compiled a wordlist of ~19M entries from a number of different sources (including datasets from Openwall and Skull Security):

b0x ~ # wc -l wordlist.txt
18966068 wordlist.txt

Cracking Results

Before getting started, we filtered the ~1.3M entries in the database dump down to ~748k crackable password hashes:

Loaded 748039 password hashes with 3844 different salts (Traditional DES [128/128 BS SSE2-16])
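
For readers who want to reproduce the filtering step, here is a minimal Python sketch. It is not the script we used: it assumes the hash column has already been pulled out of the dump into a list of strings, and the function names are hypothetical. It relies only on the shape of a traditional DES crypt(3) hash, which is 13 characters from the alphabet ./0-9A-Za-z with the salt in the first two characters.

```python
import re

# Traditional DES crypt(3) output: 13 characters drawn from [./0-9A-Za-z].
DES_CRYPT = re.compile(r"[./0-9A-Za-z]{13}")

def crackable(hashes):
    """Keep only strings that look like traditional DES crypt(3) hashes."""
    return [h for h in hashes if DES_CRYPT.fullmatch(h)]

def distinct_salts(hashes):
    """The salt is the first two characters of each hash."""
    return {h[:2] for h in hashes}

# Made-up sample values for illustration (not taken from the dump).
sample = [
    "ab" + "Z" * 11,   # well-formed, salt "ab"
    "ab" + "7" * 11,   # well-formed, same salt
    "cd/" + "a" * 10,  # well-formed, salt "cd"
    "",                # empty password field
    "NULL",            # placeholder value
    "tooshort",        # wrong length
]
kept = crackable(sample)
print(len(kept), "hashes,", len(distinct_salts(kept)), "salts")
```

As an aside, 3844 is exactly 62 x 62, which would be consistent with two-character salts drawn only from letters and digits rather than from the full 64-character crypt alphabet (which would allow 4096 salts).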

In just under an hour of cracking on a single 8-core machine, we had successfully cracked 190k passwords. Allowing JtR to continue to run yielded an additional 200k cracked passwords, resulting in a total of almost 400k cracked passwords representing over 50% of the total password hashes:

b0x ~ # wc -l output.txt
399380 output.txt
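
As a rough sanity check on these numbers, the figures quoted in this post can be combined into a back-of-the-envelope estimate. The Python sketch below assumes the "many salts" benchmark rate is sustained for the whole run, that each wordlist entry is tried exactly once against every salt, and that no word-mangling rules are applied. Those are simplifying assumptions, not a description of how the JtR session was actually configured.

```python
# Figures quoted in this post.
CRYPTS_PER_SEC = 20_465_000     # "Many salts" benchmark, real c/s
SALTS = 3844                    # distinct salts reported by JtR
WORDLIST_ENTRIES = 18_966_068   # lines in wordlist.txt
LOADED = 748_039                # hashes loaded by JtR
CRACKED = 399_380               # lines in output.txt

def wordlist_pass_seconds(words, salts, crypts_per_sec):
    """Each candidate word must be hashed once per distinct salt."""
    return words * salts / crypts_per_sec

seconds = wordlist_pass_seconds(WORDLIST_ENTRIES, SALTS, CRYPTS_PER_SEC)
print(f"one plain wordlist pass: about {seconds / 60:.0f} minutes")
print(f"cracked so far: {CRACKED / LOADED:.1%} of loaded hashes")
```

Under those assumptions a single plain pass over the wordlist takes a little under an hour, and the cracked fraction comes out slightly above 53%. The salt count matters here: with a single salt shared by all users, the same pass would need 3844 times fewer hash computations.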

Password Analysis

As with any password dump, one of the most interesting outcomes is the list of the most common passwords chosen by users. The top 25 most common passwords from our cracking results were:

   2516 123456
   2188 password
   1205 12345678
    696 qwerty
    498 abc123
    459 12345
    441 monkey
    413 111111
    385 consumer
    376 letmein
    351 1234
    318 dragon
    307 trustno1
    303 baseball
    302 gizmodo
    300 whatever
    297 superman
    276 1234567
    266 sunshine
    266 iloveyou
    262 fuckyou
    256 starwars
    255 shadow
    241 princess
    234 cheese
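
A frequency table like the one above can be produced in a few lines. This is a minimal Python sketch rather than the exact commands we ran; it assumes the cracked plaintexts are available as a list with one entry per account, and the function name is hypothetical.

```python
from collections import Counter

def top_passwords(passwords, n=25):
    """Return the n most common passwords as (password, count) pairs."""
    return Counter(passwords).most_common(n)

# Tiny made-up sample; a real run would read one password per line from a file.
sample = ["123456", "password", "123456", "qwerty", "123456", "password"]
for pw, count in top_passwords(sample, n=3):
    print(f"{count:7d} {pw}")
```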

The vast majority (99.45%) of the cracked passwords were alphanumeric and did not contain any special characters or symbols:

b0x ~ # cat pws.txt | egrep "^[a-zA-Z0-9]+$" | wc -l
397198

Of the passwords that were alphanumeric, about 61% were composed of strictly lowercase alphabetic characters, 9% were strictly numeric, less than 1% were strictly uppercase alphabetic characters, and the rest were mixed alphanumeric:

b0x ~ # cat pws.txt | egrep "^[a-z]+$" | wc -l
241208

b0x ~ # cat pws.txt | egrep "^[0-9]+$" | wc -l
34703

b0x ~ # cat pws.txt | egrep "^[A-Z]+$" | wc -l
2868
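
The same breakdown can be computed in a single pass instead of one grep per character class. The Python sketch below is an illustration only; the class names and the classify helper are hypothetical, and the patterns mirror the egrep expressions above.

```python
import re
from collections import Counter

# Checked in order, so the catch-all alphanumeric pattern only sees
# passwords that are not purely lowercase, numeric, or uppercase.
CLASSES = [
    ("lowercase", re.compile(r"[a-z]+")),
    ("numeric", re.compile(r"[0-9]+")),
    ("uppercase", re.compile(r"[A-Z]+")),
    ("mixed alphanumeric", re.compile(r"[a-zA-Z0-9]+")),
]

def classify(password):
    """Return the name of the first character class the password fits."""
    for name, pattern in CLASSES:
        if pattern.fullmatch(password):
            return name
    return "other"  # contains a symbol, whitespace, or is empty

def breakdown(passwords):
    """Count passwords per character class."""
    return Counter(classify(p) for p in passwords)

print(breakdown(["monkey", "123456", "ABC", "trustno1", "p@ss"]))
```

Subtracting the three counts above from 397198 leaves 118419 mixed alphanumeric passwords, or roughly 30% of the alphanumeric set.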

One interesting property of the dataset is the large number of unique passwords. There are a total of 202k unique passwords in the set of 400k cracked passwords. Of those unique passwords, approximately 155k (77%) are used by only a single user (i.e. they’ve selected a password that no one else has). Similarly, 24k (12%) are passwords that are shared by only two users and 8k (4%) are shared by only three users. The proportion of passwords used by a single user will surely decrease as more passwords are cracked by JtR and the odds of collisions between users increase.
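
These sharing figures are a "count of counts": first count how many accounts use each password, then count how many passwords fall into each bucket. A minimal Python sketch of that idea follows, using made-up sample data and a hypothetical function name.

```python
from collections import Counter

def sharing_histogram(passwords):
    """Map k -> number of distinct passwords used by exactly k accounts."""
    per_password = Counter(passwords)
    return Counter(per_password.values())

# Made-up sample: one password used three times, one twice, two used once.
sample = ["123456", "123456", "123456", "monkey", "monkey", "x9$kQ", "tr0ub4dor"]
hist = sharing_histogram(sample)
distinct = sum(hist.values())
for k in sorted(hist):
    print(f"used by {k} account(s): {hist[k]} ({hist[k] / distinct:.0%})")
```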

Domain Analysis

Besides the cracked passwords, we can also take a look at the email addresses contained in the database dump. The top 25 most common email domains are as follows:

 173942 gmail.com
 101959 yahoo.com
  72847 hotmail.com
  20551 aol.com
   8106 comcast.net
   6078 msn.com
   5835 mac.com
   4341 sbcglobal.net
   3397 hotmail.co.uk
   2531 verizon.net
   2204 cox.net
   2174 live.com
   2113 yahoo.co.uk
   2050 earthlink.net
   1939 yahoo.co.in
   1851 aim.com
   1626 mail.ru
   1619 bellsouth.net
   1490 googlemail.com
   1045 charter.net
    995 optonline.net
    990 yahoo.ca
    892 me.com
    888 rediffmail.com
    806 att.net
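
Counting domains comes down to taking everything after the last "@" and normalizing case. The Python sketch below shows one way to do it; the helper names are hypothetical and the addresses are invented examples, not entries from the dump.

```python
from collections import Counter

def email_domain(address):
    """Return the lowercased part after the last '@', or None if malformed."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower()

def top_domains(addresses, n=25):
    """Return the n most common email domains as (domain, count) pairs."""
    domains = (email_domain(a) for a in addresses)
    return Counter(d for d in domains if d).most_common(n)

sample = ["alice@example.com", "Bob@Example.COM ", "carol@example.org", "not-an-email"]
for domain, count in top_domains(sample):
    print(f"{count:7d} {domain}")
```

Lowercasing matters because domain names are case-insensitive, so Example.COM and example.com should land in the same bucket.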

Perhaps more interesting are some of the accounts registered with government email addresses, with domains ending in .gov. The following are some of the .gov domains contained in the Gawker dump and the number of occurrences of each:

     15 nasa.gov
      9 va.gov
      9 mail.house.gov
      7 usps.gov
      7 irs.gov
      7 cdc.gov
      6 ssa.gov
      6 dhs.gov
      5 michigan.gov
      5 mail.nih.gov
      4 usdoj.gov
      4 panynj.gov
      4 edd.ca.gov
      4 boe.ca.gov
      4 bls.gov
      3 ky.gov
      3 fnal.gov
      3 ed.gov
      3 dol.gov
      3 dc.gov
      3 cabq.gov
      2 wisconsin.gov
      2 whitehouse.gov
      2 utah.gov
      2 state.gov
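
Picking out the .gov entries is a suffix match on the domain, including the leading dot so that subdomains such as edd.ca.gov match while look-alikes do not. A minimal Python sketch, with hypothetical helper names and an illustrative input list:

```python
from collections import Counter

def is_gov(domain):
    """True if the domain ends in '.gov' (case-insensitive)."""
    return domain.lower().rstrip(".").endswith(".gov")

def gov_counts(domains):
    """Count occurrences of each .gov domain, most common first."""
    return Counter(d.lower() for d in domains if is_gov(d)).most_common()

# Illustrative input only; the counts here are not from the dump.
sample = ["nasa.gov", "NASA.gov", "edd.ca.gov", "example.com", "gov.uk", "example.notgov"]
for domain, count in gov_counts(sample):
    print(f"{count:7d} {domain}")
```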

Wrap-Up

We’ll be continuing to update this post with more statistics and analysis as the results come in!

If you’re an end user and think you may have registered an account with Gawker or one of its affiliated sites, be sure to change your password on any site where you use the same or a similar password as your Gawker account. In general, incidents like these are a good time to revisit your existing password schemes and ensure you are protecting your online accounts adequately.

If you’re an administrator who runs a website or service where your users are logging in with only a password, now is the time to beef up your security with some strong two-factor authentication. If your users happen to be sharing a password contained in the Gawker dump, their accounts could be at risk. Feel free to drop us a line at Duo to learn how easy it is to integrate two-factor authentication into your website, server, or remote access VPN!