  88   88    88888888ba   88888888ba   88  88b           d88  88                          88
  88   88    88      "8b  88      "8b  ""  888b         d888  88                          88         
aa88aaa88aa  88      ,8P  88      ,8P      88`8b       d8'88  88                          88         
""88"""88""  88aaaaaa8P'  88aaaaaa8P'  88  88 `8b     d8' 88  88   ,adPPYba,  ,adPPYYba,  88   ,d8   
aa88aaa88aa  88""""""8b,  88""""""'    88  88  `8b   d8'  88  88  a8P_____88  ""     `Y8  88 ,a8"    
""88"""88""  88      `8b  88           88  88   `8b d8'   88  88  8PP"""""""  ,adPPPPP88  8888[      
  88   88    88      a8P  88           88  88    `888'    88  88  "8b,   ,aa  88,    ,88  88`"Yba,   
  88   88    88888888P"   88           88  88     `8'     88  88   `"Ybbd8"'  `"8bbdP"Y8  88   `Y8a  
Found German secret Internet censorship list as hashes and recovered >99% of the URLs.

tl;dr: Germany has a federal censorship agency called BPjM which maintains a secret list of about 3000 URLs. To keep the list secret, it is distributed in the form of md5 or sha1 hashes as the "BPjM-Modul". They think this is safe. This leak explains in detail that it is in fact very easy to extract the hashed censorship list from home routers or child protection software and to calculate the cleartext entries. It also provides a first analysis of the sometimes absurd entries on such a governmental Internet censorship list.

Introduction to the BPjM

The Federal Department for Media Harmful to Young Persons (German: "Bundesprüfstelle für jugendgefährdende Medien" or BPjM) is an upper-level German federal agency subordinate to the Federal Ministry of Family Affairs, Senior Citizens, Women and Youth. It is responsible for examining media works allegedly harmful to young people and entering these onto an official list – a process known as Indizierung (indexing) in German. The decision to index a work has a variety of legal implications. [...] Germany is the only western democracy with an organization like the BPjM. The rationales for earlier decisions to add works to the index are, in retrospect, incomprehensible reactions to moral panics.
Quote by Wikipedia
The censorship list ("index") is split into various sublists:
The sublists A, B and E contain about 3000 movies, 400 games, 900 printed works and 400 audio recordings. These sublists are published quarterly in the magazine "BPjM-aktuell", which can be read in any major library in Germany.

The sublists C and D were also published in BPjS-aktuell (now BPjM-aktuell) up to edition 2003-01.
Since then the list of indexed virtual media is considered secret. As of July 2014 it contains more than 3000 URLs.

In order to make use of a secret censorship list, the BPjM offers the "BPjM-Modul", which is a list of cryptographic hashes representing the censored URLs. The list is distributed about once per month to more than 27 companies which offer child protection software or DSL/Cable routers (for example AVM FRITZ!Box Router, Draytek Vigor Router, Telekom Kinderschutz Software, Salfeld Kindersicherung and Cybits JusProg and Surfsitter). These companies usually implement the blocklist as opt-in: users have to enable it by choice to filter the websites. Additionally, major search engines like Google, Bing or Yahoo agreed to filter their results in Germany based on the list. They can download the (cleartext) list from a server of the FSM (Freiwillige Selbstkontrolle Multimedia-Diensteanbieter e. V.). In contrast to the opt-in approach of the router manufacturers, the search engines filter all results served to German users, and it is not possible to opt out.

In 2011, "porno lawyer" Marko Dörre requested access to the list in order to do his work. This was denied two years later in the court decision VG Köln, 2013-07-04 – 13 K 7107/11, which states that publication of the list could harm public safety. The court further justifies its decision by stating that agreements are in place with the 27 companies which have access to the hashed blacklist to ensure that the list stays secret. These methods could be considered safe, as no unauthorized use of the module data has become known since its creation in 2005.

This leak proves that the BPjM-Modul is not a secure way to distribute a secret Internet censorship list. It is not difficult at all to extract the list from different sources and to calculate the cleartext URLs from the hashes. It also proves that secret Internet censorship lists are of poor quality, with many outdated and absurd entries harming legitimate businesses.

BPjM-Modul implementations

There are at least three different technical implementations of the BPjM-Modul currently in use. The two implementations that use hashes are described in detail below.

BPjM-Modul implementation with separate md5 hashes for domain and path

This format is for example used by AVM on the FRITZ!Box cable/DSL routers.

Each entry consists of 3 hex values:
  • domain – md5 hash of the domain of the entry. The cleartext always starts with "http://" and never contains the www subdomain (but may contain other subdomains like www3). For example d7d6c7dd3e6592ab4d2c88b7305d6f20 is the md5 hash of "http://youporn.com".
  • path – md5 hash of the URL path of the entry, without a leading slash. In most cases it is d41d8cd98f00b204e9800998ecf8427e, the hash of the empty string (= complete domain blocked). Another example is eacf331f0ffc35d4b482f1d15a887d3b for "index.html".
  • depth – two hex digits (one byte) representing the "path length" of the entry, so that each entry is 66 hex characters in total. Mostly it is 00 for no depth, which means the complete domain is blocked. The value 00 is also used if the entry represents a filename without a directory, like "index.html". 01 stands for an entry with at least one slash, like "directory/". The highest depth seen so far is 04, for an entry like "dir/foo/bar/bla/".
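
The three fields can be reproduced with standard tools. The following is only a sketch: the helper names md5_hex and bpjm_md5_entry are made up for illustration, "example.com" is a placeholder, and the rule "depth = number of slashes in the path" is an assumption inferred from the examples above, not a documented specification.

```shell
#!/bin/sh
# Sketch: build one entry in the md5 format (domain hash, path hash, depth).
# md5_hex and bpjm_md5_entry are hypothetical helper names.

# Print the md5 hex digest of exactly the given string (no trailing newline).
md5_hex() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Usage: bpjm_md5_entry DOMAIN [PATH]
# DOMAIN is given without "http://" and without "www.",
# PATH without a leading slash (empty = whole domain).
bpjm_md5_entry() {
    domain="$1"
    path="${2:-}"
    # Assumption: the depth byte equals the number of slashes in the path.
    depth=$(printf '%s' "$path" | tr -cd '/' | wc -c)
    printf '%s %s %02x\n' \
        "$(md5_hex "http://$domain")" \
        "$(md5_hex "$path")" \
        "$((depth))"
}

bpjm_md5_entry "example.com"                      # whole domain
bpjm_md5_entry "example.com" "index.html"         # single file, depth 00
bpjm_md5_entry "example.com" "dir/foo/bar/bla/"   # depth 04
```

For a whole-domain entry the second column is the md5 of the empty string quoted above, and the third column is 00.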

BPjM-Modul implementation with salted sha1 hash of the URL

    The child protection software "Telekom Kindersicherung" includes a BPjM-Modul which is quite different from the (apparently older) md5 implementation. The software ships with the 143 kB file BPjMInspect.dll, which downloads new blacklists from the t-online.de webserver as an XML file. The structure of the XML file is as follows:
    <?xml version="1.0" encoding="utf-8" ?>
    <bpjmencodedlist>
      <table_a>
        <entry>00168D58328DF6363331B6CD944F2B9EC14A9DF366E9</entry>
        ...
        <entry>000EAEA17218F15DCDEC54752360A91C7CBFF96BC1E9</entry>
      </table_a>
      <table_b>
        <entry>000EB30D02BE3A08A34D75271E66DC3B4804E80292FC</entry>
        ...
        <entry>0020CDCBB0EE01AD4989FD299659BB22B202C4963CDF</entry>
      </table_b>
      <table_c>
        <entry>001A23D76FDFD2C50B58ECC48DA200864DB6309E8230</entry>
        ...
        <entry>003539FE72A1CBE73A2E97537A893293D82B76CAC260</entry>
      </table_c>
    </bpjmencodedlist>
    Each entry is a 44-character upper-case hexadecimal string. The first 4 characters encode the length of the cleartext string. The other 40 characters are the sha1 hash of the domain or URL, without "http://" or the www subdomain, with the salt "To200-X" appended. Table A contains 2816 entries of just domains, table B contains 115 URLs with one path level, for example "yildizporn.com/tube.htm" or "tubetubetube.com/tube". Finally, table C contains 85 URLs with more than one path level, for example "youtube.com/user/Saifulhaakim" or "vidyotup.com/video/126690/Kafa-Kesme-18".
    Example: The first entry of table A is 00168D58328DF6363331B6CD944F2B9EC14A9DF366E9, which stands for the domain "06111960.over-blog.com". The first 4 characters give the length of the cleartext string in bytes: in this case the cleartext has 22 characters, which is 0x16 in hexadecimal. The sha1 hash of the string "06111960.over-blog.comTo200-X" is 8D58328DF6363331B6CD944F2B9EC14A9DF366E9.
    According to the HTTP headers the file that was served in June 2014 was last modified on 2013-12-20. According to the filenames used by AVM the list 2013-12 was released on that day: 20131220_bpjm-modul_12_13.txt.
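
The entry layout from the example above (length prefix plus salted sha1) can be rebuilt with a few lines of shell. This is a sketch only: telekom_entry is a made-up helper name, and it assumes ASCII cleartext so that the character count equals the byte count. If the description above is accurate, calling it with "06111960.over-blog.com" should reproduce the first entry of table A.

```shell
#!/bin/sh
# Sketch: build one entry in the salted sha1 format.
# telekom_entry is a hypothetical helper name; the salt "To200-X" and the
# 4 + 40 character layout are taken from the description above.

SALT="To200-X"

# Usage: telekom_entry CLEARTEXT   (domain or URL without "http://" and "www.")
telekom_entry() {
    cleartext="$1"
    # sha1 of cleartext + salt, converted to upper case like the XML entries
    digest=$(printf '%s%s' "$cleartext" "$SALT" | sha1sum | cut -d' ' -f1 | tr 'a-f' 'A-F')
    # 4 hex characters for the cleartext length (assumes ASCII cleartext)
    printf '%04X%s\n' "${#cleartext}" "$digest"
}

telekom_entry "06111960.over-blog.com"
```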

    Get the BPjM-Modul blacklist

    The easiest way to obtain the BPjM-Modul blacklist is by just downloading the ones from SourceForge or the Openschoolserver project. These lists are quite old.

    You can download the last few lists here as well:

    They are extracted from an AVM FRITZ!Box. AVM is a German company producing mainly DSL/Cable routers. About half of all DSL/Cable routers in Germany are AVM FRITZ!Boxes. They support the BPjM-Modul and update the list about once per month even if you don't opt in to use the filter. The firmware of the FRITZ!Boxes is based on Linux, and telnet access can easily be activated. AVM ships an older BPjM blocklist in the file /etc/bpjm.data and saves updated versions to /var/bpjm.data or /var/media/ftp/FRITZ/bpjm.data (depending on the firmware).

    # Enable telnet on the FRITZ!Box by dialing #96*8* with a connected phone (wait for the beep)

    # Open a local netcat server on port 1234 in the terminal of your computer to receive the file
    netcat -l -p 1234 > /tmp/bpjm.data

    # make a telnet connection to your FRITZ!Box in another terminal window
    telnet fritz.box 23

    # Transfer the current BPjM-Modul database to your computer. If the file is not found, try /var/media/ftp/FRITZ/bpjm.data instead of /var/bpjm.data
    cat /var/bpjm.data | nc [YOUR-LOCAL-IP] 1234

    # Convert the database from binary to hex (skipping the first 64 bytes) and save it
    # under its original filename, taken from the first string in the file
    # (for example 20140701_bpjm-modul_06_14.txt)
    out="$(strings /tmp/bpjm.data | head -n 1)"
    od -t x1 -An -j 64 /tmp/bpjm.data | tr -d '\n ' > "$out"

    # Split each 66-character entry onto a separate line (GNU sed):
    sed -i -e 's/.\{66\}/&\n/g' "$out"

    # Split each entry into domain, path, depth
    sed -i 's/.\{32\}/& /' "$out"
    sed -i 's/.\{65\}/& /' "$out"
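
Once the file is split into "domain path depth" lines, a quick sanity check helps to confirm that the conversion worked. The sketch below counts well-formed entries and how many of them block a complete domain (path hash of the empty string). bpjm_summary is a made-up helper name, and the two demo entries use dummy domain hashes.

```shell
#!/bin/sh
# Sketch: sanity-check and summarize a split list ("domain path depth" per line).
# bpjm_summary is a hypothetical helper name, not part of any BPjM tooling.

EMPTY_MD5="d41d8cd98f00b204e9800998ecf8427e"   # md5 of the empty string

# Usage: bpjm_summary FILE
bpjm_summary() {
    awk -v empty="$EMPTY_MD5" '
        # Well-formed line: 32 + 32 + 2 hex characters in three columns.
        NF == 3 && length($1) == 32 && length($2) == 32 && length($3) == 2 {
            total++
            if ($2 == empty) whole++
            next
        }
        # Anything else that is not blank counts as malformed.
        NF > 0 { bad++ }
        END {
            printf "entries=%d whole_domain=%d with_path=%d malformed=%d\n",
                   total, whole, total - whole, bad
        }
    ' "$1"
}

# Demo on two made-up entries (the domain hashes are dummies).
demo=$(mktemp)
printf '%s\n' \
    "00000000000000000000000000000000 $EMPTY_MD5 00" \
    "11111111111111111111111111111111 eacf331f0ffc35d4b482f1d15a887d3b 00" > "$demo"
bpjm_summary "$demo"
rm -f "$demo"
```

A large "malformed" count would suggest that the 64-byte header offset or the 66-character entry size does not fit the firmware version at hand.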

    BPjM-Modul blacklist from "Telekom Kindersicherung" (sha1 implementation)

    If the free child protection software Telekom Kindersicherung is installed on the PC, the BPjM-Modul blocklist is located at C:\ProgramData\T-Online\BPJM\bpjmlist.xml

    # Alternatively, download the BPjM blacklist from Telekom in the same way the software does
    wget -d --header="Range: bytes=0-204799" --user-agent="BPjMModule" --header="Cache-Control: no-cache" http://www.t-online.de/bpjm/bpjmlist.xml
    # Apparently T-Online now serves an empty file at that URL; the last version of this file is mirrored here: bpjmlist.xml.

    # Select all hashes, remove the tabs and convert to lower case
    grep entry bpjmlist.xml | tr -d '\t' | tr '[:upper:]' '[:lower:]' > bpjmlist-sha1-telekom.txt

    # Remove the XML tags
    sed -i 's/<entry>//' bpjmlist-sha1-telekom.txt
    sed -i 's/<\/entry>//' bpjmlist-sha1-telekom.txt

    # Remove the first 4 hex characters (the length of the cleartext) to get the plain sha1 hashes
    sed -i 's/^....//' bpjmlist-sha1-telekom.txt
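
Instead of discarding the length prefix, it can be kept as a separate column, because it tells in advance how long each cleartext is and so lets candidates of the wrong length be skipped. A minimal sketch, with split_telekom_entries as a made-up helper name, which reads 44-character entries on standard input:

```shell
#!/bin/sh
# Sketch: turn each 44-character entry into "<cleartext length> <sha1 hash>".
# split_telekom_entries is a hypothetical helper name.

split_telekom_entries() {
    while IFS= read -r entry; do
        # Skip anything that is not exactly 4 + 40 characters long.
        [ "${#entry}" -eq 44 ] || continue
        len_hex=$(printf '%s' "$entry" | cut -c1-4)
        digest=$(printf '%s' "$entry" | cut -c5-44 | tr 'A-F' 'a-f')
        # The 0x prefix forces a hexadecimal interpretation of the length.
        printf '%d %s\n' "0x$len_hex" "$digest"
    done
}

# Demo with the first entry of table A quoted earlier in this document.
printf '%s\n' "00168D58328DF6363331B6CD944F2B9EC14A9DF366E9" | split_telekom_entries
```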

    Calculate the cleartext

    To calculate the cleartext of the md5 (or sha1) hashes on the list, the tool hashcat was used. Brute-forcing hashes is possible, but since it is known what kind of cleartext to expect, it is much more effective to use lists of domains as a wordlist. There are many ways to collect huge wordlists/dictionaries of domains. With the dnscensus2013 data alone it is possible to recover 2694 of the 3106 unique domains (86.74%) of the current BPjM-Modul blacklist.

    For calculating the cleartext of the BPjM-Modul blacklist the md5 implementation was mostly used, for three reasons: calculating md5 hashes is faster than calculating sha1 hashes, the separation of domains and URL paths in the md5 implementation makes it easier to calculate the cleartext, and the available lists are newer than the Telekom sha1 list. Recovering the md5 hashes of the URL paths was most effective using plain brute-force for the short paths and manual examination of the sitemap.xml for the more complex ones.
    Hashcat was used like this:
    ./hashcat-cli64.bin -r rules.txt md5hashes.txt dictionary.txt -o results.txt
    Different rules were used depending on the format of the dictionary file, the most important one being ^/^/^:^p^t^t^h for prepending http:// (each ^X prepends a single character, so the string is spelled in reverse).
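
The idea behind this dictionary attack can be shown without hashcat. The sketch below is not the method actually used for the leak and is far slower, since it starts a process per candidate. It only illustrates the principle: hash "http://" plus each candidate domain and look the result up in the list of domain hashes. crack_md5_domains is a made-up helper name, and the demo uses placeholder domains.

```shell
#!/bin/sh
# Sketch: dictionary lookup against a list of md5 domain hashes.
# crack_md5_domains is a hypothetical helper name.

# Usage: crack_md5_domains HASHFILE WORDLIST
#   HASHFILE: one md5 hex digest per line (the domain column of the list)
#   WORDLIST: one candidate domain per line, without "http://" or "www."
crack_md5_domains() {
    hashes="$1"
    words="$2"
    while IFS= read -r domain; do
        [ -n "$domain" ] || continue
        h=$(printf 'http://%s' "$domain" | md5sum | cut -d' ' -f1)
        # -x: whole-line match, -F: fixed string, -i: ignore hex case
        if grep -q -i -x -F -e "$h" "$hashes"; then
            printf '%s http://%s\n' "$h" "$domain"
        fi
    done < "$words"
}

# Demo: the "secret" list contains only the hash of one placeholder domain.
tmp=$(mktemp -d)
printf '%s\n' example.com example.org example.net > "$tmp/dictionary.txt"
printf 'http://example.org' | md5sum | cut -d' ' -f1 > "$tmp/md5hashes.txt"
crack_md5_domains "$tmp/md5hashes.txt" "$tmp/dictionary.txt"
rm -rf "$tmp"
```

The domain column of a split list can be fed in with something like cut -d' ' -f1 on the converted file.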

    Currently the cleartext of 3280 unique md5 hashes has been recovered, see bpjm-md5hashes-plaintext.txt.
    Additionally, 2889 sha1 hashes could be calculated, see bpjm-sha1hashes-plaintext.txt.
    Because of the wide array of sources for lists of domains it is believed that the missing md5 hashes of domains are either typos, domains which are not registered anymore or domains with very low traffic.

    Analysis of the list entries

    Most entries on the list can be categorized as one of: normal porn, animal porn, child/teen porn, violence, suicide, nazi or anorexia. The questionable content is still accessible on only about 50-60% of the domains on the list: about 10% of the domains are not registered at all, another 10% are parked domains, and about 20% do not provide any content at all (no DNS A record, no webserver on port 80, or a redirect to another domain).
    It would be great to analyze this list in a similar way to the work of Matti Nikki on the list from Finland and of AK-Zensur on the lists from Denmark and Sweden.

    Some noteworthy findings:

    About and contact

    Extracting the list and calculating the cleartext was not rocket science. Anyone interested in Internet censorship with a few hours of free time could have done this, and several probably already did. This leak is just meant to prove that the implementation of the BPjM-Modul is not as safe as expected and that maintaining a secret Internet censorship list is wrong. Truly disgusting entries on the list, like child pornography, should be completely deleted from the Internet instead of filtered. The great analysis of the Danish censorship list by AK-Zensur proves that this is indeed possible.

    I'd prefer to stay anonymous for now, but if you feel the need to contact me you can send an email to bpjmleak@riseup.net