Introduction to the BPjM
The Federal Department for Media Harmful to Young Persons (German: "Bundesprüfstelle für jugendgefährdende Medien" or BPjM) is an upper-level German federal agency subordinate to the Federal Ministry of Family Affairs, Senior Citizens, Women and Youth. It is responsible for examining media works allegedly harmful to young people and entering these onto an official list – a process known as Indizierung (indexing) in German. The decision to index a work has a variety of legal implications. [...] Germany is the only western democracy with an organization like the BPjM. The rationales for earlier decisions to add works to the index are, in retrospect, incomprehensible reactions to moral panics.
Quote by Wikipedia
The censorship list ("index") is split into various sublists:
- Sublist A: Works that are harmful to young people
- Sublist B: Works whose distribution is prohibited under the Strafgesetzbuch (German Criminal Code) (in the opinion of the BPjM)
- Sublist E: Entries prior to April 1, 2003
- Sublist C: All indexed virtual works harmful to young people whose distribution is prohibited under Article 4 of the Jugendmedienschutz-Staatsvertrag
- Sublist D: All indexed virtual works, which potentially have content whose distribution is prohibited under the Strafgesetzbuch.
Sublists C and D were also published in BPjS-aktuell (now BPjM-aktuell) up to edition 2003-01.
Since then the list of indexed virtual media has been considered secret. As of July 2014 it contains more than 3000 URLs.
To make a secret censorship list usable, the BPjM offers the "BPjM-Modul", a list of cryptographic hashes representing the censored URLs. The list is distributed about once per month to more than 27 companies that offer child protection software or DSL/cable routers (for example AVM FRITZ!Box Router, Draytek Vigor Router, Telekom Kinderschutz Software, Salfeld Kindersicherung and Cybits JusProg and Surfsitter). These companies usually implement the blocklist as opt-in: users have to enable it deliberately to filter the websites. Additionally, major search engines like Google, Bing and Yahoo agreed to filter their results in Germany based on the list. They can download the (cleartext) list from a server of the FSM (Freiwillige Selbstkontrolle Multimedia-Diensteanbieter e. V.). In contrast to the opt-in approach of the router manufacturers, the search engines filter all results served to German users, and it is not possible to opt out.
In 2011, "porno lawyer" Marko Dörre requested access to the list in order to do his work. The request was denied two years later in the court decision VG Köln, 2013-07-04 – 13 K 7107/11, which stated that publication of the list could harm public safety. The court further justified its decision by stating that agreements are in place with the 27 companies that have access to the hashed blacklist to ensure the list stays secret. These methods could be considered safe, the court reasoned, because no unauthorized use of the module data had become known since its creation in 2005.
This leak proves that the BPjM-Modul is not a secure way to distribute a secret Internet censorship list. It is not at all difficult to extract the list from different sources and calculate the cleartext URLs of the hashes. It also shows that secret Internet censorship lists are of poor quality, with many outdated and absurd entries harming legitimate businesses.
BPjM-Modul implementations
There are at least three different technical implementations of the BPjM-Modul currently in use:
- the search engines receive the URL list of the BPjM-Modul encrypted via OpenPGP, which they can decrypt to the cleartext
- a list with separate md5 hashes for the domain and path part of the URL and a two-digit hex value indicating the depth of the URL, as used by the Openschoolserver, the AVM FRITZ!Box and an unknown implementation uploaded to SourceForge
- a BPjMInspect.dll file which downloads a bpjmlist.xml with salted sha1 hashes, as used by the Telekom Kinderschutzsoftware
BPjM-Modul implementation with separate md5 hashes for domain and path
This format is used, for example, by AVM on the FRITZ!Box cable/DSL routers.
Each entry consists of 3 hex values: the md5 hash of the domain, the md5 hash of the path and the depth value.
BPjM-Modul implementation with salted sha1 hash of the URL
The child protection software "Telekom Kindersicherung" includes a BPjM-Modul which is quite different from the (apparently older) md5 implementation. The software ships with the 143 KB file BPjMInspect.dll, which downloads new blacklists from the t-online.de webserver as an XML file. The structure of the XML file is as follows:
Each entry is a 44-character upper-case hexadecimal string. The first 4 characters give the length of the cleartext string as a hexadecimal number. The other 40 characters are the sha1 hash of the domain or URL with the appended salt "To200-X" and without "http://" or the www subdomain. Table A contains 2816 entries of just domains, table B contains 115 domains with one depth value, for example "yildizporn.com/tube.htm" or "tubetubetube.com/tube". Finally, table C contains 85 domains with more than one depth value, for example "youtube.com/user/Saifulhaakim" or "vidyotup.com/video/126690/Kafa-Kesme-18".
Example: The first entry of table A is 00168D58328DF6363331B6CD944F2B9EC14A9DF366E9, which stands for the domain "06111960.over-blog.com". The first 4 characters (0016) represent the length of the cleartext string in bytes: the cleartext string has 22 characters, which is 0x16 in hexadecimal. The sha1 hash of the string "06111960.over-blog.comTo200-X" is 8D58328DF6363331B6CD944F2B9EC14A9DF366E9.
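The entry layout described above (a length prefix followed by a salted sha1 hash) can be checked for a single candidate string. The following is only a minimal sketch: the function names are made up for illustration, it assumes GNU sha1sum is available, and it assumes plain ASCII cleartexts so that character count and byte count are the same.

```shell
#!/bin/sh
# Salt as given in the text; it is appended to the candidate before hashing.
SALT="To200-X"

# Print the length of a candidate string as 4 upper-case hex characters.
length_prefix() {
    printf '%04X' "${#1}"
}

# Print the upper-case sha1 hash of the candidate with the salt appended.
salted_sha1() {
    printf '%s%s' "$1" "$SALT" | sha1sum | cut -d ' ' -f 1 | tr 'a-f' 'A-F'
}

# Succeed if the candidate ($2) reproduces the 44-character entry ($1).
matches_entry() {
    [ "$(length_prefix "$2")$(salted_sha1 "$2")" = "$1" ]
}

# Entry and domain taken from the example in the text.
if matches_entry "00168D58328DF6363331B6CD944F2B9EC14A9DF366E9" "06111960.over-blog.com"; then
    echo "entry matches candidate"
else
    echo "entry does not match candidate"
fi
```

Comparing the length prefix first is a cheap way to discard candidates before hashing them, since the prefix is stored unhashed.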
According to the HTTP headers the file that was served in June 2014 was last modified on 2013-12-20. According to the filenames used by AVM the list 2013-12 was released on that day: 20131220_bpjm-modul_12_13.txt.
Get the BPjM-Modul blacklist
The easiest way to obtain the BPjM-Modul blacklist is to download the ones from SourceForge or the Openschoolserver project. These lists are quite old.
You can download the last few lists here as well:
20130822_bpjm-modul_08_13.txt
20131220_bpjm-modul_12_13.txt
20140203_bpjm-modul_01_14.txt
20140221_bpjm-modul_02_14.txt
20140403_bpjm-modul_03_14.txt
20140513_bpjm-modul_04_14.txt
20140530_bpjm-modul_05_14.txt
20140701_bpjm-modul_06_14.txt
They were extracted from an AVM FRITZ!Box. AVM is a German company producing mainly DSL/cable routers. About half of all DSL/cable routers in Germany are AVM FRITZ!Boxes. They support the BPjM-Modul and update the list about once per month, even if you don't opt in to the filter. The firmware of the FRITZ!Boxes is based on Linux, and telnet access can easily be activated. AVM ships an older BPjM blocklist in the file /etc/bpjm.data and saves updated versions to /var/bpjm.data or /var/media/ftp/FRITZ/bpjm.data (depending on the firmware).
# Enable telnet on the FRITZ!Box by dialing #96*8* with a connected phone (wait for the beep)
# Open a local netcat server on port 1234 in the terminal of your computer to receive the file
netcat -l -p 1234 > /tmp/bpjm.data
# make a telnet connection to your FRITZ!Box in another terminal window
telnet fritz.box 23
# Transfer the current BPjM-Modul database to your computer. If the file is not found, try /var/media/ftp/FRITZ/bpjm.data instead of /var/bpjm.data
cat /var/bpjm.data | nc [YOUR-LOCAL-IP] 1234
# Convert the database from binary to hex (ignoring first 64 bytes) and save it with the original filename
od -t x1 -An -j 64 /tmp/bpjm.data | tr -d '\n ' > `strings /tmp/bpjm.data | head -n 1`
# Split each entry into a separate line:
sed -i -e 's/.\{66\}/&\n/g' 20140701_bpjm-modul_06_14.txt
# Split each entry to domain, path, depth
sed -i 's/.\{32\}/& /' 20140701_bpjm-modul_06_14.txt
sed -i 's/.\{65\}/& /' 20140701_bpjm-modul_06_14.txt
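The field layout implied by the sed commands above (32 + 32 + 2 hex characters per entry) can also be applied to a single entry, with a length check that catches a misaligned dump. This is a sketch only: the function name split_entry is made up for illustration, and the sample entry consists of filler digits rather than data from a real list.

```shell
#!/bin/sh
# Split one 66-character hex entry into domain hash, path hash and depth.
# Layout assumed from the sed commands: 32 + 32 + 2 hex characters.
split_entry() {
    entry=$1
    # Reject anything that is not exactly one entry long.
    [ "${#entry}" -eq 66 ] || return 1
    domain_hash=$(printf '%s' "$entry" | cut -c1-32)
    path_hash=$(printf '%s' "$entry" | cut -c33-64)
    depth=$(printf '%s' "$entry" | cut -c65-66)
    printf '%s %s %s\n' "$domain_hash" "$path_hash" "$depth"
}

# Hypothetical entry made of filler digits, not taken from a real list.
split_entry "0123456789abcdef0123456789abcdeffedcba9876543210fedcba987654321001"
```

If the 64-byte header offset were wrong for a given firmware, the entries would no longer line up on 66-character boundaries, so a length check on the last line of the file is a quick sanity test.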
BPjM-Modul blacklist from "Telekom Kindersicherung" (sha1 implementation)
If the free child protection software Telekom Kindersicherung is installed on the PC, the BPJM-Modul blocklist is located at C:\ProgramData\T-Online\BPJM\bpjmlist.xml
# Alternatively, download the BPjM blacklist from Telekom in the same way the software does
wget -d --header="Range: bytes=0-204799" --user-agent="BPjMModule" --header="Cache-Control: no-cache" http://www.t-online.de/bpjm/bpjmlist.xml
# Apparently T-Online now serves an empty file at that URL. The last version of this file is mirrored here: bpjmlist.xml.
# Select all hashes, remove the tabs and convert to lower case
grep entry bpjmlist.xml | tr -d '\t' | tr '[:upper:]' '[:lower:]' > bpjmlist-sha1-telekom.txt
# Remove the XML tags
sed -i 's/<entry>//' bpjmlist-sha1-telekom.txt
sed -i 's/<\/entry>//' bpjmlist-sha1-telekom.txt
# Remove the first 4 hex characters representing the size of the cleartext to get the plain sha1 hashes
sed -i 's/^....//' bpjmlist-sha1-telekom.txt
Calculate the cleartext
To calculate the cleartext of the md5 (or sha1) hashes on the list, the tool hashcat was used. Brute-forcing hashes is possible, but since it is known what kind of cleartext to expect, it is much more effective to use lists of domains as a wordlist. There are many ways to collect huge wordlists/dictionaries of domains:
- leaked Internet censorship blacklists from other countries, like the ones from Denmark, Finland, Thailand, Norway, Australia, India (2), Turkey, Russia, Italy and Sweden
- free and commercial proxy blacklists like URLblacklist.com, ut-capitole.fr, squidblacklist.org and shallalist.de
- current and old Alexa top million and Quantcast top million lists
- several registrars provide access to their complete zone files, for example: .com/.net/.org/.info/.name/.xxx/.travel/.asia/.ru/.su/.рф/.pro/.aero/.mobi/.cat/.jobs and .coop
- passive DNS databases like dnscensus2013, OpenDNS Umbrella Labs, "whois -h sim.cert.ee example.com", ISC DNSDB and CIRCL pDNS
- rDNS scans like the ones from scans.io, internetcensus2012 and deepmagic.com
- DNS AXFR Zone transfers where available
- BPjS-aktuell 2003-01, the last magazine issue with the list of indexed websites
- Internetatlas 2013 by Verfassungsschutz Sachsen, a huge list of nationalist websites
- archive.today "alldomains" list
The md5 implementation was mostly used for calculating the cleartext of the BPjM-Modul blacklist, for three reasons: calculating md5 hashes is faster than calculating sha1 hashes, the separation of domains and URL paths in the md5 implementation made it easier to calculate the cleartext, and the available lists are newer than the Telekom sha1 list. Recovering the md5 hashes of the URL paths was most effective using plain brute force for the short paths and manual examination of the sitemap.xml for the more complex ones.
Hashcat was used like this:
./hashcat-cli64.bin -r rules.txt md5hashes.txt dictionary.txt -o results.txt
with different rules depending on the format of the used dictionary file, the most important one being ^/^/^:^p^t^t^h for prepending http://
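The rule quoted above reads backwards because each `^X` function in a hashcat rule prepends a single character X to the front of the word, and the functions are applied from left to right. A small sketch that emulates only this prepend function in plain shell (the function name and the example.com word are made up for illustration, and this is not how hashcat itself is implemented):

```shell
#!/bin/sh
# Emulate a hashcat rule that consists only of '^X' (prepend X) functions.
apply_prepend_rule() {
    rule=$1
    word=$2
    while [ -n "$rule" ]; do
        # Each function is '^' followed by the character to prepend.
        char=$(printf '%s' "$rule" | cut -c2)
        word="$char$word"
        # Drop the two characters just processed.
        rule=$(printf '%s' "$rule" | cut -c3-)
    done
    printf '%s\n' "$word"
}

# Hypothetical dictionary word; prints http://example.com
apply_prepend_rule '^/^/^:^p^t^t^h' 'example.com'
```

Because every step pushes its character in front of the previous ones, the prefix has to be written in reverse order in the rule file.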
Currently the cleartext of 3280 unique md5 hashes has been recovered, see bpjm-md5hashes-plaintext.txt.
Additionally, 2889 sha1 hashes could be calculated, see bpjm-sha1hashes-plaintext.txt.
Because of the wide array of sources for lists of domains it is believed that the missing md5 hashes of domains are either typos, domains which are not registered anymore or domains with very low traffic.
Analysis of the list entries
Most entries on the list can be categorized as one of: normal porn, animal porn, child/teen porn, violence, suicide, Nazi or anorexia. The questionable content is still accessible on only about 50-60% of the domains on the list: about 10% of the domains are not registered at all, another 10% are parked domains, and about 20% don't provide any content at all (either no DNS A record, no webserver on port 80 or a redirect to another domain).
It would be great to analyze this list in a way similar to the work of Matti Nikki on the list from Finland and AK-Zensur on the lists from Denmark and Sweden.
Some noteworthy findings:
- the domain "homo.com" offers a wildcard domain which echoes anything that is entered as a subdomain on the website, e.g. visiting "Fritz.homo.com" results in a webpage "Haha, Fritz is gay!". On the BPjM list there is an entry irgend.ein.name.homo.com; the German "Irgend ein Name" stands for "any name". Contrary to the belief of the BPjM public servants this doesn't work as a wildcard: just this specific domain will be blocked
- there are some domains with upper-case letters on the list (ExtremeAdultSex.com, FUQQER.com, HQBoys.com and painGate.com). This implies one of two things. Either the calculation of the md5 hash is in fact case sensitive, which would mean that only "youporn.com" is filtered but "YouPorn.com" or "youporn.COM" are not. Or, more likely, domains are always converted to lower case before the md5 hash is calculated, which would mean these 4 domains will never get filtered. For more details on URL normalization see Wikipedia and RFC3986
- the listing of the XBOX 360 game "Dead Island" on amazon.co.uk is blocked
- the complete sell list of the leading online music database Discogs is blocked. Probably at one point in time there was a listing of a music album which is forbidden in Germany, and this was enough to block access to the "eBay of music" for years
- the domain beyondthedot.com is blocked, where FairWinds Partners, LLC, a domain name consulting firm explains the new generic top level domains. According to archive.org this was a porn website up to about 2008
- the free website hosting on mywebpage.netscape.com was shut down at least 5 years ago but there is still a URL on the list
- besides all the porn domains the trustworthy looking domain bible.org stands out. One article glorifying beating up kids for "education" is blocked
- lyrics of several songs are blocked:
  - Frei.Wild – Rache muss sein
  - Landser – Zigeunerpack
  - Normahl – Bullenschweine
  - Weiße Wölfe – Ruhm und Ehre
  - Swiss – Der letzte Schultag
- a German "Call of Duty 4" gameserver clan with a German .de domain: deutschefront.de – since it is a German domain hosted in Germany it should be possible to take other measures besides just blocking access to the domain
- many entries appear more than once on the list. The YouTube user pages of saberrien and Saifulhaakim are on the list 4 and 5 times, respectively. However, not a single YouTube video itself is blocked.
- several URLs with a wrong trailing slash:
  - Death.html/
  - welcome.htm/
  - free/index.html/
  - freecontent.html/
- voy.com provides free message boards where each board has a numeric id, like "voy.com/123456/". The BPjM wants to block five of those URLs with suicide forums. Only two of the entries contain the (in this case) correct trailing slash, the voy.com webserver redirects requests without trailing slash to those with a trailing slash. The other three entries will never be blocked if they are requested with the correct trailing slash.
- a website from the 1990s selling rusty old helmets: germanhelmetsinc.com
- the French counterpart of German "Jochen Schweizer" adventure coupons: happytime.com - apparently this used to be a porn website several years ago
- the website of the 17(!)-year-old game "Shadow Warrior", which was switched off many years ago: 3drealms.com/catalog/sw/
- the website of a 10+ year old game, which has long redirected to the publisher: postal2.com
- the domain kartoffelkanone.org (German for "spud gun"), which has not been active since at least 2008
- splashdamage.com/content/wolfenstein-enemy-territory-barracks, the website of the free-to-play game "Wolfenstein: Enemy Territory" which is rated "PEGI: 16" by the Pan European Game Information system
- audio books download portal hoerbuch.in is the only warez entry besides the XXX section of defunct movie2k.com. One explanation for this might be the porn advertisements on the website
- bananenbar.com is the website of an Amsterdam strip bar with only modest content
- according to archive.org the domain facegoo.com has not been a porn website for at least 3 years. It is now the website of an iPhone app for fun picture manipulation. The startup has no chance of being listed in German search engine results at all
- ety.com/tell/ according to archive.org used to be a Swiss nazi website "last updated: 14.04.2001" but now presents a startup beta invite screen
- several websites of the artist Alex Dirk Freyling are blocked (e.g. alexd.net, dieetwasandereart.net, macassar-art.com and neuexpressionismus.com). But other similar domains of the same artist are not blocked: derfreikuenstler.com, dualismus.net, kulturreportonline.net, adffilm.com, dfreyling.com etc.
- not on the current list, but older versions of the BPjM-Blacklist had plain typos: http://bilola, http://hot-soccer-moms-info and http://www-gangbang-squad.com
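Several of the findings above (upper-case domains, wrong or missing trailing slashes) follow from the fact that a hash blocklist only matches strings that are identical character for character. A minimal sketch of this effect, assuming GNU md5sum and using hypothetical example.com URLs; which normalization each filter product actually applies before hashing is not known here:

```shell
#!/bin/sh
# Print the md5 hash of a string as lower-case hex.
md5_hex() {
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# Convert a string to lower case, as a filter might do before hashing.
to_lower() {
    printf '%s' "$1" | tr 'A-Z' 'a-z'
}

# Succeed if two strings produce the same md5 hash.
same_hash() {
    [ "$(md5_hex "$1")" = "$(md5_hex "$2")" ]
}

# Hypothetical URLs: each pair differs only in case or in a trailing slash.
same_hash "Example.com" "example.com" || echo "case changes the hash"
same_hash "$(to_lower "Example.com")" "example.com" && echo "lower-casing first restores the match"
same_hash "example.com/123456" "example.com/123456/" || echo "a trailing slash changes the hash"
```

A list entry therefore only takes effect if it is stored in exactly the form that the filter computes from a requested URL.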
About and contact
Extracting the list and calculating the cleartext was not rocket science. Anyone interested in Internet censorship with a few hours of free time could have done this, and several probably already have. This leak is just meant to prove that the implementation of the BPjM-Modul is not as safe as expected and that maintaining a secret Internet censorship list is wrong. Truly disgusting entries on the list, like child pornography, should be completely deleted from the Internet instead of filtered. The great analysis of the Danish censorship list by AK-Zensur proves that this is indeed possible.
I'd prefer to stay anonymous for now, but if you feel the need to contact me you can send an email to bpjmleak@riseup.net