
Torrenting Cache

Caching with torrents. It flows messily.

There’s a new cache in town, CacheP2P.

The basic concept is that you can use BitTorrent to seed your cache across the world, making it even faster for everyone. Setting it up is fairly simple. Configuring it is not. At least not in an automated fashion.

Traditional web browsing is a direct connection between user and server. Traditional caching works by having the server (or a proxy in front of it) create a static copy of the page and serve that. In the case of WordPress, or any other dynamic CMS, that spares PHP and MySQL from generating a new page on every visit.
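For comparison, here's a minimal sketch of that traditional approach in plain PHP. The cache path and the render-page.php script are made up for illustration: serve a saved static copy if a fresh one exists, otherwise let PHP build the page and keep the output for next time.

<?php
// Minimal sketch of traditional full-page caching, not CacheP2P.
// Serve a previously saved static copy if one exists and is fresh;
// otherwise let PHP/MySQL build the page and keep the output for next time.
$cache_file = __DIR__ . '/cache/' . md5( $_SERVER['REQUEST_URI'] ) . '.html';

if ( file_exists( $cache_file ) && ( time() - filemtime( $cache_file ) ) < 3600 ) {
    readfile( $cache_file ); // cache hit: no templating, no database queries
    exit;
}

ob_start();                            // cache miss: capture the generated page
require __DIR__ . '/render-page.php';  // hypothetical script doing the real work
file_put_contents( $cache_file, ob_get_contents() );
ob_end_flush();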

By using BitTorrent, this changes so that you get a cached copy not from a server but from someone else’s computer. If you and I were on the same network, I might get the page from you instead of the server. That sounds really weird, doesn’t it? Two JavaScript files combine to signal the torrent’s API, and a third file uses the unique page hash to determine freshness. Keep your eye on that last part; it’s what makes the idea of a WordPress plugin such a pain.

To get the content for that last file, you have to look at your page in dev tools to grab the security hash:

[CacheP2P] this page's security hash: (2)
"c72d19b8ed03be98ceebd06f7c93dc06410b4de4"
"(http://www.cachep2p.com/api.html)"

On Safari it looks like this:

Example of what the hash looks like

Now, if it works (you can see an example on the cachep2p.com domain), it shows results similar to this:

Example of the cache working

This did not actually work for me on Safari. At all. It was fine on Chrome, but Safari never served up the cache, which is odd.

My first concern was about cache contamination. That is, if someone downloads the page and messes with it, could they make my site show content I didn’t want it to show? Using hashes minimizes this: I publish a file that defines the valid hashes, and if a peer’s copy doesn’t match, the browser downloads my content from the server instead of the tampered copy.
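The check itself is simple in principle. This little function is illustration only (the real check happens in CacheP2P’s client-side JavaScript, and I’m assuming the security hash is a SHA-1 of the page content): a peer’s copy is only accepted if it hashes to the value the site published.

<?php
// Illustration only: accept a peer's copy of the page only if it hashes
// to the value the site itself published. Assumes the security hash is a
// SHA-1 of the page content, which is worth verifying against the docs.
function accept_peer_copy( string $peer_html, string $published_hash ): bool {
    return hash_equals( $published_hash, sha1( $peer_html ) );
}
// If this returns false, fall back to fetching the page from the origin server.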

However, the greater concern is that of accidentally releasing content I shouldn’t. Take this example. I accidentally publish something I shouldn’t, like the plan to tear down the Berlin Wall. Without caching, I can quickly redact it, and if Google didn’t scrape my page, it’s as if it never happened. With caching (and Google…) the bad content (my destruction plans) remains out there unless I visit the cache provider and flush things. If you’ve ever used a third-party proxy like Cloudflare to cache your content, this is the same situation as when you update your CSS files and have to force them to refresh.

With the BitTorrent situation this becomes worse, because the cache is in the hands of the masses. If you were a politician and I your rival, I would have someone constantly visiting your site and saving the cache. Then I could go through it and look for accidental leaks.

Now of course this could happen today. I could set up a simple content scraper and have it ping your site every so often to save the data. You could, in turn, block my IP, and I would retaliate by setting up a Tor connection to do it from obfuscated IPs. The difference here is that you’re actually encouraging me to cache your data with this plugin.

An additional concern is the dynamic aspect of WordPress. The only way to grab the hash right now is to view the page. That hash will change when I save a page. In fact, it might change on every page load, in some situations. I didn’t get too far into testing at this point, since I realized that in order for this to work I would have to load a page, grab a hash, edit a file, save that file up on the server, and then it would cache…

That would be terrible on WordPress. For this to work on any large site, the generation of that hash file would have to be automated. Dynamic site or not, making people do that manually is preposterous. A vaguely WordPress-shaped solution I dreamed up was to somehow catch the cache hash as the page is saved, store it in a post-meta value, and then use WordPress to generate a ‘fake’ page with the URL and the hash for the cache tool to use.
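Here’s roughly what that could look like, sketched as a save_post hook. Big caveats: I’m assuming the security hash is a SHA-1 of the rendered page, and the name and format of the generated hash file are placeholders for whatever CacheP2P actually expects.

<?php
// Rough sketch, not a working plugin. Assumptions: the CacheP2P security
// hash is a SHA-1 of the rendered page, and the output file name/format
// below are placeholders for whatever the cache tool actually expects.
add_action( 'save_post', function ( $post_id ) {
    if ( wp_is_post_revision( $post_id ) || 'publish' !== get_post_status( $post_id ) ) {
        return;
    }

    // Fetch the freshly rendered page the way a visitor would see it.
    $response = wp_remote_get( get_permalink( $post_id ) );
    if ( is_wp_error( $response ) ) {
        return;
    }

    // Assumed: the security hash is sha1() of the rendered HTML.
    $hash = sha1( wp_remote_retrieve_body( $response ) );
    update_post_meta( $post_id, '_cachep2p_hash', $hash );

    // Rebuild the 'fake' page: one URL/hash pair per line for the cache tool.
    $lines = array();
    $ids   = get_posts( array( 'post_type' => 'any', 'numberposts' => -1, 'fields' => 'ids' ) );
    foreach ( $ids as $id ) {
        $h = get_post_meta( $id, '_cachep2p_hash', true );
        if ( $h ) {
            $lines[] = get_permalink( $id ) . ' ' . $h;
        }
    }
    file_put_contents( WP_CONTENT_DIR . '/cachep2p-hashes.txt', implode( "\n", $lines ) );
} );

Even then, fetching your own permalink from inside save_post is fragile, and the hash only exists after the post is already live, which is exactly the chicken-and-egg problem above.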

It might be easier to do that via something like WP Super Cache or W3TC, and have it save the file as it saves the cached page (and point to the static page instead of the dynamic one), but even then, the rapid changing of WordPress content would make it difficult for a cache to seed far enough out.

Right now, I think this is something that might only be useful for a small, mostly static, site.