Preparing A Website For Web 3.0 Publication

The First Step Toward Becoming Virtually Unstoppable

“Everything should be as simple as it can be, but not simpler.” - Albert Einstein

Updated 20 Feb 2021

This document is available on IPFS at https://ipfs.io/ipns/k2k4r8mlcz4sn7rvh45zq2uu1b883mrs6a9hmhn4dzl6mobpumbiruhv/FreedomPosts/Preparing-A-Website-For-Web3-Publication.html

 

Preface

Websites not conforming to the crown’s narrative may have their domain names disabled any day. Do you want yours to survive the coming purge? If so, this article is urgent and important for you.

This is written to be understood by as many as possible without getting off into the technical details any more than necessary.

Regardless of the website type, these are the necessary first steps for any website migrating to the virtually unstoppable Web 3.0 (aka Web3), the fully distributed web.

Constructive and positive feedback is welcome. We need solutions, not negativity.

Why Distributed Is The Most Robust Network

IPFS (Inter-Planetary File System) is currently the only fully functional means of publishing a complete website, with changing content, at an unchanging address, on a widely distributed network. Distributed networks are far more robust against attack than merely decentralized or federated networks. Consider how many nodes would need to be disabled in the following illustration to prevent communication between an information source and you.

With more nodes in a network and higher numbers of nodes connected to each, a distributed network becomes more robust against attack. Distributed and decentralized/federated networks are distinguished from each other in Web3 Only Podcast number 4, on the only fully decentralized media platform today, LBRY, at
open.lbry.com/@Web3Only:d/Why-Distributed-Networks-Are-Important:b
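
The robustness claim can be checked on toy networks. The following is a minimal sketch, not a model of the real IPFS network: the two small graphs (`centralized` and `distributed`) and the helper names are made up for illustration. It counts the fewest intermediate nodes that must be disabled before "source" can no longer reach "you".

```python
from collections import deque
from itertools import combinations

def connected(adjacency, source, target, removed=frozenset()):
    """Breadth-first search: can source still reach target once some nodes are removed?"""
    if source in removed or target in removed:
        return False
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in adjacency[node]:
            if neighbor not in seen and neighbor not in removed:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

def min_nodes_to_cut(adjacency, source, target):
    """Fewest intermediate nodes whose removal separates source from target.
    Brute force, so only suitable for tiny example graphs.
    Returns None when no set of intermediate nodes can separate them."""
    others = [n for n in adjacency if n not in (source, target)]
    for k in range(len(others) + 1):
        for group in combinations(others, k):
            if not connected(adjacency, source, target, frozenset(group)):
                return k
    return None

# Hypothetical centralized network: everything passes through one hub.
centralized = {
    "source": ["hub"],
    "hub": ["source", "you", "a", "b"],
    "you": ["hub"],
    "a": ["hub"],
    "b": ["hub"],
}

# Hypothetical distributed network: three independent relays, all interlinked.
distributed = {
    "source": ["a", "b", "c"],
    "a": ["source", "b", "c", "you"],
    "b": ["source", "a", "c", "you"],
    "c": ["source", "a", "b", "you"],
    "you": ["a", "b", "c"],
}

print(min_nodes_to_cut(centralized, "source", "you"))
print(min_nodes_to_cut(distributed, "source", "you"))
```

In these toy graphs, disabling the single hub cuts off the centralized network, while every relay must be disabled to cut off the distributed one.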

This document addresses preparation for publishing a website on IPFS because today it’s by far the most robust distributed network with this capability.

The Simplest Solution

When a website is published on IPFS, the simplest manifestation is with everything in the same folder. An unchanging IPNS (Inter-Planetary Name System) hash address is established for that folder, replacing the domain name. The folder accessed at the IPNS address is a mirror of the original website. If there were no other considerations, that would be the simplest solution and this paper could end here.

Faster Website Access

To provide quicker, more assured access to content on IPFS, it needs to be stored in more places, aka nodes, operated by others. This can happen naturally from people simply accessing it, although the first few may find this intolerably slow. As more people access the files, each node that retrieves them keeps a cached copy it can serve to others (aka seeding), making access quicker and quicker as more do the same. More people accessing a file actually makes it faster to do so.

To speed up initial access so more people will find it tolerable, a publisher may cooperate with other node operators who also save their content. This is a process known as “pinning” which prevents the data from being overwritten or deleted. For this pre-seeding of the distributed network to happen quickly, the biggest obstacle is the size of the folder containing the contents. As a folder becomes larger, the pinning process slows until it becomes difficult to impossible. To avoid intolerably slow viewing for the first website visitors after an update, the folder containing the website’s content needs to be small for quick pinning by other nodes.

Having an entire website in a single, sufficiently small folder is often not practical, causing the above “simplest solution” to become simpler than it can be. Larger websites using this approach would require a large number of folders, each small enough to facilitate quick pinning. The website modifications required to make this happen would be prohibitively difficult and time consuming for most websites. So, a solution that’s not simpler than it can be is needed for most existing websites.

As Simple As It Can Be

To arrive at a simple and optimal solution, let’s consider the case of a generic very large website. It contains some unchanging content which we will call “archive”. Then there is the often changing portion we will call “active”. After the archive content is dealt with in the next section, then the need arises for a “new” content folder for additional content which won’t change. So, we have the website segregated into three portions: archive, active and new. These have three different conditions to address for Web3 publishing and are addressed separately below.

Initial Archiving

Fortunately, it’s possible to not divide the archive and still have it distributed across many nodes, even if nobody has accessed any of the content. The archive can simply be the first publication of the entire website to an IPFS node, in one operation. The only exception to this would be content exceeding the amount which can be published in a single node and remain tolerably accessible, which is at least 1 TB and possibly a good bit more. Excessive archive size will be addressed in Size Matters.
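
To see whether a website falls under that single-node figure, its folder can be measured first. This is a minimal sketch. The limit is assumed to mean one decimal terabyte (10**12 bytes), and the function names are only illustrative.

```python
import os

ONE_TB = 10**12  # assumption: the size figure is read as a decimal terabyte, in bytes

def folder_size_bytes(root):
    """Total size of all regular files under root, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total

def fits_single_archive(root, limit_bytes=ONE_TB):
    """True when the whole folder is within the assumed single-node limit."""
    return folder_size_bytes(root) <= limit_bytes
```

A publisher would point `folder_size_bytes` at the website's parent folder. If the result is well over the limit, the Size Matters section applies.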

As mentioned earlier, pinning large amounts of content from another location can be difficult to impossible. The solution is a combination of new and old tech. It requires fast direct access to the folder where the entire website is stored.

Here are the basic steps, without getting off into any more technical detail than necessary for this paper. Required hardware, software and helpers are in bold.

  1. Create a sitemap and an index from it using software and settings common to all Pirate Box Project publishers to facilitate search capabilities. The software and its settings will soon be determined and shared.

  2. Existing unchanging archive content may be left where it is. Folders below the website’s parent folder which contain initial archive content must not be added to. See the new and active content sections for their needs.

  3. An empty storage device (USB3 or better) is connected by the publisher to the computer where the entire website’s source folder can be accessed with an internal or mapped fast connection.

  4. IPFS Desktop installation software is put in the top/root directory/folder of the empty storage device. This assures the IPFS node’s data repository is located on that device in the default location.

  5. Change the IPFS communications port.

  6. Run the IPFS Desktop installation to create the node and IPFS repository to contain the archive.

  7. Add the entire website contents to the IPFS node using the fast connection to the source folder.

  8. Duplicate the storage device, including all hidden content, identify it as “website publishing” and set it aside for first publishing of the website.

  9. Delete the IPFS private key uniquely identifying the node and repository.

    1. Save this storage device with no IPFS private key to make duplicates as needed and identify it as the “archive duplicate original”.

  10. Make as many duplicates of the “archive duplicate original” as needed for the next step.

  11. Send your archive duplicates to as many people as possible who have a PC which can be dedicated to this purpose.

    1. It must always remain on (aside from power outages and such) and connected to the internet behind a firewall or router.

      • Note that a VPN will slow access, and VPNs have already been blocked at the internet backbone, even in the USSA.

    2. Archive PC minimum hardware spec: quad core processor, operating system compatible with the archive storage device, USB3 or better port for the storage device and 8 GB RAM (4 GB with lite Linux versions such as Raspberry Pi OS).

  12. Those receiving the duplicates will plug them in and run the IPFS Desktop setup software which will automatically find the archive repository on the drive and create a new private key uniquely identifying their node.

  13. Add the entire website folder’s IPFS hash, provided by the publisher, to the IPFS Desktop’s GUI (graphical user interface).

  14. Use the duplicate saved from step 8, with the private key intact, to initially publish the website, preserving all internal links.

    1. An IPNS address is created so that people will always be able to find the website mirror on IPFS, even after changes and additions published later. That address can be shared and saved in browser favorites.

    2. The initial publishing preserves the website’s entire archive, plus the latest active content which will be superseded by subsequent website publishing.

  15. Those with the duplicates containing the archive will verify their node’s access to the website using the published IPNS address.

That completes the initial distribution/seeding of your website archive across the IPFS network. More nodes will increase the assurance of access and initial speed. So, send out as many as you can.
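
For publishers comfortable with a terminal, the "add" and "publish" parts of the steps above can also be done with the `ipfs` command-line tool instead of the IPFS Desktop GUI. That is a substitution on my part, not the method described in this paper. The sketch below only builds the commands. The function names and the example folder path are hypothetical, and `run` needs an installed, initialized `ipfs` node to do anything.

```python
import subprocess

def add_site_command(site_folder):
    # "-r" adds the folder and everything under it;
    # "-Q" prints only the final hash of the top-level folder.
    return ["ipfs", "add", "-r", "-Q", site_folder]

def publish_command(root_hash):
    # Points this node's IPNS address at the folder hash just added.
    return ["ipfs", "name", "publish", f"/ipfs/{root_hash}"]

def run(command):
    """Run one command and return its trimmed output (requires the ipfs tool)."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout.strip()

def publish_site(site_folder):
    """Add the whole website folder, then publish its hash under IPNS."""
    root_hash = run(add_site_command(site_folder))
    publish_output = run(publish_command(root_hash))
    return root_hash, publish_output

# Show the commands for a hypothetical folder without running them.
print(add_site_command("/mnt/website"))
print(publish_command("EXAMPLE_ROOT_HASH"))
```

Because the IPNS address stays the same while the folder hash changes, running the same two commands after a website update should keep the shared address valid.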

Size Matters

Excessively large websites may need to be reduced in size. Otherwise, it may be necessary to split the website up into multiple archives and send out multiple drives to multiple node operators, and/or have operators each set up multiple nodes. This will require significantly more time and money.

A more graceful and less expensive approach can be reducing the size of the archive. Generally speaking, large archives contain a lot of media, especially video. Quite often these media files were created at very high resolution, which requires a lot of storage for each. Considered objectively, the video resolution and quality can likely be lowered and still nicely convey the message. In fact, an even smaller audio-only file will often suffice.

Media files in excessively large websites need serious assessment by the content creator to determine the amount of file size reduction which can be acceptably accomplished. Then the size of files should be reduced in the website’s source folder using a batch process. First back up the site to save the larger originals.
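
One possible shape for such a batch process is sketched below, assuming the free `ffmpeg` tool is used for the conversion. The paper does not name a tool, so that choice, the file extensions, the 480-pixel height and the quality settings are all my assumptions, to be adjusted by the content creator. The sketch only lists the large videos and builds the command for each; it does not overwrite anything.

```python
import os

# Assumed set of video extensions; extend it to match the website.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}

def large_videos(root, min_bytes):
    """Paths of video files under root that are at least min_bytes in size."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            extension = os.path.splitext(name)[1].lower()
            if extension in VIDEO_EXTENSIONS and os.path.getsize(path) >= min_bytes:
                found.append(path)
    return sorted(found)

def ffmpeg_shrink_command(source, destination, height=480, crf=28):
    """ffmpeg command that rescales a video to the given height.
    "scale=-2:HEIGHT" keeps the aspect ratio with an even width;
    a higher crf value means a smaller file and lower quality."""
    return [
        "ffmpeg", "-i", source,
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264", "-crf", str(crf),
        "-c:a", "aac", "-b:a", "96k",
        destination,
    ]
```

Writing each result to a new file name, checking it, and only then replacing the original keeps the backup step meaningful.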

Another option is to create a duplicate of the website’s folder to make all necessary changes in. This will then become the version which is published on IPFS, leaving the original for traditional access through the domain name. This approach will cause more work each time the website is published on IPFS, to the extent that new and updated content require additional attention.

Smaller websites can also benefit from media file size reductions. It increases the speed of access for those viewing them.

Going forward, all publishers will benefit from smaller new media files. This will facilitate newer content residing on more nodes and downloading faster.

New Content

A new folder (in the first directory level under the website’s parent folder) is needed for new content items which will not be changing. They will accumulate in this folder until it reaches some maximum practical size limit, beyond which other nodes can’t pin it quickly enough over the internet. Then, another new content folder will begin, leaving the prior one in archive status, without any further additions. This process continues as the website grows.
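
The rollover from one new content folder to the next can be sketched as follows. The folder naming ("new001", "new002", ...) and the size limit are hypothetical; the practical limit has not been determined in this paper.

```python
def assign_to_new_folders(items, max_folder_bytes, prefix="new"):
    """Place items, in publication order, into numbered folders.
    items is a list of (name, size_in_bytes) pairs.
    A folder is closed (left in archive status) as soon as the next item
    would push it over max_folder_bytes; an item larger than the limit
    still gets a folder of its own."""
    folders = {}
    index = 1
    used = 0
    for name, size in items:
        current = f"{prefix}{index:03d}"
        if folders.get(current) and used + size > max_folder_bytes:
            index += 1
            used = 0
            current = f"{prefix}{index:03d}"
        folders.setdefault(current, []).append(name)
        used += size
    return folders

# Example with made-up sizes and a made-up limit of 100 bytes.
example = [("a", 40), ("b", 40), ("c", 40), ("d", 100), ("e", 10)]
print(assign_to_new_folders(example, 100))
```

Once a folder is closed it never receives additions, so its hash never changes and other nodes only need to pin it once.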

Each “new” content folder, and those which came before, will be simply published using code which we won’t go into here. It will be treated almost exactly the same as the active content portion of the website, but with a different publishing key to facilitate pinning by other nodes. After a new content folder reaches its final archive status, not receiving any new content, it won’t require any further publishing, or pinning by others.
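
The publishing code itself is left for another document. Purely as an illustration of what "a different publishing key" could look like, here is a sketch using the `ipfs` command-line tool, which is my substitution and not necessarily the code referred to above. The key name and hash are placeholders, and the functions only build the commands.

```python
def key_gen_command(key_name):
    # Creates a named key on the node; each named key has its own IPNS address.
    return ["ipfs", "key", "gen", "--type=ed25519", key_name]

def publish_with_key_command(key_name, folder_hash):
    # Publishes a folder hash under the IPNS address belonging to key_name,
    # separate from the address used for the website's active content.
    return ["ipfs", "name", "publish", f"--key={key_name}", f"/ipfs/{folder_hash}"]

# Placeholder values, shown without running anything.
print(key_gen_command("new001"))
print(publish_with_key_command("new001", "EXAMPLE_FOLDER_HASH"))
```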

Active Content

All websites have a small portion, like the home page, which will continue to change as long as the website is being maintained, referred to herein as “active content”. It simplifies publishing if all active content is in the root directory of the website with the home page. Other publishers’ pinning of website contents can be either recursive (capturing all subfolders) or not. New content folders are pinned recursively by other publishers to support more site structure.

Conversely, the active content in the website folder’s top level cannot be pinned recursively by others who don’t have the archive duplicate original. Otherwise, attempted downloading of the entire archive will likely freeze up the node. So, it’s best if all active content resides in the top level of the website’s folder.
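
With the `ipfs` command-line tool (my substitution for the GUI), the difference between the two kinds of pin is a single option. The sketch below only builds the commands, and the hashes are placeholders. One caution, as far as I understand the tool: a non-recursive pin keeps only the named item itself, so the individual top-level files may each need their own pin. How the plan handles that is not stated here.

```python
def pin_command(content_hash, recursive):
    """Build an "ipfs pin add" command.
    recursive=True pins the item and everything beneath it (new content folders);
    recursive=False pins only the named item (the website's top level)."""
    flag = "--recursive=true" if recursive else "--recursive=false"
    return ["ipfs", "pin", "add", flag, content_hash]

# Placeholder hashes, shown without running anything.
print(pin_command("EXAMPLE_NEW_FOLDER_HASH", recursive=True))
print(pin_command("EXAMPLE_TOP_LEVEL_HASH", recursive=False))
```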

If you must have active content in a segregated folder, it will require additional publish and pin operations when the website is published.

There will be some later additions to the active content for support of searching on IPFS and coordination with other publishers for mutual pinning and pre-seeding of content.

Server Side Considerations

All files and processing on the web hosting server, outside of the website’s top-level folder, will become inaccessible on IPFS. Any necessary files and processing, such as site search, need to be operable from within the folder without any connection to the server or internet.
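
As one example of moving a server-side function into the folder, a search index can be generated ahead of time and saved as an ordinary file that a script in the page reads. The sketch below is only an illustration, not the Pirate Box Project software, which has yet to be determined. The file name "search-index.json" and the index layout are hypothetical.

```python
import json
import os
import re
from html.parser import HTMLParser

class PageText(HTMLParser):
    """Collects the <title> and the visible words of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.words = set()
        self._in_title = False
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data.strip()
        self.words.update(w.lower() for w in re.findall(r"[A-Za-z0-9]+", data))

def build_search_index(site_root):
    """One entry per HTML page: relative URL, title and sorted word list."""
    index = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            parser = PageText()
            with open(path, encoding="utf-8", errors="replace") as handle:
                parser.feed(handle.read())
            parser.close()
            url = os.path.relpath(path, site_root).replace(os.sep, "/")
            index.append({"url": url, "title": parser.title, "words": sorted(parser.words)})
    return sorted(index, key=lambda entry: entry["url"])

def write_search_index(site_root, filename="search-index.json"):
    """Save the index inside the website folder so it is published with it."""
    with open(os.path.join(site_root, filename), "w", encoding="utf-8") as handle:
        json.dump(build_search_index(site_root), handle)
```

Because the URLs in the index are relative, they keep working wherever the folder is served from.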

Link Considerations

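Any absolute link including a domain name will be unchanged by publishing, so it will keep pointing at the domain rather than at IPFS. Absolute links to locations within the website will therefore not function on IPFS and should be replaced with relative links.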

Relative links entirely within the website will be automatically retained during publishing.
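
Converting the website's own absolute links to relative ones can be automated. This is a minimal sketch of the conversion for a single link. The domain "example.com" and the function name are placeholders, and links that start with "/" but carry no domain are left untouched here, although they may also need attention.

```python
import posixpath
from urllib.parse import urlsplit

def to_relative_link(href, page_path, site_domain):
    """Rewrite an absolute link to this website as a relative link.
    href        the link as found in the page
    page_path   the page's own location relative to the website's top
                folder, for example "posts/a.html"
    site_domain the website's domain name
    Links to other domains, and links that are already relative,
    are returned unchanged."""
    parts = urlsplit(href)
    if parts.netloc.lower() != site_domain.lower():
        return href
    target = parts.path.lstrip("/") or "index.html"
    page_folder = posixpath.dirname(page_path) or "."
    relative = posixpath.relpath(target, start=page_folder)
    if parts.query:
        relative += "?" + parts.query
    if parts.fragment:
        relative += "#" + parts.fragment
    return relative

print(to_relative_link("https://example.com/img/logo.png", "posts/a.html", "example.com"))
```

Treating a bare domain link as "index.html" is an assumption about how the home page is named.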

Bottom Line

It’s important and urgent that websites begin preparing their site structure for Web 3.0 publishing on IPFS. The website can then quickly be put on IPFS to assure preservation and access. People will be able to see all of your hard work, even after the crown tries to block access by scrubbing your website’s name from DNS (the Domain Name System), the internet phone book, which could happen very soon.

Updates

20 Feb 2021: Added step 1 in Initial Archiving to “Create a sitemap...”
Added step 5 in Initial Archiving to “Change the IPFS communications port...”

19 Feb 2021: added a paragraph in the Size Matters section regarding: “Another option is to create a duplicate of the website’s folder...”

 

Another document will present the Pirate Box plan from here forward. This one is to expedite getting the ball rolling for publishers.

For Freedom,
Pirate Mike