The Crash and Rebirth of a Six-Year-Old Open Source Project

I have maintained an open-source project, RSSHub, for six years, and it is on the verge of collapse.

Background

On the surface, it looks healthy and vibrant: nearly 30k stars, over 900 contributors, over 300 million requests per month, countless users, tens of dollars in monthly sponsorship, a steady stream of issues and pull requests, and code updates almost every day. Behind the scenes, however, years of high maintenance costs, server bills of over a thousand dollars per month, and a growing pile of tedious, repetitive maintenance work have pushed it to the brink of collapse.

The project was started six years ago, and many of the Node.js technologies and dependencies that were trendy then, touted as the "Next Generation," are now outdated. Popular newer technologies such as JSX, TypeScript, and Serverless cannot be adopted on top of them. The architecture is also poorly designed: information about each route is scattered across multiple places, so changing a route means registering the route, writing the route script, writing Radar rules, and writing documentation separately. This increases the workload and invites errors. It was not a problem when there were only a few routes, but it has since become unbearable.

Just keeping the project running on such a poor foundation is already a challenge, and every new feature would make future updates harder. As a result, the new ideas that occasionally come to mind are difficult to implement.

The only solution to these problems is to rewrite the core on a modern framework with a newly designed architecture. However, the cost of such a transformation grows with the number of routes, and each fundamental change may require several months of work. So although the problem kept getting worse, I kept postponing it on the grounds that the project was still usable.

But this is something that must be done, so I took some time to redesign and rewrite it over the past few months.

Technology Stack Updates

koa -> Hono

The first and most fundamental step was replacing the web framework, koa, which six years ago was popular as the next-generation web framework. Its author has long since abandoned it, so after some research I decided to switch to Hono, which has the best support for JSX, TypeScript, and Serverless.

The two APIs differ significantly, so I had to rewrite every middleware and replace the koa API calls in every route.
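To give a feel for the kind of change involved: a koa handler communicates by mutating its context (for example by assigning `ctx.body`), while a Hono handler returns a `Response`. The sketch below uses simplified stand-in types rather than the real koa or Hono packages, and the `adaptKoaStyle` and `createContext` helpers are hypothetical. RSSHub rewrote its middleware and routes directly rather than wrapping them, so this only illustrates the difference in style.

```typescript
// Simplified stand-ins for illustration only; these are NOT the real koa or Hono types.
interface KoaLikeContext {
    query: Record<string, string>;
    status?: number;
    body?: unknown; // koa style: the handler assigns the response body here
}

interface HonoLikeContext {
    query: Record<string, string>;
    // Hono style: a helper builds and returns a standard Response object
    json: (data: unknown, status?: number) => Response;
}

type KoaStyleHandler = (ctx: KoaLikeContext) => void;
type HonoStyleHandler = (c: HonoLikeContext) => Response;

// Hypothetical adapter: runs a mutate-the-context handler and turns the result into a Response.
const adaptKoaStyle = (handler: KoaStyleHandler): HonoStyleHandler => (c) => {
    const ctx: KoaLikeContext = { query: c.query };
    handler(ctx);
    return c.json(ctx.body ?? null, ctx.status ?? 200);
};

// Minimal context factory so the sketch runs without either framework installed.
const createContext = (query: Record<string, string>): HonoLikeContext => ({
    query,
    json: (data, status = 200) =>
        new Response(JSON.stringify(data), {
            status,
            headers: { 'content-type': 'application/json' },
        }),
});

// A handler written in the old style.
const legacyHandler: KoaStyleHandler = (ctx) => {
    ctx.body = { greeting: `hello ${ctx.query.name}` };
};

const response = adaptKoaStyle(legacyHandler)(createContext({ name: 'rsshub' }));
console.log(response.status, response.headers.get('content-type'));
```

Because every route touched the context in the old style, a change like this reaches into every route file, which is why the migration was so large.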

Main changes:
https://github.com/DIYgod/RSSHub/pull/14295

image

The author of Hono also liked this transformation.

JavaScript -> TypeScript

Switching to TypeScript avoids many type-related issues and low-level errors. Most importantly, it helps keep the route code written by hundreds of contributors consistent and free of such errors.

Main changes:

image

CommonJS -> ESM

ESM is a module specification that some Node.js core developers strongly recommended a few years ago. It has some advantages, but its biggest problems are the ecosystem fragmentation caused by its incompatibility with CommonJS and the criticism it drew for its reduced functionality.

After several years of development, it is now barely usable in most scenarios, and tsx also supports mixing CommonJS and ESM.

Despite my best efforts, some CommonJS code is still difficult to migrate for now. As a result, the project can only be run with tsx, which is not compatible with some serverless platforms such as Vercel. This can be resolved gradually in the future.

Main changes:

image

image

art-template -> JSX

art-template is a template engine that supports koa. I recall there was a more popular template engine six years ago, but I no longer remember its name. I chose art-template because I could not understand the more popular one at the time, and art-template is very simple.

Hono comes with JSX support, and JSX needs no introduction: it is a widely used syntax extension of JavaScript, and writing it feels much like writing React.

Main changes:

Jest -> Vitest

Jest was a popular testing framework, but it has aged poorly since the arrival of ESM: its ESM support has remained "experimental" all along. Vitest is now the more popular choice.

Main changes:
https://github.com/DIYgod/RSSHub/commit/38e42156a0622a2cd09f328d2d60623813b8df28

Got -> ?

Got, which the project currently uses, is also no longer actively maintained, and I have not yet found a good alternative. It may eventually be replaced with native Fetch or a custom wrapper around Fetch, but I have not started working on that yet.
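As a rough idea of what a thin wrapper around Fetch could look like, here is a minimal sketch. It is not RSSHub code, since that work has not started; the names `createClient`, `getJson`, and `ClientOptions` are hypothetical, and the retry behavior is just one example of a convenience such a wrapper might add on top of native Fetch.

```typescript
// Any function with the same shape as the native fetch; injectable so it can be faked in tests.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface ClientOptions {
    retries: number; // extra attempts made after the first failure
    headers: Record<string, string>; // default headers sent with every request
}

// Hypothetical wrapper: default headers, an error on non-2xx responses, and simple retries.
function createClient(fetchImpl: FetchLike, options: ClientOptions) {
    return async function getJson<T>(url: string): Promise<T> {
        let lastError: unknown;
        for (let attempt = 0; attempt <= options.retries; attempt++) {
            try {
                const response = await fetchImpl(url, { headers: options.headers });
                if (!response.ok) {
                    throw new Error(`Request failed with status ${response.status}`);
                }
                return (await response.json()) as T;
            } catch (error) {
                // Remember the failure and try again if attempts remain.
                lastError = error;
            }
        }
        throw lastError;
    };
}
```

In real use the native `fetch` would be passed as `fetchImpl`; passing it in as a parameter keeps the wrapper easy to test without network access.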

New Routing Standard

I am not capable enough on my own, so I learned a great deal and improved the design through discussions with community developers. The process has been very interesting: https://github.com/DIYgod/RSSHub/issues/14685

Main changes:
https://github.com/DIYgod/RSSHub/pull/14718

image

History

The new standard mainly aims to solve the problem of scattered route information. It should be considered the third version.

The first version dates from the early development of RSSHub. At that time, I did not anticipate that there would be so many routes, so there was almost no planning. All routes were registered in a single file, with route scripts and documentation added separately. Over time, this file grew larger and more prone to conflicts. In addition, all route scripts were loaded at startup, which hurt performance.

The second version dates from the period when NeverBehave maintained the project. It introduced namespaces and split router.js and radar.js, so that routes in the same namespace were kept together in one folder and documented in one or more Markdown documents. It also implemented lazy loading, greatly improving maintainability and performance. However, the information was still scattered across multiple files, and inconsistencies between those files could lead to errors.
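The difference between loading everything at startup and lazy loading can be sketched as follows. This is a simplified illustration, not the code RSSHub used: `LazyRoutes`, `Handler`, and `Loader` are hypothetical names, and the loaders here are plain synchronous functions standing in for whatever module loading the real project performed.

```typescript
type Handler = (path: string) => string;
type Loader = () => Handler;

// Registers cheap loader functions up front and only runs one when its namespace is first requested.
class LazyRoutes {
    private loaders = new Map<string, Loader>();
    private cache = new Map<string, Handler>();

    register(namespace: string, loader: Loader): void {
        this.loaders.set(namespace, loader);
    }

    resolve(namespace: string): Handler | undefined {
        const cached = this.cache.get(namespace);
        if (cached) {
            return cached;
        }
        const loader = this.loaders.get(namespace);
        if (!loader) {
            return undefined;
        }
        const handler = loader(); // the expensive step, deferred until first use
        this.cache.set(namespace, handler);
        return handler;
    }
}

// Demo: count how often the (pretend) expensive load actually happens.
let loadCount = 0;
const routes = new LazyRoutes();
routes.register('example', () => {
    loadCount++;
    return (path) => `handled ${path}`;
});

const loadsAfterStartup = loadCount; // nothing has been loaded yet
const firstResult = routes.resolve('example')?.('/example/posts');
routes.resolve('example'); // second lookup is served from the cache
console.log(loadsAfterStartup, firstResult, loadCount);
```

With eager loading, every registered namespace would pay its loading cost at startup even if it is never requested.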

Now

This time, the route files are divided into two categories: namespace.ts and route files with arbitrary names.

namespace.ts defines namespace information by exporting an object named "namespace."

import type { Namespace } from '@/types';

export const namespace: Namespace = {
    // ...
};

The fields contained in the namespace object are restricted by TypeScript to:

interface Namespace {
    name: string;
    url?: string;
    categories?: string[];
    description?: string;
}

This information is used by the compiled code, documentation, and RSSHub Radar.
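For illustration, a filled-in namespace.ts might look like the sketch below. The interface is copied locally so the snippet is self-contained (a real file would import it from '@/types'), and every value is a made-up placeholder rather than an actual RSSHub namespace.

```typescript
// Local copy of the interface shown above; a real namespace.ts imports it from '@/types'.
interface Namespace {
    name: string;
    url?: string;
    categories?: string[];
    description?: string;
}

// All values below are hypothetical placeholders.
export const namespace: Namespace = {
    name: 'Example Site',
    url: 'example.com',
    description: 'Feeds for a made-up site, used only to show the shape of the object.',
};
```

Only `name` is required, so the smallest valid namespace file is a single field.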

Route files define route information by exporting an object named "route."

import type { Route } from '@/types';

export const route: Route = {
    // ...
};

The fields contained in the route object are restricted by TypeScript to:

interface Route {
    path: string | string[];
    name: string;
    url?: string;
    maintainers: string[];
    handler: (ctx: Context) => Promise<Data> | Data;
    example: string;
    parameters?: Record<string, string>;
    description?: string;
    categories?: string[];

    features: {
        requireConfig?: string[] | false;
        requirePuppeteer?: boolean;
        antiCrawler?: boolean;
        supportRadar?: boolean;
        supportBT?: boolean;
        supportPodcast?: boolean;
        supportScihub?: boolean;
    };
    radar?: {
        source: string[];
        target?: string;
    };
}

The information that was previously scattered across router.js, maintainer.js, radar.js, and the documentation is now centralized in this one file, reducing the likelihood of errors caused by multiple definitions.
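A route file following this standard might look roughly like the sketch below. To keep it self-contained, `Context` and `Data` are simplified stand-ins (the real `Context` comes from Hono, and the real `Data` and `Route` come from '@/types'), `Route` is a trimmed local copy of the interface above, and the path, maintainer, and Radar values are all hypothetical.

```typescript
// Simplified stand-ins so the sketch runs on its own; not the real types.
interface Context {
    req: { param: (name: string) => string };
}
interface Data {
    title: string;
    link?: string;
    item: Array<{ title: string; link: string }>;
}
// Trimmed local copy of the Route interface shown above.
interface Route {
    path: string | string[];
    name: string;
    maintainers: string[];
    handler: (ctx: Context) => Promise<Data> | Data;
    example: string;
    parameters?: Record<string, string>;
    features: {
        requireConfig?: string[] | false;
        requirePuppeteer?: boolean;
        antiCrawler?: boolean;
        supportRadar?: boolean;
    };
    radar?: { source: string[]; target?: string };
}

// The handler builds the feed; a real one would fetch and parse remote data.
function handler(ctx: Context): Data {
    const user = ctx.req.param('user');
    return {
        title: `Posts by ${user}`,
        link: `https://example.com/u/${user}`,
        item: [{ title: 'First post', link: `https://example.com/u/${user}/1` }],
    };
}

// Registration info, documentation, maintainers, and Radar rules all live in one object.
export const route: Route = {
    path: '/posts/:user',
    name: 'User Posts',
    maintainers: ['example-maintainer'],
    example: '/example/posts/alice',
    parameters: { user: 'Username on the site' },
    features: { requireConfig: false, requirePuppeteer: false, supportRadar: true },
    radar: { source: ['example.com/u/:user'], target: '/posts/:user' },
    handler,
};
```

Because TypeScript checks the object against `Route`, a missing `maintainers` or `example` field fails at compile time instead of surfacing later as a mismatch between files.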

Implementation

The implementation works as follows. In the development environment, the entire route folder is traversed to find all namespace.ts and route files, read their information, and load the routes. In the production environment, a pre-compiled list of paths is used instead, avoiding unnecessary traversal and loading. The code can be found here: https://github.com/DIYgod/RSSHub/blob/master/lib/registry.ts
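The idea can be sketched with a small, dependency-free example. This is not the code in lib/registry.ts: `groupRouteFiles` and `resolvePaths` are hypothetical helpers, the file list is hard-coded instead of read from disk, and treating every non-namespace .ts file as a route file is a simplifying assumption.

```typescript
interface NamespaceFiles {
    namespaceFile?: string;
    routeFiles: string[];
}

// Groups relative paths such as 'github/issue.ts' by their first folder (the namespace).
function groupRouteFiles(paths: string[]): Map<string, NamespaceFiles> {
    const grouped = new Map<string, NamespaceFiles>();
    for (const path of paths) {
        const segments = path.split('/');
        if (segments.length < 2 || !path.endsWith('.ts')) {
            continue; // skip anything that is not a TypeScript file inside a namespace folder
        }
        const namespace = segments[0];
        const fileName = segments[segments.length - 1];
        const entry = grouped.get(namespace) ?? { routeFiles: [] };
        if (fileName === 'namespace.ts') {
            entry.namespaceFile = path;
        } else {
            entry.routeFiles.push(path);
        }
        grouped.set(namespace, entry);
    }
    return grouped;
}

// Development walks the folder (the `scan` callback); production reuses a list built ahead of time.
function resolvePaths(isProduction: boolean, prebuilt: string[], scan: () => string[]): string[] {
    return isProduction ? prebuilt : scan();
}

const discovered = ['github/namespace.ts', 'github/issue.ts', 'github/star.ts', 'README.md'];
let scanCalls = 0;
const scan = () => {
    scanCalls++;
    return discovered;
};

const devGroups = groupRouteFiles(resolvePaths(false, [], scan));
const prodGroups = groupRouteFiles(resolvePaths(true, discovered, scan));
console.log(devGroups.get('github'), scanCalls);
```

Both environments end up with the same grouping; the production path simply skips the traversal step.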

The documentation is also generated by traversing the route folder, finding all the necessary information, and synthesizing a series of Markdown files. It no longer needs to be manually maintained. The code can be found here: https://github.com/DIYgod/RSSHub/blob/master/scripts/workflow/build-routes.ts
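Generating documentation from route metadata could look something like the sketch below. The actual generator is the build-routes.ts script linked above; `renderNamespaceDoc`, `RouteDocInput`, and the Markdown layout here are hypothetical and only show that the docs can be derived entirely from the exported objects.

```typescript
// The subset of route metadata this sketch needs; field names follow the Route interface.
interface RouteDocInput {
    path: string;
    name: string;
    example: string;
    maintainers: string[];
    parameters?: Record<string, string>;
}

// Renders one Markdown section per namespace, with a subsection per route.
function renderNamespaceDoc(namespaceName: string, routes: RouteDocInput[]): string {
    const lines: string[] = [`## ${namespaceName}`, ''];
    for (const route of routes) {
        lines.push(`### ${route.name}`, '');
        lines.push(`- Path: \`${route.path}\``);
        lines.push(`- Example: \`${route.example}\``);
        lines.push(`- Maintainers: ${route.maintainers.map((m) => `@${m}`).join(', ')}`);
        for (const [key, text] of Object.entries(route.parameters ?? {})) {
            lines.push(`- Parameter \`${key}\`: ${text}`);
        }
        lines.push('');
    }
    return lines.join('\n');
}

const markdown = renderNamespaceDoc('Example Site', [
    {
        path: '/example/posts/:user',
        name: 'User Posts',
        example: '/example/posts/alice',
        maintainers: ['example-maintainer'],
        parameters: { user: 'Username on the site' },
    },
]);
console.log(markdown);
```

Since the Markdown is derived from the same object that registers the route, the documentation cannot drift out of sync with the code the way hand-written docs did.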

Of course, routes written under the previous standard had to be migrated to the new one rather than abandoned. Scripts scraped and organized their information in bulk, and the old definitions were replaced with the results. However, the old documentation was quite messy and contained many errors, so the scraped information contains many errors too. These can only be corrected manually over time.

Future

With these improvements, RSSHub can finally get rid of its historical burdens and focus on developing new features. Here are some ideas I have accumulated to spark inspiration:

  • Since RSSHub is a data aggregator, its purpose is not limited to RSS. The JSON output could be enhanced to serve as a general RESTful API. For example, it could provide an interface for fetching the next page, or output non-feed data such as a Twitter follower count.
  • User system and user-customized configurations, allowing users to generate their private subscription URLs #14706
  • Route error notification and health check #14712
  • Integration with RSS3 nodes and cryptocurrency revenue sharing https://twitter.com/rss3_/status/1731822029199094012
  • AI translation and summarization
  • More detailed analysis of example data and automatic recommendation of Radar rules based on reverse deduction
  • RSSHub instances bound to local browsers or clients, with the hope of truly solving the anti-crawling problem
  • ...

Finally, open source is an expensive endeavor, and RSSHub would not have survived without the help of these developers.

image

And these kind sponsors:

image

If RSSHub is helping you, I hope you can actively participate and contribute to the future of information freedom.
