Tencent Cloud API Docset for Dash

· 2 min read · 373 Words · -Views -Comments

I often have to look up Tencent Cloud APIs for work. To speed up searches, I decided to build a Dash docset—searching docs through Dash is hard to beat.

Here’s the process I followed.

Result

https://static.1991421.cn/2022/2022-06-19-230436.gif

How It Works

  1. Crawl the Tencent Cloud API docs and save the pages as HTML.
    • Every Dash docset, whether official or community-built, essentially bundles fetched HTML pages or Markdown sources. Markdown is easier to process because of its structure.
    • Tencent Cloud doesn’t publish Markdown files, so crawling HTML is the only route.
  2. Walk through all downloaded pages and extract interfaces, functions, titles, and hierarchy information, which can be inferred from the original URLs.
    • As long as the markup is consistent, parsing isn’t difficult.
  3. Store the extracted data in Dash’s prescribed SQLite schema and package it as a docset.

Once the direction was clear, implementation was straightforward.

Build Steps

  1. Use wget to mirror the site—the command supports recursive downloads.
    • curl can’t recurse, so it’s out.
  2. Create the docset folder structure and SQLite tables that Dash expects.
  3. Parse the HTML files with Node.js to extract metadata and insert rows into the database.
    • I first considered parsing everything in memory and bulk-inserting, but with so many files it risked running out of memory. Instead, I parse one page at a time and insert after each batch.
    • Dash can host local HTML or link out to the live site. Local copies give you full control over layout, while remote URLs keep things simple but inherit the site’s styling. I opted for remote URLs: both the index and the API pages point to Tencent Cloud’s live docs.
  4. Tar the directory into the .docset bundle.
  5. Write a feed.xml so others can subscribe.
    • Dash doesn’t sync locally added docsets across devices. Packaging a feed lets other machines subscribe, or you can submit it to Dash’s official repo.

All scripts and the docset itself live here: dash-docset-tcapi

Final Thoughts

  1. This approach focuses on fast keyword lookup that redirects to the live doc URLs. Serving local HTML is possible, but it risks going stale and requires extra styling work; you’d also have to decide whether to keep URLs local or remote.
  2. Any API documentation lacking an existing docset can use the same playbook.

References

Authors
Developer, digital product enthusiast, tinkerer, sharer, open source lover