Shipping 700 KiB of compressed HTML isn't viable?


>> Implementing pagination and a navbar for me.l-m.dev.

Posted on | 1489 words | ~7 minute read


The Website

I use it daily. It’s slow. Why?

Whilst creating it, iterating and developing on my local machine was easy. For one, latency was low over a localhost connection, and for another, there weren’t 600 posts with half of them containing images and videos. Every time the website was loaded, the site would push 700 KiB of HTML after GZIP (yes! AFTER gzip), crippling performance.

Now, if your device wasn’t brought to its knees already, wait till you need to load the content (not all browsers lazy load when you specify), and expect to wait 30 seconds of straight loading.

You know who I feel bad for? Mobile Users.

Unfortunately, I don’t have to guess what it does to their browsers; I have an issue right here.

https://github.com/l1mey112/me.l-m.dev/issues/2

Obviously this needs to change. There is no reason to load every single thing at once; we need pagination.

Put 25-50 posts per page, and we’re done.

It wasn’t simple, however. It took me only a single afternoon to implement, but my brain was in problem-solving overdrive.

Migrations Are Hard

Okay, the website dumps all the posts at once on a single page load.

Ignoring performance downsides, there are some advantages. For one, it’s much simpler. Second, it allows you to create jump links for every single post without worrying about anything on the server. Simply use the browser to navigate you to the page you need.

You see, when you create a website WITHOUT JavaScript, you void the ability to do anything useful on the client side. The only things the client can do are click URLs, submit forms, and be redirected (by the server).

When you build a website like that, it makes things easier when it is stateless. For the average request, the server doesn’t have to know anything about the client; it can just serve.

The Anchor

https://gnu.org/about#important-header

The anchor part of the URL is used to “jump” to a section of the page when it is loaded.

Think of it as a bookmark representing the ID of an HTML element. If found on page load, the browser will jump to it. It’s used in a wide range of places, particularly page headers.

The part after the # is never sent to the server. NEVER.
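This is easy to check for yourself. Here is a quick Python sketch (purely illustrative, nothing to do with the site’s V backend): the standard library splits a URL the same way a browser does, and the fragment never makes it into the request.

```python
from urllib.parse import urlsplit

# The browser strips the fragment before making the request;
# only the path and query string ever reach the server.
parts = urlsplit("https://me.l-m.dev/#1688263569")

print(parts.path)      # "/"           <- what the server sees
print(parts.query)     # ""            <- no query string either
print(parts.fragment)  # "1688263569"  <- kept client-side, never sent
```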

Never? Therein lies the issue: the way posts are linked to.

https://me.l-m.dev/#1688263569

Take a look at this HTML. (Very semantic, right?)

This is generated by the backend for every single post.

<article id="1688263569"> <!-- /#1688263569 -->
	<header>
		<time datetime="2023-07-02T02:06:09.000Z">Sun, 2 Jul 2023 02:06:09 UTC</time>
		<p class="s">#1688263569</p>
		[ cs | vlang | web | github | open_source ]
		<div class="r">
			<a href="/#1688263569">[share]</a>
			<a href="/?edit=1688263569">[e]</a>
			<a href="/backup">[b]</a>
			<a href="/delete/1688263569">[x]</a>
		</div>
	</header>
	<main>
		<!-- content -->
	</main>
</article>

Note the <article id="1688263569">.

A simple link with an anchor can be used to jump to any single part of the page with a corresponding ID.

Now, if we want to add pages…

  1. The browser cannot see all of the posts and their IDs, as they’re filed away in pages.
  2. The server cannot see the post the user is targeting, as it’s stored away in an invisible anchor.

Big problem.

How I Solved It

The anchor part of a URL is strictly used for sharing posts.

The server needs that post ID to locate the page it sits in, then construct a jump inside an anchor to navigate the browser over.

One fix I initially came up with: use a ?p= query string.

https://me.l-m.dev/?p=1688263569#1688263569

No need for a duplicate anchor. The server can understand that this is a direct post link, and generate a custom ID for the anchor to target.

https://me.l-m.dev/?p=1688263569##

Ditch generating a custom ID for all posts on the page; it’s not needed anymore.

  1. Request /?p=1687975246##
  2. Server sees p=1687975246.
  3. Server locates the page that includes the target post.
  4. Server collects all posts in that page and renders them.
  5. On the post with that ID, render <article id="#">.
  6. The browser accepts the page.
  7. The browser sees ##, then navigates to id="#".
  8. Post found.
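The flow above can be sketched in a few lines. This is a hypothetical Python model, not the real V backend; page_for and render_page are made-up names, and posts are assumed to be an in-memory list in display order.

```python
posts_per_page = 25

def page_for(post_ids: list[int], target: int) -> int:
    """Locate the zero-indexed page containing the target post."""
    index = post_ids.index(target)  # posts ordered as they appear on site
    return index // posts_per_page

def render_page(post_ids: list[int], target: int) -> list[str]:
    """Render one page of posts; only the target gets the id="#" anchor."""
    page = page_for(post_ids, target)
    start = page * posts_per_page
    html = []
    for pid in post_ids[start:start + posts_per_page]:
        anchor = ' id="#"' if pid == target else ""
        html.append(f"<article{anchor}><!-- {pid} --></article>")
    return html
```

The browser then sees the ## fragment in its own URL and jumps to the lone id="#" element, no script required.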

JavaScript averted.

Meta?

In fact, I’ve known about this problem for a long time.

The [share] URL used to provide a /?meta=000 instead of a /#000.

Why? For the same reason. The server needs to know the post being targeted to generate meta information so that it can be embedded nicely.

<!-- /?meta=1687472800 -->
<meta content="me.l-m.dev | #1687472800" property="og:title">
<meta content="[ music | x0o0x_ | deco27 | hatsune_miku ]" property="og:description">

It would then use another meta tag to perform an instant redirect inside browsers.

<meta http-equiv="refresh" content="0; url=/#1687472800"> <!-- instant redirect -->

This doesn’t need to be done anymore. The meta information can just be stored in the page.

And so, meta URLs are deprecated. Accessing one will throw you a 301 Moved Permanently redirect.

The Implementation

I may have solved it on paper, but how does it work in practice?

Using a combination of two SQL database queries, one to locate the page containing the post, and another to prepare the page, it can be done. It’s essentially a drop-in fix, preceding the rest of the database code.

const posts_per_page = 25

In the main SQL query that prepares the posts, append the limit and offset. Simple, right?

page := /* computation */
db_query += " limit ${posts_per_page} offset ${posts_per_page * page}"
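As a concrete check (a Python sketch of the same string-building, with hypothetical values), page 2 of a 25-posts-per-page listing skips the first 50 rows:

```python
posts_per_page = 25
page = 2  # zero-indexed: the third page

# Same shape as the V code: skip two full pages, take one page of rows.
clause = f" limit {posts_per_page} offset {posts_per_page * page}"
print(clause)  # " limit 25 offset 50"
```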

This is how it works in general, but there is more at play here.

Queries

There are two kinds of requests the backend can handle.

Take this search. It will create the URL below, which the backend can accept. Keep in mind, pages are zero indexed.

  1. Search for keyword compiler
  2. Get the third page
  3. Tag vlang
  4. Tag optimisation
https://me.l-m.dev/?search=compiler&page=2&tag_vlang=on&tag_optimisation=on
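For illustration, here is roughly what that parsing looks like in Python (a hypothetical stand-in for get_search_query; the real backend is V):

```python
from urllib.parse import urlsplit, parse_qs

def parse_search(url: str) -> dict:
    """Pull out the search text, zero-indexed page, and any tag_* flags."""
    q = parse_qs(urlsplit(url).query)
    return {
        "search": q.get("search", [""])[0],
        "page": int(q.get("page", ["0"])[0]),
        # tag_vlang=on -> "vlang"; k[4:] strips the "tag_" prefix
        "tags": [k[4:] for k, v in q.items()
                 if k.startswith("tag_") and v == ["on"]],
    }

query = parse_search(
    "https://me.l-m.dev/?search=compiler&page=2&tag_vlang=on&tag_optimisation=on")
```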

This is a SearchQuery.

It is the second type of query, the other being a post query. You cannot mix these two types of queries together, however.

A root query, /, is just a blank SearchQuery.

fn get_search_query(req string) SearchQuery
type Query = SearchQuery | PostQuery

struct SearchQuery {
	search string
	tags []string
	page u64
}

struct PostQuery {
	post i64
}

The request URL passed to the server is checked and stored in req for parsing.

A query can only store a search query or a post query, not both. If a request starts with a /?p=, it’s a post request, else, a search.

Also, if you didn’t catch on, the website uses UNIX epoch timestamps as post IDs. Pretty cool right?
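Since a post ID is just the creation time in seconds, the timestamp shown in the HTML earlier falls straight out of it. A quick Python check:

```python
from datetime import datetime, timezone

# The post ID doubles as the creation time: seconds since the UNIX epoch.
post_id = 1688263569
created = datetime.fromtimestamp(post_id, tz=timezone.utc)

print(created.isoformat())  # 2023-07-02T02:06:09+00:00
```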

// request code
mut query := unsafe { Query{} }

if req.starts_with('/?p=') {
	unix := req[4..].i64()
	query = PostQuery{unix}
} else {
	query = get_search_query(req)
}

The backend continues, building up a SQL db_query to be passed to the database to extract the data we need. However, for a post query, the page it is located in must be calculated first. It must be exact, so the user doesn’t see incorrect posts when jumping from the next page and back.

mut db_query := "select * from posts"
mut page := 0

match query {
	SearchQuery {
		if /* search */ {
			db_query += " where (content glob '*${search}*' collate nocase)"
		}
		if /* tags */ {
			db_query += " tags like '%${tag}%' escape '\\'"
		}

		page = query.page
	}
	PostQuery {
		posts_from_start := app.raw_query("select count(*) from posts where ...")

		page = (posts_from_start - 1) / posts_per_page
	}
}

db_query += " limit ${posts_per_page} offset ${posts_per_page * page}"

posts := app.raw_query(db_query).map(Post{ /* logic */})

As you saw above, the current page needs to be calculated regardless of query type. Once done, the posts can be rendered easily with a simple…

tmpl := $tmpl('tmpl/tmpl.html') // compile time templating!

After that, handle caching, then send the response over to the client.

The Navbar

When we serve content to the user, it’s on a specific page. We want to give the user the ability to navigate to the next or previous page if they exist.

Using the current query, generate links that point to the next and previous pages, without destroying the query. We don’t want the server to store the query! We need it stateless.

Here are some examples of queries and their “next” URLs. This is what we want.

  1. /?p=123123123 -> /?page=4

    The next page should be the one after the page the post is located on.

  2. /?search=text -> /?page=1&search=text

    Without a page parameter, the page is zero; increment it without destroying the search.

  3. /?page=5&tag_self=on -> /?page=6&tag_self=on

    Simply reconstruct the entire query as a URL, but only increment the page count.
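The three cases above can be sketched as follows. This is a hypothetical Python model of construct_next/construct_previous (the real ones are written in V and operate on the Query sum type); queries are plain dicts here, and a "p" key marks a direct post link.

```python
from urllib.parse import urlencode

def construct_next(query: dict, post_page: int, no_next: bool):
    """Rebuild the query string with the page bumped by one."""
    if no_next:
        return None  # already on the last page
    if "p" in query:
        # Direct post link: "next" is the page after the one the post is on.
        return "?" + urlencode({"page": post_page + 1})
    q = dict(query)
    q["page"] = int(q.get("page", 0)) + 1
    return "?" + urlencode(q)

def construct_previous(query: dict, post_page: int):
    """Rebuild the query string with the page decremented by one."""
    page = post_page if "p" in query else int(query.get("page", 0))
    if page == 0:
        return None  # no page before the first
    q = {k: v for k, v in query.items() if k != "p"}
    q["page"] = page - 1
    return "?" + urlencode(q)
```

Note the page numbers stay zero-indexed throughout, matching the backend.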

This is the API used.

fn construct_next(query Query, post_page u64, no_next bool) ?string
fn construct_previous(query Query, post_page u64) ?string
<nav>
@if prev := construct_previous(query, page)
	<a href="/@{prev}">← Previous</a>
@else
	<div></div>
@end
@if next := construct_next(query, page, no_next)
	<a href="/@{next}">Next →</a>
@else
	<div></div>
@end
</nav>

Returning none signifies that there isn’t a previous or next page.

Back inside V templating, if statements are used to unwrap the options and generate the HTML needed.

nav {
	display: flex;
	justify-content: space-between;
}

Works perfectly.

construct_next() == none

The end. I implemented it. You get something nice to read.

Enjoy!

https://me.l-m.dev/?p=1688263569##