I don't know exactly what the website was, but if it's just HTML, CSS, some JS and some images, why would you ever host it on a "pay per visit/bandwidth" platform like AWS? Not only is AWS traffic extra expensive compared to pretty much any alternative, paying for bandwidth in that manner never made much sense to me. Even shared hosting like we used in the early 00s would be a better solution for hosting a typical website than AWS.
Especially when Cloudflare Pages is free with unlimited bandwidth, if you don't need any other backend. The only limits are 100 custom domains and 500 builds per month in their CI/CD, the latter of which you can bypass by just building everything in GitHub Actions and pushing the output to Pages.
Because Cloudflare Pages has this doofy deploy limit hanging over your head. Even if you won't reasonably run into it, it's still weird. R2's free limits are significantly higher.
AWS CloudFront pricing seems pretty competitive with other CDNs, at least for sites that are not very high traffic.
https://aws.amazon.com/cloudfront/pricing/
> AWS CloudFront pricing seems pretty competitive with other CDNs
Sure, but it's unlikely you actually need to put a CDN in front of your manual; it's mostly text with a few images. People default to using CDNs way too quickly today.
But why host my own static content when Cloudflare R2 buckets will give me a million reads a month for free?
The whole point of CDNs is to host static assets, why wouldn't you use one? They are dead simple to use.
Because caddy/nginx/apache (any web server really) can serve that content as well as any other? The better question is: why default to using more things before you actually need them?
Personally, software engineering for me is mostly about trying to avoid accidental complexity. People obsessing over "web scale" and "distributed architecture" before they've even figured out whether anyone actually wants to use the platform/product/tool they've built tend to add a lot of complexity.
> Because caddy/nginx/apache (any web server really) can serve that content as well as any other?
That's not really true if you care about reliability. You need two nodes in case one goes down, gets rebooted, etc., and then you need a way to direct traffic away from bad nodes (via DNS, a load balancer, or the like).
You'll end up building half of a crappy CDN to try to make that work, and it's way more complicated than chucking CloudFlare in front of static assets.
I would be with you if this was something complicated to cache where you're server-side templating responses and can't just globally cache things, but for static HTML/CSS/JS/images it's basically 0 configuration.
> That's not really true if you care about reliability
While reliability is always some concern, we are talking about a website containing docs for a nerdy tool used by a minuscule percentage of developers. No one will complain if it goes down for 1h daily.
With the uptime guarantees AWS, GCS, DO, etc. provide, it will probably be around 1h per 365 days cumulative (99.99% uptime works out to roughly 53 minutes of downtime a year). Two nodes for a simple static site is just overkill.
But, honestly, for this: just use GitHub Pages. It's OSS and GitHub is already used. They can use a separate repository to host the docs if repo size from assets is a concern (e.g. videos).
How is setting up a web server not using more things than you need when you could just drag and drop a folder using one of many different CDN providers? (Or of course set up integrations as you want)
Just because you're using a UI doesn't mean it isn't more complicated. I'm thinking of "complexity" in terms of things being intertwined, rather than "difficult for someone used to using GUIs".
And configuring a web server as well as the server it is running on is not intertwined complexity?
You're welcome to set up the CDN with a CLI...
I'm thinking of complexity as "shit I have to pay for"
Far more expensive than just having a dumb server somewhere at some normal host.
People simply do not understand how expen$ive AWS is, and how little value it actually has for most people.
It's really a tradeoff of saving time by paying more money. A lot of people choose it when they'd rather not pay more money, and they end up unhappy.
A lot of other people pick it for very narrow use cases where it wouldn't have taken that much more time to learn and do it themselves; they end up paying a lot of money and also aren't happy.
It's pretty nice for mid-size startups to completely ignore performance and capacity planning and quickly churn out features while accumulating tech debt, hoping they make it long enough to pay the tech debt back.
A year ago I researched this topic for a static website of my own. All providers I looked at were around $5/month, and I want to say the cheapest I found was slightly lower. By comparison, I am still within the free tier limits of AWS S3 and CloudFront (CDN) since I am not getting much traffic. So my website is on edge locations all over the world as part of their CDN for free, whereas hosting it on a single server in Ohio would cost $5/month.
Did you check NearlyFreeSpeech.net? If you are getting little traffic it costs practically nothing for a static site.
An idle site that receives no hits still costs around $1.50/month with NearlyFreeSpeech.net, since the change (instituted around the time Cloudflare decided to kick out the white supremacists) that limits the number of "non-production" sites.
Most people will never make it past the free tier on any of CloudFront, Cloudflare, Netlify, Render, etc.
You can just drag and drop a folder and have a static site hosted in a few minutes.
Seems like the maintainer just tried AWS out of curiosity, and never needed to optimize hosting until scrapers suddenly slammed the site.
I love Magit, it's the only Git GUI I can stomach (it helps that I use Emacs already).
I donated a bit of money to help tarsius offset the cost of AWS LLM abuse, well deserved for the value I've gotten from his tools.
This is a good reminder of the real financial costs incurred by maintainers of your favorite Emacs packages.
Here's a nice repo with details on how to support them!
https://github.com/tarsius/elisp-maintainers
Also worth pointing out that the author of Magit has made the unusual choice to make a living off developing Emacs packages. I've been happy to pitch in my own hard-earned cash in return for the awesomeness that is Magit!
>I immediately burned down my account with that hosting provider, because they did not allow setting a spending limit.
Is this true? He mentions the provider being AWS, surely some sort of threshold can be set?
If it's AWS, yes, it's true. All the billing is asynchronous, and some of it is as slow as daily (although it can be very granular/accurate).
In addition, it's a pay-per-use platform
Unless something has changed recently, all you can do is set budget alerts on billing updates. Runaway costs for people simply testing AWS are common. (On the bright side, again unless something has changed recently, asking support to scrap the charges works.)
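(For reference, creating such an alert is a few lines with boto3's budgets client, but it only notifies you after the fact, it doesn't cap spending. This is a sketch from memory; the account ID, budget name, and email are placeholders, and you should double-check the parameter shapes against the boto3 docs.)

    # Hedged sketch: a monthly cost budget with an 80% alert via the AWS Budgets API.
    # Account ID, budget name, and email address are placeholders.
    import boto3

    budgets = boto3.client("budgets")
    budgets.create_budget(
        AccountId="123456789012",
        Budget={
            "BudgetName": "monthly-hosting-alert",
            "BudgetLimit": {"Amount": "10", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,               # percent of the budget
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}],
        }],
    )
    # This emails you once spend passes 80% of $10; it does not stop the bleeding.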
As far as I am aware, there is not. It’s been a long standing complaint about the platform.
There are two widely understood downsides of AWS:
1. High egress costs
2. No hard spending limits
Both of these were problems for the author. I don't mean to "blame the victim" but the choice of AWS here had a predictable outcome. Static documentation is the easiest content to host and AWS is the most expensive way to host it.
Really high bandwidth costs in general. I've never worked anywhere large enough to hit them, but I've heard inter-AZ traffic in the same region can become quite expensive once you're big enough.
This is true. There are services that force multi-AZ deployment, like their version of Kafka, or basically anything that creates autoscaling groups across AZs (like EKS). Without tight monitoring, stuff can get out of hand fast.
What surprised me is that you get charged both ways: $0.01/GB egress out of the source AZ and $0.01/GB ingress into the destination AZ. So it's easy to underestimate the billing impact by half.
Watch out for NAT Gateway, too. An extra $0.045/GB in both directions.
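To put rough numbers on it, using the per-GB rates quoted above and a made-up traffic volume (actual AWS pricing varies by region and changes over time):

    # Back-of-the-envelope math for intra-region data transfer, using the
    # per-GB figures quoted above. The 10 TB/month volume is made up.
    CROSS_AZ_PER_GB = 0.01 + 0.01    # billed on egress from one AZ *and* ingress into the other
    NAT_PER_GB = 0.045               # NAT Gateway data processing, charged on top

    monthly_gb = 10_000              # say 10 TB/month of chatter between AZs

    print(f"Cross-AZ: ${monthly_gb * CROSS_AZ_PER_GB:,.2f}/month")          # $200.00, twice the naive $100 estimate
    print(f"Via NAT Gateway: ${monthly_gb * NAT_PER_GB:,.2f}/month extra")  # $450.00 on top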
Magit was my favorite terminal/TUI way to interact with Git, until I found GitUI.
"Thanks to LLM scrapers, hosting costs went up 5000% last month"
Uggghhhh! AI crawling is fast becoming a headache for self-hosted content. Is using a CDN the "lowest effort" solution? Or is there something better/simpler?
Nah, just add a rate limiter (which any public website should have anyway). Alternatively, add some honeypot URLs to robots.txt, then set up fail2ban to ban any IP accessing those URLs and you'll get rid of 99% of the crawling in half a day.
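For anyone who hasn't set this up before, here's a rough Python sketch of the honeypot half of that, i.e. the part fail2ban normally automates. The log path, trap URL, and the nftables set name are assumptions; adapt them to your setup (or just use fail2ban's own filter/jail config, which is the sturdier route).

    # Hedged sketch: tail the access log and firewall any client that fetches a
    # honeypot path that is disallowed in robots.txt and linked nowhere visible.
    # Assumes an nginx-style access log and an existing nftables set "banlist"
    # in table "inet filter"; fail2ban does all of this (and the unbanning) for you.
    import re
    import subprocess
    import time

    LOG = "/var/log/nginx/access.log"   # assumed location
    TRAP = "/do-not-crawl/"             # the Disallow'd honeypot prefix
    banned = set()

    def ban(ip):
        if ip not in banned:
            banned.add(ip)
            subprocess.run(["nft", "add", "element", "inet", "filter",
                            "banlist", "{ " + ip + " }"], check=False)

    def follow(path):
        with open(path) as f:
            f.seek(0, 2)                # start at the end, like `tail -f`
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)
                    continue
                yield line

    for line in follow(LOG):
        m = re.match(r'(\S+) \S+ \S+ \[.*?\] "(?:GET|HEAD) (\S+)', line)
        if m and m.group(2).startswith(TRAP):
            ban(m.group(1))

The matching robots.txt entry is just "Disallow: /do-not-crawl/": well-behaved crawlers skip it, and the abusive ones walk straight into the trap.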
I gave up after blocking 143,000 unique IPs hitting my personal Forgejo server one day. Rate limiting would have done literally nothing against the traffic patterns I saw.
2 unique IPs or 200,000 shouldn't make a difference: automatically ban the ones that make too many requests and you basically don't have to do anything.
Are people not using fail2ban and similar at all anymore? It used to be standard practice until, I guess, people started using PaaS instead and "running web applications" became a different role from "developing web applications".
It makes a difference if there are 143,000 unique IPs and 286,000 requests. I think that's what the parent post is saying (lots of requests, but also not very many per IP, since there are also lots of IPs).
It's even harder with IPv6, considering things like privacy extensions where the IPs intentionally and automatically rotate.
Yes, this is correct. I’d get at most 2 hits from an IP, spaced minutes apart.
I went as far as blocking every AS that fetched a tripwire URL, but ended up blocking a huge chunk of the Internet, to the point that I asked myself whether it’d be easier to allowlist IPs, which is a horrid way to run a website.
But I did block IPv6 addresses as /48 networks, figuring that was a reasonable prefixlen for an individual attacker.
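If anyone wants to do the same rollup, Python's ipaddress module makes collapsing individual IPv6 offenders into /48 blocks a one-liner (the /48 width is the commenter's heuristic above, not a standard; the addresses below are documentation-range examples):

    # Aggregate abusive IPv6 addresses into /48 networks before banning, so that
    # rotating privacy-extension addresses within one allocation share a ban entry.
    import ipaddress

    seen = ["2001:db8:1:aaaa::17", "2001:db8:1:bbbb::99", "2001:db8:2::5"]  # example addresses

    blocks = {ipaddress.ip_network(ip + "/48", strict=False) for ip in seen}
    for net in sorted(blocks):
        print(net)   # 2001:db8:1::/48 and 2001:db8:2::/48 (two entries instead of three)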
If only it were that easy.
And for many people, "easy" is hardly the word to describe that.
No wonder small businesses just put their information on Facebook instead of trying to manage a website.
I mean, people are held hostage by "professionals" who will set up some overcomplicated backend or Vercel stuff instead of a single static HTML page with the opening hours and the menu.
The poison's also the cure! Just ask AI for a haproxy rate limit config
It will give you one. Will it work? No. You seem not to understand that AI crawlers masquerade as multiple clients to avoid rate limiting, and they are quite skilled at it.
Depending on the content and software stack, caching might be a fairly easy option. For instance, WordPress's W3 Total Cache used to be pretty easy to configure and could easily take a small VPS from 6-10 req/sec to 100-200 req/sec.
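The general trick, independent of WordPress or that particular plugin, is full-page caching: render a page once, then serve the stored HTML. A toy version of the idea as a generic WSGI middleware (illustrative only, not how the plugin works; real caches also handle invalidation, cookies, and Vary headers):

    # Toy full-page cache as WSGI middleware: render a GET once, then serve the
    # stored bytes until the TTL expires, instead of re-running the dynamic app.
    import time

    class PageCache:
        def __init__(self, app, ttl=300):
            self.app = app
            self.ttl = ttl
            self.store = {}   # path -> (expires_at, status, headers, body)

        def __call__(self, environ, start_response):
            if environ.get("REQUEST_METHOD") != "GET":
                return self.app(environ, start_response)   # only cache GETs

            key = environ.get("PATH_INFO", "/")
            hit = self.store.get(key)
            if hit and hit[0] > time.time():
                _, status, headers, body = hit
                start_response(status, headers + [("X-Cache", "HIT")])
                return [body]

            captured = {}
            def capture(status, headers, exc_info=None):
                captured["status"], captured["headers"] = status, list(headers)

            body = b"".join(self.app(environ, capture))     # render the page once
            self.store[key] = (time.time() + self.ttl,
                               captured["status"], captured["headers"], body)
            start_response(captured["status"], captured["headers"] + [("X-Cache", "MISS")])
            return [body]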
There are also solutions for generating static sites instead of a "dynamic" CMS that stores everything in a DB.
If it's new, I'd say the easiest option is to start with a content hosting system that has built-in caching (assuming that exists for what you're trying to deploy).
I quit Emacs 10 years ago, but I have fond memories of Magit. Why was the manual taken offline?
https://github.com/magit/magit/issues/5472
Why didn't you just follow the link and find out?
he's waiting for the magit-transient shortcut :fff
The link did not show any data on the why. I'm not reading a long blame game about the why in some random issue.
It took me a minute to figure it out from the link. Likely less time than you spent on this comment thread.
I can't get why people link to some random GitHub issue instead of writing a short TL;DR and giving the option to read more. Guess it's just laziness.