0xbadcafebee 3 days ago

> Here's how we managed to cut our costs by 90%

You could cut your MongoDB costs by 100% by not using it ;)

> without sacrificing performance or reliability.

You're using a single server in a single datacenter. MongoDB Atlas is deployed to VMs on 2-3 AZs. You don't have close to the same reliability. (I'm also curious why their M40 instance costs $1000, when the Pricing Calculator (https://www.mongodb.com/pricing) says M40 is $760/month? Was it the extra storage?)

> We're building Prosopo to be resilient to outages, such as the recent massive AWS outage, so we use many different cloud providers

This means you're going to have multiple outages, AND incur more cross-internet costs. How does going to Hetzner make you more resilient to outages? You have one server in one datacenter. Intelligent, robust design at one provider (like AWS) is way more resilient, and intra-zone transfer is cheaper than going out to the cloud ($0.02/GB vs $0.08/GB). You do not have a centralized or single point of failure design with AWS. They're not dummies; plenty of their services are operated independently per region. But they do expect you to use their infrastructure intelligently to avoid creating a single point of failure. (For example, during the AWS outage, my company was in us-east-1, and we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.)

I get it; these "we cut bare costs by moving away from the cloud" posts are catnip for HN. But they usually don't make sense. There's only a few circumstances where you really have to transfer out a lot of traffic, or need very large storage, where cloud pricing is just too much of a premium. The whole point of using the cloud is to use it as a competitive advantage. Giving yourself an extra role (sysadmin) in addition to your day job (developer, data scientist, etc) and more maintenance tasks (installing, upgrading, patching, troubleshooting, getting on-call, etc) with lower reliability and fewer services, isn't an advantage.

  • toast0 3 days ago

    > Intelligent, robust design at one provider (like AWS) is way more resilient, and intra-zone transfer is cheaper than going out to the cloud ($0.02/GB vs $0.08/GB).

    If traffic cost is relevant (which it is for a lot of use cases), Hetzner's price of $1.20/TB ($0.0012 / GB) for internet traffic [1] is an order of magnitude less than what AWS charges between AWS locations in the same metro. If you host only at providers with reasonable bandwidth charges, most likely all of your bandwidth will be billed at less than what AWS charges for inter-zone traffic. That's obscene. As far as I can tell, clouds are balancing their budgets on the back of traffic charges, but nothing else feels under cost either.

    > For example, during the AWS outage, my company was in us-east-1, and we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.

    This doesn't always work out. During the GCP outage, my service was running fine, but other similar services were having trouble, so we attracted more usage, which we would have scaled up for, except that the GCP outage prevented that. Cloud makes it very expensive to run scaled beyond current needs, and promises that scale-out will be available just in time...

    [1] https://docs.hetzner.com/robot/general/traffic/

    • precommunicator 2 days ago

      keep in mind, for dedicated servers, traffic is free and unlimited - see the page you've linked

      • toast0 2 days ago

        Not if you're running at 10G...

  • canucktrash669 2 days ago

    At some point our cross-AZ traffic for Elasticsearch replication at AWS was more expensive than what we'd pay to host the whole cluster replicated across multiple baremetal Hetzner servers.

    Could we have done better with more sensible configs? Was it silly to cluster ES cross-AZ? Maybe. Point is that if you don't police every single detail of your platform at AWS/GCP and the like, their made-up charges will bleed your startup and grease their stock price.

    • canucktrash669 2 days ago

      Turns out cross-AZ is recommended for ES. Perhaps our data team was rewriting the indices too often, but it was an internal requirement. So I think the data schema could have been more efficient, appending deltas instead of reindexing everything. But none of that will inflate your bill significantly at Hetzner. Of course it will at AWS, as that's how they incentivise clients to optimize and reduce their impact. And that's how you cut your runway by 3-6 months in compute-heavy startups.

  • goastler 3 days ago

    > you're going to have multiple outages

    us: 0, aws: 1. Looking good so far ;)

    > AND incur more cross-internet costs

    Hetzner have no bandwidth/traffic limit (only speed) on the machine, so we can go nuts.

    I understand your point wrt the cloud, but I spend as much time debugging/building a cloud deployment (Atlas :eyes: ) as I do a self-hosted solution. AWS gives you all the tools to build a super reliable data store, but many people just chuck something on us-east-1 and go. There's your single point of failure.

    Given we're constructing a many-node decentralised system, self-hosted actually makes more sense for us because we've already had to become familiar enough to create a many-node system for our primary product.

    When/if we have a situation where we need high data availability I would strongly consider the cloud, but in the situations where you can deal with a bit of downtime you're massively saving over cloud offerings.

    We'll post a 6-month and 1-year follow-up to update the scoreboard above

    • runako 3 days ago

      > many people just chuck something on us-east-1 and go

      Even dropping something on a single EC2 node in us-east-1 (or at Google Cloud) is going to be more reliable over time than a single dedicated machine elsewhere. This is because they run with a layer that will e.g. live migrate your running apps in case of hardware failures.

      The failure modes of dedicated are quite different than those of the modern hyperscaler clouds.

      • chubot 3 days ago

        It's not an apples-to-apples comparison, because EC2 and Google Cloud have ephemeral disk - persistent disk is an add-on, which is implemented with a complex and frequently changing distributed storage system

        On the other hand, a Hetzner machine I just rented came with Linux software RAID enabled (md devices in the kernel)

        ---

        I'm not aware of any comparisons, but I'd like to see some

        It's not straightforward, and it's not obvious the cloud is more reliable

        The cloud introduces many other single points of failure, by virtue of being more complex

        e.g. human administration failure, with the Unisuper incident

        https://news.ycombinator.com/item?id=40366867

        https://arstechnica.com/gadgets/2024/05/google-cloud-acciden... - “Unprecedented” Google Cloud event wipes out customer account and its backups

        Of course, dedicated hardware could have a similar type of failure, but I think the simplicity means there is less variety in the errors.

        e.g. A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable - Leslie Lamport

        • travisgriggs 2 days ago

          > by virtue of being more complex

          I just wish there was a way to underscore this more and more. Complex systems fail in complex ways. Sadly, for many programmers, the thrill or ego boost that comes with solving/managing complex problems lets us believe complex is better than simple.

          • antod 2 days ago

            One side effect of devops over the last 10-15 years I've noticed, as dev and ops converged, is that infrastructure complexity exploded as the old-school pessimistic sysadmin culture of simplicity and stability gave way to a much more optimistic dev culture. Better tooling also enabled increased complexity in a self-fulfilling feedback loop, as more complexity demanded yet better tooling.

            It's kept me employed though...

        • Anonyneko 2 days ago

          Anecdotal, but a year ago we lost the whole RAID array in a rented Hetzner server to some hardware failure.

          In a way, I think it doesn't matter what you use as long as you diversify enough (and have lots of backups), as everything can fail, and often the probability of failure doesn't even matter that much as any failure can be one too many.

      • jabwd 2 days ago

        The internet was designed to survive nukes.

        Let's host it all with 2 companies instead and see how it goes.

        Anyway, random things you will encounter: Azure doesn't work because Front Door has issues (again, and again). A web app in Azure just randomly stops working; it's not live migrated by any means, and restarts don't work. Okay, let's change the SKU, change it back, oop, it's on a different bare-metal cluster and now it works again. Sure, there'll be some setup (read: upsell) that'll prevent such failures from reaching customers, but there is simply no magic to any of this.

        Really wish people would stop dreaming up reasons that hyperscalers are somehow magical places where issues don't happen and everything is perfect if you just increase the complexity a little bit more the next time around.

      • wongarsu 2 days ago

        Hardware failures on server hardware at the scale of 1 machine are far less common than us-east-1 downtime

        The typical failure mode of AWS is much better: half the internet is down, so you just point at that and wait for everything to come back, and your instances just keep running. If you have one server you have to do the troubleshooting and recovery work yourself. But with AWS you end up running more than one machine and still getting fewer nines of reliability.

        • runako 2 days ago

          > Hardware failures on server hardware at the scale of 1 machine are far less common than us-east-1 downtime

          A couple pieces of gentle pushback here:

          - if you chose a hyperscaler, you should use their (often one-click) geographic redundancy & failover.

          - All of the hyperscalers have more than one AZ. Specifically, there's no reason for any AWS customer to locate all/any* of their resources in us-east-1. (I actively recommend against this.)

          * - Except for the small number of services only available in us-east-1, obviously.

          • wongarsu 2 days ago

            Hetzner also offers more than one datacenter, which you should obviously use if you want geographic redundancy. But the comment I was replying to said "Even dropping something on a single EC2 node in us-east-1", and for a single EC2 node in us-east-1 none of the things you are mentioning are possible without violating the premise.

    • MobileVet 3 days ago

      Thanks for sharing the story and committing to a 6-month and 1 year follow up. We will definitely be interested to hear further how it went over time.

      In the meantime, I am curious where the time was spent debugging and building Atlas deployments? It certainly isn't the cheapest option, but it has been quite a '1-click' solution for us.

    • kdazzle 3 days ago

      I’m curious about the resilience bit. Are you planning on some sort of active-active setup with Mongo? I found it difficult on AWS to even do active-passive (I guess that was DocDB), since programmatically changing the primary write node instance was kind of a pain when failing over to a new region.

      Going into any depth with mongo mostly taught me to just stick with postgres.

  • dspillett 3 days ago

    > You're using a single server in a single datacenter.

    This is a common problem with “bare metal saved us $000/mo” articles. Bare metal is cheaper than cloud by any measure, but the comparisons given tend to be misleadingly exaggerated, as they don't compare like-for-like in terms of redundancy and support; after considering those factors the result can be much closer (sometimes close enough that familiarity and personal preference become the more significant factors).

    Of course unless you are paying extra for multi-region redundancy things like the recent us-east-1 outage will kill you, and that single point of failure might not really matter if there are several others throughout your systems anyway, as is sometimes the case.

    • Aeolun 2 days ago

      I think the problem is that the multi-az redundancy in AWS setups has saved me exactly zero times. The problem is nearly always some application issue.

    • PenguinCoder 3 days ago

      Premature optimization. Not every single service needs or requires five nines.

      • bastawhiz 2 days ago

        What does that mean, though?

        If I'm storing data on a NAS, and I keep backups on a tape, a simple hardware failure that causes zero downtime on S3 might take what, hours to recover? Days?

        If my database server dies and I need to boot a new one, how long will that take? If I'm on RDS, maybe five minutes. If it's bare metal and I need to install software and load my data into it, perhaps an hour or more.

        Being able to recover from failure isn't a premature optimization. "The site is down and customers are angry" is an inevitability. If you can't handle failure modes in a timely manner, you aren't handling failure modes. That's not an optimization, that's table stakes.

        It's not about five nines, it's about four nines or even three nines.

        • ffsm8 2 days ago

          You're confusing backup with high availability.

          Backups are point in time snapshots of data, often created daily and sometimes stored on tape.

          Its primary use case is giving admins the ability to e.g. restore partial data via export and similar. It can theoretically also be used to restore after a full data loss, but that's beyond rare. Almost no company has had that issue.

          This is generally not what's used in high availability contexts. Usually, companies have at least one replica DB which is in read only and only needs to be "activated" in case of crashes or other disasters.

          With that setup you're already able to hit 5 nines, especially in the context of b2e companies that usually deduct scheduled downtimes via SLA

        • bcrl 2 days ago

          I know one company that strove for five sixes.

      • chasd00 2 days ago

        You have to look at all the factors; a simple server in a simple datacenter can be very, very stable. When we were all doing bare metal servers back in the day, uptimes measured in years weren't that rare.

      • dspillett 2 days ago

        This is true. Also some things are just fine, in fact sometimes better (better performing at the scale they actually need and easier to maintain, deploy, and monitor), as a single monolith instead of a pile of microservices. But when comparing bare metal to cloud it would be nice for people to acknowledge what their solution doesn't give, even if the acknowledgement comes with the caveat “but we don't care about that anyway because <blah>”.

        And it isn't just about 9s of uptime, it is all the admin that goes with DR if something more terrible than a network outage does happen, and other infrastructure conveniences. For instance: I sometimes balk at the performance we get out of AzureSQL given what we pay for it, and in my own time you are safe to bet I'll use something else on bare metal, but while DayJob are paying the hosting costs I love the platform dealing with managing backup regimes, that I can do copies or PiT restores for issue reproduction and such at the click of a button (plus a bit of a wait), that I can spin up a fresh DB & populate it without worrying overly about space issues, etc.

        I'm a big fan of managing your own bare metal. I just find a lot of other fans of bare metal to be more than a bit disingenuous when extolling its virtues, including cost-effectiveness.

      • dpkirchner 2 days ago

        It's true, but I'm woken up more frequently if there are fewer 9s, which is unpleasant. It's worth the extra cost to me.

      • hdgvhicv 2 days ago

        Hence you can use AWS to host them.

      • withinboredom 2 days ago

        and each additional nine increases complexity geometrically.

    • celsoazevedo 2 days ago

      It doesn't have to be one server in a single datacenter, though. It adds some complexity, but you could have a backup server ready to go at a different cheap provider (Hetzner and OVH, for example) and still save a lot.

    • dvfjsdhgfv 2 days ago

      > unless you are paying extra for multi-region redundancy things like the recent us-east-1 outage will kill you

      Unfortunately it's not guaranteed that paying for multi-region replication will save you.

  • mnutt 3 days ago

    > we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.

    I think it was just luck of the draw that the failure happened in this way and not some other way. Even if APIs falling over but EC2 instances remaining up is a slightly more likely failure mode, it means you can't run autoscaling, can't depend on spot instances which in an outage you can lose and can't replace.

    • 0xbadcafebee 2 days ago

      > it means you can't run autoscaling, can't depend on spot instances which in an outage you can lose and can't replace

      Yes, this is part of designing for reliability. If you use spot or autoscaling, you can't assume you will have high availability in those components. They're optimizations, like a cache. A cache can disappear, and this can have a destabilizing effect on your architecture if you don't plan for it.

      This lack of planning is pretty common, unfortunately. Whether it's in a software component or system architecture, people often use a thing without understanding the implications of it. Then when AWS API calls become unavailable, half the internet falls over... because nobody planned for "what happens when the control plane disappears". (This is actually a critical safety consideration in other systems)

      • mnutt 2 days ago

        Sure, you can only use EC2, not use autoscaling or spot and instead just provision to your highest capacity needs, and not use any other AWS service that relies on dynamo as a dependency.

        We still take some steps to mitigate control plane issues in what I consider a reasonable AWS setup (attempt to lock ASGs to prevent scale-down) but I place the control plane disappearing on the same level as the entire region going dark, and just run multi-region.

  • lxe 2 days ago

    I think you underestimate how a reduction in complexity can increase reliability. Becoming a sysadmin for a single inexpensive server instance carries about the same operational burden as operating the unavoidably very complicated cluster you'd run at a cloud provider.

    • hdgvhicv 2 days ago

      Nowhere near the same. Admining a few servers is far easier than a mix of AWS cloud services, especially when they are either metal as a service or plain VMs.

    • gervwyk 2 days ago

      Not if you are using Atlas. It's as simple as it can be, with way more functionality than you could ever admin yourself.

      As others have said, unless the scale of the data is the issue, if you're switching because of cost, perhaps you should be going back to your business model instead.

  • whatever1 2 days ago

    AWS and Azure were down for a full day in the past month.

    There's no way I couldn't spin my infra back up within a full day, even if the current datacenter burned to the ground.

    So we have the same reliability.

    • hsbauauvhabzb 2 days ago

      Not if you don’t have hot-replicated user data etc. - assuming that matters, which it will unless you outsource auth, and if you do that you're back at square one.

      • walletdrainer 2 days ago

        Just have backups, you could literally sync everything to rsync.net unless you have a ridiculous amount of users

  • celsoazevedo 2 days ago

    > You have one server in one datacenter.

    It doesn't have to be only one server in one datacenter though.

    It's more work, but you can have replicas ready to go at other Hetzner DCs (they offer bare metal at 3 locations in 2 different countries) or at other cheaper providers like OVH. Two or three $160 servers is still cheaper than what they're paying right now.

  • port11 2 days ago

    These types of posts make for excellent karma farming, but this one does present all the issues you've mentioned. Heck, Scaleway has managed Mongo for a bit more money and with redundancy and multi-AZ to boot. Were they trying to go as cheap as possible?

  • raxxorraxor 2 days ago

    I don't buy it. It really depends on your service, but I don't believe the reliability story. All large providers have had outages, and I host services on a single server that hasn't had an outage in years.

    Depends on the service and its complexity. More complexity means more outages. In most instances a focus on easy recoverability is more productive than preemptive "reliability". As I have said, depends on your service.

    And prices get premium very fast if you have either a lot of traffic, or low traffic but large file transfers. And you have more work to do if you use the cloud, because it uses non-standard interfaces. Today a well-maintained server is a few clicks away. Even for managed servers you have maintenance and configuration. Plus, your provider probably changes the service quite often. I had to accommodate Beanstalk changes while my application was just running on its own, free of maintenance needs.

  • hsbauauvhabzb 2 days ago

    > For example, during the AWS outage, my company was in us-east-1, and we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.

    Naïve. If the network infrastructure is down, your computer goes down too; it just happens that the functionality that went down was functionality you didn't rely on. You could avoid relying on any functions at all by turning the server off, too.

    • tekno45 2 days ago

      yeah loadbalancers definitely removed instances during that outage. Watched it happen.

  • KaiserPro 2 days ago

    > MongoDB Atlas is deployed to VMs on 2-3 AZs

    I've not actually seen an AZ go down in isolation, so whilst I agree it's technically a less "robust" deployment, in practice it's not that much of a difference.

    > these "we cut bare costs by moving away from the cloud" posts are catnip for HN. But they usually don't make sense.

    We moved away from Atlas because they couldn’t cope with the data growth that we had (4 TB is the max per DB). Turns out that it's a fuck load cheaper even hosting on Amazon (as in 50%). We haven't moved to Hetzner because that would be more effort than we really want to expend, but it's totally doable, with not that much extra work.

    > more maintenance tasks (installing, upgrading, patching, troubleshooting, getting on-call, etc) with lower reliability and fewer services, isn't an advantage.

    Depends, right: firstly it's not that much of an overhead, and if it saves you significant cash, then it extends your runway.

    • korkybuchek 2 days ago

      > I've not actually seen an AZ go down in isolation

      Counterpoint: I have. Maybe not completely down, but degraded, or out of capacity on an instance type, or some other silly issue that caused an AZ drain. It happens.

      • dvfjsdhgfv 2 days ago

        While I agree, I remember we once had cross-region replication for some product but when AWS was down the service was down anyway because of some dependency. Things were working fine during our DR exercises, but when the actual failure arrived, cross-region turned out useless.

  • winrid 2 days ago

    At FastComments we have Mongo deployed across three continents and four regions on dedicated servers with full disk encryption, across two major providers just in case. It was set up by one person. Replication lag is usually under 300ms.

  • navigate8310 2 days ago

    Usually AWS is pretty good at hiding all the reliability and robustness work that goes into making a customer's managed service. Customers are not made aware of what it takes.

    • chasd00 2 days ago

      An interesting experiment would be doing the equivalent at the scale of the median saas company.

      Set up MongoDB (or any database) so that you have geographically distributed nodes with replication plus whatever else, and maintain the same SLA as one of the big hyperscalers. Blog about how long it took to set up, how hard it is to maintain, and what the ongoing costs are.

      My hunch is a setup at the scale of the median SaaS company is way simpler and more cost effective than you'd think.

  • torginus 2 days ago

    How often do the AZs matter? - I feel like there's a major global outage on every cloud provider of choice, at least every other year, yet I don't remember any outage where only a single AZ went down (I'm on AWS).

    Fighting said outages is often made harder by the fact that the providers themselves just don't admit to anything being wrong: everything's green on the dashboard, yet 4 out of 5 requests are timing out.

  • rajman187 2 days ago

    > You could cut your MongoDB costs by 100% by not using it ;)

    Came here to say exactly this

speedgoose 2 days ago

I wonder how many companies are running databases on non-encrypted storage on Hetzner.

Their bare-metal servers don't have storage encryption by default, and I don't know for sure about the VM hosts, as I don't have access, but Hetzner never claims that it is encrypted at rest. And there is no mention of storage encryption in their data protection agreement. https://www.hetzner.com/AV/DPA_en.pdf

Also, their data privacy FAQ mentions "you as the customer are responsible for both the data that is stored on your rented server and for the encryption of that data." https://docs.hetzner.com/general/general-terms-and-condition...

I would recommend, just in case, setting up LUKS on your server. You will find many guides for Hetzner.

If you don't do that, seeing your data in the wild is a real scenario. A few years ago, a YouTuber bought some used hard drives hoping to recover data, to illustrate the risks of not erasing a hard drive correctly. He eventually bought a hard drive containing non-encrypted VM disks from Scaleway, a Hetzner competitor. My guess is that some hard drives disappeared before destruction after being decommissioned. Some customers got their shitty source code exposed in a video with 1.4M views. Here is the first one: https://www.youtube.com/watch?v=vt8PyQ2PGxI

So, use LUKS.
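
For a data partition (as opposed to full-disk encryption with remote unlock), the setup is only a handful of commands. Here is a minimal sketch, wrapped in Python only for readability; the device name and mount point are hypothetical placeholders, it needs root, and luksFormat wipes the partition:

```python
"""Minimal LUKS sketch for a spare data partition on a dedicated server.
DEVICE and MOUNTPOINT are hypothetical -- do NOT point this at your boot drive."""
import subprocess

DEVICE = "/dev/nvme1n1"           # hypothetical spare disk
MAPPER = "cryptdata"              # device-mapper name under /dev/mapper/
MOUNTPOINT = "/var/lib/mongodb"   # wherever your data should live

def run(*cmd: str) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# One-time: turn the partition into a LUKS2 container (prompts for a passphrase).
run("cryptsetup", "luksFormat", "--type", "luks2", DEVICE)

# On every boot: unlock the container, then mount the mapped device.
run("cryptsetup", "open", DEVICE, MAPPER)
run("mkfs.ext4", f"/dev/mapper/{MAPPER}")   # first boot only
run("mkdir", "-p", MOUNTPOINT)
run("mount", f"/dev/mapper/{MAPPER}", MOUNTPOINT)
```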

  • immibis 2 days ago

    Indeed, this is something to be aware of.

    They are not selling you a magical SaaS solution. They are renting to you a particular physical server on a particular physical shelf. In principle, you could break into the DC and steal your server and lay it on your desk at home and it would operate exactly as it did inside the DC.

    Some people accustomed to cloud expect magic from their dedicated servers, which does not exist.

mads_quist 3 days ago

OK guys, running on a single instance is REALLY a BAD IDEA for non-pet-projects. Really bad! Change it as fast as you can.

I love Hetzner for what they offer but you will run into huge outages pretty soon. At least you need two different network zones on Hetzner and three servers.

It's not hard to set up, but you need to do it.

  • MaKey 3 days ago

    I think you're being overly dramatic. In practice I've seen complexity (which HA setups often introduce) causing downtimes far more often than a service being hosted only on a single instance.

    • mads_quist 2 days ago

      You'll have planned downtime just for upgrading the MongoDB version or rebooting the instance. I don't think this is something you'd want to have. Running MongoDB in a replica set is really easy, and much easier than running Postgres or MySQL in an HA setup.

      No need for SREs. Just add 2 more Hetzner servers.
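
      For anyone who hasn't done it: once mongod is running on all three boxes with the same replSetName, bringing the set up is a single command. A minimal sketch with pymongo (hostnames are made up):

      ```python
      """Initiate a 3-member replica set; assumes mongod is running on each host
      with `replication.replSetName: rs0` in mongod.conf. Hostnames are hypothetical."""
      from pymongo import MongoClient

      # Connect directly to one member (it isn't part of an initiated set yet).
      client = MongoClient("mongodb://db1.internal.example:27017/?directConnection=true")

      client.admin.command("replSetInitiate", {
          "_id": "rs0",
          "members": [
              {"_id": 0, "host": "db1.internal.example:27017"},
              {"_id": 1, "host": "db2.internal.example:27017"},
              {"_id": 2, "host": "db3.internal.example:27017"},
          ],
      })
      ```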

      • spwa4 2 days ago

        The sad part of that is that 3 Hetzner servers are still less than 20% of the price of equivalent AWS resources. This was already pretty bad when AWS started, but now it's reaching truly ridiculous proportions.

        from the "Serverborse": i7-7700 with 64GB ram and 500G disk.

        37.5 euros/month

        This is ~8 vcpus + 64GB ram + 512G disk.

        585 USD/month

        It gets a lot worse if you include any non-negligible internet traffic. How many machines before for your company a team of SREs is worth it? I think it's actually dropped to 100.

        • mads_quist 2 days ago

          Sure, I am not against Hetzner, it's great. I just find that running something in HA mode is important for any service that is vital to customers. I am not saying that you need HA for a website. Also, I run many applications NOT in HA mode, but those are single-customer applications where it's totally fine to do maintenance at night or on the weekend. But for SaaS this is probably not a very good idea.

    • lewiscollard 3 days ago

      Yes, any time someone says "I'm going to make a thing more reliable by adding more things to it" I either want to buy them a copy of Normal Accidents or hit them over the head with mine.

      • immibis 2 days ago

        How bad are the effects of an interruption for you? Would you lose millions of dollars a minute, or would you just have to send an email to customers saying "oops"?

        Risk management is a normal part of business - every business does it. Typically the risk is not brought down all the way to zero, but to an acceptable level. The milk truck may crash and the grocery store will be out of milk that day - they don't send three trucks and use a quorum.

        If you want to guarantee above-normal uptime, feel free, but it costs you. Google has servers failing every day just because they have so many, but you are not Google, and with one server you can afford to gamble: it most likely won't experience a hardware failure for years. You should have a backup no matter the hardware, because data loss is permanent, but you might not need redundancy for your online systems. It depends on what your business does.

    • PunchyHamster 2 days ago

      HA can be hard to get right, sure, but you have to at least have a (TESTED) plan for what happens.

      "Run a script to deploy a new node and load the last backup" can be enough, but then you have to plan what to tell customers when the last few hours of their data are gone.

  • badestrand 2 days ago

    I have a website with hundreds of thousands of monthly visitors that has been running on a single Hetzner machine for >10 years (switched machines inside Hetzner a few times though).

    My downtime averages around 20 minutes per year, so an uptime of around 99.996%.

    I have no idea where you see those "huge outages" coming from.

  • freefaler 2 days ago

    We have used Hetzner for 15+ years. There were some outages, with the nastiest being the network ones. But they're usually not "dramatically bad" if you build with at least basic failover. With this we've seen less than one serious outage per three years. Most of the downtime is because of our own stupidity.

    If you know what you're doing Hetzner is a godsend: they give you hardware and several DCs, and it's up to you what you do with them. The money difference is massive.

  • notTooFarGone 2 days ago

    There are so many applications the world runs on that only have one instance that is maybe backed up. Not everything has to be solved by 3 reliability engineers.

  • antoniojtorres 3 days ago

    Agree on single instance, but as for Hetzner: I run 100+ large bare metal servers there, and have for at least 5 years, and there's only been one significant outage on their side. We spread across all their datacenter zones and replicate, so it's all been manageable. It's worth it for us, very worth it.

  • raxxorraxor 2 days ago

    Tell me about a service that needs this reliability, please. I cannot think of anything aside from perhaps some financial transaction systems, which all have some fallback message queue.

    Also, all large providers have had outages of this kind as well. Hell, some of them are at times so slow that you could call that an outage as well.

    One easy config misstep and your load balancer goes haywire, because you introduced unnecessary complexity.

    I did that because I needed a static outgoing IP on AWS. Not fun at all.

zkmon 3 days ago

Atlas is plain robbery. I see companies paying 600K USD/month on a few clusters, mostly used for testing. The problem is they got locked into this by doing a huge migration of their apps, and switching to a different tech would easily take 2 to 5 years.

  • nuschk 2 days ago

    Would a company paying 600k per month not also be able to employ a couple of devs to improve the situation? Sure, effort is required, but with the right people they could save a ton and have a very good ROI.

    I think it's just more complicated than that. No hostage situation, just good old incentives.

  • mathattack 2 days ago

    I’ve seen this happen many times. It looks cheap and easy to spin up, then it grows out of hand and they kill you on the renewals.

tracker1 3 days ago

As much as I like MongoDB as a developer, the last thing I ever want to do is manage a deployment again.

I feel like some of these articles miss a few points, even in this one. The monthly cost of the MongoDB hosting was around $2k... that's less than a FT employee salary, and if it can spare you the cost of an employee, that's not a bad thing.

On the flip side, if you have employee talent that is already orchestrating Kubernetes across multiple clouds, then sure it makes sense to internalize services that would otherwise be external if it doesn't add too much work/overhead to your team(s).

In either case, I don't think the primary driver in this is cost at all. Because that 90% quoted reduction in hosting costs is balanced by the ongoing salary of the person or people who maintain those systems.

rmoriz 3 days ago

I'm a big fan of owning the stack, but why not spend the money on redundancy? At least a couple of machines in a different data center at Hetzner or another provider (OVH, Scaleway, Vultr, …) can easily fit your budget.

  • arbol 3 days ago

    We will be adding additional db servers and running our own replica set eventually. We're just not there yet. Thanks for reading!

    • hinkley 3 days ago

      But then you’ll be tripling your costs.

      Business people are weird about numbers. You should have claimed 70% even if the replicas do nothing and made them work later on. This is highly likely to bite you on the ass.

      • mystifyingpoi 2 days ago

        +1, this is so true. You've already lost: you've publicly congratulated yourself on saving 90%. They won't like the idea of tripling the costs, even if it is still below the previous costs.

      • rmoriz 2 days ago

        Exactly, this is a junior mistake I made too many times. There is a saying: never tell anyone when you've won the lottery.

        In technical terms you need to plan ahead. The legacy mistakes were caused by actions in the past and will likely be made again when you can't change the strategy or approach to problems. You won't get budget for this AFTER you successfully made a change. "It's all solved now, we are good." No.

kachapopopow 3 days ago

Always consider if 12 hours of lost revenue is worth the savings. Recently Hetzner has been flaky, with minimal or no response from support, or even status updates that anything was wrong. My favorite was them blaming an issue on my side, just to have a maintenance status update the day after about congestion.

  • arbol 3 days ago

    Atlas wasn't giving us any support for $3K per month. Hetzner at least have some channel to contact them, which is an improvement. That said, if their uptime is rubbish then we'll probably migrate again. Moving back to Atlas is not an option as we were getting hammered by the data transfer costs and this was only going to increase due to our architecture. Thanks for reading!

    • kosherhurricane 3 days ago

      500GB isn't a lot of data, and $3K/month seems like extortion for that little data.

      Having said that, the MongoDB pricing page promises 99.995% uptime, which is outstanding and would probably be hard to beat doing it oneself, even after adding redundancy. But maybe you don't need that much uptime for your particular use case.

      • buster 2 days ago

        That's all fine and such, but I suppose the SLAs aren't covering your revenue loss.

        In fact, after looking at https://www.mongodb.com/legal/sla/atlas/data-federation#:~:t... it makes me wonder how much the SLA is worth. 10% Service Credit after all the limitations?

        Atlas can keep their 10% Service Credit, I wouldn't care. Save the money and choose a stable provider.

      • arbol 3 days ago

        It's more like 700GB now on the new server, and we were about to have to migrate to a higher tier on Atlas.

        > maybe you don't need that much uptime for your particular use case.

        Correct. Thanks for reading!

        • tecleandor 3 days ago

          Yep, we just migrated to Atlas, and the disk size limitation of the lower instance tiers pushed us to do a round of data cleaning before the migration.

          Also, we noticed that after migration, the databases that were occupying ~600GB of disk in our (very old) on-premise deployment were around 1TB big on Atlas. After talking with support for a while we found that they were using Snappy compression with a relatively low compression level, and we couldn't change that by ourselves. After requesting it through support, we changed to zstd compression, rebuilt all the storage, and a day or two later our storage was under 500GB.
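
          For what it's worth, on a self-hosted mongod you control this yourself: either set storage.wiredTiger.collectionConfig.blockCompressor to zstd in mongod.conf for new collections, or set it per collection at creation time. A minimal sketch with pymongo (the collection name is made up):

          ```python
          """Create a zstd-compressed collection on a self-managed mongod."""
          from pymongo import MongoClient

          db = MongoClient("mongodb://localhost:27017")["app"]
          db.create_collection(
              "events",  # hypothetical collection
              storageEngine={"wiredTiger": {"configString": "block_compressor=zstd"}},
          )
          ```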

          And backup pricing is super opaque. It doesn't show concrete pricing in the docs, just ranges. And depending on the cloud you deployed to, snapshots are priced differently, so you can't just multiply your storage by the number of snapshots, and they aren't transparent about the real size of the snapshots.

          All the storage stuff is messy and expensive...

        • gervwyk 2 days ago

          Did you have a very aggressive backup schedule?

      • gizzlon 2 days ago

        > Having said that, MongoDB pricing page promises 99.995% uptime

        Or.. what? That's the important part

    • zamalek 3 days ago

      OVH is allegedly pretty good. I host all my personal stuff on Hetzner right now so I can't speak to it personally.

      • arbol 3 days ago

        We also use OVH and have so far not had any downtime in about 6 months.

  • izacus 3 days ago

    My Hetzner instances all have higher reliability and uptime than AWS deployments. For years now.

    That was an interesting surprise.

    • jabwd 2 days ago

      Curious what kind of deployments you are running with them? I only have personal stuff with Hetzner; but never had issues so far (bare metal in my case coz cheap for what I get and need).

  • mbesto 2 days ago

    If I understand correctly, the author's company provides a CAPTCHA alternative, which presumably means that if their service goes down, all of their customer's logins, forms, etc. either become inoperable or don't provide the security the company is promising by using their service.

    This makes me want to use the company's service less because now I know they can't survive an outage in a consistent and resilient way.

  • 0x073 3 days ago

    Been using Hetzner for 5 years, never had issues and only 1 downtime in one data center.

    • kachapopopow 2 days ago

      I think the issue stems from their poor cloud infrastructure, since that's where I've had the most issues; the dedicated servers seem fine. That being said, 2 years prior I had no issues either, so it's definitely something recent.

cpursley 3 days ago

Why in the world do people choose Mongo over Postgres? I'm legit curious. Is it inexperience? Javascript developers who don't know backend or proper data modeling (or about jsonb)? Is this type of decision coming down from non-technical management? Are VCs telling their portfolio companies what to use so they have something to burn their funding on? It's just really confounding, especially when there's even mongo-api compatible Postgres solutions now. Perhaps I'm just not webscale and too cranky.

  • arbol 2 days ago

    Personally I've found it faster to build using Mongo because you don't need to worry about schemas. You get 16 MB per document and you can work out your downstream processing later, e.g. clean up and serve to Postgres, a file, wherever. This data is a big data dump that's feeding ML models, so relational stuff is not that important.

    • Eikon 2 days ago

      You definitely do have to worry about a schema. Except it’s ill defined and scattered across your business logic.

    • debazel 2 days ago

      I used to build personal projects like this, but after Postgres got JSONB support I haven't found any reason to not just start with Postgres. There's usually a couple of tables/columns you want a proper schema for, and having it all in Postgres to begin with makes it much easier to migrate the schemaless JSONB blobs later on.
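
      For anyone who hasn't tried the hybrid approach: a couple of real columns for the stuff you query and join on, plus a JSONB blob for the rest. A minimal sketch with psycopg 3 (table and field names are made up):

      ```python
      """Hybrid schema sketch: structured columns plus a schemaless JSONB blob."""
      import psycopg
      from psycopg.types.json import Jsonb

      with psycopg.connect("dbname=app") as conn:
          conn.execute("""
              CREATE TABLE IF NOT EXISTS events (
                  id         bigserial PRIMARY KEY,
                  user_id    bigint NOT NULL,          -- structured, indexable
                  created_at timestamptz DEFAULT now(),
                  payload    jsonb NOT NULL            -- the schemaless part
              )""")
          conn.execute(
              "INSERT INTO events (user_id, payload) VALUES (%s, %s)",
              (42, Jsonb({"kind": "signup", "utm": {"source": "hn"}})),
          )
          # Query into the blob; hot fields can later be promoted to real columns.
          rows = conn.execute(
              "SELECT id FROM events WHERE payload->>'kind' = %s", ("signup",)
          ).fetchall()
      ```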

      • winrid 2 days ago

        Their JSONB impl is not equivalent in terms of write isolation.

  • tracker1 3 days ago

    It depends on your use case, and an RDBMS isn't the best option for all needs. Mongo's approach is pretty usable. That said, there are alternatives: you can get very similar characteristics, though a more painful devex, out of, say, CockroachDB with (key: string, value: JSONB) tables.

    The only thing I really don't care for is managing Mongo... as a developer, using it is pretty joyous assuming you can get into the query mindset of how to use it.

    Also, if you're considering Mongo, you might also want to consider looking at Cassandra/ScyllaDB or CockroachDB as alternatives that might be a better fit for your needs that are IMO, easier to administer.

  • stopthe 2 days ago

    We've been using mongodb for the past 8 years. What we like:

    - schema-less: we don't have to think about DDL statements at any point.

    - oplog and change streams as built-in change data capture.

    - it's dead simple to set up a whole new cluster (replica set).

    - IMO you don't need a designated DBA to manage tens of replica sets.

    - Query language is rather low-level and that makes performance choices explicit.

    But I have to admit that our requirements and architecture play to the strength of mongodb. Our domain model is neatly described in a strongly typed language. And we use a sort of event sourcing.

  • nalekberov 3 days ago

    IMHO it's because so many people take decisions in a rush, e.g. let's not design the database, put in whatever data shape we came up with in the alpha version and see where it goes. Sometimes people favor one particular technology because every other startup chose it.

    To be quite honest today's software engineering sadly is mostly about addressing 'how complex can we go' rather than 'what problem are we trying to solve'.

  • aranw 2 days ago

    > Why in the world do people choose Mongo over Postgres?

    I'm using it on a project, not by choice. It was already chosen when I joined the project, and the more we develop it the more I feel Postgres would be a better fit, but I don't think we can change it now.

  • riku_iki 2 days ago

    > Why in the world do people choose Mongo over Postgres?

    Postgres' distributed story is more complicated.

  • tgv 3 days ago

    I'll repeat it again: you don't always want a relational database. Sometimes you need a document-oriented one. It matches quite a lot of use cases, e.g. when there aren't really interesting relations, or when the structures are very deep. That can be really annoying in SQL.

    > when there's even mongo-api compatible Postgres solutions

    With their own drawbacks.

    • kdazzle 2 days ago

      I'd probably use a JSON field in Postgres for data that I knew was going to be unstructured. Meanwhile, other columns can join and have decent constraints and indexes.

    • KaiserPro 2 days ago

      > Sometimes you need a document-oriented one.

      Like a file system?

  • a13n 3 days ago

    maybe instead of communicating how dumb you think people are for choosing mongo, communicate why you think it’s so dumb

    • williamdclt 3 days ago

      I've read a lot more about "how dumb it is to use mongo over PG" than the opposite, I think the burden of proof is on the mongo-lovers these days (not that anyone has to prove anything to randos on the internet)

    • cpursley 3 days ago

      Why Mongo is dumb has been written about ad nauseam - data modeling and quality issues, out-of-control costs, etc. It's been a known toxic dumpster fire for well over a decade...

PeterZaitsev 3 days ago

Note, if you're looking for MongoDB Enterprise features you can find many of them with Percona Server for MongoDB, which you can use for free the same way as MongoDB Community

  • arbol 3 days ago

    Nice, thanks for the tip!

Too 2 days ago

The dump, restore and custom scripts to synchronize the new instance sound a bit odd. You could just add the instance as a secondary to your cluster and Mongo itself handles synchronization. Then removing the old instances automatically promotes the new one to primary.
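
Roughly, with pymongo that's just a reconfig; a minimal sketch assuming a replica set named rs0 and made-up hostnames:

```python
"""Add a new box as a secondary and let initial sync do the copying,
instead of dump/restore. Hostnames are hypothetical."""
from pymongo import MongoClient

client = MongoClient("mongodb://old-primary.internal.example:27017/?replicaSet=rs0")

# Fetch the live replica set config and append the new server as a member.
cfg = client.admin.command("replSetGetConfig")["config"]
cfg["members"].append({"_id": max(m["_id"] for m in cfg["members"]) + 1,
                       "host": "hetzner-db1.internal.example:27017"})
cfg["version"] += 1
client.admin.command("replSetReconfig", cfg)

# mongod now performs an initial sync to the new member; once it is SECONDARY
# and caught up, remove the old members (or lower their priority) with the same
# reconfig pattern, and an election promotes the new node to PRIMARY.
```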

euph0ria 3 days ago

You probably want to store the backup somewhere else, ie. not Hetzner.

They are known to just cancel accounts and cut access.

  • sdoering 3 days ago

    Any proof of that? I am a Hetzner customer and had never heard of this before. Would be good to know what I got into.

    • ch2026 3 days ago

      A few years back I launched an io game and used Hetzner as my backend. An hour into launch day they null-routed my account because their anti-abuse system thought my sudden surge in websocket connections was an attack (unclear if they thought it was inbound or outbound doing the attacking).

      I had paid for advertising on a few game curation sites plus youtubers and streamers. Lovely failure all thanks to Hetzner. Took 3 days and numerous emails with the most arrogant Germans you’ve ever met before my account was unlocked.

      I switched to OVH and while they’re not without their own faults (reliability is a big one), it’s been a far better experience.

      • __turbobrew__ 3 days ago

        OVH also null-routes; it has happened to me.

        It seems like you have to go to one of the big boys like Hurricane Electric, where you are allowed to use the bandwidth you paid for without someone sticking their fingers in it.

    • arcanemachiner 3 days ago

      There are a lot of such stories if you go digging around HN and reddit threads. Haven't seen a lot of these stories in a while, so it may be happening less now.

  • arbol 3 days ago

    Good shout. I think we'll also run replicas on other providers. We've got some complex geo-fencing stuff to do with regards to data hence why we're just on Hetzner right now.

rglover 2 days ago

I love MongoDB's query language (JS/Node.js developer so the syntax fits my mental model well), but running a production replica set without spending tons of cash is a nightmare. Doubly so if you have any unoptimized queries (it's easy to trick yourself into thinking throwing more hardware at the problem will help). Lord help you if you use a hosted/managed service.

Just fixed a bug on my MongoDB instance last night: due to a config error with self-signed certs (the hostname in the replica set config has to match the CN on the cert), MongoDB rocketed to 400% CPU utilization (3x 8GB, 4vCPU dedicated boxes on DO) because of a weird election loop in the replica set process. Fixing that and adding a few missing indexes brought it down to ~12% on average. Simple mistakes, sure, but the real-world cost of those mistakes is brutal.
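
For anyone hunting similar issues: the cheapest first check for the missing-index case is to ask the planner whether a query is doing a collection scan. A minimal sketch with pymongo (collection and field names are made up):

```python
"""Check a query's plan, then add an index matching the query shape."""
from pymongo import MongoClient, ASCENDING, DESCENDING

db = MongoClient("mongodb://localhost:27017")["app"]

plan = db.command("explain",
                  {"find": "events", "filter": {"user_id": 42}},
                  verbosity="queryPlanner")
print(plan["queryPlanner"]["winningPlan"])  # a COLLSCAN stage means no usable index

# The fix is usually just an index that matches the query (and sort) shape:
db.events.create_index([("user_id", ASCENDING), ("created_at", DESCENDING)])
```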

CodesInChaos 3 days ago

MongoDB Atlas is so overpriced that you can probably save already 90% by moving to AWS.

  • computerfan494 3 days ago

    Most of the cost in their bill wasn't from MongoDB, it was cost passed on from AWS

    • CodesInChaos 3 days ago

      I don't remember the numbers (90% is probably a bit exaggerated) but our savings of going from Atlas to MongoDB Community on EC2 several years ago were big.

      In addition to direct costs, Atlas also had expensive limitations. For example, we often spin up clone databases from a snapshot which have lower performance and no durability requirements, so a smaller non-replicated server suffices, but Atlas required those to be sized like the replicated high-performance production cluster.

    • CodesInChaos 3 days ago

      Was it? Assuming an M40 cluster consists of 3 m6g.xlarge machines, that's $0.46/hr on-demand compared to Atlas's $1.04/hr for the compute. Savings plans or reserved instances reduce that cost further.

      • computerfan494 2 days ago

        There's definitely MongoDB markup, but a full 33% of their bill was AWS networking costs that have nothing to do with Atlas.

    • darth_avocado 3 days ago

      Highly doubt that. MongoDB has 5000 well-paid employees and is not a big loss-making enterprise. If most of the cost were passed through to AWS, they'd not be able to do that. Their quarterly revenue is $500M+, but they also spend $200M on sales and marketing and $180M on R&D. (All based on their filings.)

      • computerfan494 2 days ago

        You can look at this particular bill and observe that more than 50% of the cost was going to AWS.

        • MagicMoonlight 2 days ago

          If they’re a reseller of AWS, which they will be, they decide the rates that get charged.

          • computerfan494 2 days ago

            Yes, and my point is that if this customer switched to running their own MongoDB instances on EC2 (like Atlas does), the bill would drop by less than 50%, because the rates Atlas charges mean its cut is less than what AWS is getting from this customer.

  • KaiserPro 2 days ago

    We saved 50% by moving from Atlas to a three-node cluster. That's for a 6 TB DB (we moved because of size rather than cost, but it's been a nice bonus).

mathattack 2 days ago

I experienced some cutthroat commercial behavior from MongoDB. It scared us enough to avoid Atlas, and ultimately move to Cosmos on Azure. Massive savings.

I moved to another employer that was using Atlas, and the bill rivaled AWS. Unfortunately it was too complex to untangle.

lunias 3 days ago

Just host on a server in your basement. Put another instance in someone else's basement. I'm only half joking - track the downtime.

dvfjsdhgfv 2 days ago

As much as I love Hetzner, the article is misleading. Using a single server today makes no sense whatsoever unless it's for hobby projects. It will fail. My servers at Hetzner routinely fail every few years (4-5 maybe); usually it's a hard drive, but sometimes the motherboard or PSU. If it's a drive, you need to take it offline to rebuild the array, which can take a few hours. Like, honestly, this article blew my mind. I'd never use such a setup in production. Just add the damn second server (or two), it's dirt cheap!

  • raxxorraxor 2 days ago

    I can deal with an outage every 4-5 years. I doubt you will get around that in a managed server environment, because you will get the configuration wrong at some point when the service inevitably changes in the same timeframe.

bmcahren 2 days ago

MongoDB Atlas was around 500% more expensive than in-house every time I evaluated it (at almost every scale they offer as well).

They also leaned too heavily on sharding as a universal solution to scaling as opposed to leveraging the minimal cost of terabytes of RAM. The p99 latency increase, risk of major re-sharding downtime, increased restore times, and increased operational complexity weren't worth it for ~1 TB datasets.

  • winrid 2 days ago

    That's because sharding is way more likely to make them more money with their licensing model.

anonymid 2 days ago

$2700/mo is about 1/3 of an engineer's salary (cost to the business of a mid-level engineer in the UK)...

But, there's the time to set all of this up (which admittedly is a one-time investment and would amortize).

And there's the risk of having made a mistake in your backups or recovery system (Will you exercise it? Will you continue to regularly exercise it?).

And they're a 3-person team... is it really worth your limited time/capacity to do this, rather than do something that's likely to attract $3k/mo of new business?

If the folks who wrote the blog see this, please share how much time (how many devs, how many weeks) this took to set up, and how the ongoing maintenance burden shapes up.

  • tester756 a day ago

    You can get a decent Eastern EU engineer for a $2700 (after tax) salary.

CodesInChaos 3 days ago

How long does mongodump take on that database? My experience was that incremental filesystem/blockdevice snapshots were the only realistic way of backing up (non sharded) mongodb. In our case EBS snapshots, but I think you can achieve the same using LVM or filesystems like XFS and ZFS.
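
For reference, the LVM flavour of that is short enough to sketch; this assumes the dbPath lives on an LVM logical volume, the VG/LV names and backup target are made up, and it needs root:

```python
"""Filesystem-snapshot backup sketch for a single mongod on an LVM volume."""
import subprocess
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

# Flush and lock writes so the snapshot is consistent (strictly only required
# if journal and data live on separate volumes, but it's a cheap safety net).
client.admin.command("fsync", lock=True)
try:
    run("lvcreate", "--snapshot", "--size", "20G",
        "--name", "mongo_snap", "/dev/vg0/mongo_data")
finally:
    client.admin.command("fsyncUnlock")

# Copy the snapshot off the box, then drop it.
run("mount", "-o", "ro", "/dev/vg0/mongo_snap", "/mnt/mongo_snap")
run("rsync", "-a", "/mnt/mongo_snap/", "backup@backup-host:/backups/mongo/")
run("umount", "/mnt/mongo_snap")
run("lvremove", "-f", "/dev/vg0/mongo_snap")
```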

  • goastler 3 days ago

    It takes ~21hrs to dump the entire DB (~500GB), but I'm limited by my internet speed (100Mbps, seeing 50-100Mbps during the dump). Interestingly, the throughput is faster than doing a DB dump from Atlas, which used to max out around 30Mbps.

ianberdin 3 days ago

I’m starting to worry about this Hetzner trend. It could end up sending prices skyrocketing.

  • goastler 3 days ago

    There are other providers (OVH, etc.), so I'm sure prices will remain competitive.

  • arbol 3 days ago

    Hopefully not. Their console is pretty bad so I reckon that will put a lot of people off.

    • the_duke 3 days ago

      The cloud console is pretty good though? Even does live sync!

      The old one for dedicated servers (robot) is horribly outdated though.

      • arbol 3 days ago

        Ah right, we're on robot so I've not seen the cloud one. Robot is old! :)

  • dehrmann 3 days ago

    EC2 is sort of a ceiling price.

  • dzonga 2 days ago

    prices just dropped. :)

  • righthand 3 days ago

    We’re just going to end up with everyone moving from Amazon to Hetzner and the same issue will remain. High prices, lockin, etc will appear.

    We need an American “get off American big tech” movement.

    Differentiate, people! Reading “we moved from X to Y” does not mean everyone should move from X to Y; it means start considering what Y offers and research other Y's around you.

    • arbol 3 days ago

      We also use OVH, Contabo, Hostwinds... Architect so you can be multi-provider and reduce internet centralisation!

      • righthand 3 days ago

        Nice, if you write an article about it, try to leave the focus off of a single hosting provider. Encouraging the differentiation is important too (next time! I’m not dogging the movement or your efforts in this article, I love to see reduced reliance of Amazon in general).

    • cmrdporcupine 3 days ago

      > We need an American “get off American big tech” movement.

      As a non-American, I use Hetzner precisely to have my projects not hosted anywhere near the US.

    • k4rnaj1k 3 days ago

      Pretty sure Hetzner still offers a lot less in terms of features. There are reasons people get "Amazon certified". So, AWS alternatives are few and require a lot more resources to create and maintain, while alternatives to Hetzner would be a lot easier to create, keeping Hetzner's prices in check with the market.

ritcgab 2 days ago

Looking at the root server hardware page on Hetzner [1], it is not clear if their server is using ECC memory. It would be pretty bad if not.

[1] https://docs.hetzner.com/robot/dedicated-server/general-info...

  • walletdrainer 2 days ago

    https://www.hetzner.com/dedicated-rootserver/ax42/

    Go to the product pages

    (Yes all the normal Hetzner servers use ECC)

    • ritcgab 2 days ago

      > Yes all the normal Hetzner servers use ECC

      This is wrong - or define "normal"? In their current product line, at least the default configuration of AX52/EX44/EX63/GEX44 doesn't have ECC. It is an upgrade option only.

      The blog post says their server has "8 cores Intel Xeon W-2145", which is a PX92 or a variant of it, and its base configuration can come without ECC.

      • walletdrainer 2 days ago

        Apparently I was wrong, yeah - I just looked at a lower-specced server and foolishly extrapolated from there.

cnkk 3 days ago

Are you sure you went with RAID1 with 4x disks instead of RAID10?

  • arbol 3 days ago

    Good spot - this is wrong. It should've been 4 x 3.84 TB NVMe SSD RAID 5. My colleague set this bit up so I'm not entirely up to speed on the terminology.
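
    If it's Linux software RAID (mdadm), the actual layout is easy to confirm from the box; the device names below are illustrative, and a hardware RAID controller would need its vendor tool instead:

      # Shows the RAID level and member disks for software (mdadm) arrays.
      cat /proc/mdstat
      mdadm --detail /dev/md0

      # Quick view of how the NVMe drives are assembled and mounted.
      lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT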

poszlem 3 days ago

As in so many of these stories, what gets glossed over is just how much complexity there is in setting up your own server securely.

You set up your server. Harden it. Follow all the best practices for your firewall with ufw. Then you run a Docker container. Accidentally, or simply because you don’t know any better, you bind it to 0.0.0.0 by doing 5432:5432. Oops. Docker just walked right past your firewall rules, ignored ufw, and now port 5432 is exposed with default Postgres credentials. Congratulations. Say hello to Kinsing.
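
A rough illustration of the pitfall and the fix, using the Postgres example above (the image and credentials are only placeholders):

  # Publishes 5432 on every interface; Docker programs its own iptables
  # NAT/forwarding rules, which bypass ufw, so the port is reachable from the internet.
  docker run -d -e POSTGRES_PASSWORD=postgres -p 5432:5432 postgres

  # Binding the published port to loopback keeps it reachable from the host only.
  docker run -d -e POSTGRES_PASSWORD=postgres -p 127.0.0.1:5432:5432 postgres

  # Alternatively, tell Docker not to manage iptables at all in
  # /etc/docker/daemon.json (this has its own networking trade-offs):
  #   { "iptables": false }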

And this is just one of many possible scenarios like that. I’m not trying to spread FUD, but this really needs to be stressed much more clearly.

EDIT: as always - thank you HN for downvoting instead of actually addressing the argument.

  • isaacvando 3 days ago

    There are also an enormous number of ways to build insecure apps on AWS. I think the difficulty of setting up your own server is massively overblown. And that should be unsurprising given that there are so many companies that benefit from developers thinking it's too hard.

  • mkesper 3 days ago

    I don't see the point of using ufw at all as Hetzner provides an external firewall.

    • tracker1 3 days ago

      UFW doesn't add much overhead, since the underlying implementation in Linux is already in place; it's mostly just a convenient front-end. That said, you also need to be concerned with internal/peer threats as well as external ones...

      Clearly defining your boundaries is important for both internal and external vectors of attack.

    • poszlem 3 days ago

      If you use a dedicated Hetzner machine, you only get a stateless firewall. That would be one reason.
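
      A minimal stateful host firewall on top of that might look like the following; the source subnet is a placeholder, not a recommendation:

        # ufw is stateful by default: established/related return traffic is allowed,
        # and everything else inbound is dropped unless explicitly opened.
        ufw default deny incoming
        ufw default allow outgoing
        ufw allow 22/tcp
        ufw allow from 203.0.113.0/24 to any port 27017 proto tcp   # MongoDB, restricted by source
        ufw enable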

CodeCrusader 2 days ago

It does look like Hetzner is getting a lot more popular; I see articles like this every month.

game_the0ry 2 days ago

I feel like we see stories like this more and more. Makes you wonder just how durable cloud providers' revenue is when self-hosting on VMs has never been easier or more cost-effective.

Then again, Next.js + Vercel + GitHub are awfully convenient.

  • loloquwowndueo 2 days ago

    Self-hosting requires a different skill set - full-blown sysadmin/SRE vs. “the application starts, just deploy it and the PaaS takes care of load balancing, scaling, healing, observability, etc.”

    I’m not defending cloud-esque PaaS here (I would totally prefer to manage VMs directly), but it should be recognized that it provides some value, depending on what you’re comfortable doing with infrastructure.

    • game_the0ry 2 days ago

      True, but with the right web framework, much of this comes out of the box.

niffydroid 2 days ago

Having run a small Mongo database, we had it hosted in 3 different places at one point. The last was Atlas: yes, it was expensive, but we got replication, we could have an analytical node, and we even had data residency. If I remember correctly, you can have your replicas on different providers at the same time.

One of the biggest issues was cost, but we were treated like first-class citizens: the support was good, and we saw constant updates and new features. Using Atlas Search was fantastic because we didn't have to replicate the data to another resource for quick searching.

Before Atlas we were on Compose.io, and, well, Mongo there just withered and we were plagued by performance issues.

petcat 3 days ago

> The more keen eyed among you will have noticed the huge cost associated with data transfer over the internet - its as much as the servers! We're building Prosopo to be resilient to outages, such as the recent massive AWS outage, so we use many different cloud providers.

I mean, you're connecting to your primary database potentially on another continent? I imagine your costs will be high, but even worse, your performance will be abysmal.

> When you migrate to a self-hosted solution, you're taking on more responsibility for managing your database. You need to make sure it is secure, backed up, monitored, and can be recreated in case of failure or the need for extra servers arises.

> ...for a small amount of pain you can save a lot of money!

I wouldn't call any of that "a small amount of pain." To save $3,000/month you've now required yourselves to become experts in a domain that may be out of your depth. So whatever cost you saved is now tech debt, plus potentially the cost of hiring someone else to manage your homemade solution for you.

That said, I self-host and applaud other self-hosters. But it really has to make business sense for your team.

  • arbol 3 days ago

    > I mean, you're connecting to your primary database potentially on another continent?

    Atlas AWS was actually set up in Ireland. The data transfer costs were coming from extracting data for ML modelling. We don't get charged for extracting data under the new contract.

    > experts in a domain that maybe is out of your depth

    We're in the bot detection space so we need to be able to run our own infra in order to inspect connections for patterns of abuse. We've built up a fair amount of knowledge because of this and we're lucky enough to have a guy in our team who just understands everything related to computers. He's also pretty good at disseminating information.

    Thanks for reading!

lightningspirit 2 days ago

In that case, can you reduce pricing for customers, too?

MagicMoonlight 2 days ago

You could cut it even more by moving to on-prem instead. Personally I’d rather have it in AWS so that I’m not responsible for keeping it working 24/7.

stego-tech 2 days ago

What most of the commenters here are missing is the reality that not every system, function, or business needs the sort of uptime that AWS offers - and that's fine. It's something a lot of newer entrants into the technology field fail to grasp, because they've never had to actually deal with an outage before - or a time when the internet itself was ephemeral and temporary, available only as long as your connection remained active.

The number one thing people poo-pooing these "We saved $XXX by getting off public cloud" posts miss is that each business has a different calculus for its risk tolerances, business needs, and opportunity costs. Once a function reaches some form of stability or homeostasis, hosting it in the public cloud can become a net liability rather than a net asset. Being able to make those decisions impartially is what separates the genuinely good talent from those who conflate TC with wisdom.

Even when public cloud is the right decision, using managed services increasingly isn't. MongoDB Atlas is a managed service with a price tag to match. Running it yourself on a provider like Hetzner may shift some maintenance and support tasks onto your team, but let's be real: modern databases are designed to be bulletproof, and huge companies operated just fine with a single database instance on bare metal for decades, even with the odd bit of downtime along the way. We ran a MongoDB CE database at a prior company on a single VM in a single datacenter for nearly a decade, and it underpinned a substantial chunk of our operations - operations we could do by hand, if needed, during downtime or outages (which never happened). We eventually moved it to AWS DocumentDB not out of cost savings or necessity, but because a higher-up demanded we do so.

If anything, the visceral rebuke of anyone daring to move off public cloud feels very reminiscent of my own collegial douchebagginess in the 2000s, loudly mocking Linux stans and proclaiming closed source (Microsoft) would run the planet. Past-me was a douchebag then, and the same applies to the AWS-stans of today.

ribtoks 2 days ago

Keeping time series in Mongo…

pdyc 3 days ago

Hetzner routinely refuses to accept you as a customer, so while you can cut costs, it's a privilege.

TZubiri 3 days ago

I hope you don't mind if I hijack this post to ask:

Is there a provider similar to Hetzner but US based?

jjwiseman 2 days ago

  Docker/TypeScript Node

CyanLite2 3 days ago

"We replaced a cluster of virtualized servers with a single bare metal server. Nothing has gone wrong, yet."

  • usrnm 3 days ago

    There are many cases where some downtime is perfectly OK - or, at least, worth the savings.

    • tayo42 3 days ago

      They saved a little under $3k a month and were motivated by the AWS outage.

  • NorwegianDude 3 days ago

    To be fair, a single server is way more reliable than cloud clusters.

    Just look at the most recent hours-long Azure downtime, where Microsoft could not even get microsoft.com back. With that much downtime you could physically move drives between servers multiple times each year and still come out ahead. Servers are very reliable; cloud software is not.

    I'm not saying people should use a single server if they can avoid it, but using a single cloud provider is just as bad. "We moved to the cloud, with managed services and redundancy, nothing has gone wrong...today"

  • arbol 3 days ago

    Lol, yep, that could've been the headline. We plan to add replica servers at some point. This DB is not critical to our product, hence the relaxed interim setup.

lisbbb 2 days ago

Hetzner, lol. Reminds me of all those weird Chinese brands for NVMe and SSD hard drive enclosures on Amazon--Amaloo, Ugreen, Fideco, Orico, and on and on. All trash.

zzzeek 3 days ago

it's getting hard to ignore Hetzner (as a Linode user).

Thing is, Linode was great 10-15 years ago, then enshittification ensued (starting with Akamai buying them).

So what does enshittification for Hetzner look like? I've already got migration scripts pointed at their servers but can't wait for the eventual letdown.

  • tracker1 3 days ago

    IMO, virtual servers and dedicated server hosting are really commoditized at this point, so you have a lot of options... Assuming you have appropriate orchestration and management scripted out, with good backup procedures in place, you should be able to shift to any other provider relatively easily.

    The pain points come when you're also entangled with provider-specific implementations of services... Sure, you can shift from PostgreSQL on one hosted provider to another without much pain, but moving from, say, SQS to Azure Queue Storage or Service Bus is a lot more involved. And that is just one example.

    That is a large reason to keep to services with self-hosted options and/or to self-host from the start... That said, I'm happy to outsource things that are easier to (re)integrate or replace.

mv4 3 days ago

"I cut my healthcare costs by 90% by canceling insurance and doctor visits."

In all seriousness, this is a recurring pattern on HN and it sends the wrong message. It's almost as bad as vibecoding a paid service and losing private customer data.

There was a thread here a while ago, 'How We Saved $500,000 Per Year by Rolling Our Own "S3"' [1]. Then they promptly got hacked. [2]

[1] https://engineering.nanit.com/how-we-saved-500-000-per-year-...

[2] https://www.cbsnews.com/colorado/news/colorado-mom-stranger-...

  • raxxorraxor 2 days ago

    You just need to vibe configure your server too so that it matches your application.

    Seriously, I think for most services Hetzner is the better option: no provider lock-in and easier configuration (you cannot tell me AWS/Azure configuration is easier than plain system administration; these services change every 3 months and use non-standard tools).

    Most services can stomach a technical fault. Recoverability is more important. There are some exceptions to this and that highly depends on the nature of the service. Nobody here described the nature of their services, so we can only speculate.

  • csunoser 3 days ago

    Even after reading the source, it doesn’t seem like they were hacked? Or if they were, they were not accused of such.

    I do think hand rolling your own thing is fraught. But it is very confusing to equate one mother’s complaint to “they have been hacked”.

    PS: The people who made their own S3 run a baby monitor company. The news article is about a mother reporting a weird voice coming from the baby monitor.