Good to hear, glad you like it.
Nitpicks are fine!
I'd be inclined to disagree with your reasoning here.
Redundancy doesn't prevent failure; it's meant to mitigate the consequences if something fails.
Say you lose the connection on one link: you then need a secondary link to take over.
Since you have two links, the chance that either one fails is roughly twice as high as with a single link, because we just doubled the potential points of failure.
So failure rate, downtime and redundancy are all related, but not the same thing.
You can have five failures every day, but if you have redundancy in place to take over for whatever equipment or software failed, you will prevent downtime.
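A quick sketch of the two-link point, assuming each link fails independently with the same probability `p` over some period (the value of `p` and the function names are illustrative, not from any real monitoring tool). "Either link fails" is the device-level figure, and it goes up. "Both links fail" is the outage the redundancy is there to prevent, and it goes down.

```python
# Minimal sketch: n redundant links, each assumed to fail independently
# with the same probability p over a given period (illustrative assumption).

def p_any_link_fails(p: float, n: int = 2) -> float:
    """Probability that at least one of n links fails (device-level failures)."""
    return 1 - (1 - p) ** n


def p_all_links_fail(p: float, n: int = 2) -> float:
    """Probability that every link fails at once (system-level outage)."""
    return p ** n


p = 0.01  # hypothetical per-link failure probability
print(f"single link down: {p:.4f}")
print(f"either link down: {p_any_link_fails(p):.4f}")  # about 2p when p is small
print(f"both links down:  {p_all_links_fail(p):.4f}")  # p squared
```

The "twice as likely" figure is the small-`p` approximation of `1 - (1 - p)**2`; real links often share failure causes (same conduit, same upstream provider), so independence is an optimistic assumption.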
Thank you.
Sorry if my intended meaning wasn't clear. I completely agree that redundancy doesn't prevent failures of individual devices in the system, and that the device failure rate increases the more devices you have: exactly as you say, the number of potential points of failure goes up. Redundancy is there to mitigate device failures, hence the term 'High Availability'.
You hit the nail on the head when you say that downtime and failure rate are not the same thing. Precisely: in the absence of context, and assuming downtime refers to the system level and failure rate to individual devices, they are not the same thing at all.
My point was that when your job is providing services across multiple systems, 'downtime' is equivalent to 'overall system failure rate' rather than 'device failure rate', which the 'work' question didn't make clear.
This is most likely a question of mindset and framing. Enterprise-scale Service Level Agreements deal with multiple separate clients and hundreds of servers, and the system failure rate/downtime allowed in a given period is written into the contract. Reporting and metrics are therefore based on system failure/downtime rather than device failure rate, although this is often framed as a minimum uptime guarantee somewhere in the high 90s percent, with allowance for scheduled maintenance windows.
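To make an uptime guarantee concrete, here is a small sketch converting an uptime percentage into the downtime it permits over a year. It assumes a 365-day year and treats the whole year as the measured period; how scheduled maintenance windows are excluded varies by contract, so that is left out here. The function name is just for illustration.

```python
# Minimal sketch: downtime allowed by an uptime percentage over one year.
# Assumes a 365-day year and that maintenance windows are not excluded.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted in the period at the given uptime percentage."""
    return period_hours * (1 - uptime_percent / 100)


for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

Each extra "9" cuts the allowance by a factor of ten, which is why these contracts measure system downtime rather than counting individual device failures.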
Device failure rate is typically left out of service contract metrics (out of sight, out of mind) because it is a downstream issue between third-party vendors and the service provider.
There is also the question of whether the redundancy is cold standby, hot standby or always-on. With cold standby, the device failure rate does not increase during periods when the backup devices are never switched on.
In virtualised systems, the device failure rate can be decoupled even further from the system failure rate, because virtualisation adds a layer of abstraction that determines how dependent any given virtualised device is on an underlying physical device, and how much redundancy it has.
I mistakenly suggested rewording the question to 'individual device failure rate', but 'device failure rate' would have been better because, looking at it the other way round, each individual device doesn't fail more often; rather, the total failure count across all devices in the system goes up.
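The distinction between per-device rate and total failure count can be sketched with a simple expected-value model. It assumes each device has a constant failure rate (failures per powered-on hour) and only accrues failures while switched on, which is a simplification; the `Device` class and the rate value are hypothetical. Adding a hot-standby backup doubles the expected total failures without changing any single device's rate, while a cold-standby unit that is never powered on adds nothing.

```python
from dataclasses import dataclass

# Simplified model: constant failure rate per device, failures accrue only
# while the device is powered on. Names and numbers are hypothetical.


@dataclass
class Device:
    name: str
    failures_per_hour: float  # per-device failure rate (unchanged by adding devices)
    powered_hours: float      # hours the device is actually switched on


def expected_failures(devices: list[Device]) -> float:
    """Expected total failure count: sum of rate * powered-on hours."""
    return sum(d.failures_per_hour * d.powered_hours for d in devices)


YEAR = 8760
RATE = 1e-4  # hypothetical failures per powered-on hour

single = [Device("primary", RATE, YEAR)]
hot_standby = [Device("primary", RATE, YEAR), Device("backup", RATE, YEAR)]
cold_standby = [Device("primary", RATE, YEAR), Device("backup", RATE, 0)]

for label, devs in [("single", single), ("hot standby", hot_standby),
                    ("cold standby", cold_standby)]:
    print(f"{label:12s} expected failures/year: {expected_failures(devs):.3f}")
```

In practice a cold-standby unit still gets powered on for testing and after a failover, so its contribution is small rather than strictly zero.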
I feel bad now for dragging the semantics of this out to the extreme. It really is nitpicking on a minor point no one else would pick up on... and you've got waaaaaay better things to do with your time!
Keep up the excellent work.