First: They're called renders because the game assets are pre-rendered images rather than models plus instructions on how to position them (as a Unity-based VN would use, for example). The complexity (or simplicity) of the process involved is irrelevant.
Second: That's like saying writing a book is just typing words. Getting the poses, facial expressions, lighting, and everything else right is non-trivial work. It's not technically complex work, but there are enough bad VNs on this site to show what happens when you lack those creative skills.
You'd be surprised how long people can muddle along without reading the documentation (or even finding it). Making something that works does not mean you've learned how to do it well. It's far more common to get stuck in bad habits that land you in trouble down the line, when your technical debt comes due. That leads to big delays as you go back and (hopefully) do things the right way.
The root of ES's problems is that BC is a creative type with big dreams (any software developer can tell you all about those people with "an amazing app idea!!!") who is learning the hard way that ideas are only half the battle. It's pretty clear BC doesn't have the technical skills to make the game they want, and may not even be able to gauge the progress (or, in this case, lack thereof) made by those actually working on it.