OK, just to make this a little easier to understand for people who have never created a render.
You create a scene with actors, items, lights etc., like in photography (where a single merchandise photo can take a day). But your actor is not a person, it is a puppet, and that means you have to move every single part of the actor until it looks OK to you (which doesn't mean that it is OK). The same goes for the items, lights, landscape etc. Because it is a puppet, you have to dress it and check that the clothes fit. It's not one click and it fits, it's one click and it looks like a toddler dressed himself. You have to fix that by hand. You have to check from every angle whether it fits or whether some parts poke through each other. While you work you do not see the effect of the lights or the real surface of each part of your pic. You have to guess the effect. The more parts your pic has, the longer it takes to place everything until you think you can try a render. Sometimes it takes a few hours until you think you could render it, but it can also take days.
How long it takes to render depends on the render engine and how complex the render is. An actor has a skeleton, and this skeleton gets a covering, a wire mesh that gives the skeleton an outer form like our skin does. The areas in the mesh are called polygons. The more polygons, the smoother the form: think of the flat panels of a soccer ball compared to a smooth bowling ball. For example, a low polygon head has about 600 polygons and a high polygon head about 10000 polygons. The higher the count, the higher the memory consumption. The same construct is used for every item in the render. On top of that mesh come the surfaces, the look of the parts (shiny, metallic, rusty, bumpy, bloody), and these surfaces have attributes like reflection, illumination, transparency etc. Those surfaces (textures) can have different resolutions. The more detail you want, the higher the resolution and the memory consumption.
When you start a render, the engine calculates for every point of the picture how it looks. Only when the render is finished do you see if the light is too harsh or too low, if the expression looks unreal etc. For rendering, the engine needs all parts of the picture in memory (the memory of the Nvidia graphics card) and uses the graphics card processor to calculate the render. Depending on engine, resolution and quality, that can take from a few minutes up to a few hours. But if the parts for the render do not fit in the graphics card memory, it renders with the CPU and system memory instead. That happens often with complex scenes with many items, high polygon counts, high resolution surfaces and a high resolution render - HD needs less memory than 4K.
A render with CPU and system memory easily takes a day, a day where you can't do anything else with that PC, because it runs at 100%. And if you then see that the light is bad or a surface looks like shit etc., you have to change that in the design and start the render again.
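To put rough numbers on the memory part, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names are made up, textures are counted as uncompressed 8-bit RGBA, the card is assumed to have 8 GiB of memory, and the scene is assumed to use 150 texture maps. Real engines also have to fit the meshes and the output picture, and many compress textures, so treat this only as a feel for the scale.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def texture_bytes(width, height, channels=4, bytes_per_channel=1):
    """Uncompressed size of one texture map (assumed 8-bit RGBA by default)."""
    return width * height * channels * bytes_per_channel


def scene_texture_bytes(textures):
    """Total size of all texture maps, given as (width, height) pairs."""
    return sum(texture_bytes(w, h) for w, h in textures)


def render_device(scene_bytes, vram_bytes):
    """Hypothetical rule: GPU if everything fits in card memory, else CPU.

    Not every engine can fall back to the CPU; this assumes one that can.
    """
    return "GPU" if scene_bytes <= vram_bytes else "CPU"


# Assumed example: 150 texture maps on a card with 8 GiB of memory.
vram = 8 * GIB
scene_4k = [(4096, 4096)] * 150   # 64 MiB per map
scene_2k = [(2048, 2048)] * 150   # 16 MiB per map

print(scene_texture_bytes(scene_4k) / GIB, render_device(scene_texture_bytes(scene_4k), vram))
print(scene_texture_bytes(scene_2k) / GIB, render_device(scene_texture_bytes(scene_2k), vram))
```

With these assumed numbers the 4096x4096 version needs about 9.4 GiB for textures alone and drops to the CPU, while the 2048x2048 version needs about 2.3 GiB and stays on the card. That is why halving texture resolution is often the first thing people try.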
After that you may want to adjust something in the pic, like contrast or brightness, with Photoshop, and change the file format of the render, because most render engines create big files and converting them to JPG or WebP can save space in the game. BTW, 95% of the engines only work with Nvidia cards, not all engines can switch to CPU rendering, and surfaces, textures etc. are not easily exchangeable between render engines.
And this process does not include animations, which need many pics per second to look smooth. A smooth animation needs 25 pics per second. That's 250 pics for a single 10-second animation file.
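The animation arithmetic as a small Python sketch. The 25 pics per second and the 10 seconds are from above; the 5 minutes per pic is purely an assumed figure to show how fast it adds up, and the function names are made up for this example.

```python
def frame_count(fps, seconds):
    """Number of single pics needed for an animation."""
    return fps * seconds


def total_render_hours(frames, minutes_per_frame):
    """Total render time in hours if every pic takes the same time."""
    return frames * minutes_per_frame / 60


frames = frame_count(25, 10)
print(frames)  # 250

# Assumption: 5 minutes per pic, which is on the fast end for a GPU render.
print(round(total_render_hours(frames, 5), 1))  # 20.8
```

So even with a fast render per pic, one 10-second animation keeps the PC busy for most of a day.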
And now think about the more than 1100 images in this game. A quarter of a year for an update is reasonable if this is a hobby and not the main income. Not everyone here makes a living from games, like GDS with his Chloe18 series, and can work on an update every day.