As for rendering the thing... This is a complex scene. It has TWELVE people in it, and that's just going to make things go super slow. It would likely be faster to do the following.
1) Get a picture of the classroom with the camera and lighting set up and 0 people in it.
2) Render the 4 people in section 1. Save it. Delete the people.
3) Render the 4 people in section 2. Save it. Delete the people.
4) Render the 4 people in section 3. Save it. Delete the people.
5) Take images 1, 2, and 3 and combine them so that all 12 people are there.
Sometimes attempting a big image like this with 12 people can cause things to freeze or throw other errors (maybe even overheat).
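For anyone who wants to script step 5 instead of doing it by hand in an image editor, here is a rough sketch of one way to do it. It assumes every render uses exactly the same camera and lighting as the empty classroom shot, and it simply keeps any pixel that differs from the empty shot. The function name and the list-of-rows image format are made up for illustration; real files would be loaded with an image library. Note that shadows or reflections that one group would cast on another are lost with this approach.

```python
def combine_renders(background, renders, tolerance=0):
    """Merge partial renders over an empty background plate.

    Images are lists of rows, each row a list of (r, g, b) tuples.
    A pixel is taken from a render when it differs from the plate
    by more than `tolerance` on any channel (hypothetical helper).
    """
    result = [row[:] for row in background]  # copy so the plate is untouched
    for render in renders:
        for y, row in enumerate(render):
            for x, pixel in enumerate(row):
                plate = background[y][x]
                if max(abs(a - b) for a, b in zip(pixel, plate)) > tolerance:
                    result[y][x] = pixel
    return result


# Tiny 1x3 "classroom": one pixel per section of people.
empty = [[(0, 0, 0), (0, 0, 0), (0, 0, 0)]]
section1 = [[(255, 0, 0), (0, 0, 0), (0, 0, 0)]]
section2 = [[(0, 0, 0), (0, 255, 0), (0, 0, 0)]]
section3 = [[(0, 0, 0), (0, 0, 0), (0, 0, 255)]]
combined = combine_renders(empty, [section1, section2, section3])
```

With noisy renders you would probably need a tolerance above 0, otherwise render noise gets treated as "people".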
Many developers on Patreon run into difficulties because their method of development does not follow game industry standards.
If you must produce a movie, you have no choice but to render the full scene, and that can take several hours or days even on multiple computers. If you intend to produce an image for posters and publicity, you may also have to render the full scene, but a single image only takes several minutes on a mid-spec rig, depending on the quality and size of the resulting image. The arduous part is posing and composing the scene.
Then, from a developer perspective, you may run out of video RAM while building a scene, because of the weight of the many 3D meshes and effects in it. But again, this is only a problem when you have to render an animation, generally video.
Devs of visual novels like AW&M tend to render full images for the users to load each time they make an update, and that has a bad consequence: it increases the file size of the whole project. Most novels are now at nearly 2GB with little content, despite being basically 2D.
A real 2D game is made of static images, sprites and transparency masks.
So in this scene, for example, you have to render the static background, meaning the images which never change (if you need day and night, then you would have two backgrounds). Those can be reused every time a scene occurs at the same place.
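To make the reuse idea concrete, here is a minimal sketch of a background lookup. The file paths, the `BACKGROUNDS` table and the `background_for` helper are all invented for the example; the point is only that several scenes end up pointing at the same few files instead of each shipping its own full render.

```python
# Hypothetical asset table: one file per (place, time of day).
BACKGROUNDS = {
    ("classroom", "day"): "bg/classroom_day.png",
    ("classroom", "night"): "bg/classroom_night.png",
}


def background_for(place, time_of_day):
    """Return the shared background file for a scene (illustrative helper)."""
    return BACKGROUNDS[(place, time_of_day)]


# Three scenes in the story, but only two background files on disk.
scenes = [("classroom", "day"), ("classroom", "night"), ("classroom", "day")]
files = [background_for(place, time) for place, time in scenes]
unique_files = set(files)
```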
Then the sprites are the moving objects: people and items. Those are rendered in the 3D program on their own, with a transparent or chroma background. They can be rendered doing different moves and tasks, and most of them can be reused in multiple scenes. And finally, to interlace backgrounds with sprites... these are put on layers, and then you make it work with a transparency mask, which is basically a black & white image that tells the main program which parts of an image are visible through the images on the layers above. All of this, however, is controlled by code... and that is more complex than linking simple images in an array.
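As a rough illustration of the layering described above, this is a sketch of pasting one sprite onto a background through a grayscale mask. It is not how any particular engine does it. The convention used here is an assumption: white (255) in the mask shows the sprite, black (0) keeps the background, and grays blend the two. The function name and the list-of-rows image format are made up for the example.

```python
def paste_with_mask(background, sprite, mask, left, top):
    """Composite `sprite` over `background` at (left, top) using `mask`.

    Images are lists of rows of (r, g, b) tuples; `mask` has the sprite's
    size with values 0..255 (255 = sprite fully visible). Hypothetical helper.
    """
    result = [row[:] for row in background]  # leave the background reusable
    for y, sprite_row in enumerate(sprite):
        for x, sprite_px in enumerate(sprite_row):
            by, bx = top + y, left + x
            if not (0 <= by < len(result) and 0 <= bx < len(result[0])):
                continue  # sprite pixel falls outside the background
            m = mask[y][x]
            bg_px = result[by][bx]
            result[by][bx] = tuple(
                (s * m + b * (255 - m)) // 255 for s, b in zip(sprite_px, bg_px)
            )
    return result


# 2x2 background, 1x1 sprite placed in the top-right corner.
room = [[(10, 10, 10), (10, 10, 10)], [(10, 10, 10), (10, 10, 10)]]
person = [[(200, 100, 50)]]
solid = paste_with_mask(room, person, [[255]], left=1, top=0)
hidden = paste_with_mask(room, person, [[0]], left=1, top=0)
```

Because the function returns a new image, the same background and sprite can be reused for the next scene with a different position or mask.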
Sorry for the long text, but I thought some explanation was needed after the arguing in the previous posts.