My advice is to look at other HTML games and see how they use text and images together. Here it feels like the images are the primary focus and the text is an afterthought.
Here's how it feels like you made the scenes:
Pick 5 images -> describe what is happening in them -> fit the scene to the characters.
Here's how it should work:
Plan the scene in textual detail (in its entirety; look at screenplays for reference) -> Find images and gifs that help tell the story, but only if necessary -> Cut out any bits that don't flow well.
If this is your first foray into HTML coding, I'd definitely recommend getting comfortable with styling and formatting. Look at what other people do, what works and what doesn't, then take only the best features and make them your own.
If you do all of the above well, you can have a great game that people will enjoy playing and whose updates they look forward to. If you don't, your game will never progress to even the amateur stage.
Obviously it is up to you where to start, but I would recommend taking one of two paths before your game gets too long to refactor. Either 1. focus on refining your storytelling format and work toward a good, solid v0.1 release, or 2. focus on building a solid UI/UX that can be applied across your whole story.