"And it is not assumed that whomever is talking is whomever is currently on the screen. " Lumi's mouth is open is open in that "cherry picked image" then closed when his mom talks, so it should be pretty obvious what the intention was there.
the mouth opens, but it does not look like talking. it looks more like sneer / baring teeth.
there is also the dissonance between what is written and what you see. you see a woman, which throws you off
"The view moves around during the conversation to look at different people." no, clearly it doesnt.
here is an example of just one of multiple perspective shifts happening in that very conversation
Now, it does show MC and companions.
But... keep in mind that this scene happens immediately after importing a save from the last version. So you have not seen your companions in months.
And throughout that scene they are shown in wildly differing lighting levels, making it a bit unclear at a casual glance how many of them are there and how many are passing local NPCs from the town.
Careful examination (done now, after the fact) shows that there are "only" 6 participants in that conversation.
MC, Old Man with pitchfork, Lumi, Lumi's mother, and two ninja companions that came with you.
But between the odd lighting, the moving perspective, and a bunch of characters being introduced at once, it is easy to lose track a bit of how many people are immediately around.
Add to that the fact that "young man" does not appear in the conversation in white; it appears in blue above. So your eyes are not focused on it, they are focused on the words said in white.
It is also there only for a moment before being replaced by a name, and Lumi LOOKS like a girl.
So it is very easy to miss that floating blue "name" of "young man".
Speaking of blue... every single floating name in the conversation is color coded.
But when "Young Man" becomes "Lumi" the color changes. "Young Man" is in blue while "Lumi" is in yellow.
That further suggests they are separate individuals.
It certainly would not hurt to be a little clearer.
The most important step, I would say, is to use the same speaker color for "Young Man" and "Lumi".