Breaking the fourth wall is not exclusive to the MCU, and really, it's just another form of narration. The audience doesn't even blink at the sound of a character's voice narrating their own story. Showing their face while narrating is just an extension of that.
An in-universe explanation would be that there is a sort of in-between reality that allows the characters to interact with viewers or readers. The lines between those two realities can sometimes blur, but in general the story is left unchanged. The narrator reality is like a dream: once it's over, it quickly fades from the main reality, and the interaction is more or less erased from the main universe.
It's like when Zack Morris would call time out on Saved by the Bell. That reality paused for a moment while he interacted with the viewers, and when time restarted, nothing had changed. Even Zack was clueless that he was a fictional character being watched on TV.