DieRipps, I am not sure I fully understand the end goal here. It makes sense to me that in the end you browse your collection using Synology Photos. And if it can read embedded face tags from XMP, as you say, then you can use Tonfotos for facial tagging, store the results in the metadata, and SP will pick them up, so you are all set for browsing your collection from the web.
However, I am not sure why you are trying to involve LR in the facial recognition workflow; it is clearly not the best tool for that. It is great at RAW processing, but as far as I understand you can just ignore its facial recognition feature altogether. Why not do the RAW development in LR and then the facial recognition in Tonfotos on the developed photos? Why are you trying to pull the face tags back into LR? I guess I am missing some part of the picture here, so please comment.