It is not at all standardized, and figuring out the user agents for each site was rather cumbersome. We are surely missing many in our allowlist, but I think we have most of the big ones at this point. I don't actually know what the full regex is at this point, but maybe @sds can share.
If you need to do something similar yourself, I would recommend hosting a simple Express app that responds with the common OG meta tags and injects the requesting user agent into the title/description. That way you can see how previews render on various sites and also which user agent each one is hitting your site with.
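For anyone who wants a starting point, here is a rough sketch of the page-rendering half of that probe. This is not our actual code: `escapeHtml` and `renderOgProbePage` are names I made up for illustration, and the exact set of tags is an assumption about what "common OG meta tags" would cover.

```typescript
// Escape characters that could break out of an HTML attribute or element,
// since the user agent string is attacker-controlled input.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a page whose OG title/description echo the requesting user agent,
// so the rendered link preview shows which client fetched the page.
// (Hypothetical helper; the tag selection is an assumption.)
function renderOgProbePage(userAgent: string | undefined): string {
  const ua = escapeHtml(userAgent ?? "(no user agent sent)");
  return [
    "<!doctype html>",
    "<html><head>",
    `<meta property="og:title" content="UA: ${ua}">`,
    `<meta property="og:description" content="Fetched by: ${ua}">`,
    `<title>UA: ${ua}</title>`,
    "</head><body></body></html>",
  ].join("\n");
}
```

In an Express route handler you would presumably wire this up with something like `res.send(renderOgProbePage(req.get("User-Agent")))`, then paste the URL into each site and read the user agent off the preview.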
I’m sure there’s a long tail of user agents we’re not handling, but the substring “bot” goes a long way.
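To make the "bot" point concrete, here is a minimal sketch of that kind of check. It is not our real regex (I don't have it in front of me); `looksLikeLinkPreviewer` and `EXTRA_PREVIEW_AGENTS` are hypothetical names, and the two extra patterns are just examples of well-known crawlers whose user agents don't contain "bot".

```typescript
// Example patterns for previewers that the "bot" substring would miss.
// Illustrative only; a real allowlist would be longer.
const EXTRA_PREVIEW_AGENTS: RegExp[] = [/facebookexternalhit/i, /whatsapp/i];

// Returns true when the user agent looks like a link-preview crawler.
function looksLikeLinkPreviewer(userAgent: string | undefined): boolean {
  if (!userAgent) return false;
  // The cheap check that covers most cases: any "bot" substring.
  if (userAgent.toLowerCase().includes("bot")) return true;
  // Fall back to the explicit allowlist for the long tail.
  return EXTRA_PREVIEW_AGENTS.some((pattern) => pattern.test(userAgent));
}
```

Keep in mind a bare substring match can also catch ordinary clients whose user agent happens to contain "bot", so it's worth logging what it matches for a while.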