> When you publish a Notion page to the web, the webpage’s metadata may include the names, profile photos, and email addresses associated with any Notion users that have contributed to the page.
First: This is documented and we also warn users when they publish a page. But, that’s not good enough!
Second: We don’t like this and are looking at ways to fix this either by removing the PII from the public endpoints or by replacing it with an email proxy similar to GitHub’s equivalent functionality for public commits.
P.S: Some folks here have speculated that this should be a 1 minute fix. Unfortunately that is not the case. :(
some problems I've identified:
1. suppose you have x users and y groups, of which require some subset of x. joining the data on demand can become expensive, O(x*y).
2. the main usefulness of such an architecture is if the data itself is stored with the user, but as group sizes y increase, a single user's data being offline makes aggregate usecases more difficult. this would lend itself to replicating the data server side, but that would defeat the purpose
3. assuming the previous two are solved, which is very difficult to say the least, how do you secure the data for the user such that someone who knows about this architecture can't just go to the clients and trivially scrape all of the data (per user)?
4. how do you allow for these features without allowing people to modify their data in ways you don't want to allow? encryption?
a concrete example of this would be if HN had it so that each user had a sqlite database that stored all of the posts made per user. then, HN server would actually go and fetch the data for each of the posters to then show the regular page. presumably here if a data of a given user is inaccessible then their data would be omitted.
Tells me everything I need to know about this industry. No regard or seriousness to security at all.