Updated June 2026. Originally published December 2009 on Facebook's addition of relationship anniversaries to user profiles. Rebuilt as EPR's reference on the sixteen-year arc of platform profile data — from voluntary self-disclosure in 2009 to involuntary contribution to large-language-model training corpora in 2026.
In December 2009, Facebook added a small feature to user profiles: relationship anniversaries. Users could now record the date they had married, or the date they had begun dating their current partner, and the platform would surface a reminder when the date came around. The feature was minor. The structural pattern it sat inside was not.
The 2009 anniversary feature was a clean example of how Facebook expanded its user-data footprint across the period: not by demanding more information, but by adding small voluntary fields that incrementally enriched the structured-data graph the platform monetized. Each field looked harmless on its own. Each field was a structured signal that compounded into the most valuable identity-and-relationship dataset ever built.
Sixteen years later, that dataset — and the analogous datasets accumulated by every major consumer platform — is the contested raw material of the AI training-data fight.
The 2009 Pattern
The anniversary feature was one of dozens of small profile-data expansions Facebook rolled out across 2008–2012. Birthdays. Hometowns. Current city. Workplace history. Education. Family members. Languages spoken. Religious views. Political views. Books, movies, music, TV shows. Each addition was framed as "letting you share more of who you are." Each addition added a structured field to the user record.
The commercial logic was that the structured profile fields were more valuable than the unstructured status updates the platform was already collecting. A status update mentioning a partner's name was an unstructured signal — useful for ad targeting at scale, but noisy. A structured relationship field with a named partner and an anniversary date was an unambiguous identity-graph edge that ad-targeting algorithms could rely on.
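The contrast between the two kinds of signal can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the `RelationshipEdge` record, its field names, the sample status text, and the naive name-guessing heuristic are assumptions made for the example, not Facebook's actual schema or targeting logic.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical free-text status update: partner and date have to be guessed.
status_update = "Five years with Sam next week, can't believe it!"


@dataclass(frozen=True)
class RelationshipEdge:
    """Hypothetical structured profile field: one unambiguous graph edge."""
    user_id: int
    partner_id: int
    relationship: str
    anniversary: date


def guess_partner_names(text: str) -> list[str]:
    """Naive heuristic: capitalized words after the first are candidate names."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words[1:] if w[0].isupper()]


def next_anniversary(edge: RelationshipEdge, today: date) -> date:
    """Return the next occurrence of the anniversary on or after `today`."""
    for year in (today.year, today.year + 1):
        try:
            candidate = edge.anniversary.replace(year=year)
        except ValueError:
            # A 29 February anniversary in a non-leap year falls back to 28 February.
            candidate = date(year, 2, 28)
        if candidate >= today:
            return candidate
    raise AssertionError("unreachable: next year's date is always in the future")


edge = RelationshipEdge(
    user_id=101,
    partner_id=202,
    relationship="married",
    anniversary=date(2004, 6, 12),
)

candidates = guess_partner_names(status_update)        # a guess that may be wrong
reminder = next_anniversary(edge, date(2009, 12, 1))   # an exact, queryable fact
```

The free-text path yields a list of candidate names that still needs disambiguation. The structured path yields a partner identifier and a date that can be queried directly, which is the sense in which the field is the more reliable signal.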
The 2009 anniversary feature was small. The 2009–2012 cumulative profile-data expansion was the most consequential consumer-data acquisition in the history of advertising.
The Sixteen-Year Arc
The 2009 voluntary-disclosure pattern has evolved through four phases.
2009–2013: Voluntary disclosure scales. Facebook's profile fields, LinkedIn's professional-history fields, Twitter's bio and location fields, and the analogous fields on the consumer platforms that followed together established the voluntary-disclosure infrastructure. The user provided the data. The platform structured and monetized it. The arrangement was broadly understood, even if the terms of service that described it were not always carefully read.
2013–2016: Implicit data dominates. The mobile-first shift moved the data economy from voluntary disclosure to implicit signals: location traces, app-usage patterns, social-graph edges inferred from messaging frequency, and content-engagement patterns. The implicit-signal infrastructure grew to exceed the voluntary-disclosure infrastructure in commercial value. Facebook's 2014 acquisition of WhatsApp for $19 billion, read here as primarily a purchase of the messaging-frequency graph, was the canonical valuation moment for implicit-signal data.
2016–2022: The data the platform inferred became more valuable than the data the user provided. The Cambridge Analytica disclosures, the GDPR implementation, the Apple App Tracking Transparency rollout, and the cookie-deprecation cycle all addressed the implicit-signal infrastructure, not the voluntary-disclosure infrastructure. The 2009-era anniversary field was never the privacy fight. The implicit-signal infrastructure was.
2022–2026: AI training data is the new infrastructure layer. The November 2022 ChatGPT launch and the subsequent emergence of large-language-model training as a substantial use of consumer-platform data have opened the fourth phase. The voluntary-disclosure fields the 2009-era users provided, and the implicit-signal infrastructure the 2013-era platforms built, are now training data for the language models the 2026-era consumers query. The arrangement is structurally different from the 2009 arrangement, and the consent the 2009-era users gave does not obviously extend to the 2026 use.
What the 2009 Pattern Predicted
Three structural arguments from the anniversary-feature era have held up across sixteen years.
Small fields compound. The 2009 lesson, that small, individually trivial profile fields aggregate into highly valuable structured-data graphs, has held across every subsequent consumer-platform expansion. The most valuable platform-data investments are typically the ones that did not look valuable at the time.
Consent given to one use does not bound subsequent uses. The 2009-era users who provided anniversary dates were consenting to a relationship-reminder feature. They were not consenting to advertiser-targeting use, to political-microtargeting use (which would emerge through 2014–2018), or to language-model training use (which would emerge through 2022–2026). The consent the platform obtained was structurally narrow. The uses the platform subsequently made of the data were structurally broad. The gap between the two is the operating tension of every subsequent consumer-data regulatory cycle.
The data outlives the platform that collected it. The 2009-era Facebook profile fields are still inside Meta's datasets. The 2013-era Twitter content is still inside X's datasets and, through licensing deals, inside the training corpora of multiple AI companies. The 2015-era LinkedIn profiles are inside Microsoft's datasets. The platforms have changed names, ownership, executive teams, and strategic direction. The data the platforms accumulated has persisted across all of those changes.
What the 2026 Discipline Looks Like
The contemporary platform-data-and-AI-training discipline operates across four surfaces, each substantially more complex than anything in the 2009 environment.
Publisher-AI licensing. OpenAI's deals with News Corp, Axel Springer, and Condé Nast, along with the wider wave of publisher-AI licensing transactions, have established a structured market for training-data licensing. The market is small relative to the data values implicit in the underlying training corpora, but it is the first commercial framework for resolving the consent gap.
Platform-level data export and deletion. The GDPR-era data-export and right-to-deletion frameworks, the subsequent state-level laws, and the platform-level implementations have established the consumer-facing infrastructure for managing platform-data exposure. The frameworks are functional but do not currently extend to AI-training-data use cases.
AI-training opt-out frameworks. The C2PA standards, the robots.txt extensions for AI crawlers, the platform-level opt-out frameworks (Reddit, Stack Overflow, X), and the publisher-level controls are the emerging infrastructure for managing the AI-training use case. The frameworks remain immature.
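The robots.txt piece of this can be illustrated with a short Python sketch that uses the standard library's `urllib.robotparser`. The policy text and the example.com URL are hypothetical. `GPTBot` and `CCBot` are the user-agent tokens published by OpenAI and Common Crawl for their crawlers. robots.txt is advisory, so the sketch shows what a site has asked for, not what any crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block two AI-related crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_permissions(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report, per user agent, whether the policy permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


permissions = crawler_permissions(
    ROBOTS_TXT,
    ["GPTBot", "CCBot", "Googlebot"],
    "https://example.com/profile/123",
)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Under this sample policy the two named AI crawlers are disallowed and a conventional search crawler falls through to the wildcard rule. The limits are visible in the mechanism itself: the opt-out covers only crawlers that identify themselves and choose to honor the file, and it says nothing about data already collected.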
Communications and brand exposure. The reputation consequences of platform-data missteps are increasingly shaped by the answer-engine retrieval layer. Cambridge Analytica, the 2021 Facebook Files, the various data-broker disclosures, and the ongoing AI-training-data litigation all produce durable retrieval signals that can surface in AI engine answers for years. The discipline of communications around platform-data events has become substantially more complex as a result.
The Bottom Line
The 2009 Facebook anniversary feature was small. The pattern it sat inside was the most consequential consumer-data accumulation in the history of advertising.
Sixteen years later, that accumulated data — and the analogous datasets across every major consumer platform — is the contested raw material of the AI training-data fight. The consent the 2009-era users gave was narrow. The uses the platforms have subsequently made of the data have been broad. The gap is the operating question of the next decade of privacy regulation, AI governance, and platform-data communications.
Every platform-data communications practitioner working in 2026 is working downstream of decisions Facebook's product team made in 2009.