How We Migrated Millions of Users from LDAP to MySQL Using Feature Flags
At Egnyte, we used LDAP for a long time to store user and customer identity data. This worked well until we started facing performance problems with customers trying to create more than 100K users in a single account. Although customer data was distributed across multiple LDAP instances, we ran into scaling issues once more than a million users were mapped to a single instance. Read performance was acceptable (as long as the data fit in memory), but write performance degraded as more entries were added. We needed to change our architecture to meet growth demands and keep delivering consistently high performance to our users, while also enabling customers to easily and effectively scale and manage their Egnyte accounts.
Another obstacle we encountered was that it’s not easy to find engineers who are fluent in LDAP, so feature requests started bottlenecking on a few engineers. Technically, the LDAP transaction model is quite different from that of an RDBMS, and writing code to roll back updates across multiple entries is no fun and tends to be time-consuming. LDAP schema changes also required manual intervention and extra maintenance windows, which we had to balance against the need to avoid burdening our global user base with too much downtime.
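To make the transaction point concrete, here is a minimal sketch of what an RDBMS provides out of the box: several row updates either all commit or all roll back. It uses Python's built-in sqlite3 module as a stand-in for MySQL (PEP 249 drivers for MySQL expose the same commit()/rollback() calls), and the `users` table, its columns, `make_demo_db`, and `move_users` are hypothetical illustrations, not our actual schema or code.

```python
from __future__ import annotations

import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for an identity database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, account TEXT NOT NULL, email TEXT UNIQUE NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO users (account, email) VALUES (?, ?)",
        [("acme", "a@acme.example"), ("acme", "b@acme.example"), ("other", "c@other.example")],
    )
    conn.commit()
    return conn


def move_users(conn: sqlite3.Connection, src: str, dst: str, emails: list[str]) -> None:
    """Move several users between accounts as one unit: all updates land, or none do."""
    # Used as a context manager, a sqlite3 connection commits if the block
    # finishes normally and rolls back if an exception escapes it.
    with conn:
        for email in emails:
            cur = conn.execute(
                "UPDATE users SET account = ? WHERE account = ? AND email = ?",
                (dst, src, email),
            )
            if cur.rowcount != 1:
                # Raising here undoes every update already made in this block.
                raise LookupError(f"{email} is not in account {src}")
```

Calling `move_users(conn, "acme", "other", ["a@acme.example", "missing@acme.example"])` raises `LookupError`, and `a@acme.example` is still in `acme` afterward because the first update was rolled back. With LDAP, that kind of multi-entry cleanup had to be written by hand.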
Since LDAP had been part of our platform since day one, migrating existing data was a key concern. There were many models, each with numerous fields, and what if the migration missed some of them? We use a Service Oriented Architecture with many moving parts that depended on LDAP, so a straight conversion to MySQL would have required upgrading ALL services in ALL data centers at the same time. This was risky and a hard sell to the ops and management teams. The exercise was as complex as changing the engines of multiple cars on a race track while keeping the race going.
To address the migration challenge, we used Feature Flags and interface-based programming. All code using LDAP was already factored out behind one interface, across our various services and languages (we use both Python and Java extensively). To move to MySQL, we introduced two fields for each customer: a URL for reaching LDAP and a URL for reaching MySQL. These fields were pre-populated for all customers. We then created a Routing Directory Service that determines each customer's backend store and forwards calls to either the LDAP Directory Service or the SQL Directory Service accordingly. We had to follow the same pattern in all other services, and since some were written in Python, we unfortunately had to duplicate the effort.
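A minimal Python sketch of the routing idea follows. Every name in it (`DirectoryService`, `InMemoryDirectoryService`, `CustomerRecord`, `RoutingDirectoryService`, the `backend` flag field, and the example URLs) is hypothetical, and the in-memory classes merely stand in for the real LDAP and SQL Directory Services. It assumes the per-customer feature flag is a simple `backend` field that selects which of the two pre-populated URLs to use, and that each URL maps to a backend service instance.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


class DirectoryService(ABC):
    """Hypothetical interface that service code calls instead of talking to a store directly."""

    @abstractmethod
    def get_user(self, customer_id: str, username: str) -> dict | None: ...

    @abstractmethod
    def create_user(self, customer_id: str, username: str, attrs: dict) -> None: ...


class InMemoryDirectoryService(DirectoryService):
    """Stand-in for the LDAP or SQL Directory Service; `store` labels which one."""

    def __init__(self, store: str) -> None:
        self.store = store
        self._users: dict[tuple[str, str], dict] = {}

    def get_user(self, customer_id: str, username: str) -> dict | None:
        user = self._users.get((customer_id, username))
        # Tag results with the store that served them so routing is observable.
        return None if user is None else {**user, "served_by": self.store}

    def create_user(self, customer_id: str, username: str, attrs: dict) -> None:
        self._users[(customer_id, username)] = dict(attrs)


@dataclass
class CustomerRecord:
    """Per-customer settings: both URLs are pre-populated; `backend` acts as the feature flag."""

    ldap_url: str
    mysql_url: str
    backend: str = "ldap"  # hypothetical flag values: "ldap" or "mysql"


class RoutingDirectoryService(DirectoryService):
    """Same interface as the backends; forwards each call to the customer's active store."""

    def __init__(self, customers: dict[str, CustomerRecord],
                 connections: dict[str, DirectoryService]) -> None:
        self._customers = customers
        self._connections = connections  # backend URL -> service instance

    def _route(self, customer_id: str) -> DirectoryService:
        record = self._customers[customer_id]
        url = record.mysql_url if record.backend == "mysql" else record.ldap_url
        return self._connections[url]

    def get_user(self, customer_id: str, username: str) -> dict | None:
        return self._route(customer_id).get_user(customer_id, username)

    def create_user(self, customer_id: str, username: str, attrs: dict) -> None:
        self._route(customer_id).create_user(customer_id, username, attrs)
```

Because the router implements the same interface as the backends, calling code doesn't change when it is introduced, and flipping one customer's `backend` value from `"ldap"` to `"mysql"` moves only that customer's reads and writes.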
To avoid massive disruption for our customers, we released the code to production in sleeper mode: deployed, but not yet routing anyone to MySQL. Services were upgraded one at a time because they followed different release schedules. The code stayed dormant for 1-2 weeks, since no customer could be moved to MySQL until every service had been upgraded to use the routing logic.
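The sleeper-mode constraint can be expressed as a simple gate. This is a hypothetical sketch, not our deployment tooling: `RolloutGate`, its method names, and the service names used with it are illustrative assumptions. It refuses to move any customer to MySQL until every service that touches identity data has shipped the routing code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RolloutGate:
    """Hypothetical guard for a 'sleeper mode' rollout.

    Router-aware code ships dark: every customer stays on LDAP until each
    service that touches identity data has been upgraded to the routing logic.
    """

    required_services: set[str]
    upgraded_services: set[str] = field(default_factory=set)
    mysql_customers: set[str] = field(default_factory=set)

    def mark_upgraded(self, service: str) -> None:
        # Called as each service ships on its own release schedule.
        if service not in self.required_services:
            raise ValueError(f"unknown service: {service}")
        self.upgraded_services.add(service)

    def ready(self) -> bool:
        # True only once every required service runs the routing code.
        return self.required_services <= self.upgraded_services

    def move_to_mysql(self, customer_id: str) -> None:
        if not self.ready():
            pending = sorted(self.required_services - self.upgraded_services)
            raise RuntimeError(f"cannot move {customer_id}; still waiting on {pending}")
        self.mysql_customers.add(customer_id)

    def backend_for(self, customer_id: str) -> str:
        # The value a router would consult for this customer.
        return "mysql" if customer_id in self.mysql_customers else "ldap"
```

Until the last service calls `mark_upgraded`, `move_to_mysql` raises and `backend_for` keeps returning `"ldap"`, which is exactly the dormant period described above.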
After migrating a handful of customers to MySQL, we paused to review performance metrics and found a missing index, which we quickly fixed. It then took us about a month to roll out the change to all customers. But the effort was worth it! Since moving to MySQL, write performance has remained constant.
At this point, LDAP is out of the picture, and we plan to remove the Routing Directory Service in an upcoming release and formally shut down all LDAP servers. The biggest benefit I see is that everyone from production support to product management can now easily navigate the schema and solve many problems on their own. Gone are the days of planned maintenance windows for LDAP schema updates; schema changes now happen without disrupting the service. Less downtime and quicker updates are key for our global customer base, and these changes definitely improve customer satisfaction and confidence in Egnyte products.