Some customers have tens of thousands of member accounts in their domain. SherpaTools Directory Manager makes it easy to retrieve, edit, and manage user accounts. It also allows you to import or export data in bulk from a Google Docs spreadsheet. This post explains how Directory Manager works with Google Apps, and how we built it.
Authentication to SherpaTools is achieved by using Google Apps as the OpenID identity provider. If a user is already logged into Google Apps, they can access SherpaTools without ever providing SherpaTools their credentials. This Single Sign-On experience greatly enhances user adoption and provides an added security benefit. If the user is not already logged into Google Apps, they are routed to a Google login page. With OpenID, SherpaTools never handles the user's credentials.
Once the user is logged in, SherpaTools securely requests authorization from Google Apps using two-legged OAuth (2LO), which allows third-party applications like SherpaTools to make authorized API calls to Google Apps on behalf of a user. Since the app has both an end-user and an administrator view, it first retrieves the logged-in user's information from Google Apps via a 2LO-authorized call to the UserService of the Provisioning API. Depending on the information sent in the API response, the user is presented with either the administrator application or the end-user screen. Here is how we set up the Google Data API ContactsService, which fetches User Profiles, to use 2LO:
ContactsService contactsService =
    new ContactsService(GlobalConstants.APPLICATION_NAME);
GoogleOAuthParameters parameters = new GoogleOAuthParameters();
parameters.setOAuthConsumerKey(GlobalConstants.CONSUMER_KEY);
parameters.setOAuthConsumerSecret(GlobalConstants.CONSUMER_SECRET);
OAuthHmacSha1Signer signer = new OAuthHmacSha1Signer();
try {
    contactsService.setOAuthCredentials(parameters, signer);
} catch (OAuthException e) {
    // not expected if secret is up-to-date
}
As long as our key/secret pair is correct and the Google Apps customer has granted our OAuth key access to their Contacts API feed, Google authorizes SherpaTools to continue making API calls. Two other settings are worth mentioning when configuring the service to work well on GAE. First, since we are dealing with somewhat sensitive data, all calls to Google Apps are made over SSL. To ensure this, we simply set the useSSL flag for the contacts service. Next, the default request/response timeout on GAE for these API calls is only five seconds out of a possible ten. Since we want to retrieve as much data as we can within that ten-second window, which reduces the total number of operations needed to complete the work, we raise our connection timeout to just short of that maximum, 9500 milliseconds:
contactsService.useSsl();
contactsService.setConnectTimeout(9500);
Cloud Sherpas embraced a number of Google Web Toolkit best practices to ensure the scalability of SherpaTools. For example, once the app determines which screen the user should see, SherpaTools employs GWT code splitting to reduce the amount of JavaScript that the browser client needs to download. The app also uses the GWT RPC framework, designed according to the command pattern, to communicate transparently with the server, and it was architected using the model-view-presenter (MVP) design pattern to allow multiple developers to work on the app simultaneously.
After a Google Apps administrator logs into SherpaTools for the first time, the app caches some key information for better performance. For example, to populate the User Profile and Shared Contacts lists, the app retrieves the IDs and names of all contacts using the User Profiles API and Shared Contacts API respectively, and writes this information to the datastore and memcache. For domains with large data sets, SherpaTools uses task queues to break up operations into smaller chunks.
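To make the two cache tiers concrete, here is a minimal sketch that uses plain Java maps as stand-ins for memcache and the datastore. The class name, the `get` read path (memcache first, then datastore, then a remote lookup), and the `remoteLookup` callback are all assumptions made for illustration; the post only states that IDs and names are written to both stores.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative two-tier cache; maps stand in for memcache and the datastore. */
class TwoTierCacheSketch {
    final Map<String, String> memcache = new HashMap<>();  // fast, may be evicted
    final Map<String, String> datastore = new HashMap<>(); // durable copy
    int remoteFetches = 0;                                 // counts remote lookups

    /** Writes a contact name to both tiers. */
    void put(String id, String name) {
        datastore.put(id, name);
        memcache.put(id, name);
    }

    /** Assumed read path: memcache, then datastore, then the remote API. */
    String get(String id, Function<String, String> remoteLookup) {
        String name = memcache.get(id);
        if (name != null) {
            return name;
        }
        name = datastore.get(id);
        if (name == null) {
            name = remoteLookup.apply(id); // hypothetical stand-in for an API call
            remoteFetches++;
            datastore.put(id, name);
        }
        memcache.put(id, name); // repopulate the fast tier
        return name;
    }
}
```

With this layout, an entry evicted from the fast tier can still be served from the durable copy without another remote call.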
Since we have to scale the export to handle tens of thousands of contact entries, there is no way we can retrieve all of those entries in one request. Retrieving this set of information requires breaking the operation up into smaller sub-tasks. Fortunately, both the GAE Task Queue API and the Google Datastore APIs make it easy to divide the retrieval into smaller chunks. GAE Task Queues enable background queueing of HTTP operations (GET, POST, etc.) to arbitrary URLs within our application. Each of these queued operations is subject to the same restrictions as any other GAE HTTP request. Of particular note are the aforementioned 10-second window to perform our remote service call and the overall 30-second window to complete the total work within a task request. Also, since the Task Queue works from the same set of URLs that are available to the rest of our application, we need to make sure that unwanted external callers cannot execute tasks. We followed the recommended way of eliminating this possibility by restricting outside access to just our application's admins.
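The constraint itself is not reproduced here. As a hedged sketch, a web.xml entry matching the description (URLs under /task/, admin-only access, a transport guarantee of NONE) might look like the following. The `web-resource-name` value is a placeholder, and the exact entry SherpaTools used is an assumption.

```xml
<!-- Assumed reconstruction of the deployment descriptor entry; not the original. -->
<security-constraint>
    <web-resource-collection>
        <!-- Placeholder name, chosen for illustration only -->
        <web-resource-name>tasks</web-resource-name>
        <url-pattern>/task/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
        <role-name>admin</role-name>
    </auth-constraint>
    <user-data-constraint>
        <transport-guarantee>NONE</transport-guarantee>
    </user-data-constraint>
</security-constraint>
```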
This constraint restricts all URLs starting with /task/ so that they are accessible only from system calls, such as those from the Task Queue, or by admins. The NONE transport guarantee is also worth mentioning. We initially attempted to encrypt our task calls using SSL with a transport guarantee of CONFIDENTIAL but, at the time we attempted this, task execution stopped working properly. Since all of the traffic for these calls stays strictly on Google's internal network, we had no issue with making them without SSL. Now that our tasks are properly secured, we can create a method for sending our User Profiles fetch task request to the Task Queue:
public void fetchUserProfilesPageTask(String spreadsheetTitle,
        String loggedInEmailAddress, String nextLink, String memcacheKey) {
    Queue queue = QueueFactory.getQueue(USER_PROFILES_QUEUE);
    TaskOptions options =
        TaskOptions.Builder.url("/task/" + USER_PROFILES_FETCH_URL);
    options.param("spreadsheetTitle", spreadsheetTitle);
    options.param("loggedInEmailAddress", loggedInEmailAddress);
    options.param("nextLink", nextLink);
    options.param("memcacheKey", memcacheKey);
    queue.add(options);
}
The URL points to a Java HttpServlet that handles the task's HTTP POST, parses the sent parameters, and calls a method to perform the work:
protected void doPost(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {
    String loggedInEmailAddress = req.getParameter("loggedInEmailAddress");
    // must match the parameter name set when the task was enqueued
    String spreadsheetTitle = req.getParameter("spreadsheetTitle");
    String nextLink = req.getParameter("nextLink");
    String memcacheKey = req.getParameter("memcacheKey");
    // do the work:
    fetchUserProfilesPage(spreadsheetTitle, loggedInEmailAddress, nextLink, memcacheKey);
}
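The overall pattern can be sketched without any App Engine dependencies. The example below assumes that each task fetches a single page and, if the feed reports a next link, enqueues a follow-up task for it. `TaskChainSketch`, `PageTask`, and `FakeFeed` are hypothetical names, and an in-memory queue stands in for the GAE Task Queue, so this illustrates the chaining idea rather than the SherpaTools implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** In-memory stand-in for chained page-fetch tasks; not the SherpaTools code. */
class TaskChainSketch {
    /** One queued unit of work: fetch the page that starts at nextLink. */
    static final class PageTask {
        final String nextLink; // null means "start from the first page"
        PageTask(String nextLink) { this.nextLink = nextLink; }
    }

    /** Hypothetical paged source standing in for the User Profiles feed. */
    static final class FakeFeed {
        private final int totalEntries;
        private final int pageSize; // must be positive
        FakeFeed(int totalEntries, int pageSize) {
            this.totalEntries = totalEntries;
            this.pageSize = pageSize;
        }
        /** Appends one page to out; returns the next link, or null when done. */
        String fetchPage(String link, List<String> out) {
            int start = (link == null) ? 0 : Integer.parseInt(link);
            int end = Math.min(start + pageSize, totalEntries);
            for (int i = start; i < end; i++) {
                out.add("user" + i);
            }
            return (end < totalEntries) ? Integer.toString(end) : null;
        }
    }

    final Queue<PageTask> queue = new ArrayDeque<>();
    final List<String> collected = new ArrayList<>();
    int tasksExecuted = 0;

    /** Handles one task: fetch a single page, then enqueue a follow-up if needed. */
    void handle(PageTask task, FakeFeed feed) {
        String next = feed.fetchPage(task.nextLink, collected);
        tasksExecuted++;
        if (next != null) {
            queue.add(new PageTask(next));
        }
    }

    /** Seeds the queue with the first task and drains it in order. */
    static TaskChainSketch run(int totalEntries, int pageSize) {
        TaskChainSketch sketch = new TaskChainSketch();
        FakeFeed feed = new FakeFeed(totalEntries, pageSize);
        sketch.queue.add(new PageTask(null));
        while (!sketch.queue.isEmpty()) {
            sketch.handle(sketch.queue.poll(), feed);
        }
        return sketch;
    }

    public static void main(String[] args) {
        TaskChainSketch result = run(25, 10);
        System.out.println(result.tasksExecuted + " tasks, "
                + result.collected.size() + " entries");
    }
}
```

Because each task handles only one page, every individual request stays small regardless of how many entries the domain holds; only the number of tasks grows.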
Summary
This post explains how SherpaTools Directory Manager uses OpenID for authentication and two-legged OAuth for authorization, and how the GAE Task Queue API and the Google Datastore APIs make it easy to divide large, long-running retrievals into smaller chunks. Other long-running, divisible operations can use this same approach to spread work across a string of tasks queued in the GAE Task Queue. We would love to hear what you think of this approach and whether you have come up with your own solution for similar issues.
Next week we will discuss how we used the User Profile API to retrieve the data sets and the Document List Data API to populate a Google Spreadsheet.