Key Information

The challenge is finished.

Challenge Overview

This application is an archiving system for Gmail that stores, searches and retrieves emails. The key requirements are that the system scales and that it can search potentially huge email datasets. The application will let a user archive emails from a Gmail account to the cloud based on selected labels, search the archived emails in the cloud, and restore emails from the cloud back to the user's Gmail inbox.

This module implements the backend services of the application, namely the MailExportingService and the MailExportingWorkerService.

Workflow: To clarify the context and scope of this component, consider the mail exporting workflow in the Gmail archiver application:

  • A user with the required permission searches for accounts, selects some of them and requests an export.
  • The export request is passed to the MailExportingService (in scope).
  • The MailExportingService sends a separate request for each account through a JMS queue (in scope).
  • The JMS handler delegates the export to the MailExportingWorkerService.
  • The MailExportingWorkerService performs the actual export job (in scope).
  • The user (through the front end) monitors the status of the export job using the MailExportingService (in scope).
  • The user (through the front end) downloads or deletes the export result file using the MailExportingService (in scope).
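The fan-out in the workflow above can be sketched as follows. This is a minimal, hypothetical illustration, not the component's actual API: a `BlockingQueue` stands in for the JMS queue, the `drain` method stands in for the JMS handler, and the class names `AccountExportRequest`, `Status` and the `export`/`getStatus`/`exportAccount` methods are assumptions. Statuses are kept in an in-memory map only to keep the sketch self-contained; a real deployment would persist them so the front end can poll them across processes.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the export fan-out; a BlockingQueue simulates the JMS queue.
class ExportWorkflowSketch {

    enum Status { PENDING, IN_PROGRESS, COMPLETED, FAILED }

    // One message per account, as it would be placed on the (simulated) JMS queue.
    static final class AccountExportRequest {
        final String jobId;
        final String accountId;

        AccountExportRequest(String jobId, String accountId) {
            this.jobId = jobId;
            this.accountId = accountId;
        }
    }

    // Stand-in for MailExportingService: fans out requests and reports statuses.
    static final class MailExportingService {
        private final BlockingQueue<AccountExportRequest> queue;
        // Status per (job, account); in-memory only for this sketch.
        final Map<String, Status> statuses = new ConcurrentHashMap<>();

        MailExportingService(BlockingQueue<AccountExportRequest> queue) {
            this.queue = queue;
        }

        // Creates a job and enqueues a separate request for each selected account.
        String export(List<String> accountIds) {
            String jobId = UUID.randomUUID().toString();
            for (String accountId : accountIds) {
                statuses.put(key(jobId, accountId), Status.PENDING);
                queue.add(new AccountExportRequest(jobId, accountId));
            }
            return jobId;
        }

        // Lets the front end poll the status of one account's export.
        Status getStatus(String jobId, String accountId) {
            return statuses.get(key(jobId, accountId));
        }

        static String key(String jobId, String accountId) {
            return jobId + "/" + accountId;
        }
    }

    // Stand-in for MailExportingWorkerService: performs one account's export.
    static final class MailExportingWorkerService {
        private final MailExportingService service;

        MailExportingWorkerService(MailExportingService service) {
            this.service = service;
        }

        void exportAccount(AccountExportRequest request) {
            String k = MailExportingService.key(request.jobId, request.accountId);
            service.statuses.put(k, Status.IN_PROGRESS);
            try {
                // A real worker would fetch the messages, write MBOX and upload a segment here.
                service.statuses.put(k, Status.COMPLETED);
            } catch (RuntimeException e) {
                service.statuses.put(k, Status.FAILED);
            }
        }
    }

    // Stand-in for the JMS handler: drains the queue and delegates each request.
    static int drain(BlockingQueue<AccountExportRequest> queue, MailExportingWorkerService worker) {
        int handled = 0;
        AccountExportRequest request;
        while ((request = queue.poll()) != null) {
            worker.exportAccount(request);
            handled++;
        }
        return handled;
    }
}
```

Sending one message per account means a slow or failing account does not hold up the others, and more worker instances could consume the same queue if throughput needs to grow.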

MBOX Format: This is the format of the export result file. It simply appends the messages, each in Internet Message Format (RFC 5322), with a separator line before each one. In the simplest form, which is used in this design, the separator consists of

    From[space]-[space][sent date of message][new line]
    [empty line]

This simple format was tested with Thunderbird.
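Appending messages in this format could look roughly like the sketch below. It rests on several assumptions not stated in the design: the sent date is rendered in the asctime style that Thunderbird writes (for example `Tue Jan  1 09:05:00 2013`); the empty line is emitted after each message's content so that the `From - ...` line is immediately followed by the message headers, as mbox readers conventionally expect; body lines that already start with `From ` (optionally preceded by `>` characters) get an extra `>` in the mboxrd style, so they are not mistaken for separators; and CRLF line endings are normalized to LF. The class `MboxWriter` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.Writer;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch: appends RFC 5322 messages to an MBOX stream.
final class MboxWriter {
    // asctime-style date; "ppd" pads the day of month to two characters with a space.
    private static final DateTimeFormatter FROM_DATE =
            DateTimeFormatter.ofPattern("EEE MMM ppd HH:mm:ss yyyy", Locale.US);

    private final Writer out;

    MboxWriter(Writer out) {
        this.out = out;
    }

    // Writes the separator line, the escaped message lines, then an empty line.
    void append(String message, ZonedDateTime sentDate) throws IOException {
        out.write("From - " + FROM_DATE.format(sentDate) + "\n");
        // split() discards trailing empty strings, so trailing blank lines are not duplicated.
        for (String line : message.split("\r?\n")) {
            out.write(escapeFromLine(line));
            out.write("\n");
        }
        // Empty line that ends this message before the next separator.
        out.write("\n");
    }

    // mboxrd-style escaping: a line matching ">*From " gets one more leading '>'.
    static String escapeFromLine(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == '>') {
            i++;
        }
        return line.startsWith("From ", i) ? ">" + line : line;
    }
}
```

Because each call only appends to the `Writer`, one export run can stream its messages straight into the object being uploaded instead of holding the whole file in memory.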

Cloud Storage: The export result file is expected to be large (> 25 GB); it is stored in the cloud storage only temporarily.

The cloud storage used (OpenStack Swift) does not support appending directly to existing objects. To support exporting in multiple runs, this design therefore uses a workaround: it enforces multipart upload (also known as large object creation). Each run is saved as a separate object in the cloud, and a manifest object links them together. A GET request on the manifest object downloads all the linked objects, ordered by name, as if they were one large file. Access to the cloud storage is delegated to the CloudStorageService, which is out of scope.
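The naming scheme behind this workaround can be sketched as below, assuming Swift's Dynamic Large Object mechanism: each run is uploaded as an ordinary segment object whose name shares a common prefix, and the manifest is a zero-byte object uploaded with an `X-Object-Manifest: <container>/<prefix>` header. The container name, the `segments/<exportName>/` prefix, the eight-digit run numbers and the class `ExportSegmentLayout` are all hypothetical choices. Since uploads go through the out-of-scope CloudStorageService, the sketch only computes names and simulates the manifest GET in memory.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the per-run segment layout behind a Swift manifest object.
final class ExportSegmentLayout {
    private final String container;
    private final String exportName;

    ExportSegmentLayout(String container, String exportName) {
        this.container = container;
        this.exportName = exportName;
    }

    // Common prefix shared by every segment of this export.
    String segmentPrefix() {
        return "segments/" + exportName + "/";
    }

    // Zero-padded run number so that name order equals run order.
    String segmentName(int run) {
        return segmentPrefix() + String.format(Locale.ROOT, "%08d", run);
    }

    // Value for the X-Object-Manifest header of the zero-byte manifest object.
    String manifestHeaderValue() {
        return container + "/" + segmentPrefix();
    }

    // Simulates a GET on the manifest: concatenates matching segments in name order.
    static String simulateManifestGet(Map<String, String> objects, String prefix) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(objects).entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                body.append(e.getValue());
            }
        }
        return body.toString();
    }
}
```

The padding matters: with unpadded names, run 10 (`.../10`) would sort before run 2 (`.../2`), and the downloaded file would contain the runs out of order.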



Final Submission Guidelines

N/A

ELIGIBLE EVENTS:

2013 TopCoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

ID: 30031762