Key Information

Register
Submit
The challenge is finished.

Challenge Overview

In this assembly, we want to implement a java tool to extract user related data from topcoder_dw and tcs_dw (i.e. TopCoder Data Warehouse and TopCoder Software Data Warehouse) databases into a JSON file to be loaded to the Google Big Query (https://developers.google.com/bigquery/preparing-data-for-bigquery). Unlike the relational database which uses normalized data schemas, in BigQuery, we typically want to denormalize the data structure in order to enable super-fast querying. While JOINs on small datasets are possible with BigQuery, they're not as performant as a denormalized structure.

For user data, we have defined the data we needed to extract to put into BigQuery (See BigQuery_User.xlsx in documentation section) . You should define the BigQuery table schema for user data, a single table is enough. And choose the appopriate big query data type for each column. The schema should be defined in a JSON file which Big Query accepts (refer to https://developers.google.com/bigquery/docs/reference/v2/tables#schema).

Detailed Requirements

  • The extracted json file should be valid and have all the needed characters escaped. The value read from the database is not json value safe.
  • The json data file extracted should be BIG QUERY ready - i.e. It can be directly loaded into the big query table defined by the schema you created.
  • The tables containning the user data are in topcoder_dw, the tables containning the user competition and user rating data are in tcs_dw. The java class will use JDBC to run SQL to select all the needed data out and insert into the JSON file. Please note that the records of user is around 1 million, so make sure you take the SQL and writing to JSON performance into serious consideration.
  • The java class will be called by other java codes to extract the data.
    • It should take a path to specify the store location of the generated JSON data as input.
    • It should take the name of the generated JSON fille without the extension (.json or .gz)
    • It should take a flag to indicate whether the tool should gzip the generated JSON file. If true, the output should be a gzipped like json_data.gz
  • The database connection properties like JDBC connection string, database username, password should be put into a property configuration file.
  • You need to write a small demo to call the tool to generate the data.
  • An ant script should be written to compile and run the demo and tool.

Testing

  • The dataware house databases on the TopCoder VM assigned to you will not have any data, so you need to choose either of the following to setup test data
    • Write SQL to insert test data directly into the data warehouse.
    • Insert test data generated by test data generator and run the tcs loaders to load the data into the warehouse. You need also need to write some manual SQL to insert some data if the loaded data is not comprehensive.

VM Environment

You need a TopCoder VM to work on this challenge, you will mainly use the informix database on VM and if you choose to run data loader to load the warehouse data, you will need to run the loaders under account 'tc'. Please request the VM in the challenge forum.

Information about VM can be found below:

VM specific information is found here: http://www.topcoder.com/wiki/display/docs/VM+Image+2.5. Upon registration as a submitter or reviewer you will need to request a VM based on the new TopCoder Cockpit/Direct image. To request your image, please use the forum. Before requesting your VM, you need to ensure that you have an SSH key created and in your member profile. Instructions to do so are here: http://www.topcoder.com/wiki/display/projects/Generate+SSH+Key, and instructions to connect afterwards are here: http://www.topcoder.com/wiki/display/projects/Connect+Using+SSH+Key.



Final Submission Guidelines

  • The whole package of source codes, libraries, configuration files, test data and extracted json data.
  • The defined big query table schema for user in json format.
  • A detailed deployment guide with verification details.

ELIGIBLE EVENTS:

2015 topcoder Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

SHARE:

ID: 30044775