The best way to perform an in-depth analysis of GitHub data with RapidMiner is to load the GitHub data into a database or cloud data warehouse, and then connect RapidMiner to that database and analyze the data there. Skyvia can easily load GitHub data (including entities) to a database or a cloud data warehouse of your choice.
The ELT process simply copies cloud data to a data warehouse or a database as-is, leaving all the transformation tasks to the database server. This approach is often used, for example, when loading data to cloud data warehouses that offer affordable and nearly unlimited computing power for transformations. In Skyvia, this task is solved with easy-to-configure Replication packages.
The ETL process assumes that the data structure in the source and the target is different, so data must be transformed before it is loaded into the target database. For example, you may want to create a schema for OLAP, or you may simply have target tables for the data already created. In Skyvia, this task is solved with Import packages, which have powerful mapping and transformation capabilities.
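The difference between the two approaches described above can be sketched in a few lines of Python. This is only an illustration, not how Skyvia works internally: it uses an in-memory SQLite database as a stand-in for the warehouse, and the sample rows, table names, and function names (`SOURCE_ISSUES`, `raw_issues`, `issue_facts`, `load_elt`, `load_etl`) are all hypothetical.

```python
import sqlite3

# Hypothetical sample rows standing in for GitHub issue records.
SOURCE_ISSUES = [
    {"id": 1, "title": "Fix login bug", "state": "open",
     "created_at": "2024-01-05T10:00:00Z"},
    {"id": 2, "title": "Update docs", "state": "closed",
     "created_at": "2024-01-07T12:30:00Z"},
]


def load_elt(conn, rows):
    """ELT: copy rows as-is, then transform inside the database with SQL."""
    conn.execute(
        "CREATE TABLE raw_issues "
        "(id INTEGER PRIMARY KEY, title TEXT, state TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_issues VALUES (:id, :title, :state, :created_at)", rows
    )
    # The transformation runs after loading, on the database side.
    conn.execute(
        "CREATE VIEW issue_facts AS "
        "SELECT id, title, state = 'open' AS is_open, "
        "substr(created_at, 1, 10) AS created_date FROM raw_issues"
    )


def load_etl(conn, rows):
    """ETL: transform rows first, then load them into a prepared target table."""
    conn.execute(
        "CREATE TABLE issue_facts "
        "(id INTEGER PRIMARY KEY, title TEXT, is_open INTEGER, created_date TEXT)"
    )
    # The transformation runs before loading, outside the database.
    transformed = [
        (r["id"], r["title"], int(r["state"] == "open"), r["created_at"][:10])
        for r in rows
    ]
    conn.executemany("INSERT INTO issue_facts VALUES (?, ?, ?, ?)", transformed)


def fetch_facts(conn):
    """Read the analysis-ready rows, whichever way they were produced."""
    return conn.execute(
        "SELECT id, title, is_open, created_date FROM issue_facts ORDER BY id"
    ).fetchall()
```

Both functions end with the same analysis-ready `issue_facts` data; they differ only in where the transformation step runs.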
All you need to do is specify the parameters for connecting to GitHub and to the data warehouse, and select which data to replicate.
Skyvia’s Replication Tool will painlessly ensure you always have the most current data from your cloud applications in your data warehouse.
You don’t need to prepare the database: Skyvia automatically creates tables corresponding to the source objects in the data warehouse.
Skyvia offers powerful mapping features for data transformations. You can perform data splitting, use complex expressions and formulas, lookups, etc.
You can import only new and updated records, and thus keep your analysis database up to date.
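As a rough illustration of what importing only new and updated records means, here is a minimal Python sketch. It is not Skyvia's implementation. It assumes each source record carries an `updated_at` timestamp in a uniform ISO 8601 format (so that string comparison matches chronological order), and the `issues` table and `incremental_import` function are hypothetical names.

```python
import sqlite3


def incremental_import(conn, source_rows, last_sync):
    """Insert or update only rows changed after last_sync.

    Returns the new watermark to pass as last_sync on the next run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issues "
        "(id INTEGER PRIMARY KEY, title TEXT, updated_at TEXT)"
    )
    # Keep only records modified since the previous run.
    changed = [r for r in source_rows if r["updated_at"] > last_sync]
    # INSERT OR REPLACE adds new rows and overwrites rows with the same id.
    conn.executemany(
        "INSERT OR REPLACE INTO issues (id, title, updated_at) "
        "VALUES (:id, :title, :updated_at)",
        changed,
    )
    conn.commit()
    # If nothing changed, the watermark stays where it was.
    return max((r["updated_at"] for r in changed), default=last_sync)
```

Calling the function with an empty string as `last_sync` performs a full initial load; later calls with the returned watermark touch only the rows that changed in between.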
With Skyvia, all the relations between the imported GitHub objects will be preserved. You just need to specify them in the mapping.
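To make the idea of preserving relations through mapping more concrete, the following Python sketch shows one common way to do it: a lookup that translates the source identifier of a parent record into the key generated for it in the target. This is an assumption-laden illustration rather than Skyvia's mapping engine; the `repositories` and `issues` tables, their columns, and the `repository_id` field on the source rows are hypothetical.

```python
import sqlite3


def import_with_relations(conn, repos, issues):
    """Load parent rows, then child rows whose foreign keys are resolved by lookup."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE repositories ("
        "repo_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "github_id INTEGER UNIQUE, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE issues ("
        "issue_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "github_id INTEGER UNIQUE, title TEXT, "
        "repo_id INTEGER NOT NULL REFERENCES repositories(repo_id))"
    )
    # Parents first, so that their target keys exist before children arrive.
    conn.executemany(
        "INSERT INTO repositories (github_id, name) VALUES (:id, :name)", repos
    )
    # Lookup: source identifier -> key generated in the target database.
    lookup = dict(conn.execute("SELECT github_id, repo_id FROM repositories"))
    conn.executemany(
        "INSERT INTO issues (github_id, title, repo_id) VALUES (?, ?, ?)",
        [(i["id"], i["title"], lookup[i["repository_id"]]) for i in issues],
    )
    conn.commit()
```

After the import, a join between the two target tables returns each issue with the repository it belonged to in the source, even though the target uses its own generated keys.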