Best Practices for Interacting with Veson Data Lake
Overview
The Veson Data Lake is a cloud-native, highly scalable solution that allows customers to interact with and download large amounts of data in a very efficient manner.
This article describes some best practices that customers clients may choose to adopt when interacting with the Veson IMOS Platform (VIP) Data Lake. There are many ways to ingest and process data for reporting purposes; here we seek to lay this article lays out one option that is widely utilized.
Table of Content Zone |
---|
Technologies |
...
& PatternsThis article assumes familiarity with the following technologies and patterns:
|
...
Architecture Diagram
...
Downloading DataData can be downloaded from the data lake on a regular basis using either a scripted call to the AWS CLI |
...
or by building a lightweight application that leverages the AWS SDK. Veson Nautical recommends running the download process at the same cadence as the data lake refresh interval. By default, the refresh interval is set to once per day. However, this can be adjusted to run as often as once per hour. |
...
The result should be a folder that contains an uncompressed copy of the latest flat files, ready for ingestion. Veson Nautical recommends logging the output of the download process. It is also worthwhile setting up automated alerts for the scenario where downloads are interrupted or disk space problems occur. Ingesting Data via ETLPopulate the Staging TablesA recommended pattern for ingesting flat files into a |
...
data warehouse is to use Microsoft SSIS (SQL Server Integration Services) in conjunction with a staging schema and the "wipe and load" technique. |
...
data warehouse consumers should not be permitted to access the staging tables directly. Permissions can be applied at the schema level to implement this restriction. Create an SSIS package that:
Update the Production Data Warehouse TablesWith the latest data now sitting in the staging tables, the penultimate step is to update the production tables in the data warehouse. There are several ways to do this, however, the goal here is to ensure that any reports being executed are not interrupted. Wipe and load cannot be used on production tables |
...
because that would lead to an inconsistent experience for people running reports. Create an SSIS package that:
|
...
|
...
Other Tips
|
...
Local Clean UpAfter the flat files have been successfully ingested, be sure to programmatically clean up the download folder structure on your local file system. Veson Nautical recommends implementing a sliding window delete mechanism |
...
so that you have n days' worth of data on your local system for diagnostics purposes |
...
Feedback
...
. |