Olive Data Ingestion Framework (ODIF) is a cloud-agnostic tool for data ingestion and transfer. It connects to a variety of data sources and destinations and moves data between them. ODIF does not require a pre-installed cluster and can be deployed with minimal resource usage. Its web interface covers data source registration, job configuration, execution, and monitoring.
Key Features and Functionality:
- Reusable Connectors: Once created, a connector can serve as either a source or a sink.
- RDBMS Source Support: Enables selection of multiple databases and tables, with options to retrieve the full dataset or a specific subset using WHERE clauses.
- Split Job Mechanism: Automatically divides large datasets into smaller jobs to accelerate ingestion.
- Multiple File Format Support: Writes CSV, TXT, Parquet, and JSON files at the destination.
- Load Types: Supports both incremental loads for regular ingestion and full loads for historical or one-time data transfers.
- User Interface and API Access: Provides both a web interface and REST APIs for controlling jobs.
- Job Scheduling: Allows scheduling of jobs to run at specified intervals.
- Livy Integration: Supports Apache Livy on static clusters.
- Cluster Flexibility: Operates on both static and on-demand clusters across AWS, Azure, and GCP platforms.
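To illustrate how the WHERE-clause subsets and the split job mechanism listed above could fit together, here is a minimal Python sketch. It is not ODIF's actual implementation, which is not documented here. It assumes the table has a numeric key column and that each split job receives one contiguous key range; the function name `split_where_clauses`, its parameters, and the `orders` table are hypothetical.

```python
from typing import List, Optional


def split_where_clauses(
    key_column: str,
    min_key: int,
    max_key: int,
    num_jobs: int,
    base_filter: Optional[str] = None,
) -> List[str]:
    """Divide the inclusive key range [min_key, max_key] into at most
    num_jobs contiguous, non-overlapping WHERE clauses.

    Hypothetical helper for illustration only; not part of ODIF.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")

    total = max_key - min_key + 1
    # Ceiling division so that every key lands in exactly one range.
    step = -(-total // num_jobs)

    clauses = []
    lower = min_key
    while lower <= max_key:
        upper = min(lower + step - 1, max_key)
        clause = f"{key_column} BETWEEN {lower} AND {upper}"
        if base_filter:
            # Keep the user's subset filter and narrow it to this range.
            clause = f"({base_filter}) AND {clause}"
        clauses.append(clause)
        lower = upper + 1
    return clauses


if __name__ == "__main__":
    # Each printed query would be handed to one smaller ingestion job.
    for where in split_where_clauses("order_id", 1, 1000, 4, "status = 'SHIPPED'"):
        print(f"SELECT * FROM orders WHERE {where}")
```

Because the ranges do not overlap and together cover the whole key range, the smaller jobs can run independently without reading any row twice. A real system would also have to handle skewed key distributions and non-numeric keys, which this sketch ignores.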
Primary Value and User Solutions:
ODIF addresses the complexity of data ingestion with a cloud-native, platform-agnostic way to connect diverse data sources and destinations. Because it can run on on-demand compute and is API-driven, data can be transferred without extensive infrastructure setup. By automating tasks such as job splitting and scheduling, ODIF reduces manual intervention and speeds up ingestion for organizations handling large-scale data operations.