{"id":1632,"date":"2026-03-09T19:51:40","date_gmt":"2026-03-09T19:51:40","guid":{"rendered":"https:\/\/adrianotanaka.com.br\/?p=1632"},"modified":"2026-03-09T20:01:28","modified_gmt":"2026-03-09T20:01:28","slug":"architecting-the-real-time-lakehouse-from-goldengate-ingestion-to-oracle-aidp","status":"publish","type":"post","link":"https:\/\/adrianotanaka.com.br\/index.php\/2026\/03\/09\/architecting-the-real-time-lakehouse-from-goldengate-ingestion-to-oracle-aidp\/","title":{"rendered":"Architecting the Real-Time Lakehouse: From\u00a0GoldenGate\u00a0Ingestion to Oracle AIDP"},"content":{"rendered":"\n<p><strong>Architecting the Real-Time Lakehouse: From&nbsp;GoldenGate&nbsp;Ingestion to Oracle AIDP<\/strong><\/p>\n\n\n\n<p>Oracle AI Data Platform, or simply AIDP, is a platform where companies can build their lakehouses using bleeding-edge technologies such as AI agents, open table formats, and Spark as the engine of the processing layer.<\/p>\n\n\n\n<p>Adding Oracle GoldenGate to this topology is the right way to bring near-real-time data movement capabilities. In this article I will talk about both technologies, how Oracle GoldenGate interacts with AIDP, what the requirements are, and so on.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>AIDP Architecture<\/strong><\/h2>\n\n\n\n<p>Before diving into the replication process, it\u2019s important to understand the key components of the AIDP architecture:<\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>Catalog<\/strong><\/h5>\n\n\n\n<p>The AIDP Catalog is a resource that centralizes the metadata of your datasets. To draw a parallel with Oracle Database, it is like the Data Dictionary. It allows you to set up data governance and other features, such as mapping and managing structured and unstructured data (like files in Object Storage).<\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>Schema<\/strong><\/h5>\n\n\n\n<p>The schema holds the logical structure of your datasets. Just like a schema in Oracle 
Database, you can have tables and views.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>Table<\/strong><\/h5>\n\n\n\n<p>Here we can see one of the most important AIDP features: support for the Delta table format. With Delta tables we get ACID guarantees on top of \u201cflat files\u201d in cloud storage.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>Compute<\/strong><\/h5>\n\n\n\n<p>The Compute resource in AIDP is responsible for delivering the Spark environment. You can use CPU or GPU machines, divided into Driver and Worker nodes (you can even have multiple workers if your workload needs them).<\/p>\n\n\n\n<p>AIDP has more components, but for GoldenGate we will use the ones above:<\/p>\n\n\n\n<p>The replicat process connects to a JDBC endpoint (which runs inside the compute cluster) and uses the three-part namespace (Catalog.Schema.Table) to identify the target object and execute a stage-and-merge procedure.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>Creating the required resources in AIDP:<\/strong><\/h5>\n\n\n\n<p>First, we will create a Catalog:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"257\" height=\"213\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image.png\" alt=\"\" class=\"wp-image-1635\"\/><\/figure>\n\n\n\n<p>It can be of the Standard Catalog type.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"315\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-1.png\" alt=\"\" class=\"wp-image-1636\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-1.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-1-300x167.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Now let&#8217;s create our schema. Here I will be using the medallion architecture (Bronze, Silver and Gold) 
and my schema will be \u201cbronze\u201d:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"406\" height=\"147\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-2.png\" alt=\"\" class=\"wp-image-1634\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-2.png 406w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-2-300x109.png 300w\" sizes=\"auto, (max-width: 406px) 100vw, 406px\" \/><\/figure>\n\n\n\n<p>The workspace is used to organize your resources, mainly Python notebooks, but it can also be used to build workflows. Most importantly, your workspace contains the compute cluster that provides the compute power to run your workloads.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"255\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-5.png\" alt=\"\" class=\"wp-image-1639\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-5.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-5-300x135.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"94\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-3.png\" alt=\"\" class=\"wp-image-1638\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-3.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-3-300x50.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<p>When deploying your Compute cluster you can define its size and the Python version. If you are used to working with Spark, the similarity is easy to spot: here you have the Driver and Worker nodes.<\/p>\n\n\n\n<p>After the cluster 
creation, go to the Connection Details tab. There you will see the JDBC URL, which we need to set up the GoldenGate connection.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"271\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-4.png\" alt=\"\" class=\"wp-image-1637\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-4.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-4-300x143.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>Data Delivery<\/strong><\/h5>\n\n\n\n<p>Oracle GoldenGate offers multiple options for data delivery. Here we will work with the Stage and Merge replicat, where the data is staged in an OCI Object Storage bucket and then merged into Delta tables inside AIDP. Alternatively, you can send the data to a bucket as Avro\/Parquet files and ingest it into an AIDP table using PySpark.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>Connections<\/strong><\/h5>\n\n\n\n<p>I\u2019m using OCI GoldenGate, but it should be similar for the on-premises version. You need to use the JDBC endpoint together with the tenancy OCID, user OCID, and region, just like a normal OCI authentication. If you have doubts, you can always refer to the official documentation: <a href=\"https:\/\/docs.oracle.com\/en\/middleware\/goldengate\/big-data\/23\/gadbd\/qs-realtime-data-ingestion-oracle-ai-data-platform-oracle-goldengate-daa.html#GUID-52A45276-7087-432E-ADC3-76A48D2E4A9E\">https:\/\/docs.oracle.com\/en\/middleware\/goldengate\/big-data\/23\/gadbd\/qs-realtime-data-ingestion-oracle-ai-data-platform-oracle-goldengate-daa.html#GUID-52A45276-7087-432E-ADC3-76A48D2E4A9E<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"434\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-6.png\" alt=\"\" 
class=\"wp-image-1640\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-6.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-6-300x230.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<p>Besides the AIDP connection, you will need an OCI Object Storage connection.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Extract<\/strong><\/h2>\n\n\n\n<p>For the extract, I suggest these parameters:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>DDL INCLUDE MAPPED -- Capture DDL for mapped objects<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>LOGALLSUPCOLS -- Capture all supplementally logged columns; we will need this on the target side<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Replicat<\/strong><\/h2>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>Stage and Merge<\/strong><\/h5>\n\n\n\n<p>Using the default replicat, where we write directly into AIDP tables:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"446\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-8.png\" alt=\"\" class=\"wp-image-1642\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-8.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-8-300x236.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<p>In your properties file, define the temporary bucket and the compartment with these parameters:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gg.eventhandler.oci.compartmentID=ocid1.compartment.oc1..aaaXXXX<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>gg.eventhandler.oci.bucketMappingTemplate=bucket-tmp<\/code><\/pre>\n\n\n\n<p>In the parameter file we must configure the Catalog.Schema mapping, something like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>MAP *.*, TARGET 
catalog_tanaka.bronze.*;<\/code><\/pre>\n\n\n\n<p>In this case, the temporary data files will be generated in the bucket-tmp bucket and the final data will be written to the bronze schema of the catalog_tanaka catalog, using the JDBC endpoint specified in the connection.<\/p>\n\n\n\n<p>If everything is configured correctly, we can track the steps in the Spark UI (Cluster &gt; Spark UI option &gt; SQL \/ DataFrame tab):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"298\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-11.png\" alt=\"\" class=\"wp-image-1644\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-11.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-11-300x158.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<p>As you can see, the replicat first checks whether the table exists (SHOW TABLES command) and then starts generating the temporary files in the bucket:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"406\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-10.png\" alt=\"\" class=\"wp-image-1645\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-10.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-10-300x215.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<p>The table is then created; note the USING DELTA clause:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"295\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-7.png\" alt=\"\" class=\"wp-image-1641\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-7.png 567w, 
https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-7-300x156.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<p>And finally, the data will be merged:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"253\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-9.png\" alt=\"\" class=\"wp-image-1643\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-9.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-9-300x134.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<p>And we can use a Python Notebook to query it:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"221\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-12.png\" alt=\"\" class=\"wp-image-1646\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-12.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-12-300x117.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<p>Because I\u2019m using a wildcard in the MAP parameter, I just need to run a DML statement on my source DB and the table will be replicated:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"566\" height=\"417\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-14.png\" alt=\"\" class=\"wp-image-1648\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-14.png 566w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-14-300x221.png 300w\" sizes=\"auto, (max-width: 566px) 100vw, 566px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"92\" 
src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-13.png\" alt=\"\" class=\"wp-image-1647\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-13.png 567w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-13-300x49.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"560\" height=\"327\" src=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-15.png\" alt=\"\" class=\"wp-image-1649\" srcset=\"https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-15.png 560w, https:\/\/adrianotanaka.com.br\/wp-content\/uploads\/2026\/02\/image-15-300x175.png 300w\" sizes=\"auto, (max-width: 560px) 100vw, 560px\" \/><\/figure>\n\n\n\n<p>Did you notice the _stage_ tables? They are used to merge the data from the Object Storage files.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>DDL Replication<\/strong><\/h5>\n\n\n\n<p>As of today, there is no DDL replication available for AIDP targets, so you must choose what action to take when a DDL operation happens at the source.<\/p>\n\n\n\n<p>Use the DDLOPTIONS REPORT parameter in your replicat parameter file to print the DDL, and use EVENTACTIONS to stop (STOP) or abend (ABORT) the process.<\/p>\n\n\n\n<p>With this combination of parameters:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>DDL INCLUDE ALL EVENTACTIONS (REPORT, ABORT) EXCLUDE OPTYPE CREATE<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>DDLOPTIONS REPORT<\/code><\/pre>\n\n\n\n<p>You will have this in your discard file:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Operation discarded due to ABORT event from DDL on object name tanaka.MINHA_TAB_PERF3 in file \/u02\/Deployment\/var\/lib\/data\/aidp\/aa000000001, RBA 6789660<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>SQL Operation &#91;alter table TNK_SOURCE.MINHA_TAB_PERF3 
add (cpf2 varchar2(10)) (size 62)]<\/code><\/pre>\n\n\n\n<p>So, it should be easy to fix that issue.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>The integration allows organizations to leverage GoldenGate&#8217;s proven replication power to feed a high-performance Spark\/LakeHouse environment, enabling real-time analytics and AI-ready data layers within the Oracle ecosystem.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Architecting the Real-Time Lakehouse: From&nbsp;GoldenGate&nbsp;Ingestion to Oracle AIDP Oracle AI Data Platform or simply AIDP is a platform where companies could build their Lakehouse\u2019s using bleeding-edge technologies like AI Agents, Open Table formats and Spark as the engine of their processing layer.&nbsp; Adding Oracle GoldenGate to this topology is the right way to bring near [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1661,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"material-hide-sections":[],"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1632","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/posts\/1632","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/comments?post=1632"}],"version-history":[{"count":11,"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/posts\/16
32\/revisions"}],"predecessor-version":[{"id":1662,"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/posts\/1632\/revisions\/1662"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/media\/1661"}],"wp:attachment":[{"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/media?parent=1632"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/categories?post=1632"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/adrianotanaka.com.br\/index.php\/wp-json\/wp\/v2\/tags?post=1632"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}