Sunday, 9 October 2016

RAC Database Details

Oracle Real Application Clusters (RAC) is a shared-everything database architecture where multiple instances (on different servers) access a single set of database files. This provides High Availability (HA) and Horizontal Scalability that a single-instance database cannot offer.

1. Core Architecture Components

The power of RAC lies in its ability to present multiple servers as a single database to the end-user.

| Component | Function |
| --- | --- |
| Grid Infrastructure | The foundation layer that includes Oracle Clusterware and ASM (Automatic Storage Management). It manages the "membership" of nodes. |
| Interconnect | A dedicated, high-speed private network used for Cache Fusion. It allows nodes to ship data blocks to each other without hitting the disk. |
| Shared Storage | A SAN or NAS where the actual data files, redo logs, and control files reside, accessible by all nodes simultaneously. |
| SCAN (Single Client Access Name) | A virtual name that provides a single point of entry for clients, regardless of how many nodes are in the cluster. |
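
To make the SCAN idea concrete, here is a minimal Python sketch that builds the two common forms of a client connect string. The host name `rac-scan.example.com` and the service name `sales_svc` are hypothetical placeholders, and the helper functions are illustrative rather than part of any Oracle library. The point is that the client addresses the SCAN and a service, never an individual node.

```python
def scan_connect_descriptor(scan_host: str, service_name: str, port: int = 1521) -> str:
    """Return an Oracle Net connect descriptor that addresses the SCAN, not a node."""
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={scan_host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


def easy_connect_string(scan_host: str, service_name: str, port: int = 1521) -> str:
    """Return the equivalent Easy Connect form: host:port/service_name."""
    return f"{scan_host}:{port}/{service_name}"


if __name__ == "__main__":
    # Hypothetical SCAN and service names, used only for illustration.
    print(scan_connect_descriptor("rac-scan.example.com", "sales_svc"))
    print(easy_connect_string("rac-scan.example.com", "sales_svc"))
```

Because the string names only the SCAN, adding or removing a node should not require any change on the client side.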

2. Global Resource Management

To prevent data corruption when multiple servers try to update the same record, RAC uses two primary background services:

  • Global Cache Service (GCS): Tracks the location and status of data blocks in the various instance caches.

  • Global Enqueue Service (GES): Manages "enqueues" (locks) to ensure transaction consistency across the cluster.

These services together maintain the Global Resource Directory (GRD), a "map" of which node holds which piece of data in its memory.
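
The bookkeeping described above can be pictured with a toy model. The sketch below is not how Oracle implements the GRD; the class `ToyGlobalResourceDirectory` and its fields are invented for illustration, and it tracks only one holder per block with no lock modes. It shows the basic decision: serve a block from the local cache, ship it over the interconnect from another instance, or read it from disk.

```python
class ToyGlobalResourceDirectory:
    """Toy stand-in for the GRD: remembers which instance last held each block."""

    def __init__(self):
        self.holder_by_block = {}        # block_id -> instance name
        self.disk_reads = 0
        self.interconnect_transfers = 0

    def request(self, instance: str, block_id: int) -> str:
        """Return where the block comes from and record the new holder."""
        holder = self.holder_by_block.get(block_id)
        if holder is None:
            # No instance has the block cached, so it must come from shared storage.
            self.disk_reads += 1
            source = "disk"
        elif holder == instance:
            # The requester already holds the block; no network or disk I/O needed.
            source = "local cache"
        else:
            # Another instance holds it; ship it across the private interconnect.
            self.interconnect_transfers += 1
            source = f"interconnect from {holder}"
        self.holder_by_block[block_id] = instance
        return source


if __name__ == "__main__":
    grd = ToyGlobalResourceDirectory()
    for inst in ("inst1", "inst1", "inst2"):
        print(inst, "gets block 42 via", grd.request(inst, 42))
```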

3. High Availability & Failover

One of RAC's biggest selling points is its ability to hide failures from the application.

  • Fast Application Notification (FAN): An "interrupt" mechanism where the database tells the application immediately when a node fails, so the app doesn't hang waiting for a TCP timeout.

  • Transparent Application Failover (TAF): A client-side feature that automatically reconnects a user session to a surviving node. If you were running a SELECT query when the node died and TAF is configured with TYPE=SELECT, it can resume the query on the new node. Uncommitted DML is rolled back and must be reissued by the application.

  • Application Continuity (AC): A more modern feature (Oracle Database 12c and later) that can replay in-flight work, including uncommitted DML, after a failure, making the outage almost completely invisible to the user.
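
As an illustration of how TAF is requested on the client side, the sketch below builds a connect descriptor with a `FAILOVER_MODE` clause. The function name, the host `rac-scan.example.com`, the service `sales_svc`, and the retry and delay values are all assumptions chosen for the example. The sketch accepts only the classic `SESSION`, `SELECT`, and `NONE` types; check the Net Services documentation for your release before relying on any of these settings.

```python
ALLOWED_TYPES = {"SESSION", "SELECT", "NONE"}
ALLOWED_METHODS = {"BASIC", "PRECONNECT"}


def taf_connect_descriptor(scan_host, service_name, port=1521,
                           failover_type="SELECT", method="BASIC",
                           retries=20, delay=3):
    """Build a connect descriptor that asks the client to use TAF."""
    if failover_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported failover type: {failover_type}")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported failover method: {method}")
    # RETRIES and DELAY control how often and how long the client retries.
    failover_mode = (
        f"(FAILOVER_MODE=(TYPE={failover_type})(METHOD={method})"
        f"(RETRIES={retries})(DELAY={delay}))"
    )
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={scan_host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name}){failover_mode}))"
    )


if __name__ == "__main__":
    # Hypothetical host and service names.
    print(taf_connect_descriptor("rac-scan.example.com", "sales_svc"))
```

With `TYPE=SESSION` only the session is re-established, while `TYPE=SELECT` also lets open queries continue fetching on the surviving node.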

4. Key Benefits vs. Challenges

While RAC offers massive scale, it comes with increased complexity.

Benefits

  • Scalability: Add a server to the cluster to increase CPU/RAM capacity without taking the database offline.

  • Rolling Upgrades: Apply OS updates and patches that support rolling installation to one node at a time while the others keep the business running.

  • Load Balancing: Automatically routes new connections to the least-busy instance in the cluster.
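
The load-balancing idea can be sketched as a simple "least sessions first" policy. This is a deliberate simplification: the function `route_new_sessions` and the use of raw session counts as the load metric are assumptions for illustration, whereas a real cluster bases the decision on richer service-level metrics.

```python
def route_new_sessions(session_counts, new_sessions):
    """Place each new session on the instance with the fewest sessions so far.

    Ties are broken alphabetically by instance name so the result is deterministic.
    Returns (placements, final_counts) and leaves the input dict untouched.
    """
    counts = dict(session_counts)
    placements = []
    for _ in range(new_sessions):
        target = min(sorted(counts), key=lambda name: counts[name])
        counts[target] += 1
        placements.append(target)
    return placements, counts


if __name__ == "__main__":
    # Hypothetical instance names and starting session counts.
    placements, final = route_new_sessions({"inst1": 10, "inst2": 7, "inst3": 9}, 4)
    print(placements)
    print(final)
```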

Challenges

  • Cost: RAC is an "Extra Cost Option" on top of the Enterprise Edition license.

  • Complexity: Requires specialized networking (Interconnect) and storage (ASM/OCFS2) skills.

  • Application Design: If an application is not "RAC-aware" (for example, sessions on different nodes contend heavily for the same data blocks), the overhead of moving blocks between nodes can actually make it slower than on a single-instance database.
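
To see why block contention matters, the toy sketch below counts how often a block would have to move between instances for a given access pattern. The function `count_block_transfers` and the instance names are hypothetical, and the model ignores lock modes, read sharing, and timing. It only contrasts a "hot block" workload with one where each instance works on its own blocks.

```python
def count_block_transfers(accesses):
    """Count inter-instance block movements for a sequence of (instance, block_id)."""
    last_holder = {}
    transfers = 0
    for instance, block_id in accesses:
        previous = last_holder.get(block_id)
        # A transfer is needed only when a different instance held the block last.
        if previous is not None and previous != instance:
            transfers += 1
        last_holder[block_id] = instance
    return transfers


if __name__ == "__main__":
    # Two instances alternately updating the same block (contended workload).
    hot_block = [("inst1", 1), ("inst2", 1)] * 3
    # Each instance staying on its own block (partitioned workload).
    partitioned = [("inst1", 1), ("inst2", 2)] * 3
    print("hot block transfers:", count_block_transfers(hot_block))
    print("partitioned transfers:", count_block_transfers(partitioned))
```

In this model the contended pattern moves the block on almost every access, while the partitioned pattern never does. That is the intuition behind routing related work to the same instance.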