Oracle introduced a new feature for Data Guard called Far Sync. This configuration is designed to support synchronous redo transport between primary and standby databases which are far apart. In a usual configuration synchronous redo transport between data centres in Europe and the US is possible but inefficient from the primary database performance perspective. Every commit has to be written locally to the redo logs and sent to the standby with confirmation – a network round trip between Europe and the US adds a significant delay to every commit operation in synchronous mode. If the database had to run in no data loss mode, the DBA had no choice but to run the primary in SYNC mode and suffer the network round trip on every transaction, or to set up both databases in data centres relatively close to each other.
To mitigate that problem Oracle 12c has introduced the Data Guard Far Sync instance. A Far Sync instance is a log transport "proxy" and it has to be located close to the primary database. As Far Sync is an instance only, there is no need to put a whole database there – a Far Sync instance requires space only for standby redo logs and archived logs.
In that configuration the primary sends redo to the Far Sync instance in synchronous mode, so every transaction is written to the primary redo logs and to the Far Sync standby logs – commit time is therefore limited to the time required for both writes plus the network round trip between the primary and the Far Sync instance. In the next stage the Far Sync instance sends the redo stream in asynchronous mode to the standby database located in the other data centre.
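In shorthand (my own simplification – it ignores overlapping writes and processing time on each side):

commit_time(SYNC to remote standby) ≈ local redo write + RTT(primary <-> standby) + standby redo write
commit_time(SYNC to local Far Sync) ≈ local redo write + RTT(primary <-> Far Sync) + Far Sync standby log write

Since the primary-to-Far-Sync round trip is short, the second variant should commit almost as fast as a standalone database while still handing off redo synchronously.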
In my test configuration I set up an OpenVPN connection between a VM running on my home ESX server (located in Europe) and two EC2 instances running in the USA. That configuration allowed me to measure commit time in synchronous mode with and without a Far Sync instance.
Configuration details:
Primary database – TEST_PRM
Far Sync instance – FARSYNC
Standby database – TEST_STD
Primary database configuration – Data Guard related parameters only
DB_UNIQUE_NAME=TEST_PRM
FAL_SERVER=TEST_STD
LOG_ARCHIVE_CONFIG='DG_CONFIG=(TEST_PRM,FARSYNC,TEST_STD)'
LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=TEST_PRM'

Standby configuration:

LOG_ARCHIVE_CONFIG='DG_CONFIG=(TEST_PRM,FARSYNC,TEST_STD)'
LOG_ARCHIVE_DEST_1='location=/u01/app/oracle/oradata/test12c/arch'
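For reference, parameters like these can also be set online from SQL*Plus – a minimal sketch, assuming the instances use an spfile (values copied from the configuration above):

-- on the primary, with an spfile in place
ALTER SYSTEM SET LOG_ARCHIVE_CONFIG='DG_CONFIG=(TEST_PRM,FARSYNC,TEST_STD)' SCOPE=BOTH;
ALTER SYSTEM SET LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=TEST_PRM' SCOPE=BOTH;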
First test case – SYNC mode redo transport between primary and standby database
Primary log transport configuration
log_archive_dest_2 = 'service="TEST_STD", SYNC AFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="TEST_STD" net_timeout=10, valid_for=(online_logfile,all_roles)';
log_archive_config = 'dg_config=(TEST_PRM,TEST_STD,FARSYNC)';

In both cases I ran Swingbench against the primary database and monitored the time histogram for the 'log file sync' event (the primary database was restarted before every test to have identical test conditions and reset the counters).
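Before each run it is also worth confirming that the destination really operates in synchronous mode – a check of my own, not from the original test:

-- TRANSMIT_MODE should be SYNCHRONOUS and AFFIRM should be YES for dest_2
SELECT dest_id, status, transmit_mode, affirm, db_unique_name
FROM v$archive_dest
WHERE dest_id = 2;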
SQL> select * from v$event_histogram where EVENT = 'log file sync';

    EVENT# EVENT          WAIT_TIME_MILLI WAIT_COUNT LAST_UPDATE_TIME                    CON_ID
---------- -------------- --------------- ---------- ----------------------------------- ------
       144 log file sync                1          5 10-JUL-13 08.50.01.857358 PM +01:00      0
       144 log file sync                2         23 10-JUL-13 08.38.51.202819 PM +01:00      0
       144 log file sync                4         11 10-JUL-13 08.40.00.723312 PM +01:00      0
       144 log file sync                8          2 10-JUL-13 08.59.06.068904 PM +01:00      0
       144 log file sync               16          8 10-JUL-13 08.59.22.090580 PM +01:00      0
       144 log file sync               32         18 10-JUL-13 08.59.29.450597 PM +01:00      0
       144 log file sync               64         41 10-JUL-13 08.59.33.983966 PM +01:00      0
       144 log file sync              128         62 10-JUL-13 08.59.39.447536 PM +01:00      0
       144 log file sync              256        326 10-JUL-13 08.59.40.640604 PM +01:00      0
       144 log file sync              512       1423 10-JUL-13 08.59.40.640691 PM +01:00      0
       144 log file sync             1024         37 10-JUL-13 08.59.11.646609 PM +01:00      0
       144 log file sync             2048          1 10-JUL-13 08.53.53.715921 PM +01:00      0
       144 log file sync             4096          1 10-JUL-13 08.56.15.150343 PM +01:00      0

13 rows selected.
As we can see in the histogram, a log file sync time between 256 and 512 ms is the most frequent one, and the commit time for most transactions was in that range.

In addition to the Swingbench traffic I ran one test transaction with a simple row insert and traced it – here is the important line from the trace file:

WAIT #140043716402464: nam='log file sync' ela= 184627 buffer#=419 sync scn=987285
As you can see, 184 ms for a commit is not a great result.
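The wait line above is standard extended SQL trace output; for anyone who wants to reproduce the test, a minimal sketch using the classic 10046 event at level 8 (which includes wait events – the table name here is hypothetical):

-- enable extended SQL trace including wait events
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
INSERT INTO test_tab VALUES (1);  -- hypothetical single-row insert
COMMIT;                           -- this commit produces the 'log file sync' wait
ALTER SESSION SET EVENTS '10046 trace name context off';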
Second test case – SYNC mode redo transport between primary and far sync instance
Primary log transport configuration
LOG_ARCHIVE_DEST_2='service="FARSYNC"','SYNC AFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="FARSYNC" net_timeout=30','valid_for=(online_logfile,all_roles)' SCOPE=BOTH;

The Far Sync instance requires Oracle 12c binaries installed and a special version of the control file, created on the primary database using the following command:
ALTER DATABASE CREATE FAR SYNC INSTANCE CONTROLFILE AS '/tmp/farsync.ctl';
Instance parameters:
CONTROL_FILES='/u01/app/oracle/oradata/test12c/control01.ctl'
DB_UNIQUE_NAME=FARSYNC
FAL_SERVER=TEST_PRM
LOG_FILE_NAME_CONVERT='test','test'
LOG_ARCHIVE_CONFIG='DG_CONFIG=(TEST_PRM,FARSYNC,TEST_STD)'
LOG_ARCHIVE_DEST_1='location=/u01/app/oracle/oradata/test12c/arch'
LOG_ARCHIVE_DEST_2='service="TEST_STD"','ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="TEST_STD" net_timeout=10','valid_for=(standby_logfile,all_roles)' SCOPE=BOTH;

As you can see in the above configuration, LOG_ARCHIVE_DEST_2 is configured as ASYNC and points to the real standby database.
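To complete the setup (my summary of the remaining steps – paths and sizes are examples only): the control file created above is copied to the Far Sync host, the instance is started in mount mode, and standby redo logs are added so it can receive redo from the primary; they should match the size of the primary's online redo logs:

-- on the Far Sync instance
STARTUP MOUNT;
-- standby redo logs receive the synchronous redo stream from the primary
ALTER DATABASE ADD STANDBY LOGFILE ('/u01/app/oracle/oradata/test12c/srl01.log') SIZE 50M;
ALTER DATABASE ADD STANDBY LOGFILE ('/u01/app/oracle/oradata/test12c/srl02.log') SIZE 50M;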
Let's run the test again:
SQL> select * from v$event_histogram where EVENT = 'log file sync';

    EVENT# EVENT          WAIT_TIME_MILLI WAIT_COUNT LAST_UPDATE_TIME                    CON_ID
---------- -------------- --------------- ---------- ----------------------------------- ------
       144 log file sync                1        254 10-JUL-13 09.23.58.069192 PM +01:00      0
       144 log file sync                2        364 10-JUL-13 09.23.58.027215 PM +01:00      0
       144 log file sync                4        338 10-JUL-13 09.23.57.951897 PM +01:00      0
       144 log file sync                8        278 10-JUL-13 09.23.57.540682 PM +01:00      0
       144 log file sync               16         82 10-JUL-13 09.23.55.700236 PM +01:00      0
       144 log file sync               32         29 10-JUL-13 09.23.56.896548 PM +01:00      0
       144 log file sync               64         29 10-JUL-13 09.23.52.709345 PM +01:00      0
       144 log file sync              128         18 10-JUL-13 09.23.40.675756 PM +01:00      0
       144 log file sync              256          8 10-JUL-13 09.23.32.935084 PM +01:00      0
       144 log file sync              512         18 10-JUL-13 09.23.32.949511 PM +01:00      0
       144 log file sync             1024          7 10-JUL-13 09.22.50.183919 PM +01:00      0

11 rows selected.

Now the results are totally different – most transactions have a commit time between 1 and 8 ms, so transactions are committed after writing data to the primary redo logs and the Far Sync standby logs only. From a performance perspective this is much better, and the primary database is still protected in no data loss mode. The similar line from the trace file:
WAIT #140132880983720: nam='log file sync' ela= 1003 buffer#=63 sync scn=1042355
This time the commit was almost 180 times faster and took only about 1 ms.
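Even though the second leg is now asynchronous, the standby should not fall far behind under normal conditions; one way to keep an eye on it (my addition, run on the standby) is:

-- on the standby: how far redo transport and apply are behind the primary
SELECT name, value, unit
FROM v$dataguard_stats
WHERE name IN ('transport lag', 'apply lag');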
The above configuration is the simplest one and doesn't include an alternate destination in case the Far Sync instance goes down. Also, after the standby database becomes the primary there is no Far Sync instance close to the new primary, which has to be addressed as well.
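For the first gap, Oracle's ALTERNATE destination attribute can route redo directly to the standby in ASYNC mode if the Far Sync destination fails – a sketch only, not something I have tested here (the destination numbers are arbitrary):

-- on the primary: fall back from dest_2 (Far Sync) to dest_3 (direct ASYNC to standby)
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='service="FARSYNC" SYNC AFFIRM db_unique_name="FARSYNC" valid_for=(online_logfile,all_roles) alternate=log_archive_dest_3' SCOPE=BOTH;
ALTER SYSTEM SET LOG_ARCHIVE_DEST_3='service="TEST_STD" ASYNC NOAFFIRM db_unique_name="TEST_STD" valid_for=(online_logfile,all_roles)' SCOPE=BOTH;
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_3='ALTERNATE' SCOPE=BOTH;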
I will keep testing and hope to blog the next part soon.
regards,
Marcin
1 comment:
Great test. I just want to set up this scenario and have a try.
Thanks for sharing.
--Ricky