Tuesday, April 17, 2012

Replication between Content Server Instances


There have been several questions recently in the various WebCenter Content or Universal Content Management forums regarding how to replicate content from one instance to another. I will try to explain the process here.

There are several reasons to set up replication between instances, such as:
  1. Moving development files (templates, dynamic pages, etc.) from development to test to production
  2. Moving content from production to test so that real content is used in the testing of new developments
  3. Moving content from a contribution instance in production to a consumption instance in production
Typically, when moving files from development to test and from test to production, the replication is done manually. Since a development environment may have several different projects being worked on at one time, it is not desirable to automatically replicate the files every time one is edited, as this would disrupt testing. The same is true of moving from test to production. To address this, it becomes the responsibility of the development team to identify and track all the files that comprise each project so that when it is time to replicate, the proper files are moved.

In test, you may be testing a new or modified workflow or a change to a Site Studio web site. In this case, it is not always practical or desirable to have the testing team create a lot of test content. The answer is to use production content. So the test plan should identify the content to be replicated from production to test (and development). This would be done manually in most cases to meet the needs of the tests being done.

If you have set up an environment for contribution and consumption, perhaps where the contribution for an internet site is done behind the firewall and the consumption is done from the DMZ, then it would be desirable to set up an automated replication so that content is made available as soon as possible after approval in contribution.

There are also several ways to move content and table data. All involve using the Content Server’s Archiver tool and can be done manually or in an automated fashion. I should also mention, in an effort to be complete, that there are times when the replication of content or table data is not the entire effort. Some developments will also require the use of the Content Migration Utility (CMU) to move configuration changes, metadata, workflow or other changes from one instance to another. The use of CMU is a topic for another day.

The Archiver Applet

The Content Server’s Archiver is a Java applet that is used to transfer and reorganize Content Server files and information. Archiver has four main functions:
  • Export - Used to copy native and web-viewable files out of the Content Server instance for backup, storage, or import to another Content Server instance. You can also export content types and user attributes. You export to an archive, which contains the exported files and their metadata in the form of batch files. In addition, the Archiver can export the contents of any of the tables in the table space.
  • Import - Used to retrieve files and Content Server information from an exported archive. Importing is typically used to get a copy of content from another Content Server or to restore data that has been in storage. You can also change metadata values during an import. In addition, the Archiver can import the contents of any of the tables in the table space.
  • Transfer - Used to transfer content from one Content Server instance to another over sockets. This is typically used to move or copy content across a firewall or between two Content Server systems that do not have access to the same file system. You can also use the Transfer function to transfer archive files between Content Server systems that have access to the same file system. 
  • Replicate - Used to automate the export, import, and transfer functions. For example, you can use replication to automatically export from one Content Server instance, transfer the archive to another computer, and import to another Content Server instance.
You can read more about the Archiver and its functionality in the Oracle® WebCenter Content System Administrator's Guide for Content Server, 11g Release 1 (11.1.1), Chapter 8 Managing System Migration and Archiving.

Archiver Concepts

I am not going to repeat the product documentation here but there are some concepts that you need to be familiar with.

Archives

When you run the Archiver, you define the archive to be created or appended to. On the file system, each archive is a folder or subdirectory within the collection it belongs to. This archive contains all the exported content files along with one or more batch files.

Collections

A collection is, by default, a set of archives on a particular instance of a content server. It is also possible to create multiple collections within an instance.

Batch File

A batch file is a text file that contains the file records for archived content items. Batch files describe the metadata for each exported revision.
  • A new batch file subdirectory is created each time an archive is exported.
  • Each batch file contains up to 1000 file records. If an export contains more than 1000 revisions, a new batch file is created.
  • Archiver batch files are not the same as the batch files that are used with the Batch Loader application.
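
As a rough illustration of the 1000-record limit, the number of batch files for a single export can be estimated with a ceiling division. This is only a sketch: the function name is made up for this example, and it assumes the limit is exactly the 1000 records stated above and that one export is being considered.

```python
import math

# Per the text: each batch file holds up to 1000 file records.
BATCH_RECORD_LIMIT = 1000


def batch_files_needed(revision_count: int, limit: int = BATCH_RECORD_LIMIT) -> int:
    """Estimate how many batch files one export of `revision_count` revisions needs."""
    if revision_count < 0:
        raise ValueError("revision_count must be non-negative")
    # Ceiling division: a partially filled batch file still counts as one file.
    return math.ceil(revision_count / limit)


if __name__ == "__main__":
    for n in (1, 1000, 1001, 2500):
        print(n, "revisions ->", batch_files_needed(n), "batch file(s)")
```

So an export of exactly 1000 revisions would fit in one batch file, while 1001 revisions would spill into a second.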

Source and Target

The Source is the instance where the content is coming from, usually a development or contribution instance.

The Target is where the content will end up, usually a consumption instance.

Push vs. Pull

When setting up a transfer or replication, you need to decide which type to use: push or pull.

A pull transfer is a transfer that is owned by the target instance. In this type of migration the outgoing provider is set up on the target. Below are the main characteristics of a pull transfer.

  • Multiple pull transfers can be concurrent.
  • The Pull method is best when the Target can talk to the Source but because of firewall issues the Source cannot talk to the Target.  It is also used when the Source is a Cluster.
  • If you are running a pull transfer across a firewall, you might need to configure the firewall to permit the outgoing provider’s socket to pass through it.  
A push transfer is a transfer that is owned by the source instance. In this type of transfer the outgoing provider is set up on the source. Below are the main characteristics of a push transfer.
  • Only one push transfer can be in progress at a time.
  • The push method is best used when the Source system can talk to the Target but the Target cannot talk to the Source because of firewall or network issues.
  • If you are running a push transfer across a firewall, you might need to configure the firewall to permit both providers’ sockets to pass through it.
Please refer to Oracle® WebCenter Content System Administrator's Guide for Content Server, 11g Release 1 (11.1.1), Chapter 8 Managing System Migration and Archiving for more detailed information.
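
The push and pull rules above can be summarized as a small decision helper. This is a planning sketch only, not anything the Archiver provides: the names are hypothetical, and excluding push when the source is a cluster is my reading of "pull is also used when the Source is a Cluster", so treat that rule as an assumption.

```python
from typing import List, NamedTuple


class TransferPlan(NamedTuple):
    kind: str         # "push" or "pull"
    owner: str        # instance that owns the transfer
    provider_on: str  # instance where the outgoing provider is set up


PUSH = TransferPlan("push", owner="source", provider_on="source")
PULL = TransferPlan("pull", owner="target", provider_on="target")


def viable_transfers(source_reaches_target: bool,
                     target_reaches_source: bool,
                     source_is_cluster: bool = False) -> List[TransferPlan]:
    """List the transfer types that fit the network situation described above."""
    plans = []
    # Pull: the target opens the connection, so the target must reach the source.
    if target_reaches_source:
        plans.append(PULL)
    # Push: the source opens the connection, so the source must reach the target.
    # Assumption: a clustered source is treated as pull-only.
    if source_reaches_target and not source_is_cluster:
        plans.append(PUSH)
    return plans


if __name__ == "__main__":
    # Contribution behind the firewall, consumption in the DMZ that cannot call back in.
    print(viable_transfers(source_reaches_target=True, target_reaches_source=False))
```

An empty list means neither instance can open a socket to the other, in which case the firewall has to be addressed before any transfer is configured.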

Providers

Content Server allows for the creation of providers to manage connections to external entities. These can be databases, directory servers, other content servers, and several other types. For the purpose of this article, the outgoing provider is the one that is used. It is basically a connection initiated to an outside entity. You can use this type to communicate between Content Server instances. In the case of a Push transfer, you would set up an outgoing provider on both the source and the target. Doing this allows the target to talk back to the source, providing notifications about the transfer. With a Pull, you only need to set up the outgoing provider on the target instance.

For details on creating the outgoing providers, see Oracle® WebCenter Content System Administrator's Guide for Content Server 11g Release 1 (11.1.1), 4.5 Connecting to Outside Entities with Providers

Caveats

There are three principal configurations that must be in place for replication.
  • IDC_Name - Archiver cannot be used to move or copy data between two instances that share the same Content Server instance name (IDC_Name). To do so corrupts the data on the target system. It is best practice that all content servers have unique instance names.
  • SocketHostAddressSecurityFilter – Outgoing providers communicate at the socket level. You must inform the content servers of the IP addresses of the machines that will be allowed to communicate over the sockets. This setting is in the config.cfg file.
  • AutoPrefix – If you are using the AutoPrefix it must be unique between instances. It is best practice to always use the AutoPrefix and to set it to be unique between instances.  
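
Before configuring anything in the Archiver, it can help to compare the two instances' config.cfg files against the three caveats above. The following is a minimal sketch of such a pre-flight check. The key name `AutoNumberPrefix` (for what I call AutoPrefix above), the pipe-separated filter format with `*` wildcards, and every sample value and IP address are assumptions to verify against your own config.cfg files.

```python
from fnmatch import fnmatch

# Hypothetical config.cfg excerpts for a contribution and a consumption instance.
SOURCE_CFG = """
IDC_Name=contrib01
AutoNumberPrefix=con_
SocketHostAddressSecurityFilter=127.0.0.1|10.10.1.*
"""
TARGET_CFG = """
IDC_Name=consume01
AutoNumberPrefix=web_
SocketHostAddressSecurityFilter=127.0.0.1|10.10.2.15
"""


def parse_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blanks, '#' comments, and '<?cfg ...?>' headers."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "<")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def filter_allows(filter_value: str, ip: str) -> bool:
    """True if `ip` matches any pipe-separated pattern (assumed format)."""
    patterns = [p.strip() for p in filter_value.split("|") if p.strip()]
    return any(fnmatch(ip, pattern) for pattern in patterns)


def replication_problems(source: dict, target: dict,
                         source_ip: str, target_ip: str) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if source.get("IDC_Name") == target.get("IDC_Name"):
        problems.append("IDC_Name is missing or identical on both instances")
    if source.get("AutoNumberPrefix") == target.get("AutoNumberPrefix"):
        problems.append("AutoNumberPrefix is missing or identical on both instances")
    # Conservative: check both directions, although a pull may need only one.
    if not filter_allows(source.get("SocketHostAddressSecurityFilter", ""), target_ip):
        problems.append("source filter does not allow the target IP")
    if not filter_allows(target.get("SocketHostAddressSecurityFilter", ""), source_ip):
        problems.append("target filter does not allow the source IP")
    return problems


if __name__ == "__main__":
    issues = replication_problems(parse_cfg(SOURCE_CFG), parse_cfg(TARGET_CFG),
                                  source_ip="10.10.2.15", target_ip="10.10.1.20")
    print(issues or "no problems found")
```

The check reports problems rather than fixing them, since changes to config.cfg should be made deliberately and typically require a restart of the Content Server.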

The Replication Process

Organize Content

The best practice in managing the migration process is to use a divide-and-conquer approach. Content can be divided by type and other metadata to allow for different types of transfer. For example, web site assets will be transferred manually, whereas contributed content can be set up to transfer automatically from source to target.
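
To make the divide-and-conquer idea concrete, here is a small sketch that splits a list of content items into a manual group and an automatic group based on content type. The type names and sample items are hypothetical, and in practice the split would be expressed as export queries on separate archives rather than in a script.

```python
# Hypothetical content types that should only ever move by hand.
MANUAL_TYPES = {"Template", "SiteAsset"}


def split_by_transfer_mode(items, manual_types=MANUAL_TYPES):
    """Return (manual, automatic) lists of content IDs based on dDocType."""
    manual, automatic = [], []
    for item in items:
        bucket = manual if item["dDocType"] in manual_types else automatic
        bucket.append(item["dDocName"])
    return manual, automatic


if __name__ == "__main__":
    sample = [
        {"dDocName": "TPL_HOME", "dDocType": "Template"},
        {"dDocName": "NEWS_0001", "dDocType": "Article"},
        {"dDocName": "LOGO_PNG", "dDocType": "SiteAsset"},
    ]
    manual, automatic = split_by_transfer_mode(sample)
    print("manual:", manual)
    print("automatic:", automatic)
```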

Set up Communication between servers

In order to transfer content an outgoing provider must be created to facilitate transfer from one instance to another.  For a push transfer the administrator needs to create an outgoing provider on the Source and (optionally) on the Target.  For a pull transfer the administrator needs to create an outgoing provider on the Target.

One element to consider is the ability of the Target and Source to talk through the firewall.  To transfer across a firewall, the network administrator might need to configure the firewall to permit the outgoing provider’s socket to pass through it.

Manual Migration vs. Automatic Migration

Manual migration is where an archive is created on the source instance and manually migrated to the target instance. Manual migration is typically important when unscheduled publishing of content to the production environment is not desirable. Automatic migration, also known as replication, is a process where migration occurs automatically based on some setting; for example, migration could be triggered by a new release.

Manual Migration

In general, these are the steps to set up a manual transfer.
Manual Export Steps
  • Create an archive where the exported Content Server data will be stored.
  • In the Current Archives list, select the archive.
  • Create an export query.
  • Set configuration information export options.
  • Set the general export options.
  • Initiate the export.
Manual Transfer Steps
  • Open Archiver on the source Content Server.
  • Open the archive collection that contains the source archive.
  • Select the source archive in the Current Archives list.
  • Select Actions—Transfer.
Manual Import Steps
  • In the Current Archives list, select the archive from which to retrieve data.
  • Review the batch files in the archive. If necessary, remove revisions from the batch files.
  • If you want to change metadata fields or values during the import, set up the field and value mappings.
  • Set the general import options. Test the import mappings and rules on a few individual revisions.
  • Initiate the import.

Automatic Migration

Automatic Migration, also known as replication, follows these general steps

Setting up Automatic Export
  • Set up the export and run a manual export
  • Open Archiver on the Content Server that content is to be exported from.
  • Open the archive collection.
  • Select the archive to export to automatically in the Current Archives list.
  • Click the Replication tab.
  • Click Edit.  The Registered Exporter Screen is displayed.
  • Select the Enable Automated Export check box.
  • Click Register. The current collection is added to the list of registered exporters.
  • Click OK.
Setting up Automatic Transfer
  • Set up the transfer and run a manual transfer.
  • Open Archiver on the source Content Server.
  • Open the archive collection.
  • Select the source archive in the Current Archives list.
  • Click the Transfer To tab.
  • Click Edit.  The Transfer Options Screen is displayed.
  • Select the Is Transfer Automated check box.
  • Click OK.
  • Test the automatic transfer
  • In the source Content Server, check in a new document that meets the export criteria.
  • If the export is automated, wait until automated export occurs after indexing. Otherwise, export the source archive manually. The archive should be transferred to the target Content Server within a few minutes.
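
The last test step, waiting a few minutes for the document to show up on the target, can be scripted as a simple poll with a timeout. This is a generic sketch: `is_on_target` is a hypothetical callable that you would supply yourself (for example, a lookup against the target instance), and the default 300-second timeout is only my reading of "within a few minutes".

```python
import time
from typing import Callable


def wait_for_arrival(is_on_target: Callable[[], bool],
                     timeout_s: float = 300.0,
                     interval_s: float = 15.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `is_on_target` until it returns True or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if is_on_target():
            return True   # the document has arrived on the target
        if clock() >= deadline:
            return False  # gave up; check the Archiver logs on both instances
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in check that succeeds on the third poll; replace with a real lookup.
    answers = iter([False, False, True])
    arrived = wait_for_arrival(lambda: next(answers), timeout_s=5, interval_s=0.1)
    print("arrived" if arrived else "timed out")
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without actually waiting.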
Setting up Automatic Import
  • Set up the import and run a manual import.
  • Open Archiver on the Content Server that the archive is to be imported to.
  • Open the archive collection.
  • Select the archive to import automatically in the Current Archives list.
  • Click the Replication tab.
  • Click Register Self. You are prompted to confirm the action.
  • Click OK.