Data Transfer for Replatforming Projects
Replatforming, or moving an application from one platform to another, has become increasingly popular because of the potential cost savings. Typically, an application is replatformed from a mainframe to one of a variety of other platforms, including Windows, Linux, AIX, HP-UX, and Sun. In any replatforming project, transferring the data is one of many issues that will need to be addressed.
Seven Issues of Data Transfer:
What to transfer
Time to transfer
Methods to transfer
Automation of transfer
Data compatibility
Data format compatibility
In this article, I have broken the issue of transferring data into seven separate items. I will discuss each item along with its potential pitfalls and their solutions.
What to Transfer
At some sites, determining what to transfer will be a trivial matter. At other sites, after years of work by different developers and consultants, it could be a significant effort. Begin by reviewing the contents of your FCT (or your resource definitions for FILE), your database schema, and the DLBLs in your JCL. For each dataset, you will want to determine the following information:
Is the data static, transient, dynamic, cumulative, or historical?
Static data does not change. OK, there is no such thing as data that never changes; what I really mean is that the data does not change very often, and when it does, it is because of some external event such as a software upgrade.
Transient data is only valid for a fixed period of time. Transient data files typically get written to during the day and processed and deleted during the nightly cycle.
Dynamic data is written and read continuously.
Cumulative data is not deleted or rewritten. Log files are a good example of cumulative data.
Historical data is only kept in case it is needed because of an external event such as a subpoena or lawsuit.
What is the format of the data?
Fixed record size or variable record size
Time to Transfer
Knowing how long it will take to transfer the data will be critical to planning the cut over day. Large amounts of data take time to transfer, and more than likely your mainframe has a large amount of data or you would not be using a mainframe. Fortunately, you will probably have the opportunity to make several test runs, which should give you a reliable measure of how long the transfer will take.
In order to know that you are transferring the data in the fastest way possible, you will need to determine the theoretical maximum for the transfer method you are using. If you are going over your internal network, that theoretical maximum will be the speed of the network. By knowing the maximum you will know when you can stop refining your transfer technique because it is already at the maximum. Also, by knowing the maximum you will be able to determine if a particular method is even worth pursuing.
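As a rough illustration of this reasoning, the theoretical minimum transfer time follows directly from the data volume and the link speed. The sketch below is in Python; the function names and the example figures are assumptions chosen for illustration, and the calculation ignores protocol overhead, so a real transfer will always be somewhat slower.

```python
def theoretical_transfer_seconds(data_bytes, link_bits_per_second):
    """Lower bound on transfer time: every byte is 8 bits on the wire."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return data_bytes * 8 / link_bits_per_second


def link_efficiency(data_bytes, link_bits_per_second, measured_seconds):
    """Fraction of the theoretical maximum that a measured test run achieved."""
    if measured_seconds <= 0:
        raise ValueError("measured time must be positive")
    best_case = theoretical_transfer_seconds(data_bytes, link_bits_per_second)
    return best_case / measured_seconds
```

For example, 500 GB over a 1 Gbit/s link cannot take less than 4,000 seconds (about 67 minutes). If a test run of the same data takes 5,000 seconds, the method is already at 80% of the maximum and further tuning has limited payoff.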
Methods to Transfer
Given the name File Transfer Protocol, you would expect FTP to be the first choice for transferring files. FTP has several options that must be set correctly for a file to transfer properly. With a certain set of options, FTP will convert the file from EBCDIC to ASCII, which may seem like a good thing, but if the file contains binary data, that data will be corrupted. People who are familiar with FTP will be quick to point out the BINARY option, which turns off the character conversion, but there is more to a successful transfer than that. Typical problems are the RDW (Record Descriptor Word) being removed and trailing spaces being stripped from lines without EOL (End Of Line) markers being added; both problems leave data that cannot be recovered reliably. Because implementations of FTP vary on the mainframe, I cannot give you a magic formula for transferring files, but I can tell you to look at these FTP commands: SITE RDW, BINARY, and ASCII.
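One way to check whether the RDW survived a binary transfer is to walk the received file record by record. The Python sketch below assumes the common variable-length layout in which each record starts with a 4-byte RDW: a 2-byte big-endian length that includes the RDW itself, followed by 2 bytes that are zero for unspanned records. It does not handle block descriptor words or spanned records, and the function name is hypothetical.

```python
import struct


def split_rdw_records(data):
    """Split a byte string of RDW-prefixed records into a list of payloads.

    Raises ValueError if the lengths do not line up, which usually means the
    RDW was stripped or the file was altered during the transfer.
    """
    records = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated RDW at offset %d" % pos)
        # First half-word: record length including the 4-byte RDW itself.
        length, _segment_info = struct.unpack(">HH", data[pos:pos + 4])
        if length < 4 or pos + length > len(data):
            raise ValueError("implausible record length %d at offset %d" % (length, pos))
        records.append(data[pos + 4:pos + length])
        pos += length
    return records
```

If the file parses cleanly from the first byte to the last, the record boundaries are intact. A ValueError on the first record is a strong hint that the RDW was removed on the way.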
scp (secure copy) is part of the ssh tool chain. Because scp is run from the command line on the target platform, it is handy when the data transfer is to be initiated from the target platform. The old standby, IND$FILE, is useful for older machines that do not support FTP or scp. Custom data transfer methods can be built where there are special considerations.
A tape transfer will be the best option when the mainframe does not have network connectivity or when the target machine is not on the same network.
Automation of Transfer
To achieve repeatable transfers it will be important to automate or have a rigorous procedure for transfers.
FTP is normally an interactive transfer, but some implementations of FTP on the mainframe support JCL control. In addition, there are Perl modules that can be used on the target machine to automate FTP. scp can be invoked from any shell language, so it readily adapts to automation. IND$FILE would have to be automated with a screen-scraping utility.
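As one illustration of scripted FTP from the target machine, here is a minimal sketch that uses Python's standard ftplib module in place of the Perl modules mentioned above. The function names, the dataset names, and the choice to send SITE RDW before each download are assumptions for illustration; whether a given SITE command is accepted depends on the mainframe's FTP implementation.

```python
from ftplib import FTP


def fetch_dataset(ftp, remote_name, local_path, site_commands=("SITE RDW",)):
    """Download one dataset in binary mode; return the number of bytes written.

    `ftp` is a logged-in ftplib.FTP connection (or any object that offers the
    same sendcmd and retrbinary methods).
    """
    for command in site_commands:
        # ftplib raises an exception if the server rejects the command.
        ftp.sendcmd(command)

    bytes_written = 0
    with open(local_path, "wb") as out:
        def write_block(block):
            nonlocal bytes_written
            out.write(block)
            bytes_written += len(block)

        # retrbinary switches the connection to binary (image) mode itself,
        # so the data is not translated from EBCDIC to ASCII.
        ftp.retrbinary("RETR " + remote_name, write_block)
    return bytes_written


def fetch_all(host, user, password, datasets):
    """Fetch every (remote_name, local_path) pair; return bytes per dataset."""
    sizes = {}
    with FTP(host) as ftp:
        ftp.login(user, password)
        for remote_name, local_path in datasets:
            sizes[remote_name] = fetch_dataset(ftp, remote_name, local_path)
    return sizes
```

Calling fetch_all with a host name, credentials, and a list of dataset and file name pairs runs the same transfer identically every time. Recording the byte counts it returns gives each test run a simple figure to compare against the previous one.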
Data Compatibility
It is an obvious statement, but it is worth saying: it is vitally important that your data be compatible between the mainframe and the target platform. But what about the subtleties of making sure data is compatible?
Running EBCDIC on the target platform can eliminate most of the problems of data conversion, but you still need to be aware that there are several versions of EBCDIC. Those versions differ so slightly that you could easily miss the problem in testing. Pay particular attention to the characters that differ between versions and you should be fine. If you find you have problems with these characters, review your data transfer method to confirm it is not changing the data, and review your target system configuration to confirm it is using the same version of EBCDIC as the mainframe.
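To see how small the differences between EBCDIC versions can be, you can compare two code pages byte by byte. The Python sketch below uses the cp037 and cp500 codecs that ship with Python purely as examples; the code pages actually in use at your site are an assumption you will need to replace, and the function name is hypothetical.

```python
def differing_code_points(codec_a, codec_b):
    """Return {byte_value: (char_in_a, char_in_b)} for every byte value
    that the two single-byte codecs decode to different characters."""
    differences = {}
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(codec_a)
        char_b = raw.decode(codec_b)
        if char_a != char_b:
            differences[value] = (char_a, char_b)
    return differences


if __name__ == "__main__":
    for value, (a, b) in sorted(differing_code_points("cp037", "cp500").items()):
        print("0x%02X: %r versus %r" % (value, a, b))
```

For these two code pages, the letters and digits are identical and only a handful of punctuation characters, such as the square brackets, sit at different byte values. A test file that contains every character on the resulting list is a quick way to confirm that the mainframe and the target agree.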
If you have any use of National data items (PIC N in COBOL) then you will want to pay particular attention to that data. Support for PIC N in COBOL compilers varies widely, so it is important to test these features thoroughly.
You will also want to scan your code for the use of floating point numbers and verify the compatibility of floating point data.
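Floating point deserves a check because mainframe programs have traditionally stored these values in IBM hexadecimal floating point rather than the IEEE 754 format that most target platforms use natively. Assuming your data uses the 4-byte hexadecimal format (1 sign bit, a 7-bit base-16 exponent biased by 64, and a 24-bit fraction), a value can be decoded for comparison as sketched below in Python. The function name is hypothetical, and your files may instead hold 8-byte or IEEE values.

```python
import struct


def ibm32_to_float(raw):
    """Decode a 4-byte IBM hexadecimal floating point value into a Python float."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    (word,) = struct.unpack(">I", raw)
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F                # power of 16, biased by 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)
```

The bytes 42 64 00 00 decode to 100.0 and C1 10 00 00 decode to -1.0. Note that the two formats do not cover the same range or precision, so values near the extremes may not survive a round trip through a 4-byte IEEE field.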
Running the native character set (usually ASCII) on the target computer has definite advantages, especially if you are using a database. But this adds a major complication to the cut over process: you must convert the data.
Data conversion is especially complicated if you have (and you probably do) binary data or packed data mixed with your character data. To do this type of conversion, you will need to write custom programs or use a template-based conversion scheme. Make sure that your conversion routines flag any data that does not convert perfectly, as a way to validate your conversion programs or templates. This may seem obvious, but in practice the programmer may take a shortcut and just convert a byte of data from EBCDIC to ASCII without checking that the character is within the printable range.
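A template-based conversion that flags imperfect data might look like the following Python sketch. The template layout of (name, offset, length, kind) entries, the field names, and the use of the cp037 code page are assumptions for illustration. Packed fields are decoded as packed decimal with the sign in the last half-byte, and scaling for implied decimal places is left out.

```python
def convert_char_field(raw, codec="cp037"):
    """Decode an EBCDIC character field; return (text, offsets_of_bad_chars)."""
    text = raw.decode(codec)
    # Flag anything outside printable ASCII instead of converting it silently.
    bad = [i for i, ch in enumerate(text) if not (" " <= ch <= "~")]
    return text, bad


def unpack_packed_decimal(raw):
    """Decode a packed decimal field into an int; raise ValueError if invalid."""
    if not raw:
        raise ValueError("empty packed decimal field")
    digits = []
    for byte in raw[:-1]:
        digits += [byte >> 4, byte & 0x0F]
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F          # A, C, E, F are positive; B, D are negative
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError("not a valid packed decimal field: " + raw.hex())
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value


def convert_record(record, template):
    """Apply the template to one record; return (fields, problems)."""
    fields, problems = {}, []
    for name, offset, length, kind in template:
        raw = record[offset:offset + length]
        if kind == "char":
            text, bad = convert_char_field(raw)
            fields[name] = text
            problems += [(name, offset + i) for i in bad]
        elif kind == "packed":
            try:
                fields[name] = unpack_packed_decimal(raw)
            except ValueError:
                fields[name] = None
                problems.append((name, offset))
        else:
            raise ValueError("unknown field kind: " + kind)
    return fields, problems
```

Every entry in the problems list names the field and byte offset that did not convert cleanly. A full test run that ends with an empty list is evidence that the template matches the data, while a non-empty list points at either bad data or a wrong template.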
If running in the native character set is a requirement, one strategy that should be considered is a two-stage conversion: the first stage is to convert to the new platform using EBCDIC, with the second stage to convert the data to ASCII.
Data Format Compatibility
I have separated the issue of data format from the issue of data compatibility to make a point that might otherwise be overlooked: each COBOL vendor has defined its own format for ESDS and KSDS files. This is further complicated by separate formats for fixed-record and variable-record ESDS files.
There are two solutions to the data format issue:
1) Convert the data format as part of the procedure of transferring the data to the target platform.
2) Change the way that the COBOL runtime system reads the file such that it can read the mainframe format files.
The second option is the preferred technique when data on the target machine needs to be transferred back to the mainframe. By eliminating the conversion, you eliminate unnecessary I/O. This can be more than a performance issue; eliminating the need to support multiple file formats also makes it easier to keep your data organized.
It is my hope that this article has been helpful in identifying issues that may affect the data transfer phase of your replatforming project. This information is intended to be general in nature and may not apply to all environments. In an effort to keep this article brief, I have left out details that may be important in your environment. If you would like to discuss your particular environment, please contact us at firstname.lastname@example.org.