TCSE IEEE Technical Council on Software Engineering

September 2001


 

Message from the Chair

Hello and welcome to this new issue of the Reverse Engineering and Reengineering Newsletter. This is the newsletter for the Reverse Engineering and Reengineering (RER) Committee, which is a part of the IEEE's Technical Council on Software Engineering (TCSE). I am your new RER Chair, Dr Cristina Cifuentes, and I work at Sun Microsystems Laboratories in the area of binary translation (i.e. reengineering of executable/low-level code). 

As we all know, reverse engineering is the process of recovering higher-level information from lower levels of abstraction, for example, recovering design information from source code, or recovering high-level code from assembler code. Reengineering, on the other hand, is the process of transforming a system into another at the same level of abstraction, for example, transforming a program written in the C language to one written in the C++ language, or transforming assembler code for one machine into assembler code for another machine. Clearly, the process of reengineering involves both reverse and forward engineering steps. 

The RER Committee grew out of the growing interest in reverse engineering and reengineering as software became larger and more complex. People working on migrations of legacy code, where the code has been written over a period of 10-20 years and where complete documentation or understanding of the system is lacking, have great sympathy for RER techniques. Migrations involve program understanding, design abstraction, documentation recovery, program transformation, and more. Typically, customers want to migrate to a new machine, a new language, or a new platform altogether. 

In my new role as RER Chair, I have organized two workshops/seminars for 2001: "Innovation, Software, and Reverse Engineering: Technological and Legal Issues", 23 March 2001, Santa Clara University (see summary elsewhere in this issue), and the "Workshop on Decompilation Techniques", 3 October 2001, Stuttgart (see events announcements). Expect to see other events during 2002. If you have any suggestions on particular topics of interest, please drop me a line at cristina.cifuentes@sun.com. 

This newsletter will include summaries of important events held in the RER community, articles of interest to our membership, and announcements of future events. We would like you all to be a part of this newsletter, so please send suggestions on future articles of interest to our editor Michael Olsem (olsemm@tacom.army.mil). 

Cristina Cifuentes
Sun Microsystems Laboratories
August 2001

Editorial

Well, this newsletter is back after a hiatus of about 3 years. Your "humble" editor is now working for the US Army instead of the US Air Force. But I'm still very much interested in Software Reverse Engineering and Reengineering. After all, software maintenance is just as big a headache for the Army as it is for the Air Force. We are now online, as you can see. This will solve a number of logistical issues regarding the periodic publication of this newsletter. But the bottom line is that this newsletter must be useful to you, the community of people working in software reverse engineering and reengineering. So please let us know what you think and whether we can improve this means of communication within this community.

I look forward to seeing you at the upcoming WCRE in Germany.

Innovation, Software, and Reverse Engineering: Technological and Legal Issues

By: Cristina Cifuentes, Sun Microsystems Laboratories
and Brian Fitzgerald, Southern Cross University, School of Law

This 1-day seminar was held at Santa Clara University, California, on 23 March 2001. The aim of the seminar was to have people from the legal and computing communities in the one room, to discuss issues relating to the reverse engineering of software. The event was sponsored by Santa Clara University School of Law, the Reengineering Forum industry association (REF) and the IEEE TCSE Reverse Engineering and Reengineering Committee. 

The morning session was made up of technology practitioners and academics, and the afternoon session of legal practitioners and academics. Both sessions had a specialist commentator who raised important points made by the speakers and directed the questions from the audience. The following was the program for the day; we will summarize each talk, expanding on the legal talks as those are less familiar to the readers of this newsletter. 

Technology Panel 

Panelists: 


Elliot Chikofsky (META Group and REF)
"Reverse Engineering: A Technological Definition"
Dr. Cristina Cifuentes (Sun Microsystems and IEEE)
"Reverse Engineering for Interoperability: The Road to Innovation"
Professor Edward W. Felten (Princeton University)
"Reverse Engineering and Information Security Assurance"
Specialist Commentator: 
Paul Martino (InterTrust)

Legal Panel 

Panelists: 


Fred von Lohmann (Morrison & Foerster)
"Reverse Engineering and Copyright Law"
Dr. Anne Fitzgerald (Gadens Lawyers)
"Comparing U.S. and Australian Experience"
Professor Donald Chisum (Santa Clara University Law School)
"Reverse Engineering and Patent Law"
Professor Michael Lehmann (University of Munich, Max Planck Institute)
"Reverse Engineering and European Union Law"
Specialist Commentator: 
Professor Howard C. Anawalt (Santa Clara University School of Law)

Technology Talks

Elliot Chikofsky introduced the terminology used in the area of reverse engineering; what is meant by reverse engineering, reengineering, forward engineering and restructuring. He pointed out that most software developers end up doing one form or another of reverse engineering when developing code, whether by debugging their own program, trying to understand someone else's program (or their own), or when wanting to migrate to another platform. 

Dr Cristina Cifuentes talked about reverse engineering and reengineering of low-level code, that is, assembler/machine code. She summarized the differences between compilation, disassembly, decompilation, emulation and binary translation, and gave examples of uses of these technologies, pointing out that the techniques have been in use for several decades. 

Professor Ed Felten talked about his experiences with decompilation of code for computer security purposes. He gave an example where his team at Princeton had to decompile Java applets in order to determine that the problem with such applets was a linking issue that had not been addressed so far in the literature. Professor Felten also raised the issue of how the DMCA exceptions for reverse engineering for computer security are actually making it harder for security researchers to do research or publish their results, as recently seen when SDMI threatened to sue Professor Felten and his team if they published an article on how they unlocked (broke) all SDMI watermarking techniques in a contest. 

Paul Martino talked about his experiences at Ahpah Software, a company he founded to market a Java decompiler, SourceAgain. In his experience, the license given with SourceAgain states what uses are allowed for the decompiler. Paul now works at InterTrust, a company that looks at, among other things, obfuscation techniques in order to prevent reverse engineering of executable code. 

Legal Talks 

Professor Fitzgerald, in opening the legal session, asked the audience to consider the legal issues that would unfold against a tapestry of four theories of intellectual property: 

- economic/utilitarian theory: intellectual property law is justified in terms of economic efficiency
- Lockean/labor desert: intellectual property rights are natural rights earned through adding labor to the common resource of information 
- personhood: intellectual property is an emanation of the person and the law should facilitate this personal aspect
- social planning: intellectual property law should be designed to culturally enrich democratic society.

It was suggested that this theoretical backdrop could be used as a framework for analyzing the definition of intellectual property rights concerning software especially in the context of reverse engineering of software.

Reverse Engineering and US Copyright Law

Fred von Lohmann presented on the issue of the intersection between copyright law and reverse engineering of software. He explained from the outset that there may be a number of legal obstacles to reverse engineering software including:

1. traditional copyright law - as intermediate copying (at least) is inherent in the process of looking at running software and therefore most reverse engineering 
2. anti-circumvention law now embodied in s 1201 DMCA - where the reverse engineering involves circumvention of a technological protection measure (TPM)
3. contract law - may involve infringing a license agreement which prohibits RE, although the validity of such license terms is largely untested; see also UCITA
4. patent law - as once again to look at patented software you normally need to make a copy although arguments concerning fair or experimental use are resurfacing
5. trademark law 
6. Computer Fraud and Abuse Act (CFAA)
7. Electronic Communications Privacy Act (ECPA)

Von Lohmann proceeded to outline the ambit of the fair use privilege to reverse engineer espoused in Sega v Accolade (9th Cir. 1993) and Atari v Nintendo 975 F.2d 832 (Fed. Cir. 1992), and recently confirmed in Sony v Connectix 203 F.3d 596 (9th Cir. 2000). In Sony v Connectix the Ninth Circuit Court of Appeals held firmly in favor of reverse engineering of software in order to allow different software products to be ported or joined (i.e. interoperate) with different hardware, firmware or software platforms. The intermediate copying that occurred in this instance was legitimate because it merely facilitated the copying of the unprotected (non-copyrightable) function or idea of the software (note that ideas are not protected by copyright; patents protect ideas and there were no patents protecting the software at hand). The Court was not concerned with the number of intermediate copies that had been made, nor with the fact that the end product competed with the Sony PlayStation. 

Von Lohmann also talked about the DMCA (US Digital Millennium Copyright Act), and he explained that s 1201 could be paraphrased in crude terms as "thou shalt not circumvent", "do not break into my castle and do not violate my house rules - seen from the perspective of a copyright holder." He noted that there are exceptions to this regime listed in sections 1201 (d)-(j), especially 1201 (f) concerning reverse engineering, 1201 (g) relating to encryption research and 1201 (j) on security testing.

Section 1201 DMCA was recently at issue in the case of Universal City Studios Inc v Reimerdes. This action was brought pursuant to the DMCA for offering, providing or otherwise trafficking in a device to circumvent a technological protection measure. It arose from the situation where a 15 year old allegedly cracked the encryption code (CSS) of the software lock (a technological protection measure) employed (it was argued) to prevent easy copying of the content of Digital Versatile Discs (DVDs). In this case the TPM was CSS and the circumvention device was DeCSS. The defendant was alleged in the final instance to be linking to websites which allowed downloading of DeCSS. The court held that the reverse engineering exception could not be invoked. 

The defendants had argued that reverse engineering of the software lock was needed in order to play DVDs on other platforms such as Linux, an open source operating system. They argued that DeCSS was necessary to achieve interoperability between computers running on the Linux system and DVDs and therefore the interoperability exception in the DMCA was enlivened. The judge explained that he could not accept such argument for three reasons: there was no evidence to support this assertion, DeCSS runs not only on Linux but on Windows as well (hence it was not developed for the "sole purpose" of running it under Linux), and 1201(f) permits reverse engineering of copyrighted computer programs only and does not authorize circumvention of technological systems that control access to other copyrighted works, such as movies. Von Lohmann pointed out that the crucial issues in determining liability under the DMCA in cases of reverse engineering will be whether fair use has occurred and whether a TPM is involved. 

Reverse Engineering in Australia

Dr Anne Fitzgerald explained recent reforms in Australia concerning reverse engineering of software. The Copyright Amendment (Computer Programs) Act 1999 inserts a new Division 4A, entitled "Acts not constituting infringements of copyright in computer programs", into the Australian Copyright Act 1968. Further amendments to Division 4A have been effected by the Copyright Amendment (Digital Agenda) Act 2000. 

The reverse engineering exceptions apply when performing the act on a legal version of a program; the exceptions are: 

- for interoperability purposes, to the extent necessary to enable a new program to be used together with an original program, 
- for error correction, where the owner is not able to provide an error-free version of the program on time, at a reasonable commercial price or has gone out of business,
- for computer security testing of a program, a computer system or network, where the testing is done in good faith to investigate or correct a security flaw.

The exception for computer security testing is limiting as it does not take into account the fact that many computer programs that computer security organizations are interested in testing (e.g. viruses, trojans) are actually tested without the permission of the copyright owner of such program, and the program may be an illegal version of the program. 

Reverse Engineering and Patented Software

Professor Donald Chisum offered some comments in response to recent academic writings arguing for clearer reverse engineering rights in the context of patented software. Professor Chisum opened by reasserting his support for an experimental use exception in patent law, something he said was explicit in other patent laws throughout the world.

If you purchase a TV you have the right to take it apart or reverse engineer it, which is generally not a case of patent infringement, although as Professor Chisum pointed out, in some instances, contract has been used to try to control activities beyond the first sale of the patented item. However, when you reverse engineer software, due to the nature of the product, you engage in the act of reproduction, which is technically a making for the purposes of patent law. A clear and full disclosure of source code in the patent claims would help alleviate this problem, although in many instances there would be a need to look at running code, which would entail a reproduction. 

For Professor Chisum the temporary copying involved in an experimental use should be allowed and to this extent reverse engineering software should not be an issue. In general Professor Chisum explained that the enabling and disclosure aspects of the patent system (at least potentially) make it much more conducive to innovation in software than is copyright law, as patent law places an obligation to disclose to the public the relevant information. The word 'potentially' is stressed as it is widely argued that the necessary information is not being disclosed in the patent claims.

European Union Law on Reverse Engineering and Software

Professor Lehmann talked about the reverse engineering exceptions available in the EU Software Directive of 1991. The EU Software Directive contains the first statutory provisions on reverse engineering of software worldwide. Prof Lehmann argued that Art 6 of the Software Directive can be characterized as a specific limitation on the copyright protection of software. 

Professor Lehmann asked "Why should one limit reverse engineering software when the reverse engineering of any literature has always been [permitted]?" Seen from a perspective of law and economics, the answer lies in the problem of sunk costs with respect to the object code: free access to the source code would make it easy to copy the software at very low production cost, thereby allowing the copyist to free ride on the copyright holder's investment in the software. Furthermore, proving infringement in the digital world is very difficult. Therefore the EU Software Directive allows reverse engineering, but only under very narrow conditions and so as to facilitate interoperability and error correction.

Underpinning the Directive is the notion that ideas and principles underlying interfaces are not eligible for copyright protection, the general principle for which is now explained in TRIPs art 9 (2). The reverse engineering exceptions bring about limitations of the exclusive rights of the creator so as to allow adequate access to these ideas and principles. 

Similarly, the new EU Directive on copyright in the information society, which aims to transpose the WIPO Copyright Treaty (WCT) into European law, explicitly does not seek to alter or modify the balance provided by the Software Directive. Although the new Directive generates obligations, in line with the WCT, to prohibit the circumvention of "technological measures", these obligations do not modify Art 6 of the Software Directive, i.e. the permitted instances of reverse engineering.

Conclusion

This was an interesting and informative event and highlighted the fact that reverse engineering is a vital and challenging topic to debate. To this end there are many issues in this area across the world which deserve further and continuing attention. 



Automated Transformation of Legacy Systems

By: Philip Newcomb, The Software Revolution, Inc. President, Chairman Board of Directors, and Vice President of Research & Development

Introduction 

Over the last 50 years information processing systems have become the intellectual repositories for most business and government organizations. Today these organizations face the complex and costly problem of how best to restructure the installed base of outdated information processing resources, while maintaining their legacy intellectual property. This legacy intellectual property continues to provide value as the organizations continue to survive in the fast-paced age of e-business, e-communication, e-organizations, and, in the case of the military, e-warfare. 

The need to modernize is primarily driven by three factors: (1) expansion of the information system's functionality; (2) improved maintainability of the information system using modern tools and techniques; and (3) reduction of operational costs and improved reliability by replacing obsolete hardware suites with high-speed, open-architecture systems. 

Businesses are forced to innovate continuously to keep pace with intense levels of competition. The business-related e-commerce revolution will not occur without addressing the legacy intellectual property maintained by the old computing infrastructure. The same is true for the e-warfare revolution being pursued by the US military. Legacy applications and databases are extremely valuable to military business systems as well as to mission-critical systems. 

Alternative solutions for modernization of a legacy information system, in part or in its entirety, include: (1) developing a new system; (2) system replacement with a Commercial-Off-The-Shelf (COTS) solution; or (3) transformation of the legacy applications and databases to operate within a modern computing environment. In the case of the military, the available solution set is even more restrictive because COTS solutions for mission-critical systems don't exist. 

Replacing legacy systems by manually rewriting the system's software, even with the support of semi-automated translation tools, is extremely costly and time consuming. Replacement using COTS technologies, while less costly and timelier, usually requires extensive and expensive customization to provide functionality not provided by the COTS product. A new cost-effective, low risk solution is available that leverages the legacy system's intellectual property to create the new system in a modern environment. 

Through the application of a suite of artificial intelligence (AI) technology tools, it is now possible to automatically assess, transform, re-factor or re-engineer, and if desired, web-enable a wide variety of legacy computer programming languages, along with system databases, into modern, platform-independent object-oriented C++, Java, and XML with CORBA compatibility (Figure 1). Much of this work originated from Air Force-funded program transformation research in the late 1980s and early 1990s[1].




Figure 1: Legacy System Transformation

By employing a highly automated approach, legacy software and databases can be modernized in a fraction of the time and cost needed by the two competing alternatives mentioned above. This dramatically reduces the time for return on investment to less than one year in many cases. The application of the artificial intelligence technology reduces the technical and schedule risks associated with the modernization process, and reduces the flow time of the project. The process provides fully documented code that compiles, links and loads. The modernized application is then in a state where it can be maintained and upgraded using modern tools and software workbenches.

The newly generated software also has the benefit of consistent quality and uniformity because an automated tool created it. Systems comprising large quantities of code, if addressed manually, will require many programmers. Programmers, even though they are writing code in the same language, have different styles. Those stylistic differences can create major difficulties during the system integration and testing phase of a transformation project. A highly automated approach requires negligible manual intervention and produces uniform code, which in turn compresses the integration and testing schedule for the project. 

Technology Description 

By employing AI-based automated program transformation technologies, the legacy application and database modernization process can be addressed in the following tightly disciplined multi-step process. 

* Assessment: captures the legacy system's "As Is" state by extracting properties of the existing system's design, and simultaneously generating detailed documentation of the system.

* Transformation: provides transformed software that is compiler-ready and testable at the unit level, and fully documented.

* Re-factoring: re-engineers the new system to improve system architecture and performance. The re-factoring process provides a disciplined approach to design improvement that minimizes the chances of introducing new flaws.

* Web-enablement: facilitates migration of the new system to the web environment by transforming the legacy application to Java that runs on a Java Virtual Machine.

Process Description 

Assessment: For each system to be assessed the AI-based technology must be modified to allow ingestion and parsing of the dialect and any specialized sub-languages (embedded SQL or DB DML statements) of the legacy language. The code is then parsed to build an in-memory Knowledge-based Abstract Syntax Tree (KBAST) model of the entire system. The KBAST is the starting point for all of the detailed analyses that follow. An inventory is developed against the KBAST model to: (1) determine if any components of the application system are missing; (2) detect multiple versions of code; and (3) identify linkage problems. This process is iterative because deviations of the dialects from the standard are typically not well documented, making it necessary to develop a series of modifications to the parser before the technology addresses all of the application's code.
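
The inventory step can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the unit names, the tuple layout and the inventory function are hypothetical, and the real inventory works against the KBAST rather than against raw source text. The sketch covers the first two checks (missing components and multiple versions); an unresolved call of this kind is also one simple form of linkage problem.

```python
from collections import defaultdict

# Hypothetical inventory input: (unit name, source text, names of units it calls).
UNITS = [
    ("PAYROLL", "CALL TAXCALC. CALL PRINTCHK.", ["TAXCALC", "PRINTCHK"]),
    ("TAXCALC", "COMPUTE TAX = GROSS * 0.2.", []),
    ("TAXCALC", "COMPUTE TAX = GROSS * 0.25.", []),
]

def inventory(units):
    """Report called-but-absent units and units present in several versions."""
    versions = defaultdict(set)
    for name, source, _calls in units:
        versions[name].add(source)          # distinct source texts per unit name
    called = {callee for _name, _src, calls in units for callee in calls}
    missing = sorted(called - set(versions))
    multiple = sorted(name for name, texts in versions.items() if len(texts) > 1)
    return {"missing": missing, "multiple_versions": multiple}

report = inventory(UNITS)
```

Here PRINTCHK is called but never supplied, and TAXCALC appears with two different bodies, so both would be flagged for follow-up before any transformation is attempted.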

After development of the KBAST model, a preliminary transformation of the source code into the target "To Be" model is performed. The purpose of this effort is to assess and compare the "As Is" and "To Be" system models to determine what modifications are to be made to the transformation process to achieve a highly automated transformation into the target language. 

A "dry run" of the transformation process is performed by creating an Intermediate Object-Oriented (IOO) model to develop the transformation metrics, including: (1) identification of the percentage of redundant and re-usable code; (2) current and predicted code properties; and (3) potential code and data size reductions possible in the re-factoring process. The code is then transformed into the IOO formalism, which allows for detailed identification and assessment of the properties of the target system. Of key significance are: (1) extraction, parameterization, and merging of derived methods associated with derived classes; and (2) measurement of the amount of decoupling, and degree of cohesion and coherence of the resultant system.

The final step of the assessment process involves Domain Analysis of a system. Domain Analysis is best understood as a process that systematically creates a common framework for describing program elements and situations within the code. This descriptive framework facilitates recognition of unique and common roles and relationships among one or more systems. The framework has two tightly related dimensions of analysis that address both the classification of identifiers (taxonomy) as well as the situations involving their usages (semantics). 

The construction of the first dimension of analysis entails describing the individual names that occur within a system as elements of common terms within a Domain Dictionary. The second dimension of analysis describes the more complex relationships among elements in the form of interpretations. The interpretations are denotations for complex situations in the code. The AI-based technology automatically constructs interpretations by rewriting code directly from the structures represented by the code's abstract syntax in the KBAST. Interpretations resemble the language semantics rules form of the code except that Domain Dictionary terms have been substituted for the identifier names in the program code. A single interpretation will therefore match many syntactically similar, but terminologically different specific situations. These situations may occur in the code and serve to identify commonality among complex relationships that occur within a single system or span multiple systems. 
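
To make the idea of an interpretation more concrete, here is a minimal Python sketch. It is an assumption-laden simplification: the dictionary entries, the identifier names and the interpret function are invented for illustration, and the sketch rewrites statement text with a regular expression, whereas the technology described above works from the abstract syntax held in the KBAST.

```python
import re

# Hypothetical Domain Dictionary: program identifier -> common domain term.
DOMAIN_DICTIONARY = {
    "CUST-BAL": "ACCOUNT-BALANCE",
    "ACCT-BALANCE": "ACCOUNT-BALANCE",
    "PMT-AMT": "PAYMENT-AMOUNT",
    "PAYMENT": "PAYMENT-AMOUNT",
}

# COBOL-style words: a letter followed by letters, digits or hyphens.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

def interpret(statement, dictionary=DOMAIN_DICTIONARY):
    """Replace each known identifier with its Domain Dictionary term."""
    return IDENTIFIER.sub(
        lambda match: dictionary.get(match.group(0), match.group(0)),
        statement,
    )

first = interpret("SUBTRACT PMT-AMT FROM CUST-BAL")
second = interpret("SUBTRACT PAYMENT FROM ACCT-BALANCE")
same_situation = first == second
```

The two statements use different identifiers, yet both reduce to the same interpretation, which is how a single interpretation can match many terminologically different situations.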

Interpretations are stored within Annotation Libraries and used within the technology for documenting decisions about the situations in the code. Interpretations are automatically generated for individual program statements, data structure definitions, basic code blocks, functions, or entire programs. 

Transformation: The modernization process begins with the automatic identification of candidate classes and objects for output into the IOO model created during the assessment. The IOO model is a relatively complete transformation of the input source code consistent with the structure of object-oriented C++. This transformation into the IOO form locates redundant, duplicate and similar data and processes, and abstracts those detected items into classes. The classes, relationships, attributes and operations of the derived IOO model conform to Unified Modeling Language (UML) standards. 

The design documentation extracted from the IOO model is a hybrid between conventional OO modeling languages and event-driven programming models. The hybrid is more precise and detailed than UML documentation that is manually generated. The mapping from procedural code into object-oriented code creates an intermediate abstract IOO model of the software system that is functionally faithful to the original procedural system. However, this IOO model follows the semantic and syntactic rules of the object-oriented languages C++ and Java. The IOO formalism, as implemented, bridges the conceptual gap between the original legacy system and the target object-oriented system, explicitly exhibiting the relationship between the source and the target code, as well as the source and target design.

Figure 2 illustrates the process for automated legacy system modernization through the transformation process.



Figure 2: Automated Modernization of a Source into C++ 

The overall process for transforming from a procedural to an object-oriented application starts with input of application programs, and produces as output a completely integrated system. The output system consists of object classes and their instances. These object class instances are complete with regard to data typing, methods, and IOO processes (executable mission-oriented C++ functions, which refer to Class member element and member functions or methods). These constructs are implemented in terms of a derived control structure based on the IOO model. The IOO application contains calls to derived methods associated with desired classes. Figure 3 illustrates this process. 


Figure 3: Procedural to Object-Oriented Transformation

Table 1 summarizes the principal processes of scope analysis, set-use analysis, program unit analysis, alias analysis and physical data modeling. The evaluation of the structure and relationships between the data and processes in the input system provides the starting point for the transformation from the source language into the IOO model. The generation of the IOO model is preparatory to transformation into C++. The following analyses are performed automatically and captured within the KBAST as auxiliary knowledge-based structures to support analysis and transformation operations.

Scope Analysis

Scope analysis relates the occurrence of a variable to its declaration.

Set-Use Analysis

Set-Use analysis identifies the definitions and usages of variables in a program with respect to usage semantics for each class of constructs occurring in the programming language.

Program Unit Analysis

Program unit analysis describes the local properties of each designated program unit in the programming language. It describes a data unit's type, and location of its declarations, definitions and references. Program unit analysis describes the signatures of functions: the number, order, type of all procedural objects, declarations, and occurrences.

Physical Data Modeling

Physical data modeling consists of analyzing each data item and data structure for its size, the byte offsets of every field, field length, and the types of each field. The physical data model is used in alias analysis to compare the properties of data structures.

Alias Analysis

Alias analysis analyzes every record and field for its occurrence in other programs and the identification of the aliases for data structures that may occur in a disguised or modified form in different programs.

Table 1: Data and Unit Analysis
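
As an illustration of the last two entries in Table 1, the following Python sketch derives a physical data model (byte offset, length and type of each field) and uses it to spot records that are aliases of one another. The record definitions, field names and type names are hypothetical, fields are assumed to be packed with no padding, and "alias" is reduced here to "identical physical layout", which is only the simplest case of what the article describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    type: str     # illustrative type name, e.g. "int" or "char"
    length: int   # field length in bytes

def physical_model(fields):
    """Return (byte offset, length, type) for each field, assuming packed fields."""
    layout, offset = [], 0
    for field in fields:
        layout.append((offset, field.length, field.type))
        offset += field.length
    return layout

def are_aliases(record_a, record_b):
    """Treat two records as aliases when their physical layouts are identical."""
    return physical_model(record_a) == physical_model(record_b)

customer_rec = [Field("CUST-ID", "int", 4), Field("CUST-NAME", "char", 20)]
client_rec = [Field("CL-NUM", "int", 4), Field("CL-NM", "char", 20)]
order_rec = [Field("ORD-ID", "int", 4), Field("ORD-QTY", "int", 2)]
```

In this sketch customer_rec and client_rec have different field names but the same layout, so they would be reported as candidate aliases, while order_rec would not.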

Table 2 summarizes the principal control, data flow and program slice analyses that are used to guide the transformation from the source language into C++. The following analyses are performed automatically and made available within the KBAST as auxiliary knowledge-based structures to support analysis and transformation operations.

Control Flow Graph

The Control Flow Graph of a procedure describes the points of entry and exit of the procedure and its intermediate control conditions.

Data Flow Graph

The Data Flow Graph of a procedure describes the definition of each data node and its usages. A Usage – Definition (UD) link connects a usage of a variable to a definition (assignment) of a variable.

Program Slice

A program slice is the transitive closure of one or more UD arcs (usage to definition) with respect to a variable for a scope determined by the slicing criteria.

Table 2: Control Flow, Data Flow, and Slice Analysis
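
The definition of a program slice as a transitive closure over UD links can be sketched in Python as follows. This is only an illustration under stated assumptions: the statement numbering, the `ud_links` mapping, and the function name `backward_slice` are hypothetical, and the sketch follows data dependences only, ignoring control dependences.

```python
def backward_slice(ud_links, criterion):
    """Return the statements reachable from `criterion` by following
    usage-to-definition (UD) links transitively.

    ud_links maps a statement to the set of statements that define
    the variables it uses.
    """
    seen = set()
    worklist = [criterion]
    while worklist:
        stmt = worklist.pop()
        if stmt in seen:
            continue
        seen.add(stmt)
        worklist.extend(ud_links.get(stmt, ()))
    return seen


# Hypothetical five-statement program:
#   1: a = 1
#   2: b = 2
#   3: c = a + 1
#   4: d = b * 2
#   5: print(c)
ud_links = {3: {1}, 4: {2}, 5: {3}}
slice_for_c = backward_slice(ud_links, 5)
```

Slicing on the use of `c` at statement 5 selects statements 1, 3, and 5; statements 2 and 4 do not contribute to `c` and fall outside the slice.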

Table 3 summarizes the principal "As Is" and "To Be" design artifacts that are the by-products of the transformation from the source language into C++. These design artifacts are hyper-linked to parallel mouse sensitive window views of the source language and target C++. The following documentation is generated automatically and made available within the KBAST to support analysis, documentation, and transformation operations.

Object Model Outline Browser

A structured outline browser for navigating derived objects, classes, methods, and procedures. Pop-up and pull-down menus provide commands for displaying integrated graphical and textual views of the source code, target code, and derived design, and allow operators to perform operations on the evolving software configuration.

Process Data Flow

The data flow behavior of a procedure is depicted by process data flow graphs, whose begin and end points are data sources and sinks (i.e., parameters and designated classes and their fields) and whose processes are methods.

State Transition Table

A state-transition table depicts one or more states, together with the conditions and actions involved in transitions between those states, as a limited-entry decision table.

State Transition Graph

The State Transition Graph of a procedure consists of a start state, an end state and a set of intermediate states joined by state transitions.

Table 3: Derived Design Artifacts
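
A state-transition table of the kind listed in Table 3 can be sketched in Python as a list of rows, each holding a state, a condition, an action, and a next state. The states, conditions, and actions below are hypothetical examples rather than output of the toolset, and the interpreter `run` is only one simple way to execute such a table.

```python
def open_file(ctx):
    ctx["log"].append("open")


def process_record(ctx):
    ctx["records"] -= 1
    ctx["log"].append("process")


def close_file(ctx):
    ctx["log"].append("close")


# Each row: (current state, condition, action, next state).
TRANSITIONS = [
    ("START", lambda ctx: True, open_file, "READ"),
    ("READ", lambda ctx: ctx["records"] > 0, process_record, "READ"),
    ("READ", lambda ctx: ctx["records"] == 0, close_file, "END"),
]


def run(table, ctx, state="START", end="END"):
    """Fire the first enabled transition for the current state until `end`."""
    while state != end:
        for source, condition, action, target in table:
            if source == state and condition(ctx):
                action(ctx)
                state = target
                break
        else:
            raise ValueError(f"no transition enabled in state {state}")
    return ctx["log"]
```

Running the table with two pending records performs the actions open, process, process, close, which corresponds to one path through the equivalent state transition graph.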

Table 4 summarizes the principal object oriented model components that are partial products of the transformation from a source language into intermediate C++. The following object model components are generated automatically and made available within the KBAST to support analysis, documentation, and transformation operations.

OO Data Class

Top-level OO data classes whose member elements are formed from large-grained or top-level data structures from the legacy application, and whose methods are derived from functionally cohesive slices of code associated with the class’s member elements.

OO Process Class

OO Process Classes whose instances are transformations of procedures into OO processes, which refer to the elements of classes. Methods formed from statements can be invoked from OO processes.

OO Methods

There are several classes of Methods. Data transforming methods are formed from block level slices between designated classes. Side effecting methods are formed from sequences of statements that invoke procedures or use data, but do not directly transform data.

Table 4: OO Model Entities

Re-factoring: Re-factoring is the process of changing a software system in such a way as to improve the software's structure and performance without altering the functional behavior of the code. Re-factoring operations are primitive transformations of the code that accomplish limited design restructuring to improve the code prior to completing transformation. Larger-grained transformations are accomplished by applying multiple sequential re-factoring operations to accomplish the overall larger restructuring objective.

Re-factoring is used to eliminate, replace, or rewrite code to improve its efficiency and understandability, or to transform applications to use a suitable set of modern infrastructure support functions. Re-factoring operations, especially fine-grained operations supported by the technology but introduced by humans, can potentially change the system behavior, either in ways that are intended or in ways that are unexpected. Large-grained re-factoring operations can be broad in scope and still preserve functionality while improving code quality.

Re-factoring encapsulates, abstracts, and/or reorganizes data structures or functionality. Re-factoring introduces modes to decouple a slice of code from its original context and makes it a re-usable component. Introducing parameters for the dissimilar parts of syntactically similar functions or methods makes these components more re-usable. Re-factoring extracts and parameterizes methods; merges and consolidates similar methods; reduces the set of methods associated with a class to the minimal set of well-understood operations; improves the coupling, cohesion, and comprehensibility of the overall application; and reduces overall code duplication and code redundancy in the application. Re-factoring to combine classes, or to extract common data elements of several classes into a common parent class, improves the structure of the application data model by reducing data duplication.
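
Merging syntactically similar methods by introducing a parameter for the part that differs can be shown with a small Python sketch. The functions and the percentage values are hypothetical and chosen only for illustration; amounts are integers (cents) so that the before and after versions can be compared exactly.

```python
# Before re-factoring: two functions that differ only in one constant.
def total_with_sales_tax(prices_cents):
    total = sum(prices_cents)
    return total + total * 8 // 100


def total_with_import_duty(prices_cents):
    total = sum(prices_cents)
    return total + total * 15 // 100


# After re-factoring: the dissimilar part becomes a parameter.
def total_with_surcharge(prices_cents, percent):
    total = sum(prices_cents)
    return total + total * percent // 100


# Functional behavior is preserved for the original call sites.
sample = [1000, 2000]
assert total_with_surcharge(sample, 8) == total_with_sales_tax(sample)
assert total_with_surcharge(sample, 15) == total_with_import_duty(sample)
```

The two original functions can then be removed, or kept as thin wrappers, which reduces the number of methods without changing what callers observe.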

Legacy applications often have many dependencies on the legacy software infrastructure. Consequently, they are often not directly portable to another software environment. Modernization of applications may require isolating the mission application code from the code associated with various environment-specific utility/support functions, known as Infrastructure Code.

After isolating the infrastructure from the mission software, the mission-critical software shall be analyzed and re-factored without risk that comprehension of the functional essence of the mission-critical software application is obfuscated by legacy infrastructure code. One key objective for re-factoring the mission-critical software application will be to minimize and isolate the target system's unique features by partitioning the software application into functional-area generic and mission-specific classes. The domain model constructed during the assessment will guide this partitioning.

Infrastructure code of legacy software is sometimes sufficiently similar to the desired modern equivalent that it is practical to achieve operational code directly from the transformation process without any re-factoring. In such cases there is often no need to re-factor infrastructure code as a separate step in the transformation process. In other cases, infrastructure code is tightly coupled to, or embedded in, the mission code in a way that makes the code difficult to re-engineer or maintain. In such cases re-factoring of the application to isolate the infrastructure code may be necessary.


The development of infrastructure support software, its unit testing, and integrated functional testing, is typically the most time consuming phase of the overall transformation process. Infrastructure support software, often called middleware, provides a layer of functionality in the new target environment that replaces functionality that previously existed in the application legacy environment. 

Often the legacy functionality does not exist in the new environment, or only exists in a different form. Thus, this layer of software services must be discontinued, redeveloped, or suitably replaced. The definition or introduction of an appropriate interface to the facilities/services layer into the newly derived application, and the testing of both the interface and the services layers accessed through the facilities/services layer interface, is an iterative process. Subtle changes in the way the application interfaces with its data bases or environment are a common source of errors in a transformation project. Separation of the infrastructure layer from the application layer expedites resolution of errors that are caused by changes in the operational environment. Domain model annotation rules are used to formally document the disposition of the infrastructure support software.

This re-factoring effort entails decoupling the infrastructure code from the application code by encapsulating it into a separate layer of code that is accessible through an API (Application Program Interface). The API hides the complexities of the infrastructure code from the application logic and allows the infrastructure layer code to be maintained without modification of the application code layer. This approach has the twin benefits of: (1) separating the intricacies of the infrastructure code layer from the application logic layer so that the infrastructure code can be easily developed or adapted to different needs without necessitating changes to the application logic; and (2) making the application logic easier to maintain by separating it from the intricacies of the infrastructure layer implementation.
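
The separation of application logic from infrastructure code behind an API can be sketched in Python as follows. The interface `RecordStore`, the two implementations, and the function `credit_account` are hypothetical names introduced for illustration; they do not describe the API produced by the toolset.

```python
import json
from abc import ABC, abstractmethod


class RecordStore(ABC):
    """Hypothetical API that the application layer programs against."""

    @abstractmethod
    def read(self, key): ...

    @abstractmethod
    def write(self, key, value): ...


class InMemoryStore(RecordStore):
    """One infrastructure implementation: plain in-memory values."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        return self._rows.get(key)

    def write(self, key, value):
        self._rows[key] = value


class JsonTextStore(RecordStore):
    """A second implementation that keeps each record as JSON text,
    standing in for a different target environment."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        text = self._rows.get(key)
        return None if text is None else json.loads(text)

    def write(self, key, value):
        self._rows[key] = json.dumps(value)


def credit_account(store, account, amount):
    """Application logic: depends only on the RecordStore API."""
    balance = store.read(account) or 0
    store.write(account, balance + amount)
    return store.read(account)
```

Because `credit_account` touches only the API, either store can be substituted without editing the application logic, which is the property the two benefits above rely on.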

After the legacy infrastructure software is re-factored or transformed to operate within the target environment, it is practical to complete the translation of the OO model into code, and commence testing. Testing of the infrastructure code is completed before commencing re-factoring of the mission-critical software application. 

The degree of reduction and consolidation that the re-factoring transformation brings about is an important property of the transformation process that is indicative of the benefits that conversion to an object-oriented form can bring about. Table 5 summarizes the model metrics that shall be available and used to measure the benefits of re-factoring and other forms of code optimization.

Method Instance Table

The Method-Instance Table (MIT) describes the number of instances created per class. This provides an indication of the amount of code reduction associated with the description of data structures for the target language. It is a partial measure of data redundancy eliminated through object class and instance formation. The C++ application may dynamically create the data structures rather than statically pre-allocate the data. This dynamic allocation is taken into account.

The MIT also indicates the number of methods eliminated through method merging. This measure, combined with the amount of code per method, provides a reasonably accurate measure of the amount of code reduction achieved by the transformation and re-factoring processes.

Model Statistics Report

Model Statistics metrics provide a measure of the amount of functional and structural compression achieved through method merging reductions, and object-instance abstraction.

State Transition Table Parallelism

The potential amount of block-level parallelism is indicated by the sequence numbers associated with the method invocations in a state transition table. (Methods are independent if they are not dependent upon one another by a Set/Use-chain relationship and hence could be invoked simultaneously rather than sequentially.) In the paragraph state-transition table, invocations of a transition’s action are independent of one another if their invocation sequence numbers are identical.

Table 5: Transformation Metrics
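
One way sequence numbers of this kind could be assigned is sketched below in Python. This is an assumption about how such numbers might be derived, not a description of the toolset: the function `sequence_numbers` and the example invocations are hypothetical, and only Set/Use-chain dependences are considered, as in the definition of independence given in Table 5.

```python
def sequence_numbers(invocations):
    """Assign a sequence number to each method invocation.

    invocations is an ordered list of (name, sets, uses), where `sets`
    and `uses` are the sets of variables the method defines and reads.
    An invocation is numbered one higher than the highest-numbered
    earlier invocation that sets a variable it uses.
    """
    numbers = {}
    for i, (name, _sets, uses) in enumerate(invocations):
        level = 1
        for prev_name, prev_sets, _prev_uses in invocations[:i]:
            if prev_sets & uses:  # Set/Use-chain dependence
                level = max(level, numbers[prev_name] + 1)
        numbers[name] = level
    return numbers


# Hypothetical action of one transition.
invocations = [
    ("read_rate", {"rate"}, set()),
    ("read_hours", {"hours"}, set()),
    ("compute_pay", {"pay"}, {"rate", "hours"}),
    ("print_pay", set(), {"pay"}),
]
numbers = sequence_numbers(invocations)
```

In this example `read_rate` and `read_hours` receive the same sequence number and could be invoked simultaneously, while `compute_pay` and `print_pay` must follow in order.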

Web-enablement: Web-enablement entails the transformation of an application into a networked distributed application that makes use of browser user interfaces (BUI), web-based languages, run-time environments such as Java and the Java Virtual Machine (JVM), and web-based data transmission and manipulation protocols, such as the Extensible Markup Language (XML) for the interchange of data.

Hybrid web applications are web applications that exhibit some, but not necessarily all, of these features. A web application may be written in C++ and have a BUI front-end that supports interaction with users, or a back-end database with connectivity via Microsoft's Open Database Connectivity (ODBC). Java applications typically employ JDBC (Java Database Connectivity), which offers functionality similar to ODBC, to connect to various vendor data base products.

The Common Object Request Broker Architecture (CORBA) has a common Interface Definition Language (IDL) that supports both Java and C++. CORBA is commonly used to provide distributed component services for a smaller number of users with high-performance requirements. CORBA's IDL facilitates the creation of networked distributed applications by simplifying the definition of the interfaces through which components call one another. These components may reside anywhere in the network and may be implemented in any language that supports IDL. Enterprise Java Beans are frequently used for lightweight Java applets with many users supported by a distributed server. Java applets more frequently use Remote Method Invocation (RMI) rather than IDL as the principal method for transferring data between an application and a remote host.


CORBA and Enterprise Java Beans are both prevalent as distributed component management architectures, and they often coexist in large applications. For instance, CORBA with C++ is often used for highly optimized transaction-oriented data base applications, while Enterprise Java Beans and Java applets are often used for lightweight interactive end-user applications.


Given the multiple business processes performed by the applications within large confederated legacy systems and the tradeoffs between several possible alternative distributed component web-based architectures, the definition of the most appropriate transformation approach is a complex decision to be evaluated. It requires an in-depth analysis of the customer's applications, analysis of the implications of alternative solutions, and possibly some amount of iteration to define an appropriate transformation pathway. This decision process is driven by the current system architecture, the target architecture objectives, the technical infrastructure, including overall toolset capabilities, and personnel resources available to support this transformation process. 

Several approaches to web-enablement are achievable with a relatively high level of automation.


Transform legacy applications into C++, but componentize and containerize the applications as a CORBA domain within the CORBA framework. Transform the current hierarchical networked databases into a relational or object-oriented database using ODBC for data base access. Transform the user interfaces of the applications to use a Browser User Interface (BUI) by inserting an explicit API layer written in C++ to drive BUI level interactions. 

Transform legacy applications into Java rather than into C++. Componentize and containerize the resulting Java Applets as Enterprise Java Beans (EJBs) in an Enterprise Java Architecture. Transform the current hierarchical networked data bases into relational or object-oriented data bases using JDBC for data base access. Transform the user interfaces of the applications to use a browser user interface (BUI) by inserting an explicit API layer written in Java to drive BUI level interactions.

Create a hybrid architecture consisting of both C++ and Java for some legacy systems (or parts of systems) using CORBA IDL or XML-RPC as a connectivity layer between components. Use XMI (XML Metadata Interchange) for modeling the distributed architecture. Transform the current hierarchical networked databases into relational or object-oriented databases allowing data access by either JDBC or ODBC. Allow for definition and interchange of data via XML using XSDL (XML Schema Definition Language) and XQL (XML Query Language). Transform application user interfaces to use BUIs by inserting an API supporting XHTML and Java or C++ to drive BUI-level interactions.
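
The interchange of data via XML that these approaches rely on can be illustrated with a short Python sketch using the standard `xml.etree.ElementTree` module. The element names, the sample record, and the helpers `record_to_xml` and `xml_to_record` are hypothetical, and the sketch handles only flat records whose field names are valid XML element names.

```python
import xml.etree.ElementTree as ET


def record_to_xml(tag, record):
    """Serialize a flat record (dict) as an XML element with one child per field."""
    root = ET.Element(tag)
    for name, value in record.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text):
    """Parse the XML produced above back into a dict of strings."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


message = record_to_xml("customer", {"id": 42, "name": "Acme"})
received = xml_to_record(message)
```

A component written in one language can emit such a message and a component written in another can parse it, provided both agree on the element names; note that values arrive as text, so the receiver must restore any numeric types itself.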

Summary

As today's organizations address the critical structural, cultural, and financial issues surrounding the migration of their often irreplaceable legacy software applications and databases to modern platform-independent computing environments, it is essential that they understand that a new automated low-risk approach is available. Advanced toolsets and processes for rapidly re-engineering legacy system software into modern computing environments provide organizations with a valuable new alternative that is faster, lower cost, lower risk, and higher quality than other methods currently available.

Author Biography



Philip Newcomb, The Software Revolution, Inc. President, Chairman of the Board of Directors, and Vice President of Research & Development: An internationally recognized expert in the application of artificial intelligence and formal methods of software engineering, he has published numerous papers in his field. He has completed graduate work and holds degrees from Carnegie Mellon University, the University of Washington, Ball State University and Indiana University. He has done groundbreaking research in the applications of artificial intelligence, software engineering, and automatic programming. He formulated the conceptual product framework and developed the software transformation technology, which now resides in the eVolution 2000™ toolset described.

1 Rome Laboratory's Knowledge Based Software Assistant Program.
The Software Revolution, Inc. 08/29/01

 
