
2016 5th IIAI International Congress on Advanced Applied Informatics

Reverse Engineering from Mainframe Assembly to C Codes in Legacy Migration

Daisuke Fujiwara and Nagisa Ishiura
School of Science and Technology, Kwansei Gakuin University
2–1 Gakuen, Sanda, Hyogo, 669–1337, Japan

Ryo Sakai, Ryo Aoki, and Takashi Ogawara
SYSTEM'S Co., Ltd.
7–24–5 Nishigotanda, Shinagawa-ku, Tokyo, 141–0031 Japan

978-1-4673-8985-3/16 $31.00 © 2016 IEEE. DOI 10.1109/IIAI-AAI.2016.37

Abstract: This paper presents a method of constructing C programs from mainframe assembly programs. IBM mainframe assembly programs, which are called as subroutines from programs written in high-level languages such as COBOL, are automatically translated into equivalent C programs. The assembly programs are converted into an intermediate representation (IR) in SSA form, on which dataflow analysis, recognition of control structures, and pattern-match-based transformation are applied to produce readable code. Our method features documentation of the translation process. Along with the translation, the correspondence between the source assembly code and the resulting C code is generated as a document, which plays a very important role in manually correcting incomplete C code resulting from architecture-dependent code or self-morphing code. Furthermore, comments in the assembly programs are embedded into appropriate positions in the resulting C programs. A prototype system based on our method successfully translated some assembly programs into C programs with function, if, and do-while structures.

I. INTRODUCTION

In many business and enterprise systems, mainframe computers have long been used as core computing systems for their high reliability and fault tolerance. In recent years, however, as lower-cost open systems such as Linux and Windows servers have gained higher performance, there has been motivation to move from legacy systems to open systems. Since legacy systems accumulate many years of business know-how, it is often more rational to port the existing systems to open systems than to redevelop equivalent systems. This kind of porting is called "legacy migration," for which there is a growing demand.

One of the core technologies in legacy migration is porting programs on the mainframe computers, written in COBOL, PL/1, assembly languages, etc., to run on modern systems. Programs written in high-level languages may work after re-compilation, or they can be auto-converted [1] or auto-corrected to run on the target systems. Since they are relatively easy to understand, manual modification of the resulting code is also easy. On the other hand, assembly programs need manual translation into some other language after their behavior has been understood. Although those assembly programs are usually small, a single legacy system may contain hundreds, or sometimes thousands, of assembly programs, which need enormous man-hours for migration.

To solve this problem, there have been several attempts at automated conversion of assembly code to high-level code. The literature [5], [6], [7] has presented methods of translating IBM mainframe assembly code into C programs. Assembly code is converted into an intermediate representation, to which reverse engineering such as reconstruction of control structures is applied to generate C code with reasonable readability. However, it should be noted that static translation does not always succeed; architecture-specific code and self-morphing code may not always be converted to correct code. In such cases, the resulting code must first be inspected to see what went wrong and then be modified manually. In this situation, the resulting C code alone is not enough; it is very important that the process of translation, or the correspondence between the original assembly code fragments and the resulting code sequences, be well documented.

This paper proposes a method of translating IBM assembly programs to equivalent C programs, with emphasis on generating auxiliary information for understanding the resulting C code and the conversion process. C code with high readability is generated via an intermediate representation extracted from the given assembly code, in a similar way to [5], [6], [7]. At the same time, a table showing the correspondence between the assembly instructions and the resulting C code fragments is generated. Furthermore, the comments in the assembly code, which hold information important for understanding the program, such as the authors of the code, how the code should be used, and the intent of each code fragment, are embedded into the proper positions of the resulting C programs.

The tool based on the proposed method is implemented in Perl 5 and has successfully converted some assembly programs of about 100 lines into working C programs.

II. MIGRATION OF ASSEMBLY PROGRAMS

A. Target of Conversion

In this paper, we deal with the problem of converting handwritten assembly programs of the IBM 370–390 mainframes into C programs, for there is a big demand for migration from this architecture. As shown in Fig. 1, we assume that the assembly programs are called as subroutines from other programs written in assembly, COBOL, etc. We also assume that C libraries equivalent to the library routines and macros called from the assembly programs are already prepared.

Fig. 1. Target of conversion.

The first priority in this kind of conversion is that the resulting C programs should work correctly. This needs the same technology as binary translation [2]. The second priority is readability of the reconstructed C programs, because the new programs must be maintained, and sometimes debugged, on the new platforms. This needs the technologies used in decompilation [3], [4].

There have been several attempts to convert mainframe assembly to higher-level programs. Feldman [5] translates IBM assembly to C programs via an intermediate representation named HLL. Ward [6], [7] makes use of a formal intermediate model, FermaT, to generate C programs with high readability from IBM assembly code.

However, automatic translation does not always succeed. Since the character codes used on mainframe computers and on open systems are different, routines that directly manipulate character bit patterns may not be converted to the intended programs. Differences in address spaces and addressing conventions, such as the use of the most significant bit on the IBM mainframes, cause the same problems. Many unexpected coding techniques are often used in handwritten assembly. Furthermore, it is impossible to convert self-morphing code completely by static translation.

Thus, from a practical point of view, the resulting code must be investigated to see whether the translation was correct or to see what went wrong. For this purpose, documentation of how the code is translated is very important. At the same time, comments in the original assembly code are very helpful, for they describe the intentions behind the code or explain complicated logic.

Fig. 2. IBM assembly program.

B. IBM Assembly

The IBM mainframes have a 32-bit architecture with 16 general purpose registers numbered 0 through 15. They deal with 32-bit, 16-bit, and 8-bit binaries as well as character strings and packed/zoned decimals. The instruction set consists of 631 instructions with 0, 1, 2, or 3 operands.

An example of an IBM assembly program is shown in Fig. 2.
This code defines a subroutine named EMONTH, which receives the address of a date in YYYYMMDD form as the first parameter and writes the last day of the month of that date, in the same form, at the address passed as the second parameter.

III. CONVERSION OF MAINFRAME ASSEMBLY TO C

A. Overview

The flow of our translation process is shown in Fig. 3. A given assembly program is compiled into an intermediate representation (IR) in SSA (static single assignment) form, where each instruction is decomposed into atomic operations. Dataflow analysis and various kinds of transformation on the IR are performed to generate the resulting C program.

Fig. 3. Flow of translation process.

B. Intermediate Representation (IR)

The structure of the IR in our method is shown in Fig. 4. The root node Assembly, representing the assembly program under conversion, consists of a list of Sections and a SymbolTable. The SymbolTable keeps track of all the variable names, function names, label names, etc. in the program. A Section represents a section in the program, that is, a group of instructions and data placed in contiguous storage locations, and has a list of Functions. A Function represents a subroutine and has a list of Ifs, Loops, and BasicBlocks. An If represents an if structure having then/else parts, and a Loop represents a do-while structure having a body part. A BasicBlock represents a basic block consisting of a list of Operations and links to its next basic blocks. An Operation represents an atomic operation, such as an arithmetic/logical, load/store, string, or decimal operation.

Fig. 4. Structure of IR.

Fig. 5 shows some examples of conversion from instructions to IR. A load instruction (L) is decomposed into a 32-bit addition (addu32) to calculate the address and a 4-byte memory access operation (load32). String and decimal data manipulation to implement instructions like AP (add packed decimal) and CLC (compare logical character) is dealt with as atomic operations.

Fig. 5. Conversion from instructions to IR.

    assembly               IR
    L   11,4(,13)          addr = addu32(r13, 4)
                           r11 = load32(addr)
    AP  DAT1,DAT2          cc = addpack(DAT1, 8, DAT2, 8)
    CLC 4(2,4),C'01'       addr = addu32(r4, 4)
                           cc = compstr(addr, C'01', 2)

The lists of operations translated from the instructions are converted to SSA form. This normalizes different assembly code sequences of the same meaning to the same IR. The SSA form is also useful in unnecessary code elimination (in III-D) and transformation by pattern matching (in III-F). After the SSA conversion, dataflow analysis is performed and the definition/reference relations are stored in the IR.

C. Recognition of Control Structures

1) Recognition of Functions: First, all the entry points in the IR are identified. The entry points are either 1) the basic blocks starting with nop operations converted from ENTRY instructions, or 2) the basic blocks which are the targets of jump operations converted from BAL, BALR, and BAS instructions. A function is extracted by enumerating all the basic blocks reachable from each entry point. In handwritten assembly code, there are cases where a basic block is reachable from multiple entry points. For simplicity, such a situation is averted by cloning the basic blocks.

2) Recognition of Loops: Do-while loops are recognized according to the following steps.

1) Enumerate all the loops by traversing basic blocks from the starting point of recognition (which is the entry point of the function in the first iteration).
2) Choose a loop whose bottom basic block b is the farthest from the starting point and whose top basic block t is the nearest to the starting point. This is to recognize the outermost loop first and to avoid jumps into the loop.
3) Determine the set B of basic blocks that form the body of the loop, namely those that are reachable from t and from which b is reachable (as shown in Fig. 6).
4) Extract the continuation condition of the loop from the branch condition of the bottom basic block b, and construct the data structure of the loop.
5) Apply this process to the other basic blocks in the function and to the body of the loop until no loop is detected.

Fig. 6. Recognition of loop body.

3) Recognition of Conditionals: If statements are recognized in the following steps.

1) Choose a pair of a branch point f and a join point j. By traversing basic blocks from the starting point of recognition (the entry point of the function in the first iteration), select the basic block f that has a branch and is nearest to the starting point. Let t and e be the next basic blocks of f. Then, identify the joining basic block j that is reachable from both t and e and yet nearest to f (see Fig. 7). This is to avoid jumps into the then and else parts of the if statement. Note that there are cases where there is no joining point j.
2) Identify the sets of basic blocks T and E that belong to the then part and the else part, respectively, of the if statement. Let T' be the set of basic blocks reachable from t. Let X be the set of basic blocks in T' that are directly reachable from basic blocks outside T', and X' be the set of basic blocks reachable from X. Then T is defined as T' \ X' (see Fig. 8). E is computed in the same way.
3) Extract the branch condition from the basic block f and construct the data structure of the if statement.
4) Apply this process to the other basic blocks in the function and to the then part and else part of the if statement, until no branch is detected.

Fig. 7. Choice of branch and join nodes.

D. Elimination of Register Save/Restore and Dead Codes

Based on the result of the dataflow analysis, the code for register save and restore in each function, which is no longer necessary in C programs, is deleted. A 32-bit store operation s is judged to be save code and is removed from the IR if 1) the address operand of s is obtained by adding an offset of 0, 4, ..., or 68 to register 13, and 2) no operation defines the data operand of s. Similarly, a 32-bit load operation l is regarded as restore code and is eliminated if 1) the address operand of l is obtained by adding an offset of 0, 4, ..., or 68 to register 13, and 2) no operation uses the result of l.
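As an illustration of the two criteria above, the following C sketch tests whether an IR operation is save or restore code. It is only a sketch under stated assumptions: the Op structure, its field names, and the function names are hypothetical and are not taken from the Perl implementation, and the dataflow facts (whether the stored value has a defining operation, whether the loaded value has a use) are assumed to be available as precomputed flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified IR operation; all names are illustrative only. */
typedef enum { OP_STORE32, OP_LOAD32, OP_OTHER } OpKind;

typedef struct {
    OpKind kind;
    int    base_reg;        /* register the address is computed from        */
    int    offset;          /* constant offset added to base_reg            */
    bool   data_has_def;    /* store: some operation defines the data       */
    bool   result_has_use;  /* load: some operation uses the loaded result  */
} Op;

/* Criterion 1: the address is register 13 plus an offset of 0, 4, ..., 68. */
static bool in_save_area(const Op *op)
{
    return op->base_reg == 13 &&
           op->offset >= 0 && op->offset <= 68 && op->offset % 4 == 0;
}

/* A 32-bit store is save code if no operation defines its data operand. */
bool is_save_code(const Op *op)
{
    return op->kind == OP_STORE32 && in_save_area(op) && !op->data_has_def;
}

/* A 32-bit load is restore code if no operation uses its result. */
bool is_restore_code(const Op *op)
{
    return op->kind == OP_LOAD32 && in_save_area(op) && !op->result_has_use;
}

/* Removes save/restore operations in place and returns the new length. */
size_t eliminate_save_restore(Op *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_save_code(&ops[i]) && !is_restore_code(&ops[i]))
            ops[kept++] = ops[i];
    }
    return kept;
}
```

In this sketch a store to 12(13) whose data has no defining operation is dropped, while a load from 72(13) is kept because the offset lies outside the 0 to 68 range.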

Fig. 8. Recognition of then and else parts.

Dead codes, which are often created over many years of maintenance, are eliminated during the function recognition process (in III-C1).

E. IR to C Conversion and Handling of Dummy Sections

The IR is converted into a C program almost straightforwardly. The registers and variables in the original assembly program are treated as global variables in the C program. Arithmetic/logic operations and load/store operations are converted into the corresponding operations in the C program. Operations on strings and decimals are translated into function calls to the support library. Fig. 9 shows examples of the conversion, where r10 and r7 are registers, cc is the condition code, and addpack and compstr are support library calls for addition on packed decimals and comparison on strings, respectively.

Fig. 9. IR to C conversion.

    IR                          C
    r10 = add32(r10,r7)         r10 = r10 + r7;
    cc = addpack(D1,8,D2,8)     cc = addpack(D1,8,D2,8);
    cc = compstr(s,C'01',2)     cc = compstr((char*)s, "01",2);

In order to enhance readability, arithmetic operations are collected into a single expression whenever possible. Fig. 10 shows examples. If the results of all the operations are referenced only once, they are grouped into a single expression.

Fig. 10. Generation of expression with multiple operations.

A dummy section of the IBM assembly, initiated by the DSECT instruction, is a section that results in no machine instructions or data areas but is used to specify the layout of aggregate data passed between subroutines. In our method, the memory layout described by DSECT is converted to the definition of a corresponding struct type, and the data are accessed using the register variable designated by the USING instruction as a base pointer. Fig. 11 shows an example. The DSECT instruction defines a data layout with two items A and B, and the MAIN routine accesses the data area pointed to by register 2 according to the layout. The dummy section is expressed by type DAT1_t and the data are accessed using r2 as a base pointer.

Fig. 11. DSECT conversion.

    assembly                     C
    DAT1  DSECT                  typedef struct {
    A     DS    F                    int A;
    B     DS    4C                   char B[4];
    ...                          } DAT1_t;
    MAIN  CSECT                  int main(void){
          USING DAT1,2               ((DAT1_t*) r2)->A = 123;
          MVI   A,123                strncpy( ((DAT1_t*) r2)->B, "abcd", 4);
          MVC   B,C'abcd'            ...
          ...

F. Readability Improvement by Pattern Matching

In our method, further readability improvement is attempted by pattern-matching-based transformation. This is realized by defining rewriting rules on tree structures consisting of IR operations. Fig. 12(a) defines a tree rewriting rule to make a conditional statement based on string comparison more readable. The IR in Fig. 12(b) is converted into IR' by the rule, resulting in program C', which should be better than C.

Fig. 12. Readability improvement by pattern matching. (a) Rewriting rule. (b) Application of the rule.

    ASM   CLC  C'MXD',0(R3)
          BE   MXD
    IR    cc = compstr(C'MXD', r3, 3)
          BB.next = {condv = cc, cond = E, then = MXD}
    IR'   tmp0 = strncmp(C'MXD', r3, 3)
          tmp1 = equ32(tmp0,0)
          BB.next = {condv = tmp1, cond = NZ, then = MXD}
    C     cc = compstr("MXD", (char*) r3, 3);
          if ( cc & cc_E )
              goto MXD;
    C'    if (strncmp("MXD", (char*) r3, 3) == 0)
              goto MXD;

IV. DOCUMENTATION

Due to limitations of static translation and to architectural issues, C programs generated from assembly programs may not always work. In such cases, the C programs must be inspected and modified manually. Even after the new C programs run successfully, they must be altered for maintenance. In such situations, understanding of the code is important. This paper proposes (1) to generate a table showing the correspondence between the original and the resulting code, and (2) to embed comments in the original assembly program into the proper positions of the translated C program.

A. Correspondence between Instructions and Statements

Along with a working C program for a given assembly program, a table showing the correspondence between the original instructions and the resulting C code fragments is generated as an HTML file. Fig. 13(a) shows an example. The STM instruction in line 113 has no corresponding C code because it is recognized as save code and deleted. The L instruction in line 114 has been translated into two C statements. The resulting C statements for the AH and SH instructions in lines 115–116 are grouped because their relation is many-to-many. Note that the rows of the table are based on the order of the statements in the C program, so the assembly instructions are reordered as 123, 124, 127, 125, 126, and 128 as the result of if statement recognition.

The table is generated by referencing the back pointers (depicted as dashed arrows in Fig. 13(b)) from the IR operations to the assembly instructions. If there is no IR operation for an assembly instruction (as in ①), a virtual IR operation NOP is generated. When one or more IR operations have links to a single assembly instruction (as in ① and ②), a table entry is created after all the IR operations for the assembly instruction are translated to C statements. If multiple IR operations are grouped to form a single C statement, all the instructions linked to the operations are put into one entry of the table (as in ③).

Fig. 13. Generation of correspondence table. (a) Correspondence table (HTML). (b) Establishing the correspondence.

B. Embedding Assembly Comments to C

A comment in IBM assembly is either 1) a string placed on the same line as an instruction, to the right of the operands of the instruction, or 2) a string on a line starting with the character '*'. In our method, assembly comments are linked to instructions and are embedded into the resulting C program as their corresponding C statements are generated.

Assembly comments are classified into the following 3 types and are linked to instructions. Two instructions, END (marking the end of the program) and EJECT (forcing a page break in the source code listing), play a different role in comment classification from the other instructions, so in this paper they are referred to as delimiter instructions and the other instructions as normal instructions.

1) Upper comments: Comments between normal instruction i and the previous instruction of i are defined as the upper comments of i. In Fig. 14, for example, lines 1–2 are the upper comments of the SH instruction in line 3.
2) Right comments: A comment placed to the right of normal instruction i is defined as the right comment of i. In Fig. 14, the string SUB in line 3 is the right comment of the SH instruction.
3) Lower comments: The lower comments of normal instruction i exist only when i's next instruction is a delimiter instruction. Let k be the next normal instruction of i, and j be the previous (delimiter) instruction of k. Then the comments between i and j are defined as the lower comments of i. In Fig. 14, line 5 is the lower comment of the AH instruction in line 4.

Fig. 14. Classification of comments.

    0:        STM  14,12,12(13)
    1: *calculate
    2: *(modified 2005.10.3)
    3:        SH   0,0(,2)        SUB
    4:        AH   0,0(,3)        ADD
    5: *end calculate
    6:        EJECT
    7:        L    15,CALLADR     LOAD
    8: *call BDISPLAY
    9:        BALR 14,15          CALL.

Based on the above classification, the positions of the comments in the C program are determined as follows.

1) The upper comments of instruction i are placed above the first C statement generated from i. In Fig. 15, for example, since the SH instruction in line 3 is expanded to the statements in lines 2, 3, and 6, the upper comments of SH (assembly lines 1–2) are embedded into lines 0–1 of the C program.
2) The right comment of instruction i is placed to the right of the first C statement generated from i, as // SUB in line 2 of the C program.
3) The lower comments of instruction i are placed below the last C statement generated from i. The assembly comment in line 5 goes to line 7 of the C program.

Fig. 15. Comment embedding.

    0: //*calculate
    1: //*(modified 2005.10.3)
    2: uint32_t addr0 = r2;                          // SUB
    3: int16_t tmpvar0 = *(int16_t*)addr0;
    4: uint32_t addr1 = r3;                          // ADD
    5: int16_t tmpvar1 = *(int16_t*)addr1;
    6: r0 = r0-(int32_t)tmpvar0+(int32_t)tmpvar1;
    7: //*end calculate
    8: //*call BDISPLAY
    9: BDISPLAY( (void**) r1 );                      // LOAD
    10:                                              // CALL.

Comment generation according to this policy is implemented by making use of the back pointers from the IR operations to the assembly instructions.
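The classification of upper and lower comments can be sketched in C as follows. This is a minimal illustration rather than the authors' Perl implementation: the LineKind, Attach, and Link types and the function classify_comment are hypothetical names, and each source line is assumed to have already been tagged as a normal instruction, a delimiter instruction (END or EJECT), or a '*' comment line. Right comments need no search, since they sit on the instruction's own line, so they are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagging of assembly source lines; names are illustrative. */
typedef enum { LINE_NORMAL, LINE_DELIMITER, LINE_COMMENT } LineKind;
typedef enum { ATTACH_NONE, ATTACH_UPPER, ATTACH_LOWER } Attach;

typedef struct {
    Attach how;    /* kind of link established for the comment          */
    int    instr;  /* index of the instruction line, or -1 if unlinked  */
} Link;

/* Classifies the comment at index c among n tagged lines.
 * If the next instruction after c is normal, c is an upper comment of it.
 * If it is a delimiter, c is a lower comment of the nearest normal
 * instruction before c (delimiters in between are skipped). */
Link classify_comment(const LineKind *kinds, size_t n, size_t c)
{
    Link link = { ATTACH_NONE, -1 };
    assert(c < n && kinds[c] == LINE_COMMENT);

    size_t next = c + 1;
    while (next < n && kinds[next] == LINE_COMMENT)
        next++;                       /* skip further comment lines */

    if (next >= n)
        return link;                  /* no following instruction */

    if (kinds[next] == LINE_NORMAL) {
        link.how = ATTACH_UPPER;
        link.instr = (int)next;
        return link;
    }

    /* Next instruction is a delimiter: search backwards for instruction i. */
    for (size_t p = c; p-- > 0; ) {
        if (kinds[p] == LINE_NORMAL) {
            link.how = ATTACH_LOWER;
            link.instr = (int)p;
            break;
        }
    }
    return link;
}
```

Applied to the ten lines of Fig. 14, this sketch links lines 1 and 2 to the SH instruction in line 3 as upper comments and line 5 to the AH instruction in line 4 as a lower comment, in agreement with the classification described above.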

A migration system based on the proposed method has been implemented in Perl 5. It operates on Ubuntu 14.04 LTS, Mac OS X 10.10, and Cygwin on Windows. Currently, translation of 86 instructions out of 631 is supported.

Fig. 16(a) is the result of conversion from the assembly program in Fig. 2. Subroutine EOM (in lines 16–48) of the assembly was converted to the function in lines 30–75. The code is structured with a do-while loop (in lines 34–46) and if statements (in lines 36–45, 48–71, and 51–70). In line 54, multiple operations are grouped into a single statement. The pattern-matching-based transformation proposed in III-F was applied to the string comparisons in lines 36, 39, 48, and 51. The behavior of the converted program was confirmed by the driver program in Fig. 16(b).

There are some limitations in our current implementation. The resulting C programs do not run properly on 64-bit machines, because the memory layout of the original IBM assembly is based on a 32-bit architecture. The current system does not support the EX instruction, which modifies other instructions.

VI. CONCLUSION

This paper has proposed a method of converting IBM assembly programs to C programs while at the same time generating a document of the conversion and embedding assembly comments into the C programs. Future work includes support of the remaining instructions, improvement of readability by adding tree rewriting rules, and migration to 64-bit architectures.

ACKNOWLEDGMENT

The authors would like to thank Kenji Okamoto of SYSTEM'S Co., Ltd. for valuable advice on this research. We would also like to thank all the members of Ishiura Lab. of Kwansei Gakuin University for their help in developing the prototype systems. This work was partly supported by the Small and Medium Enterprise Agency's "Services for reforming SMEs and micro-businesses, manufacturing, commerce and services" of fiscal 2013.

REFERENCES

[1] T. Ogawara: "Information processing apparatus, information processing method, and program," Japanese patent, 2014–215938 (Nov. 2014).
[2] C. Cifuentes, M. Van Emmerik, D. Ung, D. Simon, and T. Waddington: "Preliminary experiences with the use of the UQBT binary translation framework," in Proc. Workshop on Binary Translation, pp. 12–22 (Oct. 1999).
[3] M. Van Emmerik: Static single assignment for decompilation, PhD Thesis, University of Queensland (2007).
[4] G. Chen, Z. Wang, R. Zhang, K. Zhou, S. Huang, K. Ni, Z. Qi, K. Chen, and H. Guan: "A refined decompiler to generate C code with high readability," in Proc. Working Conference on Reverse Engineering, pp. 150–154 (Oct. 2010).
[5] Y. A. Feldman: "Portability by automatic translation: A large-scale case study," in Proc. Knowledge-Based Software Engineering Conference, pp. 123–130 (Nov. 1995).
[6] M. P. Ward: "Assembler to C migration using the FermaT transformation system," in Proc. IEEE International Conference on Software Maintenance 1999 (ICSM 99), pp. 67–76 (Aug.–Sept. 1999).
[7] M. Ward: "Assembler restructuring in FermaT," in Proc. IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2013), pp. 147–156 (Sept. 2013).
Fig. 16. Resulting C program.

(a) Generated C program from the assembly of Fig. 2

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>
    #include "miglib.h"

    uint32_t r0, r1, r2, r3, r4, r5, r6, r7;
    uint32_t r8, r9, r10, r11, r12, r13, r14, r15;
    int cc;
    char savearea[72];
    static int16_t MLAST[] = {
        0, 0, 31, 31, 59, 60, 90, 91, 120, 121, 151, 152, 181, 182,
        212, 213, 243, 244, 273, 274, 304, 305, 334, 335, 365, 366
    };
    static uint32_t SAVE[18];
    static uint64_t WORK;

    void EOMfunc(void)
    {
        strncpy( (char*) r4, "", 8 );
        r0 = 6;
        r1 = r3;
        do {
        EOM1: ;
            if ( strncmp( (char*) r1, "0", 1 )  0 ) { goto RET; }
            else {
            BB1: ;
                if ( strncmp( (char*) r1, "9", 1 )  0 ) { goto RET; }
                else {
                BB2: ;
                    r1 = r1 + 1;
                    r0 = r0 - 1;
                }
            }
        } while ( r0 );
    BB3: ;
        if ( strncmp( (char*) r3 + 4, "01", 2 )  0 ) { goto RET; }
        else {
        BB4: ;
            if ( strncmp( (char*) r3 + 4, "12", 2 )  0 ) { goto RET; }
            else {
            BB5: ;
                zone_to_pack( (char*) &WORK, 8, (char*) r3 + 4, 2 );
                r2 = pack_to_int32( (char*) &WORK, 8) - 1;
            BB6: ;
                r2 = (r2 * 4) + (uint32_t) MLAST;
                uint32_t addr27 = r2 + 4;
                r0 = * (uint16_t* ) addr27;
                int16_t tmpvar29 = * (int16_t*) r2;
                r0 = r0 - (int32_t) tmpvar29;
                int32_to_pack( (char*) &WORK, 8, r0);
                pack_to_zone( (char*) r4 + 6, 2, (char*) &WORK, 8 );
                uint32_t addr31 = r4 + 7;
                unsigned char tmp32 = * (unsigned char *) addr31;
                unsigned char tmp33 = tmp32 & 0x0f;
                unsigned char tmp34 = tmp33 | 0x30;
                * (unsigned char*) addr31 = tmp34;
                strncpy( (char*) r4, (char*) r3, 6 );
            }
        }
    RET: ;
        r15 = r15 - r15;
        return;
    }

    int32_t EMONTH( void** param )
    {
        r1 = (uint32_t) param;
        r13 = (uint32_t) savearea;
        r14 = (uint32_t) &&RETURN;
    EMONTHbb: ;
        r12 = r15;
        r15 = (uint32_t) SAVE;
        uint32_t addr15 = r15 + 4;
        * (uint32_t*) addr15 = r13;
        uint32_t addr16 = r13 + 8;
        * (uint32_t*) addr16 = r15;
        r13 = r15;
        r3 = * (uint32_t* ) r1;
        r4 = * (uint32_t* ) (r1 + 4);
        if ( strncmp((char*) r4, "00000000", 6 ) ) {
        ER_FMT: ;
            abend( 999 );
        }
        else {
        FC_EOM: ;
            EOMfunc();
        }
    RETURN: ;
        return r15;
    }

(b) Test driver

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int32_t EMONTH(void **param);

    int main(void)
    {
        char gdate[] = "20141215";
        char result[] = "00000000";
        void* param[2] = {gdate, result};

        int32_t rc = EMONTH(param);
        printf("rc = %d, result = \"%s\"\n", rc, result);
        return 0;
    }
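The generated program in Fig. 16(a) relies on support library routines such as pack_to_int32, whose source is not shown in the paper. As a rough illustration of what such a routine has to do, the following C sketch decodes a packed decimal field into an integer. It assumes only the standard IBM packed decimal layout (two BCD digits per byte, with the low nibble of the last byte holding the sign, where 0xD or 0xB means negative); the function name packed_to_int32 and its signature are hypothetical and are not those of miglib.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder for an IBM packed decimal field of len bytes.
 * Digit nibbles are not validated, and the value is assumed to fit
 * in 32 bits. */
int32_t packed_to_int32(const unsigned char *p, size_t len)
{
    assert(p != NULL && len > 0 && len <= 8);  /* at most 15 digits */

    int64_t value = 0;
    for (size_t i = 0; i < len; i++) {
        value = value * 10 + (p[i] >> 4);          /* high nibble: digit */
        if (i + 1 < len)
            value = value * 10 + (p[i] & 0x0f);    /* low nibble: digit  */
    }

    /* The low nibble of the last byte is the sign code. */
    unsigned sign = p[len - 1] & 0x0f;
    if (sign == 0x0d || sign == 0x0b)
        value = -value;

    return (int32_t)value;
}
```

For example, the two bytes 0x12 0x3C hold the digits 1, 2, 3 followed by the positive sign code C, so the sketch returns 123 for them.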
