MIPS Assembly Instructions Parser
For this assignment, you will implement a C function that parses a restricted subset of MIPS assembly instructions and prints out information relevant to translating those instructions into machine code. In particular, you will be concerned with MIPS assembly instructions, expressed in one of the following forms:
R-format: mnemonic reg1, reg2, reg3 (mul, sub, add)
I-format: mnemonic reg1, reg2, immediate (addi, beq)
mnemonic reg1, immediate (lui)
mnemonic reg1, offset(reg2) (lw, sw)
where mnemonic is one of the following MIPS32 mnemonics:
add addi mul beq lui lw sw sub
and reg1, reg2 and reg3 are each one of the following MIPS registers:
$t0, $t1, $t2, $t3, $s0, $s1, $s2, $s3
and immediate and offset are integers in the range -32768 to 32767. The elements of each instruction are separated by whitespace and commas, as shown. Whitespace can be any mixture of spaces and tab characters.
The instructions you are concerned with in this assignment can be classified as shown above; but lui, lw, sw break the ideal format rules the MIPS designers proposed for the instruction set architecture. You should consult the MIPS32 Architecture Volume 2: the MIPS32 Instruction Set, which is available on the course resources page
on Canvas for details of the machine language representations of these instructions. They can be viewed as illustrating the notion that a good design often requires (hopefully good) compromises from the pure design goals you might start with.
In order to decide how to interpret an assembly instruction, we would need to know exactly which assembly instruction we are dealing with, since different assembly instructions follow different patterns. For example, if the mnemonic is add, we have an R-format machine instruction, and we know that the instruction has three parameters
that are all register names.
How do we know these things? From consulting the available MIPS32 references, chiefly the MIPS32 Architecture Volume 2: the MIPS32 Instruction Set, which is available on the course resources page on Canvas. From there, we find that add is an R-format instruction, and that the add assembly instruction
always has the form:
add $rd, $rs, $rt
We also find that executing this instruction results in the assignment: $rd =$rs + $rt. Moreover, we find that this is expressed in binary machine format as:
R
31 30 29 28 27 26
25 24 23 22 21
20 19 18 17 16
15 14 13 12 11
10 9 8 7 6
5 4 3 2 1 0
0 0 0 0 0 0 rs rt rd 0 0 0 0 0 1 0 0 0 0 0
We also find, from other MIPS32 references, how the symbolic register names map to integer register numbers, so if we are given a specific instance of the add assembly instruction, we can determine all the components of the binary
representation.
A later assignment will require you to implement a C program that translates complete MIPS32 assembly programs
into machine code. For now, we will focus on the narrower problem of determining the pieces that make up the representations of a small selection of assembly instructions.
This assignment is, in some ways, a warm-up for the assembler project. Therefore, if you give careful thought to your design, you can produce lots of C code that can be plugged into the assembler. And, if you choose to do this in minimalist fashion, you'll gain little or nothing towards implementing the assembler.
Given one of the MIPS32 assembly instructions mentioned earlier, you will create a C struct variable that contains
information relevant to the specific assembly instruction and its representation in machine code. We will use the following user-defined C type to represent your analysis of the given instruction:
/** Represents the possible field values for a MIPS32 machine instruction.
*
* A ParseResult object is said to be proper iff:
*
*/
struct _ParseResult {
// Each char* member will be NULL or point to dynamically-allocated char array
// holding a zero-terminated C string.
// The assembly code portion
char* ASMInstruction; // the assembly instruction, as a C-string
char* Mnemonic; // the symbolic name of the instruction
char* rdName; // the symbolic names of the registers, as C-strings;
char* rsName; // NULL if the register field is not specified
char* rtName; // in the assembly instruction
// The following are integer values
int16_t Imm; // the immediate field, as a signed integer;
// 0 if not relevant for the assembly instruction
uint8_t rd; // the three register fields, as small unsigned integers;
uint8_t rs; // 255 if not relevant for the assembly instruction
uint8_t rt;
};
This type includes every possible component related to any of the assembly instructions in which we are interested.
However, no particular MIPS32 assembly instruction actually has all of the possible components defined in this type, so we will stipulate that are unused will be set to default values, given in the comments above.
For example, suppose you have the assembly instruction: add $s3, $t1, $t0
The corresponding ParseResult object should contain the following information, parsed directly from the given
instruction:
ASMInstruction --> "add $s3, $t1, $t0"
Mnemonic --> "add"
rdName --> "$s3"
rsName --> "$t1"
rtName --> "$t0"
The following information can be obtained from properly-constructed static lookup tables:
rd == 19
rs == 9
rt == 8
Opcode --> "000000"
Funct --> "100000"
Imm == 0
IMM == NULL
Given the assembly instruction "addi $t0, $s2, -42" , we find it has the form:
addi $rt, $rs, immediate
Coding Requirements
You will implement the following C function:
ParseResult* parseASM(const char* const pASM);
The stated precondition will be satisfied whenever the testing code calls your implementation. Your implementation must satisfy the stated return specification and must not violate const or create memory violations of any kind.
You are required1 to implement static lookup tables and use them to determine instruction opcodes from mnemonics, and to map register numbers to register names. See the discussion of static lookup tables later in this specification.
void clearResult(ParseResult* const pPR);
The test harness will call clearResult() at appropriate times during testing. If you have correctly implemented the
function, and otherwise coded your solution correctly, tests run on valgrind will not indicate any memory leaks.
We will require1 your solution to achieve a "clean" run on valgrind. See the discussion of Valgrind below.
Finally, this is not a requirement, but you are strongly advised to use calloc() when you allocate dynamically, rather
than malloc(). This will guarantee your dynamically-allocated memory is zeroed when it's allocated, and that may help prevent certain errors.
Static Lookup Tables in C
Consider implementing a program that will organize and support searches of a fixed collection of data records. For
example, if the data records involve geographic features, we might employ a struct type:
// GData.h
enum _FeatureType {CITY, RIVER, MOUNTAIN, BUILDING, . . . , ISLAND};
typedef enum _FeatureType FeatureType;
...
struct _GData {
char* Name;
char* State;
...
FeatureType FType;
uint16_t Elevation;
};
typedef struct _GData GData;
...
We might then initialize an array of GData objects by taking advantage of the ability to initialize struct variables at the
point they are declared:
// GData.c
#define NUMRECORDS 50
static GData GISTable[NUMRECORDS] = {
{"New York", "NY", ..., CITY, 33},
{"Pikes Peak", "CO", ..., MOUNTAIN, 14115},
...
{"McBryde Hall", "VA", ..., BUILDING, 2080}
};
Memory Management Requirements and Valgrind
Valgrind is a tool for detecting certain memory-related errors, including out of bounds accessed to dynamically-allocated
arrays and memory leaks (failure to deallocate memory that was allocated dynamically). A short introduction to Valgrind is
posted on the Resources page, and an extensive manual is available at the Valgrind project site (www.valgrind.org).
For best results, you should compile your C program with a debugging switch (-g or –ggdb3); this allows Valgrind to
provide more precise information about the sources of errors it detects. For example, we ran our solution for this project, with
one of the test cases, on Valgrind:
[dsn@centosvm parseMI]$ valgrind --leak-check=full --show-leak-kinds=all --log-file=vlog.txt
--track-origins=yes -v driver instr.txt parse.txt –rand
And, we got good news... there were no detected memory-related issues with my code:
==10669== Memcheck, a memory error detector
==10669== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==10669== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==10669== Command: driver instr.txt parse.txt -rand
==10669==
==10669== HEAP SUMMARY:
==10669== in use at exit: 0 bytes in 0 blocks
==10669== total heap usage: 275 allocs, 275 frees, 6,904 bytes allocated
==10669==
==10669== All heap blocks were freed -- no leaks are possible
==10669==
==10669== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
==10669==
==10669== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
And, we got good news... there were no detected memory-related issues with my code. That's the sort of result you want
to see when you try your solution with Valgrind.
On the other hand, here’s what we got from a student submission:
==8596== Memcheck, a memory error detector
==8596== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8596== Using Valgrind-3.14.0-353a3587bb-20181007X and LibVEX; rerun with -h for copyright info
==8596== Command: c02driver tests.txt results.txt -rand
==8596== Parent PID: 8595
Testing and Grading
Download the posted tar file, c02Files.tar from the course website and unpack it on a CentOS 8 system. You should
receive the following files, organized in two subdirectories:
README - usage instructions
dev:
c02driver.c - test driver
ASMParser.h - "public" interface for parser; do not modify!
ASMParser.c - implement the functions here, as needed
ParseResult.h - "public" interface for results type; do not modify!
ParseResult.c - implement the functions here, as needed
Grader.h - declaration for grading function
Grader.o - 64-bit CentOS binary for grading function
Generate.h - declaration for test data generator
Generate.o - 64-bit CentOS binary for test data generator
grading:
gradeC02.sh - this script file
c02Grader.tar - grading code, including:
c02driver.c - test driver
ASMParser.h - supplied C header file
ParseResult.h - supplied C header file
Grader.h - declaration for grading function
Grader.o - 64-bit CentOS binary for grading function
What to submit
You will submit an uncompressed tar file containing your completed C implementation files (ASMParser.c and
ParseResult.c and any supporting .c and .h files you have written), and nothing else. Do not include any of the
supplied files from C02Grader.tar (see above), object files, or executable files.
We will grade your submission with the posted test/grading harness, and we will make no allowances for submissions that do not operate correctly with that. Be warned: we will use the original, posted versions of the header and .o files in grading your submission, so you may encounter problems if you’ve modified any of those.
Make your submission to Canvas. Late submissions will be assessed a penalty of 10% per diem.