Assignment 2: Sheepy
Aims
This assignment aims to:
- give you practice in Python programming generally
- give you experience in translating between complex formats with Python
- clarify your understanding of Shell syntax & semantics
- introduce you to Python syntax & semantics
Introduction
Your task in this assignment is to write a POSIX Shell Transpiler.
Generally, compilers take a high-level language as input and output assembly or machine code, which can then be executed.
A Transpiler (or Source-to-Source Compiler) takes a high-level language as input and outputs a different high-level language.
Your transpiler will take Shell scripts as input and output Python.
Such a translation is useful because programmers sometimes convert Shell scripts to Python.
Most commonly this is done because extra functionality is needed, e.g. a GUI, which is much easier to implement in Python.
Your task in this assignment is to automate this conversion.
You must write a Python program that takes as input a Shell script and outputs an equivalent Python program.
The translation of some POSIX Shell code to Python is straightforward.
The translation of other Shell code is difficult or infeasible.
So your program will not be able to translate all Shell code to Python.
But a tool that performs only a partial translation of shell to Python could still be very useful.
You should assume the Python code output by your program will be subsequently read and modified by humans.
In other words, you have to output readable Python code.
For example, you should aim to preserve variable names and comments.
Your compiler must be written in Python.
You must call your Python program sheepy.py.
It will be given a single command-line argument: the path to a Shell script.
It should output, to standard output, the equivalent Python code.
For example:
cat gcc.sh
#!/bin/dash
for c_file in *.c
do
    gcc -c $c_file
done
./sheepy.py gcc.sh
#!/usr/bin/python3 -u
import glob, subprocess

for c_file in sorted(glob.glob("*.c")):
    subprocess.run(["gcc", "-c", c_file])
If you look carefully at the example above you will notice the Python code does not have exactly the same semantics as the shell code.
If there are no .c files in the current directory, the for loop in the shell program executes once and tries to compile a non-existent file named *.c, whereas the Python for loop does not execute at all.
And if the file name contains spaces the shell code will pass it as multiple arguments to gcc but the Python code will pass it as a single argument - in other words the shell breaks but the Python works.
This is a general issue with translating Shell to Python.
In many cases, the natural translation of the shell code will have slightly different semantics.
For some purposes, it might be desirable to produce more complex Python code that matches the semantics exactly.
For example:
#!/usr/bin/python3 -u
import glob, subprocess

if glob.glob("*.c"):
    for c_file in sorted(glob.glob("*.c")):
        subprocess.run(["gcc", "-c"] + c_file.split())
else:
    subprocess.run(["gcc", "-c", "*.c"])
This is not desirable for our purposes.
Our goal is to produce the clearest, most human-readable code, so the first (simpler) translation is preferred.
Subsets
The shell features you need to implement are described below as a series of subsets.
It is suggested you tackle the subsets in the order listed, but this is not required.
Subset 0
echo
The echo builtin is used to output a string to stdout.
For example:
#!/bin/dash
echo hello world
echo 42 is the meaning of life, the universe, and everything
echo To be or not to be: that is the question
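As a starting point, here is a minimal sketch of translating a simple echo line in sheepy.py. It assumes the line has already been split on whitespace and contains no quotes, variables or glob characters. The function name translate_echo is hypothetical. Because echo writes its arguments separated by single spaces, the words can be joined and emitted as one Python string literal, with repr() taking care of escaping.

```python
def translate_echo(words):
    """Translate ['echo', 'hello', 'world'] into Python source for print().

    Assumes the words contain no quotes, variables or glob characters.
    """
    # echo writes its arguments separated by single spaces
    text = ' '.join(words[1:])
    # repr() gives a correctly escaped Python string literal
    return f'print({text!r})'


print(translate_echo('echo hello world'.split()))  # print('hello world')
```

Splitting on whitespace also reproduces the way the shell collapses runs of spaces between unquoted words.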
=
The = operator is used to assign a value to a variable.
For example:
#!/bin/dash
x=1
y=2
foo=hello
bar=world
course_code=COMP2041
AssignmentName=Sheepy
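One possible way to recognise simple assignments is a regular expression that matches the whole line: a valid variable name immediately followed by =, with no surrounding spaces. In the shell, x = 1 is not an assignment. It runs a command named x. This sketch assumes the value is a plain word with no $, quotes or backticks. translate_assignment is a hypothetical name.

```python
import re

# A valid shell variable name followed immediately by = (no spaces allowed)
ASSIGNMENT_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)=(\S*)')


def translate_assignment(line):
    """Return Python for a simple `name=value` line, or None if it is not one."""
    match = ASSIGNMENT_RE.fullmatch(line.strip())
    if match is None:
        return None
    name, value = match.groups()
    # Shell variables hold strings, so the value stays a string in Python
    return f'{name} = {value!r}'


print(translate_assignment('course_code=COMP2041'))  # course_code = 'COMP2041'
```

Watch out for shell variable names that are Python keywords or builtins, such as a shell variable named class. This is exactly the kind of syntax the Hints section warns about.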
$
The $ operator is used to access the value of a variable.
For example:
#!/bin/dash
theAnswer=42
echo The meaning of life, the universe, and everything is $theAnswer
name=COMP2041
echo I hope you are enjoying $name this semester
H=Hello
W=World
echo $H, $W
P1=race
P2=car
palindrome=$P1$P2
echo $palindrome
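A Python f-string is one readable way to translate $name references. The sketch below uses the hypothetical name to_python_string. It assumes variable names follow the usual letter, digit and underscore rule, and it ignores quoting. Literal braces must be doubled in an f-string. A string with no variables is emitted as a plain literal, because pyflakes3 warns about f-strings without placeholders.

```python
import re

# $name where name is a valid shell variable name
VARIABLE_RE = re.compile(r'\$([A-Za-z_][A-Za-z0-9_]*)')


def to_python_string(text):
    """Translate shell text containing $name references to a Python literal."""
    # Braces are special inside f-strings, so literal ones must be doubled
    escaped = text.replace('{', '{{').replace('}', '}}')
    result, count = VARIABLE_RE.subn(r'{\1}', escaped)
    if count == 0:
        # No variables: a plain literal avoids a pyflakes3 warning
        return repr(text)
    return 'f' + repr(result)
```

For example, echo $H, $W could become print(f'{H}, {W}'), and palindrome=$P1$P2 could become palindrome = f'{P1}{P2}'.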
#
The # operator is used to start a comment.
For example:
#!/bin/dash
# This is a comment
echo hello world # This is also a comment
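Comments need care because # only starts a comment at the beginning of a word and outside quotes, so echo a#b prints a#b. Below is a minimal sketch that ignores backslash escapes. split_comment is a hypothetical helper. It returns the code and the comment separately, so the comment can be re-attached to the translated Python.

```python
def split_comment(line):
    """Split a shell line into (code, comment); comment is '' if there is none.

    A # starts a comment only at the start of a word and outside quotes.
    Backslash escapes are ignored in this sketch.
    """
    in_single = in_double = False
    for index, char in enumerate(line):
        if char == "'" and not in_double:
            in_single = not in_single
        elif char == '"' and not in_single:
            in_double = not in_double
        elif (char == '#' and not in_single and not in_double
              and (index == 0 or line[index - 1].isspace())):
            return line[:index].rstrip(), line[index:]
    return line, ''
```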
Subset 1
globbing
The *, ?, [, and ] characters are used in globbing.
For example:
#!/bin/dash
echo *
C_files=*.[ch]
echo all of the single letter Python files are: ?.py
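Here is a sketch of one way to translate a single unquoted word. It assumes the word has no quotes or variables, and translate_word is a hypothetical name. A word containing *, ? or [ becomes a sorted glob.glob() call. The sorting is needed because the shell sorts pathname expansions, while glob.glob() returns names in arbitrary order. The generated program then also needs import glob.

```python
GLOB_CHARACTERS = set('*?[')


def translate_word(word):
    """Translate one unquoted shell word (no variables) into a Python expression."""
    if GLOB_CHARACTERS & set(word):
        # Expand like `echo *`: matching names, sorted, separated by spaces
        return f"' '.join(sorted(glob.glob({word!r})))"
    return repr(word)
```

As discussed in the introduction, this differs from the shell when nothing matches. The shell leaves the pattern unchanged, while the expression above produces an empty string. Also note that POSIX shells do not perform pathname expansion on the right-hand side of an assignment. So C_files=*.[ch] stores the pattern itself, not a list of files.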
for
The for, do, and done keywords are used to start and end for loops.
For example:
#!/bin/dash
for i in 1 2 3
do
    echo $i
done
for word in this is a string
do
    echo $word
done
for file in *.c
do
    echo $file
done
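The header of a for loop maps directly onto a Python for statement. The do keyword can simply be dropped, and done ends the indented block. The sketch below translates just the header line. It assumes comments have already been removed and that the word list is either plain words or a single glob pattern. translate_for_header is a hypothetical name.

```python
import re

FOR_RE = re.compile(r'for\s+([A-Za-z_][A-Za-z0-9_]*)\s+in\s+(.*)')


def translate_for_header(line):
    """Translate `for VAR in WORD...` into a Python for statement header."""
    match = FOR_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f'not a for loop header: {line!r}')
    variable, words = match.group(1), match.group(2).split()
    if len(words) == 1 and set(words[0]) & set('*?['):
        # A single glob pattern: iterate over the sorted matching file names
        iterable = f'sorted(glob.glob({words[0]!r}))'
    else:
        iterable = '[' + ', '.join(repr(word) for word in words) + ']'
    return f'for {variable} in {iterable}:'
```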
exit
The exit builtin is used to exit the shell.
For example:
#!/bin/dash
echo hello world
exit
echo this will not be printed
exit 0
echo this will double not be printed
exit 3
cd
The cd builtin is used to change the current working directory.
For example:
#!/bin/dash
echo *
cd /tmp
echo *
cd ..
echo *
read
The read builtin is used to read a line from stdin.
For example:
#!/bin/dash
echo What is your name:
read name
echo What is your quest:
read quest
echo What is your favourite colour:
read colour
echo What is the airspeed velocity of an unladen swallow:
read velocity
echo Hello $name, my favourite colour is $colour too.
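Here is a sketch of how the exit, cd and read builtins might be translated. It returns both the Python code and the set of modules that code needs, so the imports can be emitted at the top later. translate_builtin is hypothetical. The sketch assumes exit has at most one literal numeric argument and cd has exactly one argument.

```python
def translate_builtin(words):
    """Translate exit, cd or read (no options); return (code, modules needed)."""
    command = words[0]
    if command == 'exit':
        # Assumes the exit status, if any, is a literal number
        status = words[1] if len(words) > 1 else ''
        return f'sys.exit({status})', {'sys'}
    if command == 'cd':
        # Assumes exactly one directory argument
        return f'os.chdir({words[1]!r})', {'os'}
    if command == 'read':
        # With the default IFS, read strips leading and trailing whitespace
        return f'{words[1]} = input().strip()', set()
    raise ValueError(f'not a supported builtin: {command}')
```

There are two semantic differences worth deciding on. First, a bare exit in the shell exits with the status of the last command run, whereas sys.exit() exits with status 0. Second, input() raises EOFError at end of input, where read would just return a non-zero status.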
External Commands
Any line that is not a known builtin, or keyword, or other shell syntax should be treated as an external command.
For example:
#!/bin/dash
touch test_file.txt
ls -l test_file.txt
for course in COMP1511 COMP1521 COMP2511 COMP2521 # keyword
do                      # keyword
    echo $course        # builtin
    mkdir $course       # external command
    chmod 700 $course   # external command
done                    # keyword
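External commands can be run with subprocess.run() in the generated code. Passing the command and its arguments as a list means no shell is involved. Here is a minimal sketch, assuming all words are plain literals. translate_external is a hypothetical name.

```python
def translate_external(words):
    """Translate an external command into a subprocess.run() call.

    Assumes every word is a plain literal (no variables, quotes or globs).
    """
    # A list of arguments means no shell is involved in running the command
    arguments = ', '.join(repr(word) for word in words)
    return f'subprocess.run([{arguments}])'
```

A variable argument such as $course in mkdir $course would naturally become the Python variable course in the list. Strictly, the shell would split an unquoted value containing spaces into several arguments. However, as the introduction explains, the simpler translation is preferred.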
Subset 2
Command Line Arguments
The $0, $1, $2, etc. variables are used to access the command line arguments.
For example:
#!/bin/dash
echo This program is: $0
file_name=$2
number_of_lines=$5
echo going to print the first $number_of_lines lines of $file_name
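In the generated Python, positional parameters map naturally onto sys.argv. Here is a sketch that translates a word consisting of exactly $0 to $9. translate_positional is a hypothetical name. There is one semantic difference to decide on. A missing argument such as $5 gives an empty string in the shell, but sys.argv[5] raises IndexError in Python. A guarded form such as sys.argv[5] if len(sys.argv) > 5 else '' matches the shell more closely, at some cost in readability.

```python
import re


def translate_positional(word):
    """Translate a word that is exactly $0 .. $9 into a Python expression.

    Returns None for any other word. In POSIX sh, $10 means ${1} followed by
    a literal 0, so only a single digit is accepted here.
    """
    match = re.fullmatch(r'\$([0-9])', word)
    if match is None:
        return None
    return f'sys.argv[{match.group(1)}]'
```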
${}
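The ${} operator is used to access the value of a variable. It is needed when the variable name is immediately followed by characters that could otherwise be read as part of the name.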
For example:
#!/bin/dash
string=BAR
echo FOO${string}BAZ
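Without the braces, $stringBAZ would refer to a variable called stringBAZ. In an f-string the braces carry over almost unchanged. The sketch below handles only ${name} with a plain variable name. translate_braced is a hypothetical name. Literal { and } characters elsewhere in the text would still need to be doubled before the result is used as an f-string body.

```python
import re

# ${name} where name is an ordinary variable name
BRACED_RE = re.compile(r'\$\{([A-Za-z_][A-Za-z0-9_]*)\}')


def translate_braced(text):
    """Turn FOO${string}BAZ into the f-string body FOO{string}BAZ."""
    return BRACED_RE.sub(r'{\1}', text)
```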
test
The test builtin is used to test a condition.
if
The if, then, elif, else, and fi keywords are used to start and end if statements.
For example:
#!/bin/dash
if test -w /dev/null
then
    echo /dev/null is writeable
fi
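Most test operators have direct Python equivalents in os and os.path, so test -w /dev/null can become os.access('/dev/null', os.W_OK). The sketch below covers only a handful of common operators. It assumes each operand has already been translated into a Python expression, such as a string literal or a variable name. translate_test is a hypothetical name.

```python
# File tests: each template takes the (already translated) file operand
FILE_TESTS = {
    '-e': 'os.path.exists({})',
    '-f': 'os.path.isfile({})',
    '-d': 'os.path.isdir({})',
    '-r': 'os.access({}, os.R_OK)',
    '-w': 'os.access({}, os.W_OK)',
    '-x': 'os.access({}, os.X_OK)',
}
STRING_TESTS = {'=': '==', '!=': '!='}
NUMERIC_TESTS = {'-eq': '==', '-ne': '!=', '-lt': '<',
                 '-le': '<=', '-gt': '>', '-ge': '>='}


def translate_test(args):
    """Translate the arguments of `test` into a Python condition."""
    if len(args) == 2 and args[0] in FILE_TESTS:
        return FILE_TESTS[args[0]].format(args[1])
    if len(args) == 3 and args[1] in STRING_TESTS:
        return f'{args[0]} {STRING_TESTS[args[1]]} {args[2]}'
    if len(args) == 3 and args[1] in NUMERIC_TESTS:
        # Shell variables are strings, so numeric tests need int()
        return f'int({args[0]}) {NUMERIC_TESTS[args[1]]} int({args[2]})'
    raise ValueError(f'unsupported test: {args}')
```

The generated code needs import os for the file tests.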
while
The while, do, and done keywords are used to start and end while loops.
For example:
#!/bin/dash
row=1
while test $row != 11111111111
do
    echo $row
    row=1$row
done
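Python marks blocks by indentation rather than keywords, so the translator has to track nesting depth. The sketch below assumes each keyword is on its own line, so if test ...; then on a single line is not handled. It also assumes conditions and ordinary statements have already been translated to Python. translate_structure is a hypothetical name.

```python
def translate_structure(lines):
    """Indent translated statements according to shell control keywords."""
    output = []
    depth = 0
    for line in lines:
        words = line.split(maxsplit=1)
        if not words:
            output.append('')
            continue
        keyword = words[0]
        indent = '    ' * depth
        if keyword in ('then', 'do'):
            continue  # Python's ':' already opens the block
        if keyword in ('fi', 'done'):
            depth -= 1
        elif keyword in ('if', 'while'):
            output.append(f'{indent}{keyword} {words[1]}:')
            depth += 1
        elif keyword == 'elif':
            output.append('    ' * (depth - 1) + f'elif {words[1]}:')
        elif keyword == 'else':
            output.append('    ' * (depth - 1) + 'else:')
        else:
            output.append(indent + line.strip())
    return output
```

A complete translator would also need to emit pass for an empty block, since Python does not allow empty bodies.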
Single Quotes
The ' character is used to start and end a single-quoted string.
For example:
#!/bin/dash
echo 'hello world'
echo 'This is not a $variable'
echo 'This is not a glob *.sh'
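Since shlex may not be imported in sheepy.py, you will probably need your own tokenizer. Here is a minimal sketch that splits a line on whitespace outside quotes. It keeps the quotes in each word, so later stages know which parts were quoted. Backslash escapes are not handled, and split_words and translate_single_quoted are hypothetical names.

```python
def split_words(line):
    """Split a shell line on unquoted whitespace, keeping the quote characters."""
    words = []
    current = ''
    quote = None  # the quote character we are currently inside, if any
    for char in line:
        if quote:
            current += char
            if char == quote:
                quote = None
        elif char in '\'"':
            quote = char
            current += char
        elif char.isspace():
            if current:
                words.append(current)
                current = ''
        else:
            current += char
    if current:
        words.append(current)
    return words


def translate_single_quoted(word):
    """Everything between single quotes is literal: no variables, no globbing."""
    return repr(word[1:-1])
```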
Subset 3
Double Quotes
The "
character is used to start and end a double-quoted string.
For example:
#!/bin/dash echo "hello world" echo "This is sill a $variable" echo "This is not a glob *.sh"
backticks
A command substitution can be started and ended with a ` character (backtick).
For example:
#!/bin/dash
date=`date +%Y-%m-%d`
echo Hello `whoami`, today is $date
echo "command substitution still works in double quotes: `hostname`"
echo 'command substitution does not work in single quotes: `not a command`'
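In the generated Python, a command substitution can be translated into subprocess.run() with stdout captured. The shell removes all trailing newlines from the substituted output, which .rstrip('\n') reproduces. The sketch below assumes the command's words are plain literals. translate_command_substitution is a hypothetical name.

```python
def translate_command_substitution(words):
    """Build a Python expression giving a command's output, like `...`."""
    arguments = ', '.join(repr(word) for word in words)
    # Command substitution strips all trailing newlines from the output
    return (f'subprocess.run([{arguments}], stdout=subprocess.PIPE, text=True)'
            ".stdout.rstrip('\\n')")
```

The sketch uses stdout=subprocess.PIPE rather than capture_output=True. That way the command's stderr still goes to the terminal, as it would in the shell.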
echo -n
The -n flag tells echo not to print a newline at the end of its output.
For example:
#!/bin/dash
echo -n "How many? "
read n
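In Python, print's end parameter replaces the trailing newline. Here is a sketch that checks for -n as the first argument. It assumes quotes have already been removed from the words. translate_echo_command is a hypothetical name.

```python
def translate_echo_command(words):
    """Translate echo, honouring a leading -n (words already unquoted)."""
    if len(words) > 1 and words[1] == '-n':
        text = ' '.join(words[2:])
        return f"print({text!r}, end='')"
    text = ' '.join(words[1:])
    return f'print({text!r})'
```

Because the generated script runs with python3 -u, the prompt appears before input() waits for a line. Adding flush=True to the print call is an alternative if you cannot rely on -u.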
number of command line arguments
The $# variable is used to access the number of command line arguments.
For example:
#!/bin/dash
echo I have $# arguments
command line argument lists
The $@ variable is used to access all the command line arguments.
For example:
#!/bin/dash echo "My arguments are $@"
Subset 4
case
The case, in, ), esac, and ;; keywords are used to start and end case statements.
For example:
#!/bin/dash
case $# in
    0) echo no arguments ;;
    1) echo one argument ;;
    *) echo more than one argument ;;
esac
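The generated Python may only import glob, os, shutil, subprocess and sys, so fnmatch is not available for matching case patterns. A simple approach is to translate literal patterns into comparisons and a final * into else. The sketch below uses the hypothetical name translate_case. It assumes the subject is already a Python expression and the bodies are already translated. It also assumes * appears only as the last pattern and that alternatives are separated by |.

```python
def translate_case(subject, branches):
    """Translate case branches [(pattern, body_lines), ...] into if/elif/else."""
    output = []
    for index, (pattern, body) in enumerate(branches):
        if pattern == '*':
            output.append('else:')
        else:
            keyword = 'if' if index == 0 else 'elif'
            options = pattern.split('|')
            if len(options) == 1:
                output.append(f'{keyword} {subject} == {options[0]!r}:')
            else:
                output.append(f'{keyword} {subject} in {tuple(options)!r}:')
        output.extend('    ' + line for line in body)
    return output
```

Patterns containing other glob characters could be left as comments, as suggested in the Assumptions/Clarifications section.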
$()
The $() operator is used for command substitution.
It is the same as the ` (backtick) operator, except that $() substitutions can be nested without escaping.
For example:
#!/bin/dash
date=$(date +%Y-%m-%d)
echo Hello $(whoami), today is $date
echo "command substitution still works in double quotes: $(hostname)"
echo 'command substitution does not work in single quotes: $(not a command)'
echo "The groups I am part of are $(groups $(whoami))"
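Because $() can be nested, a single regular expression cannot reliably find where a substitution ends. Counting parentheses works for the common cases. The sketch below uses the hypothetical name find_substitution_end. It assumes there are no quotes, case patterns or $(( arithmetic inside the substitution.

```python
def find_substitution_end(text, start):
    """Return the index just past the ')' closing the $( that begins at start."""
    depth = 0
    for index in range(start, len(text)):
        if text[index] == '(':
            depth += 1
        elif text[index] == ')':
            depth -= 1
            if depth == 0:
                return index + 1
    raise ValueError('unbalanced $( ... )')
```

The text between $( and the matching ) can then be translated recursively. So $(groups $(whoami)) becomes a subprocess.run() call whose argument list contains another subprocess.run() call.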
$(())
The $(()) operator is used to evaluate an arithmetic expression.
For example:
#!/bin/dash
x=6
y=7
echo $((x + y))
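Shell arithmetic works on integers while shell variables hold strings, so variables need int() and the result needs str(). The sketch below handles names with or without a leading $ and the basic operators. translate_arithmetic is a hypothetical name.

```python
import re

# A variable name, optionally written with a leading $
NAME_RE = re.compile(r'\$?([A-Za-z_][A-Za-z0-9_]*)')


def translate_arithmetic(expression):
    """Translate the inside of $(( ... )) into a Python expression giving a string."""
    # Variables hold strings, so convert each one to an int
    body = NAME_RE.sub(r'int(\1)', expression)
    # Shell arithmetic is integer-only
    body = body.replace('/', '//')
    return f'str({body})'
```

Mapping / to // matches the shell for non-negative operands. For negative operands, dash truncates toward zero while // rounds down, so -7 / 2 gives -3 in dash but -4 in Python.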
<, >, and >>
The < operator redirects stdin from a file, and the > and >> operators redirect stdout to a file (> overwrites the file, >> appends to it).
For example:
#!/bin/dash
echo hello >file
echo world >> file
cat <file
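Redirection maps onto Python file objects: > opens the file for writing, >> for appending and < for reading. The sketch below generates a with statement for each case. translate_redirected_echo and translate_redirected_command are hypothetical names. The file-object name f could clash with a variable in the shell script.

```python
REDIRECT_MODES = {'>': 'w', '>>': 'a', '<': 'r'}


def translate_redirected_echo(text, operator, filename):
    """echo TEXT >FILE or >>FILE as a with statement around print()."""
    mode = REDIRECT_MODES[operator]
    return [
        f'with open({filename!r}, {mode!r}) as f:',
        f'    print({text!r}, file=f)',
    ]


def translate_redirected_command(words, operator, filename):
    """An external command with one redirection, as a with statement."""
    mode = REDIRECT_MODES[operator]
    stream = 'stdin' if operator == '<' else 'stdout'
    arguments = ', '.join(repr(word) for word in words)
    return [
        f'with open({filename!r}, {mode!r}) as f:',
        f'    subprocess.run([{arguments}], {stream}=f)',
    ]
```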
&& and ||
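The && and || operators are used to perform boolean logic: cmd1 && cmd2 runs cmd2 only if cmd1 succeeds (exits with status 0), and cmd1 || cmd2 runs cmd2 only if cmd1 fails.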
For example:
#!/bin/dash
test -w /dev/null && echo /dev/null is writeable
test -x /dev/null || echo /dev/null is not executable
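A command joined to a later command by && or || can be translated into an if statement. An external command's success is expressed as a zero return code. The sketch below uses the hypothetical names translate_and_or and command_succeeds. It assumes the condition and the action have already been translated.

```python
def translate_and_or(condition, operator, action):
    """Translate `A && B` or `A || B` into an if statement (list of lines)."""
    if operator == '&&':
        header = f'if {condition}:'
    elif operator == '||':
        # Parenthesise so `not` applies to the whole condition
        header = f'if not ({condition}):'
    else:
        raise ValueError(f'unexpected operator: {operator}')
    return [header, '    ' + action]


def command_succeeds(words):
    """Python condition that is true when an external command exits with 0."""
    arguments = ', '.join(repr(word) for word in words)
    return f'subprocess.run([{arguments}]).returncode == 0'
```

When translating longer chains, beware of a precedence difference. In the shell, && and || have equal precedence and associate left to right, whereas Python's and binds more tightly than or. So a || b && c means (a || b) && c in the shell. Parenthesising each sub-condition, as the || case above does for not, avoids surprises.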
if/while conditions
In this subset, if and while conditions can now be any external command, or multiple commands joined by && or ||.
For example:
#!/bin/dash
if test -w /dev/null && test -x /dev/null
then
    echo /dev/null is writeable and executable
fi
if grep -Eq $(whoami) enrolments.tsv
then
    echo I am enrolled in COMP2041/9044
fi
Examples
Some examples of shell code and possible translations are available as a table or a zip file.
These examples should provide most of the information you need to tackle subsets 0 & 1.
Translating subsets 2-4 will require you to discover information from online or other resources.
This is deliberately part of the assignment.
The Python you output can and probably will be different to the examples you've been given.
So there is no way to directly test if your Python output is correct.
But the Python you output when run has to behave the same as the input shell script it was generated from.
So a good check of any translation is to execute the Shell and the Python and then use diff to check that their output is identical.
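That check can be automated. The sketch below is a separate test helper, not part of sheepy.py, which is not allowed to import subprocess. It runs a script under dash and its translation under Python, then compares stdout and exit status. It assumes sheepy.py is in the current directory and executable. Scripts that use read get empty input unless stdin_text is given. The names run and same_behaviour are hypothetical.

```python
#!/usr/bin/python3
"""Hypothetical test helper: compare dash's output with the translation's."""
import os
import subprocess
import sys
import tempfile


def run(command, stdin_text=''):
    """Run command, feeding it stdin_text; return (stdout, exit status)."""
    result = subprocess.run(command, input=stdin_text,
                            stdout=subprocess.PIPE, text=True)
    return result.stdout, result.returncode


def same_behaviour(shell_script, arguments=(), stdin_text=''):
    """True if dash and the sheepy.py translation give identical results."""
    translation, _ = run(['./sheepy.py', shell_script])
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as python_file:
        python_file.write(translation)
    try:
        expected = run(['dash', shell_script, *arguments], stdin_text)
        actual = run([sys.executable, '-u', python_file.name, *arguments], stdin_text)
    finally:
        os.unlink(python_file.name)
    return expected == actual


if __name__ == '__main__':
    for script in sys.argv[1:]:
        print(script, 'same' if same_behaviour(script) else 'DIFFERENT')
```

Expect differences for scripts that print $0, since the shell script and the generated Python have different file names.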
Assumptions/Clarifications
Like all good programmers, you should make as few assumptions about your input as possible. You can assume the code you are given is Shell which works with the version of /bin/dash on CSE systems (essentially POSIX compatible).
Other shells such as Bash contain many other features. If these features are not present in /bin/dash on CSE machines, you do not have to handle them.
You do not need to implement keywords & builtins not listed above, for example the pipe operator ('|') does not appear above so you do not need to translate pipelines.
You should implement the keywords & builtins listed above directly in Python; you cannot execute them indirectly via subprocess or other Python modules. For example, this is not an acceptable translation:
subprocess.call("for c_file in *.c; do gcc -c $c_file; done", shell=True)
The only shell builtins which you must translate directly into Python are:
exit read cd test echo
The builtins exit, read, and cd must be translated into Python to work. For example, this Python code does not work:
subprocess.call(['exit'])
The last two (test and echo) can be, and often are, also implemented by stand-alone programs. So, for example, this Python code will work:
subprocess.call(['echo', 'hello', 'world'])
Doing this will receive no marks. Instead of using subprocess.call, you should translate uses of test and echo directly to Python, e.g.:
print("hello world")
The only Shell builtin option you need to handle is echo's -n option.
You do not need to handle other echo options such as -e.
You do not need to handle options for other builtins.
For example, you do not need to handle the various (rarely-used) read options.
Dash has many special variables.
You need only handle a few of these, which indicate the shell script's arguments.
These special variables need to be translated:
$# $@ $0 $1 $2 $3 ...
You can assume the shell scripts you are given execute correctly on a CSE lab machine.
Your program does not have to detect errors in the shell script it is given.
You should assume as little as possible about the formatting of the shell script you are given, but most of the evaluation of your program will be on sensibly formatted shell scripts. Copying the shell script's indentation will usually, but not always, give you legal Python.
You should transfer any comments in the shell code. With some approaches it can be difficult to transfer comments in exactly the same position, in this case it is OK if comments are shifted to some degree.
You don't have to preserve white-space, and you will not be penalized for, e.g., removing or adding trailing white-space.
If there are shell keywords, e.g. case, that you cannot translate, the preferred behaviour is to include the untranslated shell construct as a comment. Other sensible behaviour is acceptable.
Hints
Get the easiest transformations working first, make simplifying assumptions as needed, and get some simple small shell scripts successfully transformed. Then look at handling more constructs and removing the assumptions.
You won't be able to print the Python as you generate it: for example, you won't know which import statements need to be printed first until you have translated the whole script. Instead, append the Python code to a list as you generate it.
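The list-based approach might look like the skeleton below. Lines are collected first, and the import line is built last from the set of modules the translation actually used. Everything here is a hypothetical starting point. Only comments and cd are translated, and unsupported lines are kept as comments.

```python
#!/usr/bin/python3
"""Hypothetical skeleton for sheepy.py: translate line by line into a list."""
import sys


def translate_line(line, imports):
    """Translate one shell line, recording any modules it needs in imports."""
    stripped = line.strip()
    if stripped == '' or stripped.startswith('#'):
        return stripped  # blank lines and comments are copied unchanged
    if stripped.startswith('cd '):
        imports.add('os')
        return f'os.chdir({stripped[3:].strip()!r})'
    # Not yet supported: keep the shell code as a comment
    return '# ' + stripped


def translate(shell_lines):
    """Return the lines of the Python translation, imports first."""
    imports = set()
    body = []
    for number, line in enumerate(shell_lines):
        if number == 0 and line.startswith('#!'):
            continue  # replaced by the Python '#!' line below
        body.append(translate_line(line, imports))
    header = ['#!/usr/bin/python3 -u']
    if imports:
        header += ['import ' + ', '.join(sorted(imports)), '']
    return header + body


def main():
    if len(sys.argv) != 2:
        print(f'usage: {sys.argv[0]} <shell-script>', file=sys.stderr)
        return
    with open(sys.argv[1]) as shell_file:
        for python_line in translate(shell_file.read().splitlines()):
            print(python_line)


if __name__ == '__main__':
    main()
```

Only modules that were actually added to imports appear in the output, which keeps pyflakes3 from reporting unused imports.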
If you want a good mark, you'll need to be careful in your handling of syntax which has no special meaning in shell but does in Python.
The bulk of knowledge about shell syntax & semantics you need to know has been covered in lectures. But if you want to get a high mark, you may need to discover more. Similarly much of the knowledge of Python you need has been covered but if you want to get a high mark you may need to discover more.
Python in sheepy.py
Your sheepy.py should work with the default Python on a CSE lab machine.
You are only permitted to import these modules in sheepy.py:
argparse array atexit bisect collections copy dataclasses datetime decimal enum fileinput fnmatch fractions functools itertools keyword locale math operator os pathlib pprint random re statistics string sys tempfile textwrap time traceback turtle typing unicodedata uuid
You are not permitted to use other modules.
For example, you are not permitted to import shlex in sheepy.py.
You are also not permitted to import subprocess in sheepy.py, but the Python you generate can import subprocess.
You can request modules be added to the permitted list in the course forum.
Most of the modules listed above are little or no use for the assignment.
Three modules, os, re and sys, are important for the assignment.
Python Translated from Shell
The Python that sheepy.py generates from the input Shell script should work with Python on a CSE lab machine.
Any import statements should be at the top of the generated Python to ensure it is readable.
The generated Python should only import modules it uses.
The generated code is permitted to import only these modules:
glob os shutil subprocess sys
The generated Python should not generate warnings from the Python static checker pyflakes3. The automarking will run pyflakes3, and if it generates warnings there may be a small penalty. For example, pyflakes3 will generate a warning if you have an unnecessary import statement.
You are encouraged to generate Python which does not generate warnings from the Python style checker pycodestyle, but there will be no penalty if you fail to do so, and some warnings, e.g. "line too long", are hard to avoid.
You'll have subtle problems with output ordering unless you use Python's -u flag in the '#!' line of the Python you generate. This is easy to do: just copy the first line of the example Python.
The '#!' line in the example generated code assumes /usr/bin/python3
is the pathname of the Python interpreter, which it is on CSE systems and many other places.
If /usr/bin/python3 isn't the appropriate pathname on your computer, try this '#!' line, which searches all the directories in $PATH for the Python interpreter:
#! /usr/bin/env -S python3 -u
Demo Shell Scripts
You should submit five shell scripts named demo00.sh .. demo04.sh which your program translates correctly (or at least well). These should be realistic shell scripts containing features whose successful translation indicates the performance of your assignment. Your demo scripts don't have to be original, e.g. they might be lecture examples. If they are not original, they should be correctly attributed.
If you have implemented most of the subsets, these should be longer shell scripts (20+ lines). They should, if possible, test many aspects of shell to Python translation.
Test Shell Scripts
You should submit five shell scripts named test00.sh .. test04.sh which each test a single aspect of translation. They should be short scripts containing shell code which is likely to be mis-translated. The test??.sh scripts do not have to be examples that your program translates successfully.
You may share your test examples with your friends, but the ones you submit must be your own creation.
The test scripts should show how you've thought about testing carefully. They should be as short as possible (even just a single line).