
Practical Computing Advice and Tutorials

Sun: 26 May 2019


Python: Introduction

You may have read my Programming in C post where I documented a modest project and how I developed it. Although I like C, I have to use it with a Linux distro as I don't have a MS Windows implementation. Python, on the other hand, I can run on my Windows 7 system, so I'm going to learn how to program with Python 3.5.1 and post my learnings in these pages so that others may learn also.

There are a couple of ways to interact with Python as, unlike C, it's not a compiled language. One can use the 'shell' environment or save a 'script' as a .py file and have that run via the shell. Scripts can be written using any text editor and I'm sure that you'll have one that's your 'go-to', but Python comes with an integrated development environment called 'IDLE', which is very good and which I'd encourage you to use.


Hello world!

As with any language, the classic 'Hello world!' routine is the first thing to do. This is very simple with Python, requiring just one line of code.

print ("Hello, world!")

This can be run from IDLE by first saving it as a .py file and then pressing F5, or it can be typed directly into the shell; the result is exactly the same:

Hello, world!
>>>

So, let's examine what we did. We made a 'function call' to the print function and passed to that function an 'argument', enclosed in 'parentheses'. Inside the parentheses we defined an 'Object', in this case a string of characters, surrounding the string with quotes. It matters not if we use single quotes (technically apostrophes) or double quotes, but they have to match at each end. It's also considered to be good practice to be consistent throughout your code. The function then did what it was designed to do and output the Object that we just defined.

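When we define an Object, Python assigns that Object a 'Class' based upon how we represent the Object, so that computer memory can be correctly allocated. Because we used quotes, Python assigned a Class type 'str' to our Object and allocated memory in which to store the 13 characters of that string. In the standard Python implementation each of these plain ASCII characters needs one byte, and the Object itself carries a little extra bookkeeping overhead on top of that.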

We can see this in action by using the 'type' function and the 'len' function.

>>> type ("Hello, world!")
<class 'str'>
>>> len ("Hello, world!")
13
>>>

Realise that in reality what is being stored in computer memory are zeros and ones, so the first byte of memory for the characters of our string will contain 01001000, the second byte will contain 01100101, and so on.
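To see those bit patterns for yourself, here is a minimal sketch. It assumes plain ASCII text and uses the built-in ord and format functions; the helper name to_bits is my own choice for illustration and not part of Python.

```python
# Show the 8-bit pattern behind the first two characters of the string.
text = "Hello, world!"

def to_bits(character):
    """Return the character's decimal value as an 8-digit binary string."""
    return format(ord(character), "08b")

for character in text[:2]:
    print(character, to_bits(character))
```

The two lines printed should match the two bytes quoted above, one for 'H' and one for 'e'.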

Character strings can be joined together, or 'concatenated', by using the '+' symbol, like this...

>>> print ("Hello, " + "world!")

...which can be a handy way to prefix or suffix some predefined text to some arbitrary user input, as with this small script.

name = input ("What is your name? ")
print ("Hello " + name)

Variables

This introduces the topic of 'Variables'. We now have a variable called name. You can find the class type of any variable by using the 'type' function, and the length of a string variable by using the 'len' function. To get the content, simply enter the variable name at the shell prompt: >>> name

Notice that we didn't have to declare the variable to Python before we used it. Python uses 'auto declarations' and assigns the variable a class type based upon what is stored in memory. Assignments evaluate the right hand side first, so in the above example, the function 'input' was used to get the user input, Python then allocated the memory needed, based on what was returned from 'input', and then the variable 'name' was attached to the memory location in which the string (the keyboard input in this case) was stored. The class type is 'str' whatever the user typed, digits included.

The three main variable type classes in Python are 'str' (character string), 'int' (integer value) and 'float' (floating point value).
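A short sketch of those three classes in action follows. The variable names and values are made up purely for illustration.

```python
greeting = "Hello"   # quotes, so the class is str
count = 20           # a whole number, so the class is int
ratio = 2.5          # a decimal point, so the class is float

for value in (greeting, count, ratio):
    print(repr(value), type(value).__name__)

# The same name can later be attached to an Object of a different class.
count = "twenty"
print(type(count).__name__)
```

The last two lines show that the class belongs to the Object rather than to the name: once count is reassigned to a quoted value, its class type becomes str.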

Because the input, in this case, is coming from the computer keyboard, the class type will always be 'str', that is to say that the function call to input will always return a string value, but if we wanted a numeric value, type conversions can be done on-the-fly. Consider this script...

answer = input ("What is 10 + 10? ")
if int (answer) != 10+10:
    print ("No! You're an idiot!")
elif int (answer) == 10+10:
    print ("Wow! You're a genius!")

The first line of that script works in the same way as before, that is to say that the user input will be stored as a string. Then we have the if and the elif statements which are doing on-the-fly type conversions from a string to an integer, using the int function, so that a comparison can be made to the result of the 10+10 operation. Also notice that the four-space indentation is not purely for aesthetics, but is a requisite (a minimum of one space being required) for the syntax of the code. Try leaving the spaces out to see what error messages you get.

Complementary to the int function are float and str. Any one of the three can be used to do on-the-fly type conversions, or the result of the conversion can be stored in a new variable.
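One thing to be aware of is that int and float raise a ValueError when the string doesn't look like a number, so a script such as the one above will stop with an error if the user types 'twenty'. A minimal sketch of one way to guard against that is shown here; the helper name to_int is hypothetical, and returning None for bad input is just one possible design choice.

```python
def to_int(text):
    """Return text converted to an int, or None if it is not a whole number."""
    try:
        return int(text)
    except ValueError:
        return None

print(to_int("20"))           # a valid whole number
print(to_int("twenty"))       # not a number, so None comes back
print(float("2.5") * 2)       # str to float, then arithmetic
print("Total: " + str(20))    # int to str, so that it can be concatenated
```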

Now that we can get some user input, one of the first jobs is to 'sanitise' it, that is to say, we need to check that what was input is what we were expecting. Many exploits have been made possible because the user input was not checked for sanity.

If we ask the user for a user name, it would be prudent to disallow so called 'special characters' such as \ / ? " % & ~ # > < and the like. The exception to this would be if an email address is allowed for a user name, in which case the at sign @ and a period . would need to be allowed and possibly the underscore _. It may also be prudent to restrict the maximum length of a user name to something that is reasonable.


ASCII & Lists

If we look at all the printable ASCII characters, we'll see that there are 95 in total, with decimal values from 32 to 126. We can see all 95 by running a small Python script.

value = 32
while value <= 126:
    print (value, " - ", chr(value))
    value +=1

So, we'll need to check the decimal value of each character of the user name. If the value is less than 46, it needs to be rejected. If the value is 47, it needs to be rejected. If the value is between 58 and 63, it needs to be rejected. If the value is between 91 and 94, it needs to be rejected. If the value is 96, it needs to be rejected and if the value is greater than 122, it needs to be rejected. We'll also need to check that the maximum length of the user name has not been exceeded.
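As a sanity check on those rules, here is a sketch that applies them to every printable ASCII value and shows which characters survive. The function name is_rejected is my own, and the loop uses range, which is covered in the Range section further down.

```python
def is_rejected(value):
    """Apply the rejection rules described above to one decimal ASCII value."""
    return (
        value < 46
        or value == 47
        or 58 <= value <= 63
        or 91 <= value <= 94
        or value == 96
        or value > 122
    )

# Build a string of every printable character that is NOT rejected.
allowed = "".join(chr(value) for value in range(32, 127) if not is_rejected(value))
print(allowed)
print(len(allowed), "characters allowed")
```

What should be left are the period, the digits, the at sign, the upper and lower case letters and the underscore. Note that under these rules the hyphen (value 45) is rejected.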

Python has the notion of 'Lists'. This is an odd name for a very useful feature. If you've any experience of other programming languages, you'll be familiar with the notion of an array. A Python List is a kind of an array.

Run this script...

user_name = input ("User Name? ")
print (list (user_name))

We can check on the length of the input with len (user_name) and reject anything longer than an arbitrary length right off the bat, before checking anything else. I'd not want an email address longer than about 30 characters, but some users may have one a little longer, so let's set a limit of 50. So, the first check we do is...

if len (user_name) > 50:
    print ("Your input has exceeded the maximum length allowed.")
else:

...and plug that into our script, which is now...

user_name = input ("User Name? ")
if len (user_name) > 50:
    print ("Your input has exceeded the maximum length allowed.")
else:
    print (list (user_name))

If you run that script with a reasonable input, you'll see the input in a List form. Lists are indexed from zero to x, where x is one less than the number of items in the list. The last item can also be found with list_name [-1]. Strings can be indexed in the same way, so for this example user_name [-1] will display the last character of the user name, which is shorthand for user_name [(len(user_name) -1)]
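The indexing described above can be tried out with this small sketch. The user name is a made-up sample value standing in for what input would return.

```python
user_name = "alice@example.com"   # sample value standing in for input ()
letters = list(user_name)

print(letters[0])                  # first item, index zero
print(letters[-1])                 # last item, using a negative index
print(letters[len(letters) - 1])   # last item, the long way round
print(user_name[-1])               # the string itself can be indexed too
```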

With the ASCII listing script, we used the chr() function to list the ASCII character for a given value. The complement to this is the ord() function which returns the decimal value for any given string character, as in ord ('R') which will return the value of 82. We can use this to return the decimal values of each character in the user name by iterating over our list, starting with ord(user_name [0]) and checking the returned value against our criteria.
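Here is a quick sketch showing that ord and chr are the reverse of each other. The user name is an invented sample, and the 'list comprehension' is simply a compact way of writing a loop that builds a list.

```python
user_name = "Rob_1"   # sample value, not real user input

# Decimal value of each character, in order.
codes = [ord(character) for character in user_name]
print(codes)

# Turning the values back into characters recovers the original string.
print("".join(chr(code) for code in codes))
```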


Range

First we'll have our script output the decimal values for each character in the user name. Python has another very useful object called 'range'. The range object produces a list-like sequence of whole numbers which starts at zero and stops one short of the value given, so range (20) produces the numbers from 0 to 19. Run this script...

for number in range (20):
    print (number)
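The range object can also be given a start value and a step, as range (start, stop, step). A small sketch, converting each range to a list so that the numbers can be seen in one go:

```python
print(list(range(5)))          # stop only: starts at zero, stops before 5
print(list(range(2, 6)))       # start and stop
print(list(range(0, 10, 3)))   # start, stop and a step of 3
print(len(range(20)), range(20)[-1])   # a range has a length and can be indexed
```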

We can plug this object into our user name script and have it control the output like this...

user_name = input ("User Name? ")
if len (user_name) > 50:
    print ("Your input has exceeded the maximum length allowed.")
else:
    print (list (user_name))
    # Only list the decimal values when the length check has passed
    for number in range (len (user_name)):
        print (ord (user_name [number]))

We now have a range loop iterating over each character in the user name, printing the decimal value of each ASCII character and controlled by the length of the user name so that we don't run past the end and crash the script. With each iteration, ord (user_name [number]) returns the decimal value of each ASCII character in the user name, indexed by [number] which is a value controlled by the range object. So, instead of simply passing that ASCII value to the print function, we need a custom function to which the ASCII value can be passed and checked against our criteria.

The name of our custom function will be error_check so, we can now plug that into our user name script, like this...

user_name = input ("User Name? ")

if len (user_name) > 50:
    print ("Your input has exceeded the maximum length allowed.")
else:
    for number in range (len (user_name)):
        error_check (ord (user_name [number]))
    if errors == 1:
        print ("Your user name contains",errors,"illegal character.")
    elif errors > 1:
        print ("Your user name contains",errors,"illegal characters.")
    else:
        print ("The user name " + user_name + " is good.")

Epilogue

So, to recap: The first thing we check is the length of the user name. If that is greater than 50, the if condition is 'true' and we print the message. The script will then simply drop past the else clause and terminate. Otherwise the else clause is executed and we iterate over the user name, by way of the range object, passing each ASCII value to the error_check function by way of the ord (user_name [number]) function. When the 'range' loop finishes, we check the value of 'errors'. If that value is one or greater, we print the appropriate message. If not, no errors were detected, so we print a nice friendly message and the script terminates.

So, the error_check function will need a way of keeping track of the number of errors found so that we can give the user some useful feedback. This is done via a 'global variable' which we've named 'errors'. A global variable is given its starting value at the top level of the script, outside of any function, and, as the name suggests, it can be read by any function, which is an easy way to pass values from one function to another. A function that needs to change the value must first declare the name with the 'global' keyword.
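To show how a global variable behaves, here is a minimal sketch kept separate from the user name script. The function names record_error, report and forgetful are invented for the illustration.

```python
errors = 0   # global: assigned at the top level of the script

def record_error():
    global errors   # needed because this function assigns to errors
    errors += 1

def report():
    # Only reads errors, so no 'global' line is required here.
    return "errors so far: " + str(errors)

def forgetful():
    errors += 1     # no 'global' line, so Python treats errors as local

record_error()
record_error()
print(report())

try:
    forgetful()
except UnboundLocalError:
    print("forgetful () failed because errors was not declared global")
```

Reading a global from inside a function works without any declaration; it's only assigning to it that needs the global line.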

Custom functions also need to be defined before they are called, so it's simplest to put the global variables and the function definitions at the beginning of the script.

# Global variable declarations
errors = 0

# Define the error checking function
def error_check (value):
    global errors

    if value < 46:
        errors += 1
    elif value == 47:
        errors += 1
    elif value > 57 and value < 64:
        errors += 1
    elif value > 90 and value < 95:
        errors += 1
    elif value == 96:
        errors += 1
    elif value > 122:
        errors += 1
# End of error checking function

# Start of main routine
user_name = input ("User Name? ")

if len (user_name) > 50:
    print ("Your input has exceeded the maximum length allowed.")
else:
    for number in range (len (user_name)):
        error_check (ord (user_name [number]))
    if errors == 1:
        print ("Your user name contains",errors,"illegal character.")
    elif errors > 1:
        print ("Your user name contains",errors,"illegal characters.")
    else:
        print ("The user name " + user_name + " is good.")
# End of main routine

A couple of things to note in this finished script. The error check function uses a local variable called 'value' which is how it gets the ASCII value from the ord (user_name [number]) function, hence the choice of the name. Also, you'll see that we had to declare the global variable 'errors' at the beginning of the function. Without that line, Python would treat 'errors' as a new local variable and errors += 1 would fail with an UnboundLocalError. With it, the global variable is incremented by one for each error that is detected.
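For comparison, here is a sketch of a different approach that avoids the global variable altogether: the counting function returns the number of illegal characters, and the check is done against a set of allowed characters rather than against ASCII value ranges. This is a swapped-in technique and not the method used above; the names ALLOWED, MAX_LENGTH, count_illegal and check_user_name are all my own, and sample values are used in place of input so that it runs without prompting.

```python
import string

# Hypothetical constants: letters, digits and the three permitted symbols.
ALLOWED = set(string.ascii_letters + string.digits + "@._")
MAX_LENGTH = 50

def count_illegal(user_name):
    """Return how many characters of user_name are not in ALLOWED."""
    return sum(1 for character in user_name if character not in ALLOWED)

def check_user_name(user_name):
    """Return the message that the finished script would print."""
    if len(user_name) > MAX_LENGTH:
        return "Your input has exceeded the maximum length allowed."
    errors = count_illegal(user_name)   # a local variable, no global needed
    if errors == 1:
        return "Your user name contains 1 illegal character."
    if errors > 1:
        return "Your user name contains " + str(errors) + " illegal characters."
    return "The user name " + user_name + " is good."

for sample in ("alice@example.com", "bob smith", "x" * 51):
    print(check_user_name(sample))
```

Because the count is returned rather than stored globally, the function can be called as many times as you like without having to reset anything in between.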

As with any script or code, there is more than one way to accomplish the goal, but this seemed to me to be quite easy to follow. I'm new to Python and I'm in no way an expert, so if I can work this out and follow it, I hope that you can also. As always, if you've any feedback to offer, please use the form I've provided. I'll be posting another instalment as and when. Thank you for reading and I hope you found this of use.