Practical Computing Advice and Tutorials

Tue: 23 Jul 2019


C Programming Project

The best way to learn any programming language is to have a project: think of something that you want to achieve, break it down into small steps and build the code, testing each step as you go. Don't worry about the fact that whatever you think of has, most likely, already been done; just reinvent the wheel. The motivation here is to learn the language.

As an example, I follow the work of Steve Gibson of grc.com. Each week Steve produces a podcast called Security Now! and along with the podcasts are transcripts and show notes, so each episode has six resources, of which I download three; one .mp3 file and two .pdf files. I'd like to be able to run a program that downloads all three files by just entering the episode number into my program and have the program do a system call to 'wget' and append the url with the episode number so that it downloads all three files. So, for episode 600 I'll need these three wget calls...

wget https://media.grc.com/sn/sn-600.mp3
wget https://www.grc.com/sn/sn-600-notes.pdf
wget https://www.grc.com/sn/sn-600.pdf

The C code would look like this...

#include<stdlib.h> // needed for the system call

int main()
{
    system("wget https://media.grc.com/sn/sn-600.mp3");
    system("wget https://www.grc.com/sn/sn-600-notes.pdf");
    system("wget https://www.grc.com/sn/sn-600.pdf");
    return (0);
}

I'm going to be building a program that asks for the episode number and then makes the three system calls based on the episode number entered.

So, the first thing to do is to have a program that asks for the episode number and stores that in a string variable, ready to be appended to the fixed part of the wget string. The wget is going to be a custom function call, with the argument being the episode number.

First things first, how do we get some user input?

I'm going to use the getchar() function which is a part of the stdio.h library. If you enter the command man getchar you'll see the details.

The getchar() function gets the input one character at a time ('char' is shorthand for 'character') from the standard input, which in this case will be the computer keyboard, but it doesn't have to be.

To test this function, we can use this test routine, which simply copies its input to its output, which is not something that you're likely to do, in practice, but it serves to demonstrate what's going on, under the hood, so to speak.

#include <stdio.h>

int main()
{
    int c;
    while ((c = getchar()) != EOF) {
        putchar (c);
    }
    return (0);
}

On running this routine it will wait at the 'while' loop for some input. When the input comes from the keyboard, the terminal holds what you type in an input buffer and only hands it to the program when it sees either an LF {Line Feed} or, with contents still in the buffer, an EOF {End Of File} request. LF is sent from the keyboard by pressing the Enter key, while EOF is signalled with the Ctrl/d key combination. Because LF has an ASCII value, it's treated like any other character and will be passed from the input to the output, that is from the 'getchar()' function to the 'putchar()' function, by way of the 'c' variable. The loop will only terminate when Ctrl/d is pressed while the input buffer is empty, at which point getchar() returns EOF. If Ctrl/d is pressed and the input buffer is not empty, the buffered text is passed through the loop and the program goes back to waiting.

Any text that is typed before an LF or an EOF is stored in an input buffer. Only when LF or EOF is encountered does the loop from input to output run. What you see on the screen, as you type, is simply an 'echo' of the keyboard.

So, if you entered the word Hello, then pressed the Enter key (or Ctrl/d), the 'getchar()' function will pull the first item from the input buffer and store its ASCII value in the 'c' variable (hence the c = ). This value is then passed to the 'putchar (c)' function which pushes it out to the output buffer. The routine then loops back and makes 'c' equal to the next value in the input buffer, which is again passed to the 'putchar (c)' function, and so on, until the buffer has been used up (the last value being 10, the ASCII value of LF, if you pressed Enter), at which point getchar() waits for more input. If Ctrl/d is pressed while the input buffer is empty, getchar() returns EOF and the loop will exit. This is the != part of the 'while loop'. In English it means "while the input is not equal to EOF". You could write != -1, which is the same thing on most systems, but the C standard only promises that EOF is a negative integer, so using the name EOF keeps the code portable.

It's possible to see this in action by using a debugger called 'gdb', but we can also introduce our own custom debug routine by intercepting the loop with a custom function call.

#include <stdio.h>

int loop = 0;           // Counts how many times the loop has run
void debug(int c);      // Prototype for our custom debug routine

int main()
{
    int c;

    printf("Main Routine Start\n");

    while ((c = getchar()) != EOF) {
        debug(c);
        putchar(c);
    }

    printf("Main Routine End\n");

    return(0);
}

void debug(int c)
{
    ++loop;

    printf("Debug Routine Start\n");
    printf("Loop Count = %d c = %d\n", loop, c);
    printf("Debug Routine End\n");
}

With this routine, each time the loop runs, we're branching to our custom function, printing some stuff, and then returning back to the main routine. Have a play with this and you'll better understand what's going on. You can stop the debug function running by simply commenting out the 'debug(c);' line with // (i.e. // debug(c);)

So, we can get some user input, but now we need to work out how to store that variable input as a string and then append that string to a string constant so that it can be used with the wget system call.

We already know how to copy the input to the output and intercept that, by way of a custom function, so maybe we can adapt this technique and store the user input in a string variable. The issue is that if I enter 600 for the episode number, it's the ASCII values 54, 48, 48 that are being captured, not the number 600.

While it's tempting to write the input like this...

char c;
while((c = getchar()) != EOF){

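... this is in fact incorrect. The reason being, getchar() has to be able to return every possible character, so that it can be used to read arbitrary input, plus the EOF value, which must be different from all of them. That's why it returns an integer, not a character, and why 'c' must be declared as an int: squeeze the result into a char and EOF either can never be matched or can no longer be told apart from a real character.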

First, we need to know what it is that we're dealing with, so, what is a string?

A string is a sequence of zero or more characters terminated by a null character, which is escape zero: \0. So, if we entered Hello, pressed the Enter key, and stored everything typed as a string, the result would be an array like this...
H e l l o \n \0

The end of a string is terminated with a zero (a so-called null character) so that functions such as printf() can detect the end of any string passed to it, so, if we're to construct our own strings, we need to conform to this format.

The '\n' is a Line Feed, the 'n' standing for 'newline'. You'll have already come across the '\n' sequence in the intro routines that I posted, used with printf() function calls, and it's called an 'escape sequence'. Understand that although we have to type two characters for an escape sequence, the computer stores it in a single location, just as it would for any ASCII character.

With this project, we'll be using two types of strings:

  1. String Variables
  2. String Constants

To recap, we're trying to build three wget commands, replacing the number with a user input...

wget https://media.grc.com/sn/sn-600.mp3
wget https://www.grc.com/sn/sn-600-notes.pdf
wget https://www.grc.com/sn/sn-600.pdf

... so the 'variable' bit will be not only the number, but also notice that we have a different sub-domain for the .mp3 files, just to keep things interesting.

Okay, so, much of what we need is already in a string format. The next thing to do is to convert the user input from ASCII values to a string. Also, we could do without having to press Ctrl/d to terminate the input.

#define LF 10        // The value of Line Feed
#include <stdio.h>
#include <stdlib.h>  // Needed for the exit command

int main()

{
    printf("Main Routine Start\n");

    int n, c;
    n = c = 0;            // Init the variables to zero

    char epn[4];          // A character array to hold the episode number sequence:
                          // epn[0], epn[1], epn[2], plus epn[3] for the \0 terminator

    while ((c = getchar()) != LF && c != EOF) {   // Stop getting input when Enter is pressed
        if (n < 3) {                     // Never write past the three digit positions
            epn[n] = c;                  // Assign epn[n] the value of c
        }
        ++n;                             // Count every character entered
    }        // End of while loop

    if (n != 3) {     // Error check the input
        printf("ERROR IN INPUT. EPISODE NUMBERS ARE 3 DIGITS!\n");
        exit(10);     // On error Exit
    }                 // End of if error check

    epn[3] = '\0';    // Terminate the string so that puts() knows where it ends
    puts (epn);       // Display the contents of epn

    printf("Main Routine End\n");

    return(0);

} // End of main

This routine terminates with epn holding the string variable of the number entered. It also does some basic error checks of the input and will exit if the input is not 3 characters long. I've not error checked for a non-numerical input: since I'm going to be the user, I'll not be trying to break things by entering anything other than digits.

We now have our string variable, epn, which will hold the episode number of the resource we want to download, so now we need our string constants.

A string constant is also a sequence of zero or more characters, hard coded into our program by surrounding the characters with quotes, as in "I am a string!" or "", a null (empty) string. Remember that all strings have a \0 (escape zero) at the end so that programs and/or functions can 'see' where the end of the string is, so technically, even our empty "" is stored as a single \0. This means that the storage required is always going to be one more location than the number of characters between the quotes, but the length of the string remains the number of characters between the quotes.

There are some string constants in the URL, that can be hard coded, so, we can start and assign some names to them.

scheme = "https://"
sd1 = "media."
sd2 = "www."
dn = "grc.com/sn/sn-"
mp3 = ".mp3"
pdf = ".pdf"
notes = "-notes"

These can now be concatenated, together with the epn variable, before being passed to a system call. We can test the concatenation process with this routine...

#include <stdio.h>
#include <string.h> // needed for the str functions

int main()
{
    char *scheme, *sd1, *sd2, *dn, *mp3, *pdf, *notes;
    char url[64];             // Storage for the command: the longest one plus its \0 fits easily
    char epn[4];              // Three digits plus the \0 terminator
    epn[0] = '6';
    epn[1] = '0';
    epn[2] = '0';
    epn[3] = '\0';            // strcat needs a terminated string
    scheme = "https://";
    sd1 = "media.";
    sd2 = "www.";
    dn = "grc.com/sn/sn-";
    mp3 = ".mp3";
    pdf = ".pdf";
    notes = "-notes";

    strcpy (url, "wget ");    // The system command used
    strcat (url, scheme);
    strcat (url, sd2);
    strcat (url, dn);
    strcat (url, epn);        // The episode number
    strcat (url, pdf);
    puts (url);               // Test the url is good by outputting to the screen

    return (0);
}

While moving this forward, I discovered a memory allocation issue that shows up when the url string is reused: the string functions only work safely if url has enough storage set aside for the longest command and every string ends with its \0. So, having done a little more research, I've employed a new approach and implemented some dynamic memory allocation to keep things under control. I've also improved the user input checking so that a strict "3 digits only" rule is enforced.

I've also used the puts() function in place of the printf() function for fixed screen messages, which is a cleaner way of working since puts() adds the newline for us.

There's some code repetition that could be written as a function call, but my thinking is that the entire routine is not long enough to worry about that, plus, should you want to, you can learn about that topic and build your own routine to include custom function calls. I've also left this in "test mode" by commenting out the 'system' commands.

I've commented the code as much as I can so that you should be able to follow what I've done and why.

One of the advantages I see with using this method for downloading these resources, as opposed to using the web browser/links method, is that the file dates are maintained; that is to say, the files are dated for when the files were made available, not for when I downloaded them. The advantage there is that I can sort the files by date, regardless of when I download them, so if I go a few weeks and get three or four episodes in one session, the files are dated correctly rather than them all having the same date.

I hope you found this interesting.



#define _POSIX_C_SOURCE 200809L // Ask for POSIX.1-2008 so that getline is declared
#define LF 10 // The value of Line Feed
#include <stdio.h> // The standard IO library
#include <stdlib.h> // Needed for the exit and system commands
#include <string.h> // Needed for the str functions

int main() {
  /* Declare the variables and set any init values */
    int loop; // A throwaway variable for loop control
    size_t epnbytes = 3; // Size hint for epn (Episode Number); getline expects a size_t
    ssize_t bytes_read; // Number of bytes read from the input, or -1 on failure
    char *static_scheme = "wget https://";
    char *static_sd1 = "media.";
    char *static_sd2 = "www.";
    char *static_dn = "grc.com/sn/sn-";
    char *static_mp3 = ".mp3";
    char *static_pdf = ".pdf";
    char *static_notes = "-notes";
    char *url = NULL;
    char *epn = NULL;

    /* Memory Allocation */
    url = (char *) malloc (50); // More than enough for the longest command (44 characters plus the \0)
    if(url == NULL) {
        puts("Error. URL Memory allocation failed!");
        exit(10);
    }

    epn = (char *) malloc (epnbytes + 1); // Dynamic memory allocation
                                          // to match the size of the input.
    if(epn == NULL) {
        puts("Error. EPN Memory allocation failed!");
        exit(10);
    }

    /* Clear the screen and generate the title */
    printf("\033[2J"); // Clear the screen
    puts ("Security Now! Podcast and pdf files Downloader v1.0");
    puts ("This program will download the high quality MP3 audio file, Show Notes and transcript PDFs of the episode number entered.");

    /* Screen message for input request */
    puts("Enter the 3 digit episode number to download");

    /* Get the user input */
    bytes_read = getline (&epn, &epnbytes, stdin);

    /* Check that we have the correct input length */
    if (bytes_read != 4 || epn[3] != '\n') { // Expect 3 digits plus the \n = 4 in total
        printf("ERROR IN INPUT. EPISODE NUMBERS ARE 3 DIGITS!\n");

        /* On error, release the memory allocation and exit */
        free(epn);
        free(url);
        exit(10);
    }
    epn[3] = 0; // Remove the \n from epn

    /* Check that we have three digits */
    for (loop = 0; loop != 3; ++loop) {
        if (epn[loop] > '9' || epn[loop] < '0') {
            printf("ERROR IN INPUT. EPISODE NUMBERS ARE 3 DIGITS!\n");

            /* On error, release the memory allocation and exit */
            free(epn);
            free(url);
            exit(10);
        }
    }

    /* Begin the download */
    printf ("Downloading the files for episode #%s...\n", epn);

    /* Build the wget command and download the .mp3 file */
    strcpy(url, static_scheme);
    strcat(url, static_sd1);
    strcat(url, static_dn);
    strcat(url, epn);
    strcat(url, static_mp3);
    printf("%s \n", url);
    // system (url);

    printf ("sn-%s.mp3 download finished.\n\n", epn);

    /* Build the wget command and download the first .pdf file */
    strcpy(url, static_scheme);
    strcat(url, static_sd2);
    strcat(url, static_dn);
    strcat(url, epn);
    strcat(url, static_pdf);
    printf("%s \n", url);
    // system (url);

    printf ("sn-%s.pdf download finished.\n\n", epn);

    /* Build the wget command and download the second .pdf file */
    strcpy(url, static_scheme);
    strcat(url, static_sd2);
    strcat(url, static_dn);
    strcat(url, epn);
    strcat(url, static_notes);
    strcat(url, static_pdf);
    printf("%s \n", url);
    // system (url);

    printf ("sn-%s%s.pdf download finished.\n\n", epn, static_notes);

    /* Release the memory allocation */
    free(epn);
    free(url);

    return (0);
}    // End of main