Oddbean new post about | logout
  ======================================================
=====================================================

# About

## Introduction

The purpose of this project is to test my understanding of the concepts taught in
the "Introduction to Programming Concepts" course (CS124) at the University of
Minnesota Twin Cities.  This assignment involves implementing a basic program that
counts the number of occurrences of each word in a text file and outputs the results
in alphabetical order.

## Requirements

The program should accept a text file as input, process it line by line, and count
the number of times each word appears.  The output should be sorted alphabetically
and written to a new text file.

## Concepts Covered
- [File I/O](#file-io)
- [String Processing](#string-processing)
- [Map Data Structure](#map-data-structure)
- [Sorting Algorithms](#sorting-algorithms)


# File I/O
The first step in this project is to read a text file and count the number of words it contains.  The standard input/output library, `stdio`, provides functions for opening and reading files.

## Reading Files

To open a file for reading, you can use the `fopen()` function from the standard I/O library.  This function returns a file pointer that you can use to read from the file.  Here is an example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
  FILE *fp;
  fp = fopen("input.txt", "r");
  if (fp == NULL) {
    printf("Could not open file.
");
    return 1;
  }
  // ...
}
```

In this example, `fopen()` is called with the file name "input.txt" and the mode "r", which specifies that we want to read from the file.  If the function succeeds, it returns a file pointer, which we assign to the variable `fp`.  If the function fails (e.g., because the file does not exist), it returns `NULL`, and we print an error message and exit the program.

To read from a file, you can use the `fgets()` function.  This function reads a line of text from the file pointer `fp` and stores it in a buffer.  Here is an example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
  FILE *fp;
  char line[1024];
  fp = fopen("input.txt", "r");
  if (fp == NULL) {
    printf("Could not open file.
");
    return 1;
  }
  while (fgets(line, sizeof(line), fp)) {
    // ...
  }
}
```

In this example, `fgets()` is called with the file pointer `fp`, a buffer of size `sizeof(line)`, and the mode "r".  If the function succeeds, it returns the number of characters read.  If the function fails (e.g., because there are no more lines in the file), it returns `EOF`.

To count the number of words in a text file, you can loop through each line of the file and split it into words using the `strtok()` function.  Here is an example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
  FILE *fp;
  char line[1024];
  fp = fopen("input.txt", "r");
  if (fp == NULL) {
    printf("Could not open file.
");
    return 1;
  }
  int num_words = 0;
  while (fgets(line, sizeof(line), fp)) {
    char *word;
    size_t len = strlen(line);
    for (size_t i = 0; i < len; ++i) {
      if (line[i] == ' ') {
        word = strtok(line, " ");
        num_words++;
        // ...
      }
    }
  }
}
```

In this example, we initialize a counter variable `num_words` to zero.  Then, in the loop that reads each line from the file, we use `strtok()` to split the line into words based on spaces.  For each word, we increment the counter `num_words`.

## Writing Files

To write data to a file, you can use the `fprintf()` function from the standard I/O library.  This function writes formatted output to a stream (i.e., a file pointer).  Here is an example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
  FILE *fp;
  char output[1024];
  fp = fopen("output.txt", "w");
  if (fp == NULL) {
    printf("Could not open file.
");
    return 1;
  }
  fprintf(fp, "%s
", "Hello World!");
}
```

In this example, `fprintf()` is called with the file pointer `fp`, a format string, and the argument "Hello World!".  The format string specifies how the output should be formatted.  In this case, we use `%s` to write a string, followed by a newline character `
`.

To write multiple lines of text to a file, you can loop through the data and write each line using `fprintf()`.  Here is an example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
  FILE *fp;
  char output[1024];
  fp = fopen("output.txt", "w");
  if (fp == NULL) {
    printf("Could not open file.
");
    return 1;
  }
  for (int i = 0; i < 3; ++i) {
    fprintf(fp, "%d: Hello World!
", i + 1);
  }
}
```

In this example, we use a loop to write three lines of text to the file.  For each line, we use `fprintf()` with a different argument (i.e., the value of `i + 1`).

# String Processing

The next step in this project is to process each word in the file and count its occurrences.  To do this, you will need to split each line into words using the `strtok()` function, and then loop through each word and increment a count.

## Splitting Strings

The `strtok()` function takes two arguments: a string and a delimiter.  It returns a pointer to the first token in the string that is separated by the delimiter.  If there are no more tokens, it returns a null pointer.

Here is an example of how you might use `strtok()` to split a string into words based on spaces:

```c
#include <stdio.h>
#include <string.h>

int main() {
  char *line = "Hello World!";
  char *token;
  size_t len = strlen(line);
  for (size_t i = 0; i < len; ++i) {
    if (line[i] == ' ') {
      token = strtok(line, " ");
      printf("Token: %s
", token);
    }
  }
}
```

In this example, `strtok()` is called with the string `"Hello World!"` and the delimiter `" "`.  If the function succeeds, it returns a pointer to the first token (i.e., "Hello").  Then, we print the value of the token using `printf()`.  We continue this process until the end of the line is reached (i.e., the character after the last space).

## Counting Words

To count the occurrences of each word in the file, you can use a map data structure.  A map is a collection of key-value pairs, where each key is unique and corresponds to a value.  In this case, we will use the key as the word and the value as its count.

Here is an example of how you might implement a map using an array:

```c
#include <stdio.h>
#include <string.h>

int main() {
  char *words[1024];
  int counts[1024] = { 0 };
  for (int i = 0; i < 1024; ++i) {
    words[i] = NULL;
    counts[i] = 0;
  }
  // ...
}
```

In this example, we declare two arrays: `words` and `counts`.  The `words` array is used to store pointers to the strings (i.e., the words).  The `counts` array is used to store the counts for each word.  We initialize both arrays with zeros.

To use this map, you will need to loop through each line in the file and split it into words using `strtok()`.  Then, for each word, you can search the map to see if it already exists.  If it does, you can increment its count.  If it doesn't, you can add it to the map with a count of one.

Here is an example of how you might use this map to count the occurrences of each word in the file:

```c
#include <stdio.h>
#include <string.h>

int main() {
  char *words[1024];
  int counts[1024] = { 0 };
  for (int i = 0; i < 1024; ++i) {
    words[i] = NULL;
    counts[i] = 0;
  }
  FILE *fp;
  char line[1024];
  fp = fopen("input.txt", "r");
  if (fp == NULL) {
    printf("Could not open file.
");
    return 1;
  }
  while (fgets(line, sizeof(line), fp)) {
    char *token;
    size_t len = strlen(line);
    for (size_t i = 0; i < len; ++i) {
      if (line[i] == ' ') {
        token = strtok(line, " ");
        if (strcmp(token, "hello") == 0) {
          counts[strlen(token) - 1]++;
        }
      }
    }
  }
}
```

In this example, we use a loop to read each line from the file.  For each line, we split it into words using `strtok()`.  Then, for each word, we search the map using `strcmp()` to see if it already exists.  If it does (i.e., the comparison returns zero), we increment its count by one.

## Sorting Maps

To sort the map by key (i.e., the word), you will need to implement a custom comparator function that takes two key-value pairs and compares them based on their keys.  Then, you can use the `qsort()` function from the standard library to sort the array of maps.

Here is an example of how you might implement a custom comparator function:

```c
int compare(const void *a, const void *b) {
  char *word1 = *(char **)a;
  char *word2 = *(char **)b;
  return strcmp(word1, word2);
}
```

In this example, `compare()` takes two pointers to strings (i.e., the keys of the maps).  It compares the two strings using `strcmp()`, and returns the result.

To use this comparator function to sort the array of maps, you can call `qsort()` with the array, the number of elements, and the comparator function:

```c
int count = 0;
for (int i = 0; i < 1024; ++i) {
  if (strcmp(words[i], "hello") == 0) {
    counts[strlen(words[i]) - 1]++;
    count++;
  }
}
qsort(counts, count, sizeof(int), compare);
```

In this example, we use the `count` variable to keep track of the number of elements in the map.  Then, we call `qsort()` with the array `counts`, the number of elements (i.e., `count`), and the comparator function `compare`.

# Sorting Strings

To sort the strings in the file by alphabetical order, you can use a custom comparator function that takes two string pointers and compares them using `strcmp()`.  Then, you can use the `qsort()` function from the standard library to sort the array of strings.

Here is an example of how you might implement a custom comparator function:

```c
int compare(const void *a, const void *b) {
  char *str1 = *(char **)a;
  char *str2 = *(char **)b;
  return strcmp(str1, str2);
}
```

In this example, `compare()` takes two pointers to strings.  It compares the two strings using `strcmp()`, and returns the result.

To use this comparator function to sort the array of strings, you can call `qsort()` with the array, the number of elements, and the comparator function:

```c
int count = 0;
for (int i = 0; i < 1024; ++i) {
  if (strcmp(words[i], "hello") == 0) {
    counts[strlen(words[i]) - 1]++;
    count++;
  }
}
qsort(counts, count, sizeof(int), compare);
```

In this example, we use the `count` variable to keep track of the number of elements in the map.  Then, we call `qsort()` with the array `counts`, the number of elements (i.e., `count`), and the comparator function `compare`.

# Writing Output

To write the output to a file, you can use the `fprintf()` function from the standard library.  This function takes three arguments: a pointer to a stream (i.e., the output file), a format string, and one or more arguments to be inserted into the format string.

Here is an example of how you might use `fprintf()` to write the sorted strings to a file:

```c
FILE *fp;
fp = fopen("output.txt", "w");
if (fp == NULL) {
  printf("Could not open file.
");
  return 1;
}
for (int i = 0; i < count; ++i) {
  fprintf(fp, "%s
", words[i]);
}
```

In this example, we use the `fopen()` function to open the output file for writing.  Then, we use a loop to write each sorted string to the file using `fprintf()`.  The format string is simply `%s`, which tells `fprintf()` to insert the string pointer as an argument.

Note that you should always close the output file after you have finished writing to it using `fclose()`.  This function takes a pointer to a stream (i.e., the output file) and closes it.

Here is an example of how you might use `fclose()` to close the output file:

```c
fclose(fp);
```

In this example, we simply call `fclose()` with the pointer to the output file (i.e., `fp`).  This will close the file.