Short intro to UNIX

UNIX Architecture

Shells

Pathname

Working directory

chdir

Input & Output

Unbuffered I/O

process control

Threads and thread ID

Error handling

Error recovery

User Id

Signals

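When a signal occurs, a process has three choices for dealing with it:

  1. Ignore the signal.
  2. Let the default action occur. For many signals the default is to terminate the process.
  3. Provide a function that is called when the signal occurs ("catching" the signal).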

Many conditions generate signals. Two terminal keys, the interrupt key (often the DELETE key or Control-C) and the quit key (often Control-backslash), are used to interrupt the currently running process. Another way to generate a signal is to call the kill function.

time values

System calls and library functions

UNIX standardization and implementations

FIPS (federal information processing standard)

FOPEN_MAX
//minimum number of standard I/O streams that can be opened at once.

TMP_MAX in <stdio.h>
//max number of unique filenames generated by tmpnam function

FILENAME_MAX: best avoided, because POSIX.1 provides better alternatives (NAME_MAX and PATH_MAX)

POSIX Limits

  1. numerical limits: LONG_BIT, SSIZE_MAX, and WORD_BIT
  2. minimum values: 25 constants
  3. maximum value: _POSIX_CLOCKRES_MIN
  4. runtime increasable values: CHARCLASS_NAME_MAX, COLL_WEIGHTS_MAX, LINE_MAX, NGROUPS_MAX, and RE_DUP_MAX
  5. runtime invariant values, possibly indeterminate: 17 constants
  6. other invariant values: NL_ARGMAX, NL_MSGMAX, NL_SETMAX, and NL_TEXTMAX
  7. pathname variable values: FILESIZEBITS, LINK_MAX, MAX_CANON, MAX_INPUT, NAME_MAX, PATH_MAX, PIPE_BUF, and SYMLINK_MAX

XSI Limits

#include <unistd.h>
long sysconf (int name);
long pathconf(const char *pathname, int name);
long fpathconf(int fd, int name);
//corresponding value if OK, -1 on error

These are the C prototypes of three functions declared in the <unistd.h> header: sysconf(), pathconf() and fpathconf(). They are used to obtain the configuration limits and options of the system and the filesystem at run time.

The sysconf() function takes an integer argument name and returns the value of a specific system option or limit. name can be one of several predefined constants, such as _SC_CLK_TCK (clock ticks per second), _SC_OPEN_MAX (maximum number of files that a process can have open), or _SC_PAGE_SIZE (size of a page in bytes).

The pathconf() function takes a string argument pathname and an integer argument name, and returns the value of a specific filesystem option or limit for the file or directory specified by pathname. name can be one of several predefined constants, such as _PC_NAME_MAX (maximum length of a file name), _PC_PATH_MAX (maximum length of a pathname), or _PC_NO_TRUNC (whether filenames are truncated or not).

The fpathconf() function takes an integer argument fd (file descriptor) and an integer argument name, and returns the value of a specific filesystem option or limit for the file or directory associated with the file descriptor fd.

Both pathconf() and fpathconf() functions provide similar functionality, but pathconf() takes a pathname as an argument and fpathconf() takes a file descriptor.

These functions can be used to determine the maximum limits of the system or filesystem, such as the maximum length of a file name, the maximum number of open files, or the maximum length of a pathname.

A semaphore is a synchronization object that is used to control access to shared resources in a concurrent or multi-threaded program. A semaphore is a value that is shared between multiple threads or processes, and it can be used to control the access to a shared resource by regulating the number of threads that can access the resource at the same time.

A semaphore has a value, which is an integer, that represents the number of available resources. When a thread wants to access a resource, it first checks the value of the semaphore. If the value is greater than zero, the thread can proceed and the semaphore value is decremented. If the value is zero, the thread is blocked and waits until the semaphore value becomes greater than zero.

A semaphore can be binary, which means it can only have two values, 0 and 1, and it is used to control access to a single resource, or it can be a general semaphore, which means it can have any integer value, and it is used to control access to multiple resources.

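There are two basic operations on a semaphore: wait, which decrements the value and blocks the caller while the value is zero, and signal, which increments the value and wakes up a blocked waiter if there is one.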

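Semaphores are widely used in operating systems, real-time systems, and parallel programming to synchronize access to shared resources and to prevent race conditions. They have to be used with care, because acquiring several semaphores in an inconsistent order can itself lead to deadlock.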

The atexit() function is a standard C library function that is used to register a function to be executed when a program exits. The function that is registered with atexit() is called an exit handler.

The atexit() function takes a single argument, which is a pointer to a function. The function must have no arguments and must return void. Once a function is registered with atexit(), it is executed automatically when the program terminates normally, that is, when it calls exit() or returns from main(), whatever exit status it reports.

The functions registered with atexit() are called in the reverse order they were registered. Each time atexit() is called, the function pointer is added to a list of exit handlers. When the program exits, the functions in the list are called in reverse order.

Here is an example of how to use the atexit() function to register an exit handler:

#include <stdio.h>
#include <stdlib.h>

void my_exit_handler(void) {
    printf("Exiting...\n");
}

int main() {
    atexit(my_exit_handler);
    printf("Hello, World!\n");
    return 0;
}

In this example, the function my_exit_handler() is registered with atexit(). When the program exits, the function will be automatically called and the message "Exiting..." will be printed.

It's worth noting that the number of functions that can be registered with atexit() is limited. ISO C guarantees at least 32, and the actual limit is implementation dependent, so it's best to check the limits of your specific implementation. Also, exit handlers are not run when the program terminates abnormally, such as when it crashes or calls abort(), or when it calls _exit() or _Exit() directly.

iovec structure: a data structure used to represent a scatter-gather list of memory buffers. It is used in several system calls and library functions that perform input and output operations on multiple memory buffers. The iovec structure is defined in the <sys/uio.h> header file and it has the following form:

struct iovec {
    void  *iov_base;    /* pointer to the memory buffer */
    size_t iov_len;     /* size of the memory buffer */
};

The iov_base field is a pointer to a memory buffer and the iov_len field is the size of the memory buffer. An array of iovec structures can be used to represent a scatter-gather list of memory buffers.

The scatter-gather I/O operations are performed by system calls such as readv() and writev(), which read into or write from multiple memory buffers in a single system call. These functions take a file descriptor, a pointer to an array of iovec structures, and the number of elements in that array. The buffers are filled (or written out) in array order.

Here is an example of how to use the iovec structure and the readv() function to read data from a file into multiple memory buffers:

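#include <sys/uio.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* buffer sizes are arbitrary, chosen only for illustration */
    char buffer1[16], buffer2[16], buffer3[16];
    struct iovec iov[3];

    int fd = open("file.txt", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* set up the iovec structures */
    iov[0].iov_base = buffer1;
    iov[0].iov_len = sizeof(buffer1);

    iov[1].iov_base = buffer2;
    iov[1].iov_len = sizeof(buffer2);

    iov[2].iov_base = buffer3;
    iov[2].iov_len = sizeof(buffer3);

    /* read data into the memory buffers, filling them in array order */
    ssize_t n = readv(fd, iov, 3);
    if (n < 0)
        perror("readv");
    else
        printf("read %zd bytes\n", n);

    close(fd);
    return 0;
}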

In the context of the SIGQUEUE_MAX constant, "signal" refers to a mechanism for inter-process communication (IPC) in the UNIX operating system. A signal is a software interrupt that is sent to a process to notify it of an event or to request a specific action. Signals are used to communicate between processes, between a process and the kernel, or between a process and a device driver.

SIGQUEUE_MAX is the maximum number of queued signals that a process may send and have pending at the receivers at any one time. Once the limit is reached, further calls to sigqueue() fail with errno set to EAGAIN, so the extra signal is never queued.

For example, suppose a process has a real-time signal blocked and several instances of that signal are sent to it with sigqueue(). Each instance is queued and is delivered in turn once the signal is unblocked. If the sender has already reached the limit, its next sigqueue() call fails and that instance is lost unless the sender tries again later.

SIGQUEUE_MAX applies to the sigqueue() function, which sends a signal to a single process and lets the sender pass a value along with it; this is how queued (real-time) signals carry data. Besides the process ID and the signal number, sigqueue() takes a union sigval argument, which can hold either an integer or a pointer. Limiting the number of queued signals keeps pending signals from consuming an unbounded amount of system resources.

It's worth noting that the value of SIGQUEUE_MAX is implementation-dependent and it can vary across different UNIX-like operating systems and different versions of the same operating system. Also, it's possible to check the value of SIGQUEUE_MAX using the sysconf() function, passing _SC_SIGQUEUE_MAX as an argument.

In summary, SIGQUEUE_MAX bounds the number of signals that can be queued with sigqueue() and still be pending. It is implementation dependent, and its value can be checked at run time with sysconf().

Maximum number of open files

A common sequence of code in a daemon process (a process that runs in the background, not connected to a terminal) is one that closes all open files.

The apue.h header file (Advanced Programming in the UNIX Environment) is a header file that is part of the book "Advanced Programming in the UNIX Environment" by W. Richard Stevens. The book provides an in-depth introduction to the UNIX operating system and its programming interfaces. The apue.h header file is a collection of definitions and declarations that are used in the examples and programs in the book.

The apue.h header file declares the book's own helper functions and macros, which give the examples a more consistent and portable interface to the UNIX system. It defines various constants, macros, and function prototypes that are used in the book's examples and that can be reused in other UNIX programs.

The apue.h header file includes the standard system headers that the examples need, such as <sys/types.h>, and declares prototypes for the book's own functions, for example the err_sys() error routine, which are not part of the standard C library. It also includes macros that provide a more consistent interface to some of the more complex system calls and library functions.

It's worth noting that apue.h is not a standard header provided by the operating system. It is the authors' own code, distributed with the book's source. It is not guaranteed to be supported on all UNIX-like systems or to be compatible with future versions of an operating system.

pid_t, uid_t, and gid_t are primitive system data types. They are defined in <sys/types.h>, which apue.h includes, not by apue.h itself. These types are used to represent process IDs, user IDs, and group IDs respectively.

These types are used in various system calls and library functions that deal with process, user and group management, such as getpid(), getuid(), getgid(), kill(), setuid(), setgid() and others.

The constants _POSIX_C_SOURCE and _XOPEN_SOURCE are called feature test macros. All feature test macros begin with an underscore. When used, they are typically defined in the cc command, as in cc -D_POSIX_C_SOURCE=200809L file.c

This causes the feature test macro to be defined before any header files are included by the C program. If we want to use only the POSIX.1 definitions, we can also set the first line of a source file to #define _POSIX_C_SOURCE 200809L

To enable the XSI option of Version 4 of the Single UNIX Specification, we need to define the constant _XOPEN_SOURCE to be 700. Besides enabling the XSI option, this has the same effect as defining _POSIX_C_SOURCE to be 200809L as far as POSIX.1 functionality is concerned. The Single UNIX Specification defines the c99 utility as the interface to the C compilation environment. With it we can compile a file as follows:

c99 -D_XOPEN_SOURCE=700 file.c -o file

To enable the 1999 ISO C extensions in the gcc C compiler, we use the -std=c99 option, as in gcc -D_XOPEN_SOURCE=700 -std=c99 file.c -o file

Exercises

2.1 We mentioned in Section 2.8 that some of the primitive system data types are defined in more than one header. For example, in FreeBSD 8.0, size_t is defined in 29 different headers. Because all 29 headers could be included in a program and because ISO C does not allow multiple typedefs for the same name, how must the headers be written?

A header guard wraps the contents of a header like this:

#ifndef _HEADER_H
#define _HEADER_H

/* header contents */

#endif /* _HEADER_H */

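In this way, if the same header file is included multiple times in the same program, the header guard ensures that its contents are processed only once. For a type such as size_t that is defined in many different headers, however, a per-file guard is not enough, because each header has its own guard macro. Each header must wrap the typedef itself in a guard whose macro name is shared by all of the headers, so that whichever header is included first defines the type and the others skip it.

Another way to protect a single header against repeated inclusion is the #pragma once directive, which tells the preprocessor to include the file only once. This directive is not part of standard C, so its support is implementation dependent, and it does not help when the same typedef appears in several different headers.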

2.2 Update the program in Figure 2.17 to avoid the needless processing that occurs when sysconf returns LONG_MAX as the limit for OPEN_MAX.

#include <limits.h>   /* LONG_MAX */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    long openmax = sysconf(_SC_OPEN_MAX);
    if (openmax == LONG_MAX) {
        /* no usable limit: skip any per-descriptor processing */
        fputs("unlimited\n", stdout);
    } else {
        printf("%ld\n", openmax);
    }
    exit(0);
}

In this example, the program first uses the sysconf() function to retrieve the limit for OPEN_MAX. If the returned value is equal to LONG_MAX, the program prints "unlimited" and exits. Otherwise, it prints the returned value and exits. LONG_MAX is declared in <limits.h>, so that header must be included.

A caller that uses the limit as the bound of a loop, such as daemon code that closes all open files, would otherwise iterate up to LONG_MAX times. Treating LONG_MAX as "no usable limit" and performing the additional processing only when the return value is not LONG_MAX is what avoids the needless work.

FILE I/O

FILE Descriptors

#include <fcntl.h>
int open(const char *path, int oflag, ... /* mode_t mode */ );
int openat(int fd, const char *path, int oflag, ... /* mode_t mode */ );

creat function

#include <fcntl.h>
int creat(const char *path, mode_t mode);
//Returns: file descriptor opened for write-only if OK, −1 on error

close function

An open file is closed by calling the close function.

#include <unistd.h>
int close(int fd);
//Returns: 0 if OK, −1 on error

lseek function

The term "offset" is often used in the context of memory management and file handling.

For example, in memory management, an offset refers to the distance in bytes between a memory location and the beginning of a memory segment or block. This information is used to calculate the address of a specific memory location, given the base address of the memory segment and the offset.

In file handling, an offset refers to the position of the next byte to be read or written in a file. The operating system uses the file offset to keep track of the current position within a file, so it knows where to read or write the next byte. The lseek() function is used to change the current file offset of a file descriptor.

#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
//Returns: new file offset if OK, −1 on error

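The interpretation of the offset depends on the value of the whence argument:

  1. If whence is SEEK_SET, the file's offset is set to offset bytes from the beginning of the file.
  2. If whence is SEEK_CUR, the file's offset is set to its current value plus the offset. The offset can be positive or negative.
  3. If whence is SEEK_END, the file's offset is set to the size of the file plus the offset. The offset can be positive or negative.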

The file’s offset can be greater than the file’s current size, in which case the next write to the file will extend the file. This is referred to as creating a hole in a file and is allowed. Any bytes in a file that have not been written are read back as 0.

#include "apue.h"
#include <fcntl.h>
char buf1[] = "abcdefghij";
char buf2[] = "ABCDEFGHIJ";
int main(void)
{
    int fd;

    if ((fd = creat("file.hole", FILE_MODE)) < 0)
        err_sys("creat error");

main creates a new file named "file.hole" with creat(), using the FILE_MODE permissions defined in the apue.h header. If creat() returns a negative value, an error occurred.

    if (write(fd, buf1, 10) != 10)
        err_sys("buf1 write error");

This block of code uses the write() function to write the contents of the first buffer (buf1) to the file. The first argument is the file descriptor returned by the creat() function, the second argument is the address of the buffer containing the data to be written, and the third argument is the number of bytes to be written.

    /* offset now = 10 */
    if (lseek(fd, 16384, SEEK_SET) == -1)
        err_sys("lseek error");

lseek() changes the file offset of the file descriptor.

  1. The first argument is the file descriptor.
  2. The second argument is the number of bytes to offset the file pointer from the beginning of the file (16384).
  3. The third argument, the SEEK_SET constant, specifies that the offset is relative to the beginning of the file. After this call, the file offset is 16384 bytes from the beginning of the file, well past the current end of the file at offset 10.

    if (write(fd, buf2, 10) != 10)
        err_sys("buf2 write error");

write() writes the contents of the second buffer (buf2) to the file, starting at the current file offset (16384). This write extends the file and leaves a hole covering bytes 10 through 16383.

    exit(0);
}

File sharing

  1. Every process has an entry in the process table. Within each process table entry is a table of open file descriptors, which we can think of as a vector, with one entry per descriptor. Associated with each file descriptor are
    • the file descriptor flags
    • a pointer to a file table entry
  2. The kernel maintains a file table for all open files. Each file table entry contains
    • the file status flags, such as read, write, append, sync, and nonblocking
    • the current file offset
    • a pointer to the v-node table entry for the file
  3. Each open file (or device) has a v-node structure that contains information about the type of file and pointers to functions that operate on the file.

v-node: a virtual node used by the file system in some UNIX-like operating systems to represent a file or a directory. It is a software abstraction that gives the operating system's kernel a common interface to different file systems.

A v-node contains information about a file or directory, such as its type (regular file, directory, symbolic link, etc.), permissions, ownership, timestamps, and the location of the file's data on disk. It also contains pointers to the file system-specific functions that are used to perform operations on the file or directory, such as reading, writing, and deleting.

pread and pwrite

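Calling pread is equivalent to calling lseek followed by a call to read, with the following exceptions:

  1. There is no way to interrupt the two operations that occur when we call pread.
  2. The current file offset is not updated.

Calling pwrite is equivalent to calling lseek followed by a call to write, with similar exceptions.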

dup and dup2 functions

#include <unistd.h>
int dup(int fd);
int dup2(int fd, int fd2);
//Both return: new file descriptor if OK, −1 on error

sync, fsync, fdatasync functions


#include <unistd.h>
int fsync(int fd);
int fdatasync(int fd);
//Returns: 0 if OK, −1 on error
void sync(void);

fcntl function

#include <fcntl.h>
int fcntl(int fd, int cmd, ... /* int arg */ );
//Returns: depends on cmd if OK (see following), −1 on error

The fcntl function is used for five different purposes.

  1. Duplicate an existing descriptor (cmd = F_DUPFD or F_DUPFD_CLOEXEC)

  2. Get/set file descriptor flags (cmd = F_GETFD or F_SETFD)

  3. Get/set file status flags (cmd = F_GETFL or F_SETFL)

  4. Get/set asynchronous I/O ownership (cmd = F_GETOWN or F_SETOWN)

  5. Get/set record locks (cmd = F_GETLK, F_SETLK, or F_SETLKW)

The individual commands behave as follows:

  • F_DUPFD: Duplicates the file descriptor passed as the first argument, using the lowest-numbered available file descriptor greater than or equal to the third argument.

  • F_GETFD: Retrieves the file descriptor flags associated with the file descriptor passed as the first argument.

  • F_SETFD: Sets the file descriptor flags associated with the file descriptor passed as the first argument to the value specified in the third argument.

  • F_GETFL: Retrieves the file status flags and the file access mode associated with the file descriptor passed as the first argument.

  • F_SETFL: Sets the file status flags associated with the file descriptor passed as the first argument to the value specified in the third argument. The access mode (read-only, write-only, or read-write) cannot be changed this way.

  • F_GETLK: Retrieves information about a lock on a file or a specific range of bytes within a file.

  • F_SETLK: Sets a lock on a file or a specific range of bytes within a file.

  • F_SETLKW: Sets a lock on a file or a specific range of bytes within a file, blocking the calling process if the lock cannot be acquired immediately.

In the context of the fcntl() function, flags are a set of bits that configure the behavior of a file descriptor. The second argument (cmd) selects the operation to perform. For F_SETFD and F_SETFL the new flag values are passed as the third argument, and for F_GETFD and F_GETFL the current flags are the return value.

There are two types of flags that can be used with fcntl(): file descriptor flags and file status flags.

File descriptor flags are used to configure the behavior of a file descriptor, such as whether it should be closed when an exec() family function is called (the FD_CLOEXEC flag).

File status flags are used to configure the behavior of the file itself, such as whether it should be opened in non-blocking mode (the O_NONBLOCK flag).

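Each flag is represented by a constant, such as FD_CLOEXEC for the file descriptor flags, or O_APPEND, O_NONBLOCK and O_SYNC for the file status flags. The access mode is one of O_RDONLY, O_WRONLY or O_RDWR and is extracted from the F_GETFL result with the O_ACCMODE mask. F_DUPFD, F_DUPFD_CLOEXEC and the other F_ names are commands, not flags. All of these constants are defined in the <fcntl.h> header file.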



Tags: UNIX, c
