Interprocess Communication: Sockets

In this article we discuss sockets and how they are used for interprocess communication, both between processes on the same machine and between processes on different machines.

Table of contents.

Introduction.
Socket system calls.
Servers.
Local sockets.
Internet sockets.
Socket pairs.
Summary.
References.

Prerequisites.

Introduction.

In previous articles we discussed several ways in which processes exchange data with each other: through a shared memory space (with access synchronized using semaphores), through a shared file in a file system, and through pipes.

Now let's discuss sockets. A socket is a bidirectional communication device used to communicate with processes on the same machine or with processes on different machines via a network.
Sockets are used by many programs, such as web browsers and servers, FTP, telnet and more.

When creating a socket we specify three details: the namespace, the communication style and the protocol.
The communication style defines how the socket treats transmitted data and how many parties take part in the communication.
Data sent through a socket is split into packets, and the communication style determines how these packets are handled and addressed on their way from the sender to the receiver.

Connection styles, e.g. TCP over IP, guarantee that packets are delivered in the order in which they were sent. If packets are lost or reordered during transmission, the receiver requests retransmission from the sender.
This is possible because the sender's and receiver's addresses are fixed when the connection is established.

Datagram styles, e.g. UDP, guarantee neither delivery nor arrival order. Delivery is best-effort, so packets may arrive in a different order from the one in which they were sent, if they arrive at all.
Because there is no connection, the sender's and receiver's addresses are attached to each individual packet.
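The datagram style can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the helper name udpLoopbackEcho is hypothetical, and the sketch assumes the loopback interface (127.0.0.1) is available. Note that there is no connect call; the destination address is passed along with every sendto.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Send one datagram to ourselves over the loopback interface and read it back.
// Returns the number of bytes received, or -1 on error.
// udpLoopbackEcho is a hypothetical helper name used only for this sketch.
ssize_t udpLoopbackEcho(const char* message, char* reply, size_t replySize){
    int receiver = socket(PF_INET, SOCK_DGRAM, 0);
    int sender = socket(PF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in address;
    socklen_t addressLength = sizeof(address);
    ssize_t received = -1;

    if(receiver == -1 || sender == -1)
        goto done;

    // bind the receiver to 127.0.0.1; port 0 lets the kernel pick a free port
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    address.sin_port = htons(0);
    if(bind(receiver, (struct sockaddr*)&address, sizeof(address)) == -1)
        goto done;
    // find out which port was chosen
    if(getsockname(receiver, (struct sockaddr*)&address, &addressLength) == -1)
        goto done;

    // no connection is made: the destination address travels with the datagram
    if(sendto(sender, message, strlen(message) + 1, 0,
              (struct sockaddr*)&address, sizeof(address)) == -1)
        goto done;
    received = recvfrom(receiver, reply, replySize, 0, NULL, NULL);

done:
    if(receiver != -1) close(receiver);
    if(sender != -1) close(sender);
    return received;
}
```

On a real network the datagram could be lost, in which case recvfrom would block, so a production program would add a timeout.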

A namespace defines how a socket address is written. In the local namespace, socket addresses are ordinary file names. In the internet namespace, a socket address is composed of the IP address of a host and a port number, which distinguishes between multiple sockets on the same host.

A protocol, e.g. UDP or TCP, specifies how data will be transmitted.

Socket system calls.

The following are the system calls used with sockets:

socket, to create a socket.
close, to terminate a socket.
connect, to connect two sockets.
bind, to label a server socket using an address.
listen, used for configuring a socket so that it can accept connections.
accept, for accepting connections and creating new sockets for the connection.

Note that sockets are represented as file descriptors.
To access the manual pages for a system call we use the following command.

man system-call

Creation and termination of sockets.

To create a socket we specify the three parameters described in the previous section: the namespace, the communication style and the protocol.

For a namespace we use constants that begin with PF_ which stands for protocol families e.g PF_LOCAL to specify a local namespace and PF_INET to specify an internet namespace.

Communication styles begin with SOCK_ e.g SOCK_STREAM for a connection style and SOCK_DGRAM for a datagram style socket.
A protocol is valid only for a specific combination of namespace and communication style.
There is usually a single best protocol for each such combination, so 0 is commonly specified to select it.
The socket call returns a file descriptor for the new socket. Once the socket is connected we can read from it or write to it and, when done, use close to terminate it.
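As a minimal sketch of creation and termination, the function below opens a socket for a given namespace and communication style, passes 0 as the protocol, and closes the socket again. The name openAndCloseSocket is hypothetical and only serves this illustration.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a socket for the given namespace (domain) and communication style,
// letting the system choose the protocol (0), then close it again.
// Returns 0 on success and -1 on failure.
// openAndCloseSocket is a hypothetical name used only for this sketch.
int openAndCloseSocket(int domain, int style){
    int socketFD = socket(domain, style, 0);
    if(socketFD == -1){
        perror("socket");
        return -1;
    }
    // a socket is a file descriptor, so close works on it as it does on a file
    return close(socketFD);
}
```

For example, openAndCloseSocket(PF_LOCAL, SOCK_STREAM) creates and closes a connection-style local socket, while an unsupported namespace makes socket fail and the function return -1.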

Socket connection.

A client will call connect so as to create a connection between sockets. The address of the server socket should be specified in this call.

A client process initiates the connection while a server process waits to accept connections.

In the call, the first argument is the client's socket file descriptor, the second is a pointer to the server's socket address structure, and the third is the length of that address structure in bytes.
The format of the address differs according to the namespace.

To send data through a socket, we use the same functions that are used to write to file descriptors, such as write.

Servers.

A server's life cycle is as follows: first a connection-style socket is created, then an address is bound to the socket, then listen is called to enable connections, then accept is called to accept incoming connections, and finally the socket is closed.

Each time a program accepts a new connection, Linux will create a separate socket which will be used for data transfer over the network.

The bind call binds an address to a server's socket. It takes the socket file descriptor as its first argument, a pointer to a socket address structure, whose format depends on the socket's address family, as its second argument, and the length of the address structure in bytes as its third argument.

When an address is bound to a connection-style socket, listen must be invoked to indicate that it is a server.
listen takes two arguments: the socket file descriptor and the maximum number of queued pending connections.
Connections are rejected if the queue is full. However, this does not limit the total number of connections a server is able to handle, just the number of clients that are attempting to connect but have not yet been accepted.

accept should be called when a server wants to accept a connection request from a client. It takes the socket file descriptor as its first argument, a pointer to a socket address structure, which is filled in with the client's address, as its second argument, and a pointer to the length of that structure in bytes as its third argument.
This call creates a new socket for data exchange with the client and returns its file descriptor. The original server socket remains available for accepting further connections.
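The life cycle described above can be sketched in a single process, using a local socket so that no network is needed. This is an illustration under stated assumptions: the name localRoundTrip and its parameters are hypothetical, and the sketch relies on the fact that a client's connect call can complete while the request waits in the listen queue, before the server calls accept.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Run the server life cycle and one client in a single process.
// Returns the number of bytes the server side read, or -1 on error.
// localRoundTrip and its parameters are hypothetical names for this sketch.
ssize_t localRoundTrip(const char* path, const char* message,
                       char* reply, size_t replySize){
    struct sockaddr_un name;
    int listenFD, clientFD, connectionFD = -1;
    ssize_t received = -1;

    memset(&name, 0, sizeof(name));
    name.sun_family = AF_LOCAL;
    strncpy(name.sun_path, path, sizeof(name.sun_path) - 1);
    unlink(path); // remove a stale socket file, if any

    listenFD = socket(PF_LOCAL, SOCK_STREAM, 0);   // 1. create
    clientFD = socket(PF_LOCAL, SOCK_STREAM, 0);
    if(listenFD == -1 || clientFD == -1)
        goto done;
    // 2. bind an address to the server socket
    if(bind(listenFD, (struct sockaddr*)&name, sizeof(name)) == -1)
        goto done;
    // 3. listen, allowing up to 5 pending connections
    if(listen(listenFD, 5) == -1)
        goto done;

    // the client connects; the request waits in the listen queue
    if(connect(clientFD, (struct sockaddr*)&name, sizeof(name)) == -1)
        goto done;
    // 4. accept returns a new socket dedicated to this client
    connectionFD = accept(listenFD, NULL, NULL);
    if(connectionFD == -1)
        goto done;

    if(write(clientFD, message, strlen(message) + 1) == -1)
        goto done;
    received = read(connectionFD, reply, replySize);

done:
    // 5. close everything and remove the socket file
    if(connectionFD != -1) close(connectionFD);
    if(clientFD != -1) close(clientFD);
    if(listenFD != -1) close(listenFD);
    unlink(path);
    return received;
}
```

Note that the data is read from the descriptor returned by accept, not from the listening socket itself.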

The recv call is used to read data from a socket. Its arguments are the same as those of read, plus an additional flags argument. For example, the MSG_PEEK flag causes data to be read without being removed from the input queue.
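A small sketch of the effect of MSG_PEEK is shown below. It uses socketpair (covered in the socket pairs section of this article) to get two connected sockets without writing a server. The name peekThenRead is hypothetical and the 64-byte buffers are an arbitrary choice for the illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Peek at pending data, then read it for real.
// Returns 1 if both calls saw the same bytes, 0 if not, -1 on error.
// peekThenRead is a hypothetical name for this sketch.
int peekThenRead(const char* message){
    int fds[2];
    char peeked[64] = {0};
    char consumed[64] = {0};
    size_t length = strlen(message) + 1;
    ssize_t peekCount, readCount;
    int result = -1;

    if(length > sizeof(peeked))
        return -1;
    // a connected pair of local sockets stands in for client and server
    if(socketpair(PF_LOCAL, SOCK_STREAM, 0, fds) == -1)
        return -1;

    if(write(fds[0], message, length) == (ssize_t)length){
        // MSG_PEEK copies the data but leaves it in the input queue
        peekCount = recv(fds[1], peeked, sizeof(peeked), MSG_PEEK);
        // without flags, recv behaves like read and consumes the data
        readCount = recv(fds[1], consumed, sizeof(consumed), 0);
        if(peekCount > 0 && readCount > 0)
            result = (peekCount == readCount &&
                      memcmp(peeked, consumed, (size_t)readCount) == 0);
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

If the peek had removed the data, the second recv would have nothing left to return and would block instead.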

Local sockets.

As stated earlier, we use PF_LOCAL to represent the local namespace, which is used by sockets to connect processes on the same computer. The symbol PF_UNIX is a synonym.

Sockets using this namespace are referred to as local sockets or UNIX-domain sockets. Their socket addresses are filenames, which are used only when creating connections.

The structure sockaddr_un is used to specify a socket's name. It has a sun_family field, which we set to AF_LOCAL to indicate that it is a local namespace.

Another field is sun_path, which specifies the filename to use. On Linux it can be at most 108 bytes long.

We use the SUN_LEN macro to compute the length of a struct sockaddr_un.
A process must have write permission on a directory in order to create a socket file in it. It must also have the appropriate permissions on the socket file in order to connect to the socket (on Linux, write permission is required).
Only processes running on the same machine can communicate through local namespace sockets, even if different machines share the same file system.
We use 0 as the protocol for the local namespace.

An example of a local namespace socket server - sock-server.c

#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/socket.h>
#include<sys/un.h>
#include<string.h>

// read from socket and print
// Continue until socket terminates
// Return non-zero if 'quit' is sent by client
int server(int clientSocket){
    while(1){
        int length;
        char* text;
        int isQuit;

        // read length of message
        // if read returns 0 the client terminated the connection;
        // an error or a short read is handled the same way
        if(read(clientSocket, &length, sizeof(length)) != (ssize_t)sizeof(length))
            return 0;
        if(length <= 0)
            return 0;

        // allocate buffer to hold message
        text = (char*)malloc(length);
        if(text == NULL)
            return 0;

        // read and print text
        if(read(clientSocket, text, length) != length){
            free(text);
            return 0;
        }
        text[length - 1] = '\0';
        printf("%s\n", text);

        // check for 'quit' before the buffer is freed
        isQuit = !strcmp(text, "quit");

        // free text buffer
        free(text);

        // if client sent 'quit', it's done
        if(isQuit)
            return 1;
    }
}

int main(int argc, char* const argv[]){
    const char* const socketName = argv[1];
    int socketFD;
    struct sockaddr_un name;
    int clientQuitMessage;

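    // create socket
    socketFD = socket(PF_LOCAL, SOCK_STREAM, 0);

    // label as socket server
    name.sun_family = AF_LOCAL;
    strncpy(name.sun_path, socketName, sizeof(name.sun_path) - 1);
    name.sun_path[sizeof(name.sun_path) - 1] = '\0';
    if(bind(socketFD, (struct sockaddr*)&name, SUN_LEN(&name)) == -1){
        perror("bind");
        return 1;
    }

    // listen for connections
    listen(socketFD, 5);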

    // accept connections continuously for each client
    // until "quit" message is sent
    do{
        struct sockaddr_un clientName;
        // accept expects the size of the address buffer on entry
        socklen_t clientNameLen = sizeof(clientName);
        int clientSocketFD;

        // accept, handle and terminate connection
        clientSocketFD = accept(socketFD, (struct sockaddr*)&clientName, &clientNameLen);
        clientQuitMessage = server(clientSocketFD);
        close(clientSocketFD);
    }while(!clientQuitMessage);
    
    // remove socket file
    close(socketFD);
    unlink(socketName);

    return 0;
}

A path to the socket is passed as a command-line argument.
The program creates a local namespace socket and listens for connections on it.

After a connection is accepted, it reads the messages sent and prints them out until the connection is terminated.
If one of the messages is 'quit', the server removes the socket file and exits.

Let's create a client socket which will connect to a local namespace socket and send messages - sock-client.c

#include<stdio.h>
#include<unistd.h>
#include<sys/socket.h>
#include<sys/un.h>
#include<string.h>

// write text to socket given by file descriptor
void writeText(int socketFD, const char* text){
    // write number of bytes in string
    int length = strlen(text) + 1;
    write(socketFD, &length, sizeof(length));
    // write string
    write(socketFD, text, length);
}

int main(int argc, char* const argv[]){
    const char* const socketName = argv[1];
    const char* const message = argv[2];
    int socketFD;
    struct sockaddr_un name;

    // create socket
    socketFD = socket(PF_LOCAL, SOCK_STREAM, 0);
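    // store server name in socket address
    name.sun_family = AF_LOCAL;
    strncpy(name.sun_path, socketName, sizeof(name.sun_path) - 1);
    name.sun_path[sizeof(name.sun_path) - 1] = '\0';
    // connect socket
    if(connect(socketFD, (struct sockaddr*)&name, SUN_LEN(&name)) == -1){
        perror("connect");
        return 1;
    }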
    // write text from command line to socket
    writeText(socketFD, message);
    close(socketFD);
    return 0;
}

A path to the socket is passed as a command-line argument.
Before the client sends a message, it first sends the length of the text by sending the bytes of the integer variable length.

The server reads this length from the socket into an integer variable.
This allows the server to allocate a buffer large enough to hold the message text before reading it from the socket.

Execution:

gcc sock-server.c -o server
gcc sock-client.c -o client

Open two terminal windows, in the first, execute the following,

./server /tmp/socket

In the second

./client /tmp/socket "Testing socket"

The text 'Testing socket' should be printed out in the server terminal.
To quit, run the following in the second terminal:

./client /tmp/socket "quit"

Internet sockets.

These sockets are used to enable communication between processes on different machines which are connected via a network.

Their namespace is represented by the symbol PF_INET. The most commonly used protocols are TCP/IP.

IP moves packets through the network, splitting them into smaller pieces where necessary and reassembling them when they arrive.
Packet delivery is best-effort, so packets may never reach their intended recipient, or may arrive in a different order from the one in which they were sent.
TCP, which is layered above IP, provides a reliable, ordered connection.
With this protocol, data is delivered in the same order in which it was written to the sender's socket.

Internet socket addresses are composed of a machine address and a port number, which are stored in a struct sockaddr_in.
It has a sin_family field, which is set to AF_INET to indicate an internet namespace address, a sin_addr field, which stores the internet address of the desired machine, and a sin_port field, which stores the port number.
A port number is used to distinguish different sockets on the same machine.
We use htons to convert a port number into network byte order, since different machines store multibyte values in different byte orders.
We use gethostbyname to convert human-readable hostnames, such as DNS names, into 32-bit IP numbers. This function returns a pointer to a hostent structure whose h_addr field holds the host's IP number.
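Filling in a struct sockaddr_in can be sketched as follows. To avoid a DNS lookup, this sketch takes a dotted-decimal IPv4 string and parses it with the standard inet_pton function, which the article itself does not use. The name makeInetAddress is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

// Fill in an internet socket address from a dotted-decimal IPv4 string
// and a port number given in host byte order.
// Returns 0 on success, -1 if the string is not a valid IPv4 address.
// makeInetAddress is a hypothetical name for this sketch.
int makeInetAddress(struct sockaddr_in* name, const char* ip, unsigned short port){
    memset(name, 0, sizeof(*name));
    name->sin_family = AF_INET;
    // htons: host-to-network byte order for a 16-bit value
    name->sin_port = htons(port);
    // inet_pton parses the text and stores the address in network byte order
    if(inet_pton(AF_INET, ip, &name->sin_addr) != 1)
        return -1;
    return 0;
}
```

The matching ntohs function converts the port back to host byte order, for example when printing it.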

Reading from an internet server - sock-inet.c

#include <stdlib.h>
#include <stdio.h>
#include <netinet/in.h>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string.h>

// print home page contents for server socket
void getHomePage (int socketFD){
    char buffer[10000];
    ssize_t numberCharactersRead;
    // send GET request for home page
    snprintf(buffer, sizeof(buffer), "GET /\n");
    write(socketFD, buffer, strlen(buffer));
    // read from socket
    while (1) {
        numberCharactersRead = read(socketFD, buffer, sizeof(buffer));
        // 0 means the server closed the connection, -1 means an error
        if (numberCharactersRead <= 0)
            return;
        // write to stdout
        fwrite(buffer, sizeof(char), numberCharactersRead, stdout);
    }
}

int main (int argc, char* const argv[]){
    int socketFD;
    struct sockaddr_in name;
    struct hostent* hostInfo;
    // create socket
    socketFD = socket(PF_INET, SOCK_STREAM, 0);
    // store server name in socket address
    name.sin_family = AF_INET;
    // convert from strings to IP
    hostInfo = gethostbyname(argv[1]);
    if(hostInfo == NULL) return 1;
    else name.sin_addr = *((struct in_addr *) hostInfo->h_addr);
    // webserver port 80
    name.sin_port = htons(80);
    // connect to web server
    if(connect(socketFD, (struct sockaddr*)&name, sizeof(struct sockaddr_in)) == -1){
        perror("connect");
        return 1;
    }
    // retrieve homepage
    getHomePage(socketFD);
    return 0;
}

The above program fetches a web page from a web server whose hostname we specify as a command-line argument.
gethostbyname translates the hostname into an IP address, and a TCP stream socket is connected to port 80 on that host.
We then issue a GET request and the server responds with the page.
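The bare "GET /" line used above is the oldest form of an HTTP request, and some servers may not answer it. A hedged alternative is to send an HTTP/1.0 request with a Host header. The sketch below only formats such a request; the name buildRequest is hypothetical and the choice of HTTP/1.0 is an assumption of this sketch, not something the original program does.

```c
#include <stdio.h>

// Format a minimal HTTP/1.0 request for the home page of host.
// Returns the request length, or -1 if it does not fit into buffer.
// buildRequest is a hypothetical name for this sketch.
int buildRequest(char* buffer, size_t size, const char* host){
    int length = snprintf(buffer, size,
                          "GET / HTTP/1.0\r\nHost: %s\r\n\r\n", host);
    // snprintf reports the length it needed, which may exceed the buffer
    if(length < 0 || (size_t)length >= size)
        return -1;
    return length;
}
```

The resulting buffer can be written to the connected socket in place of the "GET /" line, with the returned length as the byte count.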

Execution:

gcc sock-inet.c -o sockInet

./sockInet www.webpage.com

Socket pairs.

In previous articles we learnt how the pipe function creates two file descriptors, one for the read end of the pipe and one for the write end.

We also saw the limitations of pipes: the file descriptors must be used by related processes and the communication they support is unidirectional.
The socketpair function creates two file descriptors for two connected sockets on the same computer, which permit bidirectional communication between related processes.

It takes the same first three parameters as the socket system call, which specify the domain, the communication style and the protocol. Additionally, it takes a fourth parameter, a two-integer array that is filled with the file descriptors of the two sockets.
When we call socketpair we should specify PF_LOCAL as the domain.
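Bidirectional communication over a socket pair can be sketched with a parent and a child process. The name askChild, the fixed "pong" answer and the 64-byte buffer are all hypothetical choices made for this illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that answers one message over a socket pair.
// The parent sends request and stores the child's answer in reply.
// Returns the number of bytes in the answer, or -1 on error.
// askChild is a hypothetical name for this sketch.
ssize_t askChild(const char* request, char* reply, size_t replySize){
    int fds[2];
    pid_t child;
    ssize_t received = -1;

    if(socketpair(PF_LOCAL, SOCK_STREAM, 0, fds) == -1)
        return -1;

    child = fork();
    if(child == -1){
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if(child == 0){
        // child: uses fds[1] to read the request and to write the answer
        char buffer[64];
        int status = 0;
        close(fds[0]);
        if(read(fds[1], buffer, sizeof(buffer)) <= 0 ||
           write(fds[1], "pong", 5) == -1)
            status = 1;
        close(fds[1]);
        _exit(status);
    }

    // parent: uses fds[0] for both directions
    close(fds[1]);
    if(write(fds[0], request, strlen(request) + 1) != -1)
        received = read(fds[0], reply, replySize);
    close(fds[0]);
    waitpid(child, NULL, 0);
    return received;
}
```

With a pipe, the same exchange would need two pipes, one for each direction.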

Summary.

Sockets support interprocess communication both between processes on the same computer and between processes on different computers connected by a network.

Sockets support bidirectional communication between computers.
For a socket to be created we specify the namespace, the protocol to be used and the method of communication.

The system calls used with sockets are socket to create a socket, close to terminate it, connect to connect two sockets, bind to label a server socket with an address, listen to configure a socket to accept connections and accept to accept a connection.

References.

For the manual pages for system calls execute man system-call