Secure Coding in C++: why we have to use strings and characters safely. What is C++, what are strings and buffer overflows? And something about software vulnerabilities and exploits.
Let us start from the very beginning and talk a bit about C++, strings and characters, and about why secure coding matters so much in languages without memory safety, such as C and C++, which are prone to these vulnerabilities.
C++ (/ˌsiːˌplʌsˈplʌs/) is a general-purpose programming language created by Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significantly over time, and modern C++ now has object-oriented, generic, and functional features in addition to facilities for low-level memory manipulation. It is almost always implemented as a compiled language, and many vendors provide C++ compilers, including the Free Software Foundation, LLVM, Microsoft, Intel, Oracle, and IBM, so it is available on many platforms.[9]
C++ was designed for resource-constrained software and large systems, with performance, efficiency, and flexibility as its design highlights. C++ has also proved useful in many other contexts: desktop applications, video games (many game engines are written in C++ because of its efficiency), servers (e.g. e-commerce, web search, or databases), and performance-critical applications (e.g. telephone switches or space probes).
So what about strings and characters? Why are strings a problem?
Software vulnerabilities and exploits are caused by weaknesses in:
- string representation
- string management
- string manipulation
Strings are a fundamental concept in software engineering, but they are not a built-in type in C++.
C++ programmers must choose between using:
- std::basic_string
- null-terminated byte strings (NTBS)
- other string types
- some combination of the above
Strings serve many different tasks. The most common is, of course, printing messages, but it is no secret that strings are also used to read from and write to files, to copy data from one memory buffer to another, and so on.
There are many functions for string manipulation available in C++, including C library functions such as printf and scanf.
So what creates vulnerabilities in a C++ application? Functions such as printf and scanf require a format string, and using that format string improperly, for example by letting user input act as the format string, can easily lead to vulnerabilities.
A format string can be a simple sequence of characters, but it does not have to be. It can also contain format specifiers, which let functions like printf take additional arguments and use them to build the final string.
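To make the format string issue concrete, here is a minimal sketch of the safe pattern: the format string is a fixed literal and the untrusted text is passed only as an argument. The helper name format_greeting and the 64-byte output size are assumptions made for illustration, not something from a real code base.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds a greeting from untrusted input.
// The format string is a fixed literal, so any '%' sequences inside
// user_input are copied verbatim instead of being interpreted.
std::string format_greeting(const std::string& user_input) {
    char out[64];  // illustrative size
    // snprintf writes at most sizeof(out) bytes, including the final '\0',
    // and truncates anything longer.
    std::snprintf(out, sizeof(out), "Hello, %s!", user_input.c_str());
    return std::string(out);
}

// UNSAFE counterpart, shown only as a comment:
//   std::printf(user_input.c_str());
// Here input such as "%x %x %n" would be interpreted as format specifiers.
```

Calling format_greeting("%x%n") simply returns the text with the percent signs intact, because the input never plays the role of the format string.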
Crazy, huh?
C++ is one of the programming languages most commonly associated with buffer overflows, a critical type of vulnerability. Have you ever heard of a buffer overrun or buffer overflow?
What is a buffer overrun or buffer overflow?
From Wikipedia, the free encyclopedia
In information security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory locations.
Buffers are areas of memory set aside to hold data, often while moving it from one section of a program to another, or between programs. Buffer overflows can often be triggered by malformed inputs; if one assumes all inputs will be smaller than a certain size and the buffer is created to be that size, then an anomalous transaction that produces more data could cause it to write past the end of the buffer. If this overwrites adjacent data or executable code, this may result in erratic program behavior, including memory access errors, incorrect results, and crashes.
As always, I am here to explain it in simple words, so you don't need to break it into parts by yourself.
So first of all, what is a buffer? A buffer, in simple words, is just a temporary place for data storage. Imagine two containers of water standing side by side: if you pour more water into one than it can hold, it spills over into the other. The same happens with a buffer: when a program or system process puts more data into it than it was allocated to hold, the extra data overflows. It spills into adjacent memory, corrupting or overwriting the data that was stored there. As a result, an attacker may be able to crash the program or even obtain a reverse shell.
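The container picture can be sketched in code as a fixed-size buffer whose writes are checked against its capacity. This is only an illustration: the helper name try_store and the capacity of 8 are assumptions. It relies on std::array::at(), which throws std::out_of_range instead of writing past the end.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: stores value at position index of a fixed-size buffer.
// std::array::at() checks the index and throws std::out_of_range,
// so an out-of-bounds write is rejected instead of spilling over.
bool try_store(std::array<char, 8>& buffer, std::size_t index, char value) {
    try {
        buffer.at(index) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;  // write rejected; adjacent memory is untouched
    }
}
```

With an 8-element buffer, index 7 is accepted and index 8 is refused. A raw write such as buffer[8] = value would instead be undefined behavior, which is exactly the overflow described above.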
In a buffer-overflow attack, the extra data sometimes holds specific instructions for actions intended by a hacker or malicious user; for example, the data could trigger a response that damages files, changes data or unveils private information.
Buffer overflows have a long history: the first widely documented exploitation of one was the Morris worm in 1988.
Because of that, buffer overflow is also a well-known vulnerability. Most software developers know what this vulnerability does. That's why it is also important for us to know as much as possible about this beast.
A very classic example of a buffer overflow looks something like this:
In a simple buffer overflow exploit, the attacker sends data to a victim program, which stores it in an undersized buffer. The result is that information on the stack is overwritten.
Attackers with malicious intent use buffer overflows to corrupt the stack of a web application. But again, why? By sending carefully crafted input to our little web application, an attacker can make the application execute arbitrary code and then easily and effectively take over the victim's machine.
Examples
Some code for examples:
Example 1.a
The following sample code demonstrates a simple buffer overflow that is often caused by the first scenario in which the code relies on external data to control its behavior. The code uses the gets() function to read an arbitrary amount of data into a stack buffer. Because there is no way to limit the amount of data read by this function, the safety of the code depends on the user to always enter fewer than BUFSIZE characters.
char buf[BUFSIZE];
gets(buf);
Example 1.b
This example shows how easy it is to mimic the unsafe behavior of the gets() function in C++ by using the >> operator to read input into a char[] string.
char buf[BUFSIZE];
cin >> (buf);
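If a char array really is needed, the read can be bounded with std::setw, as in the sketch below. The value of BUFSIZE and the helper name read_word are assumptions for illustration. With a width of BUFSIZE, the >> operator stores at most BUFSIZE - 1 characters plus the terminating null. Since C++20 the >> overload for char arrays is bounded by the array size anyway, but setting the width explicitly also protects code built against older standards.

```cpp
#include <iomanip>
#include <istream>

constexpr int BUFSIZE = 16;  // illustrative size, not taken from the example above

// Hypothetical helper: reads one whitespace-delimited word into buf.
// std::setw(BUFSIZE) limits the extraction to BUFSIZE - 1 characters,
// leaving room for the terminating '\0'.
void read_word(std::istream& in, char (&buf)[BUFSIZE]) {
    in >> std::setw(BUFSIZE) >> buf;
}
```

Reading directly into a std::string with >> is usually simpler still, because the string grows to fit the input.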
Example 2
The code in this example also relies on user input to control its behavior, but it adds a level of indirection with the use of the bounded memory copy function memcpy(). This function accepts a destination buffer, a source buffer, and the number of bytes to copy. The input buffer is filled by a bounded call to read(), but the user specifies the number of bytes that memcpy() copies.
char buf[64], in[MAX_SIZE];
int bytes;
printf("Enter buffer contents:\n");
read(0, in, MAX_SIZE-1);
printf("Bytes to copy:\n");
scanf("%d", &bytes);
memcpy(buf, in, bytes);
Note: This type of buffer overflow vulnerability (where a program reads data and then trusts a value from the data in subsequent memory operations on the remaining data) has turned up with some frequency in image, audio, and other file processing libraries.
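One way to close this hole is to validate the user-supplied count against both buffer sizes before copying. The following is a minimal sketch under that assumption. The helper name checked_copy and its signature are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies requested bytes from src to dst only if the
// count fits both buffers. Returns false and copies nothing otherwise.
bool checked_copy(char* dst, std::size_t dst_size,
                  const char* src, std::size_t src_size,
                  long requested) {
    if (requested < 0) {
        return false;  // a negative count would convert to a huge unsigned size
    }
    const std::size_t n = static_cast<std::size_t>(requested);
    if (n > dst_size || n > src_size) {
        return false;  // would read or write past the end of a buffer
    }
    std::memcpy(dst, src, n);
    return true;
}
```

In Example 2, the call would become checked_copy(buf, sizeof(buf), in, sizeof(in), bytes), and the program would reject the request whenever the function returns false.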
Example 3
This is an example of the second scenario in which the code depends on properties of the data that are not verified locally. In this example a function named lccopy() takes a string as its argument and returns a heap-allocated copy of the string with all uppercase letters converted to lowercase. The function performs no bounds checking on its input because it expects str to always be smaller than BUFSIZE. If an attacker bypasses checks in the code that calls lccopy(), or if a change in that code makes the assumption about the size of str untrue, then lccopy() will overflow buf with the unbounded call to strcpy().
char *lccopy(const char *str) {
    char buf[BUFSIZE];
    char *p;
    strcpy(buf, str);
    for (p = buf; *p; p++) {
        if (isupper(*p)) {
            *p = tolower(*p);
        }
    }
    return strdup(buf);
}
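A bounds-safe variant of the same idea can be sketched with std::string, which sizes itself to fit the input, so the function no longer depends on callers keeping str shorter than BUFSIZE. The name lccopy_safe is hypothetical, and this is one possible rewrite rather than the only fix.

```cpp
#include <cctype>
#include <string>

// Hypothetical bounds-safe variant of lccopy(): returns a lowercase copy.
std::string lccopy_safe(const std::string& str) {
    std::string out = str;  // the copy is allocated with the right size
    for (char& c : out) {
        // Cast to unsigned char first: passing a negative char value
        // to std::tolower is undefined behavior.
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Because the result is a std::string, the caller also does not have to remember to free() the memory, as it must with the strdup() result in the original.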
After reading this article, you should have a good idea of how to use strings and characters securely in C++.