Basics of Rich Text Format (RTF)

Do you remember seeing this option when saving a document?

[Screenshot: saving a document as .rtf]

This article takes a brief look at the .rtf format.

History and Introduction

In the early days of Windows, .txt was a very popular format for creating documents. Files in this format could be created in Notepad, the default text editor.

But plain-text (TXT) files have some limitations:

  • TXT files cannot retain any formatting (neither text formatting nor image formatting).
  • TXT files cannot contain formatted lists.
  • TXT files cannot include images.

As technology advanced, Microsoft Corporation developed the RTF format in 1987. The biggest feature of RTF is cross-platform document interchange with Microsoft products.

Microsoft stopped developing the RTF specification in 2008; the last published version is 1.9.1. The format is still widely used, and many files you find online are in RTF.

This article gives a brief overview of RTF: how it works, its interoperability, its code syntax, character encoding, file structure, and the libraries and converters available for it.

Working of RTF

RTF is a word processing document format developed by Microsoft. It is designed mainly for transferring documents between word processing programs.

RTF solves several problems:

  • How do you pass Word files between Macs and PCs? ► Easy: just "Save As" an RTF file and pass that RTF file.
  • How do you format output from a database into a Word document? ► Easy: write it as an RTF file.
  • RTF provides codes that will let you do anything you can do in Word and a lot of things you can't do in HTML.
  • RTF format allows images and other entities within a document.
  • The RTF file format can also encode basic properties of the text, such as its size, color, and font. Over time, however, the DOC and DOCX formats came to dominate the market.
  • The extension used for the RTF file format is .rtf

How can we open an RTF file?
On Windows, the default editor for RTF files is WordPad, which comes with the operating system.

Interoperability:

Most word processors can import and export some version of the RTF specification or edit RTF directly, which makes it a "common" format between otherwise incompatible word processing software and operating systems. Most applications that read RTF files silently ignore unknown RTF control words. These factors contribute to its interoperability, though it still depends on the specific RTF version in use.

Code Syntax

RTF is a markup language that, like HTML, is interpreted by various programs.

How can we see the RTF code?
1. Create a document in Microsoft Word.
2. Save the document as an RTF file.
3. Open the file in Notepad.

The file starts with a header like the following (excerpt):

{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff31507\deff0\stshfdbch31506\stshfloch31506\stshfhich31506\stshfbi31507\deflang1033\deflangfe1033\themelang1033\themelangfe0\themelangcs0{\fonttbl{\f0\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f34\fbidi \froman\fcharset0\fprq2{\*\panose 02040503050406030204}Cambria Math;}
{\f37\fbidi \fswiss\fcharset0\fprq2{\*\panose 020f0502020204030204}Calibri;}{\f178\fbidi \fswiss\fcharset0\fprq2{\*\panose 020f0704030504030204}Arial Rounded MT Bold;}
{\flomajor\f31500\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\fdbmajor\f31501\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}
{\fhimajor\f31502\fbidi \froman\fcharset0\fprq2{\*\panose 02040503050406030204}Cambria;}{\fbimajor\f31503\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}
{\flominor\f31504\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\fdbminor\f31505\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}
{\fhiminor\f31506\fbidi \fswiss\fcharset0\fprq2{\*\panose 020f0502020204030204}Calibri;}{\fbiminor\f31507\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f317\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}
{\f318\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}{\f320\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}{\f321\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}{\f322\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}
{\f323\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f324\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}{\f325\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}{\f657\fbidi \froman\fcharset238\fprq2 Cambria Math CE;}
{\f658\fbidi \froman\fcharset204\fprq2 Cambria Math Cyr;}{\f660\fbidi \froman\fcharset161\fprq2 Cambria Math Greek;}{\f661\fbidi \froman\fcharset162\fprq2 Cambria Math Tur;}{\f664\fbidi \froman\fcharset186\fprq2 Cambria Math Baltic;}
{\f665\fbidi \froman\fcharset163\fprq2 Cambria Math (Vietnamese);}{\f687\fbidi \fswiss\fcharset238\fprq2 Calibri CE;}{\f688\fbidi \fswiss\fcharset204\fprq2 Calibri Cyr;}{\f690\fbidi \fswiss\fcharset161\fprq2 Calibri Greek;}
{\f691\fbidi \fswiss\fcharset162\fprq2 Calibri Tur;}{\f692\fbidi \fswiss\fcharset177\fprq2 Calibri (Hebrew);}{\f693\fbidi \fswiss\fcharset178\fprq2 Calibri (Arabic);}{\f694\fbidi \fswiss\fcharset186\fprq2 Calibri Baltic;}
{\f695\fbidi \fswiss\fcharset163\fprq2 Calibri (Vietnamese);}{\flomajor\f31508\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}{\flomajor\f31509\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}
{\flomajor\f31511\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}{\flomajor\f31512\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}{\flomajor\f31513\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}
{\flomajor\f31514\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}{\flomajor\f31515\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}{\flomajor\f31516\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}
{\fdbmajor\f31518\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}{\fdbmajor\f31519\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}{\fdbmajor\f31521\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}
{\fdbmajor\f31522\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}{\fdbmajor\f31523\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\fdbmajor\f31524\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}
{\fdbmajor\f31525\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}{\fdbmajor\f31526\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}{\fhimajor\f31528\fbidi \froman\fcharset238\fprq2 Cambria CE;}
{\fhimajor\f31529\fbidi \froman\fcharset204\fprq2 Cambria Cyr;}

Here we focus only on the basics: the code syntax and its building blocks. We may go deeper into the code in a later article. Let's start.

An RTF file consists of unformatted text, control words, control symbols, and groups.

1. Control Words:

A control word is a specially formatted command that RTF uses to mark printer control codes and information that applications use to manage documents.

Properties:

  • A control word cannot be longer than 32 characters.
  • A backslash begins each control word.
  • The LetterSequence is made up of lowercase alphabetic characters (a-z). RTF is case sensitive.
  • A control word takes the following form:
    \LetterSequence<Delimiter>
  • When a control word has no parameter or has a nonzero parameter, it turns the property on.
    Example: \b ► Turns on bold.

  • When a control word has a parameter of 0, it turns the property off.
    Example: \b0 ► Turns off bold.
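
The on/off rule can be sketched in a few lines of Python. This is only a minimal illustration, not a full RTF parser: the names parse_control_word and turns_on are hypothetical, and the regular expression assumes a control word made of lowercase letters, an optional signed numeric parameter, and an optional single space as the delimiter.

```python
import re

# Backslash, 1-32 lowercase letters, optional signed numeric parameter,
# and an optional single space acting as the delimiter.
CONTROL_WORD = re.compile(r"\\([a-z]{1,32})(-?\d+)?[ ]?")


def parse_control_word(source):
    """Return (name, parameter) for a control word at the start of source.

    parameter is None when the control word carries no number.
    """
    match = CONTROL_WORD.match(source)
    if match is None:
        raise ValueError("not a control word: %r" % source)
    name, number = match.groups()
    return name, (int(number) if number is not None else None)


def turns_on(parameter):
    """A missing or nonzero parameter turns the property on; 0 turns it off."""
    return parameter is None or parameter != 0


print(parse_control_word(r"\b"))    # ('b', None)
print(parse_control_word(r"\b0"))   # ('b', 0)
print(turns_on(None), turns_on(0))  # True False
```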

2. Control Symbols:

A control symbol consists of a backslash followed by a single, nonalphabetic character. Control symbols take no delimiters.
Example: \~ ► Represents a nonbreaking space.

3. Groups:

A group consists of text and control words or control symbols enclosed in braces ({}). The RTF file can include groups for fonts, styles, screen color, pictures, footnotes, comments (annotations), headers and footers, summary information, fields, and bookmarks, as well as document-, section-, paragraph-, and character-formatting properties.

Properties:

  • The opening brace ({) indicates the start of the group and the closing brace (}) indicates the end of the group.
    Example: {\f0\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}
  • Each group specifies the text affected by the group and the different attributes of that text.
  • Formatting specified within a group affects only the text within that group.
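
To see why formatting inside a group does not leak out of it, here is a small sketch in Python. It assumes balanced braces and tracks only the bold property; the names TOKEN and bold_runs are hypothetical, and a real reader would track every property the same way. The idea is to save the current state on a stack at each opening brace and restore it at the matching closing brace.

```python
import re

# One token per match: a brace, a control word, a control symbol, or plain text.
TOKEN = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|\\(?P<word>[a-z]+)(?P<param>-?\d+)? ?"
    r"|\\(?P<symbol>[^a-z])"
    r"|(?P<text>[^\\{}]+)"
)


def bold_runs(rtf):
    """Return (text, is_bold) pairs, restoring the bold state at each '}'."""
    runs = []
    stack = []      # saved bold states of the enclosing groups
    bold = False
    for tok in TOKEN.finditer(rtf):
        if tok.group("open"):
            stack.append(bold)           # entering a group: remember state
        elif tok.group("close"):
            bold = stack.pop()           # leaving a group: restore state
        elif tok.group("word") == "b":
            bold = tok.group("param") != "0"
        elif tok.group("text"):
            runs.append((tok.group("text"), bold))
    return runs


print(bold_runs(r"{plain {\b bold} plain again}"))
# [('plain ', False), ('bold', True), (' plain again', False)]
```

The text after the inner closing brace is plain again, even though no \b0 was written, because the state saved at the opening brace is restored.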

RTF's syntax was influenced by the TeX typesetting language. The following are some common RTF control words:

Syntax   Meaning
\i       Italic
\b       Bold
\ul      Underline
\par     End of the paragraph (carriage return)
\qj      Justify the text
\ql      Flush left
\qr      Flush right
\qc      Center text

Formatting text with RTF codes:

Some applications accept their own shorthand formatting codes and translate them into RTF. The $-prefixed codes below are such application-level codes, not control words from the RTF specification.

Syntax            Meaning
$RTF+/-           Enable or disable formatting
$NL               Inserts a new line
$CENTER+          Center justification
$RIGHT+           Right justification
$RESET            Reset text justification (left justification)
$NUM+/-           Numbered list
$B+/-             Bold attribute
$I+/-             Italic attribute
$U+/-             Underline attribute
$UT+/-            Thick underline attribute
$UDOT+/-          Dot underline attribute
$UD+/-            Dash underline attribute
$UDD+/-           Dot dot dash underline attribute
$UW+/-            Wavy underline attribute
$STR+/-           Strikethrough attribute
$SUB+/-           Subscript attribute
$SUP+/-           Superscript attribute
$CAPS+/-          All capitals attribute
$BULLET+/-        Bullet list
$BLACK+/-         Black color attribute
$BLUE+/-          Blue color attribute
$B_<color>+/-     Background color attribute

In the background color attribute, <color> is one of the colors listed below.

  • Black ► $BLACK+/-
  • Blue ► $BLUE+/-
  • Cyan ► $CYAN+/-
  • Green ► $GREEN+/-
  • Magenta ► $MAGENTA+/-
  • Red ► $RED+/-
  • White ► $WHITE+/-
  • Yellow ► $YELLOW+/-
  • Dark Blue ► $DBLUE+/-
  • Dark Cyan ► $DCYAN+/-
  • Dark Green ► $DGREEN+/-
  • Dark Magenta ► $DMAGENTA+/-
  • Dark Red ► $DRED+/-
  • Dark Yellow ► $DYELLOW+/-
  • Automatic ► $AUTO

Character Encoding:

A standard RTF file can only consist of 7-bit ASCII characters. There is no set maximum line length for an RTF file.

How does RTF handle characters outside the ASCII range?

RTF uses escape sequences to encode other characters. There are two kinds of character escapes:

  1. Code page escapes
  2. Unicode escapes (starting with RTF 1.5)

In a code page escape, two hexadecimal digits following a backslash and typewriter apostrophe denote a character taken from a Windows code page. For example, if the code page is set to Windows-1256, the sequence \'c8 will encode the Arabic letter bāʼ ب.
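
A code page escape can be decoded with Python's built-in codecs. The sketch below is a minimal illustration: the name decode_hex_escapes is hypothetical, the caller has to supply the code page, and only single-byte code pages are handled.

```python
import re

# A backslash, an apostrophe, and exactly two hexadecimal digits.
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_escapes(rtf_text, codec):
    r"""Replace each \'hh escape with the character it denotes in codec."""
    def replace(match):
        # Turn the two hex digits into one byte and decode that byte.
        return bytes([int(match.group(1), 16)]).decode(codec)
    return HEX_ESCAPE.sub(replace, rtf_text)


print(decode_hex_escapes(r"\'c8", "cp1256"))  # ب
```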

These days Unicode is the canonical method for exchanging text characters. It's more robust because there's no chance for the receiving application to misunderstand (or simply not implement) a particular text encoding. For that reason, some word processors prefer to use Unicode escapes when saving RTF files.
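
A Unicode escape has the form \uN, where N is a signed 16-bit decimal number, followed by a fallback character for readers that do not understand Unicode. The Python sketch below shows one way to produce such escapes. The name rtf_escape is hypothetical, and the code assumes the default of one fallback character (written here as "?") after each escape.

```python
def rtf_escape(text):
    """Encode text as 7-bit ASCII RTF, using \\uN? for non-ASCII characters."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # escape RTF's own special characters
        elif ord(ch) < 128:
            out.append(ch)                 # plain ASCII passes through
        else:
            # Work in UTF-16 code units so characters beyond U+FFFF
            # become a surrogate pair, one \uN per unit.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "little")
                if code > 32767:
                    code -= 65536          # \uN takes a signed 16-bit number
                out.append("\\u%d?" % code)  # '?' is the ASCII fallback
    return "".join(out)


print(rtf_escape("café"))  # caf\u233?
```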

RTF file structure:

This is one of the most important sections of the article. It gives a basic idea of the structure of an RTF file. Microsoft makes the file format specification available for public download, and developers can refer to it.

RTF Header:

Field: <header>
Description: Header tables must appear in this order if they exist.

\rtf1\fbidis? <character set> <from>? <deffont> <deflang> <fonttbl>? 
<filetbl>? <colortbl>? <stylesheet>? <stylerestrictions>? 
<listtables>? <revtbl>? <rsidtable>? <mathprops>? <generator>?

Properties:

  • If the font, file, style, color, revision mark, and summary-information groups and document-formatting properties are included in the file, they must appear in the RTF header, which precedes the RTF body.

RTF Version:

An RTF document must start out with these six characters:

{\rtf1

Where the 1 shows the RTF version number.

Character Set:

After the RTF version, we declare the character set used in the document. The following commands declare a character set:

  • \ansi ► The document is in the ANSI character set, also known as Code Page 1252, the usual MS Windows character set.

  • \mac ► The document is in the MacAscii character set, the usual character set under old (pre-10) versions of Mac OS.

  • \pc ► The document is in DOS Code Page 437, the default character set for MS-DOS.

  • \pca ► The document is in DOS Code Page 850, also known as the MS-DOS Multilingual Code Page.

Note:
Declare the character set with one of these commands, according to the operating system.
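
The declaration matters because the same byte means different characters in different character sets. The Python sketch below illustrates this with the byte 0x82. The mapping CHARSET_CODECS and the name decode_byte are hypothetical, and pairing \mac with Python's mac_roman codec is an assumption about what "MacAscii" refers to.

```python
# Assumed mapping from RTF character set keywords to Python codec names.
CHARSET_CODECS = {
    "ansi": "cp1252",    # Windows code page 1252
    "mac": "mac_roman",  # assumption: "MacAscii" means Mac OS Roman
    "pc": "cp437",       # DOS code page 437
    "pca": "cp850",      # DOS code page 850
}


def decode_byte(value, charset):
    """Decode one byte value under the given RTF character set keyword."""
    return bytes([value]).decode(CHARSET_CODECS[charset])


# The same byte gives a different character under each declaration.
for keyword in CHARSET_CODECS:
    print(keyword, decode_byte(0x82, keyword))
```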

Font Command:

 \deffN

The character set declaration is followed by the \deffN command, where N is the number of the document's default font. This command is technically optional.

Let's see what we have so far, with an example.

 {\rtf1\ansi\deff0

This is the start of an RTF document of version 1 that uses the ANSI character set and whose default font is font number 0.
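
As a rough check of this reading, the Python sketch below pulls the three pieces out of such an opening. It is only an illustration: the name read_header is hypothetical, and the pattern assumes the simple layout shown here (version, then character set, then an optional \deffN). Headers written by Word can contain extra control words in between, which this sketch does not handle.

```python
import re

# "{\rtf" + version, one character set keyword, then an optional \deffN.
HEADER = re.compile(r"\{\\rtf(\d+)\\(ansi|mac|pca|pc)(?![a-z])(?:\\deff(\d+))?")


def read_header(rtf):
    """Return (version, charset, default_font) from the start of an RTF file."""
    match = HEADER.match(rtf)
    if match is None:
        raise ValueError("unrecognised RTF header")
    version, charset, deff = match.groups()
    return int(version), charset, (int(deff) if deff is not None else None)


print(read_header(r"{\rtf1\ansi\deff0"))  # (1, 'ansi', 0)
```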

Font Table:

Documents often use a variety of fonts to be more readable and user-friendly. A font table keeps track of all the fonts used in a document.

All the fonts that can be used in a document are listed in the font table, where each font is identified by a font number (e.g. \f0).

Syntax:
The syntax for a font table is {\fonttbl ...declarations...}, in which each declaration has this basic syntax:

{\fnumber\familycommand Fontname;}

A font can’t be used in a document until it is listed in the font table.

Implementation:
A font table with some font declarations:

{\fonttbl
{\f0\froman Times;}
{\f1\fswiss Arial;}
{\f2\fmodern Courier New;}
}
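
To make the declaration syntax concrete, here is a minimal Python sketch that reads such a font table back into a dictionary. The names FONT_ENTRY and read_font_table are hypothetical, and the pattern assumes the basic declaration form only, without extra control words such as \fcharset.

```python
import re

# {\fN\family Name;} with N a number and the family a word such as froman.
FONT_ENTRY = re.compile(r"\{\\f(\d+)\\(f[a-z]+) ([^;{}]+);\}")


def read_font_table(fonttbl):
    """Map each font number to its (family, name) pair."""
    return {
        int(number): (family, name)
        for number, family, name in FONT_ENTRY.findall(fonttbl)
    }


table = r"""{\fonttbl
{\f0\froman Times;}
{\f1\fswiss Arial;}
{\f2\fmodern Courier New;}
}"""
print(read_font_table(table))
# {0: ('froman', 'Times'), 1: ('fswiss', 'Arial'), 2: ('fmodern', 'Courier New')}
```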

End: Every RTF document must end with a }
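
Putting the pieces together, the Python sketch below assembles a very small RTF document (version, character set, default font, a one-entry font table, and a body) and checks that its braces are balanced. The names minimal_rtf and braces_balanced are hypothetical, and the body is assumed to be valid RTF already.

```python
def minimal_rtf(body):
    """Wrap already-escaped body text in a minimal RTF document."""
    header = r"{\rtf1\ansi\deff0"
    fonts = r"{\fonttbl{\f0\froman Times;}}"
    return header + fonts + r"\f0 " + body + r"\par}"


def braces_balanced(rtf):
    """Check that every unescaped '{' has a matching '}'."""
    depth = 0
    i = 0
    while i < len(rtf):
        if rtf[i] == "\\":
            i += 2            # skip the character after a backslash (e.g. \{ or \})
            continue
        if rtf[i] == "{":
            depth += 1
        elif rtf[i] == "}":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0


doc = minimal_rtf(r"Hello, {\b world}!")
print(doc)
print(braces_balanced(doc))  # True
```

Writing the returned string to a file with the .rtf extension should give a document that an RTF-aware editor such as WordPad can open.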

Libraries And Converters:

Can we convert RTF into another format, according to our needs?

Yes, we can.

  • The open-source script rtf2xml can partially convert RTF to XML.
  • GNU UnRTF is an open-source program to convert RTF into HTML, LaTeX, troff macros and other formats.
  • pyth is a Python library to create and convert documents in RTF, XHTML, and PDF formats.
  • Ruby RTF is a project to create Rich Text content via Ruby.
  • RaTFink is a library of Tcl routines, free software, to generate RTF output, and a Cost script to convert SGML to RTF.
  • RTF::Writer is a Perl module for generating RTF documents.
  • PHPRtfLite is an API enabling developers to create RTF documents with PHP.
  • Pandoc is an open source document converter with multiple output formats, including RTF.
  • RTFGen is a project to create RTF documents via pure PHP.
  • rtf.js is a JavaScript based library to render RTF documents in HTML.

The macOS command line tool textutil can convert files between rtf, rtfd, text, doc, docx, wordml, odt, and webarchive formats. The editor Ted can also convert RTF files to HTML and PS formats.
