Revision 19—(August 23, 2003)

Finished Chapter 11, which is now going through review and copyediting. Modified a number of examples throughout the book so that they will compile with Linux g++ (basically fixing case-sensitive naming issues).

Revision 18—(August 2, 2003)

Chapter 5 is complete. Chapter 11 is updated and is near completion. Updated the front matter and index entries. Home stretch now.

Revision 17—(July 8, 2003)

Chapters 5 and 11 are 90% done!

Revision 16—(June 25, 2003)

Chapter 5 text is almost complete, but enough is added to justify a separate posting. The example programs for Chapter 11 are also fairly complete. Added a matrix multiplication example to the valarray material in chapter 7. Chapter 7 has been tech-edited. Many corrections due to comments from users have been integrated into the text (thanks!).

Revision 15—(March 1, 2003)

Fixed an omission in C10:CuriousSingleton.cpp. Chapters 9 and 10 have been tech-edited.

Revision 14—(January 2003)

Fixed a number of fuzzy explanations in response to reader feedback (thanks!). Chapter 9 has been copy-edited.

Revision 13—(December 31, 2002)

Updated the exercises for Chapter 7. Finished rewriting Chapter 9. Added a template variation of Singleton to chapter 10. Updated the build directives. Fixed lots of stuff. Chapters 5 and 11 still await rewrite.

Revision 12—(December 23, 2002)

Added material on Design Patterns as Chapter 10 (Concurrency will move to Chapter 11). Added exercises for Chapter 6. Here is the status of all chapters:

100% complete: 1-4, 6, 8

Copy-edited, waiting for tech edit: 7, 10

Incomplete: 5, 9, 11

Revision 11 (December 13, 2002) –

Chapter 7 has been updated. Chapter 6 has been copy-edited and a few bugs were fixed. Chapter 4 has been tech-edited. The exercises are still out of date except for chapters 1-3.

Revision 10 (October 15, 2002) –

Chapters 1 through 3 are now 100% complete (copy-edited and tech-edited). Chapter 4 has been copy-edited. Updated Chapter 6 to fit in its new position and added introductory material. (Chapters 5 and 7-10 are still unfinished at this point.)

Revision 9 (August 29, 2002) –

Finished Chapter 4 (IOStreams). Reordered the material and added material on wide stream and locales. Removed references to strstreams. Edited the “Iostreams examples” section. Added new exercises.

Revision 8 (August 6, 2002) --

Made ExtractCode.cpp in Chapter 3 work for GNU C++.

Copy-edited Chapters 1 through 3.

Revision 7 (July 31, 2002) --

Fixed omissions in comments for code extraction throughout text.

Edited Chapter 3.

Revision 6 (July 27, 2002) --

Finished Chapter 3 (Strings)

Revision 5 (July 20, 2002) --

Chapters 1 and 2 are “finished”.

Revision 4, August 19, 2001 --


“This book is a tremendous achievement. You owe it to yourself to have a copy on your shelf. The chapter on iostreams is the most comprehensive and understandable treatment of that subject I’ve seen to date.”

Al Stevens
Contributing Editor, Doctor Dobbs Journal

“Eckel’s book is the only one to so clearly explain how to rethink program construction for object orientation. That the book is also an excellent tutorial on the ins and outs of C++ is an added bonus.”

Andrew Binstock
Editor, Unix Review

“Bruce continues to amaze me with his insight into C++, and Thinking in C++ is his best collection of ideas yet. If you want clear answers to difficult questions about C++, buy this outstanding book.”

Gary Entsminger
Author, The Tao of Objects

“Thinking in C++ patiently and methodically explores the issues of when and how to use inlines, references, operator overloading, inheritance and dynamic objects, as well as advanced topics such as the proper use of templates, exceptions and multiple inheritance. The entire effort is woven in a fabric that includes Eckel’s own philosophy of object and program design. A must for every C++ developer’s bookshelf, Thinking in C++ is the one C++ book you must have if you’re doing serious development with C++.”

Richard Hale Shaw
Contributing Editor, PC Magazine



 

Thinking in C++

Volume 2: Practical Programming

Bruce Eckel, President, MindView, Inc.
Chuck Allison, Utah Valley State College

 


 

©2004 MindView, Inc.

The information in this book is distributed on an “as is” basis, without warranty. While every precaution has been taken in the preparation of this book, neither the authors nor the publisher shall have any liability to any person or entity with respect to any liability, loss or damage caused or alleged to be caused directly or indirectly by instructions contained in this book or by the computer software or hardware products described herein.

All rights reserved. No part of this book may be reproduced in any form or by any electronic or mechanical means including information storage and retrieval systems without permission in writing from the publisher or authors, except by a reviewer who may quote brief passages in a review. Any of the names used in the examples and text of this book are fictional; any relationship to persons living or dead or to fictional characters in other works is purely coincidental.


Dedication

To all those who have worked tirelessly
on the development of the C++ language



 

What’s inside...


Preface
Goals
Chapters
Exercises
Exercise solutions
Source code
Language standards
Language support
Seminars, CD-ROMs & consulting
Errors
About the cover
Acknowledgements

Building Stable Systems

1: Exception handling
Traditional error handling
Throwing an exception
Catching an exception
The try block
Exception handlers
Exception matching
Catching any exception
Re-throwing an exception
Uncaught exceptions
Cleaning up
Resource management
Making everything an object
auto_ptr
Function-level try blocks
Standard exceptions
Exception specifications
Better exception specifications?
Exception specifications and inheritance
When not to use exception specifications
Exception safety
Programming with exceptions
When to avoid exceptions
Typical uses of exceptions
Overhead
Summary
Exercises

2: Defensive programming
Assertions
A simple unit test framework
Automated testing
The TestSuite Framework
Test suites
The test framework code
Debugging techniques
Trace macros
Trace file
Finding memory leaks
Summary
Exercises

The Standard C++ Library

3: Strings in depth
What’s in a string?
Creating and initializing C++ strings
Operating on strings
Appending, inserting, and concatenating strings
Replacing string characters
Concatenation using nonmember overloaded operators
Searching in strings
Finding in reverse
Finding first/last of a set of characters
Removing characters from strings
Comparing strings
Strings and character traits
A string application
Summary
Exercises

4: Iostreams
Why iostreams?
Iostreams to the rescue
Inserters and extractors
Common usage
Line-oriented input
Handling stream errors
File iostreams
A File-Processing Example
Open modes
Iostream buffering
Seeking in iostreams
String iostreams
Input string streams
Output string streams
Output stream formatting
Format flags
Format fields
Width, fill, and precision
An exhaustive example
Manipulators
Manipulators with arguments
Creating manipulators
Effectors
Iostream examples
Maintaining class library source code
Detecting compiler errors
A simple datalogger
Internationalization
Wide Streams
Locales
Summary
Exercises

5: Templates in depth
Template parameters
Non-type template parameters
Default template arguments
Template template parameters
The typename keyword
Using the template keyword as a hint
Member Templates
Function template issues
Type deduction of function template arguments
Function template overloading
Taking the address of a generated function template
Applying a function to an STL sequence
Partial ordering of function templates
Template specialization
Explicit specialization
Partial Specialization
A practical example
Preventing template code bloat
Name lookup issues
Names in templates
Templates and friends
Template programming idioms
Traits
Policies
The curiously recurring template pattern
Template metaprogramming
Compile-time programming
Expression templates
Template compilation models
The inclusion model
Explicit instantiation
The separation model
Summary
Exercises

6: Generic algorithms
A first look
Predicates
Stream iterators
Algorithm complexity
Function objects
Classification of function objects
Automatic creation of function objects
Adaptable function objects
More function object examples
Function pointer adapters
Writing your own function object adapters
A catalog of STL algorithms
Support tools for example creation
Filling and generating
Counting
Manipulating sequences
Searching and replacing
Comparing ranges
Removing elements
Sorting and operations on sorted ranges
Heap operations
Applying an operation to each element in a range
Numeric algorithms
General utilities
Creating your own STL-style algorithms
Summary
Exercises

7: Generic containers
Containers and iterators
STL reference documentation
A first look
Containers of strings
Inheriting from STL containers
A plethora of iterators
Iterators in reversible containers
Iterator categories
Predefined iterators
The basic sequences: vector, list, deque
Basic sequence operations
vector
deque
Converting between sequences
Checked random-access
list
Swapping basic sequences
set
A completely reusable tokenizer
stack
queue
Priority queues
Holding bits
bitset<n>
vector<bool>
Associative containers
Generators and fillers for associative containers
The magic of maps
Multimaps and duplicate keys
Multisets
Combining STL containers
Cleaning up containers of pointers
Creating your own containers
STL extensions
Non-STL containers
Summary
Exercises

Special Topics

8: Runtime type identification
Runtime casts
The typeid operator
Casting to intermediate levels
void pointers
Using RTTI with templates
Multiple inheritance
Sensible uses for RTTI
A trash recycler
Mechanism and overhead of RTTI
Summary
Exercises

9: Multiple inheritance
Perspective
Interface inheritance
Implementation inheritance
Duplicate subobjects
Virtual base classes
Name lookup issues
Avoiding MI
Extending an interface
Summary
Exercises

10: Design patterns
The pattern concept
The singleton
Variations on singleton
Classifying patterns
Features, idioms, patterns
Building complex objects
Factories: encapsulating object creation
Polymorphic factories
Abstract factories
Virtual constructors
Observer
The “inner class” idiom
The observer example
Multiple dispatching
Multiple dispatching with Visitor
Exercises

11: Concurrency
Motivation
Concurrency in C++
Installing ZThreads
Defining Tasks
Using Threads
Creating responsive user interfaces
Simplifying with Executors
Yielding
Sleeping
Priority
Sharing limited resources
Ensuring the existence of objects
Improperly accessing resources
Controlling access
Simplified coding with Guards
Thread local storage
Terminating tasks
Preventing iostream collision
The Ornamental Garden
Terminating when blocked
Interruption
Cooperation between threads
Wait and signal
Producer-Consumer relationships
Solving Threading problems with Queues
Broadcast
Deadlock
Summary
Exercises

A: Recommended reading
General C++
Bruce’s books
Chuck’s books
In-depth C++
Design Patterns

B: Etc

Index

 


Preface

In Volume 1 of this book, you learn the fundamentals of C and C++. In this volume, we look at more advanced features, with an eye towards developing techniques and ideas that produce robust C++ programs.

Thus, in this volume we are assuming that you are familiar with the material developed in Volume 1.

Goals

Our goals in this book are to:

1. Present the material a simple step at a time, so the reader can easily digest each concept before moving on.

2. Teach “practical programming” techniques that you can use on a day-to-day basis.

3. Give you what we think is important for you to understand about the language, rather than everything we know. We believe there is an “information importance hierarchy,” and there are some facts that 95% of programmers will never need to know and that would just confuse people and add to their perception of the complexity of the language. To take an example from C, if you memorize the operator precedence table (we never did) you can write clever code. But if you have to think about it, it will confuse the reader/maintainer of that code. So forget about precedence, and use parentheses when things aren’t clear. We take the same attitude toward some information in the C++ language, which is more important for compiler writers than for programmers.

4. Keep each section focused enough so the lecture time, and the time between exercise periods, is small. Not only does this keep the audience’s minds more active and involved during a hands-on seminar, but it gives the reader a greater sense of accomplishment.

5. Avoid relying on any particular vendor’s version of C++. We have tested the code on all the implementations we could, and when an implementation absolutely refused to work because it doesn’t conform to the C++ Standard, we’ve flagged that fact in the example (you’ll see the flags in the source code) to exclude it from the build process.

6. Automate the compiling and testing of the code in the book. We have discovered that code that isn’t compiled and tested is probably broken, so in this volume we’ve instrumented the examples with test code. In addition, the code that you can download from http://www.MindView.net has been extracted directly from the text of the book using programs that also automatically create makefiles to compile and run the tests. This way we know that the code in the book is correct.
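The advice in goal 3 about parentheses can be made concrete with a small sketch. The function name bitsClear is hypothetical and is not taken from the book's source code; it only illustrates a case where relying on the precedence table would mislead a maintainer.

```cpp
#include <cassert>

// Hypothetical helper, for illustration only.
// Written without parentheses, the test would read
//     flags & mask == 0
// which C and C++ parse as flags & (mask == 0), because ==
// binds more tightly than &. The parentheses below state
// the intent so that nobody has to consult a table.
inline bool bitsClear(unsigned flags, unsigned mask) {
  return (flags & mask) == 0;
}
```

For instance, bitsClear(0x5u, 0x2u) is true because bit 1 is not set in 0x5, while bitsClear(0x5u, 0x4u) is false.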

Chapters

Here is a brief description of the chapters contained in this book:

Part 1: Building Stable Systems

1. Exception handling. Error handling has always been a problem in programming. Even if you dutifully return error information or set a flag, the function caller may simply ignore it. Exception handling is a primary feature in C++ that solves this problem by allowing you to “throw” an object out of your function when a critical error happens. You throw different types of objects for different errors, and the function caller “catches” these objects in separate error handling routines. If you throw an exception, it cannot be ignored, so you can guarantee that something will happen in response to your error. The decision to use exceptions (a good one!) affects code design in fundamental ways.
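As a minimal sketch of the idea, the following throws a different type of object for each kind of error and catches each in its own handler. The class names ParseError and RangeError and the functions parseDigit, checkedDigit and describe are hypothetical, invented for this illustration rather than drawn from Chapter 1.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception types: one class per kind of error.
class ParseError : public std::runtime_error {
public:
  explicit ParseError(const std::string& msg)
    : std::runtime_error(msg) {}
};

class RangeError : public std::runtime_error {
public:
  explicit RangeError(const std::string& msg)
    : std::runtime_error(msg) {}
};

// Converts a character to a digit, or throws ParseError.
int parseDigit(char c) {
  if (c < '0' || c > '9')
    throw ParseError("not a digit");
  return c - '0';
}

// Adds an upper bound; throws RangeError when it is exceeded.
int checkedDigit(char c, int max) {
  int d = parseDigit(c);
  if (d > max)
    throw RangeError("digit too large");
  return d;
}

// The caller handles each error type in a separate handler.
std::string describe(char c, int max) {
  std::string result = "ok";
  try {
    checkedDigit(c, max);
  } catch (const ParseError& e) {
    result = std::string("parse: ") + e.what();
  } catch (const RangeError& e) {
    result = std::string("range: ") + e.what();
  }
  return result;
}
```

Note that describe ignores the return value of checkedDigit, yet it still cannot ignore a failure: control transfers to one of the handlers whether or not the caller looks at the result.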

2. Defensive Programming. Many software problems can be prevented. To program defensively is to craft code in such a way that bugs can be found and fixed early, before they have a chance to do damage in the field. The use of assertions is the single most important thing you can do to validate your code during development, while at the same time leaving an executable documentation trail in your code that reveals what you were thinking when you wrote the code in the first place. Before you let your code out of your hands it should be rigorously tested. A framework for automated unit testing is an indispensable tool for successful, everyday software development.
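Here is a small sketch of assertions acting as executable documentation, using only the standard assert macro. The function percentOf is hypothetical; the assertions record what the author assumed about the arguments and the result.

```cpp
#include <cassert>

// Hypothetical function, for illustration only.
// The assertions document the assumptions made when the
// code was written, and they check those assumptions on
// every call unless NDEBUG is defined.
int percentOf(int part, int whole) {
  assert(whole > 0);                      // precondition
  assert(part >= 0 && part <= whole);     // precondition
  int result = (part * 100) / whole;
  assert(result >= 0 && result <= 100);   // postcondition
  return result;
}
```

Calling percentOf(1, 4) yields 25. A call such as percentOf(5, 4) violates the second precondition and stops the program during development instead of quietly producing 125.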

Part 2: The Standard C++ Library

3. Strings in Depth. Text processing is the most common programming activity by far. The C++ string class relieves the programmer from memory management issues, while at the same time delivering a powerhouse of text processing capability. C++ also supports the use of wide characters and locales for internationalized applications.
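To give a flavor of what this means in practice, the sketch below replaces every occurrence of one substring with another using only std::string member functions; the string grows and shrinks without any explicit memory management. The function name replaceAll is our own for this illustration and is not a standard library function.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper built from std::string::find and
// std::string::replace. The string resizes itself as needed.
std::string replaceAll(std::string text,
                       const std::string& from,
                       const std::string& to) {
  if (from.empty())
    return text;  // nothing sensible to search for
  std::string::size_type pos = 0;
  while ((pos = text.find(from, pos)) != std::string::npos) {
    text.replace(pos, from.size(), to);
    pos += to.size();  // skip past the text just inserted
  }
  return text;
}
```

Advancing pos past the inserted text keeps the loop from rescanning its own output, which matters when the replacement contains the search string.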

4. Iostreams. One of the original C++ libraries, the one that provides the essential I/O facility, is called iostreams. Iostreams is intended to replace C’s stdio.h with an I/O library that is easier to use, more flexible, and extensible: you can adapt it to work with your new classes. This chapter teaches you the ins and outs of how to make the best use of the existing iostream library for standard I/O, file I/O, and in-memory formatting.
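The extensibility mentioned here can be sketched in a few lines: an inserter (operator<<) written for a new class works with any output stream, including an in-memory string stream. The Point class and the toText function are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type.
struct Point {
  int x;
  int y;
  Point(int xx, int yy) : x(xx), y(yy) {}
};

// An inserter teaches every ostream how to print a Point.
std::ostream& operator<<(std::ostream& os, const Point& p) {
  return os << '(' << p.x << ", " << p.y << ')';
}

// In-memory formatting: the same inserter writes into a
// string stream instead of a file or the console.
std::string toText(const Point& p) {
  std::ostringstream out;
  out << "point " << p;
  return out.str();
}
```

With these definitions, toText(Point(3, 4)) produces the string "point (3, 4)", and the same operator<< would work unchanged with std::cout or a file stream.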

5. Templates in Depth. The distinguishing feature of “modern C++” is the broad power of templates. Templates are for more than just generic containers; they support development of robust, generic, high-performance libraries. There is a lot to know about templates: they constitute, as it were, a sub-language within the C++ language, and give the programmer an impressive degree of control over the compilation process. It is not an overstatement to say that templates have revolutionized C++ programming.
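One small, well-known illustration of templates steering the compilation process is a value computed entirely by the compiler. The sketch below is a common textbook formulation, not necessarily the one the chapter presents, and the function bufferSize is a hypothetical name used only to show the result serving as an array bound.

```cpp
#include <cassert>
#include <cstddef>

// The compiler instantiates Factorial<N>, Factorial<N-1>, ...
// until it reaches the explicit specialization for 0.
template<unsigned N>
struct Factorial {
  enum { value = N * Factorial<N - 1>::value };
};

template<>
struct Factorial<0> {
  enum { value = 1 };
};

// Hypothetical use: the result is a compile-time constant,
// so it can serve as an array bound.
inline std::size_t bufferSize() {
  typedef char Buffer[Factorial<4>::value];
  return sizeof(Buffer);
}
```

No multiplication happens at run time here; Factorial<4>::value is already the constant 24 by the time the program is compiled.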

6. Generic Algorithms. Algorithms are at the core of computing, and C++, through its template facility, supports an impressive entourage of powerful, efficient, and easy-to-use generic algorithms. The standard algorithms are also customizable through function objects. This chapter looks at every algorithm in the library. (Chapters 6 and 7 cover that portion of the standard C++ library commonly known as the Standard Template Library, or STL.)
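As a brief sketch of customizing a standard algorithm with a function object, the following passes a hand-written predicate to std::count_if. The names GreaterThan, countAbove and sample are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function object: it carries its own state
// (the limit) and is callable like a function.
class GreaterThan {
  int limit;
public:
  explicit GreaterThan(int lim) : limit(lim) {}
  bool operator()(int value) const { return value > limit; }
};

// std::count_if applies the predicate to every element.
std::ptrdiff_t countAbove(const std::vector<int>& v, int limit) {
  return std::count_if(v.begin(), v.end(), GreaterThan(limit));
}

// Hypothetical sample data: 3 8 1 9 4
std::vector<int> sample() {
  int raw[] = { 3, 8, 1, 9, 4 };
  return std::vector<int>(raw, raw + 5);
}
```

The algorithm itself knows nothing about limits; the behavior is supplied entirely by the object handed to it, which is why one algorithm can serve many purposes.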

7. Generic Containers & Iterators. C++ supports all the common data structures known to man in a type-safe manner. You never have to worry about what such a container holds; the homogeneity of its objects is guaranteed. Separating the traversing of a container from the container itself, another accomplishment of templates, is made possible through iterators. This ingenious arrangement allows a flexible application of algorithms to containers by means of the simplest of designs.
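The separation of traversal from the container can be sketched with a single function template that walks any standard sequence through its iterators. The function name total is hypothetical, and the sketch assumes the elements are convertible to int.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Works with any container that provides const_iterator,
// begin() and end(). The loop never needs to know whether
// the elements are stored contiguously or in linked nodes.
template<class Container>
int total(const Container& c) {
  int sum = 0;
  for (typename Container::const_iterator it = c.begin();
       it != c.end(); ++it)
    sum += *it;
  return sum;
}
```

The same code serves a std::vector<int> holding three 5s (total 15) and a std::list<int> holding four 2s (total 8), even though the two containers are implemented very differently.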

Part 3: Special Topics

8. Run-time type identification. Run-time type identification (RTTI) lets you find the exact type of an object when you only have a pointer or reference to the base type. Normally, you’ll want to intentionally ignore the exact type of an object and let the virtual function mechanism implement the correct behavior for that type. But occasionally (like when writing software tools such as debuggers) it is helpful to know the exact type of an object for which you only have a base pointer; often this information allows you to perform a special-case operation more efficiently. This chapter explains what RTTI is for and how to use it.
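A minimal sketch of the two RTTI tools, dynamic_cast and typeid, follows. The Shape, Circle and Square classes and the two helper functions are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <typeinfo>

// RTTI requires a polymorphic base: at least one virtual function.
class Shape {
public:
  virtual ~Shape() {}
};
class Circle : public Shape {};
class Square : public Shape {};

// dynamic_cast on a pointer yields a null pointer when the
// object is not of the requested type.
bool isCircle(const Shape& s) {
  return dynamic_cast<const Circle*>(&s) != 0;
}

// typeid applied to a reference to a polymorphic type
// reports the exact type of the object referred to.
bool sameExactType(const Shape& a, const Shape& b) {
  return typeid(a) == typeid(b);
}
```

Both functions see only Shape references, yet isCircle(Circle()) is true, isCircle(Square()) is false, and sameExactType(Circle(), Square()) is false.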

9. Multiple inheritance. This sounds simple at first: A new class is inherited from more than one existing class. However, you can end up with ambiguities and multiple copies of base-class objects. That problem is solved with virtual base classes, but the bigger issue remains: When do you use it? Multiple inheritance is only essential when you need to manipulate an object through more than one common base class. This chapter explains the syntax for multiple inheritance, and shows alternative approaches, in particular how templates solve one common problem. The use of multiple inheritance to repair a “damaged” class interface is demonstrated as a genuinely valuable use of this feature.
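The duplicate-subobject problem and its repair can be sketched briefly. The class names Device, Scanner, Printer and Copier, along with the function sharesOneBase, are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

class Device {
  int id_;
public:
  explicit Device(int id) : id_(id) {}
  virtual ~Device() {}
  int id() const { return id_; }
};

// "virtual" here means all paths share a single Device subobject.
class Scanner : public virtual Device {
public:
  Scanner() : Device(0) {}
};

class Printer : public virtual Device {
public:
  Printer() : Device(0) {}
};

// The most-derived class initializes the virtual base; the
// Device(0) initializers above are ignored for a Copier.
class Copier : public Scanner, public Printer {
public:
  explicit Copier(int id) : Device(id), Scanner(), Printer() {}
};

// Both conversion paths arrive at the same Device subobject.
bool sharesOneBase(const Copier& c) {
  const Device* viaScanner = static_cast<const Scanner*>(&c);
  const Device* viaPrinter = static_cast<const Printer*>(&c);
  return viaScanner == viaPrinter;
}
```

Without the virtual keyword on the two intermediate classes, a Copier would contain two Device subobjects and the call Copier(7).id() would be ambiguous; with it, the call compiles and returns 7.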

10. Design Patterns. The most revolutionary advance in programming since objects is the introduction of design patterns. A design pattern is a language-independent codification of a solution to a common programming problem, expressed in such a way that it can apply to many contexts. Patterns such as Singleton, Factory Method, and Visitor now find their way into daily discussions around the keyboard. This chapter shows how to implement and use some of the more useful design patterns in C++.
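As a taste of what such an implementation looks like, here is one common C++ formulation of Singleton that uses a function-local static object. It is a sketch under our own naming (the Settings class is hypothetical), not necessarily the variation the chapter develops.

```cpp
#include <cassert>

// Hypothetical class with exactly one instance.
class Settings {
  int verbosity_;
  Settings() : verbosity_(0) {}          // clients cannot construct
  Settings(const Settings&);             // declared but not defined:
  Settings& operator=(const Settings&);  // copying is disallowed
public:
  // The single object is created the first time this is called.
  static Settings& instance() {
    static Settings theInstance;
    return theInstance;
  }
  int verbosity() const { return verbosity_; }
  void setVerbosity(int v) { verbosity_ = v; }
};
```

Every call to Settings::instance() returns a reference to the same object, so a change made through one call is visible through any other.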

11. Concurrent Programming. Users have long been used to responsive user interfaces that (seem to) process multiple tasks simultaneously. Modern operating systems allow processes to have multiple threads that share the process address space. Multi-threaded programming requires a different mindset, however, and comes with its own set of “gotchas.” This chapter uses a freely available library (the ZThread library by Eric Crahen of IBM) to show how to effectively manage multi-threaded applications in C++.

Exercises

We have discovered that simple exercises are exceptionally useful during a seminar to complete a student’s understanding, so you’ll find a set at the end of each chapter.

These are fairly simple, so they can be finished in a reasonable amount of time in a classroom situation while the instructor observes, making sure all the students are absorbing the material. Some exercises are a bit more challenging to keep advanced students entertained. They’re all designed to be solved in a short time and are only there to test and polish your knowledge rather than present major challenges (presumably, you’ll find those on your own—or more likely they’ll find you).

Exercise solutions

Solutions to exercises can be found in the electronic document The C++ Annotated Solution Guide, Volume 2, available for a nominal fee from www.MindView.net.

Source code

The source code for this book is copyrighted freeware, distributed via the web site http://www.MindView.net. The copyright prevents you from republishing the code in print media without permission.

In the starting directory where you unpacked the code you will find the following copyright notice:

//:! :CopyRight.txt
Copyright (c) MindView, Inc., 2003
Source code file from the book
"Thinking in C++, 2nd Edition, Volume 2."
All rights reserved EXCEPT as allowed by the
following statements: You can freely use this file
for your own work (personal or commercial),
including modifications and distribution in
executable form only. Permission is granted to use
this file in classroom situations, including its
use in presentation materials, as long as the book
"Thinking in C++" is cited as the source.
Except in classroom situations, you cannot copy
and distribute this code; instead, the sole
distribution point is http://www.MindView.net
(and official mirror sites) where it is
freely available. You cannot remove this
copyright and notice. You cannot distribute
modified versions of the source code in this
package. You cannot use this file in printed
media without the express permission of the
author. The authors makes no representation about
the suitability of this software for any purpose.
It is provided "as is" without express or implied
warranty of any kind, including any implied
warranty of merchantability, fitness for a
particular purpose or non-infringement. The entire
risk as to the quality and performance of the
software is with you. The authors and publisher
shall not be liable for any damages suffered by
you or any third party as a result of using or
distributing software. In no event will the
authors or the publisher be liable for any
lost revenue, profit, or data, or for direct,
indirect, special, consequential, incidental, or
punitive damages, however caused and regardless of
the theory of liability, arising out of the use of
or inability to use software, even if Bruce Eckel
and the publisher have been advised of the
possibility of such damages. Should the software
prove defective, you assume the cost of all
necessary servicing, repair, or correction. If you
think you've found an error, please submit the
correction using the form you will find at
www.MindView.net. (Please use the same
form for non-code errors found in the book.)
///:~

 

You may use the code in your projects and in the classroom as long as the copyright notice is retained.

Language standards

Throughout this book, when referring to conformance to the ANSI/ISO C standard, we will be referring to the 1989 standard, and will generally just say ‘C.’ Only if it is necessary to distinguish between Standard 1989 C and older, pre-Standard versions of C will we make the distinction. We do not reference C99 in this book.

As this book goes to press, the ANSI/ISO C++ committee has long since finished working on the first C++ standard, commonly known as C++98. We will use the term Standard C++ to refer to this standardized language. If we simply refer to C++, assume we mean “Standard C++.” The C++ standards committee continues to address issues important to the C++ community that will find expression in C++0x, a future C++ standard not likely to be available for many years.

Language support

Your compiler may not support all the features discussed in this book, especially if you don’t have the newest version of your compiler. Implementing a language like C++ is a Herculean task, and you can expect that the features will appear in pieces rather than all at once. But if you attempt one of the examples in the book and get a lot of errors from the compiler, it’s not necessarily a bug in the code or the compiler; the feature may simply not be implemented in your particular compiler yet. On the Windows platform we have validated all examples with the C++ compiler found in Microsoft’s Visual Studio .NET 2003; Borland C++ Builder version 6; the GNU project’s g++ compiler, version 3.2, running under Cygwin; and the Edison Design Group’s C++ front end using the Dinkumware full C++ library. We have also run all the examples on Mac OS X with Metrowerks C++ version 8. In those instances where a compiler does not support the feature required by a sample program, we have so indicated in comments in the source code.

Seminars, CD-ROMs & consulting

Bruce Eckel’s company, MindView, Inc., provides public hands-on training seminars based on the material in this book, and also for advanced topics. Selected material from each chapter represents a lesson, which is followed by a monitored exercise period so each student receives personal attention. We also provide on-site training, consulting, mentoring, and design & code walkthroughs. Information and sign-up forms for upcoming seminars and other contact information can be found at http://www.MindView.net.

Errors

No matter how many tricks a writer uses to detect errors, some always creep in and these often leap off the page for a fresh reader. If you discover anything you believe to be an error, please use the feedback system built into the electronic version of this book, which you will find at http://www.MindView.net. The feedback system uses unique identifiers on the paragraphs in the book, so click on the identifier next to the paragraph that you wish to comment on. Your help is appreciated.

About the cover

The cover artwork was painted by Larry O’Brien’s wife, Tina Jensen (yes, the Larry O’Brien who was the editor of Software Development Magazine for so many years). Not only are the pictures beautiful, but they are excellent suggestions of polymorphism. The idea for using these images came from Daniel Will-Harris, the cover designer (www.Will-Harris.com), working with Bruce Eckel.

Acknowledgements

Volume 2 of this book languished in a half-completed state for a long time while Bruce got distracted with other things, notably Java, Design Patterns and especially Python (see www.Python.org). If Chuck hadn’t been willing (foolishly, he has sometimes thought) to finish the other half and bring things up-to-date, this book almost certainly wouldn’t have happened. There aren’t that many people whom Bruce would have felt comfortable entrusting this book to. Chuck’s penchant for precision, correctness and clear explanation is what has made this book as good as it is.

Jamie King acted as an intern under Chuck’s direction during the completion of this book. He was instrumental in making sure the book got finished, not only by providing feedback for Chuck, but especially because of his relentless questioning and picking of every single possible nit that he didn’t completely understand. If your questions are answered by this book, it’s probably because Jamie asked them first. Jamie also enhanced a number of the sample programs and created many of the exercises at the end of each chapter.

Eric Crahen of IBM was instrumental in the completion of Chapter 11 (Concurrent Programming). When we were looking for a threads package, we sought out one that was intuitive and easy to use, while being sufficiently robust to do the job. With Eric we got that and then some—he was extremely cooperative and has used our feedback to enhance his library, while we have benefited from his insights as well.

We are grateful to have had Pete Becker as a technical editor. Few people are as articulate and discriminating as Pete, not to mention as expert in C++ and software development in general. We also thank Bjorn Karlsson for his gracious and timely technical assistance as he reviewed the entire manuscript with little notice.

The ideas and understanding in this book have come from many other sources, as well: friends like Andrea Provaglio, Dan Saks, Scott Meyers, Charles Petzold, and Michael Wilk; pioneers of the language like Bjarne Stroustrup, Andrew Koenig, and Rob Murray; members of the C++ Standards Committee like Nathan Myers (who was particularly helpful and generous with his insights), Herb Sutter, PJ Plauger, Pete Becker, Kevlin Henney, David Abrahams, Tom Plum, Reg Charney, Tom Penello, Sam Druker, Uwe Steinmueller, John Spicer, Steve Adamczyk, and Daveed Vandevoorde; people who have spoken in the C++ track at the Software Development Conference (which Bruce created and developed, and Chuck spoke in); and often students in seminars, who ask the questions we need to hear to make the material clearer.

The book design, cover design, and cover photo were created by Bruce’s friend Daniel Will-Harris, noted author and designer, who used to play with rub-on letters in junior high school while he awaited the invention of computers and desktop publishing. However, we produced the camera-ready pages ourselves, so the typesetting errors are ours. Microsoft® Word XP was used to write the book and to create camera-ready pages. The body typeface is Georgia and the headlines are in Verdana. The code typeface is Andale Mono.

We also wish to thank the generous professionals at the Edison Design Group and Dinkumware, Ltd., for giving us complimentary copies of their compiler and library (respectively). Without their assistance some of the examples in this book could not have been tested. We also wish to thank Howard Hinnant and the folks at Metrowerks for a copy of their compiler, and Sandy Smith and the folks at SlickEdit for keeping Chuck supplied with a world-class editing environment for so many years. Greg Comeau also provided a copy of his successful EDG-based compiler, Comeau C++.

A special thanks to all our teachers, and all our students (who are our teachers as well).

Evan Cofsky (Evan@TheUnixMan.com) provided all sorts of assistance on the server as well as development of programs in his now-favorite language, Python. Sharlynn Cobaugh and Paula Steuer were instrumental assistants, preventing Bruce from being washed away in a flood of projects.

Dawn McGee provided much-appreciated inspiration and enthusiasm during this project. The supporting cast of friends includes, but is not limited to: Mark Western, Gen Kiyooka, Kraig Brockschmidt, Zack Urlocker, Andrew Binstock, Neil Rubenking, Steve Sinofsky, JD Hildebrandt, Brian McElhinney, Brinkley Barr, Bill Gates at Midnight Engineering Magazine, Larry Constantine & Lucy Lockwood, Tom Keffer, Greg Perry, Dan Putterman, Christi Westphal, Gene Wang, Dave Mayer, David Intersimone, Claire Sawyers, The Italians (Andrea Provaglio, Laura Fallai, Marco Cantu, Michael Seaver, Huston Franklin, David Wagstaff, Corrado, Ilsa and Christina Giustozzi), Chris & Laura Strand, The Almquists, Brad Jerbic, John Kruth & Marilyn Cvitanic, Holly Payne (yes, the famous novelist!), Mark Mabry, The Robbins Families, The Moelter Families (& the McMillans), The Wilks, Dave Stoner, Laurie Adams, The Cranstons, Larry Fogg, Mike & Karen Sequeira, Gary Entsminger & Allison Brody, Chester Andersen, Joe Lordi, Dave & Brenda Bartlett, The Rentschlers, The Sudeks, Lynn & Todd, and their families. And of course, Mom & Dad, Sandy, James & Natalie, Kim & Jared, Isaac, and Abbi.


Part 1: Building Stable Systems

 

Software engineers spend about as much time validating code as they do creating it. Quality is or should be the goal of every programmer, and one can go a long way towards that goal by eliminating problems before they rear their ugly heads. In addition, software systems should be robust enough to behave reasonably in the presence of unforeseen environmental problems.

Exception handling was introduced into C++ to support sophisticated error handling without cluttering code with an inordinate amount of error-handling logic. Chapter 1 shows how proper use of exceptions can make for well-behaved software, and also introduces the design principles that underlie exception-safe code. In Chapter 2 we cover unit testing and debugging techniques intended to maximize code quality long before it’s released. The use of assertions to express and enforce program invariants is a sure sign of an experienced software engineer. We also introduce a simple framework to help mitigate the tedium of unit testing.


1: Exception handling

Improving error recovery is one of the most powerful ways you can increase the robustness of your code.

Unfortunately, it’s almost accepted practice to ignore error conditions, as if we’re in a state of denial about errors. One reason, no doubt, is the tediousness and code bloat of checking for many errors. For example, printf( ) returns the number of characters that were successfully printed, but virtually no one checks this value. The proliferation of code alone would be disgusting, not to mention the difficulty it would add in reading the code.

The problem with C’s approach to error handling could be thought of as coupling: the user of a function must tie the error-handling code so closely to that function that it becomes too ungainly and awkward to use.

One of the major features in C++ is exception handling, which is a better way of thinking about and handling errors. With exception handling the following statements apply:

1. Error-handling code is not nearly so tedious to write, and it doesn’t become mixed up with your “normal” code. You write the code you want to happen; later in a separate section you write the code to cope with the problems. If you make multiple calls to a function, you handle the errors from that function once, in one place.

2. Errors cannot be ignored. If a function needs to send an error message to the caller of that function, it “throws” an object representing that error out of the function. If the caller doesn’t “catch” the error and handle it, it goes to the next enclosing dynamic scope, and so on until the error is either caught or the program terminates because there was no handler to catch that type of exception.

This chapter examines C’s approach to error handling (such as it is), discusses why it did not work well for C, and explains why it won’t work at all for C++. This chapter also covers try, throw, and catch, the C++ keywords that support exception handling.

Traditional error handling

In most of the examples in these volumes, we use assert( ) as it was intended: for debugging during development with code that can be disabled with #define NDEBUG for the shipping product. Runtime error checking uses the require.h functions (assure( ) and require( )) developed in Chapter 9 in Volume 1. These functions are a convenient way to say, “There’s a problem here you’ll probably want to handle with some more sophisticated code, but you don’t need to be distracted by it in this example.” The require.h functions might be enough for small programs, but for complicated products you might need to write more sophisticated error-handling code.

Error handling is quite straightforward in situations in which you know exactly what to do because you have all the necessary information in that context. In that case, you simply handle the error at that point.

The problem occurs when you don’t have enough information in that context, and you need to pass the error information into a different context where that information does exist. In C, you can handle this situation using three approaches:

1. Return error information from the function or, if the return value cannot be used this way, set a global error condition flag. (Standard C provides errno and perror( ) to support this.) As mentioned earlier, the programmer is likely to ignore the error information because tedious and obfuscating error checking must occur with each function call. In addition, returning from a function that hits an exceptional condition might not make sense.

2. Use the little-known Standard C library signal-handling system, implemented with the signal( ) function (to determine what happens when the event occurs) and raise( ) (to generate an event). Again, this approach involves high coupling because it requires the user of any library that generates signals to understand and install the appropriate signal-handling mechanism; also in large projects the signal numbers from different libraries might clash.

3. Use the nonlocal goto functions in the Standard C library: setjmp( ) and longjmp( ). With setjmp( ) you save a known good state in the program, and if you get into trouble, longjmp( ) will restore that state. Again, there is high coupling between the place where the state is stored and the place where the error occurs.

When considering error-handling schemes with C++, there’s an additional critical problem: The C techniques of signals and setjmp( )/longjmp( ) do not call destructors, so objects aren’t properly cleaned up. (In fact, if longjmp( ) jumps past the end of a scope where destructors should be called, the behavior of the program is undefined.) This makes it virtually impossible to effectively recover from an exceptional condition because you’ll always leave objects behind that haven’t been cleaned up and that can no longer be accessed. The following example demonstrates this with setjmp/longjmp:

//: C01:Nonlocal.cpp

// setjmp() & longjmp()

#include <iostream>

#include <csetjmp>

using namespace std;

 

class Rainbow {

public:

  Rainbow() { cout << "Rainbow()" << endl; }

  ~Rainbow() { cout << "~Rainbow()" << endl; }

};

 

jmp_buf kansas;

 

void oz() {

  Rainbow rb;

  for(int i = 0; i < 3; i++)

    cout << "there's no place like home\n";

  longjmp(kansas, 47);

}

 

int main() {

  if(setjmp(kansas) == 0) {

    cout << "tornado, witch, munchkins...\n";

    oz();

  } else {

    cout << "Auntie Em! "

         << "I had the strangest dream..."

         << endl;

  }

} ///:~

 

The setjmp( ) function is odd because if you call it directly, it stores all the relevant information about the current processor state (such as the contents of the instruction pointer and runtime stack pointer) in the jmp_buf and returns zero. In this case it behaves like an ordinary function. However, if you call longjmp( ) using the same jmp_buf, it’s as if you’re returning from setjmp( ) again: you pop right out the back end of the setjmp( ). This time, the value returned is the second argument to longjmp( ), so you can detect that you’re actually coming back from a longjmp( ). You can imagine that with many different jmp_bufs, you could pop around to many different places in the program. The difference between a local goto (with a label) and this nonlocal goto is that you can return to any pre-determined location higher up in the runtime stack with setjmp( )/longjmp( ) (wherever you’ve placed a call to setjmp( )).

The problem in C++ is that longjmp( ) doesn’t respect objects; in particular it doesn’t call destructors when it jumps out of a scope.[1] Destructor calls are essential, so this approach won’t work with C++. In fact, the C++ standard states that a goto that branches into a scope past the initialization of an object with a constructor (effectively bypassing the constructor call) is ill-formed, and that branching out of a scope with longjmp( ) where an object on the stack has a destructor constitutes undefined behavior.

Throwing an exception

If you encounter an exceptional situation in your code (that is, one in which you don’t have enough information in the current context to decide what to do), you can send information about the error into a larger context by creating an object that contains that information and “throwing” it out of your current context. This is called throwing an exception. Here’s what it looks like:

//: C01:MyError.cpp

class MyError {

   const char* const data;

public:

   MyError(const char* const msg = 0) : data (msg) {}

};

 

void f() {

   // Here we "throw" an exception object:

   throw MyError("something bad happened");

}

 

int main() {

   // As you’ll see shortly,

   // we’ll want a "try block" here:

   f();

} ///:~

 

MyError is an ordinary class, which in this case takes a const char* as a constructor argument. You can use any type when you throw (including built-in types), but usually you’ll create special classes for throwing exceptions.

The keyword throw causes a number of relatively magical things to happen. First, it creates a copy of the object you’re throwing and, in effect, “returns” it from the function containing the throw expression, even though that object type isn’t normally what the function is designed to return. A naive way to think about exception handling is as an alternate return mechanism (although you’ll find you can get into trouble if you take the analogy too far). You can also exit from ordinary scopes by throwing an exception. In any case, a value is returned, and the function or scope exits.

Any similarity to function returns ends there because where you return is some place completely different from where a normal function call returns. (You end up in an appropriate part of the code, called an exception handler, that might be far removed from where the exception was thrown.) In addition, any local objects created by the time the exception occurs are destroyed. This automatic cleanup of local objects is often called “stack unwinding.”

In addition, you can throw as many different types of objects as you want. Typically, you’ll throw a different type for each category of error. The idea is to store the information in the object and in the name of its class so that someone in a calling context can figure out what to do with your exception.

Catching an exception

As mentioned earlier, one of the advantages of C++ exception handling is that it allows you to concentrate on the problem you’re actually trying to solve in one place, and then deal with the errors from that code in another place.

The try block

If you’re inside a function and you throw an exception (or a called function throws an exception), the function exits in the process of throwing. If you don’t want a throw to leave a function, you can set up a special block within the function where you try to solve your actual programming problem (and potentially generate exceptions). This block is called the try block because you try your various function calls there. The try block is an ordinary scope, preceded by the keyword try:

try {

  // Code that may generate exceptions

}

 

If you check for errors by carefully examining the return codes from the functions you use, you need to surround every function call with setup and test code, even if you call the same function several times. With exception handling, you put everything in a try block and handle exceptions after the try block. Thus, your code is a lot easier to write and easier to read because the goal of the code is not confused with the error checking.

Exception handlers

Of course, the thrown exception must end up some place. This place is the exception handler, and you need one exception handler for every exception type you want to catch. Exception handlers immediately follow the try block and are denoted by the keyword catch:

try {

  // Code that may generate exceptions

} catch(type1 id1) {

  // Handle exceptions of type1

} catch(type2 id2) {

  // Handle exceptions of type2

} catch(type3 id3) {

  // Etc...

} catch(typeN idN) {

  // Handle exceptions of typeN

}

// Normal execution resumes here...

 

The syntax of a catch clause resembles functions that take a single argument. The identifier (id1, id2, and so on) can be used inside the handler, just like a function argument, although you can omit the identifier if it’s not needed in the handler. The exception type usually gives you enough information to deal with it.

The handlers must appear directly after the try block. If an exception is thrown, the exception-handling mechanism goes hunting for the first handler with an argument that matches the type of the exception. It then enters that catch clause, and the exception is considered handled. (The search for handlers stops once the catch clause is found.) Only the matching catch clause executes; control then resumes after the last handler associated with that try block.

Notice that, within the try block, a number of different function calls might generate the same type of exception, but you need only one handler.

To illustrate using try and catch, the following variation of Nonlocal.cpp replaces the call to setjmp( ) with a try block and replaces the call to longjmp( ) with a throw statement.

//: C01:Nonlocal2.cpp

// Illustrates exceptions

#include <iostream>

using namespace std;

 

class Rainbow {

public:

  Rainbow() { cout << "Rainbow()" << endl; }

  ~Rainbow() { cout << "~Rainbow()" << endl; }

};

 

void oz() {

  Rainbow rb;

  for(int i = 0; i < 3; i++)

    cout << "there's no place like home\n";

  throw 47;

}

 

int main() {

  try {

    cout << "tornado, witch, munchkins...\n";

    oz();

  }

  catch (int) {

    cout << "Auntie Em! "

         << "I had the strangest dream..."

         << endl;

  }

} ///:~

 

When the throw statement in oz( ) executes, program control backtracks until it finds the catch clause that takes an int parameter, at which point execution resumes with the body of that catch clause. The most important difference between this program and Nonlocal.cpp is that the destructor for the object rb is called when the throw statement causes execution to leave the function oz( ).

There are two basic models in exception-handling theory: termination and resumption. In termination (which is what C++ supports), you assume the error is so critical that there’s no way to automatically resume execution at the point where the exception occurred. In other words, “whoever” threw the exception decided there was no way to salvage the situation, and they don’t want to come back.

The alternative error-handling model is called resumption, first introduced with the PL/I language in the 1960s.[2] Using resumption semantics means that the exception handler is expected to do something to rectify the situation, and then the faulting code is automatically retried, presuming success the second time. If you want resumption in C++, you must explicitly transfer execution back to the code where the error occurred, usually by repeating the function call that sent you there in the first place. It is not unusual, therefore, to place your try block inside a while loop that keeps reentering the try block until the result is satisfactory.

Historically, programmers using operating systems that supported resumptive exception handling eventually ended up using termination-like code and skipping resumption. Although resumption sounds attractive at first, it seems it isn’t quite so useful in practice. One reason may be the distance that can occur between the exception and its handler; it is one thing to terminate to a handler that’s far away, but to jump to that handler and then back again may be too conceptually difficult for large systems on which the exception can be generated from many points.

Exception matching

When an exception is thrown, the exception-handling system looks through the “nearest” handlers in the order they appear in the source code. When it finds a match, the exception is considered handled and no further searching occurs.

Matching an exception doesn’t require a perfect correlation between the exception and its handler. An object or reference to a derived-class object will match a handler for the base class. (However, if the handler is for an object rather than a reference, the exception object is “sliced,” that is, truncated to the base type, as it is passed to the handler; this does no damage but loses all the derived-type information.) For this reason, as well as to avoid making yet another copy of the exception object, it is always better to catch an exception by reference instead of by value.[3] If a pointer is thrown, the usual standard pointer conversions are used to match the exception. However, no automatic type conversions are used to convert from one exception type to another in the process of matching. For example:

//: C01:Autoexcp.cpp

// No matching conversions

#include <iostream>

using namespace std;

 

class Except1 {};

class Except2 {

public:

  Except2(const Except1&) {}

};

 

void f() { throw Except1(); }

 

int main() {

  try { f();

  } catch (Except2&) {

    cout << "inside catch(Except2)" << endl;

  } catch (Except1&) {

    cout << "inside catch(Except1)" << endl;

  }

} ///:~

 

Even though you might think the first handler could be used by converting an Except1 object into an Except2 using the constructor conversion, the system will not perform such a conversion during exception handling, and you’ll end up at the Except1 handler.

The following example shows how a base-class handler can catch a derived-class exception:

//: C01:Basexcpt.cpp

// Exception hierarchies

#include <iostream>

using namespace std;

 

class X {

public:

  class Trouble {};

  class Small : public Trouble {};

  class Big : public Trouble {};

  void f() { throw Big(); }

};

 

int main() {

  X x;

  try {

    x.f();

  } catch(X::Trouble&) {

    cout << "caught Trouble" << endl;

  // Hidden by previous handler:

  } catch(X::Small&) {

    cout << "caught Small Trouble" << endl;

  } catch(X::Big&) {

    cout << "caught Big Trouble" << endl;

  }

} ///:~

 

Here, the exception-handling mechanism will always match a Trouble object, or anything that is a Trouble (through public inheritance),[4] to the first handler. That means the second and third handlers are never called because the first one captures them all. It makes more sense to catch the derived types first and put the base type at the end to catch anything less specific.

Notice that these examples catch exceptions by reference, although for these classes it isn’t important because there are no additional members in the derived classes, and there are no argument identifiers in the handlers anyway. You’ll usually want to use reference arguments rather than value arguments in your handlers to avoid slicing off information.

Catching any exception

Sometimes you want to create a handler that catches any type of exception. You do this using the ellipsis in the argument list:

catch(...) {

  cout << "an exception was thrown" << endl;

}

 

An ellipsis catches any exception, so you’ll want to put it at the end of your list of handlers to avoid pre-empting any that follow it.

Because the ellipsis gives you no possibility to have an argument, you can’t know anything about the exception or its type. It’s a “catchall.” Such a catch clause is often used to clean up some resources and then re-throw the exception.

Re-throwing an exception

You usually want to re-throw an exception when you have some resource that needs to be released, such as a network connection or heap memory that needs to be deallocated. (See the section “Resource Management” later in this chapter for more detail.) If an exception occurs, you don’t necessarily care what error caused the exception; you just want to close the connection you opened previously. After that, you’ll want to let some other context closer to the user (that is, higher up in the call chain) handle the exception. In this case the ellipsis specification is just what you want. You want to catch any exception, clean up your resource, and then re-throw the exception so that it can be handled elsewhere. You re-throw an exception by using throw with no argument inside a handler:

catch(...) {

  cout << "an exception was thrown" << endl;

  // Deallocate your resource here, and then re-throw...

  throw;

}

 

Any further catch clauses for the same try block are still ignored; the throw causes the exception to go to the exception handlers in the next-higher context. In addition, everything about the exception object is preserved, so the handler at the higher context that catches the specific exception type can extract any information the object may contain.

Uncaught exceptions

As we explained in the beginning of this chapter, exception handling is considered better than the traditional return-an-error-code technique because exceptions can’t be ignored. If none of the exception handlers following a particular try block matches an exception, that exception moves to the next-higher context, that is, the function or try block surrounding the try block that did not catch the exception. (The location of this try block is not always obvious at first glance, since it’s higher up in the call chain.) This process continues until, at some level, a handler matches the exception. At that point, the exception is considered “caught,” and no further searching occurs.

The terminate( ) function

If no handler at any level catches the exception, the special library function terminate( ) (declared in the <exception> header) is automatically called. By default, terminate( ) calls the Standard C library function abort( ), which abruptly exits the program. On Unix systems, abort( ) also causes a core dump. When abort( ) is called, no calls to normal program termination functions occur, which means that destructors for global and static objects do not execute. The terminate( ) function also executes if a destructor for a local object throws an exception during stack unwinding (interrupting the exception that was in progress) or if a global or static object’s constructor or destructor throws an exception. In general, do not allow a destructor to throw an exception.

The set_terminate( ) function

You can install your own terminate( ) function using the standard set_terminate( ) function, which returns a pointer to the terminate( ) function you are replacing (which will be the default library version the first time you call it), so you can restore it later if you want. Your custom terminate( ) must take no arguments and have a void return value. In addition, any terminate( ) handler you install must not return or throw an exception, but instead must execute some sort of program-termination logic. If terminate( ) is called, the problem is unrecoverable.

The following example shows the use of set_terminate( ). Here, the return value is saved and restored so that the terminate( ) function can be used to help isolate the section of code in which the uncaught exception is occurring:

//: C01:Terminator.cpp

// Use of set_terminate()

// Also shows uncaught exceptions

#include <exception>

#include <iostream>

#include <cstdlib>

using namespace std;

 

void terminator() {

  cout << "I'll be back!" << endl;

  exit(0);

}

 

void (*old_terminate)()

  = set_terminate(terminator);

 

class Botch {

public:

  class Fruit {};

  void f() {

    cout << "Botch::f()" << endl;

    throw Fruit();

  }

  ~Botch() { throw 'c'; }

};

 

int main() {

  try {

    Botch b;

    b.f();

  } catch(...) {

    cout << "inside catch(...)" << endl;

  }

} ///:~

 

The definition of old_terminate looks a bit confusing at first: it not only creates a pointer to a function, but it initializes that pointer to the return value of set_terminate( ). You may be used to seeing a semicolon right after a pointer-to-function declaration, but a function pointer is just another kind of variable and can be initialized when it is defined.

The class Botch not only throws an exception inside f( ), but also in its destructor. As we explained earlier, this situation causes a call to terminate( ), as you can see in main( ). Even though the exception handler says catch(...), which would seem to catch everything and leave no cause for terminate( ) to be called, terminate( ) is called anyway. In the process of cleaning up the objects on the stack to handle one exception, the Botch destructor is called, and that generates a second exception, forcing a call to terminate( ). Thus, a destructor that throws an exception or causes one to be thrown is usually a sign of poor design or sloppy coding.

Cleaning up

Part of the magic of exception handling is that you can pop from normal program flow into the appropriate exception handler. Doing so wouldn’t be useful, however, if things weren’t cleaned up properly as the exception was thrown. C++ exception handling guarantees that as you leave a scope, all objects in that scope whose constructors have been completed will have destructors called.

Here’s an example that demonstrates that constructors that aren’t completed don’t have the associated destructors called. It also shows what happens when an exception is thrown in the middle of the creation of an array of objects:

//: C01:Cleanup.cpp

// Exceptions clean up complete objects only

#include <iostream>

using namespace std;

 

class Trace {

  static int counter;

  int objid;

public:

  Trace() {

    objid = counter++;

    cout << "constructing Trace #" << objid << endl;

    if(objid == 3) throw 3;

  }

  ~Trace() {

    cout << "destructing Trace #" << objid << endl;

  }

};

 

int Trace::counter = 0;

 

int main() {

  try {

    Trace n1;

    // Throws exception:

    Trace array[5];

    Trace n2;  // won't get here

  } catch(int i) {

    cout << "caught " << i << endl;

  }

} ///:~

 

The class Trace keeps track of objects so that you can trace program progress. It keeps a count of the number of objects created with a static data member counter and tracks the number of the particular object with objid.

The main program creates a single object, n1 (objid 0), and then attempts to create an array of five Trace objects, but an exception is thrown before the third array element is fully created. The object n2 is never created. You can see the results in the output of the program:

constructing Trace #0

constructing Trace #1

constructing Trace #2

constructing Trace #3

destructing Trace #2

destructing Trace #1

destructing Trace #0

caught 3

 

Three objects are successfully created (n1, array[0], and array[1]), but in the middle of the constructor for the fourth object, an exception is thrown. Because the fourth construction in main( ) (for array[2]) never completes, only the destructors for objects array[1] and array[0] are called. Finally, object n1 is destroyed, but not object n2, because it was never created.

Resource management

When writing code with exceptions, it’s particularly important that you always ask, “If an exception occurs, will my resources be properly cleaned up?” Most of the time you’re fairly safe, but in constructors there’s a particular problem: if an exception is thrown before a constructor is completed, the associated destructor will not be called for that object. Thus, you must be especially diligent while writing your constructor.

The general difficulty is allocating resources in constructors. If an exception occurs in the constructor, the destructor doesn’t get a chance to deallocate the resource. This problem occurs most often with “naked” pointers. For example:

//: C01:Rawp.cpp

// Naked pointers

#include <iostream>

using namespace std;

 

class Cat {

public:

  Cat() { cout << "Cat()" << endl; }

  ~Cat() { cout << "~Cat()" << endl; }

};

 

class Dog {

public:

  void* operator new(size_t sz) {

    cout << "allocating a Dog" << endl;

    throw 47;

  }

  void operator delete(void* p) {

    cout << "deallocating a Dog" << endl;

    ::operator delete(p);

  }

};

 

class UseResources {

  Cat* bp;

  Dog* op;

public:

  UseResources(int count = 1) {

    cout << "UseResources()" << endl;

    bp = new Cat[count];

    op = new Dog;

  }

  ~UseResources() {

    cout << "~UseResources()" << endl;

    delete [] bp; // Array delete

    delete op;

  }

};

 

int main() {

  try {

    UseResources ur(3);

  } catch(int) {

    cout << "inside handler" << endl;

  }

} ///:~

 

The output is the following:

UseResources()

Cat()

Cat()

Cat()

allocating a Dog

inside handler

 

The UseResources constructor is entered, and the Cat constructor is successfully completed for the three array objects. However, inside Dog::operator new( ), an exception is thrown (to simulate an out-of-memory error). Suddenly, you end up inside the handler, without the UseResources destructor being called. This is correct because the UseResources constructor was unable to finish, but it also means the Cat objects that were successfully created on the heap were never destroyed.

Making everything an object

To prevent such resource leaks, you must guard against these “raw” resource allocations in one of two ways:

·         You can catch exceptions inside the constructor and then release the resource.

·         You can place the allocations inside an object’s constructor, and you can place the deallocations inside an object’s destructor.

Using the latter approach, each allocation becomes atomic, by virtue of being part of the lifetime of a local object, and if it fails, the other resource allocation objects are properly cleaned up during stack unwinding. This technique is called Resource Acquisition Is Initialization (RAII for short), because it equates resource control with object lifetime. Using templates is an excellent way to modify the previous example to achieve this:

//: C01:Wrapped.cpp

// Safe, atomic pointers

#include <iostream>

using namespace std;

 

// Simplified. Yours may have other arguments.

template<class T, int sz = 1> class PWrap {

  T* ptr;

public:

  class RangeError {}; // Exception class

  PWrap() {

    ptr = new T[sz];

    cout << "PWrap constructor" << endl;

  }

  ~PWrap() {

    delete [] ptr;

    cout << "PWrap destructor" << endl;

  }

  T& operator[](int i) throw(RangeError) {

    if(i >= 0 && i < sz) return ptr[i];

    throw RangeError();

  }

};

 

class Cat {

public:

  Cat() { cout << "Cat()" << endl; }

  ~Cat() { cout << "~Cat()" << endl; }

  void g() {}

};

 

class Dog {

public:

  void* operator new[](size_t) {

    cout << "Allocating a Dog" << endl;

    throw 47;

  }

  void operator delete[](void* p) {

    cout << "Deallocating a Dog" << endl;

    ::operator delete[](p);

  }

};

 

class UseResources {

  PWrap<Cat, 3> cats;

  PWrap<Dog> dog;

public:

  UseResources() {

    cout << "UseResources()" << endl;

  }

  ~UseResources() {

    cout << "~UseResources()" << endl;

  }

  void f() { cats[1].g(); }

};

 

int main() {

  try {

    UseResources ur;

  } catch(int) {

    cout << "inside handler" << endl;

  } catch(...) {

    cout << "inside catch(...)" << endl;

  }

} ///:~

 

The difference is the use of the template to wrap the pointers and make them into objects. The constructors for these objects are called before the body of the UseResources constructor, and any of these constructors that complete before an exception is thrown will have their associated destructors called during stack unwinding.

The PWrap template shows a more typical use of exceptions than you’ve seen so far: A nested class called RangeError is created to use in operator[ ] if its argument is out of range. Because operator[ ] returns a reference, it cannot return zero. (There are no null references.) This is a true exceptional condition: you don’t know what to do in the current context, and you can’t return an improbable value. In this example, RangeError is simple and assumes all the necessary information is in the class name, but you might also want to add a member that contains the value of the index, if that is useful.
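As a rough illustration of that last suggestion, here is a minimal sketch of a RangeError that remembers the offending index. Deriving it from std::out_of_range, the index( ) accessor, the cut-down PWrap, and the probe( ) helper are all assumptions made for this sketch; they are not part of Wrapped.cpp.

```cpp
#include <stdexcept>

// Hypothetical variant of RangeError that records the bad index.
// Deriving from std::out_of_range is an assumption of this sketch.
class RangeError : public std::out_of_range {
  int idx;
public:
  explicit RangeError(int i)
    : std::out_of_range("index out of range"), idx(i) {}
  int index() const { return idx; }
};

// Cut-down PWrap: only the bounds-checked access is shown.
template<class T, int sz = 1> class PWrap {
  T* ptr;
  PWrap(const PWrap&);             // Copying disabled (not implemented)
  PWrap& operator=(const PWrap&);
public:
  PWrap() : ptr(new T[sz]()) {}
  ~PWrap() { delete [] ptr; }
  T& operator[](int i) {
    if(i >= 0 && i < sz) return ptr[i];
    throw RangeError(i);           // Carry the index to the handler
  }
};

// Returns the index reported by the exception,
// or -1 if no exception was thrown.
int probe(int i) {
  PWrap<int, 3> w;
  try {
    w[i] = 42;
  } catch(const RangeError& e) {
    return e.index();
  }
  return -1;
}
```

A handler can now report exactly which index was rejected instead of relying on the class name alone; calling probe(7) yields 7, while probe(1) completes without an exception.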

Now the output is:

Cat()

Cat()

Cat()

PWrap constructor

Allocating a Dog

~Cat()

~Cat()

~Cat()

PWrap destructor

inside handler

 

Again, the storage allocation for Dog throws an exception, but this time the array of Cat objects is properly cleaned up, so there is no memory leak.
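For comparison, the first option in the list above (catching the exception inside the constructor and releasing the resource by hand) might look something like the following sketch. The catsAlive counter and the tryToBuild( ) helper are hypothetical additions so the effect can be checked without reading output; the classes otherwise mirror Rawp.cpp.

```cpp
#include <cstddef>
#include <new>

int catsAlive = 0;  // Hypothetical counter: live Cat objects

class Cat {
public:
  Cat() { ++catsAlive; }
  ~Cat() { --catsAlive; }
};

class Dog {
public:
  // Simulates an out-of-memory error, as in Rawp.cpp
  void* operator new(std::size_t) { throw 47; }
  void operator delete(void* p) { ::operator delete(p); }
};

class UseResources {
  Cat* bp;
  Dog* op;
public:
  UseResources(int count = 1) : bp(0), op(0) {
    bp = new Cat[count];
    try {
      op = new Dog;
    } catch(...) {
      delete [] bp;  // Release what was already acquired
      throw;         // Let the caller see the original exception
    }
  }
  ~UseResources() {
    delete [] bp;
    delete op;
  }
};

// Returns true if construction failed with the simulated exception.
bool tryToBuild() {
  try {
    UseResources ur(3);
  } catch(int) {
    return true;
  }
  return false;
}
```

This works, but every additional raw resource needs another try block in the constructor, which is why wrapping each resource in its own object usually scales better.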

auto_ptr

Since dynamic memory is the most frequent resource used in a typical C++ program, the standard provides an RAII wrapper for pointers to heap memory that automatically frees the memory. The auto_ptr class template, defined in the <memory> header, has a constructor that takes a pointer to its generic type (whatever you use in your code). The auto_ptr class template also overloads the pointer operators * and -> to forward these operations to the original pointer the auto_ptr object is holding. You can, therefore, use the auto_ptr object as if it were a raw pointer. Here’s how it works:

//: C01:Auto_ptr.cpp

// Illustrates the RAII nature of auto_ptr

#include <memory>

#include <iostream>

using namespace std;

 

class TraceHeap {

  int i;

public:

  static void* operator new(size_t siz) {

    void* p = ::operator new(siz);

    cout << "Allocating TraceHeap object on the heap "

         << "at address " << p << endl;

    return p;

  }

  static void operator delete(void* p) {

    cout << "Deleting TraceHeap object at address "

         << p << endl;

    ::operator delete(p);

  }

  TraceHeap(int i) : i(i) {}

  int getVal() const {

    return i;

  }

};

 

int main() {

  auto_ptr<TraceHeap> pMyObject(new TraceHeap(5));

  cout << pMyObject->getVal() << endl;  // prints 5

} ///:~

 

The TraceHeap class overloads operator new and operator delete so you can see exactly what’s happening. Notice that, like any other class template, you specify the type you’re going to use in a template parameter. You don’t say TraceHeap*, however; auto_ptr already knows that it will be storing a pointer to your type. The second line of main( ) verifies that auto_ptr’s operator->( ) function applies the indirection to the original, underlying pointer. Most important, even though we didn’t explicitly delete the original pointer (in fact we can’t here, since we didn’t save its address in a variable anywhere), pMyObject’s destructor deletes the original pointer when pMyObject goes out of scope at the end of main( ), as the following output verifies:

Allocating TraceHeap object on the heap at address 8930040

5

Deleting TraceHeap object at address 8930040

 

The auto_ptr class template is also handy for pointer data members. Since class objects contained by value are always destructed, auto_ptr members always delete the raw pointer they wrap when the containing object is destructed.[5]
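Here is a small sketch of smart-pointer data members at work. Because auto_ptr was deprecated in C++11 and removed in C++17, this sketch substitutes std::unique_ptr, its standard replacement, so that it compiles with current compilers; the principle is the same. The Part and Assembly classes, the liveParts counter, and buildAndFail( ) are hypothetical names invented for the illustration.

```cpp
#include <memory>
#include <stdexcept>

int liveParts = 0;  // Hypothetical counter to observe cleanup

struct Part {
  Part() { ++liveParts; }
  ~Part() { --liveParts; }
};

class Assembly {
  // unique_ptr stands in for auto_ptr here (an assumption
  // of this sketch; see the paragraph above)
  std::unique_ptr<Part> first;
  std::unique_ptr<Part> second;
public:
  explicit Assembly(bool failLate)
    : first(new Part), second(new Part) {
    // Both members are fully constructed at this point, so
    // their destructors run if the constructor body throws.
    if(failLate)
      throw std::runtime_error("constructor body failed");
  }
};

// Returns true if the Assembly constructor threw.
bool buildAndFail() {
  try {
    Assembly a(true);
  } catch(const std::runtime_error&) {
    return true;
  }
  return false;
}
```

Note that Assembly needs no destructor and no try block of its own: the members release their Part objects whether the Assembly is destroyed normally or its constructor is abandoned partway through.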

Function-level try blocks

Since constructors can routinely throw exceptions, you might want to handle exceptions that occur when an object’s member or base subobjects are initialized. To do this, you can place the initialization of such subobjects in a function-level try block. In a departure from the usual syntax, the try block for constructor initializers is the constructor body, and the associated catch block follows the body of the constructor, as in the following example.

//: C01:InitExcept.cpp

// Handles exceptions from subobjects

//{-bor}

#include <iostream>

using namespace std;

 

class Base {

  int i;

public:

  class BaseExcept {};

  Base(int i) : i(i) {

    throw BaseExcept();

  }

};

 

class Derived : public Base {

public:

  class DerivedExcept {

    const char* msg;

  public:

    DerivedExcept(const char* msg) : msg(msg) {}

    const char* what() const {

      return msg;

    }

  };

  Derived(int j)

  try

    : Base(j) {

    // Constructor body

    cout << "This won't print\n";

  }

  catch (BaseExcept&) {

    throw DerivedExcept("Base subobject threw");

  }

};

 

int main() {

  try {

    Derived d(3);

  }

  catch (Derived::DerivedExcept& d) {

    cout << d.what() << endl;  // "Base subobject threw"

  }

} ///:~

 

Notice that the initializer list in the constructor for Derived goes after the try keyword but before the constructor body. If an exception does indeed occur, the contained object is not constructed, so it makes no sense to return to the code that created it. For this reason, the only sensible thing to do is to throw an exception in the function-level catch clause.

Although it is not terribly useful, C++ also allows function-level try blocks for any function, as the following example illustrates:

//: C01:FunctionTryBlock.cpp

// Function-level try blocks

//{-bor}

#include <iostream>

using namespace std;

 

int main() try {

  throw "main";

} catch(const char* msg) {

  cout << msg << endl;

  return 1;

} ///:~

 

In this case, the catch block can return in the same manner that the function body normally returns. Using this type of function-level try block isn’t much different from inserting a try-catch around the code inside of the function body.

Standard exceptions

The set of exceptions used with the Standard C++ library is also available for your use. Generally it’s easier and faster to start with a standard exception class than to try to define your own. If the standard class doesn’t do exactly what you need, you can derive from it.

All standard exception classes derive ultimately from the class exception, defined in the header <exception>. The two main derived classes are logic_error and runtime_error, which are found in <stdexcept> (which itself includes <exception>). The class logic_error represents errors in programming logic, such as passing an invalid argument. Runtime errors are those that occur as the result of unforeseen forces such as hardware failure or memory exhaustion. Both runtime_error and logic_error provide a constructor that takes a std::string argument so that you can store a message in the exception object and extract it later with exception::what( ), as the following program illustrates.

//: C01:StdExcept.cpp

// Derives an exception class from std::runtime_error

#include <stdexcept>

#include <iostream>

using namespace std;

 

class MyError : public runtime_error {

public:

  MyError(const string& msg = "") : runtime_error(msg) {}

};

 

int main() {

  try {

    throw MyError("my message");

  }

  catch (MyError& x) {

    cout << x.what() << endl;

  }

} ///:~

 

Although the runtime_error constructor passes the message up to its std::exception subobject to hold, std::exception does not provide a constructor that takes a std::string argument. Therefore, you usually want to derive your exception classes from either runtime_error or logic_error (or one of their derivatives), and not from std::exception.

The following tables describe the standard exception classes.

exception

The base class for all the exceptions thrown by the C++ standard library. You can ask what( ) and retrieve the optional string with which the exception was initialized.

logic_error

Derived from exception. Reports program logic errors, which could presumably be detected by inspection.

runtime_error

Derived from exception. Reports runtime errors, which can presumably be detected only when the program executes.

 

The iostream exception class ios::failure is also derived from exception, but it has no further subclasses.

You can use the classes in both of the following tables as they are, or you can use them as base classes from which to derive your own more specific types of exceptions.

Exception classes derived from logic_error

domain_error

Reports violations of a precondition.

invalid_argument

Indicates an invalid argument to the function from which it’s thrown.

length_error

Indicates an attempt to produce an object whose length is greater than or equal to npos (the largest representable value of type size_t).

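out_of_range

Reports an out-of-range argument.

bad_cast

Thrown for executing an invalid dynamic_cast expression in runtime type identification (see Chapter 8). Strictly speaking, bad_cast (declared in <typeinfo>) derives directly from exception, not from logic_error.

bad_typeid

Reports a null pointer p in an expression typeid(*p). (Again, a runtime type identification feature in Chapter 8.) Like bad_cast, it derives directly from exception.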

 

Exception classes derived from runtime_error

range_error

Reports violation of a postcondition.

overflow_error

Reports an arithmetic overflow.

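bad_alloc

Reports a failure to allocate storage. Strictly speaking, bad_alloc (declared in <new>) derives directly from exception, not from runtime_error.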
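Because these classes form a hierarchy, a handler for a base class also catches anything derived from it. The following sketch shows which handler picks up a few representative standard exceptions. The raise( ) and classify( ) functions and the numeric codes are hypothetical helpers made up for this illustration; vector::at( ), overflow_error, and bad_alloc are standard.

```cpp
#include <exception>
#include <new>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: throws a different standard
// exception depending on the code it is given.
void raise(int kind) {
  std::vector<int> v(3);
  switch(kind) {
    case 1: v.at(10) = 0; break;  // at() throws std::out_of_range
    case 2: throw std::overflow_error("too big");
    case 3: throw std::bad_alloc();
  }
}

// Reports which handler caught the exception.
// Handlers are ordered from most derived to least derived.
std::string classify(int kind) {
  try {
    raise(kind);
    return "no exception";
  } catch(const std::logic_error&) {    // Catches out_of_range
    return "logic_error";
  } catch(const std::runtime_error&) {  // Catches overflow_error
    return "runtime_error";
  } catch(const std::exception&) {      // Catches everything else
    return "exception";
  }
}
```

Here classify(1) lands in the logic_error handler and classify(2) in the runtime_error handler, while classify(3) falls through to the exception handler because bad_alloc is derived from neither of the other two.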

Exception specifications

You’re not required to inform the people using your function what exceptions you might throw. Failure to do so can be considered uncivilized, however, because it means that users cannot be sure what code to write to catch all potential exceptions. Of course, if they have your source code, they can hunt through and look for throw statements, but often a library doesn’t come with sources. Good documentation can help alleviate this problem, but how many software projects are well documented? C++ provides syntax that allows you to tell the user what exceptions a function throws, so the user can handle them. This is the optional exception specification, which adorns a function’s declaration, appearing after the argument list.

The exception specification reuses the keyword throw, followed by a parenthesized list of all the types of potential exceptions that the function can throw. Your function declaration might look like this:

void f() throw(toobig, toosmall, divzero);

 

As far as exceptions are concerned, the traditional function declaration

void f();

 

means that any type of exception can be thrown from the function. If you say

void f() throw();

 

no exceptions whatsoever will be thrown from the function (so you’d better be sure that no functions farther down in the call chain let any exceptions propagate up!).

For good coding policy, good documentation, and ease-of-use for the function caller, always consider using exception specifications when you write functions that throw exceptions. (Exceptions to this guideline are discussed later in this chapter.)

The unexpected( ) function

If your exception specification claims you’re going to throw a certain set of exceptions and then you throw something that isn’t in that set, what’s the penalty? The special function unexpected( ) is called when you throw something other than what appears in the exception specification. Should this unfortunate situation occur, the default implementation of unexpected( ) calls the terminate( ) function mentioned earlier in this chapter.

The set_unexpected( ) function

Like terminate( ), the unexpected( ) mechanism allows you to install your own function to respond to unexpected exceptions. You do so with a function called set_unexpected( ), which, like set_terminate( ), takes the address of a function with no arguments and void return value. Also, because it returns the previous value of the unexpected( ) pointer, you can save it and restore it later. To use set_unexpected( ), include the header file <exception>. Here’s an example that shows a simple use of the features discussed so far in this section:

//: C01:Unexpected.cpp

// Exception specifications & unexpected()

//{-msc} Doesn’t terminate properly

#include <exception>

#include <iostream>

#include <cstdlib>

using namespace std;

 

class Up {};

class Fit {};

void g();

 

void f(int i) throw (Up, Fit) {

  switch(i) {

    case 1: throw Up();

    case 2: throw Fit();

  }

  g();

}

 

// void g() {} // Version 1

void g() { throw 47; } // Version 2

 

void my_unexpected() {

  cout << "unexpected exception thrown" << endl;

  exit(0);

}

 

int main() {

  set_unexpected(my_unexpected);

  // (ignores return value)

  for(int i = 1; i <= 3; i++)

    try {

      f(i);

    } catch(Up) {

      cout << "Up caught" << endl;

    } catch(Fit) {

      cout << "Fit caught" << endl;

    }

} ///:~

 

The classes Up and Fit are created solely to throw as exceptions. Often exception classes will be small, but they can certainly hold additional information so that the handlers can query for it.

The f( ) function promises in its exception specification to throw only exceptions of type Up and Fit, and from looking at the function definition, this seems plausible. Version one of g( ), called by f( ), doesn’t throw any exceptions, so this is true. But if someone changes g( ) so that it throws a different type of exception (like the second version in this example, which throws an int), the exception specification for f( ) is violated.

The my_unexpected( ) function has no arguments or return value, following the proper form for a custom unexpected( ) function. It simply displays a message so that you can see that it was called, and then exits the program (exit(0) is used here so that the book’s make process is not aborted). Your new unexpected( ) function should not have a return statement.

In main( ), the try block is within a for loop, so all the possibilities are exercised. In this way, you can achieve something like resumption. Nest the try block inside a for, while, do, or if and have the exception handlers attempt to repair the problem; then attempt the try block again.
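The retry idea can be sketched as follows. The Flaky class, its repair( ) member, and useWithRetry( ) are hypothetical names invented for this illustration; the point is only the shape of the loop around the try block.

```cpp
#include <stdexcept>

// Hypothetical resource that fails until it has been
// repaired the required number of times.
class Flaky {
  int failuresLeft;
public:
  explicit Flaky(int failures) : failuresLeft(failures) {}
  void use() {
    if(failuresLeft > 0)
      throw std::runtime_error("not ready");
  }
  void repair() { --failuresLeft; }
};

// Retries up to maxTries times. Returns the number of
// attempts made, or -1 if every attempt failed.
int useWithRetry(Flaky& f, int maxTries) {
  for(int attempt = 1; attempt <= maxTries; ++attempt) {
    try {
      f.use();
      return attempt;   // Success: leave the loop
    } catch(const std::runtime_error&) {
      f.repair();       // Try to fix the problem, then loop
    }
  }
  return -1;
}
```

Bounding the loop with maxTries matters: without a limit, a problem that can’t be repaired would keep the program retrying forever.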

Only the Up and Fit exceptions are caught because those are the only exceptions that the programmer of f( ) said would be thrown. Version two of g( ) causes my_unexpected( ) to be called because f( ) then throws an int.

In the call to set_unexpected( ), the return value is ignored, but it can also be saved in a pointer to function and be restored later, as we did in the set_terminate( ) example earlier in this chapter.

A typical unexpected handler logs the error and terminates the program by calling exit( ). It can, however, throw another exception (or re-throw the same exception) or call abort( ). If it throws an exception of a type allowed by the function whose specification was originally violated, the search resumes at the call of the function with this exception specification. (This behavior is unique to unexpected( ).)

If the exception thrown from your unexpected handler is not allowed by the original function’s specification, one of the following occurs:

1.       If std::bad_exception (defined in <exception>) was in the function’s exception specification, the exception thrown from the unexpected handler is replaced with a std::bad_exception object, and the search resumes from the function as before.

2.      If the original function’s specification did not include std::bad_exception, terminate( ) is called.

The following program illustrates this behavior.

//: C01:BadException.cpp

//{-bor}

#include <exception>    // for std::bad_exception

#include <iostream>

#include <cstdlib>      // for exit()

using namespace std;

 

// Exception classes:

class A {};

class B {};

 

// terminate() handler

void my_thandler() {

  cout << "terminate called\n";

  exit(0);

}

 

// unexpected() handlers

void my_uhandler1() {

  throw A();

}

void my_uhandler2() {

  throw;

}

 

// If we embed this throw statement in f or g,

// the compiler detects the violation and reports

// an error, so we put it in its own function.

void t() {

  throw B();

}

 

void f() throw(A) {

  t();

}

void g() throw(A, bad_exception) {

  t();

}

 

int main() {

  set_terminate(my_thandler);

  set_unexpected(my_uhandler1);

  try {

    f();

  }

  catch (A&) {

    cout << "caught an A from f\n";

  }

  set_unexpected(my_uhandler2);

  try {

    g();

  }

  catch (bad_exception&) {

    cout << "caught a bad_exception from g\n";

  }

  try {

    f();

  }

  catch (...) {

    cout << "This will never print\n";

  }

} ///:~

 

The my_uhandler1( ) handler throws an acceptable exception (A), so execution resumes at the first catch, which succeeds. The my_uhandler2( ) handler rethrows the original exception (B), which g does not allow, but since g specifies bad_exception, the B exception is replaced by a bad_exception object, and the second catch also succeeds. Since f does not include bad_exception in its specification, the final call to f ends in terminate( ), which invokes my_thandler( ). Thus, the output from this program is as follows:

caught an A from f

caught a bad_exception from g

terminate called

 

Better exception specifications?

You may feel that the existing exception specification rules aren’t very safe, and that

void f();

 

should mean that no exceptions are thrown from this function. If the programmer wants to throw any type of exception, you might think he or she should have to say

void f() throw(...); // Not in C++

 

This would surely be an improvement because function declarations would be more explicit. Unfortunately, you can’t always know by looking at the code in a function whether an exception will be thrown; it could happen because of a memory allocation, for example. Worse, existing functions written before exception handling was introduced may find themselves inadvertently throwing exceptions because of the functions they call (which might be linked into new, exception-throwing versions). Hence, the uninformative situation whereby

void f();

 

means, “Maybe I’ll throw an exception, maybe I won’t.” This ambiguity is necessary to avoid hindering code evolution. If you want to specify that f throws no exceptions, use the empty list, as in:

void f() throw();

 

Exception specifications and inheritance

Each public function in a class essentially forms a contract with the user; if you pass it certain arguments, it will perform certain operations and/or return a result. The same contract must hold true in derived classes; otherwise the expected “is-a” relationship between derived and base classes is violated. Since exception specifications are logically part of a function’s declaration, they too must remain consistent across an inheritance hierarchy. For example, if a member function in a base class says it will only throw an exception of type A, an override of that function in a derived class must not add any other exception types to the specification list, because that would result in unexpected exceptions for the user, breaking any programs that adhere to the base class interface. You can, however, specify fewer exceptions or none at all, since that doesn’t require the user to do anything differently. You can also specify anything that “is-a” A in place of A in the derived function’s specification. Here’s an example.

//: C01:Covariance.cpp

// Compile Only!

//{-msc}

#include <iostream>

using namespace std;

 

class Base {

public:

  class BaseException {};

  class DerivedException : public BaseException {};

  virtual void f() throw (DerivedException) {

    throw DerivedException();

  }

  virtual void g() throw (BaseException) {

    throw BaseException();

  }

};

 

class Derived : public Base {

public:

  void f() throw (BaseException) {

    throw BaseException();

  }

  virtual void g() throw (DerivedException) {

    throw DerivedException();

  }

}; ///:~

 

A compiler should flag the override of Derived::f( ) with an error (or at least a warning) since it changes its exception specification in a way that violates the specification of Base::f( ). The specification for Derived::g( ) is acceptable because DerivedException “is-a” BaseException (not the other way around). You can think of Base/Derived and BaseException/DerivedException as parallel class hierarchies; when you are in Derived, you can replace references to BaseException in exception specifications and return values with DerivedException. This behavior is called covariance (since both sets of classes vary down their respective hierarchies together). (Reminder from Volume 1: parameter types are not covariant, so you are not allowed to change the signature of an overridden virtual function.)

When not to use exception specifications

If you peruse the function declarations throughout the Standard C++ library, you’ll find that exception specifications are almost entirely absent; the few that do occur are mostly empty throw( ) specifications on non-template functions such as exception::what( ). Although this might seem strange, there is a good reason for this seeming incongruity: the library consists mainly of templates, and you never know what a generic type or function might do. For example, suppose you are developing a generic stack template and attempt to affix an exception specification to your pop function, like this:

T pop() throw(logic_error);

 

Since the only error you anticipate is a stack underflow, you might think it’s safe to specify a logic_error or some other appropriate exception type. But since you don’t know much about the type T, what if its copy constructor could possibly throw an exception (it’s not unreasonable, after all)? Then unexpected( ) would be called, and your program would terminate. The point is that you shouldn’t make guarantees that you can’t stand behind. If you don’t know what exceptions might occur, don’t use exception specifications. That’s why template classes, which make up most of the Standard C++ library, do not use exception specifications: they specify the exceptions they know about in documentation and leave the rest to you. Exception specifications are mainly for non-template classes.

Exception safety

In Chapter 7 we’ll take an in-depth look at the containers in the Standard C++ library, including the stack container. One thing you’ll notice is that the declaration of the pop( ) member function looks like this:

void pop();

 

You might think it strange that pop( ) doesn’t return a value. Instead, it just removes the element at the top of the stack. To retrieve the top value, call top( ) before you call pop( ). There is an important reason for this behavior, and it has to do with exception safety, a crucial consideration in library design.

Suppose you are implementing a stack with a dynamic array (we’ll call the array data and the element counter count), and you try to write pop( ) so that it returns a value. The code for such a pop( ) might look something like this:

template<class T>

T stack<T>::pop() {

  if (count == 0)

    throw logic_error("stack underflow");

  else

    return data[--count];

}

 

What happens if the copy constructor that is called for the return value in the last line throws an exception when the value is returned? The popped element is not returned because of the exception, and yet count has already been decremented, so the top element you wanted is lost forever! The problem is that this function attempts to do two things at once: (1) return a value, and (2) change the state of the stack. It is better to separate these two actions into two separate member functions, which is exactly what the standard stack class does. (In other words, follow the time-worn design practice of cohesion: every function should do one thing well.) Exception-safe code leaves objects in a consistent state and does not leak resources.
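To make the benefit of the split interface concrete, here is a minimal sketch of a stack with separate top( ) and pop( ) members, along with an element type whose copy constructor can be told to throw. The Stack template (built on std::vector for brevity), the Fussy class, and sizeAfterFailedCopy( ) are hypothetical names for this illustration, not the standard stack adapter.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical element type whose copies can be made to fail.
struct Fussy {
  static bool failCopies;
  int value;
  explicit Fussy(int v) : value(v) {}
  Fussy(const Fussy& other) : value(other.value) {
    if(failCopies) throw std::runtime_error("copy failed");
  }
};
bool Fussy::failCopies = false;

// Minimal stack with the observer and the mutator separated.
template<class T> class Stack {
  std::vector<T> data;
public:
  void push(const T& t) { data.push_back(t); }
  const T& top() const {   // Observes only; no copy is made here
    if(data.empty()) throw std::logic_error("stack underflow");
    return data.back();
  }
  void pop() {             // Mutates only; nothing is returned
    if(data.empty()) throw std::logic_error("stack underflow");
    data.pop_back();
  }
  std::size_t size() const { return data.size(); }
};

// Tries to copy the top element while copies are failing,
// and returns the stack size afterward.
std::size_t sizeAfterFailedCopy() {
  Stack<Fussy> s;
  s.push(Fussy(1));
  s.push(Fussy(2));
  Fussy::failCopies = true;
  try {
    Fussy copy = s.top();  // The copy throws...
    s.pop();               // ...so pop() is never reached
  } catch(const std::runtime_error&) {}
  Fussy::failCopies = false;
  return s.size();
}
```

Because the copy happens in the caller before pop( ) is invoked, a failed copy leaves both elements on the stack, and the caller is free to try again.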

You also need to be careful writing custom assignment operators. In Chapter 12 of Volume 1, you saw that operator= should adhere to the following pattern:

1.       Make sure you’re not assigning to self. If you are, go to step 6. (This is strictly an optimization.)

2.      Allocate new memory required by pointer data members.

3.      Copy data from the old memory to the new.

4.      Delete the old memory.

5.      Update the object’s state by assigning the new heap pointers to the pointer data members.

6.      Return *this.

It’s important to not change the state of your object until all the new pieces have been safely allocated and initialized. A good technique is to move all of steps 2 and 3 into a separate function, often called clone( ). The following example does this for a class that has two pointer members, theString and theInts.

//: C01:SafeAssign.cpp

// Shows an Exception-safe operator=

#include <iostream>

#include <new>       // For std::bad_alloc

#include <cstring>

using namespace std;

 

// A class that has two pointer members using the heap

class HasPointers {

  // A Handle class to hold the data

  struct MyData {

    const char* theString;

    const int* theInts;

    size_t numInts;

    MyData(const char* pString, const int* pInts,

           size_t nInts)

    : theString(pString), theInts(pInts),

    numInts(nInts) {}

  } *theData;  // The handle

 

  // clone and cleanup functions

  static MyData* clone(const char* otherString,

                       const int* otherInts, size_t nInts){

    char* newChars = new char[strlen(otherString)+1];

    int* newInts;

    try {

      newInts = new int[nInts];

    } catch (bad_alloc&) {

      delete [] newChars;

      throw;

    }

    try {

      // This example uses built-in types, so it won't

      // throw, but for class types it could throw, so we

      // use a try block for illustration. (This is the

      // point of the example!)

      strcpy(newChars, otherString);

      for (size_t i = 0; i < nInts; ++i)

        newInts[i] = otherInts[i];

      // Allocate the handle inside the try block so the

      // arrays are released if this allocation fails

      return new MyData(newChars, newInts, nInts);

    } catch (...) {

      delete [] newInts;

      delete [] newChars;

      throw;

    }

  }

  static MyData* clone(const MyData* otherData) {

    return clone(otherData->theString,

                 otherData->theInts,

                 otherData->numInts);

  }

  static void cleanup(const MyData* theData) {

    delete [] theData->theString;

    delete [] theData->theInts;

    delete theData;

  }

public:

  HasPointers(const char* someString, const int* someInts,

              size_t numInts) {

    theData = clone(someString, someInts, numInts);

  }

  HasPointers(const HasPointers& source) {

    theData = clone(source.theData);

  }

  HasPointers& operator=(const HasPointers& rhs) {

    if (this != &rhs) {

      MyData* newData =

      clone(rhs.theData->theString,

            rhs.theData->theInts,

            rhs.theData->numInts);

      cleanup(theData);

      theData = newData;

    }

    return *this;

  }

  ~HasPointers() {

    cleanup(theData);

  }

  friend ostream& operator<<(ostream& os,

                             const HasPointers& obj) {

    os << obj.theData->theString << ": ";

    for (size_t i = 0; i < obj.theData->numInts; ++i)

      os << obj.theData->theInts[i] << ' ';

    return os;

  }

};

 

int main() {

  int someNums[] = {1, 2, 3, 4};

  size_t someCount = sizeof someNums / sizeof someNums[0];

  int someMoreNums[] = {5, 6, 7};

  size_t someMoreCount =

  sizeof someMoreNums / sizeof someMoreNums[0];

  HasPointers h1("Hello", someNums, someCount);

  HasPointers h2("Goodbye", someMoreNums, someMoreCount);

  cout << h1 << endl;  // Hello: 1 2 3 4

  h1 = h2;

  cout << h1 << endl;  // Goodbye: 5 6 7

} ///:~

 

For convenience, HasPointers uses the MyData class as a handle to the two pointers. Whenever it’s time to allocate more memory, whether during construction or assignment, the first clone function is ultimately called to do the job. If allocation fails on the first call to the new operator, a bad_alloc exception is thrown automatically. If it happens on the second allocation (for theInts), we have to clean up the memory for theString, hence the first try block that catches a bad_alloc exception. The second try block isn’t crucial for the copying itself, because we’re just copying chars and ints (so the copy cannot throw), but whenever you copy objects, their assignment operators can possibly cause an exception, in which case everything needs to be cleaned up. In both exception handlers, notice that we rethrow the exception. That’s because we’re just managing resources here; the user still needs to know that something went wrong, so we let the exception propagate up the dynamic chain. Software libraries that don’t silently swallow exceptions are called exception neutral. Always strive to write libraries that are both exception safe and exception neutral.[6]

If you inspect the previous code closely, you’ll notice that none of the delete operations will throw an exception. This code actually depends on that fact. Recall that when you call delete on an object, the object’s destructor is called. It turns out to be practically impossible, therefore, to design exception-safe code without assuming that destructors don’t throw exceptions. Don’t let destructors throw exceptions! (We’re going to remind you about this once more before this chapter is done.)[7]

Programming with exceptions

For most programmers, especially C programmers, exceptions are not available in their existing language and take a bit of adjustment. Here are some guidelines for programming with exceptions.

When to avoid exceptions

Exceptions aren’t the answer to all problems. In fact, if you simply go looking for something to pound with your new hammer, you’ll cause trouble. The following sections point out situations in which exceptions are not warranted. Probably the best advice for deciding when to use exceptions is to throw exceptions only when a function fails to meet its specification.

Not for asynchronous events

The Standard C signal( ) system and any similar system handle asynchronous events: events that happen outside the flow of a program, and thus events the program cannot anticipate. You cannot use C++ exceptions to handle asynchronous events because the exception and its handler are on the same call stack. That is, exceptions rely on the dynamic chain of function calls on the program’s runtime stack (dynamic scope, if you will), whereas asynchronous events must be handled by completely separate code that is not part of the normal program flow (typically, interrupt service routines or event loops). Don’t throw exceptions from interrupt handlers.

This is not to say that asynchronous events cannot be associated with exceptions. But the interrupt handler should do its job as quickly as possible and then return. The typical way to handle this situation is to set a flag in the interrupt handler, and check it synchronously in the mainline code.
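The flag technique just described might be sketched as follows. This is only an illustration under our own assumptions: the names interruptSeen, onInterrupt, checkInterrupt, and doWork are hypothetical, and std::raise( ) stands in for a real asynchronous event so that the example is repeatable.

```cpp
#include <csignal>
#include <stdexcept>

// Hypothetical flag: the only thing the handler touches.
volatile std::sig_atomic_t interruptSeen = 0;

// The handler records the event and returns at once; it never throws.
extern "C" void onInterrupt(int) { interruptSeen = 1; }

// Called from mainline code, where throwing is safe.
void checkInterrupt() {
  if (interruptSeen) {
    interruptSeen = 0;
    throw std::runtime_error("interrupted");
  }
}

// Performs up to maxSteps units of work and returns how many completed.
// raiseAt simulates an event arriving just before that step.
int doWork(int maxSteps, int raiseAt) {
  std::signal(SIGINT, onInterrupt);
  int done = 0;
  try {
    for (int i = 0; i < maxSteps; ++i) {
      if (i == raiseAt)
        std::raise(SIGINT);  // Stands in for a real interrupt
      checkInterrupt();      // Synchronous check of the flag
      ++done;
    }
  } catch (const std::runtime_error&) {
    // Ordinary exception handling takes over from here
  }
  std::signal(SIGINT, SIG_DFL);
  return done;
}
```

The exception is thrown by checkInterrupt( ) in the normal flow of the program, never by the handler itself, so the thrower and the catcher share the same call stack.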

Not for benign error conditions

If you have enough information to handle an error, it’s not an exception. Take care of it in the current context rather than throwing an exception to a larger context.

Also, C++ exceptions are not thrown for machine-level events such as divide-by-zero.[8] It’s assumed that some other mechanism, such as the operating system or hardware, deals with these events. In this way, C++ exceptions can be reasonably efficient, and their use is isolated to program-level exceptional conditions.

Not for flow-of-control

An exception looks somewhat like an alternate return mechanism and somewhat like a switch statement, so you might be tempted to use an exception instead of these ordinary language mechanisms. This is a bad idea, partly because the exception-handling system is significantly less efficient than normal program execution; exceptions are a rare event, so the normal program shouldn’t pay for them. Also, exceptions from anything other than error conditions are quite confusing to the user of your class or function.

You’re not forced to use exceptions

Some programs are quite simple (small utilities, for example). You might only need to take input and perform some processing. In these programs, you might attempt to allocate memory and fail, try to open a file and fail, and so on. It is acceptable in these programs to display a message and exit the program, allowing the system to clean up the mess, rather than to work hard to catch all exceptions and recover all the resources yourself. Basically, if you don’t need to use exceptions, you don’t have to use them.

New exceptions, old code

Another situation that arises is the modification of an existing program that doesn’t use exceptions. You might introduce a library that does use exceptions and wonder if you need to modify all your code throughout the program. Assuming you have an acceptable error-handling scheme already in place, the most straightforward thing to do is surround the largest block that uses the new library (this might be all the code in main( )) with a try block, followed by a catch(...) and a basic error message. You can refine this to whatever degree necessary by adding more specific handlers, but, in any case, the code you’re forced to add can be minimal. It’s even better, of course, to isolate your exception-generating code in a try block and write handlers to convert the exceptions into your existing error-handling scheme.
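As a rough sketch of converting exceptions into an existing error-handling scheme, suppose the old code reports problems through integer status codes. Everything named here is hypothetical: parsePositive( ) stands in for a function from the newly adopted library, and LegacyStatus stands in for whatever scheme your program already uses.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for a function from a newly adopted library that throws.
int parsePositive(const std::string& text) {
  int value = std::stoi(text);  // Throws invalid_argument or out_of_range
  if (value <= 0)
    throw std::domain_error("not positive");
  return value;
}

// Stand-in for the program's existing error-reporting scheme.
enum LegacyStatus { LEGACY_OK = 0, LEGACY_BAD_INPUT = 1,
                    LEGACY_FAILURE = 2 };

// Isolates the exception-generating call and translates its
// exceptions into status codes the old code already understands.
LegacyStatus parsePositiveLegacy(const std::string& text, int& result) {
  try {
    result = parsePositive(text);
    return LEGACY_OK;
  } catch (const std::logic_error&) {
    return LEGACY_BAD_INPUT;   // All three exceptions above land here
  } catch (...) {
    return LEGACY_FAILURE;     // Anything unforeseen
  }
}
```

Only the wrapper knows about exceptions; the rest of the program keeps testing return codes exactly as before.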

It’s truly important to think about exceptions when you’re creating a library for someone else to use, especially in situations in which you can’t know how they need to respond to critical error conditions (recall the earlier discussions on exception safety and why there are no exception specifications in the Standard C++ Library).

Typical uses of exceptions

Do use exceptions to do the following:

·         Fix the problem and call the function which caused the exception again.

·         Patch things up and continue without retrying the function.

·         Do whatever you can in the current context and rethrow the same exception to a higher context.

·         Do whatever you can in the current context and throw a different exception to a higher context.

·         Terminate the program.

·         Wrap functions (especially C library functions) that use ordinary error schemes so they produce exceptions instead.

·         Simplify. If your exception scheme makes things more complicated, it is painful and annoying to use.

·         Make your library and program safer. This is a short-term investment (for debugging) and a long-term investment (for application robustness).
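To illustrate the item about wrapping C library functions, here is one way you might wrap the Standard C strtol( ) function so that it reports failure with exceptions rather than through errno and an end pointer. The name toLong( ) and the choice of exception types are our assumptions, not something the library prescribes.

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>

// Hypothetical wrapper: converts text to a long, or throws.
long toLong(const char* text) {
  errno = 0;
  char* end = 0;
  long value = std::strtol(text, &end, 10);
  // No digits consumed, or trailing characters left over
  if (end == text || *end != '\0')
    throw std::invalid_argument("not a number");
  // strtol signals overflow by setting errno to ERANGE
  if (errno == ERANGE)
    throw std::out_of_range("number out of range");
  return value;
}
```

Callers no longer need to remember to clear and inspect errno; a failed conversion cannot be ignored by accident.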

When to use exception specifications

The exception specification is like a function prototype: it tells the user to write exception-handling code and what exceptions to handle. It tells the compiler the exceptions that might come out of this function so that it can detect violations at runtime.

Of course, you can’t always look at the code and anticipate which exceptions will arise from a particular function. Sometimes, the functions it calls produce an unexpected exception, and sometimes an old function that didn’t throw an exception is replaced with a new one that does, and you get a call to unexpected( ). Any time you use exception specifications or call functions that do, consider creating your own unexpected( ) function that logs a message and then either throws an exception or aborts the program.

As we explained earlier, you should avoid using exception specifications in template classes, since you can’t anticipate what types of exceptions the template parameter classes might throw.

Start with standard exceptions

Check out the Standard C++ library exceptions before creating your own. If a standard exception does what you need, chances are it’s a lot easier for your user to understand and handle.

If the exception type you want isn’t part of the standard library, try to derive one from an existing standard exception. It’s nice if your users can always write their code to expect the what( ) function defined in the exception class interface.

Nest your own exceptions

If you create exceptions for your particular class, it’s a good idea to nest the exception classes either inside your class or inside a namespace containing your class, to provide a clear message to the reader that this exception is used only for your class. In addition, it prevents the pollution of the global namespace.

You can nest your exceptions even if you’re deriving them from C++ standard exceptions.
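A minimal sketch of a nested exception that is also derived from a standard exception might look like this. The Account class and its InsufficientFunds exception are invented purely for illustration.

```cpp
#include <stdexcept>
#include <string>

class Account {  // Hypothetical class
  double balance;
public:
  // Nested, so its full name is Account::InsufficientFunds;
  // derived from runtime_error, so what( ) works as users expect.
  class InsufficientFunds : public std::runtime_error {
  public:
    explicit InsufficientFunds(const std::string& msg)
    : std::runtime_error(msg) {}
  };
  explicit Account(double opening) : balance(opening) {}
  void withdraw(double amount) {
    if (amount > balance)
      throw InsufficientFunds("withdrawal exceeds balance");
    balance -= amount;
  }
  double getBalance() const { return balance; }
};
```

A client can catch Account::InsufficientFunds specifically, or simply catch std::exception and call what( ).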

Use exception hierarchies

Using exception hierarchies is a valuable way to classify the types of critical errors that might be encountered with your class or library. This gives helpful information to users, assists them in organizing their code, and gives them the option of ignoring all the specific types of exceptions and just catching the base-class type. Also, any exceptions added later by inheriting from the same base class will not force all existing code to be rewritten: the base-class handler will catch the new exception.

Of course, the Standard C++ exceptions are a good example of an exception hierarchy and one on which you can build.

Multiple inheritance (MI)

As you’ll read in Chapter 9, the only essential place for MI is if you need to upcast an object pointer to two different base classes, that is, if you need polymorphic behavior with both of those base classes. It turns out that exception hierarchies are useful places for multiple inheritance because a base-class handler from any of the roots of the multiply inherited exception class can handle the exception.
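The following sketch shows a multiply inherited exception being caught through either of its roots. NetworkError and TimeoutError are hypothetical names, and the two functions exist only to report which handler was selected.

```cpp
#include <stdexcept>

// A second, application-specific root (hypothetical)
struct NetworkError {
  virtual ~NetworkError() {}
};

// Belongs to both hierarchies
struct TimeoutError : std::runtime_error, NetworkError {
  TimeoutError() : std::runtime_error("timed out") {}
};

// Returns 1 if the handler for the NetworkError root was chosen
int catchAsNetworkError() {
  try {
    throw TimeoutError();
  } catch (const NetworkError&) {
    return 1;
  } catch (...) {
    return 0;
  }
}

// Returns 2 if the handler for the standard root was chosen
int catchAsStdException() {
  try {
    throw TimeoutError();
  } catch (const std::exception&) {
    return 2;
  } catch (...) {
    return 0;
  }
}
```

Code written against either hierarchy handles TimeoutError without knowing that the other hierarchy exists.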

Catch by reference, not by value

We explained in the section “Exception matching” earlier that you should catch exceptions by reference for two reasons:

·         To avoid making a needless copy of the exception object when it is passed to the handler.

·         To avoid object slicing when catching a derived exception as a base class object.

Although you can also throw and catch pointers, by doing so you introduce more coupling: the thrower and the catcher must agree on how the exception object is allocated and cleaned up. This is a problem because the exception itself might have occurred from heap exhaustion. If you throw exception objects, the exception-handling system takes care of all storage.
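The slicing problem can be seen in a small experiment such as the following, where Trouble and BigTrouble are hypothetical exception classes. Note that some compilers warn about the catch-by-value handler, which is exactly the practice being discouraged.

```cpp
#include <string>

struct Trouble {
  virtual ~Trouble() {}
  virtual std::string what() const { return "Trouble"; }
};

struct BigTrouble : Trouble {
  std::string what() const { return "BigTrouble"; }
};

std::string caughtByValue() {
  std::string seen;
  try {
    throw BigTrouble();
  } catch (Trouble t) {         // Copies, slicing off the derived part
    seen = t.what();
  }
  return seen;
}

std::string caughtByReference() {
  std::string seen;
  try {
    throw BigTrouble();
  } catch (const Trouble& t) {  // Refers to the original object
    seen = t.what();
  }
  return seen;
}
```

Here caughtByValue( ) yields "Trouble" because the handler’s parameter is a fresh base-class object, whereas caughtByReference( ) yields "BigTrouble".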

Throw exceptions in constructors

Because a constructor has no return value, you’ve previously had two ways to report an error during construction:

·         Set a nonlocal flag and hope the user checks it.

·         Return an incompletely created object and hope the user checks it.

This problem is serious because C programmers have come to rely on an implied guarantee that object creation is always successful, which is not unreasonable in C, in which types are so primitive. But continuing execution after construction fails in a C++ program is a guaranteed disaster, so constructors are one of the most important places to throw exceptions: now you have a safe, effective way to handle constructor errors. However, you must also pay attention to pointers inside objects and the way cleanup occurs when an exception is thrown inside a constructor.

Don’t cause exceptions in destructors

Because destructors are called in the process of throwing other exceptions, you’ll never want to throw an exception in a destructor or cause another exception to be thrown by some action you perform in the destructor. If this happens, a new exception can be thrown before the catch-clause for an existing exception is reached, which will cause a call to terminate( ).

If you call any functions inside a destructor that can throw exceptions, those calls should be within a try block in the destructor, and the destructor must handle all exceptions itself. None must escape from the destructor.
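A destructor that follows this advice might be sketched like this. The Connection class is hypothetical, and its close( ) function always throws so that the handler is certain to run; the counter exists only so that the effect can be observed.

```cpp
#include <stdexcept>

class Connection {  // Hypothetical resource wrapper
public:
  static int swallowed;  // Counts exceptions handled in the destructor
  void close() {
    // Stand-in for a cleanup operation that can fail
    throw std::runtime_error("close failed");
  }
  ~Connection() {
    try {
      close();
    } catch (...) {
      // Handle (or at least record) the failure here;
      // nothing is allowed to propagate out of the destructor
      ++swallowed;
    }
  }
};

int Connection::swallowed = 0;

void useConnection() {
  Connection c;
}  // c is destroyed here without any exception escaping
```

In a real class the handler would probably log the problem; the essential point is the catch(...) that keeps every exception inside the destructor.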

Avoid naked pointers

See Wrapped.cpp earlier in this chapter. A naked pointer usually means vulnerability in the constructor if resources are allocated for that pointer. A pointer doesn’t have a destructor, so those resources aren’t released if an exception is thrown in the constructor. Use auto_ptr for pointers that reference heap memory.
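The sketch below shows the idea with a smart-pointer member. Note one substitution that is ours rather than the text’s: it uses std::unique_ptr, the replacement for auto_ptr in C++11 and later (auto_ptr was removed from the library in C++17), so it assumes a compiler of that vintage. Resource and Holder are hypothetical names.

```cpp
#include <memory>
#include <stdexcept>

struct Resource {    // Hypothetical heap-allocated resource
  static int live;   // Number of Resource objects currently alive
  Resource() { ++live; }
  ~Resource() { --live; }
};

int Resource::live = 0;

class Holder {
  // Each member owns its object and has a destructor of its own
  std::unique_ptr<Resource> first;
  std::unique_ptr<Resource> second;
public:
  explicit Holder(bool fail)
  : first(new Resource), second(new Resource) {
    if (fail)
      throw std::runtime_error("constructor failed");
  }
};
```

When the constructor throws, the already-constructed members are destroyed automatically, so both Resource objects are released even though Holder’s own destructor never runs.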

Overhead

When an exception is thrown, there’s considerable runtime overhead (but it’s good overhead, since objects are cleaned up automatically!). For this reason, you never want to use exceptions as part of your normal flow-of-control, no matter how tempting and clever it may seem. Exceptions should occur only rarely, so the overhead is piled on the exception and not on the normally executing code. One of the important design goals for exception handling was that it could be implemented with no impact on execution speed when it wasn’t used; that is, as long as you don’t throw an exception, your code runs as fast as it would without exception handling. Whether this is actually true depends on the particular compiler implementation you’re using. (See the description of the “zero-cost model” later in this section.)

You can think of a throw expression as a call to a special system function that takes the exception object as an argument and backtracks up the chain of execution. For this to work, extra information needs to be put on the stack by the compiler, to aid in stack unwinding. To understand this, you need to know about the runtime stack. Whenever a function is called, information about that function is pushed onto the runtime stack in an activation record instance (ARI), also called a stack frame. A typical stack frame contains the return address in the calling function (so execution can return to it), a pointer to the ARI of the function’s static parent (the scope that lexically contains the called function, so variables global to the function can be accessed), and a pointer to the ARI of the function that called it (its dynamic parent). The path that logically results from repetitively following the dynamic parent links is the dynamic chain, or call chain, that we’ve mentioned previously in this chapter. This is how execution can backtrack when an exception is thrown, and it is the mechanism that makes it possible for components developed without knowledge of one another to communicate errors at runtime.

To enable stack unwinding for exception handling, extra exception-related information about each function needs to be available for each stack frame. This information describes which destructors need to be called (so that local objects can be cleaned up), indicates whether the current function has a try block, and lists which exceptions the associated catch clauses can handle. Naturally there is a space penalty for this extra information, so programs that support exception handling can be somewhat larger than those that don’t.[9] Even the compile-time size of programs using exception handling is greater, since the logic of how to generate the expanded stack frames during runtime must be generated by the compiler.

To illustrate this, we compiled the following program both with and without exception-handling support in Borland C++ Builder and Microsoft Visual C++.[10]

struct HasDestructor {

  ~HasDestructor(){}

};

 

void g();      // for all we know, g may throw

 

void f() {

   HasDestructor h;

   g();

}

 

If exception handling is enabled, the compiler must keep information about ~HasDestructor( ) available at runtime in the ARI for f( ) (so it can destroy h properly should g( ) throw an exception). The following table summarizes the result of the compilations in terms of the size of the compiled (.obj) files (in bytes).

Compiler\Mode    With Exception Support    Without Exception Support

Borland          616                       234

Microsoft        1162                      680

 

Don’t take the percentage differences between the two modes too seriously. Remember that exceptions (should) typically constitute a small part of a program, so the space overhead tends to be much smaller (usually between 5 and 15 percent).

You might think that this extra housekeeping would slow down execution, and you’d be correct. A clever compiler implementation can avoid that cost, however. Since information about exception-handling code and the offsets of local objects can be computed once at compile time, such information can be kept in a single place associated with each function, but not in each ARI. You essentially remove exception overhead from each ARI and, therefore, avoid the extra time to push them onto the stack. This approach is called the zero-cost model[11] of exception handling, and the optimized storage mentioned earlier is known as the shadow stack.[12]

Summary

Error recovery is a fundamental concern for every program you write, and it’s especially important in C++, in which one of the goals is to create program components for others to use. To create a robust system, each component must be robust.

The goals for exception handling in C++ are to simplify the creation of large, reliable programs using less code than currently possible, with more confidence that your application doesn’t have an unhandled error. This is accomplished with little or no performance penalty and with low impact on existing code.

Basic exceptions are not terribly difficult to learn; begin using them in your programs as soon as you can. Exceptions are one of those features that provide immediate and significant benefits to your project.

Exercises

1.    Create a class with member functions that throw exceptions. Within this class, make a nested class to use as an exception object. It takes a single char* as its argument; this represents a description string. Create a member function that throws this exception. (State this in the function’s exception specification.) Write a try block that calls this function and a catch clause that handles the exception by displaying its description string.

2.    Rewrite the Stash class from Chapter 13 of Volume 1 so that it throws out_of_range exceptions for operator[].

3.    Write a generic main( ) that takes all exceptions and reports them as errors.

4.    Create a class with its own operator new. This operator should allocate ten objects, and on the eleventh object “run out of memory” and throw an exception. Also add a static member function that reclaims this memory. Now create a main( ) with a try block and a catch clause that calls the memory-restoration routine. Put these inside a while loop, to demonstrate recovering from an exception and continuing execution.

5.    Create a destructor that throws an exception, and write code to prove to yourself that this is a bad idea by showing that if a new exception is thrown before the handler for the existing one is reached, terminate( ) is called.

6.    Prove to yourself that all exception objects (the ones that are thrown) are properly destroyed.

7.    Prove to yourself that if you create an exception object on the heap and throw the pointer to that object, it will not be cleaned up.

8.    Write a function with an exception specification that can throw four exception types: a char, an int, a bool, and your own exception class. Catch each in main( ) and verify the catch. Derive your exception class from a standard exception. Write the function in such a way that the system recovers and tries to execute it again.

9.    Modify your solution to exercise 8 to throw a double from the function, violating the exception specification. Catch the violation with your own unexpected handler that displays a message and exits the program gracefully (meaning abort( ) is not called).

10.    Write a Garage class that has a Car that is having troubles with its Motor. Use a function-level try block in the Garage class constructor to catch an exception (thrown from the Motor class) when its Car object is initialized. Throw a different exception from the body of the Garage constructor’s handler and catch it in main( ).


2: Defensive programming

Writing “perfect software” may be an elusive Holy Grail for developers, but a few defensive techniques, routinely applied, can go a long way toward narrowing the gap between code and ideal.

Although the complexity of typical production software guarantees that testers will always have a job, chances are you still yearn to produce defect-free software. (At least we hope you do!) Object-oriented design techniques do much to corral the difficulty of large projects, to be sure. Eventually, however, you have to get down to writing loops and functions. These details of “programming in the small” become the building blocks of the implementation of larger components called for by your design efforts. If your loops are off by one or your functions calculate the correct values only “most” of the time, you’re in deep trouble no matter how fancy your overall methodology. In this chapter, we’re interested in coding practices that keep you on track toward a working solution regardless of the size of your project.

Your code is, among other things, an expression of your attempt to solve a problem. It should be clear to the reader (including yourself) exactly what you were thinking when you designed that loop. At certain points in your program, you should be able to make bold statements that some condition or other holds. (If you can’t, you really haven’t yet solved the problem.) Such statements are called invariants, since they should invariably be true at the point where they appear in the code; if not, either your design is faulty, or your code does not accurately reflect your design. (In other words, you’ve got bugs!)

To illustrate, consider how to write a program that plays the guessing game of Hi-lo. You play this game by having one person think of a number between 1 and 100, and having the other person guess the number. (We’ll let the computer do the guessing.) The person who holds the number tells the guesser whether their guess is high, low or correct. The best strategy for the guesser is of course binary search, which chooses the midpoint of the range of numbers where the sought-after number resides. The high-low response tells the guesser which half of the list holds the number, and the process repeats, halving the size of the active search range on each iteration. So how do you write a loop to drive the repetition properly? It’s not sufficient to just say

bool guessed = false;

while (!guessed) {

  …

}

 

because a malicious user might respond deceitfully, and you could spend all day guessing. What assumption, however simple, are you making each time you guess? In other words, what condition should hold by design on each loop iteration?

The simple assumption we’re after is, of course, that the secret number is within the current active range of unguessed numbers, beginning with the range [1, 100]. Suppose we label the endpoints of the range with the variables low and high. Each time you pass through the loop you need to make sure that if the number was in the range [low, high] at the beginning of the loop, you calculate the new range so that it still contains the number at the end of the current loop iteration.

The goal is to express the loop invariant in code so that a violation can be detected at runtime. Unfortunately, since the computer doesn’t know the secret number, you can’t express this condition directly in code, but you can at least make a comment to that effect:

while (!guessed) {

  // INVARIANT: the number is in the range [low, high]

  …

}

 

If we were to stop this thread of discussion right here, we would have accomplished a great deal if it helps clarify how you design loops. Fortunately, we can do better than that. What happens when the user says that a guess is too high or too low when in fact it is not? The deception will in effect exclude the secret number from the new subrange. Because one lie always leads to another, eventually your range will diminish to nothing (since you shrink it by half each time and the secret number isn’t in there). We can easily express this condition concretely, as the following program illustrates.

//: C02:HiLo.cpp

// Plays the game of Hi-lo to illustrate a loop invariant

#include <cctype>    // For toupper

#include <cstdlib>

#include <iostream>

#include <string>

using namespace std;

 

int main() {

  cout << "Think of a number between 1 and 100\n";

  cout << "I will make a guess; ";

  cout << "tell me if I'm (H)igh or (L)ow\n";

  int low = 1, high = 100;

  bool guessed = false;

  while (!guessed) {

    // Invariant: the number is in the range [low, high]

    if (low > high) {  // Invariant violation

      cout << "You cheated! I quit\n";

      return EXIT_FAILURE;

    }

    int guess = (low + high) / 2;

    cout << "My guess is " << guess << ". ";

    cout << "(H)igh, (L)ow, or (E)qual? ";

    string response;

    cin >> response;

    switch(toupper(response[0])) {

      case 'H':

        high = guess - 1;

        break;

      case 'L':

        low = guess + 1;

        break;

      case 'E':

        guessed = true;

        break;

      default:

        cout << "Invalid response\n";

        continue;

    }

  }

  cout << "I got it!\n";

  return EXIT_SUCCESS;

} ///:~

 

The violation of the invariant is easily detected with the condition if (low > high), because if the user always tells the truth, we will always find the secret number before we run out of numbers to guess from. (See the last paragraph of the text that follows the program extractCode.cpp at the end of Chapter 3 for an explanation of the macros EXIT_FAILURE and EXIT_SUCCESS.)
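If you would like to exercise the same loop without typing responses by hand, one possibility is to replace the human player with a function object, as in the sketch below. The names playHiLo, Honest, and AlwaysHigh are our own inventions, and reporting a violation with logic_error instead of quitting is likewise just one choice among several.

```cpp
#include <stdexcept>

// Plays Hi-lo against answer(guess), which returns a negative value
// if the guess is low, a positive value if it is high, and zero if
// it is correct. Returns the number of guesses used.
template<class Oracle>
int playHiLo(Oracle answer) {
  int low = 1, high = 100, guesses = 0;
  for (;;) {
    // Invariant: the number is in the range [low, high]
    if (low > high)  // Invariant violation
      throw std::logic_error("inconsistent answers");
    int guess = low + (high - low) / 2;
    ++guesses;
    int response = answer(guess);
    if (response == 0)
      return guesses;
    if (response > 0)
      high = guess - 1;  // Guess was too high
    else
      low = guess + 1;   // Guess was too low
  }
}

// A truthful player holding a secret number
struct Honest {
  int secret;
  int operator()(int guess) const {
    return guess < secret ? -1 : (guess > secret ? 1 : 0);
  }
};

// A deceitful player who claims every guess is too high
struct AlwaysHigh {
  int operator()(int) const { return 1; }
};
```

With an Honest player the loop always ends with a correct guess; with AlwaysHigh the range eventually becomes empty and the violation is reported.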

Assertions

The condition in the Hi-lo program depends on user input, so you’re powerless to always prevent a violation of the invariant. Most often, however, invariants depend only on the code you write, so they will always hold, if you’ve implemented your design correctly. In this case, it is clearer to make an assertion, which is a positive statement that reveals your design decisions.

For example, suppose you are implementing a vector of integers, which, as you know, is an expandable array that grows on demand. The function that adds an element to the vector must first verify that there is an open slot in the underlying array that holds the elements; otherwise, it needs to request more heap space and copy the existing elements to the new space before adding the new element (and of course deleting the old array). Such a function might look like the following:

void MyVector::push_back(int x) {

   if (nextSlot == capacity)

      grow();

   assert(nextSlot < capacity);

   data[nextSlot++] = x;

}

 

In this example, data is a dynamic array of ints with capacity slots and nextSlot slots in use. The purpose of grow( ) is to expand the size of data so that the new value of capacity is strictly greater than nextSlot. Proper behavior of MyVector depends on this design decision, and it will never fail if the rest of the supporting code is correct, so we assert the condition with the assert( ) macro (defined in the header <cassert>).
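To see the assertion in context, here is one way the rest of such a class might be filled in. Only push_back( ) comes from the discussion above; the growth policy in grow( ), the size( ) and at( ) members, and the decision to disable copying are assumptions made to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

class MyVector {
  int* data;
  std::size_t capacity;  // Number of slots allocated
  std::size_t nextSlot;  // Number of slots in use
  // Assumed policy: start with 4 slots, then double each time
  void grow() {
    std::size_t newCapacity = capacity == 0 ? 4 : capacity * 2;
    int* newData = new int[newCapacity];
    for (std::size_t i = 0; i < nextSlot; ++i)
      newData[i] = data[i];
    delete [] data;
    data = newData;
    capacity = newCapacity;
  }
  MyVector(const MyVector&);             // Copying not supported
  MyVector& operator=(const MyVector&);  // in this sketch
public:
  MyVector() : data(0), capacity(0), nextSlot(0) {}
  ~MyVector() { delete [] data; }
  void push_back(int x) {
    if (nextSlot == capacity)
      grow();
    assert(nextSlot < capacity);  // Holds if grow( ) is correct
    data[nextSlot++] = x;
  }
  std::size_t size() const { return nextSlot; }
  int at(std::size_t i) const {
    if (i >= nextSlot)  // Depends on the caller, so not an assertion
      throw std::out_of_range("MyVector::at");
    return data[i];
  }
};
```

If someone later broke grow( ), for instance by forgetting to update capacity, the assertion in push_back( ) would fail on the next insertion instead of silently writing past the end of the array.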

The Standard C library assert( ) macro is brief, to the point, and portable. If the condition in its parameter evaluates to non-zero, execution continues uninterrupted; if it doesn’t, a message containing the text of the offending expression along with its source file name and line number is printed to the standard error channel and the program aborts. Is that too drastic? In practice, it is much more drastic to let execution continue when a basic design assumption has failed. Your program needs to be fixed.

If all goes well, you will have thoroughly tested your code with all assertions intact by the time the final product is deployed. (We’ll say more about testing later.) Depending on the nature of your application, the machine cycles needed to test all assertions at runtime might be too much of a performance hit in the field. If that’s the case, you can remove all the assertion code automatically by defining the macro NDEBUG and rebuilding the application.

To see how this works, note that a typical implementation of assert( ) looks something like this:

#ifdef NDEBUG

  #define assert(cond) ((void)0)

#else

  void assertImpl(const char*, const char*, long);

  #define assert(cond) \
    ((cond) ? (void)0 : assertImpl(???))

#endif

 

When the macro NDEBUG is defined, the code decays to the expression (void) 0, so all that’s left in the compilation stream is an essentially empty statement as a result of the semicolon you appended to each assert( ) invocation. If NDEBUG is not defined, assert(cond) expands to a conditional expression that, when cond is zero, calls a compiler-dependent function (which we named assertImpl( )) with a string argument representing the text of cond, along with the file name and line number where the assertion appeared. (We used “???” as a place holder in the example, but the string mentioned is actually computed there, along with the file name and the line number where the macro occurs in that file. How these values are obtained is immaterial to our discussion.) If you want to turn assertions on and off at different points in your program, you not only have to #define or #undef NDEBUG, but you have to re-include <cassert>. Macros are evaluated as the preprocessor encounters them and therefore use whatever NDEBUG state applies at that point in time. The most common way to define NDEBUG once for an entire program is as a compiler option, whether through project settings in your visual environment or via the command line, as in

mycc -DNDEBUG myfile.cpp

 

Most compilers use the -D flag to define macro names. (Substitute the name of your compiler’s executable for mycc above.) The advantage of this approach is that you can leave your assertions in the source code as an invaluable bit of documentation, and yet there is no runtime penalty. Because the code in an assertion disappears when NDEBUG is defined, it is important that you never do work in an assertion. Only test conditions that do not change the state of your program.
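The following sketch shows what “never do work in an assertion” means in practice. The functions removeFirst( ) and dropOne( ) are hypothetical; the faulty form appears only in a comment so that the code behaves the same whether or not NDEBUG is defined.

```cpp
#include <cassert>
#include <vector>

// Removes the first element; returns false if there was nothing to remove
bool removeFirst(std::vector<int>& v) {
  if (v.empty())
    return false;
  v.erase(v.begin());
  return true;
}

void dropOne(std::vector<int>& v) {
  // WRONG: assert(removeFirst(v));
  // With NDEBUG defined, the call itself would vanish and nothing
  // would be removed.
  bool removed = removeFirst(v);  // Do the work unconditionally
  assert(removed);                // Then test the result
  (void)removed;  // Avoids an unused-variable warning under NDEBUG
}
```

Separating the action from the test costs one extra line and guarantees that debug and release builds do the same work.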

Whether using NDEBUG for released code is a good idea remains a subject of debate. Tony Hoare, one of the most influential computer scientists of all time,[13] has suggested that turning off runtime checks such as assertions is similar to a sailing enthusiast who wears a life jacket while training on land and then discards it when he actually goes to sea.[14] If an assertion fails in production, you have a problem much worse than degradation in performance, so choose wisely.

Not all conditions should be enforced by assertions, of course. User errors and runtime resource failures should be signaled by throwing exceptions, as we explained in detail in Chapter 1. It is tempting to use assertions for most error conditions while roughing out code, with the intent to replace many of them later with robust exception handling. As with any temptation, be cautious, since you might forget to make all the necessary changes later. Remember: assertions are intended to verify design decisions that will only fail because of faulty programmer logic. The ideal is to solve all assertion violations during development. Don’t use assertions for conditions that aren’t totally in your control (for example, conditions that depend on user input). In particular, you wouldn’t want to use assertions to validate function arguments supplied by clients; throw a logic_error instead.

The use of assertions as a tool to ensure program correctness was formalized by Bertrand Meyer in his Design by Contract methodology.[15] Every function has an implicit contract with clients that, given certain pre-conditions, guarantees certain post-conditions. In other words, the pre-conditions are the requirements for using the function, such as supplying arguments within certain ranges, and the post-conditions are the results delivered by the function, either by return value or by side-effect.

What should you do when clients fail to give you valid input? They have broken the contract, and you need to let them know. As we mentioned earlier, this is not the best time to abort the program (although you’re justified in doing so since the contract was violated), but an exception is certainly in order. This is why the Standard C++ library throws exceptions derived from logic_error, such as out_of_range.[16] If there are functions that only you call, however, such as private functions in a class of your own design, the assert( ) macro is appropriate, since you have total control over the situation and you certainly want to debug your code before shipping.

Since post-conditions are totally your responsibility, you might think assertions also apply, and you would be partially right. It is appropriate to use an assertion for any invariant at any time, including when a function has finished its work. This especially applies to class member functions that maintain the state of an object. In the MyVector example earlier, for instance, a reasonable invariant for all public member functions would be

assert(0 <= nextSlot && nextSlot <= capacity);

 

or, if nextSlot is an unsigned integer, simply

assert(nextSlot <= capacity);

 

Such an invariant is called a class invariant and can reasonably be enforced by an assertion. Subclasses play the role of subcontractor to their base classes in that they must maintain the original contract the base class has with its clients. For this reason, the pre-conditions in derived classes must impose no extra requirements beyond those in the base contract, and the post-conditions must deliver at least as much.[17]
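The division of labor described above (exceptions for broken pre-conditions, assertions for the class invariant) might look like the following sketch. The SmallStack class is hypothetical and only loosely modeled on the MyVector idea; its name, capacity, and member functions are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-capacity stack used only to illustrate the idea.
class SmallStack {
  static const std::size_t capacity = 4;
  int data[capacity];
  std::size_t nextSlot;
  // Class invariant: only faulty programmer logic can break this,
  // so an assertion is the appropriate check.
  void checkInvariant() const { assert(nextSlot <= capacity); }
public:
  SmallStack() : nextSlot(0) {}
  void push(int x) {
    // Pre-condition is the client's responsibility: throw on violation.
    if (nextSlot == capacity)
      throw std::out_of_range("SmallStack::push: stack full");
    data[nextSlot++] = x;
    checkInvariant();
  }
  int pop() {
    if (nextSlot == 0)
      throw std::out_of_range("SmallStack::pop: stack empty");
    int x = data[--nextSlot];
    checkInvariant();
    return x;
  }
  std::size_t size() const { return nextSlot; }
};
```

A client that pops from an empty SmallStack receives an out_of_range exception it can handle, while a corrupted nextSlot, which could only result from a bug in the class itself, stops the program during development.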

Validating results returned to the client, however, is nothing more or less than testing, so using post-condition assertions in this case would be duplicating work. There’s nothing wrong with it; it’s just an exercise in redundancy. Yes, it’s good documentation, but more than one developer has been fooled into using post-condition assertions as a substitute for unit testing. Bad idea!

A simple unit test framework

Writing software is all about meeting requirements.[18] It doesn’t take much experience, however, to figure out that coming up with requirements in the first place is no easy task, and, more important, requirements are not static. It’s not unheard of to discover at a weekly project meeting that what you just spent the week doing is not exactly what the users really want.

Frustrating? Yes. Reasonable? Also, yes! It is unreasonable to expect mere humans to be able to articulate software requirements in detail without sampling an evolving, working system. It's much better to specify a little, design a little, code a little, test a little. Then, after evaluating the outcome, do it all over again. The ability to develop from soup to nuts in such an iterative fashion is one of the great advances of this object-oriented era in software history. It requires nimble programmers who can craft resilient code. Change is hard.

Ironically, another impetus for change comes from you, the programmer. The craftsperson in you likely has the habit of continually improving the physical design of working code. What maintenance programmer hasn’t had occasion to curse the aging, flagship company product as a convoluted patchwork of spaghetti, wholly resistant to modification? Management’s knee-jerk reluctance to let you tamper with a functioning system, while not totally unfounded, robs code of the resilience it needs to endure. "If it’s not broken, don’t fix it" eventually gives way to, "We can’t fix it. Rewrite it." Change is necessary.

Fortunately, our industry has finally gotten used to the discipline of refactoring, the art of internally restructuring code to improve its design, without changing the functionality visible to the user.[19] Such tweaks include extracting a new function from another, or its inverse, combining member functions; replacing a member function with an object; parameterizing a member function or class; and replacing conditionals with polymorphism. Refactoring helps code embrace evolution.

Whether the force for change comes from users or programmers, however, there is still the risk that changes today will break what worked yesterday. What is needed is a way to build code that withstands the winds of change and actually improves over time.

Many practices purport to support such a quick-on-your-feet motif, of which Extreme Programming is only one.[20] In this section we explore what we think is the key to making flexible, incremental development succeed: a ridiculously easy-to-use automated unit test framework. (Please note that we in no way mean to de-emphasize the role of testers, software professionals who test others’ code for a living. They are indispensable. We are merely describing a way to help developers write better code.)

Developers write unit tests to gain the confidence to say the two most important things that any developer can say:

1.               I understand the requirements.

2.               My code meets those requirements to the best of my knowledge.

There is no better way to ensure that you know what the code you're about to write should do than to write the unit tests first. This simple exercise helps focus the mind on the task ahead and will likely lead to working code faster than just jumping into coding. Or, to express it in XP terms: Testing + Programming is faster than just Programming. Writing tests first also puts you on guard up front against boundary conditions that might cause your code to break, so your code is more robust right out of the chute.

Once your code passes all your tests, you have the peace of mind that if the system you contribute to isn't working, it's not your fault. The statement "All my tests pass" is a powerful trump card in the workplace that cuts through any amount of politics and hand waving.

Automated testing

So what does a unit test look like? Too often developers just use some well-behaved input to produce some expected output, which they inspect visually. Two dangers exist in this approach. First, programs don't always receive only well-behaved input. We all know that we should test the boundaries of program input, but it's hard to think about this when you're trying to just get things working. If you write the test for a function before you start coding, you can wear your “tester hat” and ask yourself, "What could possibly make this break?" Code a test that will prove the function you'll write isn't broken, and then put on your developer hat and make it happen. You'll write better code than if you hadn't written the test first.

The second danger is that inspecting output visually is tedious and error prone. Almost anything of this kind that a human can do, a computer can do too, but without human error. It's better to formulate tests as collections of Boolean expressions and have a test program report any failures.

For example, suppose you need to build a Date class that has the following properties:

·         A date can be initialized with a string (YYYYMMDD), three integers (Y, M, D), or nothing (giving today's date).

·         A date object can yield its year, month, and day or a string of the form "YYYYMMDD".

·         All relational comparisons are available, as well as computing the duration between two dates (in years, months, and days).

·         Dates to be compared need to be able to span an arbitrary number of centuries (for example, 1600–2200).

Your class can store three integers representing the year, month, and day. (Just be sure the year is at least 16 bits in size to satisfy the last bulleted item.) The interface for your Date class might look like this:

// A first pass at Date.h

#ifndef DATE_H

#define DATE_H

#include <string>

 

class Date {

public:

  // A struct to hold elapsed time:

  struct Duration {

    int years;

    int months;

    int days;

    Duration(int y, int m, int d)

      : years(y), months(m), days(d) {}

  };

  Date();

  Date(int year, int month, int day);

  Date(const std::string&);

  int getYear() const;

  int getMonth() const;

  int getDay() const;

  std::string toString() const;

  friend bool operator<(const Date&, const Date&);

  friend bool operator>(const Date&, const Date&);

  friend bool operator<=(const Date&, const Date&);

  friend bool operator>=(const Date&, const Date&);

  friend bool operator==(const Date&, const Date&);

  friend bool operator!=(const Date&, const Date&);

  friend Duration duration(const Date&, const Date&);

};

#endif

 

Before you even think about implementation, you can solidify your grasp of the requirements for this class by writing the beginnings of a test program. You might come up with something like the following:

//: C02:SimpleDateTest.cpp

//{L} Date

// You’ll need the full Date.h from the Appendix:

#include "Date.h"

#include <iostream>

using namespace std;

 

// Test machinery

int nPass = 0, nFail = 0;

void test(bool t) {

  if(t) nPass++; else nFail++;

}

 

int main() {

  Date mybday(1951, 10, 1);

  test(mybday.getYear() == 1951);

  test(mybday.getMonth() == 10);

  test(mybday.getDay() == 1);

  cout << "Passed: " << nPass << ", Failed: "

       << nFail << endl;

}

/* Expected output:

Passed: 3, Failed: 0

*/ ///:~

 

In this trivial case, the function test( ) maintains the global variables nPass and nFail. The only visual inspection you do is to read the final score. A more sophisticated test( ) would also display an appropriate message whenever a test fails. The framework described later in this chapter has such a test function, among other things.
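As a rough idea of what a slightly more informative test( ) could look like, here is a minimal sketch. The extra label and stream parameters are assumptions made for this illustration and are not the interface of the framework presented later.

```cpp
#include <iostream>
#include <sstream>

int nPass = 0, nFail = 0;

// Hypothetical variant of test() that reports which check failed.
// The caller supplies a label describing the condition.
void test(bool t, const char* label, std::ostream& os = std::cout) {
  if (t)
    ++nPass;
  else {
    ++nFail;
    os << "FAILED: " << label << '\n';
  }
}
```

A call such as test(d.getDay() == 1, "d.getDay() == 1") stays silent on success and names the offending condition on failure. Typing the condition twice is tedious, which is one motivation for generating the label with a macro.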

You can now implement enough of the Date class to get these tests to pass, and then you can proceed iteratively in like fashion until all the requirements are met. By writing tests first, you are more likely to think of corner cases that might break your upcoming implementation, and you’re more likely to write the code correctly the first time. Such an exercise might produce the following “final” version of a test for the Date class:

//: C02:SimpleDateTest2.cpp

//{L} Date

#include <iostream>

#include "Date.h"

using namespace std;

 

// Test machinery

int nPass = 0, nFail = 0;

void test(bool t) {

  if(t) nPass++; else nFail++;

}

 

int main() {

  Date mybday(1951, 10, 1);

  Date today;

  Date myevebday("19510930");

 

  // Test the operators

  test(mybday < today);

  test(mybday <= today);

  test(mybday != today);

  test(mybday == mybday);

  test(mybday >= mybday);

  test(mybday <= mybday);

  test(myevebday < mybday);

  test(mybday > myevebday);

  test(mybday >= myevebday);

  test(mybday != myevebday);

 

  // Test the functions

  test(mybday.getYear() == 1951);

  test(mybday.getMonth() == 10);

  test(mybday.getDay() == 1);

  test(myevebday.getYear() == 1951);

  test(myevebday.getMonth() == 9);

  test(myevebday.getDay() == 30);

  test(mybday.toString() == "19511001");

  test(myevebday.toString() == "19510930");

 

  // Test duration

  Date d2(2003, 7, 4);

  Date::Duration dur = duration(mybday, d2);

  test(dur.years == 51);

  test(dur.months == 9);

  test(dur.days == 3);

 

  // Report results:

  cout << "Passed: " << nPass << ", Failed: "

       << nFail << endl;

} ///:~

 

The word “final” above was quoted because this test can of course be more fully developed. For example, we haven’t tested that long durations are handled correctly. To save space on the printed page we’ll stop here, but you get the idea. The full implementation for the Date class is available in the files Date.h and Date.cpp in the appendix and on the MindView website.[21]

The TestSuite Framework

Some automated C++ unit test tools are available on the World Wide Web for download, such as CppUnit.[22] These are well designed and implemented, but our purpose here is to present a test mechanism that is not only easy to use, but also easy to understand internally and even tweak if necessary. So, in the spirit of “TheSimplestThingThatCouldPossiblyWork,” we have developed the TestSuite Framework, a namespace named TestSuite that contains two key classes: Test and Suite.

The Test class is an abstract class you derive from to define a test object. It keeps track of the number of passes and failures for you and displays the text of any test condition that fails. Your main task in defining a test is simply to override the run( ) member function, which should in turn call the test_( ) macro for each Boolean test condition you define.

To define a test for the Date class using the framework, you can inherit from Test as shown in the following program:

//: C02:DateTest.h

#ifndef DATE_TEST_H

#define DATE_TEST_H

#include "Date.h"

#include "../TestSuite/Test.h"

 

class DateTest : public TestSuite::Test {

  Date mybday;

  Date today;

  Date myevebday;

public:

  DateTest() : mybday(1951, 10, 1), myevebday("19510930") {

  }

  void run() {

    testOps();

    testFunctions();

    testDuration();

  }

  void testOps() {

    test_(mybday < today);

    test_(mybday <= today);

    test_(mybday != today);

    test_(mybday == mybday);

    test_(mybday >= mybday);

    test_(mybday <= mybday);

    test_(myevebday < mybday);

    test_(mybday > myevebday);

    test_(mybday >= myevebday);

    test_(mybday != myevebday);

  }

  void testFunctions() {

    test_(mybday.getYear() == 1951);

    test_(mybday.getMonth() == 10);

    test_(mybday.getDay() == 1);

    test_(myevebday.getYear() == 1951);

    test_(myevebday.getMonth() == 9);

    test_(myevebday.getDay() == 30);

    test_(mybday.toString() == "19511001");

    test_(myevebday.toString() == "19510930");

  }

  void testDuration() {

    Date d2(2003, 7, 4);

    Date::Duration dur = duration(mybday, d2);

    test_(dur.years == 51);

    test_(dur.months == 9);

    test_(dur.days == 3);

  }

};

#endif ///:~

 

Running the test is a simple matter of instantiating a DateTest object and calling its run( ) member function.

//: C02:DateTest.cpp

// Automated Testing (with a Framework)

//{L} Date ../TestSuite/Test

#include <iostream>

#include "DateTest.h"

using namespace std;

 

int main() {

  DateTest test;

  test.run();

  return test.report();

}

/* Output:

Test "DateTest":

        Passed: 21      Failed: 0

*/ ///:~

 

The Test::report( ) function displays the previous output and returns the number of failures, so it is suitable to use as a return value from main( ).

The Test class uses RTTI[23] to get the name of your class (for example, DateTest) for the report. There is also a setStream( ) member function if you want the test results sent to a file instead of to the standard output (the default). You’ll see the Test class implementation later in this chapter.

The test_( ) macro can extract the text of the Boolean condition that fails, along with its file name and line number.[24] To see what happens when a failure occurs, you can introduce an intentional error in the code, say by reversing the condition in the first call to test_( ) in DateTest::testOps( ) in the previous example code. The output indicates exactly what test was in error and where it happened:

DateTest failure: (mybday > today) , DateTest.h (line 31)

Test "DateTest":

        Passed: 20      Failed: 1

 

In addition to test_( ), the framework includes succeed_( ) and fail_( ), for cases in which a Boolean test won't do. These apply when the class you’re testing might throw exceptions. During testing, you want to arrange an input set that will cause the exception to occur to make sure it’s doing its job. If the exception is not thrown, it’s an error, in which case you call fail_( ) explicitly to display a message and update the failure count. If the exception is thrown as expected, you call succeed_( ) to update the success count.

To illustrate, suppose we update the specification of the two non-default Date constructors to throw a DateError exception (a type nested inside Date and derived from std::logic_error) if the input parameters do not represent a valid date:

Date(const string& s) throw(DateError);

Date(int year, int month, int day) throw(DateError);

 

The DateTest::run( ) member function can now call the following function to test the exception handling:

  void testExceptions() {

    try {

      Date d(0,0,0);  // Invalid

      fail_("Invalid date undetected in Date int ctor");

    }

    catch (Date::DateError&) {

      succeed_();

    }

    try {

      Date d("");  // Invalid

      fail_("Invalid date undetected in Date string ctor");

    }

    catch (Date::DateError&) {

      succeed_();

    }

  }

 

In both cases, if an exception is not thrown, it is an error. Notice that you have to manually pass a message to fail_( ), since no Boolean expression is being evaluated.
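The same try/fail/catch/succeed shape can be tried out without the framework or the Date class. In the following sketch, parseMonth( ) is a hypothetical stand-in for a validating constructor, and the two global counters play the roles of succeed_( ) and fail_( ); none of these names come from the book's code.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a validating constructor.
int parseMonth(int m) {
  if (m < 1 || m > 12)
    throw std::out_of_range("parseMonth: month out of range");
  return m;
}

int nPass = 0, nFail = 0;

// Expects parseMonth(m) to throw; counts the outcome either way.
void testParseMonthRejects(int m) {
  try {
    parseMonth(m);
    ++nFail;   // Role of fail_(): no exception means the check is missing
  }
  catch (const std::out_of_range&) {
    ++nPass;   // Role of succeed_(): the expected exception arrived
  }
}
```

Calling testParseMonthRejects( ) with 0 or 13 counts as a pass, while calling it with a valid month such as 6 counts as a failure, because the exception the test expects never arrives.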

Test suites

Real projects usually contain many classes, so you need a way to group tests so that you can just push a single button to test the entire project.[25] The Suite class allows you to collect tests into a functional unit. You add Test objects to a Suite with the addTest( ) member function, or you can swallow an entire existing suite with addSuite( ). We have a number of date-related classes to illustrate how to use a test suite. Here's an actual test run:

// Illustrates a suite of related tests

#include <iostream>

#include "Suite.h"         // Includes Test.h

#include "JulianDateTest.h"

#include "JulianTimeTest.h"

#include "MonthInfoTest.h"

#include "DateTest.h"

#include "TimeTest.h"

using namespace std;

using namespace TestSuite;

 

int main() {

    Suite s("Date and Time Tests");

    s.addTest(new MonthInfoTest);

    s.addTest(new JulianDateTest);

    s.addTest(new JulianTimeTest);

    s.addTest(new DateTest);

    s.addTest(new TimeTest);

    s.run();

    long nFail = s.report();

    s.free();

    return nFail;

}

/* Output:

Suite "Date and Time Tests"

===========================

Test "MonthInfoTest":

   Passed: 18  Failed: 0

Test "JulianDateTest":

   Passed: 36  Failed: 0

Test "JulianTimeTest":

   Passed: 29  Failed: 0

Test "DateTest":

   Passed: 57  Failed: 0

Test "TimeTest":

   Passed: 84  Failed: 0

===========================

*/

 

Each of the five test files included as headers tests a unique date component. You must give the suite a name when you create it. The Suite::run( ) member function calls Test::run( ) for each of its contained tests. Much the same thing happens for Suite::report( ), except that it is possible to send the individual test reports to a destination stream that is different from that of the suite report. If the test passed to addTest( ) has a stream pointer assigned already, it keeps it. Otherwise, it gets its stream from the Suite object. (As with Test, the Suite constructor takes an optional stream argument, here the second, that defaults to std::cout.) The destructor for Suite does not automatically delete the contained Test pointers because they don’t have to reside on the heap; that’s the job of Suite::free( ).

The test framework code

The test framework code library is in a subdirectory called TestSuite in the code distribution available on the MindView website. To use it, add the TestSuite subdirectory to your header search path, link the object files, and add the TestSuite subdirectory to the library search path. Here is the header Test.h:

//: TestSuite:Test.h

#ifndef TEST_H

#define TEST_H

#include <string>

#include <iostream>

#include <cassert>

using std::string;

using std::ostream;

using std::cout;

 

// The following have underscores because

// they are macros. For consistency,

// succeed_() also has an underscore.

 

#define test_(cond) \
  do_test(cond, #cond, __FILE__, __LINE__)

#define fail_(str) \
  do_fail(str, __FILE__, __LINE__)

 

namespace TestSuite {

 

class Test {

public:

  Test(ostream* osptr = &cout);

  virtual ~Test(){}

  virtual void run() = 0;

  long getNumPassed() const;

  long getNumFailed() const;

  const ostream* getStream() const;

  void setStream(ostream* osptr);

  void succeed_();

  long report() const;

  virtual void reset();

protected:

  void do_test(bool cond, const string& lbl,

    const char* fname, long lineno);

  void do_fail(const string& lbl,

    const char* fname, long lineno);

private:

  ostream* osptr;

  long nPass;

  long nFail;

  // Disallowed:

  Test(const Test&);

  Test& operator=(const Test&);

};

 

inline Test::Test(ostream* osptr) {

  this->osptr = osptr;

  nPass = nFail = 0;

}

 

inline long Test::getNumPassed() const {

  return nPass;

}

 

inline long Test::getNumFailed() const {

  return nFail;

}

 

inline const ostream* Test::getStream() const {

  return osptr;

}

 

inline void Test::setStream(ostream* osptr) {

  this->osptr = osptr;

}

 

inline void Test::succeed_() {

  ++nPass;

}

 

inline void Test::reset() {

  nPass = nFail = 0;

}

 

} // namespace TestSuite

#endif // TEST_H ///:~

 

There are three virtual functions in the Test class:

·         A virtual destructor

·         The function reset( )

·         The pure virtual function run( )

As explained in Volume 1, it is an error to delete a derived heap object through a base pointer unless the base class has a virtual destructor. Any class intended to be a base class (usually evidenced by the presence of at least one other virtual function) should have a virtual destructor. The default implementation of Test::reset( ) resets the success and failure counters to zero. You might want to override this function to reset the state of the data in your derived test object; just be sure to call Test::reset( ) explicitly in your override so that the counters are reset. The Test::run( ) member function is pure virtual, of course, since you are required to override it in your derived class.
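Overriding reset( ) while still calling the base version might look like the following sketch. To keep it self-contained, MiniTest is a cut-down stand-in for TestSuite::Test (counters and reset( ) only), and BufferTest is a hypothetical derived test; neither is part of the framework.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for TestSuite::Test: only the counters and reset().
class MiniTest {
public:
  MiniTest() : nPass(0), nFail(0) {}
  virtual ~MiniTest() {}
  virtual void reset() { nPass = nFail = 0; }
  void succeed_() { ++nPass; }
  long getNumPassed() const { return nPass; }
private:
  long nPass;
  long nFail;
};

// Hypothetical derived test that owns some fixture data.
class BufferTest : public MiniTest {
  std::vector<int> buf;
public:
  void fill() { buf.push_back(42); succeed_(); }
  std::size_t bufSize() const { return buf.size(); }
  void reset() {
    buf.clear();        // Reset the derived object's own state...
    MiniTest::reset();  // ...then let the base class zero its counters
  }
};
```

If the override omitted the call to MiniTest::reset( ), the fixture would be cleared but the pass count from the previous run would carry over into the next one.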

The test_( ) and fail_( ) macros can include file name and line number information available from the preprocessor. We originally omitted the trailing underscores in the names, but the original fail( ) macro collided with ios::fail( ), causing all kinds of compiler errors.

Here is the implementation of Test:

//: TestSuite:Test.cpp {O}

#include "Test.h"

#include <iostream>

#include <typeinfo>  // Note: Visual C++ requires /GR

using namespace std;

using namespace TestSuite;

 

void Test::do_test(bool cond,

  const std::string& lbl, const char* fname,

  long lineno) {

  if (!cond)

    do_fail(lbl, fname, lineno);

  else

    succeed_();

}

 

void Test::do_fail(const std::string& lbl,

  const char* fname, long lineno) {

  ++nFail;

  if (osptr) {

    *osptr << typeid(*this).name()

           << " failure: (" << lbl << ") , "

           << fname

           << " (line " << lineno << ")\n";

  }

}

 

long Test::report() const {

  if (osptr) {

    *osptr << "Test \"" << typeid(*this).name()

           << "\":\n\tPassed: " << nPass

           << "\tFailed: " << nFail

           << endl;

  }

  return nFail;

} ///:~

 

No rocket science here. The Test class just keeps track of the number of successes and failures as well as the stream where you want Test::report( ) to display the results. The test_( ) and fail_( ) macros extract the current file name and line number information from the preprocessor and pass them to do_test( ) and do_fail( ), respectively, which do the actual work of displaying a message and updating the appropriate counter. We can’t think of a good reason to allow copy and assignment of test objects, so we have disallowed these operations by making their prototypes private and omitting their respective function bodies.
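The preprocessor features that test_( ) relies on (the # stringizing operator, __FILE__, and __LINE__) can be tried in isolation. In this sketch, describe( ) and CHECK_( ) are hypothetical names, and the message layout only imitates the one used by do_fail( ).

```cpp
#include <sstream>
#include <string>

// Hypothetical free-function counterpart of do_test(): it builds a
// message instead of updating counters.
std::string describe(bool cond, const char* text,
                     const char* file, long line) {
  std::ostringstream os;
  os << (cond ? "ok" : "FAILED") << ": (" << text << ") , "
     << file << " (line " << line << ")";
  return os.str();
}

// #cond turns the argument's source text into a string literal;
// __FILE__ and __LINE__ are filled in where the macro is expanded.
#define CHECK_(cond) describe((cond), #cond, __FILE__, __LINE__)
```

An expression such as CHECK_(1 > 2) yields a string that begins with FAILED: (1 > 2), followed by the name of the file and the line on which the macro was invoked.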

Here is the header file for Suite:

//: TestSuite:Suite.h

#ifndef SUITE_H

#define SUITE_H

#include "../TestSuite/Test.h"

#include <vector>

#include <stdexcept>

using std::vector;

using std::logic_error;

 

namespace TestSuite {

 

class TestSuiteError : public logic_error {

public:

  TestSuiteError(const string& s = "")

    : logic_error(s) {}

};

 

class Suite {

public:

  Suite(const string& name, ostream* osptr = &cout);

  string getName() const;

  long getNumPassed() const;

  long getNumFailed() const;

  const ostream* getStream() const;

  void setStream(ostream* osptr);

  void addTest(Test* t) throw (TestSuiteError);

  void addSuite(const Suite&);

  void run();  // Calls Test::run() repeatedly

  long report() const;

  void free();  // Deletes tests

private:

  string name;

  ostream* osptr;

  vector<Test*> tests;

  void reset();

  // Disallowed ops:

  Suite(const Suite&);

  Suite& operator=(const Suite&);

};

 

inline

Suite::Suite(const string& name, ostream* osptr)

   : name(name) {

  this->osptr = osptr;

}

 

inline string Suite::getName() const {

  return name;

}

 

inline const ostream* Suite::getStream() const {

  return osptr;

}

 

inline void Suite::setStream(ostream* osptr) {

  this->osptr = osptr;

}

 

} // namespace TestSuite

#endif // SUITE_H ///:~

 

The Suite class holds pointers to its Test objects in a vector. Notice the exception specification on the addTest( ) member function. When you add a test to a suite, Suite::addTest( ) verifies that the pointer you pass is not null; if it is null, it throws a TestSuiteError exception. Since this makes it impossible to add a null pointer to a suite, addSuite( ) asserts this condition on each of its tests, as do the other functions that traverse the vector of tests (see the following implementation). Copy and assignment are disallowed as they are in the Test class.

//: TestSuite:Suite.cpp {O}

#include "Suite.h"

#include <iostream>

#include <cassert>

using namespace std;

using namespace TestSuite;

 

void Suite::addTest(Test* t) throw(TestSuiteError) {

  // Verify test is valid and has a stream:

  if (t == 0)

    throw TestSuiteError(

      "Null test in Suite::addTest");

  else if (osptr && !t->getStream())

    t->setStream(osptr);

  tests.push_back(t);

  t->reset();

}

 

void Suite::addSuite(const Suite& s) {

  for (size_t i = 0; i < s.tests.size(); ++i) {

    assert(s.tests[i]);

    addTest(s.tests[i]);

  }

}

 

void Suite::free() {

  for (size_t i = 0; i < tests.size(); ++i) {

    delete tests[i];

    tests[i] = 0;

  }

}

 

void Suite::run() {

  reset();

  for (size_t i = 0; i < tests.size(); ++i) {

    assert(tests[i]);

    tests[i]->run();

  }

}

 

long Suite::report() const {

  if (osptr) {

    long totFail = 0;

    *osptr << "Suite \"" << name

             << "\"\n=======";

    size_t i;

    for (i = 0; i < name.size(); ++i)

      *osptr << '=';

    *osptr << "=\n";

    for (i = 0; i < tests.size(); ++i) {

      assert(tests[i]);

      totFail += tests[i]->report();

    }

    *osptr << "=======";

    for (i = 0; i < name.size(); ++i)

      *osptr << '=';

    *osptr << "=\n";

    return totFail;

  }

  else

    return getNumFailed();

}

 

long Suite::getNumPassed() const {

  long totPass = 0;

  for (size_t i = 0; i < tests.size(); ++i) {

    assert(tests[i]);

    totPass += tests[i]->getNumPassed();

  }

  return totPass;

}

 

long Suite::getNumFailed() const {

  long totFail = 0;

  for (size_t i = 0; i < tests.size(); ++i) {

    assert(tests[i]);

    totFail += tests[i]->getNumFailed();

  }

  return totFail;

}

 

void Suite::reset() {

  for (size_t i = 0; i < tests.size(); ++i) {

    assert(tests[i]);

    tests[i]->reset();

  }

} ///:~

 

We will be using the TestSuite framework wherever it applies throughout the rest of this book.

Debugging techniques

The best debugging habit to get into is to use assertions as explained in the beginning of this chapter; by doing so you’ll be more likely to find logic errors before they cause real trouble. This section contains some other tips and techniques that might help during debugging.

Trace macros

Sometimes it’s helpful to print the code of each statement as it is executed, either to cout or to a trace file. Here’s a preprocessor macro to accomplish this:

#define TRACE(ARG) cout << #ARG << endl; ARG

 

Now you can go through and surround the statements you trace with this macro. Of course, it can introduce problems. For example, if you take the statement:

for(int i = 0; i < 100; i++)

  cout << i << endl;

 

and put both lines inside TRACE( ) macros, you get this:

TRACE(for(int i = 0; i < 100; i++))

TRACE(  cout << i << endl;)

 

which expands to this:

cout << "for(int i = 0; i < 100; i++)" << endl;

for(int i = 0; i < 100; i++)

  cout << "cout << i << endl;" << endl;

cout << i << endl;

 

which isn’t exactly what you want. Thus, you must use this technique carefully.

The following is a variation on the TRACE( ) macro:

#define D(a) cout << #a "=[" << a << "]" << '\n';

 

If you want to display an expression, you simply put it inside a call to D( ). The expression is displayed, followed by its value (assuming there’s an overloaded operator << for the result type). For example, you can say D(a + b). Thus, you can use this macro any time you want to test an intermediate value to make sure things are okay.
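To see exactly what such a macro produces, it helps to send the output somewhere it can be inspected. The following variation, D_TO( ), is an assumption made for this illustration and not part of the original text: it takes the destination stream as an explicit first argument, parenthesizes the expression, and leaves the trailing semicolon to the caller.

```cpp
#include <iostream>
#include <sstream>

// Hypothetical variation on D(): the stream is a parameter, so the
// same macro can write to cout, a file, or a string stream.
#define D_TO(os, a) (os) << #a "=[" << (a) << "]" << '\n'
```

With int a = 2 and b = 3, the statement D_TO(std::cout, a + b); prints a + b=[5]. The parentheses around the argument keep expressions that contain low-precedence operators, such as a comparison, from being split apart by the surrounding << operators.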

Of course, these two macros are actually just the two most fundamental things you do with a debugger: trace through the code execution and display values. A good debugger is an excellent productivity tool, but sometimes debuggers are not available, or it’s not convenient to use them. These techniques always work, regardless of the situation.

Trace file

DISCLAIMER: This section and the next contain code which is officially unsanctioned by the C++ standard. In particular, we redefine cout and new via macros, which can cause surprising results if you’re not careful. Our examples work on all the compilers we use, however, and provide useful information. This is the only place in this book where we will depart from the sanctity of standard-compliant coding practice. Use at your own risk!

The following code allows you to easily create a trace file and send all the output that would normally go to cout into the file. All you have to do is #define TRACEON and include the header file (of course, it’s fairly easy just to write the two key lines right into your file):

//: C03:Trace.h

// Creating a trace file

#ifndef TRACE_H

#define TRACE_H

#include <fstream>

 

#ifdef TRACEON

ofstream TRACEFILE__("TRACE.OUT");

#define cout TRACEFILE__

#endif

 

#endif // TRACE_H ///:~

 

 

Here’s a simple test of the previous file:

//: C03:Tracetst.cpp

// Test of Trace.h

#include "../require.h"

#include <iostream>

#include <fstream>

using namespace std;

 

#define TRACEON

#include "Trace.h"

 

int main() {

  ifstream f("Tracetst.cpp");

  assure(f, "Tracetst.cpp");

  cout << f.rdbuf(); // Dumps file contents to file

} ///:~

 

 

Finding memory leaks

The following straightforward debugging techniques are explained in Volume 1.

1.       For array bounds checking, use the Array template in C16:Array3.cpp of Volume 1 for all arrays. You can turn off the checking and increase efficiency when you’re ready to ship. (This doesn’t deal with the case of taking a pointer to an array, though; perhaps that could be made into a template somehow as well.)

2.      Check for non-virtual destructors in base classes.

Tracking new/delete and malloc/free

Common problems with memory allocation include mistakenly calling delete for memory not on the free store, deleting the same free-store memory more than once, and, most often, forgetting to delete a free-store pointer at all. This section discusses a system that can help you track down these kinds of problems.

As an additional disclaimer beyond that of the preceding section: because of the way we overload new, the following technique may not work on all platforms, and will only work for programs that do not call the function operator new( ) explicitly. We have been quite careful in this book to only present code that fully conforms to the C++ standard, but in this one instance we’re making an exception for the following reasons:

                            1.             Even though it’s technically illegal, it works on many compilers.[26]

                            2.             We illustrate some useful thinking along the way.

 

To use the memory checking system, you simply include the header file MemCheck.h, link the MemCheck.obj file into your application, so that all the calls to new and delete are intercepted, and call the macro MEM_ON( ) (explained later in this section) to initiate memory tracing. A trace of all allocations and deallocations is printed to the standard output (via stdout). When you use this system, all calls to new store information about the file and line where they were called. This is accomplished by using the placement syntax for operator new.[27] Although you typically use the placement syntax when you need to place objects at a specific point in memory, it also allows you to create an operator new( ) with any number of arguments. This is used to advantage in the following example to store the results of the __FILE__ and __LINE__ macros whenever new is called:

//: C02:MemCheck.h

#ifndef MEMCHECK_H

#define MEMCHECK_H

#include <cstddef>  // for size_t

 

// Hijack the new operator (both scalar and array versions)

void* operator new(std::size_t, const char*, long);

void* operator new[](std::size_t, const char*, long);

#define new new (__FILE__, __LINE__)

 

extern bool traceFlag;

#define TRACE_ON() traceFlag = true

#define TRACE_OFF() traceFlag = false

 

extern bool activeFlag;

#define MEM_ON() activeFlag = true

#define MEM_OFF() activeFlag = false

 

#endif

///:~

 

It is important that you include this file in any source file in which you want to track free store activity, but include it last (after your other #include directives). Most headers in the standard library are templates, and since most compilers use the inclusion model of template compilation (meaning all source code is in the headers), the macro that replaces new in MemCheck.h would usurp all instances of the new operator in the library source code (and would likely result in compile errors). Besides, you are only interested in tracking your own memory errors, not the library’s.

In the following file, which contains the memory tracking implementation, everything is done with C standard I/O rather than with C++ iostreams. It shouldn’t make a difference, really, since we’re not interfering with iostreams’ use of the free store, but it’s safer to not take a chance. (Besides, we tried it. Some compilers complained, but all compilers were happy with the <cstdio> version.)

//: C02:MemCheck.cpp {O}

#include <cstdio>

#include <cstdlib>

#include <cassert>

using namespace std;

#undef new

 

// Global flags set by macros in MemCheck.h

bool traceFlag = true;

bool activeFlag = false;

 

namespace {

 

// Memory map entry type

struct Info {

  void* ptr;

  const char* file;

  long line;

};

 

// Memory map data

const size_t MAXPTRS = 10000u;

Info memMap[MAXPTRS];

size_t nptrs = 0;

 

// Searches the map for an address

int findPtr(void* p) {

  for (int i = 0; i < nptrs; ++i)

    if (memMap[i].ptr == p)

      return i;

  return -1;

}

 

void delPtr(void* p) {

  int pos = findPtr(p);

  assert(pos >= 0);

  // Remove pointer from map

  for (size_t i = pos; i < nptrs-1; ++i)

    memMap[i] = memMap[i+1];

  --nptrs;

}

 

// Dummy type for static destructor

struct Sentinel {

  ~Sentinel() {

    if (nptrs > 0) {

      printf("Leaked memory at:\n");

      for (size_t i = 0; i < nptrs; ++i)

        printf("\t%p (file: %s, line %ld)\n",

          memMap[i].ptr, memMap[i].file, memMap[i].line);

    }

    else

      printf("No user memory leaks!\n");

  }

};

 

// Static dummy object

Sentinel s;

 

} // End anonymous namespace

 

// Overload scalar new

void* operator new(size_t siz, const char* file,

  long line) {

  void* p = malloc(siz);

  if (activeFlag) {

    if (nptrs == MAXPTRS) {

      printf("memory map too small (increase MAXPTRS)\n");

      exit(1);

    }

    memMap[nptrs].ptr = p;

    memMap[nptrs].file = file;

    memMap[nptrs].line = line;

    ++nptrs;

  }

  if (traceFlag) {

    printf("Allocated %lu bytes at address %p ",

      static_cast<unsigned long>(siz), p);

    printf("(file: %s, line: %ld)\n", file, line);

  }

  return p;

}

 

// Overload array new

void* operator new[](size_t siz, const char* file,

  long line) {

  return operator new(siz, file, line);

}

 

// Override scalar delete

void operator delete(void* p) {

  if (findPtr(p) >= 0) {

    free(p);

    assert(nptrs > 0);

    delPtr(p);

    if (traceFlag)

      printf("Deleted memory at address %p\n", p);

  }

  else if (p && activeFlag)

    printf("Attempt to delete unknown pointer: %p\n", p);

}

 

// Override array delete

void operator delete[](void* p) {

  operator delete(p);

} ///:~

 

The Boolean flags traceFlag and activeFlag are global, so they can be modified in your code by the macros TRACE_ON( ), TRACE_OFF( ), MEM_ON( ), and MEM_OFF( ). In general, enclose all the code in your main( ) within a MEM_ON( )-MEM_OFF( ) pair so that memory is always tracked. Tracing, which echoes the activity of the replacement functions for operator new( ) and operator delete( ), is on by default, but you can turn it off with TRACE_OFF( ). In any case, the final results are always printed (see the test runs later in this chapter).

The MemCheck facility tracks memory by keeping all addresses allocated by operator new( ) in an array of Info structures, which also holds the file name and line number where the call to new occurred. As much information as possible is kept inside the anonymous namespace so as not to collide with any names you might have placed in the global namespace. The Sentinel class exists solely to have a static object’s destructor called as the program shuts down. This destructor inspects memMap to see if any pointers are waiting to be deleted (in which case you have a memory leak).

Our operator new( ) uses malloc( ) to get memory, and then adds the pointer and its associated file information to memMap. The operator delete( ) function undoes all that work by calling free( ) and decrementing nptrs, but first it checks to see if the pointer in question is in the map in the first place. If it isn’t, either you’re trying to delete an address that isn’t on the free store, or you’re trying to delete one that’s already been deleted and therefore previously removed from the map. The activeFlag variable is important here because we don’t want to process any deallocations from any system shutdown activity. By calling MEM_OFF( ) at the end of your code, activeFlag will be set to false, and such subsequent calls to delete will be ignored. (Of course, that’s bad in a real program, but as we said earlier, our purpose here is to find your leaks; we’re not debugging the library.) For simplicity, we forward all work for array new and delete to their scalar counterparts.

The following is a simple test using the MemCheck facility.

//: C02:MemTest.cpp

//{L} MemCheck

// Test of MemCheck system

#include <iostream>

#include <vector>

#include <cstring>

#include "MemCheck.h"   // Must appear last!

using namespace std;

 

class Foo {

  char* s;

public:

  Foo(const char*s ) {

    this->s = new char[strlen(s) + 1];

    strcpy(this->s, s);

  }

  ~Foo() {

    delete [] s;

  }

};

 

int main() {

  MEM_ON();

  cout << "hello\n";

  int* p = new int;

  delete p;

  int* q = new int[3];

  delete [] q;

  int* r; // Deliberately left uninitialized

  delete r;

  vector<int> v;

  v.push_back(1);

  Foo s("goodbye");

  MEM_OFF();

} ///:~

 

This example verifies that you can use MemCheck in the presence of streams, standard containers, and classes that allocate memory in constructors. The pointers p and q are allocated and deallocated without any problem, but r is not a valid heap pointer, so the output indicates the error as an attempt to delete an unknown pointer.

hello

Allocated 4 bytes at address 0xa010778 (file: memtest.cpp, line: 25)

Deleted memory at address 0xa010778

Allocated 12 bytes at address 0xa010778 (file: memtest.cpp, line: 27)

Deleted memory at address 0xa010778

Attempt to delete unknown pointer: 0x1

Allocated 8 bytes at address 0xa0108c0 (file: memtest.cpp, line: 14)

Deleted memory at address 0xa0108c0

No user memory leaks!

 

Because of the call to MEM_OFF( ), no subsequent calls to operator delete( ) by vector or ostream are processed. You still might get some calls to delete from reallocations performed by the containers.

If you call TRACE_OFF( ) at the beginning of the program, the output is as follows:

hello

Attempt to delete unknown pointer: 0x1

No user memory leaks!

 

Summary

Much of the headache of software engineering can be avoided by being deliberate about what you’re doing. You’ve probably been using mental assertions as you’ve crafted your loops and functions anyway, even if you haven’t routinely used the assert( ) macro. If you use assert( ), you’ll find logic errors sooner and end up with more readable code as well. Remember to use assertions only for invariants, though, and not for runtime error handling.

Nothing will give you more peace of mind than thoroughly tested code. If it’s been a hassle for you in the past, use an automated framework, such as the one we’ve presented here, to integrate routine testing into your daily work. You (and your users!) will be glad you did.

Exercises

                            1.             Write a test program using the TestSuite Framework for the standard vector class that thoroughly tests the following member functions with a vector of integers: push_back( ) (appends an element to the end of the vector), front( ) (returns the first element in the vector), back( ) (returns the last element in the vector), pop_back( ) (removes the last element without returning it), at( ) (returns the element in a specified index position), and size( ) (returns the number of elements). Be sure to verify that vector::at( ) throws a std::out_of_range exception if the supplied index is out of range.

                            2.             Suppose you are asked to develop a class named Rational that supports rational numbers (fractions). The fraction in a Rational object should always be stored in lowest terms, and a denominator of zero is an error. Here is a sample interface for such a Rational class:

class Rational {

public:

   Rational(int numerator = 0, int denominator = 1);

   Rational operator-() const;

   friend Rational operator+(const Rational&,

                             const Rational&);

   friend Rational operator-(const Rational&,

                             const Rational&);

   friend Rational operator*(const Rational&,

                             const Rational&);

   friend Rational operator/(const Rational&,

                             const Rational&);

   friend ostream& operator<<(ostream&,

                              const Rational&);

   friend istream& operator>>(istream&, Rational&);

   Rational& operator+=(const Rational&);

   Rational& operator-=(const Rational&);

   Rational& operator*=(const Rational&);

   Rational& operator/=(const Rational&);

   friend bool operator<(const Rational&,

                         const Rational&);

   friend bool operator>(const Rational&,

                         const Rational&);

   friend bool operator<=(const Rational&,

                          const Rational&);

   friend bool operator>=(const Rational&,

                          const Rational&);

   friend bool operator==(const Rational&,

                          const Rational&);

   friend bool operator!=(const Rational&,

                          const Rational&);

};

 

Write a complete specification for this class, including pre-conditions, post-conditions, and exception specifications.

                            3.             Write a test using the TestSuite framework that thoroughly tests all the specifications from the previous exercise, including testing exceptions.

                            4.             Implement the Rational class so that all the tests from the previous exercise pass. Use assertions only for invariants.

                            5.             The file BuggedSearch.cpp below contains a binary search function that searches the range [beg, end) for what. There are some bugs in the algorithm. Use the trace techniques from this chapter to debug the search function.

 

// BuggedSearch.cpp

#include "../TestSuite/Test.h"

#include <cstdlib>

#include <ctime>

#include <cassert>

#include <fstream>

using namespace std;

 

// This function is the only one with bugs

int* binarySearch(int* beg, int* end, int what) {

  while(end - beg != 1) {

    if(*beg == what) return beg;

    int mid = (end - beg) / 2;

    if(what <= beg[mid]) end = beg + mid;

    else beg = beg + mid;

  }

  return 0;

}

class BinarySearchTest : public TestSuite::Test {

  enum { sz = 10 };

  int* data;

  int max; //Track largest number

  int current; // Current non-contained number

               // Used in notContained()

  // Find the next number not contained in the array

  int notContained() {

    while(data[current] + 1 == data[current + 1])

     current++;

    if(current >= sz) return max + 1;

    int retValue = data[current++] + 1;

    return retValue;

  }

  void setData() {

    data = new int[sz];

    assert(!max);

    // Input values with increments of one.  Leave

    // out some values on both odd and even indexes.

    for(int i = 0; i < sz;

        rand() % 2 == 0 ? max += 1 : max += 2)

      data[i++] = max;

  }

  void testInBound() {

    // Test locations both odd and even

    // not contained and contained

    for(int i = sz; --i >=0;)

      test_(binarySearch(data, data + sz, data[i]));

    for(int i = notContained(); i < max;

                                  i = notContained())

      test_(!binarySearch(data, data + sz, i));

  }

  void testOutBounds() {

    // Test lower values

    for(int i = data[0]; --i > data[0] - 100;)

      test_(!binarySearch(data, data + sz, i));

    // Test higher values

    for(int i = data[sz - 1];

        ++i < data[sz -1] + 100;)

      test_(!binarySearch(data, data + sz, i));

  }

public:

  BinarySearchTest() {

    max = current = 0;

  }

  void run() {

    srand(time(0));

    setData();

    testInBound();

    testOutBounds();

    delete [] data;

  }

};

int main() {

  BinarySearchTest t;

  t.run();

  return t.report();

}


 

          Part 2

 
The Standard C++ Library

Standard C++ not only incorporates all the Standard C libraries (with small additions and changes to support type safety), it also adds libraries of its own. These libraries are far more powerful than those in Standard C; the leverage you get from them is analogous to the leverage you get from changing from C to C++.

This part of the book gives you an in-depth introduction to key portions of the Standard C++ library.

The most complete and also the most obscure reference to the full libraries is the Standard itself. Bjarne Stroustrup’s The C++ Programming Language, Third Edition (Addison-Wesley, 2000) remains a reliable reference for both the language and the library. The most celebrated library-only reference is The C++ Standard Library: A Tutorial and Reference, by Nicolai Josuttis (Addison-Wesley, 1999). The goal of the chapters in this part of the book is to provide you with an encyclopedia of descriptions and examples so that you’ll have a good starting point for solving any problem that requires the use of the Standard libraries. However, some techniques and topics are rarely used and are not covered here. If you can’t find it in these chapters, reach for the other two books; this book is not intended to replace those books but rather to complement them. In particular, we hope that after going through the material in the following chapters you’ll have a much easier time understanding those books.

You will notice that these chapters do not contain exhaustive documentation describing every function and class in the Standard C++ library. We’ve left the full descriptions to others; in particular to P.J. Plauger’s Dinkumware C/C++ Library Reference at http://www.dinkumware.com. This is an excellent source of standard library documentation in HTML format: you can view it online, or purchase it to keep resident on your computer and view with a Web browser whenever you need to look something up. It contains complete reference pages for both the C and C++ libraries (so it’s good to use for all your Standard C/C++ programming questions). Electronic documentation is effective not only because you can always have it with you, but also because you can do an electronic search for what you want.

When you’re actively programming, these resources should adequately satisfy your reference needs (and you can use them to look up anything in these chapters that isn’t clear to you). Appendix A lists additional references.

The first chapter in this section introduces the Standard C++ string class, which is a powerful tool that simplifies most of the text-processing chores you might have. The string class might be the most thorough string manipulation tool you’ve ever seen. Chances are, anything you’ve done to character strings with lines of code in C can be done with a member function call in the string class.

Chapter 4 covers the iostreams library, which contains classes for processing input and output with files, string targets, and the system console.

Although Chapter 5, “Templates in Depth,” is not explicitly a library chapter, it is necessary preparation for the two that follow. In Chapter 6 we examine the generic algorithms offered by the Standard C++ library. Because they are implemented with templates, these algorithms can be applied to any sequence of objects. Chapter 7 covers the standard containers and their associated iterators. We cover algorithms first because they can be fully explored by using only arrays and the vector container (which we have been using since early in Volume 1). It is also natural to use the standard algorithms in connection with containers, so it’s a good idea to be familiar with the algorithms before studying the containers.

 

 

3: Strings in depth

One of the biggest time-wasters in C is using character arrays for string processing: keeping track of the difference between static quoted strings and arrays created on the stack and the heap, and the fact that sometimes you’re passing around a char* and sometimes you must copy the whole array.

Especially because string manipulation is so common, character arrays are a great source of misunderstandings and bugs. Despite this, creating string classes remained a common exercise for beginning C++ programmers for many years. The Standard C++ library string class solves the problem of character array manipulation once and for all, keeping track of memory even during assignments and copy-constructions. You simply don’t need to think about it.

This chapter examines the Standard C++ string class, beginning with a look at what constitutes a C++ string and how the C++ version differs from a traditional C character array. You’ll learn about operations and manipulations using string objects, and you’ll see how C++ strings accommodate variation in character sets and string data conversion.[28]

Handling text is perhaps one of the oldest of all programming applications, so it’s not surprising that the C++ string draws heavily on the ideas and terminology that have long been used for this purpose in C and other languages. As you begin to acquaint yourself with C++ strings, this fact should be reassuring. No matter which programming idiom you choose, there are really only about three things you want to do with a string:

·         Create or modify the sequence of characters stored in the string.

·         Detect the presence or absence of elements within the string.

·         Translate between various schemes for representing string characters.

You’ll see how each of these jobs is accomplished using C++ string objects.

What’s in a string?

In C, a string is simply an array of characters that always includes a binary zero (often called the null terminator) as its final array element. There are significant differences between C++ strings and their C progenitors. First, and most important, C++ strings hide the physical representation of the sequence of characters they contain. You don’t have to be concerned at all about array dimensions or null terminators. A string also contains certain “housekeeping” information about the size and storage location of its data. Specifically, a C++ string object knows its starting location in memory, its content, its length in characters, and the length in characters to which it can grow before the string object must resize its internal data buffer. C++ strings therefore greatly reduce the likelihood of making three of the most common and destructive C programming errors: overwriting array bounds, trying to access arrays through uninitialized or incorrectly valued pointers, and leaving pointers “dangling” after an array ceases to occupy the storage that was once allocated to it.

The exact implementation of memory layout for the string class is not defined by the C++ Standard. This architecture is intended to be flexible enough to allow differing implementations by compiler vendors, yet guarantee predictable behavior for users. In particular, the exact conditions under which storage is allocated to hold data for a string object are not defined. String allocation rules were formulated to allow but not require a reference-counted implementation, but whether or not the implementation uses reference counting, the semantics must be the same. To put this a bit differently, in C, every char array occupies a unique physical region of memory. In C++, individual string objects may or may not occupy unique physical regions of memory, but if reference counting is used to avoid storing duplicate copies of data, the individual objects must look and act as though they do exclusively own unique regions of storage. For example:

//: C03:StringStorage.cpp

//{L} ../TestSuite/Test

#include <string>

#include <iostream>

#include "../TestSuite/Test.h"

using namespace std;

 

class StringStorageTest : public TestSuite::Test {

public:

  void run() {

    string s1("12345");

    // This may copy the first to the second or

    // use reference counting to simulate a copy

    string s2 = s1;

    test_(s1 == s2);

    // Either way, this statement must ONLY modify s1

    s1[0] = '6';

    cout << "s1 = " << s1 << endl;

    cout << "s2 = " << s2 << endl;

    test_(s1 != s2);

  }

};

 

int main() {

  StringStorageTest t;

  t.run();

  return t.report();

} ///:~

 

An implementation that only makes unique copies when a string is modified is said to use a copy-on-write strategy. This approach saves time and space when strings are used only as value parameters or in other read-only situations.

Whether a library implementation uses reference counting or not should be transparent to users of the string class. Unfortunately, this is not always the case. In multithreaded programs, it is practically impossible to use a reference-counting implementation safely.[29]

Creating and initializing C++ strings

Creating and initializing strings is a straightforward proposition and fairly flexible. In the SmallString.cpp example in this section, the first string, imBlank, is declared but contains no initial value. Unlike a C char array, which would contain a random and meaningless bit pattern until initialization, imBlank does contain meaningful information. This string object is initialized to hold “no characters” and can properly report its zero length and absence of data elements through the use of class member functions.

The next string, heyMom, is initialized by the literal argument "Where are my socks?" This form of initialization uses a quoted character array as a parameter to the string constructor. By contrast, standardReply is simply initialized with an assignment. The last string of the group, useThisOneAgain, is initialized using an existing C++ string object. Put another way, this example illustrates that string objects let you do the following:

·         Create an empty string and defer initializing it with character data.

·         Initialize a string by passing a literal, quoted character array as an argument to the constructor.

·         Initialize a string using the equal sign (=).

·         Use one string to initialize another.

//: C03:SmallString.cpp

#include <string>

using namespace std;

 

int main() {

  string imBlank;

  string heyMom("Where are my socks?");

  string standardReply = "Beamed into deep "

    "space on wide angle dispersion?";

  string useThisOneAgain(standardReply);

} ///:~

 

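These are the simplest forms of string initialization, but variations offer more flexibility and control. You can do the following:

·         Use a portion of an existing string, given a starting position and a character count.

·         Combine different sources of initialization data using operator+.

·         Use the string object’s substr( ) member function to create a substring.

Here’s a program that illustrates these features.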

//: C03:SmallString2.cpp

#include <string>

#include <iostream>

using namespace std;

 

int main() {

  string s1

    ("What is the sound of one clam napping?");

  string s2

    ("Anything worth doing is worth overdoing.");

  string s3("I saw Elvis in a UFO");

  // Copy the first 8 chars

  string s4(s1, 0, 8);

  cout << s4 << endl;

  // Copy 6 chars from the middle of the source

  string s5(s2, 15, 6);

  cout << s5 << endl;

  // Copy from middle to end

  string s6(s3, 6, 15);

  cout << s6 << endl;

  // Copy all sorts of stuff

  string quoteMe = s4 + "that" +

  // substr() copies 10 chars at element 20

  s1.substr(20, 10) + s5 +

  // substr() copies up to either 100 char

  // or eos starting at element 5

  "with" + s3.substr(5, 100) +

  // OK to copy a single char this way

  s1.substr(37, 1);

  cout << quoteMe << endl;

} ///:~

 

The string member function substr( ) takes a starting position as its first argument and the number of characters to select as the second argument. Both arguments have default values. If you say substr( ) with an empty argument list, you produce a copy of the entire string; so this is a convenient way to duplicate a string.

Here’s the output from the program:

What is

doing

Elvis in a UFO

What is that one clam doing with Elvis in a UFO?

 

Notice the final line of the example. C++ allows string initialization techniques to be mixed in a single statement, a flexible and convenient feature. Also notice that the last initializer copies just one character from the source string.

Another slightly more subtle initialization technique involves the use of the string iterators string::begin( ) and string::end( ). This technique treats a string like a container object (which you’ve seen primarily in the form of vector so far; you’ll see many more containers in Chapter 7), which uses iterators to indicate the start and end of a sequence of characters. In this way you can hand a string constructor two iterators, and it copies from one to the other into the new string:

//: C03:StringIterators.cpp

#include <string>

#include <iostream>

#include <cassert>

using namespace std;

 

int main() {

  string source("xxx");

  string s(source.begin(), source.end());

  assert(s == source);

} ///:~

 

The iterators are not restricted to begin( ) and end( ); you can increment, decrement, and add integer offsets to them, allowing you to extract a subset of characters from the source string.

C++ strings may not be initialized with single characters or with ASCII or other integer values. You can initialize a string with a number of copies of a single character, however.

//: C03:UhOh.cpp

#include <string>

#include <cassert>

using namespace std;

 

int main() {

  // Error: no single char inits

  //! string nothingDoing1('a');

  // Error: no integer inits

  //! string nothingDoing2(0x37);

  // The following is legal:

  string okay(5, 'a');

  assert(okay == string("aaaaa"));

} ///:~

 

Operating on strings

If you’ve programmed in C, you are accustomed to the convenience of a large family of functions for writing, searching, modifying, and copying char arrays. However, there are two unfortunate aspects of the Standard C library functions for handling char arrays. First, there are two loosely organized families of them: the “plain” group, and the ones that require you to supply a count of the number of characters to be considered in the operation at hand. The roster of functions in the C char array handling library shocks the unsuspecting user with a long list of cryptic, mostly unpronounceable names. Although the kinds and number of arguments to the functions are somewhat consistent, to use them properly you must be attentive to details of function naming and parameter passing.

The second inherent trap of the standard C char array tools is that they all rely explicitly on the assumption that the character array includes a null terminator. If by oversight or error the null is omitted or overwritten, there’s little to keep the C char array handling functions from manipulating the memory beyond the limits of the allocated space, sometimes with disastrous results.

C++ provides a vast improvement in the convenience and safety of string objects. For purposes of actual string handling operations, there are about the same number of distinct member function names in the string class as there are functions in the C library, but because of overloading there is much more functionality. Coupled with sensible naming practices and the judicious use of default arguments, these features combine to make the string class much easier to use than the C library.

Appending, inserting,
and concatenating strings

One of the most valuable and convenient aspects of C++ strings is that they grow as needed, without intervention on the part of the programmer. Not only does this make string-handling code inherently more trustworthy, it also almost entirely eliminates a tedious “housekeeping” chore: keeping track of the bounds of the storage in which your strings live. For example, if you create a string object and initialize it with a string of 50 copies of ‘X’, and later store in it 50 copies of “Zowie”, the object itself will reallocate sufficient storage to accommodate the growth of the data. Perhaps nowhere is this property more appreciated than when the strings manipulated in your code change size and you don’t know how big the change is. Appending, concatenating, and inserting strings often give rise to this circumstance, but the string member functions append( ) and insert( ) transparently reallocate storage when a string grows.

//: C03:StrSize.cpp

#include <string>

#include <iostream>

using namespace std;

 

int main() {

  string bigNews("I saw Elvis in a UFO. ");

  cout << bigNews << endl;

  // How much data have we actually got?

  cout << "Size = " << bigNews.size() << endl;

  // How much can we store without reallocating

  cout << "Capacity = "

    << bigNews.capacity() << endl;

  // Insert this string in bigNews immediately

  // before bigNews[1]

  bigNews.insert(1, " thought I");

  cout << bigNews << endl;

  cout << "Size = " << bigNews.size() << endl;

  cout << "Capacity = "

    << bigNews.capacity() << endl;

  // Make sure that there will be this much space

  bigNews.reserve(500);

  // Add this to the end of the string

  bigNews.append("I've been working too hard.");

  cout << bigNews << endl;

  cout << "Size = " << bigNews.size() << endl;

  cout << "Capacity = "

    << bigNews.capacity() << endl;

} ///:~

 

Here is the output from one particular compiler:

I saw Elvis in a UFO.

Size = 22

Capacity = 31

I thought I saw Elvis in a UFO.

Size = 32

Capacity = 47

I thought I saw Elvis in a UFO. I've been working too hard.

Size = 59

Capacity = 511

 

This example demonstrates that even though you can safely relinquish much of the responsibility for allocating and managing the memory your strings occupy, C++ strings provide you with several tools to monitor and manage their size. Notice the ease with which we changed the size of the storage allocated to the string. The size( ) function returns the number of characters currently stored in the string and is identical to the length( ) member function. The capacity( ) function returns the size of the current underlying allocation, meaning the number of characters the string can hold without requesting more storage. The reserve( ) function is an optimization mechanism that lets you request a certain amount of storage ahead of time; after a call to reserve( ), capacity( ) returns a value at least as large as the argument you passed. A resize( ) function appends null characters (the value char( )) if the new size is greater than the current string size or truncates the string otherwise. (An overload of resize( ) allows you to specify a different character to append.)

The exact fashion in which the string member functions allocate space for your data depends on the implementation of the library. When we tested one implementation with the previous example, it appeared that reallocations occurred on even word (that is, full-integer) boundaries, with one byte held back. The architects of the string class have endeavored to make it possible to mix the use of C char arrays and C++ string objects, so it is likely that figures reported by StrSize.cpp for capacity reflect that, in this particular implementation, a byte is set aside to easily accommodate the insertion of a null terminator.
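The relationships among size( ), capacity( ), reserve( ), and resize( ) can be checked in a few lines. The following is only a sketch and not one of the book's example files; the helper name padTo( ) is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of s that is exactly width
// characters long, padded with fill or truncated as needed.
std::string padTo(const std::string& s, std::size_t width, char fill) {
  std::string result(s);
  result.reserve(width);               // Request the storage up front
  assert(result.capacity() >= width);  // Guaranteed after reserve()
  result.resize(width, fill);          // Append fill or truncate
  assert(result.size() == result.length());
  return result;
}
```

For example, padTo("abc", 5, '*') produces "abc**", and padTo("abcdef", 3, '*') produces "abc". The exact value of capacity( ) is deliberately not tested, since it depends on the library implementation.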

Replacing string characters

The insert( ) function is particularly nice because it absolves you of making sure the insertion of characters in a string won’t overrun the storage space or overwrite the characters immediately following the insertion point. Space grows, and existing characters politely move over to accommodate the new elements. Sometimes, however, this might not be what you want to happen. If you want the size of the string to remain unchanged, use the replace( ) function to overwrite characters. There are quite a number of overloaded versions of replace( ), but the simplest one takes three arguments: an integer indicating where to start in the string, an integer indicating how many characters to eliminate from the original string, and the replacement string (which can be a different number of characters than the eliminated quantity). Here’s a simple example:

//: C03:StringReplace.cpp

// Simple find-and-replace in strings

#include <cassert>

#include <string>

using namespace std;

 

int main() {

  string s("A piece of text");

  string tag("$tag$");

  s.insert(8, tag + ' ');

  assert(s == "A piece $tag$ of text");

  int start = s.find(tag);

  assert(start == 8);

  assert(tag.size() == 5);

  s.replace(start, tag.size(), "hello there");

  assert(s == "A piece hello there of text");

} ///:~

 

The tag is first inserted into s (notice that the insert happens before the value indicating the insert point and that an extra space was added after tag), and then it is found and replaced.

You should actually check to see if you’ve found anything before you perform a replace( ). The previous example replaces with a char*, but there’s an overloaded version that replaces with a string. Here’s a more complete demonstration of replace( ):

//: C03:Replace.cpp

#include <cassert>

#include <cstddef>  // for size_t

#include <string>

using namespace std;

 

void replaceChars(string& modifyMe,

  const string& findMe, const string& newChars) {

  // Look in modifyMe for the "find string"

  // starting at position 0

  size_t i = modifyMe.find(findMe, 0);

  // Did we find the string to replace?

  if (i != string::npos)

    // Replace the find string with newChars

    modifyMe.replace(i, findMe.size(), newChars);

}

 

int main() {

  string bigNews =

   "I thought I saw Elvis in a UFO. "

   "I have been working too hard.";

  string replacement("wig");

  string findMe("UFO");

  // Find "UFO" in bigNews and overwrite it:

  replaceChars(bigNews, findMe, replacement);

  assert(bigNews == "I thought I saw Elvis in a "

         "wig. I have been working too hard.");

} ///:~

 

If find( ) doesn’t find the search string, it returns string::npos. The npos data member is a static constant member of the string class that represents a nonexistent character position.[30]

Unlike insert( ), replace( ) won’t grow the string’s storage space if you copy new characters into the middle of an existing series of array elements. However, it will grow the storage space if needed, for example, when you make a “replacement” that would expand the original string beyond the end of the current allocation. Here’s an example:

//: C03:ReplaceAndGrow.cpp

#include <cassert>

#include <string>

using namespace std;

 

int main() {

  string bigNews("I have been working the grave.");

  string replacement("yard shift.");

  // The first arg says "start at the last char";

  // the count runs past the end of the string:

  bigNews.replace(bigNews.size() - 1,

    replacement.size(), replacement);

  assert(bigNews == "I have been working the "

         "graveyard shift.");

} ///:~

 

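The call to replace( ) starts at the last character of the existing string (the period) and asks to replace more characters than remain. Only the period is actually overwritten; the rest of the replacement lands beyond the old end of the string, which is equivalent to an append operation. Notice that in this example replace( ) expands the array accordingly.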
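To see how replacing at the tail of a string shades into appending, consider the following sketch. It is not part of the book's code; replaceTail( ) is a name made up for this illustration, and it relies on the fact that a count of string::npos means “through the end of the string.”

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: overwrite the tail of s, starting at
// position pos, with tail. Characters before pos are kept.
std::string replaceTail(std::string s, std::size_t pos,
                        const std::string& tail) {
  assert(pos <= s.size());  // replace() throws out_of_range otherwise
  // npos as the count means "through the end of the string"
  s.replace(pos, std::string::npos, tail);
  return s;
}
```

When pos equals the size of the string there is nothing left to overwrite, so the call behaves exactly like append( ); with a smaller pos it first discards the old tail.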

You may have been hunting through this chapter trying to do something relatively simple such as replace all the instances of one character with a different character. Upon finding the previous material on replacing, you thought you found the answer, but then you started seeing groups of characters and counts and other things that looked a bit too complex. Doesn’t string have a way to just replace one character with another everywhere?

You can easily write such a function using the find( ) and replace( ) member functions as follows:

//: C03:ReplaceAll.cpp {O}

#include <cstddef>

#include <string>

using namespace std;

 

string& replaceAll(string& context, const string& from,

  const string& to) {

  size_t lookHere = 0;

  size_t foundHere;

  while ((foundHere = context.find(from, lookHere))

    != string::npos) {

    context.replace(foundHere, from.size(), to);

    lookHere = foundHere + to.size();

  }

  return context;

} ///:~

 

The version of find( ) used here takes as a second argument the position to start looking in and returns string::npos if it doesn’t find it. It is important to advance the position held in the variable lookHere past the replacement string, of course, in case from is a substring of to. The following program tests the replaceAll function:

//: C03:ReplaceAllTest.cpp

//{-msc}

//{L} ReplaceAll

#include <iostream>

#include <cassert>

#include <string>

using namespace std;

 

string& replaceAll(string& context, const string& from,

  const string& to);

 

int main() {

  string text = "a man, a plan, a canal, panama";

  replaceAll(text, "an", "XXX");

  assert(text == "a mXXX, a plXXX, a cXXXal, pXXXama");

} ///:~
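The importance of advancing past the replacement text shows up as soon as the search string occurs inside its own replacement. The sketch below is a variation written for this illustration, not the book's ReplaceAll.cpp: the name replacedCopy( ) is hypothetical, it works on a copy rather than in place, and it adds one guard (an empty search string is treated as “nothing to replace,” because find( ) would otherwise report a match at every position and the loop would never end).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical variant of replaceAll() that returns a modified copy
// and treats an empty search string as "nothing to replace".
std::string replacedCopy(std::string context, const std::string& from,
                         const std::string& to) {
  if (from.empty())
    return context;  // find("") matches everywhere; avoid looping forever
  std::size_t lookHere = 0;
  std::size_t foundHere;
  while ((foundHere = context.find(from, lookHere))
         != std::string::npos) {
    assert(foundHere >= lookHere);  // find() never matches before its start
    context.replace(foundHere, from.size(), to);
    // Skip the text just written so it is never searched again
    lookHere = foundHere + to.size();
  }
  return context;
}
```

Replacing "a" with "aa" in "banana" yields "baanaanaa". Had lookHere been advanced by only one character, each freshly written "aa" would be matched again and the loop would not terminate.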

 

As you can see, the string class by itself doesn’t solve all possible problems. Many solutions have been left to the algorithms in the Standard library,[31] because the string class can look just like an STL sequence (by virtue of the iterators discussed earlier). All the generic algorithms work on a “range” of elements within a container. Usually that range is just “from the beginning of the container to the end.” A string object looks like a container of characters: to get the beginning of the range you use string::begin( ), and to get the end of the range you use string::end( ). The following example shows the use of the replace( ) algorithm to replace all the instances of the single character ‘X’ with ‘Y’:

//: C03:StringCharReplace.cpp

#include <algorithm>

#include <cassert>

#include <string>

using namespace std;

 

int main() {

  string s("aaaXaaaXXaaXXXaXXXXaaa");

  replace(s.begin(), s.end(), 'X', 'Y');

  assert(s == "aaaYaaaYYaaYYYaYYYYaaa");

} ///:~

 

Notice that this replace( ) is not called as a member function of string. Also, unlike the string::replace( ) functions that only perform one replacement, the replace( ) algorithm replaces all instances of one character with another.

The replace( ) algorithm only works with single objects (in this case, char objects) and will not replace quoted char arrays or string objects. Since a string behaves like an STL sequence, a number of other algorithms can be applied to it, which might solve other problems that are not directly addressed by the string member functions.
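As a small illustration of applying other algorithms to a string, here is a sketch that uses count( ) and reverse( ) from the <algorithm> header. The helper names countChar( ) and reversed( ) are invented for this example and do not appear elsewhere in the book.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of times c occurs in s, found by
// running the count() algorithm over the string's iterators.
std::size_t countChar(const std::string& s, char c) {
  std::ptrdiff_t n = std::count(s.begin(), s.end(), c);
  assert(n >= 0);  // A count is never negative
  return static_cast<std::size_t>(n);
}

// Hypothetical helper: a copy of s with its characters reversed.
std::string reversed(std::string s) {
  std::reverse(s.begin(), s.end());
  return s;
}
```

Applied to the string from StringCharReplace.cpp, countChar(s, 'X') reports 10 before the replacement is made. Neither operation exists as a string member function, yet both work because string supplies begin( ) and end( ).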

Concatenation using
nonmember overloaded operators

One of the most delightful discoveries awaiting a C programmer learning about C++ string handling is how simply strings can be combined and appended using operator+ and operator+=. These operators make combining strings syntactically similar to adding numeric data.

//: C03:AddStrings.cpp

#include <string>

#include <cassert>

using namespace std;

 

int main() {

  string s1("This ");

  string s2("That ");

  string s3("The other ");

  // operator+ concatenates strings

  s1 = s1 + s2;

  assert(s1 == "This That ");

  // Another way to concatenate strings

  s1 += s3;

  assert(s1 == "This That The other ");

  // You can index the string on the right

  s1 += s3 + s3[4] + "ooh lala";

  assert(s1 == "This That The other The other "

        "oooh lala");

} ///:~

 

 

Using the operator+ and operator+= operators is a flexible and convenient way to combine string data. On the right side of the statement, you can use almost any type that evaluates to a group of one or more characters.
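The one restriction worth remembering is that at least one operand of each + must already be a string. The following sketch (the function name makeField( ) is hypothetical) mixes a string, single characters, and a C-style string in one expression:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds "key=value;" from mixed operand types.
std::string makeField(const std::string& key, const char* value) {
  assert(value != 0);  // A null pointer is not a valid C-style string
  // string + char, then + const char*, then + char: each step works
  // because the left operand of every + is already a string.
  std::string field = key + '=' + value + ';';
  // By contrast, "a" + "b" would not compile: neither operand is a
  // string, so the overloaded operator+ is never considered.
  return field;
}
```

Calling makeField("name", "Elvis") returns "name=Elvis;". Because operator+ associates left to right, the result of each step is a string that serves as the left operand of the next.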

Searching in strings

The find family of string member functions allows you to locate a character or group of characters within a given string. Here are the members of the find family and their general usage:

string find member function: What/how it finds

find( ): Searches a string for a specified character or group of characters and returns the starting position of the first occurrence found or npos if no match is found. (npos is a const of –1 [cast as a std::size_t] and indicates that a search failed.)

find_first_of( ): Searches a target string and returns the position of the first match of any character in a specified group. If no match is found, it returns npos.

find_last_of( ): Searches a target string and returns the position of the last match of any character in a specified group. If no match is found, it returns npos.

find_first_not_of( ): Searches a target string and returns the position of the first element that doesn’t match any character in a specified group. If no such element is found, it returns npos.

find_last_not_of( ): Searches a target string and returns the position of the element with the largest subscript that doesn’t match any character in a specified group. If no such element is found, it returns npos.

rfind( ): Searches a string from end to beginning for a specified character or group of characters and returns the starting position of the match if one is found. If no match is found, it returns npos.

 

The simplest use of find( ) searches for one or more characters in a string. This overloaded version of find( ) takes a parameter that specifies the character(s) for which to search and optionally a parameter that tells it where in the string to begin searching for the occurrence of a substring. (The default position at which to begin searching is 0.) By setting the call to find inside a loop, you can easily move through a string, repeating a search to find all the occurrences of a given character or group of characters within the string.
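The loop just described can be sketched as follows. This is an illustration only; findAll( ) is a hypothetical name, and the decision to report overlapping matches and to treat an empty pattern as matching nothing are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: positions of every (possibly overlapping)
// occurrence of pattern in text.
std::vector<std::size_t> findAll(const std::string& text,
                                 const std::string& pattern) {
  std::vector<std::size_t> positions;
  if (pattern.empty())
    return positions;  // Assumption: an empty pattern matches nothing
  std::size_t pos = text.find(pattern);  // Start position defaults to 0
  while (pos != std::string::npos) {
    assert(pos + pattern.size() <= text.size());  // Match lies inside text
    positions.push_back(pos);
    pos = text.find(pattern, pos + 1);  // Resume just past this match start
  }
  return positions;
}
```

Searching "aaaa" for "aa" reports positions 0, 1, and 2. To skip overlapping matches instead, advance by pattern.size( ) rather than by 1.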

The following program uses the Sieve of Eratosthenes to find prime numbers less than 50. This method starts with the number 2, marks all subsequent multiples of 2 as not prime, and repeats the process for the next prime candidate. Notice that we define the string object sieveChars using a constructor idiom that sets the initial size of the character array and writes the value ‘P’ to each of its members.

//: C03:Sieve.cpp

//{L} ../TestSuite/Test

#include <cmath>

#include <cstddef>

#include <string>

#include "../TestSuite/Test.h"

using namespace std;

 

class SieveTest : public TestSuite::Test {

  string sieveChars;

public:

  // Create a 50 char string and set each

  // element to 'P' for Prime

  SieveTest() : sieveChars(50, 'P') {}

  void run() {

    findPrimes();

    testPrimes();

  }

  bool isPrime(int p) {

    if (p == 0 || p == 1) return false;

    int root = int(sqrt(double(p)));

    for (int i = 2; i <= root; ++i)

      if (p % i == 0) return false;

    return true;

  }

  void findPrimes() {

    // By definition neither 0 nor 1 is prime.

    // Change these elements to "N" for Not Prime

    sieveChars.replace(0, 2, "NN");

    // Walk through the array:

    size_t sieveSize = sieveChars.size();

    int root = int(sqrt(double(sieveSize)));

    for (int i = 2; i <= root; ++i)

      // Find all the multiples:

      for (size_t factor = 2; factor * i < sieveSize;

           ++factor)

        sieveChars[factor * i] = 'N';

  }

  void testPrimes() {

    size_t i = sieveChars.find('P');

    while (i != string::npos) {

      test_(isPrime(i++));

      i = sieveChars.find('P', i);

    }

    i = sieveChars.find_first_not_of('P');

    while (i != string::npos) {

      test_(!isPrime(i++));

      i = sieveChars.find_first_not_of('P', i);

    }

  }

};

 

int main() {

  SieveTest t;

  t.run();

  return t.report();

} ///:~

 

The find( ) function allows you to walk forward through a string, detecting multiple occurrences of a character or a group of characters, and find_first_not_of( ) allows you to find other characters or substrings.
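Two other members of the find family, find_first_of( ) and find_last_of( ), search for any one character from a set rather than for a whole substring. The sketch below shows one plausible use of each; baseName( ) and firstDigit( ) are names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the part of path after the last '/' or '\\'.
std::string baseName(const std::string& path) {
  std::size_t slash = path.find_last_of("/\\");
  if (slash == std::string::npos)
    return path;                  // No separator: whole path is the name
  assert(slash < path.size());
  return path.substr(slash + 1);  // Everything after the separator
}

// Hypothetical helper: position of the first decimal digit in s,
// or string::npos if s contains none.
std::size_t firstDigit(const std::string& s) {
  return s.find_first_of("0123456789");
}
```

Here baseName("C03/Sieve.cpp") returns "Sieve.cpp", and firstDigit("Chapter 11") returns 8. In both calls the argument string is treated as a set of candidate characters, so the order of the characters within it does not matter.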

There are no functions in the string class to change the case of a string, but you can easily create these functions using the Standard C library functions toupper( ) and tolower( ), which change the case of one character at a time. The following example illustrates a case-insensitive search:

//: C03:Find.cpp

//{L} ../TestSuite/Test

#include <cctype>

#include <cstddef>

#include <string>

#include "../TestSuite/Test.h"

using namespace std;

 

// Make an uppercase copy of s

string upperCase(const string& s) {

  string upper(s);

  for(size_t i = 0; i < s.length(); ++i)

    upper[i] = toupper(upper[i]);

  return upper;

}

 

// Make a lowercase copy of s

string lowerCase(const string& s) {

  string lower(s);

  for(size_t i = 0; i < s.length(); ++i)

    lower[i] = tolower(lower[i]);

  return lower;

}

 

class FindTest : public TestSuite::Test {

  string chooseOne;

public:

  FindTest() : chooseOne("Eenie, Meenie, Miney, Mo") {}

  void testUpper() {

    string upper = upperCase(chooseOne);

    const string LOWER = "abcdefghijklmnopqrstuvwxyz";

    test_(upper.find_first_of(LOWER) == string::npos);

  }

  void testLower() {

    string lower = lowerCase(chooseOne);

    const string UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    test_(lower.find_first_of(UPPER) == string::npos);

  }

  void testSearch() {

    // Case sensitive search

    size_t i = chooseOne.find("een");

    test_(i == 8);

    // Search lowercase:

    string test = lowerCase(chooseOne);

    i = test.find("een");

    test_(i == 0);

    i = test.find("een", ++i);

    test_(i == 8);

    i = test.find("een", ++i);

    test_(i == string::npos);

    // Search uppercase:

    test = upperCase(chooseOne);

    i = test.find("EEN");

    test_(i == 0);

    i = test.find("EEN", ++i);

    test_(i == 8);

    i = test.find("EEN", ++i);

    test_(i == string::npos);

  }

  void run() {

    testUpper();

    testLower();

    testSearch();

  }

};

 

int main() {

  FindTest t;

  t.run();

  return t.report();

} ///:~

 

Both the upperCase( ) and lowerCase( ) functions follow the same form: they make a copy of the argument string and change the case. The Find.cpp program isn’t the best solution to the case-sensitivity problem, so we’ll revisit it when we examine string comparisons.

Finding in reverse

Sometimes it’s necessary to search through a string from end to beginning, if you need to find the data in “last in / first out” order. The string member function rfind( ) handles this job.

//: C03:Rparse.cpp

//{L} ../TestSuite/Test

#include <string>

#include <vector>

#include "../TestSuite/Test.h"

using namespace std;

 

class RparseTest : public TestSuite::Test {

  // To store the words:

  vector<string> strings;

public:

  void parseForData() {

    // The ';' characters will be delimiters

    string s("now.;sense;make;to;going;is;This");

    // The last element of the string:

    int last = s.size();

    // The beginning of the current word:

    int current = s.rfind(';');

    // Walk backward through the string:

    while(current != string::npos){

      // Push each word into the vector.

      // Current is incremented before copying to

      // avoid copying the delimiter:

      ++current;

      strings.push_back(

        s.substr(current, last - current));

      // Back over the delimiter we just found,

      // and set last to the end of the next word:

      current -= 2;

      last = current + 1;

      // Find the next delimiter

      current = s.rfind(';', current);

    }

    // Pick up the first word - it's not

    // preceded by a delimiter

    strings.push_back(s.substr(0, last));

  }

  void testData() {

    // Test that the words are in the new order:

    test_(strings[0] == "This");

    test_(strings[1] == "is");

    test_(strings[2] == "going");

    test_(strings[3] == "to");

    test_(strings[4] == "make");

    test_(strings[5] == "sense");

    test_(strings[6] == "now.");

    string sentence;

    for(int i = 0; i < strings.size() - 1; i++)

      sentence += strings[i] += " ";

    // Manually put last word in to avoid an extra space

    sentence += strings[strings.size() - 1];

    test_(sentence == "This is going to make sense now.");

  }

  void run() {

    parseForData();

    testData();

  }

};

 

int main() {

  RparseTest t;

  t.run();

  return t.report();

} ///:~

 

The string member function rfind( ) backs through the string looking for tokens and reporting the array index of matching characters or string::npos if it is unsuccessful.
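The same backward walk can be written with std::size_t positions throughout, which avoids mixing signed and unsigned values and copes with a delimiter at either end of the string. The following is a sketch under those assumptions rather than a replacement for Rparse.cpp; the name splitReversed( ) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits s on delim and returns the fields
// in reverse order (last field first).
std::vector<std::string> splitReversed(const std::string& s,
                                       char delim) {
  std::vector<std::string> fields;
  std::size_t end = s.size();  // One past the end of the current field
  while (true) {
    // Search backward, starting at the character just before end
    std::size_t pos = (end == 0) ? std::string::npos
                                 : s.rfind(delim, end - 1);
    if (pos == std::string::npos) {
      fields.push_back(s.substr(0, end));  // First field: nothing before it
      break;
    }
    assert(pos < end);
    fields.push_back(s.substr(pos + 1, end - pos - 1));
    end = pos;  // The next field ends where this delimiter sits
  }
  return fields;
}
```

Given the string from Rparse.cpp and ';' as the delimiter, the result holds seven fields running from "This" to "now.". A leading or trailing delimiter produces an empty field instead of an out-of-range position.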

Finding first/last of a set of characters

The find_first_not_of( ) and find_last_not_of( ) member functions can be conveniently put to work to create a little utility that will strip whitespace characters from both ends of a string. Notice that it doesn’t touch the original string, but instead returns a new string:

//: C03:Trim.h

#ifndef TRIM_H

#define TRIM_H

#include <string>

// General tool to strip spaces from both ends:

inline std::string trim(const std::string& s) {

  if(s.length() == 0)

    return s;

  int beg = s.find_first_not_of(" \a\b\f\n\r\t\v");

  int end = s.find_last_not_of(" \a\b\f\n\r\t\v");

  if(beg == std::string::npos) // No non-spaces

    return "";

  return std::string(s, beg, end - beg + 1);

}

#endif // TRIM_H ///:~

 

The first test checks for an empty string; in that case, no tests are made, and a copy is returned. Notice that once the end points are found, the string constructor builds a new string from the old one, giving the starting position and the length.

Testing such a general-purpose tool needs to be thorough:

//: C03:TrimTest.cpp

//{L} ../TestSuite/Test

#include <iostream>

#include "Trim.h"

#include "../TestSuite/Test.h"

using namespace std;

 

string s[] = {

  " \t abcdefghijklmnop \t ",

  "abcdefghijklmnop \t ",

  " \t abcdefghijklmnop",

  "a", "ab", "abc", "a b c",

  " \t a b c \t ", " \t a \t b \t c \t ",

  "\t \n \r \v \f",

  "" // Must also test the empty string

};

 

class TrimTest : public TestSuite::Test {

public:

  void testTrim() {

    test_(trim(s[0]) == "abcdefghijklmnop");

    test_(trim(s[1]) == "abcdefghijklmnop");

    test_(trim(s[2]) == "abcdefghijklmnop");

    test_(trim(s[3]) == "a");

    test_(trim(s[4]) == "ab");

    test_(trim(s[5]) == "abc");

    test_(trim(s[6]) == "a b c");

    test_(trim(s[7]) == "a b c");

    test_(trim(s[8]) == "a \t b \t c");

    test_(trim(s[9]) == "");

    test_(trim(s[10]) == "");

  }

  void run() {

    testTrim();

  }

};

 

int main() {

  TrimTest t;

  t.run();

  return t.report();

} ///:~

 

In the array of strings, you can see that the character arrays are automatically converted to string objects. This array provides cases to check the removal of spaces and tabs from both ends, as well as ensuring that spaces and tabs are not removed from the middle of a string.

Removing characters from strings

Removing characters is easy and efficient with the erase( ) member function, which takes two arguments: where to start removing characters (which defaults to 0), and how many to remove (which defaults to string::npos). If you specify more characters than remain in the string, the remaining characters are all erased anyway (so calling erase( ) without any arguments removes all characters from a string). Sometimes it’s useful to take an HTML file and strip its tags and special characters so that you have something approximating the text that would be displayed in the Web browser, only as a plain text file. The following uses erase( ) to do the job:

//: C03:HTMLStripper.cpp

//{L} ReplaceAll

// Filter to remove html tags and markers

#include <cassert>

#include <cmath>

#include <cstddef>

#include <fstream>

#include <iostream>

#include <string>

#include "../require.h"

using namespace std;

 

string& replaceAll(string& context, const string& from,

  const string& to);

 

string& stripHTMLTags(string& s) {

  static bool inTag = false;

  bool done = false;

  while (!done) {

    if (inTag) {

      // The previous line started an HTML tag

      // but didn't finish. Must search for '>'.

      size_t rightPos = s.find('>');

      if (rightPos != string::npos) {

        inTag = false;

        s.erase(0, rightPos + 1);

      }

      else {

        done = true;

        s.erase();

      }

    }

    else {

      // Look for start of tag:

      size_t leftPos = s.find('<');

      if (leftPos != string::npos) {

        // See if tag close is in this line

        size_t rightPos = s.find('>', leftPos);

        if (rightPos == string::npos) {

          inTag = done = true;

          s.erase(leftPos);

        }

        else

          s.erase(leftPos, rightPos - leftPos + 1);

      }

      else

        done = true;

    }

  }

  // Remove all special HTML characters

  replaceAll(s, "&lt;", "<");

  replaceAll(s, "&gt;", ">");

  replaceAll(s, "&amp;", "&");

  replaceAll(s, "&nbsp;", " ");

  // Etc...

  return s;

}

 

int main(int argc, char* argv[]) {

  requireArgs(argc, 1,

    "usage: HTMLStripper InputFile");

  ifstream in(argv[1]);

  assure(in, argv[1]);

  string s;

  while(getline(in, s))

    if (!stripHTMLTags(s).empty())

      cout << s << endl;

} ///:~

 

This example will even strip HTML tags that span multiple lines.[32] This is accomplished with the static flag, inTag, which is true whenever the start of a tag is found, but the accompanying tag end is not found in the same line. All forms of erase( ) appear in the stripHTMLTags( ) function.[33] The version of getline( ) we use here is a global function declared in the <string> header and is handy because it stores an arbitrarily long line in its string argument. You don’t have to worry about the dimension of a character array as you do with istream::getline( ). Notice that this program uses the replaceAll( ) function from earlier in this chapter. In the next chapter, we’ll use string streams to create a more elegant solution.
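Stripped of the HTML bookkeeping, the individual forms of erase( ) look like this. The sketch is ours, not part of HTMLStripper.cpp, and the helper names withoutChar( ) and withoutLineComment( ) are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: a copy of s with every occurrence of c removed.
std::string withoutChar(std::string s, char c) {
  std::size_t pos = s.find(c);
  while (pos != std::string::npos) {
    assert(pos < s.size());
    s.erase(pos, 1);       // Two arguments: erase one character at pos
    pos = s.find(c, pos);  // The next candidate has shifted into pos
  }
  return s;
}

// Hypothetical helper: a copy of s cut off at the first "//".
std::string withoutLineComment(std::string s) {
  std::size_t pos = s.find("//");
  if (pos != std::string::npos)
    s.erase(pos);          // One argument: erase from pos to the end
  return s;
}
```

For instance, withoutChar("a-b-c", '-') returns "abc". Note that after an erase( ) the search resumes at the same position rather than the next one, because the remaining characters have moved down to fill the gap.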

Comparing strings

Comparing strings is inherently different from comparing numbers. Numbers have constant, universally meaningful values. To evaluate the relationship between the magnitudes of two strings, you must make a lexical comparison. Lexical comparison means that when you test a character to see if it is “greater than” or “less than” another character, you are actually comparing the numeric representation of those characters as specified in the collating sequence of the character set being used. Most often this will be the ASCII collating sequence, which assigns the printable characters for the English language numbers in the range 32 through 126 decimal. In the ASCII collating sequence, the first “character” in the list is the space, followed by several common punctuation marks, then the digits, and then the uppercase and lowercase letters (with a few more punctuation marks in between). With respect to the alphabet, this means that the letters nearer the front have lower ASCII values than those nearer the end, and that every uppercase letter has a lower value than any lowercase letter. With these details in mind, it becomes easier to remember that when a lexical comparison reports that s1 is “greater than” s2, it simply means that when the two were compared, the first differing character in s1 came later in the collating sequence than the character in that same position in s2.
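The rule can be spelled out by hand. The following sketch assumes an ASCII execution character set; the names firstDifference( ) and lexicalLess( ) are invented for this illustration, and the second function simply checks itself against the library's own operator<.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first position at which a and b
// differ, or the length of the shorter string if one is a prefix
// of the other.
std::size_t firstDifference(const std::string& a,
                            const std::string& b) {
  std::size_t n = a.size() < b.size() ? a.size() : b.size();
  std::size_t i = 0;
  while (i < n && a[i] == b[i]) ++i;
  return i;
}

// Hypothetical helper: lexical "less than" written out by hand.
bool lexicalLess(const std::string& a, const std::string& b) {
  std::size_t i = firstDifference(a, b);
  bool result;
  if (i < a.size() && i < b.size())
    // Compare character codes as unsigned values
    result = static_cast<unsigned char>(a[i])
           < static_cast<unsigned char>(b[i]);
  else
    result = a.size() < b.size();  // A proper prefix sorts first
  assert(result == (a < b));       // Agrees with the library operator
  return result;
}
```

With ASCII, "Zebra" compares less than "apple" because 'Z' (90) precedes 'a' (97), which is rarely what a human reader means by alphabetical order.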

C++ provides several ways to compare strings, and each has advantages. The simplest to use are the nonmember, overloaded operator functions: operator ==, operator !=, operator >, operator <, operator >=, and operator <=.

//: C03:CompStr.cpp

//{L} ../TestSuite/Test

#include <string>

#include "../TestSuite/Test.h"

using namespace std;

 

class CompStrTest : public TestSuite::Test {

public:

  void run() {

    // Strings to compare

    string s1("This");

    string s2("That");

    test_(s1 == s1);

    test_(s1 != s2);

    test_(s1 > s2);

    test_(s1 >= s2);

    test_(s1 >= s1);

    test_(s2 < s1);

    test_(s2 <= s1);

    test_(s1 <= s1);

  }

};

 

int main() {

  CompStrTest t;

  t.run();

  return t.report();

} ///:~

 

The overloaded comparison operators are useful for comparing both full strings and individual string character elements.

Notice in the following code fragment the flexibility of argument types on both the left and right side of the comparison operators. For efficiency, the string class provides overloaded operators for the direct comparison of string objects, quoted literals, and pointers to C-style strings without having to create temporary string objects.

// The left operand is a quoted literal and

// the right operand is a string

if("That" == s2)

  cout << "A match" << endl;

// The left operand below is a string and the right is a

// pointer to a C-style null terminated string

if(s1 != s2.c_str())

  cout << "No match" << endl;

 

The c_str( ) function returns a const char* that points to a C-style, null-terminated string equivalent to the contents of the string object. This comes in handy when you want to pass a string to a standard C function, such as atoi( ) or any of the functions defined in the <cstring> header. It is an error to use the value returned by c_str( ) as a non-const argument to any function.
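A short sketch of handing a string to the C library follows. The wrapper name toInt( ) is hypothetical; atoi( ) and strlen( ) are the Standard C functions from <cstdlib> and <cstring>.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: converts the leading digits of s to an int
// by handing the C library a read-only view of the string's data.
int toInt(const std::string& s) {
  const char* p = s.c_str();  // Valid only until s is next modified
  // strlen() stops at the first null, so it can only be shorter
  // than size() if s contains embedded null characters.
  assert(std::strlen(p) <= s.size());
  return std::atoi(p);
}
```

For example, toInt("17 items") returns 17. The pointer is used immediately and never stored, which sidesteps the question of how long the character array returned by c_str( ) remains valid.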

You won’t find the logical not (!) or the logical comparison operators (&& and ||) among operators for a string. (Neither will you find overloaded versions of the bitwise C operators &, |, ^, or ~.) The overloaded nonmember comparison operators for the string class are limited to the subset that has clear, unambiguous application to single characters or groups of characters.

The compare( ) member function offers a far more sophisticated and precise means of comparison than the nonmember operator set. It provides overloaded versions that allow you to compare two complete strings, part of either string to a complete string, and subsets of two strings. The following example compares complete strings:

//: C03:Compare.cpp

// Demonstrates compare(), swap()

#include <cassert>

#include <string>

using namespace std;

 

int main() {

  string first("This");

  string second("That");

  assert(first.compare(first) == 0);

  assert(second.compare(second) == 0);

  // Which is lexically greater?

  assert(first.compare(second) > 0);

  assert(second.compare(first) < 0);

  first.swap(second);

  assert(first.compare(second) < 0);

  assert(second.compare(first) > 0);

} ///:~

 

The swap( ) function in this example does what its name implies: it exchanges the contents of its object and argument. To compare a subset of the characters in one or both strings, you add arguments that define where to start the comparison and how many characters to consider. For example, we can use the overloaded version of compare( ):

s1.compare(s1StartPos, s1NumberChars, s2, s2StartPos, s2NumberChars);

Here’s an example:

//: C03:Compare2.cpp

// Illustrate overloaded compare()

#include <cassert>

#include <string>

using namespace std;

 

int main() {

  string first("This is a day that will live in infamy");

  string second("I don't believe that this is what "

                "I signed up for");

  // Compare "his is" in both strings:

  assert(first.compare(1, 7, second, 22, 7) == 0);

  // Compare "his is a" to "his is w":

  assert(first.compare(1, 9, second, 22, 9) < 0);

} ///:~
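A common practical use of the substring forms of compare( ) is testing for a prefix or suffix without creating a temporary string. The helpers below are a sketch; startsWith( ) and endsWith( ) are names chosen for this illustration and are not members of the string class described in this chapter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if s begins with prefix.
bool startsWith(const std::string& s, const std::string& prefix) {
  // compare(pos, n, str) compares the n characters of s starting at
  // pos with str; n is clipped to the characters available, so an s
  // shorter than prefix is harmless.
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: true if s ends with suffix.
bool endsWith(const std::string& s, const std::string& suffix) {
  if (suffix.size() > s.size())
    return false;  // Avoid computing a bad starting position
  std::size_t start = s.size() - suffix.size();
  assert(start <= s.size());
  return s.compare(start, suffix.size(), suffix) == 0;
}
```

The guard in endsWith( ) matters because the subtraction is done on unsigned values: without it, a suffix longer than s would wrap around to a huge starting position and compare( ) would throw an out_of_range exception.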

 

In the examples so far, we have used C-style array indexing syntax to refer to an individual character in a string. C++ strings provide an alternative to the s[n] notation: the at( ) member. These two indexing mechanisms produce the same result in C++ if all goes well:

//: C03:StringIndexing.cpp

#include <cassert>

#include <string>

using namespace std;

int main(){

  string s("1234");

  assert(s[1] == '2');

  assert(s.at(1) == '2');

} ///:~

 

There is one important difference, however, between [ ] and at( ). When you try to reference an array element that is out of bounds, at( ) will do you the kindness of throwing an exception, while ordinary [ ] subscripting syntax will leave you to your own devices:

//: C03:BadStringIndexing.cpp

#include <exception>

#include <iostream>

#include <string>

using namespace std;

 

int main(){

  string s("1234");

  // at() saves you by throwing an exception:

  try {

    s.at(5);

  } catch(exception& e) {

    cerr << e.what() << endl;

  }

} ///:~

 

Responsible programmers will not use errant indexes, but should you want the benefits of automatic index checking, using at( ) in place of [ ] will give you a chance to gracefully recover from references to array elements that don’t exist. Execution of this program on one of our test compilers gave the following output:

invalid string position

 

The at( ) member throws an object of class out_of_range, which derives (ultimately) from std::exception. By catching this object in an exception handler, you can take appropriate remedial actions such as recalculating the offending subscript or growing the array. Using string::operator[]( ) gives no such protection and is as dangerous as char array processing in C.[34]
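As one possible form of such a remedial action, the sketch below catches out_of_range specifically and substitutes a fallback character. The helper name charAtOr( ) and the idea of a fallback value are assumptions made for this illustration; your program may prefer to recompute the index or report an error instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: return s.at(i), or fallback when i is
// out of range. at() throws std::out_of_range if i >= s.size().
char charAtOr(const std::string& s, std::size_t i, char fallback) {
  try {
    return s.at(i);
  } catch(const std::out_of_range&) {
    return fallback;  // Recover instead of terminating
  }
}
```

For the string "1234", charAtOr(s, 1, '?') returns '2', while charAtOr(s, 5, '?') returns '?' because index 5 does not exist.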

Strings and character traits

The program Find.cpp earlier in this chapter leads us to ask the obvious question: Why isn’t case-insensitive comparison part of the standard string class? The answer provides interesting background on the true nature of C++ string objects.

Consider what it means for a character to have “case.” Written Hebrew, Farsi, and Kanji don’t use the concept of upper- and lowercase, so for those languages this idea has no meaning. It would seem that if there were a way to designate some languages as “all uppercase” or “all lowercase,” we could design a generalized solution. However, some languages that employ the concept of “case” also change the meaning of particular characters with diacritical marks: for example, the tilde in Spanish, the circumflex in French, and the umlaut in German. For this reason, any case-sensitive collating scheme that attempts to be comprehensive will be nightmarishly complex to use.

Although we usually treat the C++ string as a class, this is really not the case. The string type is actually a specialization of a more general constituent, the basic_string< > template. Observe how string is declared in the standard C++ header file:[35]

typedef basic_string<char> string;

 

To really understand the nature of the string class, it’s helpful to delve a bit deeper and look at the template on which it is based. Here’s the declaration of the basic_string< > template:

template<class charT,

  class traits = char_traits<charT>,

  class allocator = allocator<charT> >

  class basic_string;

 

In Chapter 5, we examine templates in great detail (much more than in Chapter 16 of Volume 1). For now, the main thing to notice about the two previous declarations is that the string type is created when the basic_string template is instantiated with char. Inside the basic_string< > template declaration, the line

class traits = char_traits<charT>,

 

tells us that the behavior of the class made from the basic_string< > template is specified by a class based on the template char_traits< >. Thus, the basic_string< > template provides for cases in which you need string-oriented classes that manipulate types other than char (wide characters, for example). To do this, the char_traits< > template controls the content and collating behaviors of a variety of character sets using the character comparison functions eq( ) (equal), ne( ) (not equal), and lt( ) (less than) upon which the basic_string< > string comparison functions rely.

This is why the string class doesn’t include case-insensitive member functions: that’s not in its job description. To change the way the string class treats character comparison, you must supply a different char_traits< > template, because that defines the behavior of the individual character comparison member functions.
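To see what the default traits do before replacing them, you can call the static members of char_traits<char> directly. The sketch below is only an illustration: the typedef Traits and the wrappers sameChar( ), before( ), prefixOrder( ), and indexOf( ) are hypothetical names of ours, and the ordering shown for upper- and lowercase letters assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef std::char_traits<char> Traits;

// Thin wrappers around the character-level comparisons
// that basic_string relies on.
bool sameChar(char a, char b) { return Traits::eq(a, b); }
bool before(char a, char b) { return Traits::lt(a, b); }

// Three-way comparison of the first n characters of a and b.
// Only the sign of the result is meaningful.
int prefixOrder(const char* a, const char* b, std::size_t n) {
  return Traits::compare(a, b, n);
}

// Position of c in the C string s, or std::string::npos if absent.
std::size_t indexOf(const char* s, char c) {
  const char* hit = Traits::find(s, Traits::length(s), c);
  return hit ? static_cast<std::size_t>(hit - s)
             : std::string::npos;
}
```

With these defaults, sameChar('a', 'A') is false and prefixOrder("This", "That", 4) is positive, which is exactly the case-sensitive behavior that a custom traits class would change.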

You can use this information to make a new type of string class that ignores case. First, we’ll define a new case-insensitive char_traits< > template that inherits from the existing template. Next, we’ll override only the members we need to change to make character-by-character comparison case insensitive. (In addition to the three lexical character comparison members mentioned earlier, we’ll also have to supply a new implementation for the char_traits functions find( ) and compare( ).) Finally, we’ll typedef a new class based on basic_string, but using the case-insensitive ichar_traits template for its second argument.

//: C03:ichar_traits.h

// Creating your own character traits

#ifndef ICHAR_TRAITS_H

#define ICHAR_TRAITS_H

#include <cassert>

#include <cctype>

#include <cmath>

#include <ostream>

#include <string>

 

using std::toupper;

using std::tolower;

using std::ostream;

using std::string;

using std::char_traits;

using std::allocator;

using std::basic_string;

 

struct ichar_traits : char_traits<char> {

  // We'll only change character-by-

  // character comparison functions

  static bool eq(char c1st, char c2nd) {

    return toupper(c1st) == toupper(c2nd);

  }

  static bool ne(char c1st, char c2nd) {

    return !eq(c1st, c2nd);

  }

  static bool lt(char c1st, char c2nd) {

    return toupper(c1st) < toupper(c2nd);

  }

  static int compare(const char* str1,

    const char* str2, size_t n) {

    for(size_t i = 0; i < n; i++) {

      if(str1 == 0)

        return -1;

      else if(str2 == 0)

        return 1;

      else if(tolower(*str1) < tolower(*str2))

        return -1;

      else if(tolower(*str1) > tolower(*str2))

        return 1;

      assert(tolower(*str1) == tolower(*str2));

      str1++; str2++; // Compare the other chars

    }

    return 0;

  }

  static const char* find(const char* s1,

    size_t n, char c) {

    while(n-- > 0)

      if(toupper(*s1) == toupper(c))

        return s1;

      else

        ++s1;

    return 0;

  }

};

 

typedef basic_string<char, ichar_traits> istring;

 

inline ostream& operator<<(ostream& os, const istring& s) {

  return os << string(s.c_str(), s.length());

}

#endif // ICHAR_TRAITS_H  ///:~

 

We provide a typedef named istring so that our class will act like an ordinary string in every way, except that it will make all comparisons without respect to case. For convenience, we’ve also provided an overloaded operator<<( ) so that you can print istrings. Here’s an example:

//: C03:ICompare.cpp

#include <cassert>

#include <iostream>

#include "ichar_traits.h"

using namespace std;

 

int main() {

  // The same letters except for case:

  istring first = "tHis";

  istring second = "ThIS";

  cout << first << endl;

  cout << second << endl;

  assert(first.compare(second) == 0);

  assert(first.find('h') == 1);

  assert(first.find('I') == 2);

  assert(first.find('x') == string::npos);

} ///:~

 

This is just a toy example, of course. To make istring fully equivalent to string, we’d have to create the other functions necessary to support the new istring type.

The <string> header provides a wide string class via the following typedef:

typedef basic_string<wchar_t> wstring;

 

Wide string support also reveals itself in wide streams (wostream in place of ostream, also defined in <iostream>) and in the header <cwctype>, a wide-character version of <cctype>. This along with the wchar_t specialization of char_traits in the standard library allows us to do a wide-character version of ichar_traits:

//: C03:iwchar_traits.h

//{-bor}

//{-g++}

// Creating your own wide-character traits

#ifndef IWCHAR_TRAITS_H

#define IWCHAR_TRAITS_H

#include <cassert>

#include <cwctype>

#include <cmath>

#include <ostream>

#include <string>

 

using std::towupper;

using std::towlower;

using std::wostream;

using std::wstring;

using std::char_traits;

using std::allocator;

using std::basic_string;

 

struct iwchar_traits : char_traits<wchar_t> {

  // We'll only change character-by-

  // character comparison functions

  static bool eq(wchar_t c1st, wchar_t c2nd) {

    return towupper(c1st) == towupper(c2nd);

  }

  static bool ne(wchar_t c1st, wchar_t c2nd) {

    return towupper(c1st) != towupper(c2nd);

  }

  static bool lt(wchar_t c1st, wchar_t c2nd) {

    return towupper(c1st) < towupper(c2nd);

  }

  static int compare(const wchar_t* str1,

    const wchar_t* str2, size_t n) {

    for(size_t i = 0; i < n; i++) {

      if(str1 == 0)

        return -1;

      else if(str2 == 0)

        return 1;

      else if(towlower(*str1) < towlower(*str2))

        return -1;

      else if(towlower(*str1) > towlower(*str2))

        return 1;

      assert(towlower(*str1) == towlower(*str2));

      str1++; str2++; // Compare the other wchar_ts

    }

    return 0;

  }

  static const wchar_t* find(const wchar_t* s1,

    size_t n, wchar_t c) {

    while(n-- > 0)

      if(towupper(*s1) == towupper(c))

        return s1;

      else

        ++s1;

    return 0;

  }

};

 

typedef basic_string<wchar_t, iwchar_traits> iwstring;

 

inline wostream& operator<<(wostream& os,

  const iwstring& s) {

  return os << wstring(s.c_str(), s.length());

}

#endif // IWCHAR_TRAITS_H  ///:~

 

As you can see, this is mostly an exercise in placing a ‘w’ in the appropriate place in the source code. The test program looks like this:

//: C03:IWCompare.cpp

//{-g++}

#include <cassert>

#include <iostream>

#include "iwchar_traits.h"

using namespace std;

 

int main() {

  // The same letters except for case:

  iwstring wfirst = L"tHis";

  iwstring wsecond = L"ThIS";

  wcout << wfirst << endl;

  wcout << wsecond << endl;

  assert(wfirst.compare(wsecond) == 0);

  assert(wfirst.find('h') == 1);

  assert(wfirst.find('I') == 2);

  assert(wfirst.find('x') == wstring::npos);

} ///:~

 

Unfortunately, some compilers still do not provide robust support for wide characters.

A string application

If you’ve looked at the sample code in this book closely, you’ve noticed that certain tokens in the comments surround the code. These are used by a Python program that Bruce wrote to extract the code into files and set up makefiles for building the code. For example, a double-slash followed by a colon at the beginning of a line denotes the first line of a source file. The rest of the line contains information describing the file’s name and location and whether it should be only compiled rather than fully built into an executable file. For example, the first line of the previous program contains the string C03:IWCompare.cpp, indicating that the file IWCompare.cpp should be extracted into the directory C03.

The last line of a source file contains a triple-slash followed by a colon and a tilde. If the first line has an exclamation point immediately after the colon, the first and last lines of the source code are not to be output to the file (this is for data-only files). (If you’re wondering why we’re avoiding showing you these tokens, it’s because we don’t want to break the code extractor when applied to the text of the book!)

Bruce’s Python program does a lot more than just extract code. If the token “{O}” follows the file name, its makefile entry will only be set up to compile the file and not to link it into an executable. (The Test Framework in Chapter 2 is built this way.) To link such a file with another source example, the target executable’s source file will contain an “{L}” directive, as in

//{L} ../TestSuite/Test

 

This section will present a program to just extract all the code so that you can compile and inspect it manually. You can use this program to extract all the code in this book by saving the document file as a text file[36] (let’s call it TICV2.txt) and by executing something like the following on a shell command line:

C:> extractCode TICV2.txt /TheCode

 

This command reads the text file TICV2.txt and writes all the source code files in subdirectories under the top-level directory /TheCode. The directory tree will look like the following:

TheCode/

   C0B/

   C01/

   C02/

   C03/

   C04/

   C05/

   C06/

   C07/

   C08/

   C09/

   C10/

   C11/

   TestSuite/

 

The source files containing the examples from each chapter will be in the corresponding directory.

Here’s the program:

//: C03:ExtractCode.cpp

// Extracts code from text

#include <cassert>

#include <cstddef>

#include <cstdio>

#include <cstdlib>

#include <fstream>

#include <iostream>

#include <string>

using namespace std;

// Legacy non-standard C header for mkdir()

#ifdef __GNUC__

#include <sys/stat.h>

#elif defined(__BORLANDC__) || defined(_MSC_VER)

#include <direct.h>

#else

#error Compiler not supported

#endif

 

// Check to see if directory exists

// by attempting to open a new file

// for output within it.

bool exists(string fname) {

  size_t len = fname.length();

  if(fname[len-1] != '/' && fname[len-1] != '\\')

    fname.append("/");

  fname.append("000.tmp");

  ofstream outf(fname.c_str());

  bool existFlag = outf.is_open();

  if (outf) {

    outf.close();

    remove(fname.c_str());

  }

  return existFlag;

}

 

int main(int argc, char* argv[]) {

  // See if input file name provided

  if(argc == 1) {

    cerr << "usage: extractCode file [dir]\n";

    exit(EXIT_FAILURE);

  }

  // See if input file exists

  ifstream inf(argv[1]);

  if(!inf) {

    cerr << "error opening file: " << argv[1] << endl;

    exit(EXIT_FAILURE);

  }

  // Check for optional output directory

  string root("./");  // current is default

  if(argc == 3) {

    // See if output directory exists

    root = argv[2];

    if(!exists(root)) {

      cerr << "no such directory: " << root << endl;

      exit(EXIT_FAILURE);

    }

    size_t rootLen = root.length();

    if(root[rootLen-1] != '/' && root[rootLen-1] != '\\')

      root.append("/");

  }

  // Read input file line by line

  // checking for code delimiters

  string line;

  bool inCode = false;

  bool printDelims = true;

  ofstream outf;

  while (getline(inf, line)) {

    size_t findDelim = line.find("//" "/:~");

    if(findDelim != string::npos) {

      // Output last line and close file

      if (!inCode) {

        cerr << "Lines out of order\n";

        exit(EXIT_FAILURE);

      }

      assert(outf);

      if (printDelims)

        outf << line << endl;

      outf.close();

      inCode = false;

      printDelims = true;

    } else {

      findDelim = line.find("//" ":");

      if(findDelim == 0) {

        // Check for '!' directive

        if(line[3] == '!') {

          printDelims = false;

          ++findDelim;  // To skip '!' for next search

        }

        // Extract subdirectory name, if any

        size_t startOfSubdir =

          line.find_first_not_of(" \t", findDelim+3);

        findDelim = line.find(':', startOfSubdir);

        if (findDelim == string::npos) {

          cerr << "missing filename information\n" << endl;

          exit(EXIT_FAILURE);

        }

        string subdir;

        if(findDelim > startOfSubdir)

          subdir = line.substr(startOfSubdir,

                               findDelim - startOfSubdir);

        // Extract file name (better be one!)

        size_t startOfFile = findDelim + 1;

        size_t endOfFile =

          line.find_first_of(" \t", startOfFile);

        if(endOfFile == startOfFile) {

          cerr << "missing filename\n";

          exit(EXIT_FAILURE);

        }

        // We have all the pieces; build fullPath name

        string fullPath(root);

        if(subdir.length() > 0)

          fullPath.append(subdir).append("/");

        assert(fullPath[fullPath.length()-1] == '/');

        if (!exists(fullPath))

#ifdef __GNUC__

          mkdir(fullPath.c_str(), 0777);  // Create subdir

#else

          mkdir(fullPath.c_str());  // Create subdir

#endif

        fullPath.append(line.substr(startOfFile,

                        endOfFile - startOfFile));

        outf.open(fullPath.c_str());

        if(!outf) {

          cerr << "error opening " << fullPath

               << " for output\n";

          exit(EXIT_FAILURE);

        }

        inCode = true;

        cout << "Processing " << fullPath << endl;

        if(printDelims)

          outf << line << endl;

      }

      else if(inCode) {

        assert(outf);

        outf << line << endl;  // output middle code line

      }

    }

  }

  exit(EXIT_SUCCESS);

} ///:~

 

First, you’ll notice some conditional compilation directives. The mkdir( ) function, which creates a directory in the file system, is defined by the POSIX[37] standard in the header <sys/stat.h>. Unfortunately, many compilers still use a different header (<direct.h>). The respective signatures for mkdir( ) also differ: POSIX specifies two arguments, the older versions just one. For this reason, there is more conditional compilation later in the program to choose the right call to mkdir( ). We normally don’t use conditional compilation in the examples in this book, but this particular program is too useful not to put a little extra work into, since you can use it to extract all the code.

The exists( ) function in ExtractCode.cpp tests whether a directory exists by opening a temporary file in it. If the open fails, the directory doesn’t exist. You remove a file by sending its name as a char* to std::remove( ).

The main program validates the command-line arguments and then reads the input file a line at a time, looking for the special source code delimiters. The Boolean flag inCode indicates that the program is in the middle of a source file, so lines should be output. The printDelims flag will be true if the opening token is not followed by an exclamation point; otherwise the first and last lines are not written. It is important to check for the closing delimiter first, because the start token is a subset of it, and searching for the start token first would return a successful find for both cases. If we encounter the closing token, we verify that we are in the middle of processing a source file; otherwise, something is wrong with the way the delimiters are laid out in the text file. If inCode is true, all is well, and we (optionally) write the last line and close the file. When the opening token is found, we parse the directory and file name components and open the file. The following string-related functions were used in this example: length( ), append( ), getline( ), find( ) (two versions), find_first_not_of( ), substr( ), find_first_of( ), c_str( ), and, of course, operator<<( ).

We also use a standard C technique for reporting program status to the calling context by returning different values from main( ). It is portable to use the statement return 0; to indicate success, but there is no portable value to indicate failure. For this reason we use the macro declared for this very purpose in <cstdlib>: EXIT_FAILURE. As a matter of consistency, whenever we use EXIT_FAILURE we also use EXIT_SUCCESS, even though the latter is defined as zero on virtually every platform.
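The status-reporting convention can be isolated in a function of its own. The following sketch is an assumption-laden illustration rather than part of ExtractCode.cpp: checkArgCount( ) is a hypothetical name, and it accepts one or two arguments after the program name, mirroring the usage message of that program.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical check in the style of ExtractCode.cpp. The result
// is suitable for returning from main() or passing to exit().
int checkArgCount(int argc) {
  // argv[0] is the program name, so a file name alone gives
  // argc == 2 and a file name plus directory gives argc == 3.
  if(argc < 2 || argc > 3)
    return EXIT_FAILURE;
  return EXIT_SUCCESS;
}
```

A main( ) function could then begin with a test such as if(checkArgCount(argc) != EXIT_SUCCESS) return EXIT_FAILURE; so that the calling shell sees a portable failure status.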

Summary

C++ string objects provide developers with a number of great advantages over their C counterparts. For the most part, the string class makes referring to strings through the use of character pointers unnecessary. This eliminates an entire class of software defects that arise from the use of uninitialized and incorrectly valued pointers. C++ strings dynamically and transparently grow their internal data storage space to accommodate increases in the size of the string data. This means that when the data in a string grows beyond the limits of the memory initially allocated to it, the string object will make the memory management calls that take space from and return space to the heap. Consistent allocation schemes prevent memory leaks and have the potential to be much more efficient than “roll your own” memory management.
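You can watch this transparent growth by observing capacity( ) while a string gets longer. The sketch below is illustrative only: countReallocations( ) is a hypothetical helper, and the number it returns depends entirely on your library's growth policy, so treat the result as an observation rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append n characters one at a time and
// count how often capacity() changes, which indicates that the
// string acquired new storage behind the scenes.
std::size_t countReallocations(std::size_t n) {
  std::string s;
  std::size_t changes = 0;
  std::string::size_type cap = s.capacity();
  for(std::size_t i = 0; i < n; ++i) {
    s += 'x';
    if(s.capacity() != cap) {
      cap = s.capacity();
      ++changes;
    }
    // The string always has room for what it holds:
    assert(s.capacity() >= s.size());
  }
  return changes;
}
```

Calling countReallocations(10000) typically reports only a handful of changes, because implementations usually grow the buffer in large steps rather than one character at a time.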

The string class member functions provide a fairly comprehensive set of tools for creating, modifying, and searching in strings. String comparisons are always case sensitive, but you can work around this by copying string data to C-style null-terminated strings and using case-insensitive string comparison functions, temporarily converting the data held in string objects to a single case, or by creating a case-insensitive string class that overrides the character traits used to create the basic_string object.
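The second workaround, converting to a single case before comparing, might look like the sketch below. The helper names toLowerCopy( ) and equalIgnoreCase( ) are ours, and the approach assumes the default "C" locale and single-byte characters; it is not a general solution for international text.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: return a lowercase copy of s. The cast to
// unsigned char avoids undefined behavior in tolower() when char
// is signed and holds a negative value.
std::string toLowerCopy(std::string s) {
  for(std::string::size_type i = 0; i < s.size(); ++i)
    s[i] = static_cast<char>(
      std::tolower(static_cast<unsigned char>(s[i])));
  return s;
}

// Compare two strings without regard to case by comparing
// lowercase copies. The originals are left untouched.
bool equalIgnoreCase(const std::string& a,
                     const std::string& b) {
  return toLowerCopy(a) == toLowerCopy(b);
}
```

This costs two temporary copies per comparison, which is the price of leaving the string type itself unchanged; the traits-based istring avoids the copies at the cost of introducing a distinct type.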

Exercises

1. Write a program that reverses the order of the characters in a string.

2. A palindrome is a word or group of words that reads the same forward and backward. For example, “madam” or “wow.” Write a program that takes a string argument from the command line and prints whether the string is a palindrome.

3. Make your program from exercise 2 return true even if symmetric letters differ in case. For example, "Civic" would still return true although the first letter is capitalized.

4. Make your program from exercise 3 report true even if the string contains punctuation and spaces. For example, "Able was I, ere I saw Elba." would report true.

5. Using the following strings and only chars (no string literals or magic numbers):

string one("I walked down the canyon with the moving mountain bikers.");

string two("The bikers passed by me too close for comfort.");

string three("I went hiking instead.");

produce the following sentence:

"I moved down the canyon with the mountain bikers. The mountain bikers passed by me too close for comfort. So I went hiking instead."

6. Write a program named replace that takes three command-line arguments representing an input text file, a string to replace (call it from), and a replacement string (call it to). The program should write a new file to standard output with all occurrences of from replaced by to.

7. Repeat the previous exercise but replace all instances of from regardless of case.

 


4: Iostreams

You can do much more with the general I/O problem than just take standard I/O and turn it into a class.

Wouldn’t it be nice if you could make all the usual “receptacles” (standard I/O, files, and even blocks of memory) look the same so that you need to remember only one interface? That’s the idea behind iostreams. They’re much easier, safer, and sometimes even more efficient than the assorted functions from the Standard C stdio library.

The iostreams classes are usually the first part of the C++ library that new C++ programmers learn to use. This chapter discusses how iostreams are an improvement over C’s stdio facilities and explores the behavior of file and string streams in addition to the standard console streams.

Why iostreams?

You might wonder what’s wrong with the good old C library. Why not “wrap” the C library in a class and be done with it? Indeed, this is the perfect thing to do in some situations. For example, suppose you want to make sure the file represented by a stdio FILE pointer is always safely opened and properly closed, without having to rely on the user to remember to call the close( ) function. The following program is such an attempt.

//: C04:FileClass.h

// stdio files wrapped

#ifndef FILECLASS_H

#define FILECLASS_H

#include <cstdio>

#include <stdexcept>

 

class FileClass {

  std::FILE* f;

public:

  struct FileClassError : std::runtime_error {

  public:

    FileClassError(const char* msg)

      : std::runtime_error(msg) {}

  };

  FileClass(const char* fname, const char* mode = "r");

  ~FileClass();

  std::FILE* fp();

};

#endif // FILECLASS_H ///:~

 

When you perform file I/O in C, you work with a naked pointer to a FILE struct, but this class wraps around the pointer and guarantees it is properly initialized and cleaned up using the constructor and destructor. The second constructor argument is the file mode, which defaults to “r” for “read.”

To fetch the value of the pointer to use in the file I/O functions, you use the fp( ) access function. Here are the member function definitions:

//: C04:FileClass.cpp {O}

// FileClassImplementation

#include "FileClass.h"

#include <cstdlib>

#include <cstdio>

using namespace std;

 

FileClass::FileClass(const char* fname, const char* mode) {

  if((f = fopen(fname, mode)) == 0)

    throw FileClassError("Error opening file");

}

 

FileClass::~FileClass() { fclose(f); }

 

FILE* FileClass::fp() { return f; } ///:~

 

The constructor calls fopen( ), as you would normally do, but it also ensures that the result isn’t zero, which indicates a failure upon opening the file. If the file does not open as expected, an exception is thrown.

The destructor closes the file, and the access function fp( ) returns f. Here’s a simple example using class FileClass:

//: C04:FileClassTest.cpp

// Tests FileClass

//{L} FileClass

#include <cstdlib>

#include <iostream>

#include "FileClass.h"

using namespace std;

 

int main() {

  try {

    FileClass f("FileClassTest.cpp");

    const int BSIZE = 100;

    char buf[BSIZE];

    while(fgets(buf, BSIZE, f.fp()))

      fputs(buf, stdout);

  }

  catch(FileClass::FileClassError& e) {

    cout << e.what() << endl;

    return EXIT_FAILURE;

  }

  return EXIT_SUCCESS;

} // File automatically closed by destructor

///:~

 

You create the FileClass object and use it in normal C file I/O function calls by calling fp( ). When you’re done with it, just forget about it; the file is closed by the destructor at the end of its scope.

Even though the FILE pointer is private, it isn’t particularly safe because fp( ) retrieves it. Since the only effect seems to be guaranteed initialization and cleanup, why not make it public or use a struct instead? Notice that while you can get a copy of f using fp( ), you cannot assign to f; that’s completely under the control of the class. Of course, after capturing the pointer returned by fp( ), the client programmer can still assign to the structure elements or even close it, so the safety is in guaranteeing a valid FILE pointer rather than proper contents of the structure.

If you want complete safety, you must prevent the user from directly accessing the FILE pointer. Some version of all the normal file I/O functions must show up as class members so that everything you can do with the C approach is available in the C++ class:

//: C04:Fullwrap.h

// Completely hidden file IO

#ifndef FULLWRAP_H

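#define FULLWRAP_H

#include <cstddef>

#include <cstdio>

// In case the C library defines these as macros:

#undef getc

#undef putc

#undef ungetc

using std::size_t;

using std::fpos_t;
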

class File {

  std::FILE* f;

  std::FILE* F(); // Produces checked pointer to f

public:

  File(); // Create object but don't open file

  File(const char* path,

       const char* mode = "r");

  ~File();

  int open(const char* path,

           const char* mode = "r");

  int reopen(const char* path,

             const char* mode);

  int getc();

  int ungetc(int c);

  int putc(int c);

  int puts(const char* s);

  char* gets(char* s, int n);

  int printf(const char* format, ...);

  size_t read(void* ptr, size_t size,

              size_t n);

  size_t write(const void* ptr,

               size_t size, size_t n);

  int eof();

  int close();

  int flush();

  int seek(long offset, int whence);

  int getpos(fpos_t* pos);

  int setpos(const fpos_t* pos);

  long tell();

  void rewind();

  void setbuf(char* buf);

  int setvbuf(char* buf, int type, size_t sz);

  int error();

  void clearErr();

};

#endif // FULLWRAP_H ///:~

 

This class contains almost all the file I/O functions from <cstdio>. (vfprintf( ) is missing; it is used to implement the printf( ) member function.)

File has the same constructor as in the previous example, and it also has a default constructor. The default constructor is important if you want to create an array of File objects or use a File object as a member of another class in which the initialization doesn’t happen in the constructor, but some time after the enclosing object is created.

The default constructor sets the private FILE pointer f to zero. But now, before any reference to f, its value must be checked to ensure it isn’t zero. This is accomplished with F( ), which is private because it is intended to be used only by other member functions. (We don’t want to give the user direct access to the underlying FILE structure in this class.)[38]
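The checked-pointer idea can be sketched with a cut-down class. Everything here is an assumption made for illustration: CheckedFile, readChar( ), and isOpen( ) are hypothetical names, and throwing runtime_error from F( ) is just one plausible policy, not necessarily what the full implementation of File does.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>

// Hypothetical cut-down version of the File class.
class CheckedFile {
  std::FILE* f;
  // Produce f only after verifying that a file is open.
  std::FILE* F() {
    if(f == 0)
      throw std::runtime_error("no file is open");
    return f;
  }
  // Copying is disabled so two objects never close one FILE:
  CheckedFile(const CheckedFile&);
  CheckedFile& operator=(const CheckedFile&);
public:
  CheckedFile() : f(0) {}  // Create object but don't open file
  ~CheckedFile() { close(); }
  bool open(const char* path, const char* mode = "r") {
    close();  // Release any file already held
    f = std::fopen(path, mode);
    return f != 0;
  }
  int close() {
    if(f == 0) return 0;  // Nothing to do
    int result = std::fclose(f);
    f = 0;
    return result;
  }
  bool isOpen() const { return f != 0; }
  // Every I/O member goes through F(), never through f directly:
  int readChar() { return std::fgetc(F()); }
};
```

Because readChar( ) reaches the pointer only through F( ), calling it on a default-constructed object produces an exception instead of passing a null pointer to the C library.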

This approach is not a terrible solution by any means. It’s quite functional, and you could imagine making similar classes for standard (console) I/O and for in-core formatting (reading/writing a piece of memory rather than a file or the console).

The big stumbling block is the runtime interpreter used for the variable argument list functions. This is the code that parses your format string at runtime and grabs and interprets arguments from the variable argument list. It’s a problem for four reasons.

1. Even if you use only a fraction of the functionality of the interpreter, the whole thing gets loaded into your executable. So if you say printf("%c", 'x');, you’ll get the whole package, including the parts that print floating-point numbers and strings. There’s no standard option for reducing the amount of space used by the program.

2. Because the interpretation happens at runtime, you can’t get rid of a performance overhead. It’s frustrating because all the information is there in the format string at compile time, but it’s not evaluated until runtime. However, if you could parse the arguments in the format string at compile time, you could make direct function calls that have the potential to be much faster than a runtime interpreter (although the printf( ) family of functions is usually quite well optimized).

3. A worse problem is that the format string is not evaluated until runtime: there can be no compile-time error checking. You’re probably familiar with this problem if you’ve tried to find bugs that came from using the wrong number or type of arguments in a printf( ) statement. C++ makes a big deal out of compile-time error checking to find errors early and make your life easier. It seems a shame to throw type safety away for an I/O library, especially because I/O is used a lot.

4. For C++, the most crucial problem is that the printf( ) family of functions is not particularly extensible. They’re really designed to handle only the basic data types in C (char, int, float, double, wchar_t, char*, wchar_t*, and void*) and their variations. You might think that every time you add a new class, you could add overloaded printf( ) and scanf( ) functions (and their variants for files and strings), but remember, overloaded functions must have different types in their argument lists, and the printf( ) family hides its type information in the format string and in the variable argument list. For a language such as C++, whose goal is to be able to easily add new data types, this is an ungainly restriction.
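The extensibility problem in the last point is the one iostreams address most directly, because the type of each operand is visible to the compiler. As a sketch, assuming a user-defined type named Point and a helper named describe( ) (both invented here for illustration), adding output support for a new type is a matter of overloading operator<<( ):

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type.
struct Point {
  int x;
  int y;
};

// The compiler selects this overload from the operand types, so
// there is no format string to get out of step with the data.
std::ostream& operator<<(std::ostream& os, const Point& p) {
  return os << '(' << p.x << ", " << p.y << ')';
}

// Format a Point into a string using an in-memory stream.
std::string describe(const Point& p) {
  std::ostringstream out;
  out << "point " << p;
  return out.str();
}
```

For a Point holding 3 and -4, describe( ) produces the text point (3, -4). No printf( )-style conversion specifier exists that could do the same for a type the C library has never heard of.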

Iostreams to the rescue

All these issues make it clear that one of the first priorities for the standard class libraries for C++ should be to handle I/O. Because “hello, world” is the first program just about everyone writes in a new language, and because I/O is part of virtually every program, the I/O library in C++ must be particularly easy to use. It also has the much greater challenge that it must accommodate any new class. Thus, its constraints require that this foundation class library be a truly inspired design. In addition to gaining a great deal of leverage and clarity in your dealings with I/O and formatting, you’ll also see in this chapter how a really powerful C++ library can work.

Inserters and extractors

A stream is an object that transports and formats characters of a fixed width. You can have an input stream (via descendants of the istream class), an output stream (with ostream objects), or a stream that does both simultaneously (with objects derived from iostream). The iostreams library provides different types of such classes: ifstream, ofstream, and fstream for files, and istringstream, ostringstream, and stringstream for interfacing with the Standard C++ string class. All these stream classes have nearly identical interfaces, so you can use streams in a uniform manner, whether you’re working with a file, standard I/O, a region of memory, or a string object. The single interface you learn also works for extensions added to support new classes. Some functions implement your formatting commands, and some functions read and write characters without formatting. Comment

The stream classes mentioned earlier are actually template specializations,[39] much like the standard string class is a specialization of the basic_string template. The basic classes in the iostreams inheritance hierarchy are shown in the following figure. Comment

The ios_base class declares everything that is common to all streams, independent of the type of character the stream handles. These declarations are mostly constants and functions to manage them, some of which you’ll see throughout this chapter. The rest of the classes are templates that have the underlying character type as a parameter. The istream class, for example, is defined as follows: Comment

typedef basic_istream<char> istream;

 

All the classes mentioned earlier are defined via similar type definitions. There are also type definitions for all stream classes using wchar_t (the wide character type discussed in Chapter 3) instead of char. We’ll look at these at the end of this chapter. The basic_ios template defines functions that are common to both input and output but that depend on the underlying character type (we won’t use these much). The template basic_istream defines generic functions for input, and basic_ostream does the same for output. The classes for file and string streams introduced later add functionality for their specific stream types.

In the iostreams library, two operators are overloaded to simplify the use of iostreams. The operator << is often referred to as an inserter for iostreams, and the operator >> is often referred to as an extractor.

Extractors parse the information that’s expected by the destination object according to its type. To see an example of this, you can use the cin object, which is the iostream equivalent of stdin in C, that is, redirectable standard input. This object is predefined whenever you include the <iostream> header. Comment

  int i;

  cin >> i;

 

  float f;

  cin >> f;

 

  char c;

  cin >> c;

 

  char buf[100];

  cin >> buf;

 

There’s an overloaded operator >> for every built-in data type. You can also overload your own, as you’ll see later.

To find out what you have in the various variables, you can use the cout object (corresponding to standard output; there’s also a cerr object corresponding to standard error) with the inserter <<: Comment

  cout << "i = ";

  cout << i;

  cout << "\n";

  cout << "f = ";

  cout << f;

  cout << "\n";

  cout << "c = ";

  cout << c;

  cout << "\n";

  cout << "buf = ";

  cout << buf;

  cout << "\n";

 

This is notably tedious and doesn’t seem like much of an improvement over printf( ), despite improved type checking. Fortunately, the overloaded inserters and extractors are designed to be chained together into a more complicated expression that is much easier to write (and read): Comment

  cout << "i = " << i << endl;

  cout << "f = " << f << endl;

  cout << "c = " << c << endl;

  cout << "buf = " << buf << endl;

 

Defining inserters and extractors for your own classes is just a matter of overloading the associated operators to do the right things, namely:

·         Make the first parameter a non-const reference to the stream (istream for input, ostream for output)

·         Perform the operation by insert/extracting data to/from the stream (by processing the components of the object, of course)

·         Return a reference to the stream

The stream should be non-const because processing stream data changes the state of the stream. By returning the stream, you allow for chaining stream operations in a single statement, as shown earlier. Comment

As an example, consider how to output the representation of a Date object in MM-DD-YYYY format. The following inserter does the job:

ostream& operator<<(ostream& os, const Date& d) {

  char fillc = os.fill('0');

  os << setw(2) << d.getMonth() << '-'

     << setw(2) << d.getDay() << '-'

     << setw(4) << setfill(fillc) << d.getYear();

  return os;

}

 

This function cannot be a member of the Date class, of course, because the left operand of the << operator must be the output stream. The fill( ) member function of ostream changes the padding character used when the width of an output field, determined by the manipulator setw( ), is greater than needed for the data. We use a ‘0’ character so that months before October will display with a leading zero, such as “09” for September. The fill( ) function also returns the previous fill character (which defaults to a single space) so that we can restore it later with the manipulator setfill( ). We discuss manipulators in depth later in this chapter. Comment
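To see the effect of fill( ) and setw( ) in isolation, here is a minimal, self-contained sketch. It assumes a hypothetical SimpleDate struct (a stand-in, not the Date class used in this chapter) and a hypothetical formatDate( ) helper that routes the inserter through an ostringstream so the result can be examined as a string.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Date class discussed in the text.
struct SimpleDate {
  int month, day, year;
};

// Inserter: pads each field with '0', then restores the old fill character.
std::ostream& operator<<(std::ostream& os, const SimpleDate& d) {
  char fillc = os.fill('0');
  os << std::setw(2) << d.month << '-'
     << std::setw(2) << d.day << '-'
     << std::setw(4) << d.year;
  os.fill(fillc);
  return os;
}

// Illustrative helper: formats a date into a string via the inserter.
std::string formatDate(const SimpleDate& d) {
  std::ostringstream os;
  os << d;
  assert(os.good());
  return os.str();
}
```

With month 9, day 5, and year 2003, formatDate( ) produces "09-05-2003". This sketch restores the fill character with the fill( ) member function after all fields are written, which is one of several reasonable ways to do it.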

Extractors require a little more care because things sometimes go wrong with input data. The way to signal a stream error is to set the stream’s fail bit, as follows:

istream& operator>>(istream& is, Date& d) {

  is >> d.month;

  char dash = '\0'; // Defined even if an extraction fails

  is >> dash;

  if (dash != '-')

    is.setstate(ios::failbit);

  is >> d.day;

  is >> dash;

  if (dash != '-')

    is.setstate(ios::failbit);

  is >> d.year;

  return is;

}

 

When an error bit is set in a stream, all further stream operations are ignored until the stream is restored to a good state (explained shortly). That’s why the code above continues extracting even if ios::failbit gets set. This implementation is somewhat forgiving in that it allows white space between the numbers and dashes in a date string (because the >> operator skips white space by default when reading built-in types). The following are valid date strings for this extractor:

"08-10-2003"

"8-10-2003"

"08 - 10 - 2003"

 

but these are not:

"A-10-2003"    // No alpha characters allowed

"08%10/2003"   // Only dashes allowed as a delimiter

 

We’ll discuss stream state in more depth in the section “Handling stream errors” later in this chapter. Comment
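The valid and invalid strings listed above can be checked with a small sketch. It assumes a hypothetical SimpleDate struct with public members (not the chapter's Date class) and a hypothetical parseDate( ) helper that feeds a string through an istringstream and reports whether the extraction succeeded.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Date class discussed in the text.
struct SimpleDate {
  int month, day, year;
};

// Extractor in the style shown above; sets failbit on a bad delimiter.
std::istream& operator>>(std::istream& is, SimpleDate& d) {
  char dash1 = '\0', dash2 = '\0';
  is >> d.month >> dash1 >> d.day >> dash2 >> d.year;
  if (dash1 != '-' || dash2 != '-')
    is.setstate(std::ios::failbit);
  return is;
}

// Illustrative helper: true if the text parses as MM-DD-YYYY.
bool parseDate(const std::string& text, SimpleDate& d) {
  std::istringstream is(text);
  return !(is >> d).fail();
}
```

Because extraction becomes a no-op once failbit is set, the delimiter variables are initialized so that the comparison never reads an indeterminate value.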

Common usage

As the Date extractor illustrated, you must be on guard for erroneous input. If the input produces an unexpected value, the process is skewed, and it’s difficult to recover. In addition, formatted input defaults to white space delimiters. Consider what happens when we collect the code fragments from earlier in this chapter into a single program: Comment

//: C04:Iosexamp.cpp

// Iostream examples

#include <iostream>

using namespace std;

 

int main() {

  int i;

  cin >> i;

 

  float f;

  cin >> f;

 

  char c;

  cin >> c;

 

  char buf[100];

  cin >> buf;

 

  cout << "i = " << i << endl;

  cout << "f = " << f << endl;

  cout << "c = " << c << endl;

  cout << "buf = " << buf << endl;

 

  cout << flush;

  cout << hex << "0x" << i << endl;

} ///:~

 

and give it the following input:

12 1.4 c this is a test

 

We expect the same output as if we gave it:

12

1.4

c

this is a test

 

but the output is, somewhat unexpectedly:

i = 12

f = 1.4

c = c

buf = this

0xc

 

Notice that buf got only the first word because the input routine looked for a space to delimit the input, which it saw after “this.” In addition, if the continuous input string is longer than the storage allocated for buf, we overrun the buffer. Comment

In practice, you’ll usually want to get input from interactive programs a line at a time as a sequence of characters, scan them, and then perform conversions once they’re safely in a buffer. This way you don’t have to worry about the input routine choking on unexpected data. Comment
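One way to follow this advice is sketched below: read a complete line into a string with the global getline( ), then convert the fields from an istringstream. The Record struct and the parseRecord( ) and readRecord( ) functions are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record matching the values read in the example above.
struct Record {
  int i;
  float f;
  char c;
  std::string text;
};

// Parses one already-read line; returns false if any conversion fails.
bool parseRecord(const std::string& line, Record& r) {
  std::istringstream is(line);
  if (!(is >> r.i >> r.f >> r.c))
    return false;
  // Skip white space, then take the rest of the line, spaces included.
  std::getline(is >> std::ws, r.text);
  return true;
}

// Reads a whole line first, then converts it from the safety of a string.
bool readRecord(std::istream& in, Record& r) {
  std::string line;
  return std::getline(in, line) && parseRecord(line, r);
}
```

Given the input line "12 1.4 c this is a test", the text member receives the whole phrase "this is a test" rather than only its first word, and a malformed line leaves the original input stream in a usable state.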

Another thing to consider is the whole concept of a command-line interface. This made sense in the past when the console was little more than a glass typewriter, but the world is rapidly changing to one in which the graphical user interface (GUI) dominates. What is the meaning of console I/O in such a world? It makes much more sense to ignore cin altogether, other than for simple examples or tests, and take the following approaches: Comment

1.               If your program requires input, read that input from a file; you’ll soon see that it’s remarkably easy to use files with iostreams. Iostreams for files still work fine with a GUI.

2.               Read the input without attempting to convert it, as we just suggested. When the input is some place where it can’t foul things up during conversion, you can safely scan it. Comment

3.               Output is different. If you’re using a GUI, cout doesn’t necessarily work, and you must send it to a file (which is identical to sending it to cout) or use the GUI facilities for data display. Otherwise it often makes sense to send it to cout. In both cases, the output formatting functions of iostreams are highly useful. Comment

Another common practice saves compile time on large projects. Consider, for example, how you would declare the Date stream operators introduced earlier in the chapter in a header file. You only need to include the prototypes for the functions, so it’s not really necessary to include the entire <iostream> header in Date.h. The standard practice is to only declare classes, something like this: Comment

class ostream;

 

This is an age-old technique for separating interface from implementation and is often called a forward declaration (and ostream at this point would be considered an incomplete type, since the class definition has not yet been seen by the compiler). Comment

This will not work as is, however, for two reasons:

1.       The stream classes are defined in the std namespace.

2.      They are templates.

The proper declaration would be:

namespace std {

  template<class charT, class traits = char_traits<charT> >

    class basic_ostream;

  typedef basic_ostream<char> ostream;

}

 

(As you can see, like the string class, the streams classes use the character traits classes mentioned in Chapter 3). Since it would be terribly tedious to type all that for every stream class you want to reference, the standard provides a header that does it for you: <iosfwd>. The Date header would then look something like this: Comment

// Date.h

#include <iosfwd>

class Date {

  friend std::ostream& operator<<(std::ostream&,

                                  const Date&);

  friend std::istream& operator>>(std::istream&, Date&);

  // etc.

 

Line-oriented input

To grab input a line at a time, you have three choices:

1.       The member function get( )

2.       The member function getline( )

3.       The global function getline( ) defined in the <string> header

The first two functions take three arguments:

1.       A pointer to a character buffer in which to store the result

2.       The size of that buffer (so it’s not overrun)

3.       The terminating character, to know when to stop reading input

The terminating character has a default value of '\n', which is what you’ll usually use. Both functions store a zero in the result buffer when they encounter the terminating character in the input. Comment

So what’s the difference? Subtle, but important: get( ) stops when it sees the delimiter in the input stream, but it doesn’t extract it from the input stream. Thus, if you did another get( ) using the same delimiter, it would immediately return with no fetched input. (Presumably, you either use a different delimiter in the next get( ) statement or a different input function.) The getline( ) function, on the other hand, extracts the delimiter from the input stream, but still doesn’t store it in the result buffer. Comment

The getline( ) function defined in <string> is convenient. It is not a member function, but rather a stand-alone function declared in the namespace std. It takes only two non-default arguments, the input stream and the string object to populate. Like its namesake, it reads characters until it encounters the first occurrence of the delimiter ('\n' by default) and consumes and discards the delimiter. The advantage of this function is that it reads into a string object, so you don’t have to worry about buffer size. Comment

Generally, when you’re processing a text file that you read a line at a time, you’ll want to use one of the getline( ) functions.
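The difference in delimiter handling is easy to observe on a string stream. In the following sketch the three function names are hypothetical; each one reads the first line of its argument with a different function and, for the two member functions, reports which character is left waiting in the stream.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the next unread character after a get() of the first line.
char nextAfterGet(const std::string& text) {
  std::istringstream in(text);
  char buf[64];
  in.get(buf, sizeof buf);       // Stops at '\n' but leaves it in the stream
  return static_cast<char>(in.peek());
}

// Returns the next unread character after a member getline().
char nextAfterGetline(const std::string& text) {
  std::istringstream in(text);
  char buf[64];
  in.getline(buf, sizeof buf);   // Extracts '\n' and discards it
  return static_cast<char>(in.peek());
}

// Reads the first line with the global getline(); no buffer size to manage.
std::string firstLine(const std::string& text) {
  std::istringstream in(text);
  std::string line;
  std::getline(in, line);
  return line;
}
```

For the text "one\ntwo\n", nextAfterGet( ) returns the newline itself, whereas nextAfterGetline( ) returns 't', the first character of the second line.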

Overloaded versions of get( )

The get( ) function also comes in three other overloaded versions: one with no arguments that returns the next character, using an int return value; one that stuffs a character into its char argument, using a reference; and one that stores directly into the underlying buffer structure of another iostream object. The last of these is explored later in the chapter.

Reading raw bytes

If you know exactly what you’re dealing with and want to move the bytes directly into a variable, an array, or a structure in memory, you can use the unformatted I/O function read( ). The first argument is a pointer to the destination memory, and the second is the number of bytes to read. This is especially useful if you’ve previously stored the information to a file, for example, in binary form using the complementary write( ) member function for an output stream (using the same compiler, of course). You’ll see examples of all these functions later. Comment
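As a rough sketch of the read( )/write( ) pairing, the following round-trips a small structure through a stream as raw bytes. The Sample struct and roundTrip( ) function are hypothetical, and an in-memory stringstream stands in for a file purely to keep the example self-contained.

```cpp
#include <cassert>
#include <sstream>

// Hypothetical fixed-layout record with no pointers or virtual functions.
struct Sample {
  int id;
  double value;
};

// Writes the raw bytes of a Sample to a stream and reads them back.
Sample roundTrip(const Sample& original) {
  std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
  buffer.write(reinterpret_cast<const char*>(&original), sizeof original);
  Sample copy = {0, 0.0};
  buffer.read(reinterpret_cast<char*>(&copy), sizeof copy);
  // gcount() reports how many bytes the last unformatted read delivered.
  assert(buffer.gcount() == static_cast<std::streamsize>(sizeof copy));
  return copy;
}
```

This approach is only appropriate for simple types whose in-memory layout is the same for the writer and the reader, which is why the text stresses using the same compiler.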

Handling stream errors

The Date extractor shown earlier sets a stream’s fail bit under certain conditions. How does the user know when such a failure occurs? You can detect stream errors by either calling certain stream member functions to see if an error state has occurred, or if you don’t care what the particular error was, you can just evaluate the stream in a Boolean context. Both techniques derive from the state of a stream’s error bits. Comment

Stream state

The ios_base class, from which ios derives,[40] defines four flags that you can use to test the state of a stream:


 

Flag       Meaning
badbit     Some fatal (perhaps physical) error occurred. The stream should be considered unusable.
eofbit     End-of-input has occurred (either by encountering the physical end of a file stream or by the user terminating a console stream, such as with Ctrl-Z or Ctrl-D).
failbit    An I/O operation failed, most likely because of invalid data (e.g., letters were found when trying to read a number). The stream is still usable. The failbit flag is also set when end-of-input occurs.
goodbit    All is well; no errors. End-of-input has not yet occurred.

 

You can test whether any of these conditions have occurred by calling corresponding member functions that return a Boolean value indicating whether any of these have been set. The good( ) stream member function returns true if none of the other three bits are set. The eof( ) function returns true if eofbit is set, which happens with an attempt to read from a stream that has no more data (usually a file). Because end-of-input happens in C++ when trying to read past the end of the physical medium, failbit is also set to indicate that the “expected” data was not successfully read. The fail( ) function returns true if either failbit or badbit is set, and bad( ) returns true only if the badbit is set. Comment

Once any of the error bits in a stream’s state are set, they remain set, which is not always what you want. When reading a file for example, you might want to reposition to an earlier place in the file before end-of-file occurred. Just moving the file pointer doesn’t automatically reset eofbit or failbit; you have to do it yourself with the clear( ) function, like this: Comment

myStream.clear();    // Clears all error bits

 

After calling clear( ), good( ) will return true if called immediately. As you saw in the Date extractor earlier, the setstate( ) function sets the bits you pass it. It turns out that setstate( ) doesn’t affect any other bits—if they’re already set, they stay set. If you want to set certain bits but at the same time reset all the rest, you can call an overloaded version of clear( ), passing it a bitwise expression representing the bits you want to set, as in: Comment

myStream.clear(ios::failbit | ios::eofbit);
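The following sketch shows the usual recovery sequence after a failed conversion: test the state, call clear( ), and then continue reading. The function name wordAfterFailedInt( ) is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Tries to read an int, then recovers and reads the next token as text.
std::string wordAfterFailedInt(const std::string& text) {
  std::istringstream in(text);
  int n = 0;
  in >> n;               // Fails if the text does not start with a number
  if (in.fail() && !in.bad()) {
    in.clear();          // Reset all error bits
    assert(in.good());   // The stream is usable again
  }
  std::string word;
  in >> word;
  return word;
}
```

Given "hello 42", the integer extraction fails, and after clear( ) the same characters can be read as the word "hello". Without the call to clear( ), the second extraction would be ignored and word would stay empty.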

 

Most of the time you won’t be interested in checking the stream state bits individually. Usually you just want to know if everything is okay. This is the case when you read a file from beginning to end; you just want to know when the input data is exhausted. In cases such as these, a conversion operator is defined for void* that is automatically called when a stream occurs in a Boolean expression. To read a stream until end-of-input using this idiom looks like the following: Comment

int i;

while (myStream >> i)

  cout << i << endl;

 

Remember that operator>>( ) returns its stream argument, so the while statement above tests the stream as a Boolean expression. This particular example assumes that the input stream myStream contains integers separated by white space. The conversion function basic_ios::operator void*( ) returns a null pointer if fail( ) is true for its stream and a non-null pointer otherwise,[41] so the loop ends on the first extraction that fails, not merely because eofbit has been set. Because most stream operations return their stream, using this idiom is convenient.
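Here is the idiom wrapped in a small function so that it can be exercised with a string stream. The name sumInts( ) is a hypothetical choice for this sketch.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Sums white-space-separated integers until an extraction fails.
long sumInts(std::istream& in) {
  long total = 0;
  int i;
  while (in >> i)   // The stream itself is the loop condition
    total += i;
  return total;
}
```

For the input "1 2 3" the result is 6: the last value is still counted even though reading it reaches end-of-input, because that extraction succeeds. For "4 5 x 6" the loop stops at the non-numeric token and returns 9.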

Streams and exceptions

Iostreams existed as part of C++ long before there were exceptions, so checking stream state manually was just the way things were done. For backward compatibility, this is still the status quo, but iostreams can throw exceptions instead. The exceptions( ) stream member function takes a parameter representing the state bits for which you want exceptions to be thrown. Whenever the stream encounters such a state, it throws an exception of type std::ios_base::failure, which inherits from std::exception. Comment

Although you can trigger a failure exception for any of the three error states (badbit, eofbit, and failbit), it’s not necessarily a good idea to enable exceptions for all of them. As Chapter 1 explains, use exceptions for truly exceptional conditions, but end-of-file is not only not exceptional; it’s expected! For that reason, you might want to enable exceptions only for the errors represented by badbit, which you would do like this:

myStream.exceptions(ios::badbit);

 

You enable exceptions on a stream-by-stream basis, since exceptions( ) is a member function for streams. Called with no arguments, the exceptions( ) function returns a bitmask[42] (of type iostate, which is some compiler-dependent type convertible to int) indicating which stream states will cause exceptions. When you call it with a bitmask and any of those states have already been set, an exception is thrown immediately. Of course, if you use exceptions in connection with streams, you had better be ready to catch them, which means that you need to wrap all stream processing with a try block that has an ios::failure handler. Many programmers find this tedious and just check states manually where they expect errors to occur (since, for example, they don’t expect bad( ) to return true most of the time anyway). This is another reason that having streams throw exceptions is optional and not the default. In any case, you can choose how you want to handle stream errors.
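A minimal sketch of a stream that throws follows. It enables exceptions for failbit as well as badbit, which goes beyond the badbit-only recommendation above; that choice is made here only because a failed conversion is easy to provoke in a short example. The function name throwsOnBadInt( ) is hypothetical.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Returns true if reading an int from the text throws ios_base::failure.
bool throwsOnBadInt(const std::string& text) {
  std::istringstream in(text);
  in.exceptions(std::ios::failbit | std::ios::badbit);
  try {
    int n = 0;
    in >> n;        // Throws if the conversion sets failbit
    return false;
  } catch (const std::ios_base::failure&) {
    return true;
  }
}
```

Reading "42" does not throw even though the extraction reaches end-of-input, because eofbit is not part of the exception mask. Reading "hello" does throw.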

File iostreams

Manipulating files with iostreams is much easier and safer than using stdio in C. All you do to open a file is create an object; the constructor does the work. You don’t have to explicitly close a file (although you can, using the close( ) member function) because the destructor will close it when the object goes out of scope. To create a file that defaults to input, make an ifstream object. To create one that defaults to output, make an ofstream object. An fstream object can do both input and output. Comment

The file stream classes fit into the iostreams classes as shown in the following figure.

 

 

 

As before, the classes you actually use are template specializations defined by type definitions. For example, ifstream, which processes files of char, is defined as

typedef basic_ifstream<char> ifstream;

 

A file-processing example

Here’s an example that shows many of the features discussed so far. Notice the inclusion of <fstream> to declare the file I/O classes. Although on many platforms this will also include <iostream> automatically, compilers are not required to do so. If you want portable code, always include both headers. Comment

//: C04:Strfile.cpp

// Stream I/O with files

// The difference between get() & getline()

#include <fstream>

#include <iostream>

#include "../require.h"

using namespace std;

 

int main() {

  const int sz = 100; // Buffer size

  char buf[sz];

  {

    ifstream in("Strfile.cpp"); // Read

    assure(in, "Strfile.cpp"); // Verify open

    ofstream out("Strfile.out"); // Write

    assure(out, "Strfile.out");

    int i = 1; // Line counter

 

    // A less-convenient approach for line input:

    while(in.get(buf, sz)) { // Leaves \n in input

      in.get(); // Throw away next character (\n)

      cout << buf << endl; // Must add \n

      // File output just like standard I/O:

      out << i++ << ": " << buf << endl;

    }

  } // Destructors close in & out

 

  ifstream in("Strfile.out");

  assure(in, "Strfile.out");

  // More convenient line input:

  while(in.getline(buf, sz)) { // Removes \n

    char* cp = buf;

    while(*cp != ':')

      cp++;

    cp += 2; // Past ": "

    cout << cp << endl; // Must still add \n

  }

} ///:~

 

The creation of each of the ifstream and ofstream objects is followed by a call to assure( ) to guarantee the file was successfully opened. Here again the object, used in a situation in which the compiler expects a Boolean result, produces a value that indicates success or failure.

The first while loop demonstrates the use of two forms of the get( ) function. The first gets characters into a buffer and puts a zero terminator in the buffer when either sz-1 characters have been read or the third argument (defaulted to '\n') is encountered. The get( ) function leaves the terminator character in the input stream, so this terminator must be thrown away via in.get( ) using the form of get( ) with no argument, which fetches a single byte and returns it as an int. You can also use the ignore( ) member function, which has two default arguments. The first argument is the number of characters to throw away and defaults to one. The second argument is the character at which the ignore( ) function quits (after extracting it) and defaults to EOF. Be aware that the buffer form of get( ) sets failbit when it stores no characters, so a loop written this way stops at the first empty line in its input.

Next, you see two output statements that look similar: one to cout and one to the file out. Notice the convenience here; you don’t need to worry about what kind of object you’re dealing with because the formatting statements work the same with all ostream objects. The first one echoes the line to standard output, and the second writes the line out to the new file and includes a line number. Comment

To demonstrate getline( ), open the file we just created and strip off the line numbers. To ensure the file is properly closed before opening it to read, you have two choices. You can surround the first part of the program with braces to force the out object out of scope, thus calling the destructor and closing the file, which is done here. You can also call close( ) for both files; if you do this, you can even reuse the in object by calling the open( ) member function. Comment

The second while loop shows how getline( ) removes the terminator character (its third argument, which defaults to '\n') from the input stream when it’s encountered. Although getline( ), like get( ), puts a zero in the buffer, it still doesn’t insert the terminating character. Comment

This example, as well as most of the examples in this chapter, reads input in which every line ends with a newline character. If the last line lacks one, getline( ) still stores the characters it read and sets eofbit but not failbit, so the loop processes that final line; the next call extracts nothing, sets failbit, and ends the loop.

Open modes

You can control the way a file is opened by overriding the constructor’s default arguments. The following table shows the flags that control the mode of the file: Comment

Flag           Function
ios::in        Opens an input file. Use this as an open mode for an ofstream to prevent truncating an existing file.
ios::out       Opens an output file. When used for an ofstream without ios::app or ios::in, ios::trunc is implied.
ios::app       Opens an output file for appending only.
ios::ate       Opens an existing file (either input or output) and seeks to the end.
ios::trunc     Truncates the old file, if it already exists.
ios::binary    Opens a file in binary mode. The default is text mode.

 

You can combine these flags using a bitwise or operation.

The binary flag, while portable, only has an effect on some non-UNIX systems, such as operating systems derived from MS-DOS, that have special conventions for storing end-of-line delimiters. For example, on MS-DOS systems in text mode (which is the default), every time you output a newline character ('\n'), the file system actually outputs two characters, a carriage-return/linefeed pair (CRLF), which is the pair of ASCII characters 0x0D and 0x0A. Conversely, when you read such a file back into memory in text mode, each occurrence of this pair of bytes causes a '\n' to be sent to the program in its place. If you want to bypass this special processing, you open files in binary mode. Binary mode has nothing whatsoever to do with whether you can write raw bytes to a file; you always can (by calling write( )). You should, however, open a file in binary mode when you’ll be using read( ) or write( ), because these functions take a byte count parameter. Having the extra '\r' characters will throw your byte count off in those instances. You should also open a file in binary mode if you’re going to use the stream-positioning commands discussed later in this chapter.

You can open a file for both input and output by declaring an fstream object. When declaring an fstream object, you must use enough of the open mode flags mentioned earlier to let the file system know whether you want to input, output, or both. To switch from output to input, you need to either flush the stream or change the file position. To change from input to output, change the file position. To create a file via an fstream object, you need to use the ios::trunc open mode flag in the constructor call if you will actually do both input and output. Comment
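The effect of the open mode on an existing file can be sketched as follows. The function names and the idea of counting lines to observe the result are assumptions made for this illustration; the path passed in should name a scratch file you don't mind overwriting.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Appends one line to the named file, creating it if necessary.
bool appendLine(const std::string& path, const std::string& text) {
  std::ofstream out(path.c_str(), std::ios::out | std::ios::app);
  if (!out)
    return false;
  out << text << '\n';
  return out.good();
}

// Replaces the file's contents: the default ofstream mode truncates.
bool overwriteWith(const std::string& path, const std::string& text) {
  std::ofstream out(path.c_str());
  if (!out)
    return false;
  out << text << '\n';
  return out.good();
}

// Counts the lines in the named file; returns -1 if it cannot be opened.
int countLines(const std::string& path) {
  std::ifstream in(path.c_str());
  if (!in)
    return -1;
  int count = 0;
  std::string line;
  while (std::getline(in, line))
    ++count;
  return count;
}
```

Calling appendLine( ) twice on a fresh file leaves two lines in it, whereas a subsequent overwriteWith( ) leaves exactly one, because the earlier contents are discarded when the file is opened.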

Iostream buffering

Good design practice dictates that whenever you create a new class, you should endeavor to hide the details of the underlying implementation as much as possible from the user of the class. You show them only what they need to know and make the rest private to avoid confusion. When using inserters and extractors, you normally don’t know or care where the bytes are being produced or consumed, whether you’re dealing with standard I/O, files, memory, or some newly created class or device.

A time comes, however, when it is important to communicate with the part of the iostream that produces and consumes bytes. To provide this part with a common interface and still hide its underlying implementation, the standard library abstracts it into its own class, called streambuf. Each iostream object contains a pointer to some kind of streambuf. (The kind depends on whether it deals with standard I/O, files, memory, and so on.) You can access the streambuf directly; for example, you can move raw bytes into and out of the streambuf, without formatting them through the enclosing iostream. This is accomplished by calling member functions for the streambuf object. Comment

Currently, the most important thing for you to know is that every iostream object contains a pointer to a streambuf object, and the streambuf object has some member functions you can call if necessary. For file and string streams, there are specialized types of stream buffers, as the following figure illustrates. Comment

To allow you to access the streambuf, every iostream object has a member function called rdbuf( ) that returns the pointer to the object’s streambuf. This way you can call any member function for the underlying streambuf. However, one of the most interesting things you can do with the streambuf pointer is to connect it to another iostream object using the << operator. This drains all the characters from your object into the one on the left side of the <<. If you want to move all the characters from one iostream to another, you don’t have to go through the tedium (and potential coding errors) of reading them one character or one line at a time. It’s a much more elegant approach. Comment

For example, here’s a simple program that opens a file and sends the contents to standard output (similar to the previous example):

//: C04:Stype.cpp

// Type a file to standard output

#include <fstream>

#include <iostream>

#include "../require.h"

using namespace std;

 

int main() {

  ifstream in("Stype.cpp");

  assure(in, "Stype.cpp");

  cout << in.rdbuf(); // Outputs entire file

} ///:~

 

An ifstream is created using the source code file for this program as an argument. The assure( ) function reports a failure if the file cannot be opened. All the work really happens in the statement: Comment

cout << in.rdbuf();

 

which sends the entire contents of the file to cout. This is not only more succinct to code, it is often more efficient than moving the bytes one at a time.
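The same expression works for any pair of streams, not only a file and cout. The sketch below wraps it in a hypothetical copyStream( ) function that can be tried with string streams.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Copies everything remaining in 'in' to 'out' in a single expression.
void copyStream(std::istream& in, std::ostream& out) {
  out << in.rdbuf();
}
```

Blank lines and other white space are copied unchanged, because no formatting or parsing takes place. One detail to keep in mind is that inserting a stream buffer that yields no characters at all sets failbit on the output stream.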

A form of get( ) allows you to write directly into the streambuf of another object. The first argument is a reference to the destination streambuf, and the second is the terminating character ('\n' by default), which stops the get( ) function. So there is yet another way to print a file to standard output:

//: C04:Sbufget.cpp

// Copies a file to standard output

#include <fstream>

#include <iostream>

#include "../require.h"

using namespace std;

 

int main() {

  ifstream in("Sbufget.cpp");

  assure(in, "Sbufget.cpp");

  streambuf& sb = *cout.rdbuf();

  while (!in.get(sb).eof()) {

    if (in.fail())          // Found blank line

      in.clear();

    cout << char(in.get()); // Process '\n'

  }

} ///:~

 

The rdbuf( ) function returns a pointer, so it must be dereferenced to satisfy the function’s need to see an object. Stream buffers are not meant to be copied (they have no copy constructor), so we define sb as a reference to cout’s stream buffer. We need the calls to fail( ) and clear( ) in case the input file has a blank line (this one does). When this particular overloaded version of get( ) sees two newlines in a row (evidence of a blank line), it sets the input stream’s fail bit, so we must call clear( ) to reset it so that the stream can continue to be read. The second call to get( ) extracts and echoes each newline delimiter. (Remember, the get( ) function doesn’t extract its delimiter like getline( ) does.) Comment

You probably won’t need to use a technique like this often, but it’s nice to know it exists.[43]

Seeking in iostreams

Each type of iostream has a concept of where its “next” character will come from (if it’s an istream) or go (if it’s an ostream). In some situations, you might want to move this stream position. You can do so using two models: one uses an absolute location in the stream called the streampos; the second works like the Standard C library function fseek( ) for a file and moves a given number of bytes from the beginning, end, or current position in the file.

The streampos approach requires that you first call a “tell” function: tellp( ) for an ostream or tellg( ) for an istream. (The “p” refers to the “put pointer,” and the “g” refers to the “get pointer.”) This function returns a streampos you can later use in calls to seekp( ) for an ostream or seekg( ) for an istream, when you want to return to that position in the stream.
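As a small illustration of the streampos approach, here is a sketch that remembers a position with tellg( ) and returns to it with seekg( ). The function name rereadSecondWord( ) is our own invention for this example, and it uses an istringstream only so that it needs no file.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads the second word, seeks back to a
// saved position, and reads the same word again.
std::string rereadSecondWord(const std::string& text) {
  std::istringstream in(text);
  std::string word;
  in >> word;                        // Skip the first word
  std::streampos mark = in.tellg();  // Remember where we are
  in >> word;                        // Read the second word
  std::string first = word;
  in.seekg(mark);                    // Return to the saved position
  in >> word;                        // Read the second word again
  return first + "|" + word;
}
```

For the input "alpha beta gamma", both reads see the same word, so the result is "beta|beta".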

The second approach is a relative seek and uses overloaded versions of seekp( ) and seekg( ). The first argument is the number of characters to move: it can be positive or negative. The second argument is the seek direction:

ios::beg    From beginning of stream
ios::cur    Current position in stream
ios::end    From end of stream

 

Here’s an example that shows the movement through a file, but remember, you’re not limited to seeking within files, as you are with C and cstdio. With C++, you can seek in any type of iostream (although the standard stream objects, such as cin and cout, explicitly disallow it):

//: C04:Seeking.cpp

// Seeking in iostreams

#include <cassert>

#include <cstddef>

#include <cstring>

#include <fstream>

#include "../require.h"

using namespace std;

 

int main() {

  const int STR_NUM = 5, STR_LEN = 30;

  char origData[STR_NUM][STR_LEN] = {

    "Hickory dickory dus. . .",

    "Are you tired of C++?",

    "Well, if you have,",

    "That's just too bad,",

    "There's plenty more for us!"

  };

  char readData[STR_NUM][STR_LEN] = { 0 };

  ofstream out("Poem.bin", ios::out | ios::binary);

  assure(out, "Poem.bin");

  for(size_t i = 0; i < STR_NUM; i++)

    out.write(origData[i], STR_LEN);

  out.close();

  ifstream in("Poem.bin", ios::in | ios::binary);

  assure(in, "Poem.bin");

  in.read(readData[0], STR_LEN);

  assert(strcmp(readData[0], "Hickory dickory dus. . .")

    == 0);

  // Seek -STR_LEN bytes from the end of file

  in.seekg(-STR_LEN, ios::end);

  in.read(readData[1], STR_LEN);

  assert(strcmp(readData[1], "There's plenty more for us!")

    == 0);

  // Absolute seek (like using operator[] with a file)

  in.seekg(3 * STR_LEN);

  in.read(readData[2], STR_LEN);

  assert(strcmp(readData[2], "That's just too bad,") == 0);

  // Seek backwards from current position

  in.seekg(-STR_LEN * 2, ios::cur);

  in.read(readData[3], STR_LEN);

  assert(strcmp(readData[3], "Well, if you have,") == 0);

  // Seek from the beginning of the file

  in.seekg(1 * STR_LEN, ios::beg);

  in.read(readData[4], STR_LEN);

  assert(strcmp(readData[4], "Are you tired of C++?")

    == 0);

} ///:~

 

This program writes a (very clever?) poem to a file using a binary output stream. Since we reopen it as an ifstream, we use seekg( ) to position the “get pointer.” As you can see, you can seek from the beginning or end of the file or from the current file position. Obviously, you must provide a positive number to move from the beginning of the file and a negative number to move back from the end.

Now that you know about the streambuf and how to seek, you can understand an alternative method (besides using an fstream object) for creating a stream object that will both read and write a file. The following code first creates an ifstream with flags that say it’s both an input and an output file. You can’t write to an ifstream, of course, so you need to create an ostream with the underlying stream buffer:

ifstream in("filename", ios::in | ios::out);

ostream out(in.rdbuf());

 

You might wonder what happens when you write to one of these objects. Here’s an example:

//: C04:Iofile.cpp

// Reading & writing one file

#include <fstream>

#include <iostream>

#include "../require.h"

using namespace std;

 

int main() {

  ifstream in("Iofile.cpp");

  assure(in, "Iofile.cpp");

  ofstream out("Iofile.out");

  assure(out, "Iofile.out");

  out << in.rdbuf(); // Copy file

  in.close();

  out.close();

  // Open for reading and writing:

  ifstream in2("Iofile.out", ios::in | ios::out);

  assure(in2, "Iofile.out");

  ostream out2(in2.rdbuf());

  cout << in2.rdbuf();  // Print whole file

  out2 << "Where does this end up?";

  out2.seekp(0, ios::beg);

  out2 << "And what about this?";

  in2.seekg(0, ios::beg);

  cout << in2.rdbuf();

} ///:~

 

The first five lines copy the source code for this program into a file called Iofile.out and then close the files. This gives us a safe text file to play with. Then the aforementioned technique is used to create two objects that read and write to the same file. In cout << in2.rdbuf( ), you can see the “get” pointer is initialized to the beginning of the file. The “put” pointer, however, is set to the end of the file because “Where does this end up?” appears appended to the file. However, if the put pointer is moved to the beginning with a seekp( ), all the inserted text overwrites the existing text. Both writes are seen when the get pointer is moved back to the beginning with a seekg( ), and the file is displayed. Of course, the file is automatically saved and closed when in2, which owns the underlying file buffer, goes out of scope and its destructor is called.
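The idea of two stream objects sharing one stream buffer can also be sketched in memory. The following is an illustration under our own assumptions, not the book’s code: overwriteThenRead( ) is a hypothetical helper that attaches an istream and an ostream to a single stringbuf. Keep in mind that a stringbuf maintains separate get and put positions, whereas a file buffer maintains one position shared by reading and writing, so the two kinds of buffer do not behave identically.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: an istream and an ostream share one stringbuf.
std::string overwriteThenRead(const std::string& initial,
                              const std::string& patch) {
  std::stringbuf buf(initial, std::ios::in | std::ios::out);
  std::istream in(&buf);   // Reads from the shared buffer
  std::ostream out(&buf);  // Writes to the same buffer
  out << patch;            // Put position starts at the beginning
  std::string word;
  in >> word;              // Get position also starts at the beginning
  return word;
}
```

With an initial string of "hello" and a patch of "J", the write overwrites the first character and the read returns "Jello".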

String iostreams

A string stream works directly with memory instead of a file or standard output. It allows you to use the same reading and formatting functions that you use with cin and cout to manipulate bytes in memory. On old computers, the memory was referred to as core, so this type of functionality is often called in-core formatting.

The class names for string streams echo those for file streams. If you want to create a string stream to extract characters from, you create an istringstream. If you want to put characters into a string stream, you create an ostringstream. All declarations for string streams are in the standard header <sstream>. As usual, there are class templates that fit into the iostreams hierarchy, as shown in the following figure:

Input string streams

To read from a string using stream operations, you create an istringstream object initialized with the string. The following program shows how to use an istringstream object.

//: C04:Istring.cpp

// Input string streams

#include <cassert>

#include <cmath>  // For fabs()

#include <iostream>

#include <limits> // For epsilon()

#include <sstream>

#include <string>

using namespace std;

 

int main() {

  istringstream s("47 1.414 This is a test");

  int i;

  double f;

  s >> i >> f; // Whitespace-delimited input

  assert(i == 47);

  double relerr = fabs(f - 1.414) / 1.414;

  assert(relerr <= numeric_limits<double>::epsilon());

  string buf2;

  s >> buf2;

  assert(buf2 == "This");

  cout << s.rdbuf(); // " is a test"

} ///:~

 

You can see that this is a more flexible and general approach to transforming character strings to typed values than the standard C library functions such as atof( ) and atoi( ), even though the latter may be more efficient for single conversions.

In the expression s >> i >> f, the first number is extracted into i, and the second into f. This isn’t “the first whitespace-delimited set of characters” because it depends on the data type it’s being extracted into. For example, if the string were instead, “1.414 47 This is a test,” then i would get the value 1 because the input routine would stop at the decimal point. Then f would get 0.414. This could be useful if you want to break a floating-point number into a whole number and a fraction part. Otherwise it would seem to be an error. The second assert( ) calculates the relative error between what we read and what we expected; it’s always better to do this than to compare floating-point numbers for equality. The constant returned by epsilon( ), defined in <limits>, represents the machine epsilon for double-precision numbers, which is the best tolerance you can expect comparisons of doubles to satisfy.[44]

As you may already have guessed, buf2 doesn’t get the rest of the string, just the next white-space-delimited word. In general, it’s best to use the extractor in iostreams when you know the exact sequence of data in the input stream and you’re converting to some type other than a character string. However, if you want to extract the rest of the string all at once and send it to another iostream, you can use rdbuf( ) as shown.

To test the Date extractor at the beginning of this chapter, we used an input string stream with the following test program:

//: C04:DateIOTest.cpp

//{L} ../C02/Date

#include <iostream>

#include <sstream>

#include <string>

#include "../C02/Date.h"

using namespace std;

 

void testDate(const string& s) {

  istringstream os(s);

  Date d;

  os >> d;

  if (os)

    cout << d << endl;

  else

    cout << "input error with \"" << s << "\"\n";

}

 

int main() {

  testDate("08-10-2003");

  testDate("8-10-2003");

  testDate("08 - 10 - 2003");

  testDate("A-10-2003");

  testDate("08%10/2003");

} ///:~

 

Each string literal in main( ) is converted to a temporary string object, which is passed by reference to testDate( ). That function wraps it in an istringstream so we can test the stream extractor we wrote for Date objects. The function testDate( ) also begins to test the inserter, operator<<( ).

Output string streams

To create an output string stream to put data into, you just create an ostringstream object, which manages a dynamically sized character buffer to hold whatever you insert. To get the formatted result as a string object, you call the str( ) member function. Here’s an example:

//: C04:Ostring.cpp

// Illustrates ostringstream

#include <iostream>

#include <sstream>

#include <string>

using namespace std;

 

int main() {

  cout << "type an int, a float and a string: ";

  int i;

  float f;

  cin >> i >> f;

  cin >> ws; // Throw away white space

  string stuff;

  getline(cin, stuff); // Get rest of the line

  ostringstream os;

  os << "integer = " << i << endl;

  os << "float = " << f << endl;

  os << "string = " << stuff << endl;

  string result = os.str();

  cout << result << endl;

} ///:~

 

This is similar to the Istring.cpp example earlier that fetched an int and a float. A sample execution follows (the keyboard input is in bold type).

type an int, a float and a string: 10 20.5 the end

integer = 10

float = 20.5

string = the end

 

You can see that, like the other output streams, you can use the ordinary formatting tools, such as the << operator and endl, to send bytes to the ostringstream. The str( ) function returns a new string object every time you call it so the underlying stringbuf object owned by the string stream is left undisturbed.
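A common use of an ostringstream is to convert a value to a string. The following sketch shows one way to do it; toString( ) and strReturnsCopy( ) are hypothetical names introduced here, not functions from the standard library or from the book’s code.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats any insertable value as a string.
template<typename T>
std::string toString(const T& value) {
  std::ostringstream os;
  os << value;
  return os.str();  // A copy of the buffer's contents
}

// Shows that changing the string returned by str() does not
// affect the stream's own buffer.
bool strReturnsCopy() {
  std::ostringstream os;
  os << "abc";
  std::string copy = os.str();
  copy += "xyz";             // Modifies only our copy
  return os.str() == "abc";  // The stream still holds "abc"
}
```

Because toString( ) relies only on operator<<, it should work for any type that has an inserter, including your own classes.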

In the previous chapter, we presented a program, HTMLStripper.cpp, that removed all HTML tags and special codes from a text file. As promised, here is a more elegant version using string streams.

//: C04:HTMLStripper2.cpp

//{L} ../C03/ReplaceAll

// Filter to remove html tags and markers

#include <cstddef>

#include <cstdlib>

#include <fstream>

#include <iostream>

#include <sstream>

#include <stdexcept>

#include <string>

#include "../require.h"

using namespace std;

 

string& replaceAll(string& context, const string& from,

  const string& to);

 

string& stripHTMLTags(string& s) { // Throws runtime_error

  size_t leftPos;

  while ((leftPos = s.find('<')) != string::npos) {

    size_t rightPos = s.find('>', leftPos+1);

    if (rightPos == string::npos) {

      ostringstream msg;

      msg << "Incomplete HTML tag starting in position "

          << leftPos;

      throw runtime_error(msg.str());

    }

    s.erase(leftPos, rightPos - leftPos + 1);

  }

  // Remove all special HTML characters

  replaceAll(s, "&lt;", "<");

  replaceAll(s, "&gt;", ">");

  replaceAll(s, "&amp;", "&");

  replaceAll(s, "&nbsp;", " ");

  // Etc...

  return s;

}

 

int main(int argc, char* argv[]) {

  requireArgs(argc, 1,

    "usage: HTMLStripper2 InputFile");

  ifstream in(argv[1]);

  assure(in, argv[1]);

  // Read entire file into string; then strip

  ostringstream ss;

  ss << in.rdbuf();

  try {

    string s = ss.str();

    cout << stripHTMLTags(s) << endl;

    return EXIT_SUCCESS;

  }

  catch (runtime_error& x) {

    cout << x.what() << endl;

    return EXIT_FAILURE;

  }

} ///:~

 

In this program we read the entire file into a string by inserting a rdbuf( ) call to the file stream into an ostringstream. Now it’s an easy matter to search for HTML delimiter pairs and erase them without having to worry about crossing line boundaries as we had to with the previous version in Chapter 3.

The following example shows how to use a bidirectional (that is, read/write) string stream.

//: C04:StringSeeking.cpp

// Reads and writes a string stream

//{-bor}

#include <cassert>

#include <sstream>

#include <string>

using namespace std;

 

int main() {

  string text = "We will sell no wine";

  stringstream ss(text);

  ss.seekp(0, ios::end);

  ss << " before its time.";

  assert(ss.str() ==

    "We will sell no wine before its time.");

  // Change "sell" to "ship"

  ss.seekg(9, ios::beg);

  string word;

  ss >> word;

  assert(word == "ell");

  ss.seekp(9, ios::beg);

  ss << "hip";

  // Change "wine" to "code"

  ss.seekg(16, ios::beg);

  ss >> word;

  assert(word == "wine");

  ss.seekp(16, ios::beg);

  ss << "code";

  assert(ss.str() ==

    "We will ship no code before its time.");

  ss.str("A horse of a different color.");

  assert(ss.str() == "A horse of a different color.");

} ///:~

 

As always, to move the put pointer, you call seekp( ), and to reposition the get pointer, you call seekg( ). Even though we didn’t show it with this example, string streams are a little more forgiving than file streams in that you can switch from reading to writing or vice-versa at any time. You don’t need to reposition the get or put pointers or flush the stream. This program also illustrates the overload of str( ) that replaces the contents of the stream’s underlying stringbuf with a new string.

Output stream formatting

The goal of the iostreams design is to allow you to easily move and/or format characters. It certainly wouldn’t be useful if you couldn’t do most of the formatting provided by C’s printf( ) family of functions. In this section, you’ll learn all the output formatting functions that are available for iostreams, so you can format your bytes the way you want them.

The formatting functions in iostreams can be somewhat confusing at first because there’s often more than one way to control the formatting: through both member functions and manipulators. To further confuse things, a generic member function sets state flags to control formatting, such as left or right justification, to use uppercase letters for hex notation, to always use a decimal point for floating-point values, and so on. On the other hand, separate member functions set and read values for the fill character, the field width, and the precision.

In an attempt to clarify all this, we’ll first examine the internal formatting data of an iostream, along with the member functions that can modify that data. (Everything can be controlled through the member functions, if desired.) We’ll cover the manipulators separately.

Format flags

The class ios contains data members to store all the formatting information pertaining to a stream. Some of this data has a range of values and is stored in variables: the floating-point precision, the output field width, and the character used to pad the output (normally a space). The rest of the formatting is determined by flags, which are usually combined to save space and are referred to collectively as the format flags. You can find out the value of the format flags with the ios::flags( ) member function, which takes no arguments and returns an object of type fmtflags (usually a synonym for long) that contains the current format flags. All the rest of the functions make changes to the format flags and return the previous value of the format flags.

fmtflags ios::flags(fmtflags newflags);

fmtflags ios::setf(fmtflags ored_flag);

fmtflags ios::unsetf(fmtflags clear_flag);

fmtflags ios::setf(fmtflags bits, fmtflags field);

 

The first function forces all the flags to change, which is occasionally what you want. More often, you change one flag at a time using the remaining three functions.

The use of setf( ) can seem somewhat confusing. To know which overloaded version to use, you must know what type of flag you’re changing. There are two types of flags: those that are simply on or off, and those that work in a group with other flags. The on/off flags are the simplest to understand because you turn them on with setf(fmtflags) and off with unsetf(fmtflags). These flags are shown in the following table.

on/off flag       Effect
ios::skipws       Skip white space. (For input; this is the default.)
ios::showbase     Indicate the numeric base (as set, for example, by dec, oct, or hex) when printing an integral value. Input streams also recognize the base prefix when showbase is on.
ios::showpoint    Show decimal point and trailing zeros for floating-point values.
ios::uppercase    Display uppercase A-F for hexadecimal values and E for scientific values.
ios::showpos      Show plus sign (+) for positive values.
ios::unitbuf      “Unit buffering.” The stream is flushed after each insertion.

 

For example, to show the plus sign for cout, you say cout.setf(ios::showpos). To stop showing the plus sign, you say cout.unsetf(ios::showpos).
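Here is a brief sketch of turning an on/off flag on and off, and of saving and restoring the complete set of flags with flags( ). The function showposDemo( ) is a name we invented for the illustration; it writes to an ostringstream so the result can be inspected as a string.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical demo: toggles showpos, then restores saved flags.
std::string showposDemo(int n) {
  std::ostringstream os;
  std::ios::fmtflags saved = os.flags();  // Remember the defaults
  os.setf(std::ios::showpos);             // Turn the flag on
  os << n << ' ';
  os.unsetf(std::ios::showpos);           // Turn it off again
  os << n << ' ';
  os.setf(std::ios::showpos | std::ios::showbase);
  os.flags(saved);                        // Force all flags back
  os << n;
  return os.str();
}
```

For the value 47 this produces "+47 47 47": only the first insertion happens while showpos is set.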

The unitbuf flag controls unit buffering, which means that each insertion is flushed to its output stream immediately. This is handy for error tracing, so that in case of a program crash, your data is still written to the log file. The following program illustrates unit buffering.

//: C04:Unitbuf.cpp

#include <cstdlib>  // For abort()

#include <fstream>

using namespace std;

 

int main() {

  ofstream out("log.txt");

  out.setf(ios::unitbuf);

  out << "one\n";

  out << "two\n";

  abort();

} ///:~

 

It is necessary to turn on unit buffering before any insertions are made to the stream. When we commented out the call to setf( ), one particular compiler wrote only the letter ‘o’ to the file log.txt. With unit buffering, no data was lost.

The standard error output stream cerr has unit buffering turned on by default. There is a cost for unit buffering, of course, so if an output stream is heavily used, don’t enable unit buffering unless efficiency is not a consideration.

Format fields

The second type of formatting flags work in a group. Only one flag in a group can be set at a time, like the buttons on old car radios: you push one in, the rest pop out. Unfortunately this doesn’t happen automatically, and you have to pay attention to what flags you’re setting so that you don’t accidentally call the wrong setf( ) function. For example, there’s a flag for each of the number bases: hexadecimal, decimal, and octal. Collectively, these flags are referred to as the ios::basefield. If the ios::dec flag is set and you call setf(ios::hex), you’ll set the ios::hex flag, but you won’t clear the ios::dec bit, so more than one base flag is on and you won’t get the hexadecimal output you asked for (integers are still formatted in decimal). The proper thing to do is call the second form of setf( ) like this: setf(ios::hex, ios::basefield). This function first clears all the bits in the ios::basefield and then sets ios::hex. Thus, this form of setf( ) ensures that the other flags in the group “pop out” whenever you set one. Of course, the hex manipulator does all this for you, automatically, so you don’t have to concern yourself with the internal details of the implementation of this class or to even care that it’s a set of binary flags. Later you’ll see that there are manipulators to provide equivalent functionality in all the places you would use setf( ).
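The difference between the two forms of setf( ) is easy to see in a small experiment. In this sketch, hexWrong( ) and hexRight( ) are hypothetical names of our own; the first uses the one-argument form on a stream whose dec flag is already set, and the second uses the two-argument form.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Sets ios::hex without clearing ios::dec (the default base flag).
std::string hexWrong(int n) {
  std::ostringstream os;
  os.setf(std::ios::hex);  // dec is still set as well
  os << n;
  return os.str();
}

// Clears the whole basefield group first, then sets ios::hex.
std::string hexRight(int n) {
  std::ostringstream os;
  os.setf(std::ios::hex, std::ios::basefield);
  os << n;
  return os.str();
}
```

With the libraries we are aware of, hexWrong(47) yields "47" because the combination of two base flags is treated as decimal, while hexRight(47) yields "2f".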

Here are the flag groups and their effects:

ios::basefield     Effect
ios::dec           Format integral values in base 10 (decimal) (the default radix; no prefix is visible).
ios::hex           Format integral values in base 16 (hexadecimal).
ios::oct           Format integral values in base 8 (octal).

ios::floatfield    Effect
ios::scientific    Display floating-point numbers in scientific format. Precision field indicates number of digits after the decimal point.
ios::fixed         Display floating-point numbers in fixed format. Precision field indicates number of digits after the decimal point.
“automatic” (Neither bit is set.)    Precision field indicates the total number of significant digits.

ios::adjustfield   Effect
ios::left          Left-align values; pad on the right with the fill character.
ios::right         Right-align values; pad on the left with the fill character. This is the default alignment.
ios::internal      Add fill characters after any leading sign or base indicator, but before the value. (In other words, the sign, if printed, is left-justified while the number is right-justified.)
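Because the meaning of the precision depends on the ios::floatfield setting, it can help to print the same value all three ways. This is a minimal sketch; precisionDemo( ) is a hypothetical function written for this illustration.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical demo: one value, one precision, three float formats.
std::string precisionDemo(double x, int prec) {
  std::ostringstream os;
  os.precision(prec);
  os << x << ' ';  // "Automatic": prec significant digits
  os.setf(std::ios::fixed, std::ios::floatfield);
  os << x << ' ';  // Fixed: prec digits after the decimal point
  os.setf(std::ios::scientific, std::ios::floatfield);
  os << x;         // Scientific: prec digits after the point
  return os.str();
}
```

On a typical implementation, precisionDemo(3.14159, 3) gives "3.14 3.142 3.142e+00". The number of exponent digits can vary between libraries.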

Width, fill, and precision

The internal variables that control the width of the output field, the fill character used to pad an output field, and the precision for printing floating-point numbers are read and written by member functions of the same name.

Function                                   Effect
streamsize ios::width( )                   Returns the current width. (Default is 0.) Used for both insertion and extraction.
streamsize ios::width(streamsize n)        Sets the width, returns the previous width.
char ios::fill( )                          Returns the current fill character. (Default is space.)
char ios::fill(char c)                     Sets the fill character, returns the previous fill character.
streamsize ios::precision( )               Returns current floating-point precision. (Default is 6.)
streamsize ios::precision(streamsize n)    Sets floating-point precision, returns previous precision. See ios::floatfield table for the meaning of “precision.”

The fill and precision values are fairly straightforward, but width requires some explanation. When the width is zero, inserting a value produces the minimum number of characters necessary to represent that value. A positive width means that inserting a value will produce at least as many characters as the width; if the value has fewer than width characters, the fill character is used to pad the field. However, the value will never be truncated, so if you try to print 123 with a width of two, you’ll still get 123. The field width specifies a minimum number of characters; there’s no way to specify a maximum number.

The width is also distinctly different because it’s reset to zero by each inserter or extractor that could be influenced by its value. It’s really not a state variable, but rather an implicit argument to the inserters and extractors. If you want a constant width, call width( ) before each insertion or extraction.
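The following sketch demonstrates both properties of the width: it applies only to the next insertion, and it never truncates. The function widthDemo( ) is a hypothetical name, and the fill character is set to a period only to make the padding visible.

```cpp
#include <sstream>
#include <string>

// Hypothetical demo of how width() behaves across insertions.
std::string widthDemo() {
  std::ostringstream os;
  os.fill('.');
  os.width(6);
  os << 42;   // Padded to six characters: "....42"
  os << 7;    // Width was reset to zero: "7"
  os << '|';
  os.width(2);
  os << 123;  // Wider than the field, but never truncated: "123"
  return os.str();
}
```

The result is "....427|123". Notice that the 7 is not padded, because the preceding insertion reset the width.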

An exhaustive example

To make sure you know how to call all the functions previously discussed, here’s an example that calls them all:

//: C04:Format.cpp

// Formatting Functions

#include <fstream>

#include <iostream>

#include "../require.h"

using namespace std;

#define D(A) T << #A << endl; A

 

int main() {

  ofstream T("format.out");

  assure(T);

  D(int i = 47;)

  D(float f = 2300114.414159;)

  const char* s = "Is there any more?";

 

  D(T.setf(ios::unitbuf);)

  D(T.setf(ios::showbase);)

  D(T.setf(ios::uppercase | ios::showpos);)

  D(T << i << endl;) // Default is dec

  D(T.setf(ios::hex, ios::basefield);)

  D(T << i << endl;)

  D(T.setf(ios::oct, ios::basefield);)

  D(T << i << endl;)

  D(T.unsetf(ios::showbase);)

  D(T.setf(ios::dec, ios::basefield);)

  D(T.setf(ios::left, ios::adjustfield);)

  D(T.fill('0');)

  D(T << "fill char: " << T.fill() << endl;)

  D(T.width(10);)

  T << i << endl;

  D(T.setf(ios::right, ios::adjustfield);)

  D(T.width(10);)

  T << i << endl;

  D(T.setf(ios::internal, ios::adjustfield);)

  D(T.width(10);)

  T << i << endl;

  D(T << i << endl;) // Without width(10)

 

  D(T.unsetf(ios::showpos);)

  D(T.setf(ios::showpoint);)

  D(T << "prec = " << T.precision() << endl;)

  D(T.setf(ios::scientific, ios::floatfield);)

  D(T << endl << f << endl;)

  D(T.unsetf(ios::uppercase);)

  D(T << endl << f << endl;)

  D(T.setf(ios::fixed, ios::floatfield);)

  D(T << f << endl;)

  D(T.precision(20);)

  D(T << "prec = " << T.precision() << endl;)

  D(T << endl << f << endl;)

  D(T.setf(ios::scientific, ios::floatfield);)

  D(T << endl << f << endl;)

  D(T.setf(ios::fixed, ios::floatfield);)

  D(T << f << endl;)

 

  D(T.width(10);)

  T << s << endl;

  D(T.width(40);)

  T << s << endl;

  D(T.setf(ios::left, ios::adjustfield);)

  D(T.width(40);)

  T << s << endl;

} ///:~

 

This example uses a trick to create a trace file so that you can monitor what’s happening. The macro D(A) uses the preprocessor “stringizing” operator (#) to turn A into a string to display. Then it repeats A so the statement is executed. The macro sends all the information to the ofstream T, which writes the trace file format.out. The output is:

int i = 47;

float f = 2300114.414159;

T.setf(ios::unitbuf);

T.setf(ios::showbase);

T.setf(ios::uppercase | ios::showpos);

T << i << endl;

+47

T.setf(ios::hex, ios::basefield);

T << i << endl;

0X2F

T.setf(ios::oct, ios::basefield);

T << i << endl;

057

T.unsetf(ios::showbase);

T.setf(ios::dec, ios::basefield);

T.setf(ios::left, ios::adjustfield);

T.fill('0');

T << "fill char: " << T.fill() << endl;

fill char: 0

T.width(10);

+470000000

T.setf(ios::right, ios::adjustfield);

T.width(10);

0000000+47

T.setf(ios::internal, ios::adjustfield);

T.width(10);

+000000047

T << i << endl;

+47

T.unsetf(ios::showpos);

T.setf(ios::showpoint);

T << "prec = " << T.precision() << endl;

prec = 6

T.setf(ios::scientific, ios::floatfield);

T << endl << f << endl;

 

2.300114E+06

T.unsetf(ios::uppercase);

T << endl << f << endl;

 

2.300114e+06

T.setf(ios::fixed, ios::floatfield);

T << f << endl;

2300114.500000

T.precision(20);

T << "prec = " << T.precision() << endl;

prec = 20

T << endl << f << endl;

 

2300114.50000000000000000000

T.setf(ios::scientific, ios::floatfield);

T << endl << f << endl;

 

2.30011450000000000000e+06

T.setf(ios::fixed, ios::floatfield);

T << f << endl;

2300114.50000000000000000000

T.width(10);

Is there any more?

T.width(40);

0000000000000000000000Is there any more?

T.setf(ios::left, ios::adjustfield);

T.width(40);

Is there any more?0000000000000000000000

 

Studying this output should clarify your understanding of the iostream formatting member functions.

Manipulators

As you can see from the previous program, calling the member functions for stream formatting operations can get a bit tedious. To make things easier to read and write, a set of manipulators is supplied to duplicate the actions provided by the member functions. Manipulators are a convenience because you can insert them for their effect within a containing expression; you don’t have to create a separate function-call statement.

Manipulators change the state of the stream instead of (or in addition to) processing data. When you insert endl in an output expression, for example, it not only inserts a newline character, but it also flushes the stream (that is, puts out all pending characters that have been stored in the internal stream buffer but not yet output). You can also just flush a stream like this:

cout << flush;

 

which causes a call to the flush( ) member function, as in

cout.flush();

 

as a side effect (nothing is inserted into the stream). Additional basic manipulators will change the number base to oct (octal), dec (decimal) or hex (hexadecimal):

cout << hex << "0x" << i << endl;

 

In this case, numeric output will continue in hexadecimal mode until you change it by inserting either dec or oct in the output stream.
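To confirm that a base manipulator stays in effect until you change it, consider this short sketch. The function baseDemo( ) is our own name for the example.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical demo: a base manipulator persists across insertions.
std::string baseDemo(int n) {
  std::ostringstream os;
  os << std::hex << n << ' '
     << n + 1 << ' '        // Still hexadecimal
     << std::oct << n << ' '
     << std::dec << n;
  return os.str();
}
```

For 255 the result is "ff 100 377 255". The second value is printed in hexadecimal even though hex was inserted only once.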

There’s also a manipulator for extraction that “eats” white space:

cin >> ws;

 

Manipulators with no arguments are provided in <iostream>. These include dec, oct, and hex, which perform the same action as, respectively, setf(ios::dec, ios::basefield), setf(ios::oct, ios::basefield), and setf(ios::hex, ios::basefield), albeit more succinctly. The <iostream> header also includes ws, endl, and flush and the additional set shown here:

Manipulator                 Effect
showbase / noshowbase       Indicate the numeric base (dec, oct, or hex) when printing an integral value. The format used can be read by the C++ compiler.
showpos / noshowpos         Show plus sign (+) for positive values.
uppercase / nouppercase     Display uppercase A-F for hexadecimal values, and display E for scientific values.
showpoint / noshowpoint     Show decimal point and trailing zeros for floating-point values.
skipws / noskipws           Skip white space on input.
left                        Left-align, pad on right.
right                       Right-align, pad on left.
internal                    Fill between leading sign or base indicator and value.
scientific / fixed          Indicates the display preference for floating-point output (scientific notation vs. fixed-point decimal).
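A few of these manipulators can be exercised together in one chained expression. The following is only a sketch, and flagManipDemo( ) is a hypothetical name; each flag is turned on for one value and then turned off again with its “no” counterpart.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical demo: on/off manipulators in a chained insertion.
std::string flagManipDemo() {
  std::ostringstream os;
  os << std::showbase << std::uppercase << std::hex << 255 << ' '
     << std::noshowbase << std::nouppercase << 255 << ' '
     << std::dec << std::showpos << 47 << ' '
     << std::noshowpos << 47 << ' '
     << std::showpoint << 2.0 << ' '
     << std::noshowpoint << 2.0;
  return os.str();
}
```

The expected result is "0XFF ff +47 47 2.00000 2", which matches the effects listed in the table.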

Manipulators with arguments

There are six standard manipulators, such as setw( ), that take arguments. These are defined in the header file <iomanip>, and are summarized in the following table.

Manipulator                  Effect
setiosflags(fmtflags n)      Equivalent to a call to setf(n). The setting remains in effect until the next change, such as ios::setf( ).
resetiosflags(fmtflags n)    Clears only the format flags specified by n. The setting remains in effect until the next change, such as ios::unsetf( ).
setbase(int n)               Changes base to n, where n is 10, 8, or 16. (Anything else results in 0.) If n is zero, output is base 10, but input uses the C conventions: 10 is 10, 010 is 8, and 0xf is 15. You might as well use dec, oct, and hex for output.
setfill(char n)              Changes the fill character to n, as with ios::fill( ).
setprecision(int n)          Changes the precision to n, as with ios::precision( ).
setw(int n)                  Changes the field width to n, as with ios::width( ).
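Before looking at a longer program, here is a compact sketch of the manipulators that take arguments. The function iomanipDemo( ) is a hypothetical name, and the vertical bars are inserted only to make the field boundaries visible.

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical demo of setfill, setw, setprecision and setbase.
std::string iomanipDemo() {
  std::ostringstream os;
  os << std::setfill('*') << std::setw(8) << 42 << '|'    // Right-aligned
     << std::left << std::setw(8) << 42 << '|'            // Left-aligned
     << std::fixed << std::setprecision(2) << 3.14159 << '|'
     << std::setbase(16) << 255;
  return os.str();
}
```

The result is "******42|42******|3.14|ff". Note that setw( ) must be repeated for each field, whereas setfill( ), setprecision( ), and setbase( ) stay in effect.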

If you’re doing a lot of formatting, you can see how using manipulators instead of calling stream member functions can clean up your code. As an example, here’s the program from the previous section rewritten to use the manipulators. (The D( ) macro is removed to make it easier to read.)

//: C04:Manips.cpp

// Format.cpp using manipulators

#include <fstream>

#include <iomanip>

#include <iostream>

using namespace std;

 

int main() {

  ofstream trc("trace.out");

  int i = 47;

  float f = 2300114.414159;

  const char* s = "Is there any more?";

 

  trc << setiosflags(ios::unitbuf

           | ios::showbase | ios::uppercase

           | ios::showpos);

  trc << i << endl;

  trc << hex << i << endl

      << oct << i << endl;

  trc.setf(ios::left, ios::adjustfield);

  trc << resetiosflags(ios::showbase)

      << dec << setfill('0');

  trc << "fill char: " << trc.fill() << endl;

  trc << setw(10) << i << endl;

  trc.setf(ios::right, ios::adjustfield);

  trc << setw(10) << i << endl;

  trc.setf(ios::internal, ios::adjustfield);

  trc << setw(10) << i << endl;

  trc << i << endl; // Without setw(10)

 

  trc << resetiosflags(ios::showpos)

      << setiosflags(ios::showpoint)

      << "prec = " << trc.precision() << endl;

  trc.setf(ios::scientific, ios::floatfield);

  trc << f << resetiosflags(ios::uppercase) << endl;

  trc.setf(ios::fixed, ios::floatfield);

  trc << f << endl;

  trc << f << endl;

  trc << setprecision(20);

  trc << "prec = " << trc.precision() << endl;

  trc << f << endl;

  trc.setf(ios::scientific, ios::floatfield);

  trc << f << endl;

  trc.setf(ios::fixed, ios::floatfield);

  trc << f << endl;

  trc << f << endl;

 

  trc << setw(10) << s << endl;

  trc << setw(40) << s << endl;

  trc.setf(ios::left, ios::adjustfield);

  trc << setw(40) << s << endl;

} ///:~

 

You can see that a lot of the multiple statements have been condensed into a single chained insertion. Notice the call to setiosflags( ) in which the bitwise-OR of the flags is passed. This could also have been done with setf( ) and unsetf( ) as in the previous example.

When using setw( ) with an output stream, the output expression is formatted into a temporary string that is padded with the current fill character if needed, as determined by comparing the length of the formatted result to the argument of setw( ). In other words, setw( ) affects the result string of a formatted output operation. Likewise, using setw( ) with input streams is meaningful only when reading strings, as the following example makes clear.

//: C04:InputWidth.cpp

// Shows limitations of setw with input

#include <cassert>

#include <cmath>

#include <iomanip>

#include <limits>

#include <sstream>

#include <string>

using namespace std;

 

int main() {

  istringstream is("one 2.34 five");

  string temp;

  is >> setw(2) >> temp;

  assert(temp == "on");

  is >> setw(2) >> temp;

  assert(temp == "e");

  double x;

  is >> setw(2) >> x;

  double relerr = fabs(x - 2.34) / x;

  assert(relerr <= numeric_limits<double>::epsilon());

} ///:~

 

If you attempt to read a string, setw( ) will control the number of characters extracted quite nicely… up to a point. The first extraction gets two characters, but the second gets only one, even though we asked for two. That is because operator>>( ) uses white space as a delimiter (unless you turn off the skipws flag). When trying to read a number such as x, however, you cannot use setw( ) to limit the characters read. With input streams, use setw( ) only for extracting strings.
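If you do need a fixed-width numeric field, one possible workaround is to extract the field as a string with setw( ) and then convert that string in a second step. The following is only a sketch of that idea; readFixedWidthInt( ) is a hypothetical helper, not part of the standard library.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Reads at most width non-whitespace characters and converts
// them to an int. Returns false if either step fails.
bool readFixedWidthInt(std::istream& in, int width,
                       int& result) {
  std::string field;
  if(!(in >> std::setw(width) >> field))
    return false;
  std::istringstream conv(field);
  return !(conv >> result).fail();
}
```

Given the input "12345", a call with a width of 2 followed by a call with a width of 3 would split the digits into two separate numbers, which plain numeric extraction cannot do.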

Creating manipulators

Sometimes you’d like to create your own manipulators, and it turns out to be remarkably simple. A zero-argument manipulator such as endl is simply a function that takes an ostream reference as its argument and returns an ostream reference. The declaration for endl is:

ostream& endl(ostream&);

 

Now, when you say:

cout << "howdy" << endl;

 

the endl produces the address of that function. So the compiler asks, “Is there a function I can call that takes the address of a function as its argument?” Predefined functions in <iostream> do this; they’re called applicators (because they apply a function to a stream). The applicator calls its function argument, passing it the ostream object as its argument. You don’t need to know how applicators work to create your own manipulator; you only need to know that they exist. Nonetheless, they’re simple. Here’s the (simplified) code for an ostream applicator:

ostream& ostream::operator<<(ostream& (*pf)(ostream&)) {

  return pf(*this);

}

 

The actual definition is a little more complicated since it involves templates, but this code illustrates the technique. When a function such as *pf (that takes a stream parameter and returns a stream reference) is inserted into a stream, this applicator function is called, which in turn executes the function to which pf points. Applicators for ios_base, basic_ios, basic_ostream, and basic_istream are predefined in the standard C++ library.

To illustrate the process, here’s a trivial example that creates a manipulator called nl that simply inserts a newline into a stream (unlike endl, it does not flush the stream):
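Because an applicator for ios_base exists on both input and output streams, a manipulator written against ios_base can be used with either kind. The sketch below assumes a hypothetical manipulator named upperhex and two hypothetical helper functions that exercise it; none of these names come from the standard library.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Works on any stream, because it takes and returns ios_base&:
std::ios_base& upperhex(std::ios_base& b) {
  b.setf(std::ios_base::hex, std::ios_base::basefield);
  b.setf(std::ios_base::uppercase);
  return b;
}

// Uses the manipulator with an output stream:
std::string toUpperHex(int n) {
  std::ostringstream os;
  os << upperhex << n;
  return os.str();
}

// Uses the same manipulator with an input stream:
int fromHex(const std::string& text) {
  std::istringstream is(text);
  int n = 0;
  is >> upperhex >> n;
  return n;
}
```

The uppercase flag affects only output; on input, the manipulator's useful effect is selecting the hexadecimal base.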

//: C04:nl.cpp

// Creating a manipulator

#include <iostream>

using namespace std;

 

ostream& nl(ostream& os) {

  return os << '\n';

}

 

int main() {

  cout << "newlines" << nl << "between" << nl

       << "each" << nl << "word" << nl;

} ///:~

 

When you insert nl into an output stream, such as cout, the following sequence of calls ensues:

cout.operator<<(nl) → nl(cout)

 

The expression

os << '\n';

 

inside nl( ) calls the operator<<( ) overload for char, which of course returns the stream, which is what is ultimately returned from nl( ).[45]

Effectors

As you’ve seen, zero-argument manipulators are easy to create. But what if you want to create a manipulator that takes arguments? If you inspect the <iomanip> header, you’ll see a type called smanip, which is what the manipulators with arguments return. You might be tempted to somehow use that type to define your own manipulators, but don’t give in to the temptation. The smanip type is implementation-dependent, so using it would not be portable. Fortunately, you can define such manipulators in a straightforward way without any special machinery, based on a technique introduced by Jerry Schwarz, called an effector.[46] An effector is a simple class whose constructor formats a string representing the desired operation, along with an overloaded operator<< to insert that string into a stream. Here’s an example with two effectors. The first outputs a truncated character string, and the second prints a number in binary.

//: C04:Effector.cpp

// Jerry Schwarz's "effectors"

#include <cassert>

#include <limits>  // For max()

#include <sstream>

#include <string>

using namespace std;

 

// Put out a prefix of a string:

class Fixw {

  string str;

public:

  Fixw(const string& s, int width)

    : str(s, 0, width) {}

  friend ostream&

  operator<<(ostream& os, const Fixw& fw) {

    return os << fw.str;

  }

};

 

// Print a number in binary:

typedef unsigned long ulong;

class Bin {

  ulong n;

public:

  Bin(ulong nn) { n = nn; }

  friend ostream& operator<<(ostream& os, const Bin& b) {

    const ulong ULMAX = numeric_limits<ulong>::max();

    ulong bit = ~(ULMAX >> 1); // Top bit set

    while(bit) {

      os << (b.n & bit ? '1' : '0');

      bit >>= 1;

    }

    return os;

  }

};

 

int main() {

  string words =

    "Things that make us happy, make us wise";

  for(int i = words.size(); --i >= 0;) {

    ostringstream s;

    s << Fixw(words, i);

    assert(s.str() == words.substr(0, i));

  }

  ostringstream xs, ys;

  xs << Bin(0xCAFEBABEUL);

  assert(xs.str() ==

    "1100""1010""1111""1110""1011""1010""1011""1110");

  ys << Bin(0x76543210UL);

  assert(ys.str() ==

    "0111""0110""0101""0100""0011""0010""0001""0000");

} ///:~

 

The constructor for Fixw creates a shortened copy of its string argument; because the copy is held in a string member, its memory is managed automatically and no destructor is needed. The overloaded operator<< takes the contents of its second argument, the Fixw object, inserts it into the first argument, the ostream, and then returns the ostream so that it can be used in a chained expression. When you use Fixw in an expression like this:

cout << Fixw(words, i) << endl;

 

a temporary object is created by the call to the Fixw constructor, and that temporary object is passed to operator<<. The effect is that of a manipulator with arguments. The temporary Fixw object persists until the end of the statement.

The Bin effector relies on the fact that shifting an unsigned number to the right shifts zeros into the high bits. We use numeric_limits<unsigned long>::max( ) (the largest unsigned long value, from the standard header <limits> ) to produce a value with the high bit set, and this value is moved across the number in question (by shifting it to the right), masking each bit in turn. We’ve juxtaposed string literals in the code for readability; the separate strings are of course concatenated into one by the compiler.

Historically, the problem with this technique was that once you created a class called Fixw for string or Bin for unsigned long, no one else could create a different Fixw or Bin class for their type. However, with namespaces, this problem is eliminated.
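As a sketch of how a namespace keeps effector names from colliding, here is a third effector placed in its own namespace. The namespace demo, the class Repeat, and the helper underline( ) are all hypothetical names chosen for this illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace demo { // Hypothetical namespace for this sketch
  // Effector that inserts a character a given number of times:
  class Repeat {
    char c;
    int n;
  public:
    Repeat(char cc, int nn) : c(cc), n(nn) {}
    friend std::ostream&
    operator<<(std::ostream& os, const Repeat& r) {
      for(int i = 0; i < r.n; i++)
        os.put(r.c);
      return os;
    }
  };
}

// Builds an underlined title using the effector:
std::string underline(const std::string& title) {
  std::ostringstream os;
  os << title << '\n'
     << demo::Repeat('-', static_cast<int>(title.size()));
  return os.str();
}
```

Another library could define its own Repeat in a different namespace without conflict, because the inserter is found through the namespace of its argument.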

Iostream examples

In this section you’ll see some examples of what you can do with all the information you’ve learned in this chapter. Although many tools exist to manipulate bytes (stream editors such as sed and awk from UNIX are perhaps the best known, but a text editor also fits this category), they generally have some limitations. Both sed and awk can be slow and can only handle lines in a forward sequence, and text editors usually require human interaction, or at least learning a proprietary macro language. The programs you write with iostreams have none of these limitations: they’re fast, portable, and flexible.

Maintaining class library source code

Generally, when you create a class, you think in library terms: you make a header file Name.h for the class declaration, and you create a file in which the member functions are implemented, called Name.cpp. These files have certain requirements: a particular coding standard (the program shown here uses the coding format for this book), and in the header file the declarations are generally surrounded by some preprocessor statements to prevent multiple declarations of classes. (Multiple declarations confuse the compiler: it doesn’t know which one you want to use. They could be different, so it throws up its hands and gives an error message.)

This example allows you to create a new header/implementation pair of files or to modify an existing pair. If the files already exist, it checks and potentially modifies the files, but if they don’t exist, it creates them using the proper format.

//: C04:Cppcheck.cpp

// Configures .h & .cpp files to conform to style

// standard. Tests existing files for conformance.

#include <cctype>

#include <fstream>

#include <sstream>

#include <string>

#include "../require.h"

using namespace std;

 

bool startsWith(const string& base, const string& key) {

  return base.compare(0, key.size(), key) == 0;

}

 

void cppCheck(string fileName) {

  enum bufs { BASE, HEADER, IMPLEMENT,

    HLINE1, GUARD1, GUARD2, GUARD3,

    CPPLINE1, INCLUDE, BUFNUM };

  string part[BUFNUM];

  part[BASE] = fileName;

  // Find any '.' in the string:

  size_t loc = part[BASE].find('.');

  if(loc != string::npos)

    part[BASE].erase(loc); // Strip extension

  // Force to upper case:

  for(size_t i = 0; i < part[BASE].size(); i++)

    part[BASE][i] = toupper(part[BASE][i]);

  // Create file names and internal lines:

  part[HEADER] = part[BASE] + ".h";

  part[IMPLEMENT] = part[BASE] + ".cpp";

  part[HLINE1] = "//" ": " + part[HEADER];

  part[GUARD1] = "#ifndef " + part[BASE] + "_H";

  part[GUARD2] = "#define " + part[BASE] + "_H";

  part[GUARD3] = "#endif // " + part[BASE] +"_H";

  part[CPPLINE1] = string("//") + ": "

    + part[IMPLEMENT];

  part[INCLUDE] = "#include \"" + part[HEADER] + "\"";

  // First, try to open existing files:

  ifstream existh(part[HEADER].c_str()),

           existcpp(part[IMPLEMENT].c_str());

  if(!existh) { // Doesn't exist; create it

    ofstream newheader(part[HEADER].c_str());

    assure(newheader, part[HEADER].c_str());

    newheader << part[HLINE1] << endl

      << part[GUARD1] << endl

      << part[GUARD2] << endl << endl

      << part[GUARD3] << endl;

  } else { // Already exists; verify it

    stringstream hfile; // Write & read

    ostringstream newheader; // Write

    hfile << existh.rdbuf();

    // Check that first three lines conform:

    bool changed = false;

    string s;

    hfile.seekg(0);

    getline(hfile, s);

    bool lineUsed = false;

    // The call to good() is for Microsoft (later too)

    for (int line = HLINE1; hfile.good() && line <= GUARD2;

         ++line) {

      if(startsWith(s, part[line])) {

        newheader << s << endl;

        lineUsed = true;

        if (getline(hfile, s))

          lineUsed = false;

      } else {

        newheader << part[line] << endl;

        changed = true;

        lineUsed = false;

      }

    }

    // Copy rest of file

    if (!lineUsed)

      newheader << s << endl;

    newheader << hfile.rdbuf();

    // Check for GUARD3

    string head = hfile.str();

    if(head.find(part[GUARD3]) == string::npos) {

      newheader << part[GUARD3] << endl;

      changed = true;

    }

    // If there were changes, overwrite file:

    if(changed) {

      existh.close();

      ofstream newH(part[HEADER].c_str());

      assure(newH, part[HEADER].c_str());

      newH << "//@//\n"  // Change marker

        << newheader.str();

    }

  }

  if(!existcpp) { // Create cpp file

    ofstream newcpp(part[IMPLEMENT].c_str());

    assure(newcpp, part[IMPLEMENT].c_str());

    newcpp << part[CPPLINE1] << endl

      << part[INCLUDE] << endl;

  } else { // Already exists; verify it

    stringstream cppfile;

    ostringstream newcpp;

    cppfile << existcpp.rdbuf();

    // Check that first two lines conform:

    bool changed = false;

    string s;

    cppfile.seekg(0);

    getline(cppfile, s);

    bool lineUsed = false;

    for (int line = CPPLINE1;

         cppfile.good() && line <= INCLUDE;

         ++line) {

      if(startsWith(s, part[line])) {

        newcpp << s << endl;

        lineUsed = true;

        if (getline(cppfile, s))

          lineUsed = false;

      } else {

        newcpp << part[line] << endl;

        changed = true;

        lineUsed = false;

      }

    }

    // Copy rest of file

    if (!lineUsed)

      newcpp << s << endl;

    newcpp << cppfile.rdbuf();

    // If there were changes, overwrite file:

    if(changed){

      existcpp.close();

      ofstream newCPP(part[IMPLEMENT].c_str());

      assure(newCPP, part[IMPLEMENT].c_str());

      newCPP << "//@//\n"  // Change marker

        << newcpp.str();

    }

  }

}

 

int main(int argc, char* argv[]) {

  if(argc > 1)

    cppCheck(argv[1]);

  else

    cppCheck("cppCheckTest.h");

} ///:~

 

First notice the useful function startsWith( ), which does just what its name says: it returns true if the first string argument starts with the second argument. This is used when looking for the expected comments and include-related statements. Having the array of strings, part, allows for easy looping through the series of expected statements in source code. If the source file doesn’t exist, we merely write the statements to a new file of the given name. If the file does exist, we search a line at a time, verifying that the expected lines occur. If they are not present, they are inserted. Special care has to be taken to make sure we don’t drop existing lines (see where we use the Boolean variable lineUsed). Notice that we use a stringstream for an existing file, so we can first write the contents of the file to it and then read from it and search it.
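The write-then-read use of a stringstream can be isolated in a few lines. The following sketch repeats startsWith( ) from the listing and adds a hypothetical helper, firstLineConforms( ), that copies an entire input stream into a stringstream and then reads the first line back out of it.

```cpp
#include <istream>
#include <sstream>
#include <string>

bool startsWith(const std::string& base,
                const std::string& key) {
  return base.compare(0, key.size(), key) == 0;
}

// Copies all of in into a stringstream, then reads the first
// line back and tests it against the expected prefix:
bool firstLineConforms(std::istream& in,
                       const std::string& expected) {
  std::stringstream buffer; // Write & read
  buffer << in.rdbuf();
  std::string line;
  std::getline(buffer, line);
  return startsWith(line, expected);
}
```

Note that startsWith( ) returns false when the first argument is shorter than the key, because compare( ) then sees a shorter substring that cannot be equal to the key.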

The names in the enumeration are BASE, the capitalized base file name without extension; HEADER, the header file name; IMPLEMENT, the implementation file (cpp) name; HLINE1, the skeleton first line of the header file; GUARD1, GUARD2, and GUARD3, the “guard” lines in the header file (to prevent multiple inclusion); CPPLINE1, the skeleton first line of the cpp file; and INCLUDE, the line in the cpp file that includes the header file.

If you run this program without any arguments, the following two files are created:

// CPPCHECKTEST.h

#ifndef CPPCHECKTEST_H

#define CPPCHECKTEST_H

 

#endif // CPPCHECKTEST_H

 

// CPPCHECKTEST.cpp

#include "CPPCHECKTEST.h"

 

(We removed the colon after the double-slash in the first comment lines so as not to confuse the book’s code extractor. It will appear in the actual output produced by cppCheck.)

You can experiment by removing selected lines from these files and re-running the program. Each time you will see that the correct lines are added back in. When a file is modified, the string “//@//” is placed as the first line of the file to bring the change to your attention. You will need to remove this line before you process the file again (otherwise cppCheck( ) will assume the initial comment line is missing).

Detecting compiler errors

All the code in this book is designed to compile as shown without errors. Any line of code that should generate a compile-time error is commented out with the special comment sequence “//!”. The following program will remove these special comments and append a numbered comment to the line. When you run your compiler, it should generate error messages, and you will see all the numbers appear when you compile all the files. This program also appends the modified line to a special file so that you can easily locate any lines that don’t generate errors.

//: C04:Showerr.cpp

// Un-comment error generators

#include <cstddef>

#include <cstdlib>

#include <cstdio>

#include <fstream>

#include <iostream>

#include <sstream>

#include <string>

#include "../require.h"

using namespace std;

 

const string usage =

  "usage: showerr filename chapnum\n"

  "where filename is a C++ source file\n"

  "and chapnum is the chapter number it's in.\n"

  "Finds lines commented with //! and removes\n"

  "comment, appending //(#) where # is unique\n"

  "across all files, so you can determine\n"

  "if your compiler finds the error.\n"

  "showerr /r\n"

  "resets the unique counter.";

 

class Showerr {

  const int CHAP;

  const string MARKER, FNAME;

  // File containing error number counter:

  const string ERRNUM;

  // File containing error lines:

  const string ERRFILE;

  stringstream edited; // Edited file

  int counter;

public:

  Showerr(const string& f, const string& en,

    const string& ef, int c) : CHAP(c), MARKER("//!"),

    FNAME(f), ERRNUM(en), ERRFILE(ef), counter(0) {}

  void replaceErrors() {

    ifstream infile(FNAME.c_str());

    assure(infile, FNAME.c_str());

    ifstream count(ERRNUM.c_str());

    if(count) count >> counter;

    int linecount = 1;

    string buf;

    ofstream errlines(ERRFILE.c_str(), ios::app);

    assure(errlines, ERRFILE.c_str());

    while(getline(infile, buf)) {

      // Find marker at start of line:

      size_t pos = buf.find(MARKER);

      if(pos != string::npos) {

        // Erase marker:

        buf.erase(pos, MARKER.size() + 1);

        // Append counter & error info:

        ostringstream out;

        out << buf << " // (" << ++counter << ") "

            << "Chapter " << CHAP

            << " File: " << FNAME

            << " Line " << linecount << endl;

        edited << out.str();

        errlines << out.str(); // Append error file

      }

      else

        edited << buf << "\n"; // Just copy

      linecount++;

    }

  }

  void saveFiles() {

    ofstream outfile(FNAME.c_str()); // Overwrites

    assure(outfile, FNAME.c_str());

    outfile << edited.rdbuf();

    ofstream count(ERRNUM.c_str()); // Overwrites

    assure(count, ERRNUM.c_str());

    count << counter; // Save new counter

  }

};

 

int main(int argc, char* argv[]) {

  const string ERRCOUNT("../errnum.txt"),

    ERRFILE("../errlines.txt");

  requireMinArgs(argc, 1, usage.c_str());

  if(argv[1][0] == '/' || argv[1][0] == '-') {

    // Allow for other switches:

    switch(argv[1][1]) {

      case 'r': case 'R':

        cout << "reset counter" << endl;

        remove(ERRCOUNT.c_str()); // Delete files

        remove(ERRFILE.c_str());

        return 0;

      default:

        cerr << usage << endl;

        return 1;

    }

  }

  if (argc == 3) {

    Showerr s(argv[1], ERRCOUNT, ERRFILE, atoi(argv[2]));

    s.replaceErrors();

    s.saveFiles();

  }

} ///:~

 

You can replace the marker with one of your choice.

Each file is read a line at a time, and each line is searched for the marker appearing at the head of the line; the line is modified and put into the error line list and into the string stream, edited. When the whole file is processed, it is closed (by reaching the end of a scope), it is reopened as an output file, and edited is poured into the file. Also notice the counter is saved in an external file. The next time this program is invoked, it continues to sequence the counter.
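The per-line transformation can be sketched on its own, separate from the file handling. Here exposeError( ) is a hypothetical function that mirrors the find( ) and erase( ) calls in replaceErrors( ) but appends only the counter, leaving out the chapter, file, and line information.

```cpp
#include <sstream>
#include <string>

// Removes the "//!" marker (and the character after it) from a
// line and appends a numbered comment. Lines without the
// marker are returned unchanged.
std::string exposeError(std::string line, int counter) {
  const std::string marker = "//!";
  std::string::size_type pos = line.find(marker);
  if(pos == std::string::npos)
    return line;
  line.erase(pos, marker.size() + 1);
  std::ostringstream out;
  out << line << " // (" << counter << ")";
  return out.str();
}
```

Erasing marker.size( ) + 1 characters assumes, as the original program does, that a single space follows the marker.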

A simple data logger

This example shows an approach you might take to log data to disk and later retrieve it for processing. It is meant to produce a temperature-depth profile of the ocean at various points. To hold the data, a class is used:

//: C04:DataLogger.h

// Datalogger record layout

#ifndef DATALOG_H

#define DATALOG_H

#include <ctime>

#include <iosfwd>

#include <string>

using std::ostream;

 

struct Coord {

  int deg, min, sec;

  Coord(int d = 0, int m = 0, int s = 0)

    : deg(d), min(m), sec(s) {}

  std::string toString() const;

};

ostream& operator<<(ostream&, const Coord&);

 

class DataPoint {

  std::time_t timestamp; // Time & day

  Coord latitude, longitude;

  double depth, temperature;

public:

  DataPoint(std::time_t ts, const Coord& lat,

            const Coord& lon, double dep, double temp)

    : timestamp(ts), latitude(lat), longitude(lon),

      depth(dep), temperature(temp) {}

  DataPoint() : timestamp(0), depth(0), temperature(0) {}

  friend ostream& operator<<(ostream&, const DataPoint&);

};

#endif // DATALOG_H ///:~

 

A DataPoint consists of a time stamp, which is stored as a time_t value as defined in <ctime>, longitude and latitude coordinates, and values for depth and temperature. We use inserters for easy formatting. Here’s the implementation file:

//: C04:DataLogger.cpp {O}

// Datapoint implementations

#include "DataLogger.h"

#include <iomanip>

#include <iostream>

#include <sstream>

#include <string>

using namespace std;

 

ostream& operator<<(ostream& os, const Coord& c) {

  return os << c.deg << '*' << c.min << '\''

            << c.sec << '"';

}

 

string Coord::toString() const {

  ostringstream os;

  os << *this;

  return os.str();

}

 

ostream& operator<<(ostream& os, const DataPoint& d) {

  os.setf(ios::fixed, ios::floatfield);

  char fillc = os.fill('0'); // Pad on left with '0'

  tm* tdata = localtime(&d.timestamp);

  os << setw(2) << tdata->tm_mon + 1 << '\\'

     << setw(2) << tdata->tm_mday << '\\'

     << setw(2) << tdata->tm_year+1900 << ' '

     << setw(2) << tdata->tm_hour << ':'

     << setw(2) << tdata->tm_min << ':'

     << setw(2) << tdata->tm_sec;

  os.fill(' '); // Pad on left with ' '

  streamsize prec = os.precision(4);

  os << " Lat:" << setw(9) << d.latitude.toString()

     << ", Long:" << setw(9) << d.longitude.toString()

     << ", depth:" << setw(9) << d.depth

     << ", temp:" << setw(9) << d.temperature;

  os.fill(fillc);

  os.precision(prec);

  return os;

} ///:~

 

The Coord::toString( ) function is necessary because the DataPoint inserter calls setw( ) before it prints the latitude and longitude. If we used the stream inserter for Coord instead, the width would only apply to the first insertion (that is, to Coord::deg), since width changes are always reset immediately. The call to setf( ) causes the floating-point output to be fixed-precision, and precision( ) sets the number of decimal places to four. Notice how we restore the fill character and precision to whatever they were before the inserter was called.
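The effect of the width being reset after one insertion is easy to see with a smaller type. In this sketch, Clock and the functions padDirect( ) and padViaString( ) are hypothetical names used only to contrast the two approaches.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

struct Clock { // Hypothetical type for this sketch
  int hour, minute;
};

std::ostream& operator<<(std::ostream& os, const Clock& c) {
  return os << c.hour << ':' << c.minute;
}

std::string toString(const Clock& c) {
  std::ostringstream os;
  os << c;
  return os.str();
}

// The width applies only to c.hour:
std::string padDirect(const Clock& c) {
  std::ostringstream os;
  os << std::setw(6) << c;
  return os.str();
}

// The width applies to the whole formatted string:
std::string padViaString(const Clock& c) {
  std::ostringstream os;
  os << std::setw(6) << toString(c);
  return os.str();
}
```

For a Clock holding 7 and 5, padDirect( ) pads the hour alone to six characters and then appends ":5", whereas padViaString( ) pads the complete text "7:5" to six characters.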

To get the values from the time encoding stored in DataPoint::timestamp, we call the function std::localtime( ), which returns a pointer to a static tm object. The tm struct has the following layout:

struct tm {

  int tm_sec; // 0-59 seconds

  int tm_min; // 0-59 minutes

  int tm_hour; // 0-23 hours

  int tm_mday; // Day of month

  int tm_mon; // 0-11 months

  int tm_year; // Years since 1900

  int tm_wday; // Sunday == 0, etc.

  int tm_yday; // 0-365 day of year

  int tm_isdst; // Daylight savings?

};
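As a small sketch of reading these fields, the following function formats a time_t much as the DataPoint inserter does. It differs from that inserter in two ways that are choices of this example only: it calls std::gmtime( ) instead of std::localtime( ) so the result does not depend on the local time zone, and it separates the date fields with a forward slash. The name formatUtc( ) is hypothetical.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Formats a time_t as MM/DD/YYYY HH:MM:SS in UTC:
std::string formatUtc(std::time_t t) {
  std::tm* tdata = std::gmtime(&t);
  if(tdata == 0)
    return std::string(); // Conversion failed
  std::ostringstream os;
  os.fill('0');
  os << std::setw(2) << tdata->tm_mon + 1 << '/'
     << std::setw(2) << tdata->tm_mday << '/'
     << std::setw(4) << tdata->tm_year + 1900 << ' '
     << std::setw(2) << tdata->tm_hour << ':'
     << std::setw(2) << tdata->tm_min << ':'
     << std::setw(2) << tdata->tm_sec;
  return os.str();
}
```

Note the adjustments to tm_mon (which counts from zero) and tm_year (which counts from 1900) before the values are printed.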

 

Generating test data

Here’s a program that creates a file of test data in binary form (using write( )) and a second file in ASCII form using the DataPoint inserter. You can also print it out to the screen, but it’s easier to inspect in file form.

//: C04:Datagen.cpp

// Test data generator

//{L} DataLogger

#include <cstdlib>

#include <cstring>

#include <fstream>

#include "../require.h"

#include "DataLogger.h"

using namespace std;

 

int main() {

  ofstream data("data.txt");

  assure(data, "data.txt");

  ofstream bindata("data.bin", ios::binary);

  assure(bindata, "data.bin");

  time_t timer;

  Coord lat(45,20,31);

  Coord lon(22,34,18);

  // Seed random number generator:

  srand(time(&timer));

  for(int i = 0; i < 100; i++, timer += 55) {

    // Zero to 199 meters:

    double newdepth  = rand() % 200;

    double fraction = rand() % 100 + 1;

    newdepth += 1.0 / fraction;

    double newtemp = 150 + rand()%200; // Kelvin

    fraction = rand() % 100 + 1;

    newtemp += 1.0 / fraction;

    const DataPoint d(timer, Coord(45,20,31),

                      Coord(22,34,18), newdepth,

                      newtemp);

    data << d << endl;

    bindata.write(reinterpret_cast<const char*>(&d),

                  sizeof(d));

  }

} ///:~

 

The file data.txt is created in the ordinary way as an ASCII file, but data.bin has the flag ios::binary to tell the constructor to set it up as a binary file. To illustrate the formatting used for the text file, here is the first line of data.txt (the line wraps because it’s longer than this page will allow):

07\28\2003 12:54:40 Lat:45*20'31", Long:22*34'18", depth:  16.0164, temp: 242.0122

 

The Standard C library function time( ) updates the time_t value its argument points to with an encoding of the current time, which on most platforms is the number of seconds elapsed since 00:00:00 GMT, January 1, 1970 (the dawning of the age of Aquarius?). The current time is also a convenient way to seed the random number generator with the Standard C library function srand( ), as is done here.

After this, the timer is incremented by 55 seconds to give an interesting interval between readings in this simulation.

The latitude and longitude used are fixed values to indicate a set of readings at a single location. Both the depth and the temperature are generated with the Standard C library rand( ) function, which returns a pseudorandom number between zero and a platform-dependent constant, RAND_MAX, defined in <cstdlib> (guaranteed to be at least 32767; on many platforms it is the largest int value). To put this in a desired range, use the remainder operator % and the upper end of the range. These numbers are integral; to add a fractional part, a second call to rand( ) is made, and the value is inverted after adding one (to prevent divide-by-zero errors).
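The arithmetic is easier to check when it is separated from the calls to rand( ). In this sketch, scaleReading( ) and randomReading( ) are hypothetical helpers; the first takes the two raw values as parameters so that its result is reproducible.

```cpp
#include <cstdlib>

// Maps two raw rand( ) results to a reading: an integral part
// in [0, range) plus a fraction between 0.01 and 1.
double scaleReading(int raw1, int raw2, int range) {
  double whole = raw1 % range;
  double divisor = raw2 % 100 + 1; // 1..100, never zero
  return whole + 1.0 / divisor;
}

// Draws a fresh pseudorandom reading:
double randomReading(int range) {
  return scaleReading(std::rand(), std::rand(), range);
}
```

For example, raw values of 1234 and 99 with a range of 200 give an integral part of 34 and a divisor of 100, so the reading is approximately 34.01.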

In effect, the data.bin file is being used as a container for the data in the program, even though the container exists on disk and not in RAM. To send the data out to the disk in binary form, write( ) is used. The first argument is the starting address of the source block; notice it must be cast to a const char* because that’s what write( ) expects for narrow streams. The second argument is the number of characters to write, which in this case is the size of the DataPoint object (again, because we’re using narrow streams). Because no pointers are contained in DataPoint, there is no problem in writing the object to disk. If the object is more sophisticated, you must implement a scheme for serialization, which writes the data referred to by pointers and defines new pointers when read back in later. (We don’t talk about serialization in this volume; most vendor class libraries have some sort of serialization structure built into them.)
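The same write( ) and read( ) calls work on any stream, so the technique can be tried in memory before involving files. The following sketch assumes a hypothetical pointer-free struct, Sample, and two hypothetical helpers, store( ) and load( ).

```cpp
#include <istream>
#include <ostream>
#include <sstream>

struct Sample { // Hypothetical pointer-free record
  int id;
  double value;
};

// Writes the raw bytes of s to the stream:
void store(std::ostream& out, const Sample& s) {
  out.write(reinterpret_cast<const char*>(&s), sizeof s);
}

// Reads the raw bytes back; returns false on a short read:
bool load(std::istream& in, Sample& s) {
  return !in.read(reinterpret_cast<char*>(&s),
                  sizeof s).fail();
}
```

A stringstream opened with ios::in | ios::out | ios::binary can stand in for the file: store a Sample into it, load it back, and the two objects should compare equal member by member. Such raw bytes are meaningful only to a program built with the same object layout.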

Verifying and viewing the data

To check the validity of the data stored in binary format, you can read it into memory with the read( ) member function for input streams, and compare it to the text file created earlier by Datagen.cpp. The following example just writes the formatted results to cout, but you can redirect this to a file and then use a file comparison utility to verify that it is identical to the original.

//: C04:Datascan.cpp

// Verify and view logged data

//{L} DataLogger

#include <fstream>

#include <iostream>

#include "DataLogger.h"

#include "../require.h"

using namespace std;

 

int main() {

  ifstream bindata("data.bin", ios::binary);

  assure(bindata, "data.bin");

  DataPoint d;

  while (bindata.read(reinterpret_cast<char*>(&d),

         sizeof d))

    cout << d << endl;

} ///:~

 

Internationalization

The software industry is now a healthy, worldwide economic market, and applications that can run in various languages and cultures are in demand. As early as the late 1980s, the C Standards Committee added support for non-U.S. formatting conventions with their locale mechanism. A locale is a set of preferences for displaying certain entities such as dates and monetary quantities. In the 1990s, the C Standards Committee approved an addendum to Standard C that specified functions to handle wide characters (denoted by the type wchar_t), which allow support for character sets other than ASCII and its commonly used Western European extensions. Although the size of a wide character is not specified, some platforms implement them as 32-bit quantities, so they can hold the encodings specified by the Unicode Consortium, as well as mappings to multibyte character sets defined by Asian standards bodies. C++ has integrated support for both wide characters and locales into the iostreams library.

Wide streams

A wide stream is simply a stream class that handles wide characters. All the examples so far (except for the last traits example in Chapter 3) have used narrow streams, meaning streams that hold instances of char. Since stream operations are essentially the same no matter the underlying character type, they are encapsulated generically as templates. As we mentioned earlier, all input streams, for example, are connected somehow to the basic_istream class template, which is defined as follows:

template<class charT, class traits = char_traits<charT> >

class basic_istream {…};

 

In fact, all input stream types are specializations of this template, according to the following type definitions:

typedef basic_istream<char> istream;

typedef basic_istream<wchar_t> wistream;

typedef basic_ifstream<char> ifstream;

typedef basic_ifstream<wchar_t> wifstream;

typedef basic_istringstream<char> istringstream;

typedef basic_istringstream<wchar_t> wistringstream;

 

All other stream types are defined in similar fashion.

In a “perfect” world, this is all you’d have to do to have streams of different character types. In reality, things aren’t that simple. The reason is that the character-processing functions provided for char and wchar_t don’t have the same names. To compare two narrow strings, for example, you use the strcmp( ) function. For wide characters, that function is named wcscmp( ). (Remember these originated in C, which does not have function overloading, hence unique names are a must.) For this reason, a generic stream can’t just call strcmp( ) in response to a comparison operator. There needs to be a way for the correct low-level functions to be called automatically.

The principle that guides the solution is well known. You simply “factor out” the differences into a new abstraction. The operations you can perform on characters have been abstracted into the char_traits template, which has predefined specializations for char and wchar_t, as we discussed at the end of the previous chapter. To compare two strings, then, basic_string just calls traits::compare( ) (remember that traits is the second template parameter), which in turn calls either strcmp( ) or wcscmp( ), depending on which specialization is being used (transparent to basic_string, of course).
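You can call the traits directly to see this dispatch at work. The following is a minimal sketch; compareChars( ) is a hypothetical function template, and the only library facility it relies on is char_traits<charT>::compare( ), which takes two character pointers and a count.

```cpp
#include <cstddef>
#include <string>

// Compares the first n characters of two sequences, whatever
// their character type, by delegating to char_traits:
template<class charT>
int compareChars(const charT* a, const charT* b,
                 std::size_t n) {
  return std::char_traits<charT>::compare(a, b, n);
}
```

The same template accepts narrow literals such as "abc" and wide literals such as L"abc"; the compiler deduces charT and selects the matching specialization of char_traits.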

You only need to be concerned about char_traits if you access the low-level character processing functions; most of the time you don’t care. Consider, however, making your inserters and extractors more robust by defining them as templates, just in case someone wants to use them on a wide stream.

To illustrate, recall again the Date class inserter from the beginning of this chapter. We originally declared it as:

ostream& operator<<(ostream&, const Date&);

 

This accommodates only narrow streams. To make it generic, we simply make it a template based on basic_ostream:

template<class charT, class traits>

std::basic_ostream<charT, traits>&

operator<<(std::basic_ostream<charT, traits>& os,

           const Date& d) {

  charT fillc = os.fill(os.widen('0'));

  charT dash = os.widen('-');

  os << setw(2) << d.month << dash

     << setw(2) << d.day << dash

     << setw(4) << d.year;

  os.fill(fillc);

  return os;

}

 

Notice that we also have to replace char with the template parameter charT in the declaration of fillc, since it could be either char or wchar_t, depending on the template instantiation being used.

Since you don’t know when you’re writing the template which type of stream you have, you need a way to automatically convert character literals to the correct size for the stream. This is the job of the widen( ) member function. The expression widen('-'), for example, converts its argument to L'-' (the literal syntax equivalent to the conversion wchar_t('-')) if the stream is a wide stream and leaves it alone otherwise. There is also a narrow( ) function that converts to a char if needed.

We can use widen( ) to write a generic version of the nl manipulator we presented earlier in the chapter.

template<class charT, class traits>

basic_ostream<charT,traits>&

nl(basic_ostream<charT,traits>& os) {

  return os << charT(os.widen('\n'));

}

 

Locales

Perhaps the most notable difference in typical numeric computer output from country to country is the punctuator used to separate the integer and fractional parts of a real number. In the United States, a period denotes a decimal point, but in much of the world, a comma is expected instead. It would be quite inconvenient to do all your own formatting for locale-dependent displays. Once again, creating an abstraction that handles these differences solves the problem.

That abstraction is the locale. All streams have an associated locale object that they use for guidance on how to display certain quantities for different cultural environments. A locale manages the categories of culture-dependent display rules, which are defined as follows:

Category     Effect

collate      allows comparing strings according to different, supported collating sequences

ctype        abstracts the character classification and conversion facilities found in <cctype>

monetary     supports different displays of monetary quantities

numeric      supports different display formats of real numbers, including radix (decimal point) and grouping (thousands) separators

time         supports various international formats for display of date and time

messages     scaffolding to implement context-dependent message catalogs (such as for error messages in different languages)

 

The following program illustrates basic locale behavior:

//: C04:Locale.cpp

//{-g++}

//{-bor}

//{-edg}

// Illustrates effects of locales

#include <iostream>

#include <locale>

using namespace std;

int main() {

  locale def;

  cout << def.name() << endl;

  locale current = cout.getloc();

  cout << current.name() << endl;

  float val = 1234.56;

  cout << val << endl;

  // Change to French/France

  cout.imbue(locale("french"));

  current = cout.getloc();

  cout << current.name() << endl;

  cout << val << endl;

 

  cout << "Enter the literal 7890,12: ";

  cin.imbue(cout.getloc());

  cin >> val;

  cout << val << endl;

  cout.imbue(def);

  cout << val << endl;

} ///:~

 

Here’s the output:

C

C

1234.56

French_France.1252

1234,56

Enter the literal 7890,12: 7890,12

7890,12

7890.12

 

The default locale is the “C” locale, which is what C and C++ programmers have been used to all these years (basically, English language and American culture). All streams are initially “imbued” with the “C” locale. The imbue( ) member function changes the locale that a stream uses. Notice that the full ISO name for the “French” locale is displayed (that is, French used in France vs. French used in another country). This example shows that this locale uses a comma for a radix point in numeric display. We have to change cin to the same locale if we want to do input according to the rules of this locale.

Each locale category is divided into a number of facets, which are classes encapsulating the functionality that pertains to that category. For example, the time category has the facets time_put and time_get, which contain functions for doing time and date output and input, respectively. The monetary category has facets money_get, money_put, and moneypunct. (The last of these determines the currency symbol.) The following program illustrates the moneypunct facet. (The time facets require a sophisticated use of iterators, which is beyond the scope of this chapter.)

//: C04:Facets.cpp

//{-bor}

//{-g++}

#include <iostream>

#include <locale>

#include <string>

using namespace std;

 

int main() {

  // Change to French/France

  locale loc("french");

  cout.imbue(loc);

 

  string currency =

    use_facet<moneypunct<char> >(loc).curr_symbol();

  char point =

    use_facet<moneypunct<char> >(loc).decimal_point();

  cout << "I made " << currency << 12.34 << " today!"

       << endl;

} ///:~

 

The output shows the French currency symbol and decimal separator:

I made Ç12,34 today!

 

You can also define your own facets to construct customized locales.[47] Be aware that the overhead for locales is considerable. In fact, some library vendors provide different “flavors” of the standard C++ library to accommodate environments that have limited space.[48]

Summary

This chapter has given you a fairly thorough introduction to the iostream class library. In all likelihood, it is all you need to create programs using iostreams. However, be aware that some additional features in iostreams are not used often, but you can discover them by looking at the iostream header files and by reading your compiler’s documentation on iostreams or the references mentioned in this chapter and in the book’s preface.

Exercises

                            1.             Open a file by creating an ifstream object. Make an ostringstream object and read the entire contents into the ostringstream using the rdbuf( ) member function. Extract a string copy of the underlying buffer and capitalize every character in the file using the Standard C toupper( ) function declared in <cctype>. Write the result out to a new file.

                       24.             Create a program that opens a file (the first argument on the command line) and searches it for any one of a set of words (the remaining arguments on the command line). Read the input a line at a time, and write out the lines that match (with line numbers) to a new file.

                       25.             Write a program that adds a copyright notice to the beginning of all source-code files indicated by the program’s command-line arguments.

                       26.             Use your favorite text-searching program (grep, for example) to output the names (only) of all the files that contain a particular pattern. Redirect the output into a file. Write a program that uses the contents of that file to generate a batch file that invokes your editor on each of the files found by the search program.

                        27.             We know that setw( ) allows for a minimum of characters read in, but what if you wanted to read a maximum? Write an effector that allows the user to specify a maximum number of characters to extract. Have your effector also work for output, in such a way that output fields are truncated, if necessary, to stay within width limits.

                       28.             Demonstrate to yourself that if the fail or bad bit is set, and you subsequently turn on stream exceptions, that the stream will immediately throw an exception.

                       29.             String streams accommodate easy conversions, but they come with a price. Write a program that races atoi( ) against the stringstream conversion system to see the effect of the overhead involved with stringstream.

                       30.             Make a Person struct with fields such as name, age, address, etc. Make the string fields fixed-size arrays. The social security number will be the key for each record. Implement the following DataBase class.

 

class DataBase {

public:

  // Find where a record is on disk

  size_t query(size_t ssn);

  // Return the person at rn (record number)

  Person retrieve(size_t rn);

  // Record a record on disk

  void add(const Person& p);

};

 

Write some Person records to disk (do not keep them all in memory). When the user requests a record, read it off the disk and return it. The I/O operations in the DataBase class use read( ) and write( ) to process all Person records.

                        31.             Write an operator<< inserter for the Person struct that can be used to display records in a format easily read. Practice writing it out to file.

                       32.             Suppose your database for your Person structs was lost but that you have the file you wrote from the previous exercise. Recreate your database using this file. Be sure to use error checking.

                       33.             Write size_t(-1) (the largest unsigned int on your platform) to a text file 1,000,000 times. Repeat, but write to a binary file. Compare the two files for size, and see how much room is saved using the binary format. (You may first want to try to calculate how much will be saved on your platform.)

                       34.             Find out the maximum number of digits of precision your implementation of iostreams will print by repeatedly increasing the value of the argument to precision( ) when printing an irrational number such as sqrt(2.0).

                       35.             Write a program that reads real numbers from a file and prints their sum, average, minimum, and maximum.

                       36.             Determine the output of the following program before it is executed.

//: C04:Exercise16.cpp

#include <fstream>

#include <iostream>

#include <sstream>

#include "../require.h"

using namespace std;

#define d(a) cout << #a " ==\t" << a << endl;

 

void tellPointers(fstream& s) {

  d(s.tellp());

  d(s.tellg());

  cout << endl;

}

void tellPointers(stringstream& s) {

  d(s.tellp());

  d(s.tellg());

  cout << endl;

}

int main() {

  fstream in("Exercise16.cpp");

  assure(in, "Exercise16.cpp");

  in.seekg(10);

  tellPointers(in);

  in.seekp(20);

  tellPointers(in);

  stringstream memStream("Here is a sentence.");

  memStream.seekg(10);

  tellPointers(memStream);

  memStream.seekp(5);

  tellPointers(memStream);

} ///:~

 

                        37.             Suppose you are given line-oriented data in a file formatted as follows:

Australia

5E56,7667230284,Langler,Tyson,31.2147,0.00042117361

2B97,7586701,Oneill,Zeke,553.429,0.0074673053156065

4D75,7907252710,Nickerson,Kelly,761.612,0.010276276

9F2,6882945012,Hartenbach,Neil,47.9637,0.0006471644

Austria

480F,7187262472,Oneill,Dee,264.012,0.00356226040013

1B65,4754732628,Haney,Kim,7.33843,0.000099015948475

DA1,1954960784,Pascente,Lester,56.5452,0.0007629529

3F18,1839715659,Elsea,Chelsy,801.901,0.010819887645

Belgium

BDF,5993489554,Oneill,Meredith,283.404,0.0038239127

5AC6,6612945602,Parisienne,Biff,557.74,0.0075254727

6AD,6477082,Pennington,Lizanne,31.0807,0.0004193544

4D0E,7861652688,Sisca,Francis,704.751,0.00950906238

Bahamas

37D8,6837424208,Parisienne,Samson,396.104,0.0053445

5E98,6384069,Willis,Pam,90.4257,0.00122009564059246

1462,1288616408,Stover,Hazal,583.939,0.007878970561

5FF3,8028775718,Stromstedt,Bunk,39.8712,0.000537974

1095,3737212,Stover,Denny,3.05387,0.000041205248883

7428,2019381883,Parisienne,Shane,363.272,0.00490155

 

The heading of each section is a region, and every line under that heading is a seller in that region. Each comma-separated field represents the data about each seller. The first field in a line is the SELLER_ID which unfortunately was written out in hexadecimal format. The second is the PHONE_NUMBER (notice that some are missing area codes). LAST_NAME and FIRST_NAME then follow. TOTAL_SALES is the second to the last column. The last column is the decimal amount of the total sales that the seller represents for the company. You are to format the data on the terminal window so that an executive can easily interpret the trends. Sample output is given below.

                          Australia

              ---------------------------------

 

*Last Name*   *First Name*   *ID*    *Phone*        *Sales*   *Percent*

 

Langler       Tyson          24150   766-723-0284     31.21   4.21E-02

Oneill        Zeke           11159   XXX-758-6701    553.43   7.47E-01

(etc.)




5: Templates in depth

The C++ template facility goes far beyond simple “containers of T.” Although the original motivation was to enable type-safe, generic containers, in modern C++, templates are also used to generate custom code and to optimize program execution through compile-time programming constructs.

In this chapter we offer a practical look at the power (and pitfalls) of programming with templates in modern C++. For a more complete analysis of template-related language issues and “gotchas,” we recommend the superb book by David Vandevoorde and Nico Josuttis.[49]

Template parameters

As we illustrated in Volume 1, templates come in two flavors: function templates and class templates. Both are wholly characterized by their parameters. Each template parameter itself can represent one of the following categories of arguments:

1.       Types (either built-in or user-defined)

2.      Compile-time constant values (for example, integers, and pointers and references to static entities; often referred to as non-type parameters)

3.      Other templates

The examples in Volume 1 all fall into the first category and are the most common. The canonical example for simple container-like templates nowadays seems to be a Stack class. Being a container, a Stack object is not concerned with the type of object it stores; the logic of holding objects is independent of the type of objects being held. For this reason you can use a type parameter to represent the contained type:

template<class T>

class Stack {

  T* data;

  size_t count;

public:

  void push(const T& t);

  // etc.

};

 

You provide the actual type to be used for a particular Stack instance by means of an argument for the parameter T:

Stack<int> myStack;  // A Stack of ints

 

The compiler then provides an int-version of Stack by substituting int for T and generating the corresponding code. The name of the class instance generated from the template in this case is Stack<int>.

Non-type template parameters

It is also possible to provide a non-type template parameter, as long as it represents a value that is known at compile time (most often an integral value). You can make a fixed-size Stack, for instance, by specifying a non-type parameter to be used as the dimension for the underlying array, as follows.

template<class T, size_t N>

class Stack {

  T data[N];  // Fixed capacity is N

  size_t count;

public:

  void push(const T& t);

  // etc.

};

 

You must provide a compile-time constant value for the parameter N when you request an instance of this template, such as

Stack<int, 100> myFixedStack;

 

Because the value of N is known at compile time, the underlying array (data) can be placed on the runtime stack instead of on the free store, which can improve runtime performance by avoiding the overhead associated with dynamic memory allocation. Following the pattern mentioned earlier, the name of the class above is Stack<int, 100>. This means that each distinct value of N results in a unique class type. For example, Stack<int, 99> is a distinct class from Stack<int, 100>.

The bitset class template, discussed in detail in Chapter 7, is the only class in the standard C++ library that uses a non-type template parameter, which happens to specify the number of bits the bitset object can hold. The following random number generator example uses a bitset to track numbers so all the numbers in its range are returned in random order without repetition before starting over. This example also overloads operator( ) to produce a familiar function-call syntax.

//: C05:Urand.h

//{-bor}

// Unique randomizer

#ifndef URAND_H

#define URAND_H

#include <bitset>

#include <cstddef>

#include <cstdlib>

#include <ctime>

using std::size_t;

using std::bitset;

 

template<size_t UpperBound>

class Urand {

  bitset<UpperBound> used;

public:

  Urand() {

    std::srand(std::time(0));  // randomize

  }

  size_t operator()(); // The "generator" function

};

 

template<size_t UpperBound>

inline size_t Urand<UpperBound>::operator()() {

  if(used.count() == UpperBound)

    used.reset();  // start over (clear bitset)

  size_t newval;

  while(used[newval = std::rand() % UpperBound])

    ; // Until unique value is found

  used[newval] = true;

  return newval;

}

#endif // URAND_H ///:~

 

The uniqueness of Urand is produced by tracking with a bitset all the numbers possible in the random space (the upper bound is set with the template argument) and by recording each number as it’s used by setting the corresponding position bit in used. When the numbers are all used up, the bitset is cleared to start over. Here’s a simple driver that illustrates how to use a Urand object:

//: C05:UrandTest.cpp

//{-bor}

#include <iostream>

#include "Urand.h"

using namespace std;

 

int main() {

  Urand<10> u;

  for(int i = 0; i < 20; ++i)

    cout << u() << ' ';

} ///:~

 

As we explain later in this chapter, non-type template arguments are also important in the optimization of numeric computations.

Default template arguments

You can provide default arguments for template parameters in class templates. (Under the C++98 standard they are not allowed in function templates; C++11 later removed that restriction.) As with default function arguments, they should only be defined once, the first time a template declaration or definition is seen by the compiler; and once you introduce a default argument, all the subsequent template parameters must also have defaults. To make the fixed-size Stack template shown earlier a little friendlier, for example, you can add a default argument like this:

template<class T, size_t N = 100>

class Stack {

  T data[N];  // Fixed capacity is N

  size_t count;

public:

  void push(const T& t);

  // etc.

};

 

Now, if you omit the second template argument when declaring a Stack object, the value for N will default to 100.

You can choose to provide defaults for all arguments, but you must use an empty set of brackets when declaring an instance so that the compiler knows that a class template is involved. Here’s how:

template<class T = int, size_t N = 100>  // Both defaulted

class Stack {

  T data[N];  // Fixed capacity is N

  size_t count;

public:

  void push(const T& t);

  // etc.

};

 

Stack<> myStack;  // Same as Stack<int, 100>

 

Default arguments are used heavily in the standard C++ library. The vector class template, for instance, is declared as follows:

template <class T, class Allocator = allocator<T> >

class vector;

 

Note the space between the last two right angle bracket characters. This prevents the compiler from interpreting those two characters (>>) as the right-shift operator.

This declaration reveals that vector actually takes two arguments: the type of the contained objects it holds, and a type that represents the allocator used by the vector. (We talk more about allocators in Chapter 7.) Whenever you omit the second argument, the standard allocator template is used, parameterized by the first template parameter. This declaration also shows that you can use template parameters in other, subsequent template parameters, as T is used here.

Although you cannot use default template arguments in function templates, you can use template parameters as default arguments to normal functions. The following function template adds the elements in a sequence.

//: C05:FuncDef.cpp

#include <iostream>

using namespace std;

 

template<class T>

T sum(T* b, T* e, T init = T()) {

  while(b != e)

    init += *b++;

  return init;

}

 

int main() {

  int a[] = {1,2,3};

  cout << sum(a, a+sizeof a / sizeof a[0]) << endl; // 6

} ///:~

 

The third argument to sum( ) is the initial value for the accumulation of the elements. Since we omitted it, this argument defaults to T( ), which in the case of int and other built-in types invokes a pseudo-constructor that performs zero-initialization.

Template template parameters

The third type of parameter a template can accept is another class template. This may sound strange, since a class template instantiation is a type, and type parameters are already allowed, but if you are going to use a parameter as a template in your code, the compiler needs to know that the parameter is a template in the first place. The following example illustrates a template template parameter.

//: C05:TempTemp.cpp

// Illustrates a template template parameter

#include <cstddef>

#include <iostream>

using namespace std;

 

template<class T>

class Array { // A simple, expandable sequence

  enum {INIT = 10};

  T *data;

  size_t capacity;

  size_t count;

public:

  Array() {

    count = 0;

    data = new T[capacity = INIT];

  }

  void push_back(const T& t) {

    if(count == capacity) {

      // Grow underlying array

      size_t newCap = 2*capacity;

      T* newData = new T[newCap];

      for (size_t i = 0; i < count; ++i)

        newData[i] = data[i];

      delete [] data;  // Array form matches new T[ ]

      data = newData;

      capacity = newCap;

    }

    data[count++] = t;

  }

  void pop_back() {

    if(count > 0)

      --count;

  }

  T* begin() {

    return data;

  }

  T* end() {

    return data + count;

  }

};

 

template<class T, template<class> class Seq>

class Container {

  Seq<T> seq;

public:

  void append(const T& t) {

    seq.push_back(t);

  }

  T* begin() {

    return seq.begin();

  }

  T* end() {

    return seq.end();

  }

};

 

int main() {

  Container<int, Array> theData;

  theData.append(1);

  theData.append(2);

  int* p = theData.begin();

  while(p != theData.end())

    cout << *p++ << endl;

} ///:~

 

The Array class template is a trivial sequence container. The Container template takes two parameters: the type of the objects it is to hold, and a sequence data structure to do the holding. The following line in the implementation of the Container class requires that we inform the compiler that Seq is a template:

  Seq<T> seq;

 

If we hadn’t declared Seq to be a template template parameter, the compiler would complain here that Seq is not a template, since we’re using it as such. In main( ) a Container is instantiated to use an Array to hold integers, so Seq stands for Array in this example.

Note that it is not necessary in this case to name the parameter for Seq inside Container’s declaration. The line in question is:

template<class T, template<class> class Seq>

 

Although we could have written

template<class T, template<class U> class Seq>

 

the parameter U is not needed anywhere. All that matters is that Seq is a class template that takes a single type parameter. This is analogous to omitting the names of function parameters when they’re not needed, such as when you overload the post-increment operator:

T operator++(int);

 

The int here is merely a placeholder and therefore needs no name.

The following program uses a fixed-size array, which has an extra template parameter representing the array dimension:

//: C05:TempTemp2.cpp

// A multi-variate template template parameter

#include <cstddef>

#include <iostream>

using namespace std;

 

template<class T, size_t N>

class Array {

  T data[N];

  size_t count;

public:

  Array() { count = 0; }

  void push_back(const T& t) {

    if(count < N)

      data[count++] = t;

  }

  void pop_back() {

    if(count > 0)

      --count;

  }

  T* begin() { return data; }

  T* end() { return data + count; }

};

 

template<class T,size_t N,template<class,size_t> class Seq>

class Container {

  Seq<T,N> seq;

public:

  void append(const T& t) { seq.push_back(t); }

  T* begin() { return seq.begin(); }

  T* end() { return seq.end(); }

};

 

int main() {

  const size_t N = 10;

  Container<int, N, Array> theData;

  theData.append(1);

  theData.append(2);

  int* p = theData.begin();

  while(p != theData.end())

    cout << *p++ << endl;

} ///:~

 

Once again, parameter names are not needed in the declaration of Seq inside Container’s declaration, but we need two parameters to declare the data member seq, hence the appearance of the non-type parameter N at the top level.

Combining default arguments with template template parameters is slightly more problematic. When the compiler looks at the inner parameters of a template template parameter, default arguments are not considered, so you have to repeat the defaults in order to get an exact match. The following example uses a default argument for the fixed-size Array template and shows how to accommodate this quirk in the language.

//: C05:TempTemp3.cpp

//{-bor}

//{-msc}

// Combining template template parameters and

// default arguments

#include <cstddef>

#include <iostream>

using namespace std;

 

template<class T, size_t N = 10>  // A default argument

class Array {

  T data[N];

  size_t count;

public:

  Array() { count = 0; }

  void push_back(const T& t) {

    if(count < N)

      data[count++] = t;

  }

  void pop_back() {

    if(count > 0)

      --count;

  }

  T* begin() { return data; }

  T* end() { return data + count; }

};

 

template<class T, template<class, size_t = 10> class Seq>

class Container {

  Seq<T> seq;  // Default used

public:

  void append(const T& t) { seq.push_back(t); }

  T* begin() { return seq.begin(); }

  T* end() { return seq.end(); }

};

 

int main() {

  Container<int, Array> theData;

  theData.append(1);

  theData.append(2);

  int* p = theData.begin();

  while(p != theData.end())

    cout << *p++ << endl;

} ///:~

 

It is necessary to include the default dimension of 10 in the line:

template<class T, template<class, size_t = 10> class Seq>

 

Both the definition of seq in Container and theData in main( ) use the default. The only way to use something other than the default value is the approach the previous program (TempTemp2.cpp) illustrated. This is the only exception to the rule stated earlier that default arguments should appear only once in a compilation unit.

Since the standard sequence containers (vector, list, and deque, discussed in depth in Chapter 7) have a default allocator argument, the technique shown above is helpful should you ever want to pass one of these sequences as a template parameter. The following program passes a vector and then a list to two instances of Container.

//: C05:TempTemp4.cpp

//{-bor}

//{-msc}

// Passes standard sequences as template arguments

#include <iostream>

#include <list>

#include <memory>  // Declares allocator<T>

#include <vector>

using namespace std;

 

template<class T, template<class U, class = allocator<U> >

                    class Seq>

class Container {

  Seq<T> seq; // Default of allocator<T> applied implicitly

public:

  void push_back(const T& t) { seq.push_back(t); }

  typename Seq<T>::iterator begin() { return seq.begin(); }

  typename Seq<T>::iterator end() { return seq.end(); }

};

 

int main() {

  // Use a vector

  Container<int, vector> theData;

  theData.push_back(1);

  theData.push_back(2);

  for(vector<int>::iterator p = theData.begin();

      p != theData.end(); ++p) {

    cout << *p << endl;

  }

  // Use a list

  Container<int, list> theOtherData;

  theOtherData.push_back(3);

  theOtherData.push_back(4);

  for(list<int>::iterator p2 = theOtherData.begin();

      p2 != theOtherData.end(); ++p2) {

    cout << *p2 << endl;

  }

} ///:~

 

In this case we name the first parameter of the inner template Seq (with the name U), because the allocators in the standard sequences must themselves be parameterized with the same type as the contained objects in the sequence. Also, since the default allocator parameter is known, we can omit it in the subsequent references to Seq<T>, as we did in the previous program. To fully explain this example, however, we have to discuss the semantics of the typename keyword.

The typename keyword

Consider the following:

//: C05:TypenamedID.cpp

//{-bor}

// Uses 'typename' as a prefix for nested types

 

template<class T> class X {

  // Without typename, you should get an error:

  typename T::id i;

public:

  void f() { i.g(); }

};

 

class Y {

public:

  class id {

  public:

    void g() {}

  };

};

 

int main() {

  X<Y> xy;

  xy.f();

} ///:~

 

The template definition assumes that the class T that you hand it must have a nested identifier of some kind called id. But id could also be a static data member of T, in which case you can perform operations on id directly, but you can’t “create an object” of “the type id.”

However, that’s exactly what is happening here: the identifier id is being treated as if it were actually a nested type inside T. In the case of class Y, id is in fact a nested type, but (without the typename keyword) the compiler can’t know that when it’s compiling X.

If the compiler has the option of treating an identifier as a type or as something other than a type when it sees an identifier in a template, it will assume that the identifier refers to something other than a type. That is, it will assume that the identifier refers to an object (including variables of primitive types), an enumeration, or something similar. However, it will not (and cannot) just assume that it is a type.

Because the compiler assumes by default that such a name (one that is qualified by a template parameter and could denote either a type or a non-type) is not a type, you must use typename for nested type names (except in constructor initializer lists, where it is neither needed nor allowed). In the above example, when the compiler sees typename T::id, it knows (because of the typename keyword) that id refers to a nested type and thus it can create an object of that type.

The short version of the rule is: if a type referred to inside template code is qualified by a template type parameter, it should be preceded by the typename keyword, unless it appears in a base class specification or initializer list in the same scope (in which case you must not).

All the above explains the use of the typename keyword in the program TempTemp4.cpp. Without it, the compiler would assume that the expression Seq<T>::iterator is not a type, but we were using it to define the return type of the begin( ) and end( ) member functions.

The following example, which defines a function template that can print any standard C++ sequence, shows a similar use of typename.

//: C05:PrintSeq.cpp

//{-msc}

// A print function for standard C++ sequences

#include <iostream>

#include <list>

#include <memory>

#include <vector>

using namespace std;

 

template<class T, template<class U, class = allocator<U> >

                  class Seq>

void printSeq(Seq<T>& seq) {

  for (typename Seq<T>::iterator b = seq.begin();

       b != seq.end();)

    cout << *b++ << endl;

}

 

int main() {

  // Process a vector

  vector<int> v;

  v.push_back(1);

  v.push_back(2);

  printSeq(v);

  // Process a list

  list<int> lst;

  lst.push_back(3);

  lst.push_back(4);

  printSeq(lst);

} ///:~

 

Once again, without the typename keyword the compiler will interpret iterator as a static data member of Seq<T>, which is a syntax error, since a type is required.

Typedef-ing a typename

It’s important not to assume that the typename keyword creates a new type name. It doesn’t. Its purpose is to inform the compiler that the qualified identifier is to be interpreted as a type. A line that reads:

typename Seq<T>::iterator It;

 

causes a variable named It to be declared of type Seq<T>::iterator. If you mean to create a new type name, you should use typedef, as usual, as in:

typedef typename Seq<T>::iterator It;
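As a minimal sketch of this idiom in context, the following function template uses typedef together with typename to introduce a short local name for a dependent iterator type. The name countElements is our own invention for illustration and is not part of the standard library.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical helper: counts the elements of any standard sequence.
template<class Seq>
std::size_t countElements(const Seq& seq) {
  // typedef creates the new type name It; typename only tells the
  // compiler that Seq::const_iterator is a type.
  typedef typename Seq::const_iterator It;
  std::size_t n = 0;
  for (It p = seq.begin(); p != seq.end(); ++p)
    ++n;
  return n;
}
```

Inside the function, It is a genuine type name, so the declaration It p declares an iterator variable rather than an object named It.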

 

Using typename instead of class

Another role of the typename keyword is to provide you the option of using typename instead of class in the template parameter list of a template definition. To some, this produces clearer code (your mileage may vary):

//: C05:UsingTypename.cpp

// Using 'typename' in the template argument list

 

template<typename T> class X { };

 

int main() {

  X<int> x;

} ///:~

 

You probably won’t see a great deal of code that uses typename in this fashion, since the keyword was added to the language a relatively long time after templates were introduced.

Using the template keyword as a hint

Just as the typename keyword helps the compiler in situations in which a type identifier is not expected, there is also a potential difficulty with tokens that are not identifiers, such as the < and > characters; sometimes they represent the less-than or greater-than symbols, and sometimes they delimit template parameter lists. As an example, we’ll once more use the bitset class:

//: C05:DotTemplate.cpp

// Illustrate the .template construct

#include <bitset>

#include <cstddef>

#include <iostream>

#include <string>

using namespace std;

 

template<class charT, size_t N>

basic_string<charT> bitsetToString(const bitset<N>& bs) {

  return bs. template to_string<charT, char_traits<charT>,

                                allocator<charT> >();

}

 

int main() {

  bitset<10> bs;

  bs.set(1);

  bs.set(5);

  cout << bs << endl; // 0000100010

  string s = bitsetToString<char>(bs);

  cout << s << endl;  // 0000100010

} ///:~

 

The bitset class supports conversion to a string object via its to_string member function. To support multiple string classes, to_string is itself a template, following the pattern established by the basic_string template discussed in Chapter 3. The declaration of to_string inside of bitset looks like this:

template <class charT, class traits, class Allocator>

basic_string<charT, traits, Allocator> to_string() const;

 

Our bitsetToString( ) function template above allows you to request different types of string representations of a bitset. To get a wide string, for instance, you change the call to the following:

  wstring s = bitsetToString<wchar_t>(bs);

 

Note that basic_string uses default template arguments, so we don’t have to repeat the char_traits and allocator arguments in the return type. Unfortunately, bitset::to_string does not use default arguments. Using bitsetToString<char>(bs) is more convenient than typing a fully-qualified call to bs.template to_string<char, char_traits<char>, allocator<char> >( ) every time.

The return statement in bitsetToString( ) contains the template keyword in an odd place: right after the dot operator applied to the bitset object bs. This is because when the template is parsed, the < character after the to_string token would be interpreted as a less-than operation instead of the beginning of a template argument list. We explain exactly why this confusion exists in the section “Name lookup issues” later in this chapter. The template keyword used in this context tells the compiler that what follows is the name of a template, causing the < character to be interpreted correctly. The same reasoning applies to the -> and :: operators when applied to templates. As with the typename keyword, this template disambiguation technique can only be used within a template.[50]
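The following sketch shows the same disambiguation applied to the -> operator. The Wrapper class template and the toInt function are hypothetical names made up for this illustration; only the placement of the template keyword is the point.

```cpp
// Hypothetical class template with a member function template.
template<class T>
struct Wrapper {
  T value;
  template<class U> U as() const { return static_cast<U>(value); }
};

// The type of w depends on T, so the compiler cannot know that
// "as" names a template unless the template keyword says so.
template<class T>
int toInt(const Wrapper<T>* w) {
  return w->template as<int>();
}
```

Without the template keyword, the < after as would be parsed as a less-than operator, just as in the dot-operator case above.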

Member Templates

The bitset::to_string( ) function template is an example of a member template: a template declared within another class or class template. This allows many combinations of independent template arguments to be combined. A useful example is found in the complex class template in the standard C++ library. The complex template has a type parameter meant to represent an underlying floating-point type to hold the real and imaginary parts of a complex number. The following code snippet from the standard library shows a member-template constructor in the complex class template:

template<typename T>

class complex {

public:

  template<class X> complex(const complex<X>&);

 

The standard complex template comes ready-made with specializations that use float, double, and long double for the parameter T. The member-template constructor above allows you to create a new complex number that uses a different floating-point type as its base type, as seen in the code below:

  complex<float> z(1,2);

  complex<double> w(z);

 

In the declaration of w, the complex template parameter T is double and X is float. Member templates make this kind of flexible conversion easy.

Since defining a template within a template is a nesting operation, the prefixes that introduce the templates must reflect that nesting if you define the member template outside the outer class definition. For example, if you were to implement the complex class template, and if you were to define the member-template constructor above outside the complex template class definition, you would have to do it like this:

template<typename T>

template<typename X>

complex<T>::complex(const complex<X>& c) {/*body here…*/}
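To see the nested prefixes in a complete, compilable setting, here is a small sketch that uses a hypothetical Box class template (our own stand-in, far simpler than the real complex) with a member-template converting constructor defined outside the class:

```cpp
// Hypothetical class template holding a single value.
template<typename T>
class Box {
  T val;
public:
  explicit Box(const T& v) : val(v) {}
  // Member-template constructor: converts from a Box of another type.
  template<typename X> Box(const Box<X>& other);
  const T& get() const { return val; }
};

// Outer template prefix first, then the member template's prefix.
template<typename T>
template<typename X>
Box<T>::Box(const Box<X>& other) : val(other.get()) {}
```

With these definitions, a Box<double> can be constructed from a Box<float>; in that case T is double and X is float.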

 

Another use of member function templates in the standard library is in the initialization of containers, such as a vector. Suppose we have a vector of ints and we want to initialize a new vector of doubles with it, like this:

  int data[5] = {1,2,3,4,5};

  vector<int> v1(data, data+5);

  vector<double> v2(v1.begin(), v1.end());

 

As long as the elements in v1 are assignment-compatible with the elements in v2 (as double and int are here), all is well. The vector class template has the following member template constructor:

template <class InputIterator>

vector(InputIterator first, InputIterator last,

       const Allocator& = Allocator());

 

This constructor is actually used twice in the vector declarations above. When v1 is initialized from the array of ints, the type InputIterator is int*. When v2 is initialized from v1, an instance of the member template constructor is used with InputIterator representing vector<int>::iterator.
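Because the iterator type is a template parameter of the constructor, the source does not even have to be a vector. As a brief sketch (the function name toDoubles is hypothetical), the same constructor can fill a vector of doubles from a list of ints:

```cpp
#include <list>
#include <vector>

// Hypothetical helper: copies a list of ints into a vector of doubles.
std::vector<double> toDoubles(const std::list<int>& src) {
  // Here InputIterator is deduced as std::list<int>::const_iterator.
  return std::vector<double>(src.begin(), src.end());
}
```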

Member templates can also be classes. (They don’t have to be functions, although that’s usually what you need.) The following example shows a member class template inside an outer class template.

//: C05:MemberClass.cpp

// A member class template

#include <iostream>

#include <typeinfo>

using namespace std;

 

template<class T>

class Outer {

public:

  template<class R>

  class Inner {

  public:

    void f();

  };

};

 

template<class T> template <class R>

void Outer<T>::Inner<R>::f() {

  cout << "Outer == " << typeid(T).name() << endl;

  cout << "Inner == " << typeid(R).name() << endl;

  cout << "Full Inner == " << typeid(*this).name() << endl;

}

 

int main() {

  Outer<int>::Inner<bool> inner;

  inner.f();

} ///:~

 

The typeid operator, which is covered in Chapter 8, returns an object whose name( ) member function yields a string representation of a type or of the type of a variable. Although the exact representation varies from compiler to compiler, the output of the program above should be something like this:

Outer == int

Inner == bool

Full Inner == Outer<int>::Inner<bool>

 

The declaration of the variable inner in the main program instantiates both Inner<bool> and Outer<int>.

Member template functions cannot be declared virtual. Current compiler technology expects to be able to fix the size of a class’s virtual function table when the class is parsed. Allowing virtual member template functions would require knowing all calls to such member functions everywhere in the program ahead of time, which is not feasible, especially for multi-file projects.

Function template issues

Just as a class template describes a family of classes, a function template describes a family of functions. The syntax for creating either type of template is virtually identical, but they differ somewhat in how they are used. You must always use angle brackets when instantiating class templates and you must supply all non-default template arguments. With function templates, on the other hand, you can often omit the template arguments, and default template arguments are not even allowed. Consider a typical implementation of the min( ) function template declared in the <algorithm> header, which looks something like this:

template<typename T>

const T& min(const T& a, const T& b) {

  return (a < b) ? a : b;

}

 

You could invoke this template by providing the type of the arguments in angle brackets, just like you do with class templates, as in:

int z = min<int>(i, j);

 

This syntax tells the compiler that a specialization of the min template is needed with int used in place of the parameter T, whereupon the compiler generates the corresponding code. Following the pattern of naming the classes generated from class templates, you can think of the name of the instantiated function as min<int>.

Type deduction of function template arguments

You can always use such explicit function template specification as in the example above, but it is often convenient to leave off the template arguments and let the compiler deduce them from the function arguments, like this:

int z = min(i, j);

 

If both i and j are ints, the compiler knows that you need min<int>, which it then instantiates automatically. The types must be identical, because the template was originally specified with only one template type argument used for both function parameters. No standard conversions are applied for function arguments whose type is specified by a template parameter. For example, if you wanted to find the minimum of an int and a double, the following attempt at a call to min would fail:

int z = min(x, j); // x is a double

 

Since x and j are distinct types, no single parameter matches the template parameter T in the definition of min; so the call does not match the template declaration. You can work around this difficulty by casting one argument to the other’s type or by reverting to the fully-specified call syntax, as in:

int z = min<double>(x, j);

 

This tells the compiler to generate the double version of min, after which j can be converted to a double by normal standard conversion rules (because the function min<double>(const double&, const double&) would then exist).
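Both workarounds can be sketched side by side. The namespace demo and the helper names minByCast and minBySpec are hypothetical; the namespace is there only to keep this min apart from std::min.

```cpp
namespace demo {  // hypothetical namespace for this sketch
  template<typename T>
  const T& min(const T& a, const T& b) {
    return (a < b) ? a : b;
  }
}

// Workaround 1: cast one argument so both have the same type.
double minByCast(double x, int j) {
  return demo::min(x, static_cast<double>(j));
}

// Workaround 2: name the template argument explicitly; j is then
// converted to double by the usual standard conversion.
double minBySpec(double x, int j) {
  return demo::min<double>(x, j);
}
```

Both helpers return by value on purpose: the double created from j is a temporary, so a reference to it must not be kept beyond the expression that calls min.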

You might be tempted to require two parameters for min, allowing the types of the arguments to be independent, like this:

template<typename T, typename U>

const T& min(const T& a, const U& b) {

  return (a < b) ? a : b;

}

 

This is often a good strategy, but in this case it is problematic because min must return a value, and there is no satisfactory way to determine which type it should be (T or U?).
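One possible way out, sketched here under the assumption that callers are willing to name the result type themselves, is to add a separate template parameter for the return type and to return by value. The name minAs is hypothetical.

```cpp
// R is the result type chosen by the caller; T and U are deduced.
template<typename R, typename T, typename U>
R minAs(const T& a, const U& b) {
  return (a < b) ? static_cast<R>(a) : static_cast<R>(b);
}
```

A call such as minAs<double>(3, 2.5) names only R and lets the compiler deduce T and U from the arguments.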

If the return type of a function template is an independent template parameter, you must always specify its type explicitly when you call it, since there is no argument from which to deduce it. Such is the case with the fromString template below.

//: C05:StringConv.h

#ifndef STRINGCONV_H

#define STRINGCONV_H

// Function templates to convert to and from strings

#include <string>

#include <sstream>

 

template<typename T>

T fromString(const std::string& s) {

  std::istringstream is(s);

  T t;

  is >> t;

  return t;

}

 

template<typename T>

std::string toString(const T& t) {

  std::ostringstream s;

  s << t;

  return s.str();

}

#endif // STRINGCONV_H ///:~

 

These function templates provide conversions to and from std::string for any types that provide a stream inserter or extractor, respectively. Here’s a test program that includes the use of the standard library complex number type:

//: C05:StringConvTest.cpp

#include "StringConv.h"

#include <iostream>

#include <complex>

using namespace std;

 

int main() {

  int i = 1234;

  cout << "i == \"" << toString(i) << "\"\n";

  float x = 567.89;

  cout << "x == \"" << toString(x) << "\"\n";

  complex<float> c(1.0, 2.0);

  cout << "c == \"" << toString(c) << "\"\n";

  cout << endl;

 

  i = fromString<int>(string("1234"));

  cout << "i == " << i << endl;

  x = fromString<float>(string("567.89"));

  cout << "x == " << x << endl;

  c = fromString< complex<float> >(string("(1.0,2.0)"));

  cout << "c == " << c << endl;

} ///:~

 

The output is what you’d expect:

i == "1234"

x == "567.89"

c == "(1,2)"

 

i == 1234

x == 567.89

c == (1,2)

 

Notice that in each of the instantiations of fromString, the template parameter is specified in the call. If you have a function template with template parameters for the parameter types as well as the return types, it is important to declare the return type parameter first; otherwise you won’t be able to omit the type parameters for the function parameters. As an illustration, consider the following well-known function template:[51]

//: C05:ImplicitCast.cpp

template<typename R, typename P>

R implicit_cast(const P& p) {

  return p;

}

 

int main() {

  int i = 1;

  float x = implicit_cast<float>(i);

  int j = implicit_cast<int>(x);

  // char* p = implicit_cast<char*>(i);

} ///:~

 

If you interchange R and P in the template parameter list near the top of the file, it will be impossible to compile this program because the return type will remain unspecified (since the first template parameter would be the function’s parameter type). The last line (which is commented out) is illegal because there is no standard conversion from int to char*; implicit_cast is for revealing in your code conversions that are allowed naturally.

With a little care you can even deduce array dimensions. The following example has an array-initialization function template (init2) that does just that.

//: C05:ArraySize.cpp

#include <cstddef>

using std::size_t;

 

template<size_t R, size_t C, typename T>

void init1(T a[R][C]) {

  for (size_t i = 0; i < R; ++i)

    for (size_t j = 0; j < C; ++j)

      a[i][j] = T();

}

 

template<size_t R, size_t C, class T>

void init2(T (&a)[R][C]) {  // reference parameter

  for (size_t i = 0; i < R; ++i)

    for (size_t j = 0; j < C; ++j)

      a[i][j] = T();

}

 

int main() {

  int a[10][20];

  init1<10,20>(a);  // must specify

  init2(a);         // sizes deduced

} ///:~

 

Array dimensions are not passed as part of a function parameter’s type unless that parameter is passed by pointer or reference. The function template init2 declares a to be a reference to a two-dimensional array, so its dimensions R and C are deduced by the template facility, making init2 a handy way to initialize a two-dimensional array of any size. The template init1 does not pass the array by reference, so the sizes must be explicitly specified, although the type parameter can still be deduced.
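The same deduction works for a single dimension. As a minimal sketch (the name arrayLength is hypothetical), a function template that takes an array by reference can report the array's size without being told:

```cpp
#include <cstddef>

// The parameter is a reference to an array of N elements of type T,
// so both T and N are deduced from the argument.
template<typename T, std::size_t N>
std::size_t arrayLength(T (&)[N]) {
  return N;
}
```

For a two-dimensional array such as int a[10][20], T is deduced as int[20] and N as 10, so the function reports the outer dimension.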

Function template overloading

As with functions, you can overload function templates that have the same name. When the compiler processes a function call in a program, it has to decide which template or ordinary function is the “best” fit for the call. Assuming the existence of the min function template introduced earlier, let’s add some ordinary functions to the mix:

//: C05:MinTest.cpp

#include <cstring>

#include <iostream>

using std::strcmp;

using std::cout;

using std::endl;

 

template<typename T> const T& min(const T& a, const T& b) {

  return (a < b) ? a : b;

}

const char* min(const char* a, const char* b) {

  return (strcmp(a, b) < 0) ? a : b;

}

double min(double x, double y) {

  return (x < y) ? x : y;

}

 

int main() {

  const char *s2 = "say \"Ni-!\"", *s1 = "knights who";

  cout << min(1, 2) << endl;      // 1: 1 (template)

  cout << min(1.0, 2.0) << endl;  // 2: 1 (double)

  cout << min(1, 2.0) << endl;    // 3: 1 (double)

  cout << min(s1, s2) << endl;    // 4: knights who (const

                                  //                 char*)

  cout << min<>(s1, s2) << endl;  // 5: say "Ni-!"

                                  //    (template)

} ///:~

 

In addition to the function template, this program defines two non-template functions: a C-style string version of min and a double version. If the template didn’t exist at all, the call in line 1 above would invoke the double version of min because of the standard conversion from int to double. Since the template can generate an int version, however, that is considered a better match (of course!), so that’s what happens. The call in line 2 is an exact match for the double version, of course, and the call in line 3 also invokes the same function, implicitly converting 1 to 1.0. In line 4 the const char* version of min is called directly. In line 5 we force the compiler to use the template facility by appending empty angle brackets to the function name, whereupon it generates a const char* version from the template and uses it (which is verified by the wrong answer: it’s just comparing addresses![52]). If you’re wondering why we used using declarations in lieu of the using namespace std; directive, some compilers include headers behind the scenes that bring in std::min, which would conflict with our declarations of the name min.

As stated above, you can overload templates of the same name, as long as they can be distinguished by the compiler. You could, for example, declare a min function template that processes three arguments:

template<typename T>

const T& min(const T& a, const T& b, const T& c);

 

Versions of this template will be generated only for calls to min( ) that have three arguments of the same type.
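One plausible way to define the three-argument version is in terms of the two-argument one. The sketch below places both in a hypothetical namespace named overloads so that they cannot collide with std::min:

```cpp
namespace overloads {  // hypothetical namespace for this sketch
  template<typename T>
  const T& min(const T& a, const T& b) {
    return (a < b) ? a : b;
  }

  // Overload distinguished from the one above by its argument count.
  template<typename T>
  const T& min(const T& a, const T& b, const T& c) {
    return overloads::min(overloads::min(a, b), c);
  }
}
```

The compiler picks between the two templates purely by the number of arguments in the call.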

Taking the address of a generated function template

In a number of situations you need to take the address of a function. For example, you may have a function that takes an argument of a pointer to another function. Of course, it’s possible that this other function might be generated from a template function, so you need some way to take that kind of address:[53]

//: C05:TemplateFunctionAddress.cpp

// Taking the address of a function generated

// from a template.

 

template <typename T> void f(T*) {}

 

void h(void (*pf)(int*)) {}

 

template <typename T>

  void g(void (*pf)(T*)) {}

 

int main() {

  // Full type specification:

  h(&f<int>);

  // Type deduction:

  h(&f);

  // Full type specification:

  g<int>(&f<int>);

  // Type deduction:

  g(&f<int>);

  // Partial (but sufficient) specification

  g<int>(&f);

} ///:~

 

This example demonstrates a number of issues. First, even though you’re using templates, the signatures must match. The function h( ) takes a pointer to a function that takes an int* and returns void, and that’s what the template f produces. Second, the function that wants the function pointer as an argument can itself be a template, as in the case of the template g.

In main( ) you can see that type deduction works here too. The first call to h( ) explicitly gives the template argument for f, but since h( ) says that it will only take the address of a function that takes an int*, that part can be deduced by the compiler. With g( ) the situation is even more interesting because two templates are involved. The compiler cannot deduce the type with nothing to go on, but if either f or g is given int, the rest can be deduced.
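The same deduction from the target type applies when the address initializes a function pointer or is passed to a function expecting one. The names twice and applyToSeven in this small sketch are hypothetical.

```cpp
// A function template whose instances we take the address of.
template<typename T>
T twice(T x) { return x + x; }

// An ordinary function expecting a pointer to an int(int) function.
int applyToSeven(int (*pf)(int)) { return pf(7); }
```

Given these, int (*p)(int) = &twice<int>; spells out the template argument, while double (*q)(double) = &twice; and applyToSeven(&twice) let the compiler deduce it from the pointer type that is required.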

An obscure issue arises when trying to pass the functions tolower or toupper, declared in <cctype>, as parameters. It is possible to use these in conjunction with the transform algorithm (which is covered in detail in the next chapter), for example, to convert a string to lower or upper case. Care must be taken, however, because there are multiple declarations for these functions. A naïve approach would be something like this:

// The variable s is a std::string

transform(s.begin(), s.end(), s.begin(), tolower);

 

The transform algorithm applies its fourth parameter (tolower in this case) to each character in the string s and places the result in s itself, thus overwriting each character in s with its lower-case equivalent. As it is written, this statement may or may not work! It fails in the following context:

#include <algorithm>

#include <cctype>

#include <iostream>

#include <string>

using namespace std;

 

int main() {

  string s("LOWER");

  transform(s.begin(),s.end(),s.begin(),tolower);

  cout << s << endl;

}

 

Even if your compiler lets you get away with this, it is illegal. The reason is that the <iostream> header also makes available a two-argument version of tolower and toupper:

template <class charT> charT toupper(charT c,

                                     const locale& loc);

template <class charT> charT tolower(charT c,

                                     const locale& loc);

 

These function templates take a second argument of type locale. The compiler has no way of knowing whether it should use the one-argument version of tolower defined in <cctype> or the one mentioned above. You can solve this problem (almost!) with a cast in the call to transform, as follows:

  transform(s.begin(),s.end(),s.begin(),

            static_cast<int (*)(int)>(tolower));

 

(Recall that tolower and toupper traffic in int instead of char.) The cast above makes clear that the single-argument version of tolower is desired. Once again, this works with some compilers, but it is not required to. The reason, albeit obscure, is that a library implementation is allowed to give “C linkage” (meaning that the function name does not contain all the auxiliary information[54] that normal C++ functions do) to functions inherited from the C language. If this is the case, the cast fails, because transform is a C++ function template and expects its fourth argument to have C++ linkage, and a cast is not allowed to change the linkage. What a predicament!

The solution is to place calls to tolower in an unambiguous context. For example, you could write a function, let’s call it strTolower( ), and place it in its own file without including <iostream>, like this:

//: C05:StrTolower.cpp {O}

#include <algorithm>

#include <cctype>

#include <string>

using namespace std;

 

string strTolower(string s) {

  transform(s.begin(), s.end(), s.begin(), tolower);

  return s;

} ///:~

 

The header <iostream> is not involved here, and the compilers we use do not introduce the two-argument version of tolower in this context,[55] so there’s no problem. You can then use this function normally:

//: C05:Tolower.cpp

//{L} StrTolower

#include <algorithm>

#include <cctype>

#include <iostream>

#include <string>

using namespace std;

string strTolower(string);

 

int main() {

  string s("LOWER");

  cout << strTolower(s) << endl;

} ///:~

 

Another solution is to write a wrapper function template that calls the correct version of tolower explicitly:

//: C05:ToLower2.cpp

#include <algorithm>

#include <cctype>

#include <iostream>

#include <string>

using namespace std;

 

template<class charT>

charT strTolower(charT c) {

  return tolower(c);  // one-arg version called

}

 

int main() {

  string s("LOWER");

  transform(s.begin(),s.end(),s.begin(),&strTolower<char>);

  cout << s << endl;

} ///:~

 

This version has the advantage that it can process both wide and narrow strings since the underlying character type is a template parameter. The C++ standards committee is working on modifying the language so that the first example (without the cast) will work, and some day these workarounds can be ignored.[56]
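When only narrow strings matter, a non-template variation on the same wrapper idea may be enough. In the sketch below the names lowerChar and lowered are hypothetical, and the cast to unsigned char is our own addition: it guards against handing a negative char value to tolower.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Ordinary wrapper: its name is not overloaded, so passing it to
// transform is unambiguous.
char lowerChar(char c) {
  return static_cast<char>(
    std::tolower(static_cast<unsigned char>(c)));
}

// Returns a lower-case copy of s.
std::string lowered(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), lowerChar);
  return s;
}
```

Inside lowerChar the call has exactly one argument, so the one-argument tolower is the only candidate that fits.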

Applying a function to an STL sequence

Suppose you want to take an STL sequence container (which you’ll learn more about in subsequent chapters; for now we can just use the familiar vector) and apply a function to all the objects it contains. Because a vector can contain any type of object, you need a function that works with any type of vector:

//: C05:ApplySequence.h

// Apply a function to an STL sequence container

 

// 0 arguments, any type of return value:

template<class Seq, class T, class R>

void apply(Seq& sq, R (T::*f)()) {

  typename Seq::iterator it = sq.begin();

  while(it != sq.end()) {

    ((*it)->*f)();

    it++;

  }

}

 

// 1 argument, any type of return value:

template<class Seq, class T, class R, class A>

void apply(Seq& sq, R(T::*f)(A), A a) {

  typename Seq::iterator it = sq.begin();

  while(it != sq.end()) {

    ((*it)->*f)(a);

    it++;

  }

}

 

// 2 arguments, any type of return value:

template<class Seq, class T, class R,

         class A1, class A2>

void apply(Seq& sq, R(T::*f)(A1, A2),

    A1 a1, A2 a2) {

  typename Seq::iterator it = sq.begin();

  while(it != sq.end()) {

    ((*it)->*f)(a1, a2);

    it++;

  }

}

// Etc., to handle maximum likely arguments ///:~

 

The apply( ) function template takes a reference to the container class and a pointer-to-member for a member function of the objects contained in the class. It uses an iterator to move through the container and apply the function to every object.

Notice that there are no STL header files (or any header files, for that matter) included in ApplySequence.h, so it is actually not limited to use with an STL container. However, it does make assumptions (primarily, the name and behavior of the iterator) that apply to STL sequences.

You can see there is more than one version of apply( ), further illustrating overloading of function templates. Although these templates allow any type of return value (which is ignored, but the type information is required to match the pointer-to-member), each version takes a different number of arguments, and because it’s a template, those arguments can be of any type. The only limitation here is that there’s no “super template” to create templates for you; you must decide how many arguments will ever be required.

To test the various overloaded versions of apply( ), the class Gromit[57] is created containing functions with different numbers of arguments:

//: C05:Gromit.h

// The techno-dog. Has member functions

// with various numbers of arguments.

#include <iostream>

 

class Gromit {

  int arf;

public:

  Gromit(int arf = 1) : arf(arf + 1) {}

  void speak(int) {

    for(int i = 0; i < arf; i++)

      std::cout << "arf! ";

    std::cout << std::endl;

  }

  char eat(float) {

    std::cout << "chomp!" << std::endl;

    return 'z';

  }

  int sleep(char, double) {

    std::cout << "zzz..." << std::endl;

    return 0;

  }

  void sit() {

    std::cout << "Sitting..." << std::endl;

  }

}; ///:~

 

Now you can use the apply( ) template functions to apply the Gromit member functions to a vector<Gromit*>, like this:

//: C05:ApplyGromit.cpp

// Test ApplySequence.h

#include <cstddef>

#include <iostream>

#include <vector>

#include "ApplySequence.h"

#include "Gromit.h"

using namespace std;

 

int main() {

  vector<Gromit*> dogs;

  for(size_t i = 0; i < 5; i++)

    dogs.push_back(new Gromit(i));

  apply(dogs, &Gromit::speak, 1);

  apply(dogs, &Gromit::eat, 2.0f);

  apply(dogs, &Gromit::sleep, 'z', 3.0);

  apply(dogs, &Gromit::sit);

  for (size_t i = 0; i < dogs.size(); ++i)

    delete dogs[i];

} ///:~

 

Although the definition of apply( ) is somewhat complex and not something you’d ever expect a novice to understand, its use is remarkably clean and simple, and a novice could easily use it knowing only what it is intended to accomplish, not how. This is the type of division you should strive for in all your program components: The tough details are all isolated on the designer’s side of the wall. Users are concerned only with accomplishing their goals and don’t see, know about, or depend on details of the underlying implementation. We’ll explore even more flexible ways to apply functions to sequences in the next chapter.

Partial ordering of function templates

We mentioned earlier that an ordinary function overload of min( ) is preferable to using the template. If a function already exists to match a function call, why generate another? In the absence of ordinary functions, however, it is possible that overloaded function templates can lead to ambiguities. To minimize the chances of this, an ordering is defined for function templates that chooses the most specialized template, if such exists. A function template is considered more specialized than another if every possible list of arguments that matches it also matches the other, but not the other way around. Consider the following function template declarations, taken from an example in the C++ standard document:

template<class T> void f(T);

template<class T> void f(T*);

template<class T> void f(const T*);

 

The first template can be matched with any type. The second template is more specialized than the first because only pointer types match it. In other words, you can look upon the set of possible calls that match the second template as a subset of the first. A similar relationship exists between the second and third template declarations above: the third can only be called for pointers to const, but the second accommodates any pointer type. The following program illustrates these rules.

//: C05:PartialOrder.cpp

// Reveals Ordering of Function Templates

#include <iostream>

using namespace std;

 

template<class T>

void f(T) {

  cout << "T\n";

}

 

template<class T>

void f(T*) {

  cout << "T*\n";

}

 

template<class T>

void f(const T*) {

  cout << "const T*\n";

}

 

int main() {

  f(0);            // T

  int i = 0;

  f(&i);           // T*

  const int j = 0;

  f(&j);           // const T*

} ///:~

 

The call f(&i) certainly matches the first template, but since the second is more specialized, it is called. The third can’t be called in this case since the pointer is not a pointer to const. The call f(&j) matches all three templates (for example, T would be const int in the second template), but again, the third template is more specialized, so it is used instead.

If there is no “most specialized” template among a set of overloaded function templates, an ambiguity remains and the compiler will report an error. That is why this feature is called a “partial ordering”: it may not be able to resolve all possibilities. Similar rules exist for class templates (see the section “Partial specialization” below).
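To make both outcomes concrete, here is a small sketch with two hypothetical overload sets. The templates named which can always be ordered, while the two named amb cannot be ordered when both arguments are ints.

```cpp
// Each version returns a number telling which one was chosen.
template<class T, class U> int which(T, U)   { return 1; }
template<class T>          int which(T, T)   { return 2; }
template<class T>          int which(T*, T*) { return 3; }

// Neither of these two is more specialized than the other.
template<class T> int amb(T, int) { return 1; }
template<class T> int amb(int, T) { return 2; }
// amb(1, 2);  // would not compile: the call is ambiguous
```

A call with an int and a double matches only the first which template; two ints match the first two, and the second wins as the more specialized; two int pointers match all three, and the third wins. A call such as amb(1, 2.0) is still fine, because ordinary overload resolution already prefers the second template, whose arguments need no conversion.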

Template specialization

The term specialization has a specific, template-related meaning in C++. A template definition is, by its very nature, a generalization, because it describes a family of functions or classes in general terms. When template arguments are supplied, the result is a specialization of the template, because it fixes a unique instance out of the many possible instances of the family of functions or classes. The min function template seen at the beginning of this chapter is a generalization of a minimum-finding function, because the type of its parameters is not specified. When you supply the type for the template parameter, whether explicitly or implicitly via argument deduction, the resultant code generated by the compiler (for example, min<int>) is a specialization of the template. The code generated is also considered an instantiation of the template, of course, as are all code bodies generated by the template facility.

Explicit specialization

You can also provide the code yourself for a given template specialization, should the need arise. Providing your own template specializations is often needed with class templates, but we will begin with the min function template to introduce the syntax.

Recall that in MinTest.cpp earlier in this chapter we introduced the following ordinary function:

const char* min(const char* a, const char* b) {

  return (strcmp(a, b) < 0) ? a : b;

}

 

This was so that a call to min would compare strings and not addresses. Although it would provide no advantage in this case, we could define instead a const char* specialization for min, as in the following program:

//: C05:MinTest2.cpp

#include <cstring>

#include <iostream>

using std::strcmp;

using std::cout;

using std::endl;

 

template<class T> const T& min(const T& a, const T& b) {

  return (a < b) ? a : b;

}

// An explicit specialization of the min template

template<>

const char* const& min<const char*>(const char* const& a,

                                    const char* const& b) {

  return (strcmp(a, b) < 0) ? a : b;

}

 

int main() {

  const char *s2 = "say \"Ni-!\"", *s1 = "knights who";

  cout << min(s1, s2) << endl;

  cout << min<>(s1, s2) << endl;

} ///:~

 

The “template<>” prefix tells the compiler that what follows is a specialization of a template. The type for the specialization must appear in angle brackets immediately following the function name, as it normally would in an explicitly specified call. Note that we carefully substitute const char* for T in the explicit specialization. Whenever the original template specifies const T, that const modifies the whole type T. Here T is const char*, so it is the pointer itself (a pointer to const char) that is const. Therefore we must write const char* const in place of const T in the specialization. When the compiler sees a call to min with const char* arguments in the program, it will instantiate our const char* version of min so it can be called. The two calls to min in this program call the same specialization of min.

Explicit specializations tend to be more useful for class templates than for function templates. When you provide a full specialization for a class template, though, you may need to implement all the member functions. This is because you are providing a separate class, and client code may expect the complete interface to be implemented.
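
As a small illustration of why the whole interface must be repeated, consider the following sketch. Holder is a hypothetical class template that is not part of this chapter’s examples; its const char* specialization stores a copy of the string, so nothing from the primary template can be reused.

```cpp
#include <string>

// Primary template (Holder is a hypothetical name)
template<class T> class Holder {
  T value;
public:
  Holder(const T& v) : value(v) {}
  const T& get() const { return value; }
  bool equals(const Holder& other) const {
    return value == other.value;
  }
};

// Full specialization: a completely separate class, so get()
// and equals() must both be written again
template<> class Holder<const char*> {
  std::string value;  // Owns a copy instead of storing the pointer
public:
  Holder(const char* v) : value(v) {}
  const char* get() const { return value.c_str(); }
  bool equals(const Holder& other) const {
    return value == other.value;  // Compares characters, not addresses
  }
};
```

If equals( ) were left out of the specialization, client code written against the primary template would fail to compile for Holder<const char*>.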

The standard library has an explicit specialization for vector when it is used to hold objects of type bool. As you saw earlier in this chapter, the declaration for the primary vector class template is:

template <class T, class Allocator = allocator<T> >

class vector {…};

 

To specialize for objects of type bool, you could declare an explicit specialization as follows:

template <>

class vector< bool, allocator<bool> > {…};

 

Again, this is quickly recognized as a full, explicit specialization because of the template<> prefix and because all the primary template’s parameters are satisfied by the argument list appended to the class name. The purpose for vector<bool> is to allow library implementations to save space by packing bits into integers.[58]

It turns out that vector<bool> is a little more flexible than we have described, as seen in the next section.

Partial specialization

Class templates can also be partially specialized, meaning that at least one of the template parameters is left “open” in some way in the specialization. This is actually what vector<bool> does; it specifies the object type (bool), but leaves the allocator type unspecified. Here is the actual declaration of vector<bool>:

template <class Allocator>

class vector<bool, Allocator>;

 

You can recognize a partial specialization because non-empty parameter lists appear in angle brackets both after the template keyword (the unspecified parameters) and after the class (the specified arguments). Because of the way vector<bool> is defined, a user can provide a custom allocator type, even though the contained type of bool is fixed. In other words, specialization, and partial specialization in particular, constitute a sort of “overloading” for class templates.
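
The same shape can be tried out on a small scale. In the sketch below, Describe is a hypothetical class template (it is not vector, and it is not part of the standard library); it fixes its first parameter to bool while leaving the second one open, just as vector<bool, Allocator> does.

```cpp
#include <string>

// Primary template (Describe is a hypothetical name)
template<class T, class Tag> struct Describe {
  static std::string name() { return "primary"; }
};

// Partial specialization: T is fixed as bool, Tag stays open
template<class Tag> struct Describe<bool, Tag> {
  static std::string name() { return "bool, any Tag"; }
};

// Full specialization: both arguments are fixed
template<> struct Describe<bool, int> {
  static std::string name() { return "bool, int"; }
};
```

Describe<int, char> uses the primary template, Describe<bool, char> uses the partial specialization, and Describe<bool, int> uses the full specialization.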

Partial ordering of class templates

The rules that determine which template is selected for instantiation are similar to the partial ordering for function templates: the “most specialized” template is selected. An illustration follows. (The string in each f( ) member function below explains the role of each template definition.)

//: C05:PartialOrder2.cpp

// Reveals partial ordering of class templates

#include <iostream>

using namespace std;

 

template<class T, class U> class C {

public:

  void f() {

    cout << "Primary Template\n";

  }

};

 

template<class U> class C<int, U> {

public:

  void f() {

    cout << "T == int\n";

  }

};

 

template<class T> class C<T, double> {

public:

  void f() {

    cout << "U == double\n";

  }

};

 

template<class T, class U> class C<T*, U> {

public:

  void f() {

    cout << "T* used \n";

  }

};

 

template<class T, class U> class C<T, U*> {

public:

  void f() {

    cout << "U* used\n";

  }

};

 

template<class T, class U> class C<T*, U*> {

public:

  void f() {

    cout << "T* and U* used\n";

  }

};

 

template<class T> class C<T, T> {

public:

  void f() {

    cout << "T == U\n";

  }

};

 

int main() {

  C<float, int>().f();    // 1: Primary template

  C<int, float>().f();    // 2: T == int

  C<float, double>().f(); // 3: U == double

  C<float, float>().f();  // 4: T == U

  C<float*, float>().f(); // 5: T* used [T is float]

  C<float, float*>().f(); // 6: U* used [U is float]

  C<float*, int*>().f();  // 7: T* and U* used [float,int]

 

  // The following are ambiguous:

//   8: C<int, int>().f();

//   9: C<double, double>().f();

//  10: C<float*, float*>().f();

//  11: C<int, int*>().f();

//  12: C<int*, int*>().f();

} ///:~

 

As you can see, you can partially specify template parameters according to whether they are pointer types, or whether they are equal. When the T* specialization is used, such as is the case in line 5, T itself is not the top-level pointer type that was passed; it is the type that the pointer refers to (float, in this case). The T* specification is a pattern to allow matching against pointer types. If you were to use int** as the first template argument, T would be int*. Line 8 is ambiguous because having the first parameter as an int vs. having the two parameters equal are independent issues; one is not more specialized than the other. Similar logic applies to lines 9 through 12.
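
The way the T* pattern peels off one level of indirection can be observed directly. The sketch below uses a hypothetical class template named Indirection; its pointer specialization calls itself on T, so the result records how many times the pattern matched.

```cpp
#include <string>

// Primary template: used for non-pointer arguments
// (Indirection is a hypothetical name)
template<class T> struct Indirection {
  static std::string levels() { return ""; }
};

// Matches any pointer type; T is the type pointed to
template<class T> struct Indirection<T*> {
  // Recurses on T, so each level of indirection adds one '*'
  static std::string levels() {
    return Indirection<T>::levels() + "*";
  }
};
```

For Indirection<int**>, the pattern T* matches with T equal to int*, which in turn matches again with T equal to int, so levels( ) yields two asterisks.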

A practical example

You can easily derive from a class template, and you can create a new template that instantiates and inherits from an existing template. If the vector template does most everything you want, for example, but in a certain application you’d also like a version that can sort itself, you can easily reuse the vector code. The following example derives from vector<T> and adds sorting.

//: C05:Sorted.h

// Template specialization

#ifndef SORTED_H

#define SORTED_H

#include <cstring>

#include <string>

#include <vector>

 

template<class T>

class Sorted : public std::vector<T> {

public:

  void sort();

};

 

template<class T>

void Sorted<T>::sort() { // A simple sort

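  // this-> is needed: size() and at() belong to a dependent base

  for(int i = this->size(); i > 0; i--)

    for(int j = 1; j < i; j++)

      if(this->at(j-1) > this->at(j)) {

        T t = this->at(j-1);

        this->at(j-1) = this->at(j);

        this->at(j) = t;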

      }

}

 

// Partial specialization for pointers:

template<class T>

class Sorted<T*> : public std::vector<T*> {

public:

  void sort();

};

 

template<class T>

void Sorted<T*>::sort() {

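  // this-> is needed: size() and at() belong to a dependent base

  for(int i = this->size(); i > 0; i--)

    for(int j = 1; j < i; j++)

      if(*this->at(j-1) > *this->at(j)) {

        T* t = this->at(j-1);

        this->at(j-1) = this->at(j);

        this->at(j) = t;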

      }

}

 

// Full specialization for char*

// (Made inline here for convenience –

//  normally would place function body in separate file

//  and only leave declaration here)

template<>

inline void Sorted<char*>::sort() {

  for(int i = size(); i > 0; i--)

    for(int j = 1; j < i; j++)

      if(std::strcmp(at(j-1), at(j)) > 0) {

        char* t = at(j-1);

        at(j-1) = at(j);

        at(j) = t;

      }

}

#endif // SORTED_H ///:~

 

The Sorted template imposes a restriction on all but one of the classes for which it is instantiated: they must support the > operator. The primary template works correctly only with non-pointer objects (including objects of built-in types); the partial specialization handles pointers by comparing the objects they point to. The full specialization compares the elements using strcmp( ) to sort vectors of char* according to the null-terminated strings to which they refer.

Here’s a driver for Sorted.h that uses the randomizer introduced earlier in the chapter:

//: C05:Sorted.cpp

//{bor} (because of bitset in Urand.h)

// Testing template specialization

#include <cstddef>

#include <iostream>

#include "Sorted.h"

#include "Urand.h"

using namespace std;

 

#define asz(a) (sizeof a / sizeof a[0])

 

char* words[] = {

  "is", "running", "big", "dog", "a",

};

char* words2[] = {

  "this", "that", "theother",

};

 

int main() {

  Sorted<int> is;

  Urand<47> rand;

  for(size_t i = 0; i < 15; i++)

    is.push_back(rand());

  for(size_t i = 0; i < is.size(); i++)

    cout << is[i] << ' ';

  cout << endl;

  is.sort();

  for(size_t i = 0; i < is.size(); i++)

    cout << is[i] << ' ';

  cout << endl;

 

  // Uses the template partial specialization:

  Sorted<string*> ss;

  for(size_t i = 0; i < asz(words); i++)

    ss.push_back(new string(words[i]));

  for(size_t i = 0; i < ss.size(); i++)

    cout << *ss[i] << ' ';

  cout << endl;

  ss.sort();

  for(size_t i = 0; i < ss.size(); i++) {

    cout << *ss[i] << ' ';

    delete ss[i];

  }

  cout << endl;

 

  // Uses the full char* specialization:

  Sorted<char*> scp;

  for(size_t i = 0; i < asz(words2); i++)

    scp.push_back(words2[i]);

  for(size_t i = 0; i < scp.size(); i++)

    cout << scp[i] << ' ';

  cout << endl;

  scp.sort();

  for(size_t i = 0; i < scp.size(); i++)

    cout << scp[i] << ' ';

  cout << endl;

} ///:~

 

Each of the template instantiations above uses a different version of the template. Sorted<int> uses the primary template. Sorted<string*> uses the partial specialization for pointers. Last, Sorted<char*> uses the full specialization for char*. Without this full specialization, you could be fooled into thinking that things were working correctly because the words array would still sort out to “a big dog is running” since the partial specialization would end up comparing the first character of each array. However, words2 would not sort correctly.

Preventing template code bloat

Whenever a class template is instantiated, the code from the class definition for the particular specialization is generated, along with all the member functions that are called in the program. Only the member functions that are actually called are generated. This is a good thing, as the following program makes clear:

//: C05:DelayedInstantiation.cpp

// Member functions of class templates are not

// instantiated until they're needed.

 

class X {

public:

  void f() {}

};

 

class Y {

public:

  void g() {}

};

 

template <typename T> class Z {

  T t;

public:

  void a() { t.f(); }

  void b() { t.g(); }

};

 

int main() {

  Z<X> zx;

  zx.a(); // Doesn't create Z<X>::b()

  Z<Y> zy;

  zy.b(); // Doesn't create Z<Y>::a()

} ///:~

 

Here, even though the template Z purports to use both f( ) and g( ) member functions of T, the fact that the program compiles shows you that it only generates Z<X>::a( ) when it is explicitly called for zx. (If Z<X>::b( ) were also generated at the same time, a compile-time error message would be generated, because it would attempt to call X::g( ), which doesn’t exist.) Similarly, the call to zy.b( ) doesn’t generate Z<Y>::a( ). As a result, the Z template can be used with both X and Y, whereas if all the member functions were generated when the class was first instantiated, the use of many templates would be significantly limited.

Suppose you have a container, a Stack say, and you use specializations for int, int*, and char*. Three versions of Stack code will be generated and linked as part of your program. One of the reasons for using a template in the first place is so you don’t have to replicate code by hand; but code still gets replicated, and it’s just the compiler that does it instead of you. You can factor the bulk of the implementation for storing pointer types into a single class by using a combination of full and partial specialization. The key is to fully specialize for void* and then derive all other pointer types from the void* implementation so the common code can be shared. The program below illustrates this technique.

//: C05:Nobloat.h

// Shares code for storing pointers in a Stack

#ifndef NOBLOAT_H

#define NOBLOAT_H

#include <cassert>

#include <cstddef>

#include <cstring>

 

// The primary template

template<class T>

class Stack {

  T* data;

  std::size_t count;

  std::size_t capacity;

  enum {INIT = 5};

public:

  Stack() {

    count = 0;

    capacity = INIT;

    data = new T[INIT];

  }

  void push(const T& t) {

    if (count == capacity) {

      // Grow array store

      std::size_t newCapacity = 2*capacity;

      T* newData = new T[newCapacity];

      for (std::size_t i = 0; i < count; ++i)

        newData[i] = data[i];

      delete [] data;

      data = newData;

      capacity = newCapacity;

    }

    assert(count < capacity);

    data[count++] = t;

  }

  void pop() {

    assert(count > 0);

    --count;

  }

  T top() const {

    assert(count > 0);

    return data[count-1];

  }

  std::size_t size() const {return count;}

};

 

// Full specialization for void*

template<>

class Stack<void *> {

  void** data;

  std::size_t count;

  std::size_t capacity;

  enum {INIT = 5};

public:

  Stack() {

    count = 0;

    capacity = INIT;

    data = new void*[INIT];

  }

  void push(void* const & t) {

    if (count == capacity) {

      std::size_t newCapacity = 2*capacity;

      void** newData = new void*[newCapacity];

      std::memcpy(newData, data, count*sizeof(void*));

      delete [] data;

      data = newData;

      capacity = newCapacity;

    }

    assert(count < capacity);

    data[count++] = t;

  }

  void pop() {

    assert(count > 0);

    --count;

  }

  void* top() const {

    assert(count > 0);

    return data[count-1];

  }

  std::size_t size() const {return count;}

};

 

// Partial specialization for other pointer types

template<class T>

class Stack<T*> : private Stack<void *> {

  typedef Stack<void *> Base;

public:

  void push(T* const & t) {Base::push(t);}

  void pop() {Base::pop();}

  T* top() const {return static_cast<T*>(Base::top());}

  std::size_t size() const {return Base::size();}

};

#endif // NOBLOAT_H ///:~

 

This simple stack expands as it fills its capacity. The void* specialization stands out as a full specialization by virtue of the template<> prefix (that is, the template parameter list is empty). As mentioned earlier, it is necessary to implement all member functions in a class template specialization. The savings occur with all other pointer types. The partial specialization for other pointer types derives from Stack<void*> privately, since we are merely using Stack<void*> for implementation purposes, and do not wish to expose any of its interface directly to the user. The member functions for each pointer instantiation are small forwarding functions to the corresponding functions in Stack<void*>. Hence, whenever a pointer type other than void* is instantiated, it is a fraction of the size it would have been had the primary template alone been used.[59] Here is a driver program:

//: C05:NobloatTest.cpp

#include <iostream>

#include <string>

#include "Nobloat.h"

using namespace std;

 

template<class StackType>

void emptyTheStack(StackType& stk) {

  while (stk.size() > 0) {

    cout << stk.top() << endl;

    stk.pop();

  }

}

// An overload for emptyTheStack (not a specialization!)

template<class T>

void emptyTheStack(Stack<T*>& stk) {

  while (stk.size() > 0) {

    cout << *stk.top() << endl;

    stk.pop();

  }

}

 

int main() {

  Stack<int> s1;

  s1.push(1);

  s1.push(2);

  emptyTheStack(s1);

 

  Stack<int *> s2;

  int i = 3;

  int j = 4;

  s2.push(&i);

  s2.push(&j);

  emptyTheStack(s2);

} ///:~

 

For convenience we have included two emptyTheStack function templates. Since function templates don’t support partial specialization, we provide overloaded templates. The second version of emptyTheStack is more specialized than the first, so it is chosen whenever pointer types are used. Three class template specializations are instantiated in this program: Stack<int>, Stack<void*>, and Stack<int*>. Stack<void*> is implicitly instantiated because Stack<int*> derives from it. If a program uses instantiations for many pointer types, the savings in code size over just using a single Stack template can be substantial.

Name lookup issues

When the compiler encounters an identifier it must determine the type and scope (and in the case of variables, the lifetime) of the entity the identifier represents. This is common knowledge among software developers, but the plot thickens when templates are involved. Because not everything is known about a template when its definition is first seen by the compiler, the compiler must hold off until the template is instantiated before it can determine whether it is being used properly. This predicament leads to a two-phase process for template compilation.

Names in templates

In the first phase the compiler parses the template definition looking for obvious syntax errors and resolving all the names it can. The names it can resolve during parsing are those that do not depend on template parameters, which the compiler takes care of through normal name lookup means (and also through argument-dependent lookup, discussed below, if necessary). The names it can’t resolve are the so-called dependent names, which are names that in some way depend on template parameters. These can’t be resolved until the template is instantiated with its actual arguments. Instantiation, therefore, is the second phase of template compilation. During this second phase, the compiler determines whether an explicit specialization of the template in question needs to be used instead of the primary template.

Before you see an example, two more terms need to be defined. A qualified name is a name with a class-name prefix, a name with an object name and a dot operator, or a name with a pointer to an object and an arrow operator. Examples of qualified names are found in the following expressions:

MyClass::f();

x.f();

p->f();

 

We have used qualified names many times in this book, and most recently in connection with the typename keyword. These are called qualified names because the target names (like f above) are explicitly associated with a class, which tells the compiler where to look for the declarations of those names.

The other term to discuss is argument-dependent lookup[60] (ADL), which is a technique originally designed to simplify using non-member function calls (including operators) declared in namespaces. Consider the following simple code excerpt:

#include <iostream>

#include <string>

//…

  std::string s("hello");

  std::cout << s << std::endl;

 

Note that there is no using namespace std; directive, which is the typical practice inside header files, for example. Without such a directive, it is necessary to use the std:: qualifier on the items that are in the std namespace. We have, however, not qualified everything from std that we are using. Can you see what we have left unqualified?

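We have not specified which operator functions to use. We want the following to happen, but we don’t want to have to type it! (The string inserter is a non-member function in std, while the inserter that accepts the endl manipulator is a member of the stream.)

std::operator<<(std::cout, s).operator<<(std::endl);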

 

To make the original output statement work as desired, ADL specifies that when an unqualified function call appears and its declaration is not in (normal) scope, the namespaces (or class scopes) of each of its arguments are searched for a matching function declaration. In the original statement, the first function call is:

operator<<(std::cout, s);

 

Since there is no such function in scope in our original excerpt, the compiler notes that this function’s first argument (std::cout) is in the namespace std; so it adds that namespace to the list of scopes to search for a unique function that best matches the signature operator<<(std::ostream&, std::string). It finds this function declared in the std namespace via the <string> header, so that is the function that is called. Namespaces would be very inconvenient without ADL. (But note that, in general, ADL brings in all declarations of the name in question from all eligible namespaces; if there is no best match, an ambiguity will result.) To turn off ADL, you can enclose the function name in parentheses:

(f)(x, y);  // ADL suppressed
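
To see both behaviors side by side, consider the following sketch. The names Base, N, Widget, withADL( ), and withoutADL( ) are hypothetical and exist only for this illustration; the global f( ) accepts a Widget through a derived-to-base conversion, so the parenthesized call still has something to bind to.

```cpp
#include <string>

struct Base {};

namespace N {
  struct Widget : Base {};
  std::string f(const Widget&) { return "N::f"; }
}

std::string f(const Base&) { return "::f"; }

std::string withADL() {
  N::Widget w;
  return f(w);    // ADL adds N::f, an exact match that beats ::f
}

std::string withoutADL() {
  N::Widget w;
  return (f)(w);  // ADL suppressed, so only ::f is considered
}
```

With ADL active, overload resolution sees both functions and prefers the exact match in N; with the parentheses, ordinary lookup alone applies and the global function is called.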

 

Now consider the following program, from a presentation by Herb Sutter:

// Lookup.cpp

// Only works on EDG and Metrowerks (special option)

#include <iostream>

using std::cout;

 

void f(double) { cout << "f(double)\n"; }

 

template<class T>

class X {

public:

  void g() { f(1); }

};

 

void f(int) { cout << "f(int)\n"; }

 

int main() {

  X<int>().g();

}

 

The only compiler we have that gets this correct right out of the box is the Edison Design Group front end. (A number of compilers use this front end, including Comeau C++.) The output should be:

f(double)

 

because f is a non-dependent name that can be resolved early by looking in the context where the template is defined, when only f(double) is in scope. Unfortunately, there is a lot of code in existence that depends on the non-standard behavior of binding the call to f(1) inside g( ) to the latter f(int), so compiler writers have been reluctant to make the change. (Some compilers, such as the Metrowerks compiler, have an option to enable the correct lookup behavior.)

Here is a more detailed example, also based on an example from Herb Sutter:

//: C05:Lookup2.cpp

//{-bor}

//{-g++}

// Microsoft: use option –Za (ANSI mode)

#include <algorithm>

#include <iostream>

#include <typeinfo>

using std::cout;

using std::endl;

 

void g() { cout << "global g()\n"; }

 

template <class T>

class Y {

public:

  void g() { cout << "Y<" << typeid(T).name()

                  << ">::g()\n"; }

  void h() { cout << "Y<" << typeid(T).name()

                  << ">::h()\n"; }

  typedef int E;

};

 

typedef double E;

 

template<class T>

void swap(T& t1, T& t2) {

  cout << "global swap\n";

  T temp = t1;

  t1 = t2;

  t2 = temp;

}

 

template<class T>

class X : public Y<T> {

public:

  E f() {

    g();

    this->h();

    T t1 = T(), t2 = T(1);

    cout << t1 << endl;

    swap(t1, t2);

    std::swap(t1, t2);

    cout << typeid(E).name() << endl;

    return E(t2);

  }

};

 

int main() {

  X<int> x;

  cout << x.f() << endl;

} ///:~

 

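The output from this program should be as follows (the exact text that typeid(...).name( ) produces for int and double is implementation-defined):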

global g()

Y<int>::h()

0

global swap

double

1

 

Looking at the declarations inside of X::f( ), we observe the following:

·         The return type of X::f( ), which is E, is not a dependent name, so it is looked up when the template is parsed, and the typedef naming E as a double is found. This may seem strange, since with non-template classes the declaration of E in the base class would be found first, but those are the rules. (The base class, Y, is a dependent base class, so it can’t be searched at template definition time.)

·         The call to g( ) is also non-dependent, since there is no mention of T. If g had parameters that were of a class type defined in another namespace, ADL would take over, since there is no g with parameters in scope. As it is, this call matches the global declaration of g( ).

·         The call this->h( ) is a qualified name, and the object that qualifies it (this) refers to the current object, which is of type X, which in turn depends on the name Y<T> by inheritance. There is no function h( ) inside of X, so lookup will naturally want to search the scope of X’s base class, Y<T>. Since this is a dependent name, it is looked up at instantiation time, when Y<T> can be reliably known (including any potential specializations that might have been written after the definition of X); so it calls Y<int>::h( ).

·         The declarations of t1 and t2 are dependent, of course.

·         The call to operator<<(cout, t1) is dependent, since t1 is of type T. This is looked up later when T is int, and the inserter for int is found in std.

·         The unqualified call to swap( ) is dependent because its arguments are of type T. This ultimately causes a global swap(int&, int&) to be instantiated, of course.

·         The qualified call to std::swap( ) is not dependent, because std is a fixed namespace; so the compiler knows to look there for the proper declaration. (The qualifier on the left of the “::” must mention a template parameter for a qualified name to be considered dependent.) The std::swap( ) function template later generates std::swap(int&, int&), at instantiation time. No more dependent names remain in X<T>::f( ).

To clarify and summarize: name lookup is done at the point of instantiation if the name is dependent, except that for unqualified dependent names the normal name lookup is also attempted early, at the point of definition. All non-dependent names in templates are looked up early, at the time the template definition is parsed. (If necessary, another lookup occurs at instantiation time, when the type of the actual argument is known.)
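
The practical consequence of these rules for a dependent base class can be reduced to a few lines. The sketch below assumes a compiler that implements two-phase lookup as described above; Base, Derived, and the two functions called name( ) are hypothetical names used only here.

```cpp
#include <string>

std::string name() { return "global name()"; }

template<class T> class Base {
public:
  std::string name() const { return "Base<T>::name()"; }
};

template<class T> class Derived : public Base<T> {
public:
  // Non-dependent: bound when the template is parsed. The
  // dependent base is not searched, so the global name() is found
  std::string unqualified() const { return name(); }
  // Dependent: this-> postpones lookup until instantiation,
  // when Base<T> is known, so Base<T>::name() is found
  std::string qualified() const { return this->name(); }
};
```

On such a compiler, the two member functions of Derived<int> call different functions even though they spell the same name, which is why inherited members of a dependent base are normally reached through this-> or an explicit Base<T>:: qualifier.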

(Whew!) If you have studied this example to the point that you understand it, prepare yourself for yet another surprise in the next section when friend declarations enter the picture.

Templates and friends

A friend function declaration inside a class allows a non-member function to access non-public members of that class. If the friend function name is qualified, it will of course be found in the namespace or class that qualifies it. If it is unqualified, however, the compiler must make an assumption about where the definition of the friend function will be, since all identifiers must have a unique scope. The expectation is that the function will be defined in the nearest enclosing namespace (non-class) scope that contains the class granting friendship. Often this is just the global scope. The following non-template example clarifies this issue.

//: C05:FriendScope.cpp

#include <iostream>

using namespace std;

 

class Friendly {

  int i;

public:

  Friendly(int theInt) { i = theInt; }

  friend void f(const Friendly&); // needs global def.

  void g() { f(*this); }

};

 

void h() {

  f(Friendly(1));  // uses ADL

}

 

void f(const Friendly& fo) {  // definition of friend

  cout << fo.i << endl;

}

 

int main() {

  h();               // prints 1

  Friendly(2).g();   // prints 2

} ///:~

 

The declaration of f( ) inside the Friendly class is unqualified, so the compiler will expect to be able to eventually link that declaration to a definition at file scope (the namespace scope that contains Friendly in this case). That definition appears after the definition of the function h( ). The linking of the call to f( ) inside h( ) to the same function is a separate matter, however. This is resolved by ADL. Since the argument of f( ) inside h( ) is a Friendly object, the Friendly class is searched for a declaration of f( ), which succeeds. If the call were f(1) instead (which makes some sense since 1 can be implicitly converted to Friendly(1)), the call should fail, since there is no hint of where the compiler should look for the declaration of f( ). The EDG compiler correctly complains that f is undefined in that case.

Now suppose that Friendly and f are both templates, as in the following program.

//: C05:FriendScope2.cpp

#include <iostream>

using namespace std;

// Necessary forward declarations

template<class T>

class Friendly;

template<class T>

void f(const Friendly<T>&);

 

template<class T>

class Friendly {

  T t;

public:

  Friendly(const T& theT) : t(theT) {}

  friend void f<>(const Friendly<T>&);

  void g() { f(*this); }

};

 

void h() {

  f(Friendly<int>(1));

}

 

template<class T>

void f(const Friendly<T>& fo) {

  cout << fo.t << endl;

}

 

int main() {

  h();

  Friendly<int>(2).g();

} ///:~

 

First notice the angle brackets in the declaration of f inside Friendly. They are necessary to tell the compiler that f is a template. Otherwise, the compiler will look for an ordinary function named f and of course not find it. We could have inserted the template parameter (<T>) in the brackets, but it is easily deduced from the declaration.

The forward declaration of the function template f before the class definition is necessary, even though it wasn’t in the previous example when f was not a template; the language specifies that friend function templates must be previously declared. Of course, to properly declare f, Friendly must also have been declared, since f takes a Friendly argument, hence the forward declaration of Friendly in the beginning. We could have placed the full definition of f right after the initial declaration of Friendly instead of separating its definition and declaration, but we chose instead to leave it in a form that more closely resembles the previous example.

One last option remains for using friends inside templates: fully define them inside the class itself. Here is how the previous example would appear with that change:

//: C05:FriendScope3.cpp

//{-bor}

// Microsoft: use the -Za (ANSI-compliant) option

#include <iostream>

using namespace std;

 

template<class T>

class Friendly {

  T t;

public:

  Friendly(const T& theT) : t(theT) {}

  friend void f(const Friendly<T>& fo) {

    cout << fo.t << endl;

  }

  void g() { f(*this); }

};

 

void h() {

  f(Friendly<int>(1));

}

 

int main() {

  h();

  Friendly<int>(2).g();

} ///:~

 

There is an important difference between this and the previous example: f is not a template here, but is an ordinary function. (Remember that angle brackets were necessary before to imply that f was a template.) Every time the Friendly class template is instantiated, a new, ordinary function overload is created that takes an argument of the current Friendly specialization. This is what Dan Saks has called “making new friends.”[61] This is the most convenient way to define friend functions for templates.

To make this perfectly clear, suppose you have a class template to which you want to add non-member operators as friends. Here is a class template that simply holds a generic value:

template<class T>

class Box {

  T t;

public:

  Box(const T& theT) : t(theT) {}

};

 

Without understanding the likes of the previous examples in this section, novices find themselves frustrated because they can’t get a simple stream output inserter to work. If you don’t define your operators inside the definition of Box, you must provide the forward declarations we showed earlier:

//: C05:Box1.cpp

// Defines template operators

#include <iostream>

using namespace std;

// Forward declarations

template<class T>

class Box;

template<class T>

Box<T> operator+(const Box<T>&, const Box<T>&);

template<class T>

ostream& operator<<(ostream&, const Box<T>&);

 

template<class T>

class Box {

  T t;

public:

  Box(const T& theT) : t(theT) {}

  friend Box operator+<>(const Box<T>&, const Box<T>&);

  friend ostream& operator<< <>(ostream&, const Box<T>&);

};

 

template<class T>

Box<T> operator+(const Box<T>& b1, const Box<T>& b2) {

  return Box<T>(b1.t + b2.t);

}

 

template<class T>

ostream& operator<<(ostream& os, const Box<T>& b) {

  return os << '[' << b.t << ']';

}

 

int main() {

  Box<int> b1(1), b2(2);

  cout << b1 + b2 << endl;  // [3]

//  cout << b1 + 2 << endl; // no implicit conversions!

} ///:~

 

Here we are defining both an addition operator and an output stream operator. The main program reveals a disadvantage of this approach: you can’t depend on implicit conversions (see the expression b1 + 2) because template argument deduction does not consider them. Using the in-class, non-template approach is shorter and more robust:

//: C05:Box2.cpp

// Defines non-template operators

#include <iostream>

using namespace std;

 

template<class T>

class Box {

  T t;

public:

  Box(const T& theT) : t(theT) {}

  friend Box operator+(const Box<T>& b1,

                         const Box<T>& b2) {

    return Box<T>(b1.t + b2.t);

  }

  friend ostream& operator<<(ostream& os,

                               const Box<T>& b) {

    return os << '[' << b.t << ']';

  }

};

 

int main() {

  Box<int> b1(1), b2(2);

  cout << b1 + b2 << endl;  // [3]

  cout << b1 + 2 << endl;   // [3]

} ///:~

 

Because the operators are normal functions (overloaded for each specialization of Box, which is just int in this case), implicit conversions are applied as normal; so the expression b1 + 2 is valid.

Friend templates

You can be precise as to which specializations of a template are friends of a class. In the examples in the previous section, only the specialization of the function template f with the same type that specialized Friendly was a friend. For example, only the specialization f<int>(const Friendly<int>&) is a friend of the class Friendly<int>. This was accomplished by using the template parameter for Friendly to specialize f in its friend declaration. If we had wanted to, we could have made a particular, fixed specialization of f a friend to all instances of Friendly, like this:

// Inside Friendly:

  friend void f<>(const Friendly<double>&);

 

By using double instead of T, the double specialization of f has access to the non-public members of any Friendly specialization. The specialization f<double>( ) still isn’t instantiated unless it is explicitly called, of course.

Likewise, if you were to declare a non-template function with no parameters dependent on T, that single function would be a friend to all instances of Friendly:

// Inside of Friendly:

  friend void g(int);  // g(int) befriends all Friendly’s

 

As always, since g(int) is unqualified, it must be defined at file scope (the namespace scope containing Friendly).

It is also possible to arrange for all specializations of f to be friends for all specializations of Friendly, with a so-called friend template, as follows:

template<class T>

class Friendly {

  template<class U> friend void f(const Friendly<U>&);

  // ...

};

 

Since the template argument for the friend declaration is independent of T, any combination of T and U is allowed, achieving the friendship objective. Like member templates, friend templates can appear within non-template classes as well.
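
To make the last point concrete, here is a minimal sketch of a friend template inside a non-template class. The names Vault and peek are hypothetical, invented only for this illustration. Note that the friend declaration names the template itself, so no angle brackets follow the function name.

```cpp
#include <cassert>

// Hypothetical non-template class that befriends every
// specialization of the function template peek().
class Vault {
  int secret;
public:
  explicit Vault(int s) : secret(s) {}
  // Friend template: declares the whole template a friend
  template<class U> friend int peek(const Vault& v, const U& scale);
};

// Every specialization of peek() may read Vault's private data
template<class U>
int peek(const Vault& v, const U& scale) {
  return static_cast<int>(v.secret * scale);
}
```

With this in place, both peek(v, 2) and peek(v, 0.5) compile for a Vault v, because the int and double specializations are equally friends of Vault.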

Template programming idioms

Since language is a tool of thought, new language features tend to spawn new programming techniques. In this section we cover some commonly-used template programming idioms that have emerged in the years since templates were added to the C++ language.[62]

Traits

The traits template technique, pioneered by Nathan Myers, is a means of bundling type-dependent declarations together. In essence, using traits allows you to “mix and match” certain types and values with contexts that use them in a flexible manner, while keeping your code readable and maintainable.

The simplest example of a traits template is the numeric_limits class template defined in <limits>. The primary template is defined as follows:

template<class T> class numeric_limits {

public:

  static const bool is_specialized = false;

  static T min() throw();

  static T max() throw();

  static const int digits = 0;

  static const int digits10 = 0;

  static const bool is_signed = false;

  static const bool is_integer = false;

  static const bool is_exact = false;

  static const int radix = 0;

  static T epsilon() throw();

  static T round_error() throw();

  static const int min_exponent = 0;

  static const int min_exponent10 = 0;

  static const int max_exponent = 0;

  static const int max_exponent10 = 0;

  static const bool has_infinity = false;

  static const bool has_quiet_NaN = false;

  static const bool has_signaling_NaN = false;

  static const float_denorm_style has_denorm =

                                  denorm_absent;

  static const bool has_denorm_loss = false;

  static T infinity() throw();

  static T quiet_NaN() throw();

  static T signaling_NaN() throw();

  static T denorm_min() throw();

  static const bool is_iec559 = false;

  static const bool is_bounded = false;

  static const bool is_modulo = false;

  static const bool traps = false;

  static const bool tinyness_before = false;

  static const float_round_style round_style =

                                 round_toward_zero;

};

 

The <limits> header defines specializations for all fundamental numeric types (in which case the member is_specialized is set to true). To obtain the base for the double version of your floating-point number system, for example, you can use the expression numeric_limits<double>::radix. To find the smallest integer value available, you can use numeric_limits<int>::min( ). Not all members of numeric_limits apply to all fundamental types, of course. (For example, epsilon( ) is only for floating-point types.)
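
As a small sketch of how generic code can consult these traits, the function template below uses numeric_limits to avoid overflowing whatever arithmetic type it is given. The name tryIncrement is hypothetical and is not part of the standard library.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: adds 1 to x unless that would overflow T.
// Returns true on success, false if x is already T's largest value.
template<class T>
bool tryIncrement(T& x) {
  if (x == std::numeric_limits<T>::max())
    return false;
  ++x;
  return true;
}
```

The same template works unchanged for int, long, or unsigned char, because the type-specific limit is looked up through the traits class rather than hard-coded. Keep in mind that for floating-point types min( ) is the smallest positive normalized value, not the most negative one.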

The values that will always be integral are static data members of numeric_limits; those that may not be integral, such as the minimum value for float, are implemented as static inline member functions. This is because C++ allows only integral static data member constants to be initialized inside a class definition. Other members, such as floating-point values, must be initialized at file scope outside the class definition, which is not appropriate in a header file. Since the needed value in that case will be placed in an implementation (.cpp) file, the value will not be available for compile-time optimization. Inline member functions of a class template, on the other hand, can be included in a header file, and thus facilitate compile-time optimization.
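
The same division of labor can be sketched in a traits class of your own. CircleTraits and toRadians below are hypothetical names used only for illustration: the integral constant is initialized in the class, and the floating-point value is supplied by an inline static member function.

```cpp
#include <cassert>

// Hypothetical traits class, not part of any library
struct CircleTraits {
  // Integral constant: may be initialized inside the class
  static const int degrees = 360;
  // Non-integral value: delivered by an inline static member
  // function, so the whole definition can live in a header
  static double pi() { return 3.14159265358979; }
};

// Converts an angle in degrees to radians using the traits
inline double toRadians(double deg) {
  return deg * 2.0 * CircleTraits::pi() / CircleTraits::degrees;
}
```

Because both members are visible in the header, a compiler is free to fold the whole conversion factor into a constant at the point of call.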

In Chapter 3 you saw how traits are used to control the character-processing functionality used by the string classes. The classes std::string and std::wstring are specializations of the std::basic_string template, which is defined as follows:

template<class charT,

  class traits = char_traits<charT>,

  class Allocator = allocator<charT> >

  class basic_string;

 

The template parameter charT represents the underlying character type, which is usually either char or wchar_t. The primary char_traits template is typically empty, and specializations for char and wchar_t are provided by the standard library. Here is the specification of the specialization char_traits<char> according to the C++ standard:

template<>

struct char_traits<char> {

  typedef char char_type;

  typedef int int_type;

  typedef streamoff off_type;

  typedef streampos pos_type;

  typedef mbstate_t state_type;

  static void assign(char_type& c1, const char_type& c2);

  static bool eq(const char_type& c1, const char_type& c2);

  static bool lt(const char_type& c1, const char_type& c2);

  static int compare(const char_type* s1,

                     const char_type* s2, size_t n);

  static size_t length(const char_type* s);

  static const char_type* find(const char_type* s,

                               size_t n,

                               const char_type& a);

  static char_type* move(char_type* s1,

                         const char_type* s2, size_t n);

  static char_type* copy(char_type* s1,

                         const char_type* s2, size_t n);

  static char_type* assign(char_type* s, size_t n,

                           char_type a);

  static int_type not_eof(const int_type& c);

  static char_type to_char_type(const int_type& c);

  static int_type to_int_type(const char_type& c);

  static bool eq_int_type(const int_type& c1,

                          const int_type& c2);

  static int_type eof();

};

 

These functions are used by the basic_string class template for character-based operations common to string processing. When you declare a string variable, such as:

std::string s;

 

you are actually declaring s as follows (because of the default template arguments in the specification of basic_string):

std::basic_string<char, std::char_traits<char>,

                  std::allocator<char> > s;

 

Because the character traits have been separated from the basic_string class template, you can supply a custom traits class to replace std::char_traits. The following example illustrates this flexibility.

//: C05:PoohCorner.cpp

// Illustrates traits classes

#include <iostream>

using namespace std;

 

// Item classes (traits of guests):

class Water {

public:

  friend ostream& operator<<(ostream& os, const Water&) {

    return os << "Water";

  }

};

class Milk {

public:

  friend ostream& operator<<(ostream& os, const Milk&) {

    return os << "Milk";

  }

};

class Honey {

public:

  friend ostream& operator<<(ostream& os, const Honey&) {

    return os << "Honey";

  }

};

class Cookies {

public:

  friend ostream& operator<<(ostream& os, const Cookies&) {

    return os << "Cookies";

  }

};

 

// Guest classes:

class Bear {

public:

  friend ostream& operator<<(ostream& os, const Bear&) {

    return os << "Pooh";

  }

};

class Boy {

public:

  friend ostream& operator<<(ostream& os, const Boy&) {

    return os << "Christopher Robin";

  }

};

 

// Primary traits template (empty; could hold common types)

template<class Guest>

class GuestTraits;

 

// Traits specializations for Guest types

template<>

class GuestTraits<Bear> {

public:

  typedef Water beverage_type;

  typedef Honey snack_type;

};

template<>

class GuestTraits<Boy> {

public:

  typedef Milk beverage_type;

  typedef Cookies snack_type;

};

 

// A custom traits class

class MixedUpTraits {

public:

  typedef Milk beverage_type;

  typedef Honey snack_type;

};

 

// The Guest template (uses a traits class)

template< class Guest, class traits = GuestTraits<Guest> >

class PoohCorner {

  Guest theGuest;

  typedef typename traits::beverage_type beverage_type;

  typedef typename traits::snack_type snack_type;

  beverage_type bev;

  snack_type snack;

public:

  PoohCorner(const Guest& g)

    : theGuest(g), bev(beverage_type()),

      snack(snack_type()) {}

  void entertain() {

    cout << "Entertaining " << theGuest

         << " serving " << bev

         << " and " << snack << endl;

  }

};

 

int main() {

  Boy cr;

  PoohCorner<Boy> pc1(cr);

  pc1.entertain();

  Bear pb;

  PoohCorner<Bear> pc2(pb);

  pc2.entertain();

  PoohCorner<Bear, MixedUpTraits> pc3(pb);

  pc3.entertain();

} ///:~

 

In this program, instances of the guest classes Boy and Bear are served items appropriate to their tastes. Boys like milk and cookies and Bears like water and honey. This association of guests to items is done via specializations of a primary (empty) traits class template. The default arguments to PoohCorner ensure that guests get their proper items, but you can override this by simply providing a class that meets the requirements of the traits class, as we do with the MixedUpTraits class above. The output of this program is:

Entertaining Christopher Robin serving Milk and Cookies

Entertaining Pooh serving Water and Honey

Entertaining Pooh serving Milk and Honey

 

Using traits provides two key advantages: (1) it allows flexibility in pairing objects with associated attributes or functionality, and (2) it keeps template parameter lists small and readable. If 30 types were associated with a guest, it would be inconvenient to have to specify all 30 arguments directly in each PoohCorner declaration. Factoring the types into a separate traits class simplifies things considerably.

The traits technique is also used in implementing streams and locales, as we showed in Chapter 4.
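
Returning to the earlier remark that a custom traits class can replace std::char_traits in basic_string, here is a minimal sketch of that substitution. The names NoCaseTraits and NoCaseString are hypothetical; the class inherits everything from std::char_traits<char> and overrides only the comparison-related functions so that case is ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits class: compares characters ignoring case
struct NoCaseTraits : std::char_traits<char> {
  static char up(char c) {
    return static_cast<char>(
      std::toupper(static_cast<unsigned char>(c)));
  }
  static bool eq(char c1, char c2) { return up(c1) == up(c2); }
  static bool lt(char c1, char c2) { return up(c1) < up(c2); }
  static int compare(const char* s1, const char* s2,
                     std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      if (lt(s1[i], s2[i])) return -1;
      if (lt(s2[i], s1[i])) return 1;
    }
    return 0;
  }
  static const char* find(const char* s, std::size_t n, char a) {
    for (std::size_t i = 0; i < n; ++i)
      if (eq(s[i], a)) return s + i;
    return 0; // Not found
  }
};

typedef std::basic_string<char, NoCaseTraits> NoCaseString;
```

A NoCaseString holding "Pooh" then compares equal to one holding "pOOH", with no change to basic_string itself. The trade-off is that NoCaseString is a distinct type from std::string, so the two do not mix without explicit conversion.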

Policies

If you inspect the char_traits specialization for wchar_t, you’ll see that it is practically identical to its char counterpart:

template<>

struct char_traits<wchar_t> {

  typedef wchar_t char_type;

  typedef wint_t int_type;

  typedef streamoff off_type;

  typedef wstreampos pos_type;

  typedef mbstate_t state_type;

  static void assign(char_type& c1, const char_type& c2);

  static bool eq(const char_type& c1, const char_type& c2);

  static bool lt(const char_type& c1, const char_type& c2);

  static int compare(const char_type* s1,

                     const char_type*  s2, size_t n);

  static size_t length(const char_type* s);

  static const char_type* find(const char_type* s,

                               size_t n,

                               const char_type& a);

  static char_type* move(char_type* s1,

                         const char_type* s2, size_t n);

  static char_type* copy(char_type* s1,

                         const char_type* s2, size_t n);

  static char_type* assign(char_type* s, size_t n,

                           char_type a);

  static int_type not_eof(const int_type& c);

  static char_type to_char_type(const int_type& c);

  static int_type to_int_type(const char_type& c);

  static bool eq_int_type(const int_type& c1,

                          const int_type& c2);

  static int_type eof();

};

 

The only real difference between the two versions is the set of types involved (char and int vs. wchar_t and wint_t). The functionality provided is the same.[63] This highlights the fact that traits classes are indeed for traits, and therefore the things that change between related traits classes are usually types and constant values, or fixed algorithms that use type-related template parameters. Traits classes tend to be templates themselves, since the types and constants they contain are seen as characteristics of the primary template parameter(s) (for example, char and wchar_t).

It is also useful to be able to associate functionality with template arguments, so that client programmers can easily customize behavior when they code. The following version of the PoohCorner program, for instance, supports different types of entertainment:

//: C05:PoohCorner2.cpp

// Illustrates policy classes

#include <iostream>

using namespace std;

 

// Item classes:

class Water {

public:

  friend ostream& operator<<(ostream& os, const Water&) {

    return os << "Water";

  }

};

class Milk {

public:

  friend ostream& operator<<(ostream& os, const Milk&) {

    return os << "Milk";

  }

};

class Honey {

public:

  friend ostream& operator<<(ostream& os, const Honey&) {

    return os << "Honey";

  }

};

class Cookies {

public:

  friend ostream& operator<<(ostream& os, const Cookies&) {

    return os << "Cookies";

  }

};

 

// Guest classes:

class Bear {

public:

  friend ostream& operator<<(ostream& os, const Bear&) {

    return os << "Pooh";

  }

};

class Boy {

public:

  friend ostream& operator<<(ostream& os, const Boy&) {

    return os << "Christopher Robin";

  }

};

 

// Traits template

template<class Guest>

class GuestTraits;

 

// Traits specializations for Guest types

template<>

class GuestTraits<Bear> {

public:

  typedef Water beverage_type;

  typedef Honey snack_type;

};

template<>

class GuestTraits<Boy> {

public:

  typedef Milk beverage_type;

  typedef Cookies snack_type;

};

 

// Policy classes (require a static doAction() function)

class Feed {

public:

  static const char* doAction() {

    return "Feeding";

  }

};

class Stuff {

public:

  static const char* doAction() {

    return "Stuffing";

  }

};

 

// The Guest template (uses a policy and a traits class)

template< class Guest, class Action, class traits =

                                     GuestTraits<Guest> >

class PoohCorner {

  Guest theGuest;

  typedef typename traits::beverage_type beverage_type;

  typedef typename traits::snack_type snack_type;

  beverage_type bev;

  snack_type snack;

public:

  PoohCorner(const Guest& g)

    : theGuest(g), bev(beverage_type()), snack(snack_type()) {}

  void entertain() {

    cout << Action::doAction() << " " << theGuest

         << " with " << bev

         << " and " << snack << endl;

  }

};

 

int main() {

  Boy cr;

  PoohCorner<Boy, Feed> pc1(cr);

  pc1.entertain();

  Bear pb;

  PoohCorner<Bear, Stuff> pc2(pb);

  pc2.entertain();

} ///:~

 

The Action template parameter in the PoohCorner class expects to have a static member function named doAction( ), which is used in PoohCorner<>::entertain( ). Users can choose Feed or Stuff at will, both of which provide the required function. Classes that encapsulate functionality in this way are referred to as policy classes. The entertainment “policies” are provided above through Feed::doAction( ) and Stuff::doAction( ). These policy classes happen to be ordinary classes, but they can be templates, and can be combined with inheritance to great advantage. For more in-depth information on policy-based design, see Andrei Alexandrescu’s book,[64] the definitive source.
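
One way policies combine with inheritance is for the host class to derive from its policy, so that the policy's members become part of the host's own interface. The following is only a sketch of that arrangement; the names Formal, Casual, and Host are hypothetical and are not taken from PoohCorner2.cpp.

```cpp
#include <cassert>
#include <string>

// Hypothetical policy classes; each provides greet()
struct Formal {
  std::string greet(const std::string& who) const {
    return "Good day, " + who;
  }
};
struct Casual {
  std::string greet(const std::string& who) const {
    return "Hi, " + who;
  }
};

// The host inherits from its policy, so greet() is also
// available to clients of Host
template<class GreetingPolicy>
class Host : public GreetingPolicy {
  std::string guest;
public:
  explicit Host(const std::string& g) : guest(g) {}
  std::string welcome() const {
    return this->greet(guest) + "!";
  }
};
```

Here Host<Formal>("Pooh").welcome( ) yields "Good day, Pooh!", while Host<Casual> produces the informal greeting. Unlike the static doAction( ) approach, a policy used as a base class can also carry its own state.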

The curiously recurring template pattern

Any novice C++ programmer can figure out how to modify a class to keep track of the number of objects of that class that currently exist. All you have to do is to add static members, and modify constructor and destructor logic, as follows:

//: C05:CountedClass.cpp

// Object counting via static members

#include <iostream>

using namespace std;

 

class CountedClass {

  static int count;

public:

  CountedClass() { ++count; }

  CountedClass(const CountedClass&) { ++count; }

  ~CountedClass() { --count; }

  static int getCount() { return count; }

};

 

int CountedClass::count = 0;

 

int main() {

  CountedClass a;

  cout << CountedClass::getCount() << endl;   // 1

  CountedClass b;

  cout << CountedClass::getCount() << endl;   // 2

  { // an arbitrary scope:

    CountedClass c(b);

    cout << CountedClass::getCount() << endl; // 3

    a = c;

    cout << CountedClass::getCount() << endl; // 3

  }

  cout << CountedClass::getCount() << endl;   // 2

} ///:~

 

All constructors of CountedClass increment the static data member count, and the destructor decrements it. The static member function getCount( ) yields the number of current objects whenever it is called.

It would be tremendously tedious to have to manually add these members every time you wanted to add object counting to a class. What is the usual object-oriented device to which one turns to repeat or share code? It’s inheritance, of course, which, unfortunately, is only half a solution in this case. Observe what happens when we collect the counting logic into a base class.

//: C05:CountedClass2.cpp

// Erroneous attempt to count objects

#include <iostream>

using namespace std;

 

class Counted {

  static int count;

public:

  Counted() { ++count; }

  Counted(const Counted&) { ++count; }

  ~Counted() { --count; }

  static int getCount() { return count; }

};

int Counted::count = 0;

 

class CountedClass : public Counted {};

class CountedClass2 : public Counted {};

 

int main() {

  CountedClass a;

  cout << CountedClass::getCount() << endl;    // 1

  CountedClass b;

  cout << CountedClass::getCount() << endl;    // 2

  CountedClass2 c;

  cout << CountedClass2::getCount() << endl;   // 3 (Error)

} ///:~

 

All classes that derive from Counted share the same, single static data member, so the number of objects is tracked collectively across all classes in the Counted hierarchy. What is needed is a way to automatically generate a different base class for each derived class. This is accomplished by the curious template construct illustrated below:

//: C05:CountedClass3.cpp

#include <iostream>

using namespace std;

 

template<class T>

class Counted {

  static int count;

public:

  Counted() { ++count; }

  Counted(const Counted<T>&) { ++count; }

  ~Counted() { --count; }

  static int getCount() { return count; }

};

template<class T>

int Counted<T>::count = 0;

 

// Curious class definitions

class CountedClass : public Counted<CountedClass> {};

class CountedClass2 : public Counted<CountedClass2> {};

 

int main() {

  CountedClass a;

  cout << CountedClass::getCount() << endl;    // 1

  CountedClass b;

  cout << CountedClass::getCount() << endl;    // 2

  CountedClass2 c;

  cout << CountedClass2::getCount() << endl;   // 1 (!)

} ///:~

 

Each derived class derives from a unique base class that is determined by using itself (the derived class) as a template parameter! This may seem like a circular definition, and it would be, had any base class members used the template argument in a computation. Since no data members of Counted depend on T, its size is known when the template is parsed. (Counted has no non-static data members at all, so as a base class it typically adds nothing to the size of a derived object.) It doesn’t matter, therefore, which argument is used to instantiate Counted; its size is always the same. Therefore, any derivation from an instance of Counted can be completed when it is parsed, and there is no recursion. Since each base class is unique, it has its own static data, thus constituting a handy technique for adding counting to any class whatsoever. Jim Coplien was the first to mention this interesting derivation idiom in print, in an article entitled “Curiously Recurring Template Patterns.”[65]
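
As a further check on the idiom, the sketch below repeats the Counted template and adds a nested scope, in the spirit of CountedClass.cpp, to confirm that copy construction and destruction are tracked per class. Widget, Gadget, and peakWidgets are hypothetical names used only here.

```cpp
#include <cassert>

template<class T>
class Counted {
  static int count;
public:
  Counted() { ++count; }
  Counted(const Counted<T>&) { ++count; }
  ~Counted() { --count; }
  static int getCount() { return count; }
};
template<class T>
int Counted<T>::count = 0;

// Two hypothetical client classes, each with its own counter
class Widget : public Counted<Widget> {};
class Gadget : public Counted<Gadget> {};

// Creates objects in a nested scope and reports the peak count
inline int peakWidgets() {
  Widget a;
  int peak = 0;
  {
    Widget b, c(a); // Default and copy construction both count
    peak = Widget::getCount();
  }
  return peak;
}
```

Starting from no live objects, peakWidgets( ) returns 3, and Widget::getCount( ) is back to 0 once it returns. Creating a Gadget afterward leaves the Widget count untouched.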

Template metaprogramming

In 1993 compilers were beginning to support simple template constructs so that users could define generic containers and functions. About the same time that the STL was being considered for adoption into standard C++, clever and surprising examples such as the following were passed around among members of the standards committee:

//: C05:Factorial.cpp

// Compile-time computation!

#include <iostream>

using namespace std;

template<int n>

struct Factorial {

   enum {val = Factorial<n-1>::val * n};

};

template<>

struct Factorial<0> {

   enum {val = 1};

};

int main() {

   cout << Factorial<12>::val << endl; // 479001600

} ///:~

 

That this program prints the correct value of 12! is not alarming. What is alarming is that the computation is complete before the program even runs!

When the compiler attempts to instantiate Factorial<12>, it finds it must also instantiate Factorial<11>, which requires Factorial<10>, and so on. Eventually the recursion ends with the specialization Factorial<0>, and the computation unwinds. At that point Factorial<12>::val is replaced by the integral constant 479001600, and compilation ends. Since all the computation is done by the compiler, the values involved must be compile-time constants, hence the use of enum.[66] When the program runs, the only work left to do is print that constant followed by a newline. To convince yourself that a specialization of Factorial results in the correct compile-time value, you could use it as an array dimension, such as:

double nums[Factorial<5>::val];

assert(sizeof nums == sizeof(double)*120);

 

Compile-time programming

So what was meant to be a convenient way to perform type parameter substitution turned out to be a mechanism to support compile-time programming. Such a program is called a template metaprogram (since you’re in effect “programming a program”), and it turns out that you can do quite a lot with such a beast. In fact, template metaprogramming is Turing complete because it supports selection (if-else) and looping (through recursion); so theoretically you can perform any computation with it.[67] The factorial example above shows how to implement repetition; write a recursive template and provide a stopping criterion via a specialization. The following example shows how to compute Fibonacci numbers at compile time by the same technique.

//: C05:Fibonacci.cpp

#include <iostream>

using namespace std;

template<int n>

struct Fib {

   enum {val = Fib<n-1>::val + Fib<n-2>::val};

};

template<>

struct Fib<1> {

   enum {val = 1};

};

template<>

struct Fib<0> {

   enum {val = 0};

};

 

int main() {

   cout << Fib<5>::val << endl;   // 5

   cout << Fib<20>::val << endl;  // 6765

} ///:~

 

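Fibonacci numbers are defined mathematically as:

\[
F(n) =
\begin{cases}
0 & \text{if } n = 0 \\
1 & \text{if } n = 1 \\
F(n-1) + F(n-2) & \text{if } n > 1
\end{cases}
\]

The first two cases lead to the template specializations above, and the rule in the third line becomes the primary template.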
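
One way to gain confidence in the metaprogram is to set it beside an ordinary runtime function written directly from the same definition. This is only a sketch: the function fib is a hypothetical name, and the template is restated here with static const int members rather than enum, an equivalent formulation chosen for this illustration.

```cpp
#include <cassert>

// Compile-time version (static const int in place of enum)
template<int n>
struct Fib {
  static const int val = Fib<n-1>::val + Fib<n-2>::val;
};
template<>
struct Fib<1> {
  static const int val = 1;
};
template<>
struct Fib<0> {
  static const int val = 0;
};

// Runtime counterpart, written directly from the definition
inline int fib(int n) {
  assert(n >= 0);
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}
```

The two agree for every n you care to try, for example Fib<20>::val == fib(20). The difference is that the template's result is a constant baked into the program, while fib(20) recomputes its answer each time it is called.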

Compile-time looping

To compute any loop in a template metaprogram, it must first be reformulated recursively. For example, to raise the integer n to the power p, instead of using a loop such as in the following lines:

int val = 1;

while (p--)

  val *= n;

 

you would have to think of it as a recursive procedure:

int power(int n, int p) {

  return (p == 0) ? 1 : n*power(n, p - 1);

}

 

This can now be easily rendered as a template metaprogram as follows:

//: C05:Power.cpp

#include <iostream>

using namespace std;

 

template<int N, int P>

struct Power {

  enum {val = N * Power<N, P-1>::val};

};

template<int N>

struct Power<N, 0> {

  enum {val = 1};

};

int main() {

  cout << Power<2, 5>::val << endl;  // 32

} ///:~

 

Note that we need to use a partial specialization for the stopping condition, since the value N is still a free template parameter. This program only works for non-negative powers, of course.

The following metaprogram adapted from Czarnecki and Eisenecker[68] is interesting in that it uses a template template parameter, and simulates passing a function as a parameter to another function, which “loops through” the numbers 0..n.

//: C05:Accumulate.cpp

// Passes a "function" as a parameter at compile time

#include <iostream>

using namespace std;

// Accumulates the results of F(0)..F(n)

template<int n, template<int> class F>

struct Accumulate {

   enum {val = Accumulate<n-1, F>::val + F<n>::val};

};

// The stopping criterion (returns the value F(0))

template<template<int> class F>

struct Accumulate<0, F> {

   enum {val = F<0>::val};

};

// Various "functions":

template<int n>

struct Identity {

   enum {val = n};

};

template<int n>

struct Square {

   enum {val = n*n};

};

template<int n>

struct Cube {

   enum {val = n*n*n};

};

int main() {

   cout << Accumulate<4, Identity>::val << endl; // 10

   cout << Accumulate<4, Square>::val << endl;   // 30

   cout << Accumulate<4, Cube>::val << endl;     // 100

} ///:~

 

The primary Accumulate template computes the sum F(n)+F(n-1)+…+F(0). The stopping criterion is obtained by a partial specialization, which “returns” F(0). The parameter F is itself a template, and acts like a function as in the previous examples in this section. The templates Identity, Square, and Cube compute the corresponding functions of their template parameter that their names suggest. The first instantiation of Accumulate in main( ) computes the sum 4+3+2+1+0, because the Identity function simply “returns” its template parameter. The second line in main( ) adds the squares of those numbers (16+9+4+1+0), and the last computes the sum of the cubes (64+27+8+1+0).
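
To see what is being simulated, it may help to look at the runtime analogue, in which a real function pointer takes the place of the template template parameter. This is a sketch only; sumOver, identity, square, and cube are hypothetical names chosen to mirror the templates above.

```cpp
#include <cassert>

// Ordinary functions in the roles of Identity, Square, and Cube
inline int identity(int n) { return n; }
inline int square(int n) { return n * n; }
inline int cube(int n) { return n * n * n; }

// Runtime analogue of Accumulate: sums f(0) + ... + f(n),
// recursing on n just as the template does
inline int sumOver(int n, int (*f)(int)) {
  assert(n >= 0);
  return n == 0 ? f(0) : sumOver(n - 1, f) + f(n);
}
```

The call sumOver(4, square) corresponds to Accumulate<4, Square>::val and produces the same value, 30. The recursion has the same shape in both versions; only the time at which it is carried out differs.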

Loop unrolling

Algorithm designers have always endeavored to optimize their programs. One time-honored optimization, especially for numeric programming, is loop unrolling, a technique that minimizes loop overhead. The quintessential loop-unrolling example is matrix multiplication. The following function multiplies a matrix and a vector (the constants rows and cols have been previously defined):

void mult(int a[rows][cols], int x[cols], int y[rows]) {

  for (int i = 0; i < rows; ++i) {

    y[i] = 0;

    for (int j = 0; j < cols; ++j)

      y[i] += a[i][j]*x[j];

  }

}

 

If cols is an even number, the overhead of incrementing and comparing the loop control variable j can be cut in half by “unrolling” the computation into pairs in the inner loop:

void mult(int a[rows][cols], int x[cols], int y[rows]) {

  for (int i = 0; i < rows; ++i) {

    y[i] = 0;

    for (int j = 0; j < cols; j += 2)

      y[i] += a[i][j]*x[j] + a[i][j+1]*x[j+1];

  }

}

 

In general, if cols is a multiple of k, k operations can be performed each time the inner loop iterates, greatly reducing the overhead. The savings are only noticeable on large arrays, but that is precisely the case with industrial-strength mathematical computations.
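
As a sketch of the general case with k = 4, the following pair of functions computes a single row's contribution (a dot product) first in the straightforward way and then unrolled four at a time. The names dot and dotUnrolled4 are hypothetical, and the unrolled version assumes the length is a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>

// Straightforward version: one multiply-add per iteration
inline int dot(const int* a, const int* x, std::size_t n) {
  int sum = 0;
  for (std::size_t j = 0; j < n; ++j)
    sum += a[j] * x[j];
  return sum;
}

// Unrolled by k = 4: a quarter as many loop tests and increments
inline int dotUnrolled4(const int* a, const int* x,
                        std::size_t n) {
  assert(n % 4 == 0); // Assumption stated in the text
  int sum = 0;
  for (std::size_t j = 0; j < n; j += 4)
    sum += a[j] * x[j] + a[j+1] * x[j+1]
         + a[j+2] * x[j+2] + a[j+3] * x[j+3];
  return sum;
}
```

Both functions return the same result for any input whose length is a multiple of 4. Handling other lengths would require a short cleanup loop for the leftover elements, which is omitted here.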

Function inlining also constitutes a form of loop unrolling. Consider the following approach to computing powers of integers.

//: C05:Unroll.cpp

// Unrolls an implicit loop via inlining

#include <iostream>

using namespace std;

 

template<int n>

inline int power(int m) {

   return power<n-1>(m) * m;

}

template<>

inline int power<1>(int m) {

   return m;

}

template<>

inline int power<0>(int m) {

   return 1;

}

int main() {

   int m = 4;

   cout << power<3>(m) << endl;

} ///:~

 

Conceptually, the compiler must generate three specializations of power<>, one each for the template parameters 3, 2, and 1. Because the code for each of these functions can be inlined, the actual code that is inserted into main( ) is the single expression m*m*m. Thus, a simple template specialization coupled with inlining here provides a way to totally avoid loop control overhead.[69] This approach to loop unrolling is limited by your compiler’s inlining depth, of course.

Compile-time selection

To simulate conditionals at compile time, you can use the conditional ternary operator in an enum declaration. The following program uses this technique to calculate the maximum of two integers at compile time.

//: C05:Max.cpp

#include <iostream>

using namespace std;

 

template<int n1, int n2>

struct Max {

   enum {val = n1 > n2 ? n1 : n2};

};

int main() {

   cout << Max<10, 20>::val << endl;  // 20

} ///:~

 

If you want to use compile-time conditions to govern custom code generation, you can once again use specializations, this time on the values true and false:

//: C05:Conditionals.cpp

// Uses compile-time conditions to choose code

#include <iostream>

using namespace std;

 

template<bool cond>

struct Select {};

 

template<>

struct Select<true> {

  static inline void f() { statement1(); }

private:

  static inline void statement1() {

    cout << "This is";

    cout << " statement1 executing\n";

  }

};

template<>

struct Select<false> {

  static inline void f() { statement2(); }

private:

  static inline void statement2() {

    cout << "This is";

    cout << " statement2 executing\n";

  }

};

template<bool cond>

void execute() {

  Select<cond>::f();

}

 

int main() {

  execute<sizeof(int) == 4>();

} ///:~

 

This program is the equivalent of the statement:

if (cond)

  statement1();

else

  statement2();

 

except that the condition cond is evaluated at compile time, and the appropriate versions of execute<>( ) and Select<> are instantiated by the compiler. The function Select<>::f( ) executes at runtime, of course. A switch statement can be emulated in similar fashion, by specializing on each case value instead of on the values true and false.
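
A switch emulation along those lines might be sketched as follows. The names Case and dispatch are hypothetical. The primary template plays the role of the default label, and each full specialization corresponds to one case value.

```cpp
#include <cassert>
#include <cstring>

// Primary template: acts as the default: label
template<int n>
struct Case {
  static const char* name() { return "other"; }
};
// One full specialization per case value
template<>
struct Case<1> {
  static const char* name() { return "one"; }
};
template<>
struct Case<2> {
  static const char* name() { return "two"; }
};

// Chooses a branch at compile time from the value of n
template<int n>
const char* dispatch() { return Case<n>::name(); }
```

A call such as dispatch<2>( ) is bound to Case<2>::name( ) when the template is instantiated, so no comparison on n takes place at runtime. Any value without its own specialization, such as dispatch<7>( ), falls through to the primary template.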

Compile-time assertions

In Chapter 2 we touted the virtues of using assertions as part of an overall defensive programming strategy. An assertion is basically an evaluation of a Boolean expression followed by a suitable action: do nothing if the condition is true, or halt with a diagnostic message otherwise. The previous section showed how to evaluate compile-time Boolean expressions. The remaining challenge in emulating assertions at compile time is to print a meaningful error message and halt. All that is required to halt the compiler is a compile error; the trick is to insert helpful text in the error message. The following example of Alexandrescu[70] uses template specialization, a local class, and a little macro magic to do the job.

//: C05:StaticAssert.cpp

//{-g++}

#include <iostream>

using namespace std;

 

// A template and a specialization

template<bool>

struct StaticCheck {

   StaticCheck(...);

};

template<>

struct StaticCheck<false>{};

 

// The macro (generates a local class)

#define STATIC_CHECK(expr, msg) {              \

   class Error_##msg{};                        \

   sizeof((StaticCheck<expr>(Error_##msg()))); \

}

 

// Detects narrowing conversions

template<class To, class From>

To safe_cast(From from) {

   STATIC_CHECK(sizeof(From) <= sizeof(To),

                NarrowingConversion);

   return reinterpret_cast<To>(from);

}

 

int main() {

   void* p = 0;

   int i = safe_cast<int>(p);

   cout << "int cast okay\n";

//!   char c = safe_cast<char>(p);

} ///:~

 

This example defines a function template, safe_cast<>( ), that checks that the type of the object it is casting from is no larger than the type it casts to. If the target type is smaller, the user is notified at compile time that a narrowing conversion was attempted. Notice that the StaticCheck class template has the curious feature that anything can be converted to an instance of StaticCheck<true> (because of the ellipsis in its constructor[71]), and nothing can be converted to a StaticCheck<false>, because no conversions are supplied for that specialization. The idea is to create an instance of a new class and attempt to convert it at compile time to StaticCheck<true> when the condition of interest is true, or to StaticCheck<false> when the condition is false. Since the sizeof operator does its work at compile time, it is used to attempt the conversion. If the condition is false, the compiler will complain that it doesn’t know how to convert from the new class type to StaticCheck<false>. (The extra parentheses inside the sizeof invocation in STATIC_CHECK( ) are to prevent the compiler from thinking that we’re trying to invoke sizeof on a function, which is illegal.) To get some meaningful information inserted into the error message, the new class carries key text in its name.

The best way to understand this technique is to walk through a specific case. Consider the line in main( ) above which reads:

   int i = safe_cast<int>(p);

 

The call to safe_cast<int>(p) contains the following macro expansion replacing its first line of code:

{

   class Error_NarrowingConversion{};

   sizeof(StaticCheck<sizeof(void*) <= sizeof(int)>

           (Error_NarrowingConversion()));

}

 

(Recall that the token-pasting preprocessing operator, ##, concatenates its operands into a single token, so Error_##NarrowingConversion becomes the token Error_NarrowingConversion after preprocessing.) The class Error_NarrowingConversion is a local class (meaning that it is declared inside a non-namespace scope) because it is not needed elsewhere in the program. The application of the sizeof operator here attempts to determine the size of an instance of StaticCheck<true> (because sizeof(void*) <= sizeof(int) is true on our platforms), created implicitly from the temporary object returned by the call Error_NarrowingConversion( ). The compiler knows the size of the new class Error_NarrowingConversion (it’s empty), and so the compile-time use of sizeof at the outer level in STATIC_CHECK( ) is valid. Since the conversion from the Error_NarrowingConversion temporary to StaticCheck<true> succeeds, so does this outer application of sizeof, and compilation continues.
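To see token pasting in isolation, here is a small sketch. The macro DEFINE_ERROR_TAG is hypothetical (it is not part of the StaticAssert example), and it also uses the stringizing operator, #, so that the generated class can report its own name.

```cpp
#include <cassert>
#include <cstring>

// ## pastes Error_ and the argument into one identifier;
// # turns the argument into a string literal.
#define DEFINE_ERROR_TAG(msg)                             \
  struct Error_##msg {                                    \
    static const char* name() { return "Error_" #msg; }   \
  }

// Expands to: struct Error_NarrowingConversion { ... };
DEFINE_ERROR_TAG(NarrowingConversion);
```

After preprocessing, the rest of the program can refer to Error_NarrowingConversion just as if it had been written by hand.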

Now consider what would happen if the comment were removed from the last line of main( ):

   char c = safe_cast<char>(p);

 

In this case the STATIC_CHECK( ) macro inside safe_cast<char>(p) expands to:

{

   class Error_NarrowingConversion{};

   sizeof(StaticCheck<sizeof(void*) <= sizeof(char)>

           (Error_NarrowingConversion()));

}

 

Since the expression sizeof(void*) <= sizeof(char) is false, a conversion from an Error_NarrowingConversion temporary to StaticCheck<false> is attempted, as follows:

sizeof(StaticCheck<false>(Error_NarrowingConversion()));

 

which fails, so the compiler halts with a message something like the following:

Cannot cast from 'Error_NarrowingConversion' to 'StaticCheck<0>' in function

char safe_cast<char,void *>(void *)

 

The class name Error_NarrowingConversion is the meaningful message judiciously arranged by the coder. In general, to perform a static assertion with this technique, you just invoke the STATIC_CHECK macro with the compile-time condition to check and with a meaningful name to describe the error.
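For comparison, here is a sketch of a simpler variant of the same idea. It is not the STATIC_CHECK macro shown above: instead of a failed conversion it relies on an incomplete type, so it carries no custom message. The names CompileTimeCheck, SIMPLE_STATIC_CHECK, and narrow_checked are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Only the 'true' specialization is defined;
// CompileTimeCheck<false> remains an incomplete type.
template<bool> struct CompileTimeCheck;
template<> struct CompileTimeCheck<true> {};

// sizeof requires a complete type, so a false
// condition produces a compile error.
#define SIMPLE_STATIC_CHECK(expr) \
  ((void)sizeof(CompileTimeCheck<(expr) != 0>))

// Refuses to compile when To is smaller than From
template<class To, class From>
To narrow_checked(From from) {
  SIMPLE_STATIC_CHECK(sizeof(From) <= sizeof(To));
  return static_cast<To>(from);
}
```

A call such as narrow_checked<long>(42) compiles, while narrow_checked<char>(42L) is rejected on platforms where long is larger than char. The error text will mention CompileTimeCheck<false>, which is less descriptive than a name built with the token-pasting approach.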

Expression templates

Perhaps the most powerful application of templates is a technique discovered independently in 1994 by Todd Veldhuizen[72] and David Vandevoorde:[73] expression templates. Expression templates enable extensive compile-time optimization of certain computations that results in code that is at least as fast as hand-optimized Fortran, and yet preserves the natural notation of mathematics via operator overloading. Although you wouldn’t be likely to use this technique in everyday programming, it is the basis for a number of sophisticated, high-performance mathematical libraries written in C++.[74]

To motivate the need for expression templates, consider typical numerical linear algebra operations, such as adding together matrices or vectors,[75] as in the following:

D = A + B + C;

 

In naive implementations, this expression would result in a number of temporaries: one for A+B, and one for (A+B)+C. When these variables represent immense matrices or vectors, the attendant drain on resources is unacceptable. Expression templates allow you to use the same expression without temporaries.

In the following sample program, we define a MyVector class to simulate mathematical vectors of any size. We use a non-type template argument for the length of the vector. We also define a MyVectorSum class to act as a proxy class for a sum of MyVector objects. This allows us to use lazy evaluation, so the addition of vector components is performed on demand without the need for temporaries.

//: C05:MyVector.cpp

// Optimizes away temporaries via templates

#include <cstddef>

#include <cstdlib>

#include <iostream>

using namespace std;

 

// A proxy class for sums of vectors

template<class, size_t> class MyVectorSum;

 

template<class T, size_t N>

class MyVector {

  T data[N];

public:

  MyVector<T,N>& operator=(const MyVector<T,N>& right) {

    for (size_t i = 0; i < N; ++i)

      data[i] = right.data[i];

    return *this;

  }

  MyVector<T,N>& operator=(const MyVectorSum<T,N>& right);

  const T& operator[](size_t i) const {

    return data[i];

  }

  T& operator[](size_t i) {

    return data[i];

  }

};

// Proxy class holds references; uses lazy addition

template <class T, size_t N>

class MyVectorSum {

  const MyVector<T,N>& left;

  const MyVector<T,N>& right;

public:

  MyVectorSum(const MyVector<T,N>& lhs,

              const MyVector<T,N>& rhs)

      : left(lhs), right(rhs) {}

  T operator[](size_t i) const {

    return left[i] + right[i];

  }

};

// Operator to support v3 = v1 + v2

template<class T, size_t N>

MyVector<T,N>&

MyVector<T,N>::operator=(const MyVectorSum<T,N>& right) {

  for (size_t i = 0; i < N; ++i)

    data[i] = right[i];

  return *this;

}

// operator+ just stores references

template<class T, size_t N>

inline MyVectorSum<T,N>

operator+(const MyVector<T,N>& left,

          const MyVector<T,N>& right) {

  return MyVectorSum<T,N>(left, right);

}

// Convenience functions for the test program below

template<class T, size_t N>

void init(MyVector<T,N>& v) {

  for (size_t i = 0; i < N; ++i)

    v[i] = rand() % 100;

}

template<class T, size_t N>

void print(MyVector<T,N>& v) {

  for (size_t i = 0; i < N; ++i)

    cout << v[i] << ' ';

  cout << endl;

}

int main() {

  MyVector<int, 5> v1;

  init(v1);

  print(v1);

  MyVector<int, 5> v2;

  init(v2);

  print(v2);

  MyVector<int, 5> v3;

  v3 = v1 + v2;

  print(v3);

  MyVector<int, 5> v4;

  // Not yet supported:

//!  v4 = v1 + v2 + v3;

} ///:~

 

The MyVectorSum class does no computation when it is created; it merely holds references to the two vectors to be added. A component of the sum is calculated only when you access it (see its operator[]( )). The overload of the assignment operator for MyVector that takes a MyVectorSum argument is for an expression such as:

v3 = v1 + v2;  // add two vectors

 

When the expression v1+v2 is evaluated, a MyVectorSum object is returned (or actually, inserted inline, since that operator+( ) is declared inline). This is a small, fixed-size object (it holds only two references). Then the assignment operator mentioned above is invoked:

v3.operator=(MyVectorSum<int,5>(v1, v2));

 

This assigns to each element of v3 the sum of the corresponding elements of v1 and v2, computed on demand. No temporary MyVector objects are created.

This program does not support an expression that has more than two operands, however, such as

v4 = v1 + v2 + v3;

 

The reason is that after the first addition, a second addition is attempted:

(v1 + v2) + v3;

 

which would require an operator+( ) with a first argument of type MyVectorSum and a second argument of type MyVector. We could attempt to provide a number of overloads to meet all situations, but it is better to let templates do the work, as in the following version of the program.

//: C05:MyVector2.cpp

// Handles sums of any length with expression templates

#include <cstddef>

#include <cstdlib>

#include <iostream>

using namespace std;

 

// A proxy class for sums of vectors

template<class, size_t, class, class> class MyVectorSum;

 

template<class T, size_t N>

class MyVector {

  T data[N];

public:

  MyVector<T,N>& operator=(const MyVector<T,N>& right) {

    for (size_t i = 0; i < N; ++i)

      data[i] = right.data[i];

    return *this;

  }

  template<class Left, class Right>

  MyVector<T,N>&

    operator=(const MyVectorSum<T,N,Left,Right>& right);

  const T& operator[](size_t i) const {

    return data[i];

  }

  T& operator[](size_t i) {

    return data[i];

  }

};

// Allows mixing MyVector and MyVectorSum

template <class T, size_t N, class Left, class Right>

class MyVectorSum {

  const Left& left;

  const Right& right;

public:

  MyVectorSum(const Left& lhs, const Right& rhs)

      : left(lhs), right(rhs) {}

  T operator[](size_t i) const {

    return left[i] + right[i];

  }

};

template<class T, size_t N>

template<class Left, class Right>

MyVector<T,N>&

MyVector<T,N>::

operator=(const MyVectorSum<T,N,Left,Right>& right) {

  for (size_t i = 0; i < N; ++i)

    data[i] = right[i];

  return *this;

}

// operator+ just stores references

template<class T, size_t N>

inline MyVectorSum<T,N,MyVector<T,N>,MyVector<T,N> >

operator+(const MyVector<T,N>& left,

          const MyVector<T,N>& right) {

  return

    MyVectorSum<T,N,MyVector<T,N>,MyVector<T,N> >

      (left,right);

}

 

template<class T, size_t N, class Left, class Right>

inline

MyVectorSum<T, N, MyVectorSum<T,N,Left,Right>,

            MyVector<T,N> >

operator+(const MyVectorSum<T,N,Left,Right>& left,

          const MyVector<T,N>& right) {

  return MyVectorSum<T,N,MyVectorSum<T,N,Left,Right>,

                         MyVector<T,N> >

    (left, right);

}

// Convenience functions for the test program below

template<class T, size_t N>

void init(MyVector<T,N>& v) {

  for (size_t i = 0; i < N; ++i)

    v[i] = rand() % 100;

}

template<class T, size_t N>

void print(MyVector<T,N>& v) {

  for (size_t i = 0; i < N; ++i)

    cout << v[i] << ' ';

  cout << endl;

}

int main() {

  MyVector<int, 5> v1;

  init(v1);

  print(v1);

  MyVector<int, 5> v2;

  init(v2);

  print(v2);

  MyVector<int, 5> v3;

  v3 = v1 + v2;

  print(v3);

  // Now supported:

  MyVector<int, 5> v4;

  v4 = v1 + v2 + v3;

  print(v4);

  MyVector<int, 5> v5;

  v5 = v1 + v2 + v3 + v4;

  print(v5);

} ///:~

 

Instead of committing ahead of time to the types of the arguments of a sum, we let the template facility deduce them through the template parameters Left and Right. The MyVectorSum template takes these two extra parameters so it can represent a sum of any combination of pairs of MyVector and MyVectorSum. Note also that the assignment operator this time is a member function template. This allows any <T, N> pair to be coupled with any <Left, Right> pair, so a MyVector object can be assigned from a MyVectorSum holding references to any possible pair of the types MyVector and MyVectorSum. As we did before, let’s trace through a sample assignment to understand exactly what takes place, beginning with the expression

v4 = v1 + v2 + v3;

 

Since the resulting expressions become quite unwieldy, in the explanation that follows, we will use MVS as shorthand for MyVectorSum, and will omit the template arguments.

The first operation is v1+v2, which invokes the inline operator+( ), which in turn inserts MVS(v1, v2) into the compilation stream. This is then added to v3, which results in a temporary object according to the expression MVS(MVS(v1, v2), v3). The final representation of the entire statement is

v4.operator=(MVS(MVS(v1, v2), v3));

 

This transformation is all arranged by the compiler and explains why this technique carries the moniker “expression templates”; the template MyVectorSum represents an expression (an addition, in this case), and the nested calls above are reminiscent of the parse tree of the left-associative expression v1+v2+v3.
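One way to check the claim that no temporary vectors are created is to count constructions. The following is a condensed sketch of the same design, not the MyVector2.cpp listing itself: the names CountedVector, CountedSum, and temporariesForThreeWaySum are hypothetical, and the static counter is added purely for instrumentation.

```cpp
#include <cassert>
#include <cstddef>

// Proxy for a lazy sum; holds references only
template<class T, std::size_t N, class Left, class Right>
class CountedSum {
  const Left& left;
  const Right& right;
public:
  CountedSum(const Left& l, const Right& r) : left(l), right(r) {}
  T operator[](std::size_t i) const { return left[i] + right[i]; }
};

// Vector that counts how many instances are constructed
template<class T, std::size_t N>
class CountedVector {
  T data[N];
public:
  static int constructed;
  CountedVector() {
    ++constructed;
    for (std::size_t i = 0; i < N; ++i) data[i] = T();
  }
  CountedVector(const CountedVector& other) {
    ++constructed;
    for (std::size_t i = 0; i < N; ++i) data[i] = other.data[i];
  }
  // Evaluates the whole expression tree element by element
  template<class L, class R>
  CountedVector& operator=(const CountedSum<T,N,L,R>& sum) {
    for (std::size_t i = 0; i < N; ++i) data[i] = sum[i];
    return *this;
  }
  T& operator[](std::size_t i) { return data[i]; }
  const T& operator[](std::size_t i) const { return data[i]; }
};

template<class T, std::size_t N>
int CountedVector<T,N>::constructed = 0;

// vector + vector
template<class T, std::size_t N>
CountedSum<T,N,CountedVector<T,N>,CountedVector<T,N> >
operator+(const CountedVector<T,N>& a, const CountedVector<T,N>& b) {
  return CountedSum<T,N,CountedVector<T,N>,CountedVector<T,N> >(a, b);
}

// (sum) + vector
template<class T, std::size_t N, class L, class R>
CountedSum<T,N,CountedSum<T,N,L,R>,CountedVector<T,N> >
operator+(const CountedSum<T,N,L,R>& a, const CountedVector<T,N>& b) {
  return CountedSum<T,N,CountedSum<T,N,L,R>,CountedVector<T,N> >(a, b);
}

// Returns how many vectors were constructed while evaluating d = a + b + c
int temporariesForThreeWaySum() {
  CountedVector<int,3> a, b, c, d;
  for (std::size_t i = 0; i < 3; ++i) {
    a[i] = 1; b[i] = 10; c[i] = 100;
  }
  int before = CountedVector<int,3>::constructed;
  d = a + b + c;
  assert(d[0] == 111 && d[2] == 111);
  return CountedVector<int,3>::constructed - before;
}
```

The only objects created during the assignment are the two small proxy objects, which live until the end of the full expression; the function therefore reports zero newly constructed vectors.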

An excellent article by Angelika Langer and Klaus Kreft explains how this technique can be extended to more complex computations.[76]

Template compilation models

You have certainly noticed by now that all our template examples place fully-defined templates within each compilation unit. (For example, we place them completely within single-file programs or in header files for multi-file programs.) This runs counter to the conventional practice of separating ordinary function definitions from their declarations by placing the latter in header files and the function implementations in separate (that is, .cpp) files. Everyone knows the reason for this separation: non-inline function bodies in header files can lead to multiple function definitions, which results in a linker error. A nice side benefit of this approach is that vendors can distribute pre-compiled code along with headers so that users cannot see their function implementations, and compile times are shorter since header files are smaller.

The inclusion model

Templates, on the other hand, are not code, per se, but instructions for code generation; only template instantiations are real code. When a compiler has seen a complete template definition during a compilation and then encounters a point of instantiation for that template in the same translation unit, it must deal with the fact that an equivalent point of instantiation may be present in another translation unit. The most common approach is to generate the code for the instantiation in every translation unit and let the linker weed out duplicates. That particular approach also works well with inline functions that cannot be inlined and with virtual function tables, which is one of the reasons for its popularity. Nonetheless, several compilers prefer instead to rely on more complex schemes to avoid generating a particular instantiation more than once. Either way, it is the responsibility of the C++ translation system to avoid errors due to multiple equivalent points of instantiation.

A drawback of this approach is obviously that all template source code is visible to the client. If you want to know exactly how your standard library is implemented, all you have to do is inspect the headers in your installation. There is little opportunity for library vendors to hide their implementation strategies. Another noticeable disadvantage of the inclusion model is that header files are much, much larger than they would be if function bodies were compiled separately. This can increase compile times dramatically over traditional compilation models.

To help reduce the large headers required by the inclusion model, C++ offers two (non-exclusive) alternative code organization mechanisms: you can manually instantiate each specialization using explicit instantiation or you can use exported templates, which actually support a large degree of separate compilation.

Explicit instantiation

You can manually direct the compiler to instantiate any template specializations of your choice. When you use this technique, there must be one and only one such directive for each such specialization; otherwise you might get multiple definition errors, just as you would with ordinary, non-inline functions with identical signatures. To illustrate, we first (erroneously) separate the declaration of the min template from earlier in this chapter from its definition, following the normal pattern for ordinary, non-inline functions. The following example consists of five files:

·         OurMin.h: contains the declaration of the min function template.

·         OurMin.cpp: contains the definition of the min function template.

·         UseMin1.cpp: attempts to use an int-instantiation of min

·         UseMin2.cpp: attempts to use a double-instantiation of min

·         MinMain.cpp: calls usemin1( ) and usemin2( )

Here are the files:

//: C05:OurMin.h

#ifndef OURMIN_H

#define OURMIN_H

// The declaration of min

template<typename T> const T& min(const T&, const T&);

 

#endif ///:~

 

// OurMin.cpp

#include "OurMin.h"

// The definition of min

template<typename T> const T& min(const T& a, const T& b) {

  return (a < b) ? a : b;

}

 

//: C05:UseMin1.cpp {O}

#include <iostream>

#include "OurMin.h"

void usemin1() {

  std::cout << min(1,2) << std::endl;

} ///:~

 

//: C05:UseMin2.cpp {O}

#include <iostream>

#include "OurMin.h"

void usemin2() {

  std::cout << min(3.1,4.2) << std::endl;

} ///:~

 

//: C05:MinMain.cpp

//{L} UseMin1 UseMin2 MinInstances

void usemin1();

void usemin2();

 

int main() {

  usemin1();

  usemin2();

} ///:~

 

When we attempt to build this program, the linker reports unresolved external references for min<int>( ) and min<double>( ). The reason is that when the compiler encounters the calls to specializations of min in UseMin1 and UseMin2, only the declaration of min is visible. Since the definition is not available, the compiler assumes it will come from some other translation unit, and the needed specializations are therefore not instantiated at that point, leaving the linker to eventually complain that it cannot find them.

To solve this problem, we will introduce a new file, MinInstances.cpp, that explicitly instantiates the needed specializations of min:

//: C05:MinInstances.cpp {O}

#include "OurMin.cpp"

// Explicit Instantiations for int and double

template const int& min<int>(const int&, const int&);

template const double& min<double>(const double&,

                                   const double&);

///:~

 

To manually instantiate a particular template specialization, you precede the specialization’s declaration with the template keyword. That’s it! Note that we must include OurMin.cpp, not OurMin.h, here, because the compiler needs the template definition to perform the instantiation. This is the only place where we have to do this in this program,[77] however, since it gives us the unique instantiations of min that we need; the declarations alone suffice for the other files. Since we are including OurMin.cpp with the macro preprocessor, we add include guards:

//: C05:OurMin.cpp {O}

#ifndef OURMIN_CPP

#define OURMIN_CPP

#include "OurMin.h"

 

template<typename T> const T& min(const T& a, const T& b) {

  return (a < b) ? a : b;

}

#endif ///:~

 

Now when we compile all the files together into a complete program, the unique instances of min are found, and the program executes correctly, giving the output:

1

3.1

 

You can also manually instantiate classes and static data members. When explicitly instantiating a class, all member functions for the requested specialization are instantiated, except any that may have been explicitly instantiated previously. Using only implicit instantiation has the advantage here: only member functions that actually get called are instantiated. Explicit instantiation is intended for large projects in which a hefty chunk of compilation time can be avoided. Whether you use implicit or explicit instantiation is independent of which template compilation model you use, of course; you can use manual instantiation with either the inclusion model or the separation model (discussed in the next section).
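The following sketch illustrates the three forms mentioned above: explicit instantiation of an entire class, of a single member function, and of a static data member. The Holder template is hypothetical and exists only to give the directives something to act on.

```cpp
#include <cassert>

template<class T>
class Holder {
  T value;
public:
  explicit Holder(const T& v) : value(v) {}
  const T& get() const { return value; }
  void set(const T& v) { value = v; }
  static int count;
};

template<class T> int Holder<T>::count = 0;

// Instantiates every member of Holder<int>, called or not
template class Holder<int>;

// Instantiates a single member function of Holder<double>
template const double& Holder<double>::get() const;

// Instantiates a static data member of Holder<double>
template int Holder<double>::count;
```

With the first directive, Holder<int>::set( ) is generated even if no code ever calls it, which is the cost of explicit class instantiation noted in the text.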

The separation model

The separation model of template compilation allows you to separate function template definitions or static data member definitions from their declarations across translation units, just like you do with ordinary functions and data, by exporting templates. After reading the preceding two sections, this must sound strange indeed. Why bother to have the inclusion model in the first place if you can just adhere to the status quo? The reasons are both historical and technical.

Historically, the inclusion model was the first to experience widespread commercial use. Part of the reason for that was that the separation model was not well specified until late in the standardization process. It turns out that the inclusion model is the easier of the two to implement. All C++ compilers support the inclusion model. A lot of working code was in existence long before the semantics of the separation model were finalized.

The technical aspect reflects the fact that the separation model is difficult to implement. In fact, as of summer 2003 only one compiler front end (EDG) supports the separation model, and at the moment it still requires that template source code be available at compile time to perform instantiation on demand. Plans are in place to use some form of intermediate code instead of requiring that the original source be at hand, at which point you will be able to ship “pre-compiled” templates without shipping source code. Because of the lookup complexities explained earlier in this chapter (about dependent names being looked up in the template definition context), a full template definition still has to be available in some form when you compile a program that instantiates it.

The syntax to separate the source code of a template definition from its declaration is easy enough. You use the export keyword:

// C05:OurMin2.h

// Declares min as an exported template

//! (Only works with EDG-based compilers)

#ifndef OURMIN2_H

#define OURMIN2_H

export template<typename T> const T& min(const T&,

                                         const T&);

#endif

 

Similar to inline or virtual, the export keyword need only be mentioned once in a compilation stream, where an exported template is introduced. For this reason, we need not repeat it in the implementation file, but it is considered good practice to do so:

// C05:OurMin2.cpp

// The definition of the exported min template

//! (Only works with EDG-based compilers)

#include "OurMin2.h"

export

template<typename T> const T& min(const T& a, const T& b) {

  return (a < b) ? a : b;

} ///:~

 

The UseMin files used previously only need to be updated to include the correct header file (OurMin2.h), and the main program need not change at all. Although this appears to give true separation, the file with the template definition (OurMin2.cpp) must still be shipped to users (because it must be processed for each instantiation of min) until such time as some form of intermediate code representation of template definitions is supported. So while the standard does provide for a true separation model, not all of its benefits can be reaped today. Only one family of compilers currently supports export (those based on the EDG front end), and these compilers currently do not exploit the potential ability to distribute template definitions in compiled form.

Summary

Templates have gone far beyond simple type parameterization! When you combine argument type deduction, custom specialization, and template metaprogramming, C++ templates emerge as a powerful code generation mechanism.

One of the weaknesses of C++ templates we skipped in this chapter is the difficulty in interpreting compile-time error messages. When you’re not used to it, the quantity of inscrutable text spewed out by the compiler is quite overwhelming. If it’s any consolation, C++ compilers have actually gotten a lot better about this. Leor Zolman has written a nifty tool, STLFilt, that renders these error messages much more readable by extracting the useful information and throwing away the rest.[78]

Another important idea to take away from this chapter is that a template implies an interface. That is, even though the template keyword says “I’ll take any type,” the code in a template definition actually requires that certain operators and member functions be supported; that’s the interface. So in reality, a template definition is saying, “I’ll take any type that supports this interface.” Things would be much nicer if the compiler could simply say, “Hey, this type that you’re trying to instantiate the template with doesn’t support that interface; can’t do it.” Using templates, therefore, constitutes a sort of “latent type checking” that is more flexible than the pure object-oriented practice of requiring all types to derive from certain base classes.
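To make the idea of an implied interface concrete, here is a small sketch. The names larger, Version, and Opaque are hypothetical. The template never states its requirement, yet it works only with types for which the expression a < b compiles.

```cpp
#include <cassert>
#include <string>

// The implied interface: T must support 'a < b'
template<class T>
const T& larger(const T& a, const T& b) {
  return (a < b) ? b : a;
}

// No base class required, just the one operator the template uses
struct Version {
  int release;
  int patch;
};

bool operator<(const Version& a, const Version& b) {
  if (a.release != b.release)
    return a.release < b.release;
  return a.patch < b.patch;
}

// A type that does not support the interface
struct Opaque {};
// larger(Opaque(), Opaque());  // would fail to compile when instantiated
```

The types int, std::string, and Version share no common base class, yet all three satisfy the template. Opaque is rejected only at the point where someone tries to instantiate larger with it, which is the “latent” part of latent type checking.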

In Chapters 6 and 7 we explore in depth the most famous application of templates, the subset of the standard C++ library commonly known as the Standard Template Library (STL). Chapters 9 and 10 also use template techniques not found in this chapter.

Exercises

                            1.             Write a unary function template that takes a single type template parameter. Create a full specialization for the type int. Also create a non-template overload for this function that takes a single int parameter. Have your main program invoke three function variations.

                            2.             Write a class template that uses a vector to implement a stack data structure.

                            3.             Modify your solution to the previous exercise so that the type of the container used to implement the stack is a template template parameter.

                            4.             In the following code, the class Noncomparable does not have an operator==( ). Why would the presence of the class HardLogic cause a compile error, but SoftLogic would not?

class Noncomparable {};

struct HardLogic {

  Noncomparable nc1, nc2;

  void compare() {

     return nc1 == nc2; // Compiler error

  }

};

template<class T>

struct SoftLogic {

  Noncomparable nc1, nc2;

  void noOp() {}

  void compare() {

    nc1 == nc2;

  }

};

int main() {

  SoftLogic<Noncomparable> l;

  l.noOp();
}

 

                            5.             Write a function template that takes a single type parameter (T) and accepts four function arguments: an array of T, a start index, a stop index (inclusive), and an optional initial value. The function returns the sum of all the array elements in the specified range. Use the default constructor of T for the default initial value.

                            6.             Repeat the previous exercise but use explicit instantiation to manually create specializations for int and double, following the technique explained in this chapter.

                            7.             Why does the following code not compile? (Hint: what do class member functions have access to?)

 

class Buddy {};

template<class T>

class My {

  int i;

public:

  void play(My<Buddy>& s) {

    s.i = 3;

  }

};

int main() {

  My<int> h;

  My<Buddy> me, bud;

  h.play(bud);

  me.play(bud);

}

 

                            8.             Why does the following code not compile?

 

template<class T>

double pythag(T a, T b, T c) {

  return (-b + sqrt(double(b*b - 4*a*c))) / 2*a;

}

int main() {

  pythag(1, 2, 3);

  pythag(1.0, 2.0, 3.0);

  pythag(1, 2.0, 3.0);

  pythag<double>(1, 2.0, 3.0);

}

 

                            9.             Write templates that take non-type parameters of the following variety: an int, a pointer to an int, a pointer to a static class member of type int, and a pointer to a static member function.

                           10.             Write a class template that takes two type parameters. Define a partial specialization for the first parameter, and another partial specialization that specifies the second parameter. In each specialization, introduce members that are not in the primary template.

                           11.             Define a class template named Bob that takes a single type parameter. Make Bob a friend of all instances of a template class named Friendly, and a friend of a class template named Picky only when the type parameter of Bob and Picky are identical. Give Bob member functions that demonstrate its friendship.

 



6: Generic algorithms

Algorithms are at the core of computing. To be able to write an algorithm once and for all to work with any type of sequence makes your programs both simpler and safer. The ability to customize algorithms at runtime has revolutionized software development.

The subset of the standard C++ library known as the Standard Template Library (STL) was originally designed around generic algorithms: code that processes sequences of any type of values in a type-safe manner. The goal was to use predefined algorithms for almost every task, instead of hand-coding loops every time you need to process a collection of data. This power comes with a bit of a learning curve, however. By the time you get to the end of this chapter, you should be able to decide for yourself whether you find the algorithms addictive or too confusing to remember. If you’re like most people, you’ll resist them at first but then tend to use them more and more.

A first look

Among other things, the generic algorithms in the standard library provide a vocabulary with which to describe solutions. That is, once you become familiar with the algorithms, you’ll have a new set of words with which to discuss what you’re doing, and these words are at a higher level than what you had before. You don’t have to say, “This loop moves through and assigns from here to there … oh, I see, it’s copying!” Instead, you just say copy( ). This is the kind of thing we’ve been doing in computer programming from the beginning: creating high-level abstractions to express what you’re doing and spending less time saying how you’re doing it. The how has been solved once and for all and is hidden in the algorithm’s code, ready to be reused on demand.

Here’s an example of how to use the copy algorithm:

//: C06:CopyInts.cpp

// Copies ints without an explicit loop

#include <algorithm>

#include <cassert>

#include <cstddef>  // For size_t

using namespace std;

 

int main() {

  int a[] = {10, 20, 30};

  const size_t SIZE = sizeof a / sizeof a[0];

  int b[SIZE];

  copy(a, a + SIZE, b);

  for (size_t i = 0; i < SIZE; ++i)

    assert(a[i] == b[i]);

} ///:~

 

The copy algorithm’s first two parameters represent the range of the input sequence, in this case the array a. Ranges are denoted by a pair of pointers. The first points to the first element of the sequence, and the second points one position past the end of the array (right after the last element). This may seem strange at first, but it is an old C idiom that comes in quite handy. For example, the difference of these two pointers yields the number of elements in the sequence. More important, in implementing copy( ), the second pointer can act as a sentinel to stop the iteration through the sequence. The third argument refers to the beginning of the output sequence, which is the array b in this example. It is assumed that the array that b represents has enough space to receive the copied elements.
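The two properties of the one-past-the-end idiom mentioned above can be sketched with a pair of small helper functions. The names rangeLength and sumRange are hypothetical and are not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>

// The difference of the two pointers is the element count
std::size_t rangeLength(const int* begin, const int* end) {
  return static_cast<std::size_t>(end - begin);
}

// The end pointer is a sentinel: it is compared against
// but never dereferenced
int sumRange(const int* begin, const int* end) {
  int total = 0;
  while (begin != end)
    total += *begin++;
  return total;
}
```

Given int a[] = {10, 20, 30}, the call rangeLength(a, a + 3) yields 3 and sumRange(a, a + 3) yields 60. An empty range is simply one in which both pointers are equal, so sumRange(a, a) yields 0 without any special-case code.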

The copy( ) algorithm wouldn’t be very exciting if it could only process integers. It can in fact copy any sequence. The following example copies string objects.

//: C06:CopyStrings.cpp

// Copies strings

#include <algorithm>

#include <cassert>

#include <cstddef>

#include <string>

using namespace std;

 

int main() {

  string a[] = {"read", "my", "lips"};

  const size_t SIZE = sizeof a / sizeof a[0];

  string b[SIZE];

  copy(a, a + SIZE, b);

  assert(equal(a, a + SIZE, b));

} ///:~

 

This example introduces another algorithm, equal( ), which returns true only if each element in the first sequence is equal (using its operator==( )) to the corresponding element in the second sequence. This example traverses each sequence twice, once for the copy, and once for the comparison, without a single explicit loop!

Generic algorithms achieve this flexibility because they are function templates, of course. If you guessed that the implementation of copy( ) looked something like the following, you’d be “almost” right.

template<typename T>

void copy(T* begin, T* end, T* dest) {

  while (begin != end)

    *dest++ = *begin++;

}

 

We say “almost,” because copy( ) can actually process sequences delimited by anything that acts like a pointer, such as an iterator. In this way, copy( ) can duplicate a vector, as in the following example.

//: C06:CopyVector.cpp

// Copies the contents of a vector

#include <algorithm>

#include <cassert>

#include <cstddef>

#include <vector>

using namespace std;

 

int main() {

  int a[] = {10, 20, 30};

  const size_t SIZE = sizeof a / sizeof a[0];

  vector<int> v1(a, a + SIZE);

  vector<int> v2(SIZE);

  copy(v1.begin(), v1.end(), v2.begin());

  assert(equal(v1.begin(), v1.end(), v2.begin()));

} ///:~

 

The first vector, v1, is initialized from the sequence of integers in the array a. The definition of the vector v2 uses a different vector constructor that makes room for SIZE elements, initialized to zero (the default value for integers).

As with the array example earlier, it’s important that v2 have enough space to receive a copy of the contents of v1. For convenience, a special library function, back_inserter( ), returns a special type of iterator that inserts elements instead of overwriting them, so memory is expanded automatically by the container as needed. The following example uses back_inserter( ), so it doesn’t have to expand the size of the output vector, v2, ahead of time.

//: C06:InsertVector.cpp

// Appends the contents of a vector to another

#include <algorithm>

#include <cassert>

#include <cstddef>

#include <iterator>

#include <vector>

using namespace std;

 

int main() {

  int a[] = {10, 20, 30};

  const size_t SIZE = sizeof a / sizeof a[0];

  vector<int> v1(a, a + SIZE);

  vector<int> v2;  // v2 is empty here

  copy(v1.begin(), v1.end(), back_inserter(v2));

  assert(equal(v1.begin(), v1.end(), v2.begin()));

} ///:~

 

The back_inserter( ) function is defined in the <iterator> header. We’ll explain how insert iterators work in depth in the next chapter.

Since iterators are identical to pointers in all essential ways, you can write the algorithms in the standard library in such a way as to allow both pointer and iterator arguments. For this reason, the implementation of copy( ) looks more like the following code.

template<typename Iterator>

void copy(Iterator begin, Iterator end, Iterator dest) {

  while (begin != end)

    *dest++ = *begin++;

}

 

Whichever argument type you use in the call, copy( ) assumes it properly implements the indirection and increment operators. If it doesn’t, you’ll get a compile-time error.
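
One detail the simplified template glosses over is that the source and the destination need not be the same kind of iterator: InsertVector.cpp, for instance, reads through vector iterators but writes through a back_inserter( ). The following sketch is not the library’s implementation. The name my_copy is hypothetical, and the separate template parameters and the returned destination are assumptions modeled on how the standard copy( ) behaves.

```cpp
#include <cassert>
#include <iterator>  // For std::back_inserter
#include <list>
#include <vector>

// A sketch only: separate types for the input range and the output.
template<typename InIter, typename OutIter>
OutIter my_copy(InIter begin, InIter end, OutIter dest) {
  while (begin != end)
    *dest++ = *begin++;  // Needs *, ++, and != on the arguments
  return dest;           // One past the last element written
}

// Copies an array into a list, then the list into an empty vector.
std::vector<int> array_to_vector_via_list(const int* first,
                                          const int* last) {
  std::list<int> middle(last - first);   // Pre-sized destination
  my_copy(first, last, middle.begin());  // Pointer -> list iterator
  std::vector<int> result;               // Empty; grows on demand
  my_copy(middle.begin(), middle.end(),
          std::back_inserter(result));   // List -> insert iterator
  return result;
}
```

Each call instantiates my_copy with a different pair of iterator types, and each compiles only because those types supply the operators used inside the loop.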

Predicates

At times, you might want to copy only a well-defined subset of one sequence to another, such as only those elements that satisfy a certain condition. To achieve this flexibility, many algorithms have alternate calling sequences that allow you to supply a predicate, which is simply a function that returns a Boolean value based on some criterion. Suppose, for example, that you only want to extract from a sequence of integers those numbers that are less than or equal to 15. A version of copy( ) called remove_copy_if( ) can do the job, like this:

//: C06:CopyInts2.cpp

// Ignores ints that satisfy a predicate

#include <algorithm>

#include <cstddef>

#include <iostream>

using namespace std;

// You supply this predicate

bool gt15(int x) {

  return 15 < x;

}

int main() {

  int a[] = {10, 20, 30};

  const size_t SIZE = sizeof a / sizeof a[0];

  int b[SIZE];

  int* endb = remove_copy_if(a, a+SIZE, b, gt15);

  int* beginb = b;

  while (beginb != endb)

    cout << *beginb++ << endl; // Prints 10 only

} ///:~

 

The remove_copy_if( ) function template takes the usual range-delimiting pointers, followed by a predicate of your choosing. The predicate must be a pointer to function[79] that takes a single argument of the same type as the elements in the sequence, and it must return a bool. In this case, the function gt15 returns true if its argument is greater than 15. The remove_copy_if( ) algorithm applies gt15( ) to each element in the input sequence and skips every element for which it returns true when writing to the output sequence.
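
As a further illustration, the sketch below wraps remove_copy_if( ) in a small helper that writes into a vector through back_inserter( ), so the output needs no pre-sized array. The names is_even and keep_odd are ours and exist only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Predicate: true for the elements we want to leave out.
bool is_even(int x) {
  return x % 2 == 0;
}

// Hypothetical helper: copies every element for which is_even
// returns false, leaving the input untouched.
std::vector<int> keep_odd(const std::vector<int>& in) {
  std::vector<int> out;
  std::remove_copy_if(in.begin(), in.end(),
                      std::back_inserter(out), is_even);
  return out;
}
```

Despite the word “remove” in its name, the algorithm never modifies the input sequence; it only decides which elements reach the output.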

The following program illustrates yet another variation of the copy algorithm.

//: C06:CopyStrings2.cpp

// Replaces strings that satisfy a predicate

#include <algorithm>

#include <cstddef>

#include <iostream>

#include <string>

using namespace std;

// The predicate

bool contains_e(const string& s) {

  return s.find('e') != string::npos;

}

int main() {

  string a[] = {"read", "my", "lips"};

  const size_t SIZE = sizeof a / sizeof a[0];

  string b[SIZE];

  string* endb =

    replace_copy_if(a, a + SIZE, b, contains_e,

                    string("kiss"));

  string* beginb = b;

  while (beginb != endb)

    cout << *beginb++ << endl;

} ///:~

 

Instead of just ignoring elements that satisfy the predicate, replace_copy_if( ) substitutes a fixed value for such elements when populating the output sequence. The output in this case is

kiss

my

lips

 

because the original occurrence of “read”, the only input string containing the letter e, is replaced by the word “kiss”, as specified in the last argument in the call to replace_copy_if( ).

The replace_if( ) algorithm changes the original sequence in place, instead of writing to a separate output sequence, as the following program shows.

//: C06:ReplaceStrings.cpp

// Replaces strings in-place

#include <algorithm>

#include <cstddef>

#include <iostream>

#include <string>

using namespace std;

bool contains_e(const string& s) {

  return s.find('e') != string::npos;

}

int main() {

  string a[] = {"read", "my", "lips"};

  const size_t SIZE = sizeof a / sizeof a[0];

  replace_if(a, a + SIZE, contains_e, string("kiss"));

  string* p = a;

  while (p != a + SIZE)

    cout << *p++ << endl;

} ///:~

 

Stream iterators

Like any good software library, the Standard C++ Library attempts to provide convenient ways to automate common tasks. We mentioned in the beginning of this chapter that you can use generic algorithms in place of looping constructs. So far, however, our examples have still used an explicit loop to print their output. Since printing output is one of the most common tasks, you would hope for a way to automate that too.

That’s where stream iterators come in. A stream iterator allows you to use a stream as either an input or an output sequence. To eliminate the output loop in the CopyInts2.cpp program, for instance, you can do something like the following.

//: C06:CopyInts3.cpp

// Uses an output stream iterator

#include <algorithm>

#include <cstddef>

#include <iostream>

#include <iterator>

using namespace std;

bool gt15(int x) {

  return 15 < x;

}

int main() {

  int a[] = {10, 20, 30};

  const size_t SIZE = sizeof a / sizeof a[0];

  remove_copy_if(a, a + SIZE,

                 ostream_iterator<int>(cout, "\n"), gt15);

} ///:~

 

In this example we’ve replaced the output sequence b in the third argument to remove_copy_if( ) with an output stream iterator, which is an instance of the ostream_iterator class template declared in the <iterator> header. Output stream iterators overload their copy-assignment operators to write to their stream. This particular instance of ostream_iterator is attached to the output stream cout. Every time remove_copy_if( ) assigns an integer from the sequence a to cout through this iterator, the iterator writes the integer to cout and also automatically writes an instance of the separator string found in its second argument, which in this case contains just the newline character.
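
Because an ostream_iterator works with any output stream, the separator behavior is easy to observe by writing to a string stream instead of cout. The following is a minimal sketch; the helper name join_ints is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: formats [first, last) into a string, writing
// the separator after every element (including the last one).
std::string join_ints(const int* first, const int* last,
                      const char* sep) {
  std::ostringstream out;
  std::copy(first, last, std::ostream_iterator<int>(out, sep));
  return out.str();
}
```

Calling join_ints(a, a + 3, ", ") on the array {10, 20, 30} produces "10, 20, 30, ". The trailing separator appears because the iterator writes it after each assignment, not between elements.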

It is just as easy to write to a file instead of to cout, of course. All you have to do is provide an output file stream instead of cout:

//: C06:CopyIntsToFile.cpp

// Uses an output file stream iterator

#include <algorithm>

#include <cstddef>

#include <fstream>

#include <iterator>

using namespace std;

bool gt15(int x) {

  return 15 < x;

}

int main() {

  int a[] = {10, 20, 30};

  const size_t SIZE = sizeof a / sizeof a[0];

  ofstream outf("ints.out");

  remove_copy_if(a, a + SIZE,

                 ostream_iterator<int>(outf, "\n"), gt15);

} ///:~

 

An input stream iterator allows an algorithm to get its input sequence from an input stream. This is accomplished by having both the constructor and operator++( ) read the next element from the underlying stream and by overloading operator*( ) to yield the value previously read. Since algorithms require two pointers to delimit an input sequence, you can construct an istream_iterator in two ways, as you can see in the program that follows.

//: C06:CopyIntsFromFile.cpp

// Uses an input stream iterator

#include <algorithm>

#include <fstream>

#include <iostream>

#include <iterator>

#include "../require.h"

using namespace std;

bool gt15(int x) {

  return 15 < x;

}

int main() {

  ifstream inf("someInts.dat");

  assure(inf, "someInts.dat");

  remove_copy_if(istream_iterator<int>(inf),

                 istream_iterator<int>(),

                 ostream_iterator<int>(cout, "\n"), gt15);

} ///:~

 

The first argument to remove_copy_if( ) in this program attaches an istream_iterator object to the input file stream containing ints. The second argument uses the default constructor of the istream_iterator class. This call constructs a special value of istream_iterator that indicates end-of-file, so that when the first iterator finally encounters the end of the physical file, it compares equal to the value istream_iterator<int>( ), allowing the algorithm to terminate correctly. Note that this example avoids using an explicit array altogether.
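
The same pair of iterators works with any input stream, which makes the end-of-stream behavior easy to try without a data file. The sketch below reads from a string stream; the helper name read_ints is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts whitespace-separated ints from text.
// Reading stops at end of input or at the first token that is not
// an int, because either condition puts the stream in a failed state.
std::vector<int> read_ints(const std::string& text) {
  std::istringstream in(text);
  std::vector<int> result;
  std::copy(std::istream_iterator<int>(in),
            std::istream_iterator<int>(),  // End-of-stream marker
            std::back_inserter(result));
  return result;
}
```

Note that the iterator cannot tell a clean end of input from a malformed token: for the text "7 x 9" only the 7 is read, since the failed extraction of x makes the first iterator compare equal to the end marker.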

Algorithm complexity

Using a software library is a matter of trust. You trust the implementers not only to provide correct functionality but also to make the functions execute as efficiently as possible. It’s better to write your own loops than to use algorithms that degrade performance.

To guarantee quality library implementations, the C++ standard not only specifies what an algorithm should do, but how fast it should do it and sometimes how much space it should use. Any algorithm that does not meet the performance requirements does not conform to the standard. The measure of an algorithm’s operational efficiency is called its complexity.

When possible, the standard specifies the exact number of operations an algorithm should use. The count_if( ) algorithm, for example, returns the number of elements in a sequence satisfying a given predicate. The following call to count_if( ), if applied to a sequence of integers similar to the examples earlier in this chapter, yields the number of integer elements that are greater than 15:

size_t n = count_if(a, a + SIZE, gt15);

 

Since count_if( ) must look at every element exactly once, it is specified to apply its predicate a number of times exactly equal to the number of elements in the sequence. Naturally, the copy( ) algorithm has an analogous specification: it performs exactly one assignment per element.

Other algorithms can be specified to take at most a certain number of operations. The find( ) algorithm searches through a sequence in order until it encounters an element equal to its third argument:

int* p = find(a, a + SIZE, 20);

 

It stops as soon as the element is found and returns a pointer to that first occurrence. If it doesn’t find one, it returns a pointer one position past the end of the sequence (a+SIZE in this example). Therefore, find is said to make at most a number of comparisons equal to the number of elements in the sequence.
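
You can observe the difference between “exactly” and “at most” by counting how often an algorithm invokes its predicate. The sketch below is an illustration under our own assumptions: the class CountingGt15 and the two helpers are hypothetical, and it uses find_if( ), the predicate form of find( ), because find( ) itself compares with operator== rather than calling a function you supply.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// A predicate that records each call in a counter owned by the caller.
// It holds a pointer because algorithms copy their function objects.
class CountingGt15 {
  std::size_t* calls;
public:
  explicit CountingGt15(std::size_t* counter) : calls(counter) {}
  bool operator()(int x) const {
    ++*calls;
    return 15 < x;
  }
};

// Counts elements greater than 15 and tallies predicate calls.
std::ptrdiff_t count_gt15(const int* first, const int* last,
                          std::size_t* calls) {
  return std::count_if(first, last, CountingGt15(calls));
}

// Finds the first element greater than 15 and tallies predicate calls.
const int* find_gt15(const int* first, const int* last,
                     std::size_t* calls) {
  return std::find_if(first, last, CountingGt15(calls));
}
```

For the array {10, 20, 30}, count_gt15( ) leaves the tally at three, one call per element. The find_gt15( ) call returns a pointer to the 20, and the tally can be no larger than three; an implementation that stops at the first match needs only two calls.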

Sometimes the number of operations an algorithm takes cannot be measured with such precision. In such cases, the standard specifies the algorithm’s asymptotic complexity, which is a measure of how the algorithm behaves with large sequences compared to well-known formulas. A good example is the sort( ) algorithm, which the standard says takes “approximately n log n comparisons on average” (n is the number of elements in the sequence).[80] Such complexity measures give a “feel” for the cost of an algorithm and at least give a meaningful basis for comparing algorithms. As you’ll see in the next chapter, the find( ) member function for the set container has logarithmic complexity, which means that the cost of searching for an element in a set will, for large sets, be proportional to the logarithm of the number of elements. This is much smaller than the number of elements for large n, so it is always better to search a set by using its find( ) member function rather than by using the generic find( ) algorithm.
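
To get a feel for logarithmic cost, the following sketch counts the comparisons a set makes during a lookup. Everything here is illustrative: CountingLess and member_find are hypothetical names, and the exact tally depends on how a particular library implements its set, so treat the numbers as an order of magnitude rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Ordering that counts its own invocations through a caller-owned
// counter, so that copies made by the container share one tally.
class CountingLess {
  std::size_t* calls;
public:
  explicit CountingLess(std::size_t* counter) : calls(counter) {}
  bool operator()(int a, int b) const {
    ++*calls;
    return a < b;
  }
};

// Builds a set holding 0..n-1, looks up 'target' with the member
// find( ), and reports the number of comparisons through 'cost'.
bool member_find(int n, int target, std::size_t* cost) {
  std::size_t calls = 0;
  CountingLess cmp(&calls);
  std::set<int, CountingLess> s(cmp);
  for (int i = 0; i < n; ++i)
    s.insert(i);
  calls = 0;  // Ignore the cost of the inserts
  bool found = s.find(target) != s.end();
  *cost = calls;
  return found;
}
```

With 1024 elements, a lookup through the member function should need only a few dozen comparisons at most, whereas the generic find( ) algorithm would compare against as many as 1024 elements to reach the last one.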

Function objects

As you study some of the examples earlier in this chapter, you will probably notice the limited utility of the function gt15( ). What if you want to use a number other than 15 as a comparison threshold? You may need a gt20( ) or gt25( ) or others as well. Having to write a separate function for each such comparison has two distasteful difficulties:

1. You may have to write a lot of functions!

2. You must know all required values when you write your application code.

The second limitation means that you can’t use runtime values[81] to govern your searches, which is downright unacceptable. Overcoming this difficulty requires a way to pass information to predicates at runtime. For example, you would need a greater-than function that you can initialize with an arbitrary comparison value. Unfortunately, you can’t pass that value as a function parameter, because unary predicates, such as our gt15( ), are applied to each value in a sequence individually and must therefore take only one parameter.

The way out of this dilemma is, as always, to create an abstraction. In this case, we need an abstraction that can act like a function as well as store state, without disturbing the number of function parameters it accepts when used. This abstraction is called a function object.[82]

A function object is an instance of a class that overloads operator( ), the function call operator. This operator allows an object to be used with function call syntax. As with any other object, you can initialize it via its constructors. Here is a function object that can be used in place of gt15( ):

//: C06:GreaterThanN.cpp

#include <iostream>

using namespace std;

class gt_n {

  int value;

public:

  gt_n(int val) : value(val) {}

  bool operator()(int n) const {

    return n > value;

  }

};

int main() {

  gt_n f(4);

  cout << f(3) << endl;  // Prints 0 (for false)

  cout << f(5) << endl;  // Prints 1 (for true)

} ///:~

 

The fixed value to compare against (4) is passed when the function object f is created. The expression f(3) is then evaluated by the compiler as the following function call:

f.operator()(3);

 

which returns the value of the expression 3 > value, which is false when value is 4, as it is in this example.
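
The payoff of storing the threshold in the object is that it can come from anywhere at runtime. Here is a minimal sketch under our own assumptions: gt_n is repeated so the block stands alone (with operator( ) declared const, which is our choice here), and count_greater is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Same idea as gt_n above, repeated so this sketch stands alone.
class gt_n {
  int value;
public:
  explicit gt_n(int val) : value(val) {}
  bool operator()(int n) const { return n > value; }
};

// Hypothetical helper: the threshold is an ordinary runtime value.
std::size_t count_greater(const std::vector<int>& v, int threshold) {
  return static_cast<std::size_t>(
    std::count_if(v.begin(), v.end(), gt_n(threshold)));
}
```

A single class now covers what would otherwise take gt15( ), gt20( ), gt25( ), and so on, and the threshold could just as well be read from user input.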

Since such comparisons apply to types other than int, it would make sense to define gt_n as a class template. It turns out you don’t have to do it yourself, though; the standard library has already done it for you. The following descriptions of function objects should not only make that topic clear, but also give you a better understanding of how the generic algorithms work.

Classification of function objects

The standard C++ library classifies function objects based on the number of arguments that their operator( ) takes and the kind of value it returns. This classification is organized according to whether a function object’s operator( ) takes zero, one, or two arguments, as the following definitions illustrate.

Generator: A type of function object that takes no arguments and returns a value of an arbitrary type. A random number generator is an example of a generator. The standard library provides one generator, the function rand( ) declared in <cstdlib>, and has some algorithms, such as generate_n( ), which apply generators to a sequence.

Unary Function: A type of function object that takes a single argument of any type and returns a value that may be of a different type (which may be void).

Binary Function: A type of function object that takes two arguments of any two (possibly distinct) types and returns a value of any type (including void).

Unary Predicate: A Unary Function that returns a bool.

Binary Predicate: A Binary Function that returns a bool.

Strict Weak Ordering: A binary predicate that allows for a more general interpretation of “equality.” Some of the standard containers consider two elements equivalent if neither is less than the other (using operator<( )). This is important when comparing floating-point values, and objects of other types where operator==( ) is unreliable or unavailable. This notion also applies if you want to sort a sequence of data records (structs) on a subset of the struct’s fields. That comparison scheme is considered a strict weak ordering because two records with equal keys are not really “equal” as total objects, but they are equal as far as the comparison you’re using is concerned. The importance of this concept will become clearer in the next chapter.
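
The record-sorting case in the last definition can be sketched as follows. The Record type, the ById ordering, and the helper functions are all hypothetical, invented here to show what “equivalent but not equal” means in code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type: 'id' is the sort key, 'name' is payload.
struct Record {
  int id;
  std::string name;
  Record(int i, const std::string& n) : id(i), name(n) {}
};

// Strict weak ordering that looks at the key field only.
struct ById {
  bool operator()(const Record& a, const Record& b) const {
    return a.id < b.id;
  }
};

// Two records are equivalent under ById when neither orders
// before the other, even if their names differ.
bool equivalent(const Record& a, const Record& b) {
  ById before;
  return !before(a, b) && !before(b, a);
}

// Sorts by key; records with equal keys may land in either order.
void sort_by_id(std::vector<Record>& v) {
  std::sort(v.begin(), v.end(), ById());
}
```

Two records with id 1 but different names are equivalent under ById even though they are plainly different objects, and sort( ) is free to place them in either relative order.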

In addition, certain algorithms make assumptions about the operations available for the types of objects they process. We will use the following terms to indicate these assumptions:

LessThanComparable: A class that has a less-than operator<.

Assignable: A class that has a copy-assignment operator= for its own type.

EqualityComparable: A class that has an equivalence operator== for its own type.

We will use these terms later in this chapter to describe the generic algorithms in the standard library.

Automatic creation of function objects

The <functional> header defines a number of useful generic function objects. They are admittedly simple, but you can use them to compose more complicated function objects. Consequently, in many instances, you can construct complicated predicates without writing a single function yourself! You do so by using function object adapters to take the simple function objects and adapt them for use with other function objects in a chain of operations.

To illustrate, let’s use only standard function objects to accomplish what gt15( ) did earlier. The standard function object, greater, is a binary function object that returns true if its first argument is greater than its second argument. We cannot apply this directly to a sequence of integers through an algorithm such as remove_copy_if( ), because remove_copy_if( ) expects a unary predicate. No problem. We can construct a unary predicate on the fly that uses greater to compare its first argument to a fixed value. We fix the value of the second parameter that greater will use to be 15 with the function object adapter bind2nd, like this:

//: C06:CopyInts4.cpp

// Uses a standard function object and adapter

#include <algorithm>

#include <cstddef>

#include <functional>

#include <iostream>

#include <iterator>

using namespace std;

int main() {

  int a[] = {10, 20, 30};

  const size_t SIZE = sizeof a / sizeof a[0];

  remove_copy_if(a, a + SIZE,

                 ostream_iterator<int>(cout, "\n"),

                 bind2nd(greater<int>(), 15));

} ///:~

 

This program accomplishes the same thing as CopyInts3.cpp, but without our having to write our own predicate function gt15( ). The function object adapter bind2nd( ) is a template function that creates a function object of type binder2nd, which simply stores the two arguments passed to bind2nd( ), the first of which must be a binary function or function object (that is, anything that can be called with two arguments). The operator( ) function in binder2nd, which is itself a unary function, calls the binary function it stored, passing it its incoming parameter and the fixed value it stored.

To make the explanation concrete for this example, let’s call the instance of binder2nd created by bind2nd( ) by the name b. When b is created, it receives two parameters (greater<int>( ) and 15) and stores them. Let’s call the instance of greater<int> by the name g. For convenience, let’s also call the instance of the output stream iterator by the name o. Then the call to remove_copy_if( ) earlier becomes the following:

remove_copy_if(a, a + SIZE, o, b);  // b holds copies of g and 15

 

As remove_copy_if( ) iterates through the sequence, it calls b on each element, to determine whether to ignore the element when copying to the destination. If we denote the current element by the name e, that call inside remove_copy_if( ) is equivalent to

if (b(e))

 

but binder2nd’s function call operator just turns around and calls g(e,15), so the earlier call is the same as

if (greater<int>()(e, 15))

 

which is the comparison we were seeking. There is also a bind1st( ) adapter that creates a binder1st object, which fixes the first argument of the associated input binary function.
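
To see that there is no magic in such an adapter, here is a stripped-down binder written from scratch. It is a sketch under stated assumptions, not the library’s binder2nd: the names SimpleBinder2nd, simple_bind2nd, and count_above are hypothetical, and the call operator returns bool because this version is meant for predicates only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>  // For std::greater

// A sketch, not the library's binder2nd: stores a binary function
// object and the value to use as its second argument.
template<typename BinaryOp, typename Value>
class SimpleBinder2nd {
  BinaryOp op;
  Value second;
public:
  SimpleBinder2nd(const BinaryOp& f, const Value& v)
    : op(f), second(v) {}
  // Unary call: forwards x as the first argument of the stored op.
  template<typename Arg>
  bool operator()(const Arg& x) const {
    return op(x, second);
  }
};

// Helper so the template arguments are deduced, as with bind2nd( ).
template<typename BinaryOp, typename Value>
SimpleBinder2nd<BinaryOp, Value>
simple_bind2nd(const BinaryOp& f, const Value& v) {
  return SimpleBinder2nd<BinaryOp, Value>(f, v);
}

// Counts the elements of [first, last) that exceed 'limit'.
std::size_t count_above(const int* first, const int* last, int limit) {
  return static_cast<std::size_t>(std::count_if(
    first, last, simple_bind2nd(std::greater<int>(), limit)));
}
```

The helper function exists for the same reason bind2nd( ) does: a function template can deduce BinaryOp and Value from its arguments, so the caller never has to spell out the binder’s type.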

As another example, let’s count the number of elements in the sequence not equal to 20. This time we’ll use the algorithm count_if( ), introduced earlier. There is a standard binary function object, equal_to, and also a function object adapter, not1( ), that takes a unary function object as a parameter and inverts its truth value. The following program will do the job.

//: C06:CountNotEqual.cpp

// Count elements not equal to 20

#include <algorithm>

#include <cstddef>

#include <functional>

#include <iostream>

using namespace std;

int main() {

  int a[] = {10, 20, 30};

  const size_t SIZE = sizeof a / sizeof a[0];

  cout << count_if(a, a + SIZE,

                   not1(bind1st(equal_to<int>(), 20)));// 2

} ///:~

 

As remove_copy_if( ) did in the previous example, count_if( ) calls the predicate in its third argument (let’s call it n) for each element of its sequence and increments its internal counter each time true is returned. If, as before, we call the current element of the sequence by the name e, the statement

if (n(e))

 

in the implementation of count_if is interpreted as

if (!bind1st(equal_to<int>(), 20)(e))

 

which of course ends up as

if (!equal_to<int>()(20, e))

 

because not1( ) returns the logical negation of the result of calling its unary function argument. The first argument to equal_to is 20 in this case because we used bind1st( ) instead of bind2nd( ). Since testing for equality is symmetric in its arguments, we could have used either bind1st( ) or bind2nd( ) in this example.

The following table shows the templates that generate the standard function objects, along with the kinds of expressions to which they apply.

Name            Type               Result produced
plus            BinaryFunction     arg1 + arg2
minus           BinaryFunction     arg1 - arg2
multiplies      BinaryFunction     arg1 * arg2
divides         BinaryFunction     arg1 / arg2
modulus         BinaryFunction     arg1 % arg2
negate          UnaryFunction      - arg1
equal_to        BinaryPredicate    arg1 == arg2
not_equal_to    BinaryPredicate    arg1 != arg2
greater         BinaryPredicate    arg1 > arg2
less            BinaryPredicate    arg1 < arg2
greater_equal   BinaryPredicate    arg1 >= arg2
less_equal      BinaryPredicate    arg1 <= arg2
logical_and     BinaryPredicate    arg1 && arg2
logical_or      BinaryPredicate    arg1 || arg2
logical_not     UnaryPredicate     !arg1
unary_negate    Unary Logical      !(UnaryPredicate(arg1))
binary_negate   Binary Logical     !(BinaryPredicate(arg1, arg2))

Adaptable function objects

Standard function adapters such as bind1st( ) and bind2nd( ) make some assumptions about the function objects they process. To illustrate, consider the following expression from the last line of the earlier CountNotEqual.cpp program:

not1(bind1st(equal_to<int>(), 20))

 

The bind1st( ) adapter creates a unary function object of type binder1st, which simply stores an instance of equal_to<int> and the value 20. The binder1st::operator( ) function needs to know its argument type and its return type; otherwise, it will not have a valid declaration. The convention to solve this problem is to expect all function objects to provide nested type definitions for these types. For unary functions, the type names are argument_type and result_type; for binary function objects they are first_argument_type, second_argument_type, and result_type. Looking at the implementation of bind1st( ) and binder1st in the <functional> header reveals these expectations. First inspect bind1st( ), as it might appear in a typical library implementation:

template <class Op, class T>

binder1st<Op>

bind1st(const Op& f, const T& val)

{

  typedef typename Op::first_argument_type Arg1_t;

  return binder1st<Op>(f, Arg1_t(val));

}

 

Note that the template parameter, Op, which represents the type of the binary function being adapted by bind1st( ), must have a nested type named first_argument_type. (Note also the use of typename to inform the compiler that it is a member type name, as explained in Chapter 5.) Now notice how binder1st uses the type names in Op in its declaration of its function call operator:

// Inside the implementation for binder1st<Op>…

typename Op::result_type

operator()(const typename Op::second_argument_type& x)

  const;

 

Function objects whose classes provide these type names are called adaptable function objects.

Since these names are expected of all standard function objects as well as of any function objects you create that you want to use with the function object adapters, the <functional> header provides two templates that define these types for you: unary_function and binary_function. You simply derive from these classes while filling in the argument types as template parameters. Suppose, for example, that we want to make the function object gt_n, defined earlier in this chapter, adaptable. All we need to do is the following:

class gt_n : public unary_function<int, bool> {

  int value;

public:

  gt_n(int val) : value(val) {}

  bool operator()(int n) const {

    return n > value;

  }

};

 

All unary_function does is provide the appropriate type definitions, which it takes from its template parameters, as you can see in its definition:

template <class Arg, class Result>

struct unary_function {

  typedef Arg argument_type;

  typedef Result result_type;

};

 

These types become accessible through gt_n because it derives publicly from unary_function. The binary_function template behaves in a similar manner.
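
The following sketch shows why the nested names matter, using nothing but hand-written code. It makes two assumptions of its own: this variant of gt_n supplies the typedefs directly instead of inheriting them, and SimpleNot is a hypothetical negating adapter, not the library’s unary_negate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Adaptable by hand: supplies the nested type names directly
// instead of inheriting them from unary_function.
class gt_n {
  int value;
public:
  typedef int argument_type;
  typedef bool result_type;
  explicit gt_n(int val) : value(val) {}
  bool operator()(int n) const { return n > value; }
};

// A sketch of a negating adapter (not the library's unary_negate).
// It cannot declare operator( ) without Pred::argument_type.
template<typename Pred>
class SimpleNot {
  Pred pred;
public:
  typedef typename Pred::argument_type argument_type;
  typedef bool result_type;
  explicit SimpleNot(const Pred& p) : pred(p) {}
  bool operator()(const argument_type& x) const {
    return !pred(x);
  }
};

// Helper that deduces Pred, in the style of not1( ).
template<typename Pred>
SimpleNot<Pred> simple_not(const Pred& p) {
  return SimpleNot<Pred>(p);
}

// Counts the elements of [first, last) that are NOT greater than n.
std::size_t count_not_greater(const int* first, const int* last,
                              int n) {
  return static_cast<std::size_t>(
    std::count_if(first, last, simple_not(gt_n(n))));
}
```

If you delete the two typedefs from gt_n, the declaration of SimpleNot<gt_n>::operator( ) no longer compiles. Notice also that SimpleNot publishes the same names itself, so its result can in turn be handed to another adapter.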

More function object examples

The following FunctionObjects.cpp example provides simple tests for most of the built-in basic function object templates. This way, you can see how to use each template, along with their resulting behavior. This example uses one of the following generators for convenience:

//: C06:Generators.h

// Different ways to fill sequences

#ifndef GENERATORS_H

#define GENERATORS_H

#include <set>

#include <cstdlib>

#include <cstring>

#include <ctime>

// Microsoft namespace work-around

#ifndef _MSC_VER

using std::rand;

using std::srand;

using std::time;

#endif

// A generator that can skip over numbers:

class SkipGen {

  int i;

  int skp;

public:

  SkipGen(int start = 0, int skip = 1)

    : i(start), skp(skip) {}

  int operator()() {

    int r = i;

    i += skp;

    return r;

  }

};

 

// Generate unique random numbers in [0, limit):

class URandGen {

  std::set<int> used;

  int limit;

public:

  URandGen(int lim) : limit(lim) {

    srand(time(0));

  }

  int operator()() {

    while(true) {

      int i = int(rand()) % limit;

      if(used.find(i) == used.end()) {

        used.insert(i);

        return i;

      }

    }

  }

};

 

// Produces random characters:

class CharGen {

  static const char* source;

  static const int len;

public:

  CharGen() { srand(time(0)); }

  char operator()() {

    return source[rand() % len];

  }

};

 

// Statics created here for convenience, but

// will cause problems if multiply included:

const char* CharGen::source = "ABCDEFGHIJK"

  "LMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

const int CharGen::len = strlen(source);

#endif // GENERATORS_H ///:~

 

We’ll be using these generating functions in various examples throughout this chapter. The SkipGen function object returns the next number of an arithmetic sequence whose common difference is held in its skp data member. A URandGen object generates a unique random number in a specified range. (It uses a set container, which we’ll discuss in the next chapter.) A CharGen object returns a random alphabetic character. Here is the sample program we promised, which uses URandGen.
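
Before looking at the larger program, here is a minimal sketch of how a generator is applied to a sequence with generate_n( ). So that the block compiles on its own, it uses StepGen, a hypothetical stand-in that behaves like SkipGen; make_sequence is likewise an illustrative helper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for SkipGen so the sketch compiles on its own: yields
// start, start + stride, start + 2 * stride, ...
class StepGen {
  int next;
  int step;
public:
  StepGen(int start, int stride) : next(start), step(stride) {}
  int operator()() {
    int r = next;
    next += step;
    return r;
  }
};

// Hypothetical helper: builds a vector of 'count' generated values.
std::vector<int> make_sequence(int start, int stride, int count) {
  std::vector<int> v(count);
  std::generate_n(v.begin(), count, StepGen(start, stride));
  return v;
}
```

The call make_sequence(3, 5, 4) fills the vector with 3, 8, 13, and 18. Unlike the predicates shown earlier, the call operator here is not const, because producing each value changes the generator’s state.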

//: C06:FunctionObjects.cpp

//{-bor}

// Illustrates selected predefined function object

// templates from the standard C++ library

#include <algorithm>

#include <functional>

#include <iostream>

#include <iterator>

#include <vector>

#include "Generators.h"

using namespace std;

 

template<class Iter>

void print(Iter b, Iter e, const char* msg = "") {

  if(msg != 0 && *msg != 0)

    cout << msg << ":" << endl;

  typedef typename Iter::value_type T;

  copy(b, e, ostream_iterator<T>(cout, " "));

  cout << endl;

}

 

template<typename Contain, typename UnaryFunc>

void testUnary(Contain& source, Contain& dest,

  UnaryFunc f) {

  transform(source.begin(), source.end(),

    dest.begin(), f);

}

 

template<typename Contain1, typename Contain2,

  typename BinaryFunc>

void testBinary(Contain1& src1, Contain1& src2,

  Contain2& dest, BinaryFunc f) {

  transform(src1.begin(), src1.end(),

    src2.begin(), dest.begin(), f);

}

 

// Executes the expression, then stringizes the

// expression into the print statement:

#define T(EXPR) EXPR; print(r.begin(), r.end(), \

  "After " #EXPR);

// For Boolean tests:

#define B(EXPR) EXPR; print(br.begin(), br.end(), \

  "After " #EXPR);

 

// Boolean random generator:

struct BRand {

  BRand() { srand(time(0)); }

  bool operator()() {

    return rand() > RAND_MAX / 2;

  }

};

 

int main() {

  const int sz = 10;

  const int max = 50;

  vector<int> x(sz), y(sz), r(sz);

  // An integer random number generator:

  URandGen urg(max);

  generate_n(x.begin(), sz, urg);

  generate_n(y.begin(), sz, urg);

  // Add one to each to guarantee nonzero divide:

  transform(y.begin(), y.end(), y.begin(),

    bind2nd(plus<int>(), 1));

  // Guarantee one pair of elements is ==:

  x[0] = y[0];

  print(x.begin(), x.end(), "x");

  print(y.begin(), y.end(), "y");

  // Operate on each element pair of x & y,

  // putting the result into r:

  T(testBinary(x, y, r, plus<int>()));

  T(testBinary(x, y, r, minus<int>()));

  T(testBinary(x, y, r, multiplies<int>()));

  T(testBinary(x, y, r, divides<int>()));

  T(testBinary(x, y, r, modulus<int>()));

  T(testUnary(x, r, negate<int>()));

  vector<bool> br(sz); // For Boolean results

  B(testBinary(x, y, br, equal_to<int>()));

  B(testBinary(x, y, br, not_equal_to<int>()));

  B(testBinary(x, y, br, greater<int>()));

  B(testBinary(x, y, br, less<int>()));

  B(testBinary(x, y, br, greater_equal<int>()));

  B(testBinary(x, y, br, less_equal<int>()));

  B(testBinary(x, y, br,

    not2(greater_equal<int>())));

  B(testBinary(x,y,br,not2(less_equal<int>())));

  vector<bool> b1(sz), b2(sz);

  generate_n(b1.begin(), sz, BRand());

  generate_n(b2.begin(), sz, BRand());

  print(b1.begin(), b1.end(), "b1");

  print(b2.begin(), b2.end(), "b2");

  B(testBinary(b1, b2, br, logical_and<int>()));

  B(testBinary(b1, b2, br, logical_or<int>()));

  B(testUnary(b1, br, logical_not<int>()));

  B(testUnary(b1, br, not1(logical_not<int>())));

} ///:~

 

To keep this example short, we used a few handy tricks. The print( ) template is designed to print any sequence, along with an optional message. Since print( ) uses the copy( ) algorithm to send objects to cout via an ostream_iterator, the ostream_iterator must know the type of object it is printing, which we infer from the value_type member of the iterator passed.[83] As you can see in main( ), the compiler deduces the iterator type from the arguments, so you don’t have to specify any template argument explicitly; you just say print(x.begin(), x.end(), "x") to print the vector x.
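
Inferring the element type from Iter::value_type only works for class-type iterators, because a raw pointer has no nested value_type. The following is a minimal sketch of a variation that also accepts pointers, assuming the standard iterator_traits template from <iterator> (which the listing above does not use). The name joinRange( ) is hypothetical, and it writes to a string rather than to cout so that the result is easy to inspect:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: formats [b, e) as "v1 v2 ... " using the same
// copy()/ostream_iterator technique as print().
template<class Iter>
std::string joinRange(Iter b, Iter e) {
  // iterator_traits supplies value_type for raw pointers as well as
  // for class-type iterators.
  typedef typename std::iterator_traits<Iter>::value_type T;
  std::ostringstream os;
  std::copy(b, e, std::ostream_iterator<T>(os, " "));
  return os.str();
}
```

With an array int a[] = {1, 2, 3}, the call joinRange(a, a + 3) should produce the text "1 2 3 ", which you can confirm with assert( ).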

The next two template functions automate the process of testing the various function object templates. There are two since the function objects are either unary or binary. The testUnary( ) function takes a source vector, a destination vector, and a unary function object to apply to the source vector to produce the destination vector. In testBinary( ), two source vectors are fed to a binary function to produce the destination vector. In both cases, the template functions simply turn around and call the transform( ) algorithm, which applies the function or function object passed as its last argument to each sequence element (or to each pair of elements in the binary form), writing the results to the sequence that starts at the iterator passed just before it, which in this case is dest.begin( ).

For each test, you want to see a string describing the test, followed by the results of the test. To automate this, the preprocessor comes in handy; the T( ) and B( ) macros each take the expression you want to execute. After evaluating the expression, they pass the appropriate range to print( ). To produce the message, the expression is “string-ized” using the preprocessor. That way you see the code of the expression that is executed, followed by the result vector.
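
The stringizing technique can be tried in isolation. This is only a sketch: the macro DESCRIBE and the helper describe( ) are hypothetical names, and the result is returned as a string instead of being printed:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds "text = value".
template<typename T>
std::string describe(const char* text, const T& value) {
  assert(text != 0);
  std::ostringstream os;
  os << text << " = " << value;
  return os.str();
}

// #EXPR turns the source text of the argument into a string literal;
// (EXPR) evaluates the same expression in the usual way.
#define DESCRIBE(EXPR) describe(#EXPR, (EXPR))
```

Here DESCRIBE(2 + 3) expands to describe("2 + 3", (2 + 3)), so the source text and its value travel together, which is the effect T( ) and B( ) rely on.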

The last little tool, BRand, is a generator object that creates random bool values. To do this, it gets a random number from rand( ) and tests to see if it’s greater than RAND_MAX / 2. If the random numbers are evenly distributed, this should happen about half the time.

In main( ), three vectors of int are created: x and y for source values, and r for results. To initialize x and y with random values no greater than 50, a generator of type URandGen from Generators.h is used. The standard generate_n( ) algorithm populates the sequence specified in its first argument by invoking its third argument (which must be a generator) a given number of times (specified in its second argument). Since there are operations in which elements of x are divided by elements of y (divides and modulus), we must ensure that there are no zero values of y. This is accomplished by once again using the transform( ) algorithm, taking the source values from y and putting the results back into y. The function object for this is created with the expression:

bind2nd(plus<int>(), 1)

 

This expression uses the plus function object to add 1 to its first argument. As we did earlier in this chapter, we use a binder adapter to make this a unary function so it can be applied to the sequence by a single call to transform( ).
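
To see what a binder has to do, it can help to write a small one by hand. The following is a sketch only, not the library's implementation: BoundSecond, bindSecond( ), and addOne( ) are hypothetical names, and for simplicity the result type is assumed to be the same as the argument type:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the object a binder returns: it stores a
// binary function object and the value to pass as its second argument.
template<typename BinaryOp, typename T>
class BoundSecond {
  BinaryOp op;
  T bound;
public:
  BoundSecond(BinaryOp o, const T& b) : op(o), bound(b) {}
  // Unary call: the stored value fills in the second argument.
  T operator()(const T& x) const { return op(x, bound); }
};

// Helper function so the template arguments are deduced.
template<typename BinaryOp, typename T>
BoundSecond<BinaryOp, T> bindSecond(BinaryOp op, const T& b) {
  return BoundSecond<BinaryOp, T>(op, b);
}

// Adds one to every element, in the same way y is adjusted in main().
std::vector<int> addOne(std::vector<int> v) {
  std::transform(v.begin(), v.end(), v.begin(),
    bindSecond(std::plus<int>(), 1));
  return v;
}
```

For a nonempty vector v, assert(addOne(v)[0] == v[0] + 1) should hold. The bound value is stored when the object is created, which is why a single-argument call is enough for transform( ).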

Another test in the program compares the elements in the two vectors for equality, so it is interesting to guarantee that at least one pair of elements is equal; in this case element zero is chosen.

Once the two vectors are printed, T( ) tests each of the function objects that produces a numeric value, and then B( ) tests each function object that produces a Boolean result. The result is placed into a vector<bool>, and when this vector is printed, it produces a ‘1’ for a true value and a ‘0’ for a false value. Here is the output from an execution of FunctionObjects.cpp:

x:

4 8 18 36 22 6 29 19 25 47

y:

4 14 23 9 11 32 13 15 44 30

After testBinary(x, y, r, plus<int>()):

8 22 41 45 33 38 42 34 69 77

After testBinary(x, y, r, minus<int>()):

0 -6 -5 27 11 -26 16 4 -19 17

After testBinary(x, y, r, multiplies<int>()):

16 112 414 324 242 192 377 285 1100 1410

After testBinary(x, y, r, divides<int>()):

1 0 0 4 2 0 2 1 0 1

After testBinary(x, y, r, modulus<int>()):

0 8 18 0 0 6 3 4 25 17

After testUnary(x, r, negate<int>()):

-4 -8 -18 -36 -22 -6 -29 -19 -25 -47

After testBinary(x, y, br, equal_to<int>()):

1 0 0 0 0 0 0 0 0 0

After testBinary(x, y, br, not_equal_to<int>()):

0 1 1 1 1 1 1 1 1 1

After testBinary(x, y, br, greater<int>()):

0 0 0 1 1 0 1 1 0 1

After testBinary(x, y, br, less<int>()):

0 1 1 0 0 1 0 0 1 0

After testBinary(x, y, br, greater_equal<int>()):

1 0 0 1 1 0 1 1 0 1

After testBinary(x, y, br, less_equal<int>()):

1 1 1 0 0 1 0 0 1 0

After testBinary(x, y, br, not2(greater_equal<int>())):

0 1 1 0 0 1 0 0 1 0

After testBinary(x,y,br,not2(less_equal<int>())):

0 0 0 1 1 0 1 1 0 1

b1:

0 1 1 0 0 0 1 0 1 1

b2:

0 1 1 0 0 0 1 0 1 1

After testBinary(b1, b2, br, logical_and<int>()):

0 1 1 0 0 0 1 0 1 1

After testBinary(b1, b2, br, logical_or<int>()):

0 1 1 0 0 0 1 0 1 1

After testUnary(b1, br, logical_not<int>()):

1 0 0 1 1 1 0 1 0 0

After testUnary(b1, br, not1(logical_not<int>())):

0 1 1 0 0 0 1 0 1 1

 

A binder doesn’t have to produce a unary predicate; it can also create any unary function (that is, a function that returns something other than bool). For example, suppose you’d like to multiply every element in a vector by 10. Using a binder with the transform( ) algorithm does the trick:

//: C06:FBinder.cpp

// Binders aren't limited to producing predicates

#include <algorithm>

#include <functional>

#include <iostream>

#include <iterator>

#include <vector>

#include "Generators.h"

using namespace std;

 

int main() {

  ostream_iterator<int> out(cout," ");

  vector<int> v(15);

  generate(v.begin(), v.end(), URandGen(20));

  copy(v.begin(), v.end(), out);

  transform(v.begin(), v.end(), v.begin(),

            bind2nd(multiplies<int>(), 10));

  copy(v.begin(), v.end(), out);

} ///:~

 

Since the third argument to transform( ) is the same as the first, the resulting elements are copied back into the source vector. The function object created by bind2nd( ) in this case produces an int result.

The “bound” argument to a binder cannot be a function object, but it does not have to be a compile-time constant. For example:

//: C06:BinderValue.cpp

// The bound argument can vary

#include <algorithm>

#include <cstdlib>

#include <functional>

#include <iostream>

#include <iterator>

using namespace std;

 

int boundedRand() { return rand() % 100; }

 

int main() {

  const int sz = 20;

  int a[sz], b[sz] = {0};

  generate(a, a + sz, boundedRand);

  int val = boundedRand();

  int* end = remove_copy_if(a, a + sz, b,

                            bind2nd(greater<int>(), val));

  // Sort for easier viewing:

  sort(a, a + sz);

  sort(b, end);

  ostream_iterator<int> out(cout, " ");

  cout << "Original Sequence:\n";

  copy(a, a + sz, out); cout << endl;

  cout << "Values <= " << val << endl;

  copy(b, end, out); cout << endl;

} ///:~

 

 

Here, an array is filled with 20 random numbers between 0 and 99, and a further random value in the same range is chosen as the cutoff. In the remove_copy_if( ) call, you can see that the bound argument to bind2nd( ) is this run-time value rather than a constant. The output of a sample execution follows.

Original Sequence:

4 12 15 17 19 21 26 30 47 48 56 58 60 63 71 79 82 90 92 95

Values <= 41

4 12 15 17 19 21 26 30

 

Function pointer adapters

Wherever a function-like entity is expected by an algorithm, you can supply either a pointer to an ordinary function or a function object. When the algorithm issues a call through a function pointer, the native function-call mechanism is used. If the call is through a function object, that object’s operator( ) member executes. You saw earlier, for example, that we passed a raw function, gt15( ), as a predicate to remove_copy_if( ) in the program CopyInts2.cpp. We also passed pointers to functions returning random numbers to generate( ) and generate_n( ).

You cannot, however, use raw functions with function object adapters, such as bind2nd( ), because they assume the existence of type definitions for the argument and result types. Instead of manually converting your native functions into function objects yourself, the standard library provides a family of adapters to do the work for you. The ptr_fun( ) adapters take a pointer to a function and turn it into a function object. They are not designed for a function that takes no arguments; they must only be used with unary functions or binary functions.

The following program uses ptr_fun( ) to wrap a unary function.

//: C06:PtrFun1.cpp

// Using ptr_fun() with a unary function

#include <algorithm>

#include <cmath>

#include <functional>

#include <iostream>

#include <iterator>

#include <vector>

using namespace std;

 

int d[] = {123, 94, 10, 314, 315};

const int dsz = sizeof d / sizeof *d;

bool isEven(int x) {

  return x % 2 == 0;

}

int main() {

  vector<bool> vb;

  transform(d, d + dsz, back_inserter(vb),

    not1(ptr_fun(isEven)));

  copy(vb.begin(), vb.end(),

    ostream_iterator<bool>(cout, " "));

  cout << endl;

  // Output: 1 0 0 0 1

} ///:~

 

We can’t simply pass isEven to not1, because not1 needs to know the actual argument type and return type its argument uses. The ptr_fun( ) adapter deduces those types through template argument deduction. The definition of the unary version of ptr_fun( ) looks something like this:

template <class Arg, class Result>

pointer_to_unary_function<Arg, Result>

ptr_fun(Result (*fptr)(Arg))

{

  return pointer_to_unary_function<Arg, Result>(fptr);

}

 

As you can see, this version of ptr_fun( ) deduces the argument and result types from fptr and uses them to initialize a pointer_to_unary_function object that stores fptr. The function call operator for pointer_to_unary_function just calls fptr, as you can see by the last line of its code:

template <class Arg, class Result>

class pointer_to_unary_function

: public unary_function<Arg, Result> {

  Result (*fptr)(Arg); // stores the f-ptr

public:

  pointer_to_unary_function(Result (*x)(Arg))

    : fptr(x) {}

  Result operator()(Arg x) const {return fptr(x);}

};

 

Since pointer_to_unary_function derives from unary_function, the appropriate type definitions come along for the ride and are available to not1.

There is also a binary version of ptr_fun( ), which returns a pointer_to_binary_function object (which derives from binary_function, of course) that behaves analogously to the unary case. The following program uses the binary version of ptr_fun( ) to raise numbers in a sequence to a power. It also reveals a “gotcha” when passing overloaded functions to ptr_fun( ).

//: C06:PtrFun2.cpp

// Using ptr_fun() for a binary function

#include <algorithm>

#include <cmath>

#include <functional>

#include <iostream>

#include <iterator>

#include <vector>

using namespace std;

 

double d[] = { 01.23, 91.370, 56.661,

  023.230, 19.959, 1.0, 3.14159 };

const int dsz = sizeof d / sizeof *d;

 

int main() {

  vector<double> vd;

  transform(d, d + dsz, back_inserter(vd),

    bind2nd(ptr_fun<double, double, double>(pow), 2.0));

  copy(vd.begin(), vd.end(),

    ostream_iterator<double>(cout, " "));

  cout << endl;

} ///:~

 

The pow( ) function is overloaded in the standard C++ header <cmath> for each of the floating-point data types, as follows:

float pow(float, int);  // efficient int power versions…

double pow(double, int);

long double pow(long double, int);

float pow(float, float);

double pow(double, double);

long double pow(long double, long double);

 

Since there are multiple versions of pow( ), the compiler has no way of knowing which to choose. In this case, we have to help the compiler by explicitly specifying the template arguments to ptr_fun( ), as explained in the previous chapter.
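
Another way to select one overload, not used in PtrFun2.cpp, is to give the compiler a function pointer of the exact type you want; the pointer's type then determines which pow( ) is meant. The following sketch assumes, as PtrFun2.cpp already does, that your library lets you take the address of pow( ). The names BinaryMath and raiseAll( ) are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical alias for the signature of the one overload we want.
typedef double (*BinaryMath)(double, double);

// Raises each element of v to the given power.
std::vector<double> raiseAll(const std::vector<double>& v,
  double exponent) {
  // The target type of the initialization picks pow(double, double):
  BinaryMath f = std::pow;
  std::vector<double> result;
  for(std::size_t i = 0; i < v.size(); i++)
    result.push_back(f(v[i], exponent));
  return result;
}
```

Squaring a vector that holds 2.0 and 3.0 this way should give values within rounding error of 4.0 and 9.0, which you can check with assert( ) and fabs( ).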

An even trickier problem is that of converting a member function into a function object suitable for use with the generic algorithms. As a simple example, suppose we have the classical “shape” problem and want to apply the draw( ) member function to each pointer in a container of Shape pointers:

//: C06:MemFun1.cpp

// Applying pointers to member functions

#include <algorithm>

#include <functional>

#include <iostream>

#include <vector>

#include "../purge.h"

using namespace std;

 

class Shape {

public:

  virtual void draw() = 0;

  virtual ~Shape() {}

};

 

class Circle : public Shape {

public:

  virtual void draw() {

    cout << "Circle::Draw()" << endl;

  }

  ~Circle() {

    cout << "Circle::~Circle()" << endl;

  }

};

 

class Square : public Shape {

public:

  virtual void draw() {

    cout << "Square::Draw()" << endl;

  }

  ~Square() {

    cout << "Square::~Square()" << endl;

  }

};

 

int main() {

  vector<Shape*> vs;

  vs.push_back(new Circle);

  vs.push_back(new Square);

  for_each(vs.begin(), vs.end(),

    mem_fun(&Shape::draw));

  purge(vs);

} ///:~

 

The for_each( ) algorithm does just what it sounds like: it passes each element in a sequence to the function object denoted by its third argument. In this case, we want the function object to wrap one of the member functions of the class itself, and so the function object’s “argument” becomes the pointer to the object that the member function is called for. To produce such a function object, the mem_fun( ) template takes a pointer to a member as its argument. The purge( ) function is just a little something we wrote that calls delete on every element of a sequence.

The mem_fun( ) functions are for producing function objects that are called using a pointer to the object that the member function is called for, while mem_fun_ref( ) is used for calling the member function directly for an object. Both mem_fun( ) and mem_fun_ref( ) are overloaded for member functions that take zero arguments and one argument, and this is multiplied by two to handle const vs. non-const member functions. However, templates and overloading take care of sorting all that out; all you need to remember is when to use mem_fun( ) vs. mem_fun_ref( ).
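
The difference between the two families comes down to how the stored pointer to member is applied: through a pointer with ->* or through an object with .*. The following sketch shows that idea for member functions taking no arguments. It is not the library's code; Tally, CallThroughPointer, CallThroughReference, bumpPointers( ), and bumpObjects( ) are all hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Tally {
  int n;
  Tally() : n(0) {}
  int bump() { return ++n; }
};

// Sketch of the mem_fun() idea: the call argument is a pointer.
template<typename R, typename T>
class CallThroughPointer {
  R (T::*pmf)();
public:
  explicit CallThroughPointer(R (T::*p)()) : pmf(p) {}
  R operator()(T* obj) const {
    assert(obj != 0);
    return (obj->*pmf)();
  }
};

// Sketch of the mem_fun_ref() idea: the call argument is an object.
template<typename R, typename T>
class CallThroughReference {
  R (T::*pmf)();
public:
  explicit CallThroughReference(R (T::*p)()) : pmf(p) {}
  R operator()(T& obj) const { return (obj.*pmf)(); }
};

// A container of pointers needs the pointer form...
void bumpPointers(std::vector<Tally*>& v) {
  std::for_each(v.begin(), v.end(),
    CallThroughPointer<int, Tally>(&Tally::bump));
}

// ...and a container of objects needs the reference form.
void bumpObjects(std::vector<Tally>& v) {
  std::for_each(v.begin(), v.end(),
    CallThroughReference<int, Tally>(&Tally::bump));
}
```

The library versions add helper functions that deduce R and T for you, along with the extra overloads for const member functions and for member functions that take one argument.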

Suppose you have a container of objects (not pointers), and you want to call a member function that takes an argument. The argument you pass should come from a second container of objects. To accomplish this, use the second overloaded form of the transform( ) algorithm:

//: C06:MemFun2.cpp

// Calling member functions through an object reference

#include <algorithm>

#include <functional>

#include <iostream>

#include <iterator>

#include <vector>

using namespace std;

 

class Angle {

  int degrees;

public:

  Angle(int deg) : degrees(deg) {}

  int mul(int times) {

    return degrees *= times;

  }

};

 

int main() {

  vector<Angle> va;

  for(int i = 0; i < 50; i += 10)

    va.push_back(Angle(i));

  int x[] = { 1, 2, 3, 4, 5 };

  transform(va.begin(), va.end(), x,

    ostream_iterator<int>(cout, " "),

    mem_fun_ref(&Angle::mul));

  cout << endl;

  // Output: 0 20 60 120 200

} ///:~

 

Because the container is holding objects, mem_fun_ref( ) must be used with the pointer-to-member function. This version of transform( ) takes the start and end point of the first range (where the objects live); the starting point of the second range, which holds the arguments to the member function; the destination iterator, which in this case is standard output; and the function object to call for each object. This function object is created with mem_fun_ref( ) and the desired pointer to member. Notice that the transform( ) and for_each( ) template functions are incomplete; transform( ) requires that the function it calls return a value, and there is no for_each( ) that passes two arguments to the function it calls. Thus, you cannot call a member function that returns void and takes an argument using transform( ) or for_each( ).

Almost any member function works with mem_fun_ref( ). You can also use standard library member functions, if your compiler doesn’t add any default arguments beyond the normal arguments specified in the standard.[84] For example, suppose you’d like to read a file and search for blank lines; your compiler may allow you to use the string::empty( ) member function like this:

//: C06:FindBlanks.cpp

// Demonstrates mem_fun_ref() with string::empty()

#include <algorithm>

#include <cassert>

#include <cstddef>

#include <fstream>

#include <functional>

#include <string>

#include <vector>

#include "../require.h"

using namespace std;

 

typedef vector<string>::iterator LSI;

 

int main(int argc, char* argv[]) {

  const char* fname = "FindBlanks.cpp";

  if(argc > 1) fname = argv[1];

  ifstream in(fname);

  assure(in, fname);

  vector<string> vs;

  string s;

  while(getline(in, s))

    vs.push_back(s);

  vector<string> cpy = vs; // For testing

  LSI lsi = find_if(vs.begin(), vs.end(),

     mem_fun_ref(&string::empty));

  while(lsi != vs.end()) {

    *lsi = "A BLANK LINE";

    lsi = find_if(vs.begin(), vs.end(),

      mem_fun_ref(&string::empty));

  }

  for(size_t i = 0; i < cpy.size(); i++)

    if(cpy[i].size() == 0)

      assert(vs[i] == "A BLANK LINE");

    else

      assert(vs[i] != "A BLANK LINE");

} ///:~

 

This example uses find_if( ) to locate the first blank line in the given range using mem_fun_ref( ) with string::empty( ). After the file is opened and read into the vector, the process is repeated to find every blank line in the file. Each time a blank line is found, it is replaced with the characters “A BLANK LINE.” All you have to do to accomplish this is dereference the iterator to select the current string.
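
FindBlanks.cpp restarts each search from the beginning of the vector, which works because the line just found is no longer blank. A variation, sketched here with the hypothetical names isBlank( ) and markBlanks( ), resumes the search just past the previous hit instead. It uses an ordinary function as the predicate so that it does not depend on how your library declares string::empty( ):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Plain predicate standing in for mem_fun_ref(&string::empty).
bool isBlank(const std::string& s) { return s.empty(); }

// Replaces every empty string with marker and returns the number
// of replacements made.
int markBlanks(std::vector<std::string>& lines,
  const std::string& marker) {
  assert(!marker.empty()); // An empty marker would still look blank
  int count = 0;
  std::vector<std::string>::iterator it =
    std::find_if(lines.begin(), lines.end(), isBlank);
  while(it != lines.end()) {
    *it = marker;
    ++count;
    // Resume just past the line that was replaced:
    it = std::find_if(it + 1, lines.end(), isBlank);
  }
  return count;
}
```

Because each element is examined only once, the total work is proportional to the number of lines, however many of them are blank.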

Writing your own function object adapters

Consider how to write a program that converts strings representing floating-point numbers to their actual numeric values. To get things started, here’s a generator that creates the strings:

//: C06:NumStringGen.h

// A random number generator that produces

// strings representing floating-point numbers

#ifndef NUMSTRINGGEN_H

#define NUMSTRINGGEN_H

#include <string>

#include <cstdlib>

#include <ctime>

 

class NumStringGen {

  const int sz; // Number of digits to make

public:

  NumStringGen(int ssz = 5) : sz(ssz) {

    std::srand(std::time(0));

  }

  std::string operator()() {

    static char n[] = "0123456789";

    const int nsz = sizeof n / sizeof *n - 1; // Skip '\0'

    std::string r(sz, ' ');

    for(int i = 0; i < sz; i++)

      if(i == sz/2)

        r[i] = '.'; // Insert a decimal point

      else

        r[i] = n[std::rand() % nsz];

    return r;

  }

};

#endif // NUMSTRINGGEN_H ///:~

 

You tell it how big the strings should be when you create the NumStringGen object. The random number generator selects digits, and a decimal point is placed in the middle.

The following program uses NumStringGen to fill a vector<string>. However, to use the standard C library function atof( ) to convert the strings to floating-point numbers, the string objects must first be turned into char pointers, since there is no automatic type conversion from string to const char*. The transform( ) algorithm can be used with mem_fun_ref( ) and string::c_str( ) to convert all the strings to const char*, and then these can be transformed using atof( ).

//: C06:MemFun3.cpp

// Using mem_fun_ref() with string::c_str()

#include <algorithm>

#include <cstdlib>

#include <functional>

#include <iostream>

#include <iterator>

#include <string>

#include <vector>

#include "NumStringGen.h"

using namespace std;

 

int main() {

  const int sz = 9;

  vector<string> vs(sz);

  // Fill it with random number strings:

  generate(vs.begin(), vs.end(), NumStringGen());

  copy(vs.begin(), vs.end(),

    ostream_iterator<string>(cout, "\t"));

  cout << endl;

  const char* vcp[sz];

  transform(vs.begin(), vs.end(), vcp,

    mem_fun_ref(&string::c_str));

  vector<double> vd;

  transform(vcp, vcp + sz, back_inserter(vd),

    std::atof);

  copy(vd.begin(), vd.end(),

    ostream_iterator<double>(cout, "\t"));

  cout << endl;

} ///:~

 

This program does two transformations: one to convert strings to C-style strings (arrays of characters), and one to convert the C-style strings to numbers via atof( ). It would be nice to combine these two operations into one. After all, we can compose functions in mathematics, so why not C++?

The obvious approach takes the two functions as arguments and applies them in the proper order:

//: C06:ComposeTry.cpp

// A first attempt at implementing function composition

#include <cassert>

#include <cstdlib>

#include <functional>

#include <iostream>

#include <string>

using namespace std;

 

template<typename R, typename E, typename F1, typename F2>

class unary_composer {

   F1 f1;

   F2 f2;

public:

   unary_composer(F1 fone, F2 ftwo) : f1(fone), f2(ftwo) {}

   R operator()(E x) {

      return f1(f2(x));

   }

};

template<typename R, typename E, typename F1, typename F2>

unary_composer<R, E, F1, F2> compose(F1 f1, F2 f2) {

   return unary_composer<R, E, F1, F2>(f1, f2);

}

int main()

{

  double x =

    compose<double, const string&>(atof,

      mem_fun_ref(&string::c_str))("12.34");

  assert(x == 12.34);

} ///:~

 

The unary_composer object in this example stores the function pointers atof and string::c_str such that the latter function is applied first when its operator( ) is called. The compose( ) function adapter is a convenience, so we don’t have to supply all four template arguments explicitly; F1 and F2 are deduced from the call.

It would be much better, of course, if we didn’t have to supply any template arguments at all. This is achieved by adhering to the convention for type definitions for adaptable function objects; in other words, we will assume that the functions to be composed are adaptable. This requires that we use ptr_fun( ) for atof( ). For maximum flexibility, we also make unary_composer adaptable in case it gets passed to a function adapter. The following program does so and easily solves the original problem.

//: C06:ComposeFinal.cpp

// An adaptable composer

#include <algorithm>

#include <cassert>

#include <cstdlib>

#include <functional>

#include <iostream>

#include <iterator>

#include <string>

#include <vector>

#include "NumStringGen.h"

using namespace std;

 

template<typename F1, typename F2>

class unary_composer

  : public unary_function<typename F2::argument_type,

                          typename F1::result_type> {

public:

   unary_composer(F1 f1, F2 f2) : f1(f1), f2(f2) {}

   typename F1::result_type

     operator()(typename F2::argument_type x) {

      return f1(f2(x));

   }

private:

   F1 f1;

   F2 f2;

};

template<typename F1, typename F2>

unary_composer<F1, F2> compose(F1 f1, F2 f2) {

   return unary_composer<F1, F2>(f1, f2);

}

int main() {

  const int sz = 9;

  vector<string> vs(sz);

  // Fill it with random number strings:

  generate(vs.begin(), vs.end(), NumStringGen());

  copy(vs.begin(), vs.end(),

    ostream_iterator<string>(cout, "\t"));

  cout << endl;

  vector<double> vd;

  transform(vs.begin(), vs.end(), back_inserter(vd),

    compose(ptr_fun(atof), mem_fun_ref(&string::c_str)));

  copy(vd.begin(), vd.end(),

    ostream_iterator<double>(cout, "\t"));

  cout << endl;

} ///:~

 

Once again we must use typename to let the compiler know that the member we are referring to is a nested type.

Some implementations[85] support composition of function objects as an extension, and the C++ standards committee is likely to add these capabilities to the next version of standard C++.

A catalog of STL algorithms

This section provides a quick reference for when you’re searching for the appropriate algorithm. We leave the full exploration of all the STL algorithms to other references (see the end of this chapter, and Appendix A), along with the more intimate details of performance, and so on. Our goal here is for you to become rapidly comfortable and facile with the algorithms, and we’ll assume you will look into the more specialized references if you need more depth of detail.

Although you will often see the algorithms described using their full template declaration syntax, we’re not doing that here because you already know they are templates, and it’s quite easy to see what the template arguments are from the function declarations. The type names for the arguments provide descriptions for the types of iterators required. We think you’ll find this form is easier to read, and you can quickly find the full declaration in the template header file if for some reason you feel the need.

The reason for all the fuss about iterators is to accommodate any type of container that meets the requirements in the standard library. So far we have illustrated the generic algorithms with only arrays and vectors as sequences, but in the next chapter you’ll see a broad range of data structures that support less robust iteration. For this reason, the algorithms are categorized in part by the types of iteration facilities they require.

The names of the iterator classes describe the iterator type to which they must conform. There are no interface base classes to enforce these iteration operations; they are just expected to be there. If they are not, your compiler will complain. The various flavors of iterators are described briefly as follows.

InputIterator. An input iterator only allows reading elements of its sequence in a single, forward pass using operator++ and operator*. Input iterators can also be tested with operator== and operator!=. That’s all.

OutputIterator. An output iterator only allows writing elements to a sequence in a single, forward pass using operator++ and operator*. OutputIterators cannot be tested with operator== and operator!=, however, because you assume that you can just keep sending elements to the destination and that you don’t have to see if the destination’s end marker was reached. That is, the container that an OutputIterator references can take an infinite number of objects, so no end-checking is necessary. This requirement is important so that an OutputIterator can be used with ostreams (via ostream_iterator), but you’ll also commonly use the “insert” iterators (such as the type of iterator returned by back_inserter( )).

There is no way to determine whether multiple InputIterators or OutputIterators point within the same range, so there is no way to use multiple such iterators in concert. Just think in terms of iterators to support istreams and ostreams, and InputIterator and OutputIterator will make perfect sense. Also note that algorithms that use InputIterators or OutputIterators put the weakest restrictions on the types of iterators they will accept, which means that you can use any “more sophisticated” type of iterator when you see InputIterator or OutputIterator used as STL algorithm template arguments.

ForwardIterator. Because you can only read from an InputIterator and write to an OutputIterator, you can’t use either of them to simultaneously read and modify a range, and you can’t dereference such an iterator more than once. With a ForwardIterator these restrictions are relaxed; you can still only move forward using operator++, but you can both write and read, and you can compare such iterators in the same range for equality. Since forward iterators can both read and write, they can of course be used wherever an input or output iterator is called for.

BidirectionalIterator. Effectively, this is a ForwardIterator that can also go backward. That is, a BidirectionalIterator supports all the operations that a ForwardIterator does, but in addition it has an operator--.

RandomAccessIterator. This type of iterator supports all the operations that a regular pointer does: you can add and subtract integral values to move it forward and backward by jumps (rather than just one element at a time), you can subscript it with operator[ ], you can subtract one iterator from another, and you can compare iterators to see which is greater using operator<, operator>, and so on. If you’re implementing a sorting routine or something similar, random access iterators are necessary to be able to create an efficient algorithm.
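
Each iterator type announces which of these flavors it belongs to through a tag type, which you can query with the standard iterator_traits template. The following sketch reports the category as text; nameOf( ) and categoryName( ) are hypothetical names. Because the tag types are related by inheritance, overload resolution picks the most specific match:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// One overload per standard category tag.
inline std::string nameOf(std::input_iterator_tag) {
  return "input";
}
inline std::string nameOf(std::output_iterator_tag) {
  return "output";
}
inline std::string nameOf(std::forward_iterator_tag) {
  return "forward";
}
inline std::string nameOf(std::bidirectional_iterator_tag) {
  return "bidirectional";
}
inline std::string nameOf(std::random_access_iterator_tag) {
  return "random access";
}

// Reports the category that an iterator type declares for itself.
template<typename Iter>
std::string categoryName(Iter) {
  typedef typename
    std::iterator_traits<Iter>::iterator_category Tag;
  return nameOf(Tag());
}
```

A pointer into an array reports "random access", the iterator of a std::list<int> reports "bidirectional", and the iterator returned by back_inserter( ) reports "output".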

The names used for the template parameter types in the algorithm descriptions later in this chapter consist of the listed iterator types (sometimes with a ‘1’ or ‘2’ appended to distinguish different template arguments) and can also include other arguments, often function objects.

When describing the group of elements that an operation is performed on, mathematical “range” notation is often used. In this, the square bracket means “includes the end point,” and the parenthesis means “does not include the end point.” When using iterators, a range is determined by the iterator pointing to the initial element and by the “past-the-end” iterator, pointing past the last element. Since the past-the-end element is never used, the range determined by a pair of iterators can thus be expressed as [first, last), in which first is the iterator pointing to the initial element, and last is the past-the-end iterator.
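
The half-open convention leads to a very regular loop shape. Here is a minimal sketch, using the hypothetical name sumRange( ) and assuming the elements convert to long:

```cpp
#include <cassert>

// Adds up the elements in [first, last). The loop stops as soon as
// first reaches last, so last itself is never dereferenced.
template<typename InputIter>
long sumRange(InputIter first, InputIter last) {
  long total = 0;
  while(first != last)  // first == last means the range is empty
    total += *first++;
  return total;
}
```

For int a[] = {1, 2, 3, 4}, the call sumRange(a, a + 4) covers all four elements and gives 10, while sumRange(a, a) is an empty range and gives 0. Note that a + 4 is a valid past-the-end pointer even though there is no element at that position.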

Most books and discussions of the STL algorithms arrange them according to side-effects: non-mutating algorithms don’t change the elements in the range, mutating algorithms do change the elements, and so on. These descriptions are based primarily on the underlying behavior or implementation of the algorithm, that is, on the designer’s perspective. In practice, we don’t find this a useful categorization, so instead we’ll organize them according to the problem you want to solve: are you searching for an element or set of elements, performing an operation on each element, counting elements, replacing elements, and so on. This should help you find the algorithm you want more easily.

Note that all the algorithms are in the namespace std. If you do not see a different header such as <utility> or <numeric> above the function declarations, it appears in <algorithm>.

Support tools for example creation

It’s useful to create some basic tools with which to test the algorithms. In the examples that follow we’ll use the generators mentioned earlier in Generators.h, as well as what appears below.

Displaying a range is something that will be done constantly, so here is a templatized function that lets you print any sequence, regardless of the type in that sequence:

//: C06:PrintSequence.h

// Prints the contents of any sequence

#ifndef PRINTSEQUENCE_H

#define PRINTSEQUENCE_H

#include <iostream>

#include <iterator>

 

template<typename InputIter>

void print(InputIter first, InputIter last,

  const char* nm = "", const char* sep = "\n",

  std::ostream& os = std::cout) {

  if(nm != 0 && *nm != '\0')

    os << nm << ": " << sep;

  while(first != last)

    os << *first++ << sep;

  os << std::endl;

}

#endif // PRINTSEQUENCE_H ///:~

 

By default, print( ) writes to cout with newlines as separators, but you can change both by passing different arguments. You can also provide a message to print at the head of the output.
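The following sketch shows one way print( ) might be called with a heading, a space separator, and a string stream in place of cout. So that the fragment compiles on its own, the template is repeated here with const-qualified string parameters; the function demoOutput is hypothetical.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Repeated from PrintSequence.h so this fragment is self-contained;
// the string parameters are const char* so that string literals
// can be passed.
template<typename InputIter>
void print(InputIter first, InputIter last,
  const char* nm = "", const char* sep = "\n",
  std::ostream& os = std::cout) {
  if(nm != 0 && *nm != '\0')
    os << nm << ": " << sep;
  while(first != last)
    os << *first++ << sep;
  os << std::endl;
}

// Hypothetical demo: prints a three-element vector into a string,
// using "v" as the heading and a single space as the separator.
std::string demoOutput() {
  std::vector<int> v;
  v.push_back(1);
  v.push_back(2);
  v.push_back(3);
  std::ostringstream os;
  print(v.begin(), v.end(), "v", " ", os);
  return os.str(); // "v:  1 2 3 \n"
}
```

Notice that the separator is also written after the heading and after the last element, which is why the result contains two spaces after the colon and one before the newline.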

Finally, a number of the STL algorithms that move elements of a sequence around distinguish between “stable” and “unstable” reordering of a sequence. This refers to preserving the original relative order of those elements that are equivalent as far as the comparison function is concerned. For example, consider a sequence { c(1), b(1), c(2), a(1), b(2), a(2) }. These elements are tested for equivalence based on their letters, but their numbers indicate how they first appeared in the sequence. If you sort (for example) this sequence using an unstable sort, there’s no guarantee of any particular order among equivalent letters, so you could end up with { a(2), a(1), b(1), b(2), c(2), c(1) }. However, if you use a stable sort, you will get { a(1), a(2), b(1), b(2), c(1), c(2) }. The STL sort( ) algorithm is typically implemented as a variation of quicksort and is not guaranteed to be stable, but a stable_sort( ) is also provided.[86]
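The stable case can be sketched with the sequence from the paragraph above. Here each element is modeled as a pair of a letter and its order of appearance; the names Tagged, byLetter, and stableSorted are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type: a letter plus its order of appearance
typedef std::pair<char, int> Tagged;

// Compare on the letter only, so elements with the same letter
// are equivalent as far as the sort is concerned.
bool byLetter(const Tagged& l, const Tagged& r) {
  return l.first < r.first;
}

// Sorts { c(1), b(1), c(2), a(1), b(2), a(2) } with stable_sort()
// and returns the result formatted as text.
std::string stableSorted() {
  const Tagged init[] = {
    Tagged('c', 1), Tagged('b', 1), Tagged('c', 2),
    Tagged('a', 1), Tagged('b', 2), Tagged('a', 2)
  };
  std::vector<Tagged> v(init, init + sizeof init / sizeof *init);
  std::stable_sort(v.begin(), v.end(), byLetter);
  std::ostringstream os;
  for(std::vector<Tagged>::iterator it = v.begin();
      it != v.end(); ++it)
    os << it->first << '(' << it->second << ") ";
  return os.str(); // "a(1) a(2) b(1) b(2) c(1) c(2) "
}
```

Replacing stable_sort( ) with sort( ) would still group the letters correctly, but the numbers within each letter group could then come out in either order.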

To demonstrate the stability versus instability of algorithms that reorder a sequence, we need some way to keep track of how the elements originally appeared. The following is a kind of string object that keeps track of the order in which that particular object originally appeared, using a static map that maps NStrings to Counters. Each NString then contains an occurrence field that indicates the order in which this NString was discovered.

//: C06:NString.h

// A "numbered string" that indicates which

// occurrence this is of a particular word

#ifndef NSTRING_H

#define NSTRING_H

#include <algorithm>

#include <iostream>

#include <string>

#include <utility>

#include <vector>

typedef std::pair<std::string, int> psi;

 

// Only compare on the first element

inline bool operator==(const psi& l, const psi& r) {

  return l.first == r.first;

}