This is autobook.info, produced by makeinfo version 4.7 from autobook.texi.

INFO-DIR-SECTION GNU programming tools
START-INFO-DIR-ENTRY
* Autoconf, Automake, Libtool: (autobook).   Using the GNU autotools.
END-INFO-DIR-ENTRY

This file documents GNU Autoconf, Automake and Libtool.

Copyright (C) 1999, 2000 Gary V. Vaughan, Ben Elliston, Tom Tromey, Ian Lance Taylor

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation.


File: autobook.info, Node: Removing --foreign, Next: Installing Header Files, Prev: Using Libtool Libraries, Up: A Large GNU Autotools Project

12.2 Removing `--foreign'
=========================

Now that I have the bulk of the project in place, I want it to adhere to the GNU standard layout.  By removing the `--foreign' option from the call to `automake' in the `bootstrap' file, `automake' is able to warn me about missing, or in some cases(1), malformed files, as follows:

     $ ./bootstrap
     + aclocal -I config
     + libtoolize --force --copy
     Putting files in AC_CONFIG_AUX_DIR, config.
     + autoheader
     + automake --add-missing --copy
     automake: Makefile.am: required file ./NEWS not found
     automake: Makefile.am: required file ./README not found
     automake: Makefile.am: required file ./AUTHORS not found
     automake: Makefile.am: required file ./THANKS not found
     + autoconf

The GNU standards book(2) describes the contents of these files in more detail.  Alternatively, take a look at a few other GNU packages from `ftp://ftp.gnu.org/gnu'.

---------- Footnotes ----------

(1) For example, when I come to using the `make dist' rule.

(2) The GNU standard is distributed from `http://www.gnu.org/prep/standards.html'.


File: autobook.info, Node: Installing Header Files, Next: Including Texinfo Documentation, Prev: Removing --foreign, Up: A Large GNU Autotools Project

12.3 Installing Header Files
============================

One of the more difficult problems with GNU Autotools driven projects is that each of them depends on `config.h' (or its equivalent) and the project specific symbols that it defines.  The purpose of this file is to be `#include'd from all of the project source files.  The preprocessor can then tailor the code in these files to the target environment.

It is often difficult, and sometimes impossible, to avoid introducing a dependency on `config.h' from one of the project's installable header files.  It would be nice if you could simply install the generated `config.h', but even if you name it carefully or install it to a subdirectory to avoid filename problems, the macros it defines will clash with those from any other GNU Autotools based project which also installs _its_ `config.h'.
For example, if Sic installed its `config.h' as `/usr/include/sic/config.h', and had `#include <sic/config.h>' in the installed `common.h', when another GNU Autotools based project came to use the Sic library it might begin like this:

     #if HAVE_CONFIG_H
     #  include <config.h>
     #endif

     #if HAVE_SIC_H
     #  include <sic.h>
     #endif

     static const char version_number[] = VERSION;

But, `sic.h' says `#include <sic/common.h>', which in turn says `#include <sic/config.h>'.  Even though the other project has the correct value for `VERSION' in its own `config.h', by the time the preprocessor reaches the `version_number' definition, it has been redefined to the value in `sic/config.h'.  Imagine the mess you could get into if you were using several libraries which each installed their own `config.h' definitions.  GCC issues a warning when a macro is redefined to a different value, which would help you to catch this error.  Some compilers do not issue a warning, and perhaps worse, other compilers will warn even if the repeated definitions have the same value, flooding you with hundreds of warnings for each source file that reads multiple `config.h' headers.

The Autoconf macro `AC_OUTPUT_COMMANDS'(1) provides a way to solve this problem.  The idea is to generate a system specific but installable header from the results of the various tests performed by `configure'.  There is a 1-to-1 mapping between the preprocessor code that relied on the configure results written to `config.h', and the new shell code that relies on the configure results saved in `config.cache'.  The following code is a snippet from `configure.in', in the body of the `AC_OUTPUT_COMMANDS' macro:

     # Add the code to include these headers only if autoconf has
     # shown them to be present.
     if test x$ac_cv_header_stdlib_h = xyes; then
       echo '#include <stdlib.h>' >> $tmpfile
     fi
     if test x$ac_cv_header_unistd_h = xyes; then
       echo '#include <unistd.h>' >> $tmpfile
     fi
     if test x$ac_cv_header_sys_wait_h = xyes; then
       echo '#include <sys/wait.h>' >> $tmpfile
     fi
     if test x$ac_cv_header_errno_h = xyes; then
       echo '#include <errno.h>' >> $tmpfile
     fi
     cat >> $tmpfile << '_EOF_'
     #ifndef errno
     /* Some systems #define this! */
     extern int errno;
     #endif
     _EOF_
     if test x$ac_cv_header_string_h = xyes; then
       echo '#include <string.h>' >> $tmpfile
     elif test x$ac_cv_header_strings_h = xyes; then
       echo '#include <strings.h>' >> $tmpfile
     fi
     if test x$ac_cv_header_assert_h = xyes; then
       cat >> $tmpfile << '_EOF_'
     #include <assert.h>
     #define SIC_ASSERT assert
     _EOF_
     else
       echo '#define SIC_ASSERT(expr) ((void) 0)' >> $tmpfile
     fi

Compare this with the equivalent C pre-processor code from `sic/common.h', which it replaces:

     #if STDC_HEADERS || HAVE_STDLIB_H
     #  include <stdlib.h>
     #endif

     #if HAVE_UNISTD_H
     #  include <unistd.h>
     #endif

     #if HAVE_SYS_WAIT_H
     #  include <sys/wait.h>
     #endif

     #if HAVE_ERRNO_H
     #  include <errno.h>
     #endif
     #ifndef errno
     /* Some systems #define this! */
     extern int errno;
     #endif

     #if HAVE_STRING_H
     #  include <string.h>
     #else
     #  if HAVE_STRINGS_H
     #    include <strings.h>
     #  endif
     #endif

     #if HAVE_ASSERT_H
     #  include <assert.h>
     #  define SIC_ASSERT assert
     #else
     #  define SIC_ASSERT(expr) ((void) 0)
     #endif

Apart from the mechanical process of translating the preprocessor code, there is some plumbing needed to ensure that the `common.h' file generated by the new code in `configure.in' is functionally equivalent to the old code, and is generated in a correct and timely fashion.

Taking my lead from some of the Automake generated `make' rules to regenerate `Makefile' from `Makefile.in' by calling `config.status', I have added some similar rules to `sic/Makefile.am' to regenerate `common.h' from `common-h.in'.

     # Regenerate common.h with config.status whenever common-h.in changes.
     common.h: stamp-common
         @:

     stamp-common: $(srcdir)/common-h.in $(top_builddir)/config.status
         cd $(top_builddir) \
           && CONFIG_FILES= CONFIG_HEADERS= CONFIG_OTHER=sic/common.h \
           $(SHELL) ./config.status
         echo timestamp > $@

The way that `AC_OUTPUT_COMMANDS' works is to copy the contained code into `config.status' (*note Generated File Dependencies::).  It is actually `config.status' that creates the generated files - for example, `automake' generated `Makefile's are able to regenerate themselves from corresponding `Makefile.in's by calling `config.status' if they become out of date.  Unfortunately, this means that `config.status' doesn't have direct access to the cache values generated while `configure' was running (because it has finished its work by the time `config.status' is called).  It is tempting to read in the cache file at the top of the code inside `AC_OUTPUT_COMMANDS', but that only works if you know where the cache file is saved.  Also, the package installer can use the `--cache-file' option of `configure' to change the location of the file, or turn off caching entirely with `--cache-file=/dev/null'.

`AC_OUTPUT_COMMANDS' accepts a second argument which can be used to pass the variable settings discovered by `configure' into `config.status'.  It's not pretty, and is a little error prone.  In the first argument to `AC_OUTPUT_COMMANDS', you must be careful to check that *every single* configure variable referenced is correctly set somewhere in the second argument.  A slightly stripped down example from the sic project `configure.in' looks like this:

     # ----------------------------------------------------------------------
     # Add code to config.status to create an installable host dependent
     # configuration file.
     # ----------------------------------------------------------------------
     AC_OUTPUT_COMMANDS([
     if test -n "$CONFIG_FILES" && test -n "$CONFIG_HEADERS"; then
       # If both these vars are non-empty, then config.status wasn't run by
       # automake rules (which always set one or the other to empty).
       CONFIG_OTHER=${CONFIG_OTHER-sic/common.h}
     fi
     case "$CONFIG_OTHER" in
     *sic/common.h*)
       outfile=sic/common.h
       stampfile=sic/stamp-common
       tmpfile=${outfile}T
       dirname="sed s,^.*/,,g"

       echo creating $outfile
       cat > $tmpfile << _EOF_
     /* -*- Mode: C -*-
      * --------------------------------------------------------------------
      * DO NOT EDIT THIS FILE!  It has been automatically generated
      * from:    configure.in and `echo $outfile|$dirname`.in
      * on host: `(hostname || uname -n) 2>/dev/null | sed 1q`
      * --------------------------------------------------------------------
      */

     #ifndef SIC_COMMON_H
     #define SIC_COMMON_H 1

     #include <stdio.h>
     #include <sys/types.h>
     _EOF_

       if test x$ac_cv_func_bzero = xno && \
          test x$ac_cv_func_memset = xyes; then
         cat >> $tmpfile << '_EOF_'
     #define bzero(buf, bytes)  ((void) memset (buf, 0, bytes))
     _EOF_
       fi
       if test x$ac_cv_func_strchr = xno; then
         echo '#define strchr index' >> $tmpfile
       fi
       if test x$ac_cv_func_strrchr = xno; then
         echo '#define strrchr rindex' >> $tmpfile
       fi

       # The ugly but portable cpp stuff comes from here
       infile=$srcdir/sic/`echo $outfile | sed 's,.*/,,g;s,\..*$,,g'`-h.in
       sed '/^##.*$/d' $infile >> $tmpfile
     ],[
       srcdir=$srcdir
       ac_cv_func_bzero=$ac_cv_func_bzero
       ac_cv_func_memset=$ac_cv_func_memset
       ac_cv_func_strchr=$ac_cv_func_strchr
       ac_cv_func_strrchr=$ac_cv_func_strrchr
     ])

You will notice that the contents of `common-h.in' are copied into `common.h' verbatim as it is generated.
It's just an easy way of collecting together the code that belongs in `common.h', but which doesn't rely on configuration tests, without cluttering `configure.in' any more than necessary.

I should point out that, although this method has served me well for a number of years now, it is inherently fragile because it relies on undocumented internals of both Autoconf and Automake.  There is a very real possibility that if you also track the latest releases of GNU Autotools, it may stop working.  Future releases of GNU Autotools will address the interface problems that force us to use code like this, for lack of a better way to do things.

---------- Footnotes ----------

(1) This is for Autoconf version 2.13.  Autoconf version 2.50 recommends `AC_CONFIG_COMMANDS'.


File: autobook.info, Node: Including Texinfo Documentation, Next: Adding a Test Suite, Prev: Installing Header Files, Up: A Large GNU Autotools Project

12.4 Including Texinfo Documentation
====================================

Automake provides a few facilities to make the maintenance of Texinfo documentation within projects much simpler than it used to be.  Writing a `Makefile.am' for Texinfo documentation is extremely straightforward:

     ## Process this file with automake to produce Makefile.in

     MAINTAINERCLEANFILES = Makefile.in

     info_TEXINFOS = sic.texi

The `TEXINFOS' primary will not only create rules for generating `.info' files suitable for browsing with the GNU info reader, but also for generating `.dvi' and `.ps' documentation for printing.

You can also create other formats of documentation by adding the appropriate `make' rules to `Makefile.am'.  For example, because the more recent Texinfo distributions have begun to support generation of HTML documentation from the `.texi' format master document, I have added the appropriate rules to the `Makefile.am':

     SUFFIXES = .html

     html_docs = sic.html

     .texi.html:
         $(MAKEINFO) --html $<

     .PHONY: html
     html: version.texi $(html_docs)

For ease of maintenance, these `make' rules employ a suffix rule which describes how to generate HTML from equivalent `.texi' source - this involves telling make about the `.html' suffix using the automake `SUFFIXES' macro.  I haven't defined `MAKEINFO' explicitly (though I could have done) because I know that Automake has already defined it for use in the `.info' generation rules.

The `html' target is for convenience; typing `make html' is a little easier than typing `make sic.html'.  I have also added a `.PHONY' target so that featureful `make' programs will know that the `html' target doesn't actually generate a file called, literally, `html'.  As it stands, this code is not quite complete, since the toplevel `Makefile.am' doesn't know how to call the `html' rule in the `doc' subdirectory.  There is no need to provide a general solution here in the way Automake does for its `dvi' target, for example.  A simple recursive call to `doc/Makefile' suffices:

     docdir = $(top_builddir)/doc

     html:
         @echo Making $@ in $(docdir)
         @cd $(docdir) && make $@

Another useful management function that Automake can perform for you with respect to Texinfo documentation is to automatically generate the version numbers for your Texinfo documents.
It will add `make' rules to generate a suitable `version.texi', so long as `automake' sees `@include version.texi' in the body of the Texinfo source:

     \input texinfo   @c -*-texinfo-*-
     @c %**start of header
     @setfilename sic.info
     @settitle Dynamic Modular Interpreter Prototyping
     @setchapternewpage odd
     @c %**end of header
     @headings double

     @include version.texi

     @dircategory Programming
     @direntry
     * sic: (sic).    The dynamic, modular, interpreter prototyping tool.
     @end direntry

     @ifinfo
     This file documents sic.
     @end ifinfo

     @titlepage
     @sp 10
     @title Sic
     @subtitle Edition @value{EDITION}, @value{UPDATED}
     @subtitle $Id: sic.texi,v 1.1 2004/03/16 07:08:18 joostvb Exp $
     @author Gary V. Vaughan
     @page
     @vskip 0pt plus 1filll
     @end titlepage

`version.texi' sets Texinfo variables, `VERSION', `EDITION' and `UPDATED', which can be expanded elsewhere in the main Texinfo documentation by using `@value{EDITION}' for example.  This makes use of another auxiliary file, `mdate-sh', which will be added to the scripts in the `$ac_aux_dir' subdirectory by Automake after adding the `version.texi' reference to `sic.texi':

     $ ./bootstrap
     + aclocal -I config
     + libtoolize --force --copy
     Putting files in AC_CONFIG_AUX_DIR, config.
     + autoheader
     + automake --add-missing --copy
     doc/Makefile.am:22: installing config/mdate-sh
     + autoconf
     $ make html
     /bin/sh ./config.status --recheck
     ...
     Making html in ./doc
     make[1]: Entering directory /tmp/sic/doc
     Updating version.texi
     makeinfo --html sic.texi
     make[1]: Leaving directory /tmp/sic/doc

Hopefully, it now goes without saying that I also need to add the `doc' subdirectory to `AC_OUTPUT' in `configure.in' and to `SUBDIRS' in the top-level `Makefile.am'.


File: autobook.info, Node: Adding a Test Suite, Prev: Including Texinfo Documentation, Up: A Large GNU Autotools Project

12.5 Adding a Test Suite
========================

Automake has very flexible support for automated test-suites within a project distribution, which are discussed more fully in the Automake manual.  I have added a simple shell script based testing facility to Sic using this support - this kind of testing mechanism is perfectly adequate for command line projects.  The tests themselves simply feed prescribed input to the uninstalled `sic' interpreter and compare the actual output with what is expected.  Here is one of the test scripts:

     ## -*- sh -*-
     ## incomplete.test -- Test incomplete command handling

     # Common definitions
     if test -z "$srcdir"; then
         srcdir=`echo "$0" | sed 's,[^/]*$,,'`
         test "$srcdir" = "$0" && srcdir=.
         test -z "$srcdir" && srcdir=.
         test "${VERBOSE+set}" != set && VERBOSE=1
     fi
     . $srcdir/defs

     # this is the test script
     cat <<\EOF > in.sic
     echo "1 2 3"
     EOF

     # this is the output we should expect to see
     cat <<\EOF >ok
     1 2 3
     EOF

     cat <<\EOF >errok
     EOF

     # Run the test saving stderr to a file, and showing stdout
     # if VERBOSE == 1
     $RUNSIC in.sic  2> err | tee -i out >&2

     # Test against expected output
     if ${CMP} -s out ok; then
         :
     else
         echo "ok:" >&2
         cat ok >&2
         exit 1
     fi

     # Munge error output to remove leading directories, `lt-' or
     # trailing `.exe'
     sed -e "s,^[^:]*[lt-]*sic[.ex]*:,sic:," err >sederr && mv sederr err

     # Show stderr if it doesn't match expected output, if VERBOSE == 1
     if "$CMP" -s err errok; then
         :
     else
         echo "err:" >&2
         cat err >&2
         echo "errok:" >&2
         cat errok >&2
         exit 1
     fi

The tricky part of this script is the first part which discovers the location of (and loads) `$srcdir/defs'.
It is a little convoluted because it needs to work if the user has compiled the project in a separate build tree - in which case the `defs' file is in a separate source tree and not in the actual directory in which the test is executed.

The `defs' file allows me to factor out the common definitions from each of the test files so that it can be maintained once in a single file that is read by all of the tests:

     #! /bin/sh

     # Make sure srcdir is an absolute path.  Supply the variable
     # if it does not exist.  We want to be able to run the tests
     # stand-alone!!
     #
     srcdir=${srcdir-.}
     if test ! -d $srcdir ; then
         echo "defs: installation error" 1>&2
         exit 1
     fi

     # If the source directory is a Unix or a DOS root directory, ...
     #
     case "$srcdir" in
         /* | [A-Za-z]:\\*) ;;
         *) srcdir=`\cd $srcdir && pwd` ;;
     esac

     case "$top_builddir" in
         /* | [A-Za-z]:\\*) ;;
         *) top_builddir=`\cd ${top_builddir-..} && pwd` ;;
     esac

     progname=`echo "$0" | sed 's,^.*/,,'`
     testname=`echo "$progname" | sed 's,-.*$,,'`
     testsubdir=${testsubdir-testSubDir}

     SIC_MODULE_PATH=$top_builddir/modules
     export SIC_MODULE_PATH

     # User can set VERBOSE to prevent output redirection
     case x$VERBOSE in
         xNO | xno | x0 | x)
             exec > /dev/null 2>&1
             ;;
     esac

     rm -rf $testsubdir > /dev/null 2>&1
     mkdir $testsubdir
     cd $testsubdir \
        || { echo "Cannot make or change into $testsubdir"; exit 1; }

     echo "=== Running test $progname"

     CMP="${CMP-cmp}"
     RUNSIC="${top_builddir}/src/sic"

Having written a few more test scripts, and made sure that they are working by running them from the command line, all that remains is to write a suitable `Makefile.am' so that `automake' can run the test suite automatically.

     ## Makefile.am -- Process this file with automake to produce Makefile.in

     EXTRA_DIST = defs $(TESTS)
     MAINTAINERCLEANFILES = Makefile.in

     testsubdir = testSubDir

     TESTS_ENVIRONMENT = top_builddir=$(top_builddir)

     TESTS = \
         empty-eval.test \
         empty-eval-2.test \
         empty-eval-3.test \
         incomplete.test \
         multicmd.test

     distclean-local:
         -rm -rf $(testsubdir)

I have used the `testsubdir' macro to run the tests in their own subdirectory so that the directory containing the actual test scripts is not polluted with lots of fallout files generated by running the tests.  For completeness I have used a "hook target"(1) to remove this subdirectory when the user types:

     $ make distclean
     ...
     rm -rf testSubDir
     ...

Adding more tests is accomplished by creating a new test script and adding it to the list in `TESTS'.  Remembering to add the new `tests' subdirectory to `configure.in' and the top-level `Makefile.am', and reconfiguring the project to propagate the changes into the various generated files, I can run the whole test suite from the top directory with:

     $ make check

It is often useful to run tests in isolation, either when developing new tests, or to examine more closely why a test has failed unexpectedly.  Having set this test suite up as I did, individual tests can be executed with:

     $ VERBOSE=1 make check TESTS=incomplete.test
     make  check-TESTS
     make[1]: Entering directory /tmp/sic/tests
     === Running test incomplete.test
     1 2 3
     PASS: incomplete.test
     ==================
     All 1 tests passed
     ==================
     make[1]: Leaving directory /tmp/sic/tests

     $ ls testSubDir/
     err    errok    in.sic    ok    out

The `testSubDir' subdirectory now contains the expected and actual output from that particular test for both `stdout' and `stderr', and the input file which generated the actual output.
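A quick way to see exactly how a failing test's actual output diverged from what was expected is a straightforward comparison of those files; the invocation below is only an illustration (the test suite itself uses `cmp' via the `$CMP' variable):

     $ diff testSubDir/ok testSubDir/out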
Had the test failed, I would be able to look at these files to decide whether there is a bug in the program or simply a bug in the test script.  Being able to examine individual tests like this is invaluable, especially when the test suite becomes very large - because you will, naturally, add tests every time you add features to a project or find and fix a bug.

An alternative to the pure shell based test mechanism I have presented here is the Autotest facility by François Pinard, as used in Autoconf after release 2.13.

Later in *Note A Complex GNU Autotools Project::, the Sic project will be revisited to take advantage of some of the more advanced features of GNU Autotools.  But first these advanced features will be discussed in the next several chapters - starting, in the next chapter, with a discussion of how GNU Autotools can help you to make a tarred distribution of your own projects.

---------- Footnotes ----------

(1) This is a sort of callback function which will be called by the `make' rules generated by Automake.


File: autobook.info, Node: Rolling Distribution Tarballs, Next: Installing and Uninstalling, Prev: A Large GNU Autotools Project, Up: Top

13 Rolling Distribution Tarballs
********************************

There's something about the word `tarballs' that makes you want to avoid them altogether, let alone get involved in the disgusting process of rolling one.  And, in the past, that was apparently the attitude of most developers, as witnessed by the strange ways distribution tar archives were created and unpacked.  Automake largely automates this tedious process, in a sense providing you with the obliviousness you crave.

* Menu:

* Introduction to Distributions::
* What goes in::
* The distcheck rule::
* Some caveats::
* Implementation::


File: autobook.info, Node: Introduction to Distributions, Next: What goes in, Up: Rolling Distribution Tarballs

13.1 Introduction to Distributions
==================================

The basic approach to creating a tar distribution is to run

     make
     make dist

The generated tar file is named PACKAGE-VERSION.tar.gz, and will unpack into a directory named PACKAGE-VERSION.  These two rules are mandated by the GNU Coding Standards, and are just good ideas in any case, because it is convenient for the end user to have the version information easily accessible while building a package.  It removes any doubt when she goes back to an old tree after some time away from it.  Unpacking into a fresh directory is always a good idea - in the old days some packages would unpack into the current directory, requiring an annoying clean-up job for the unwary system administrator.

The unpacked archive is completely portable, to the extent of Automake's ability to enforce this.  That is, all the generated files (e.g., `configure') are newer than their inputs (e.g., `configure.in'), and the distributed `Makefile.in' files should work with any version of `make'.  Of course, some of the responsibility for portability lies with you: you are free to introduce non-portable code into your `Makefile.am', and Automake can't diagnose this.  No special tools beyond the minimal tool list (*note Minimal Tool List: (standards)Utilities in Makefiles.), plus whatever your own `Makefile' and `configure' additions use, will be required for the end user to build the package.

By default Automake creates a `.tar.gz' file.  It notices if you are using GNU `tar' and arranges to create portable archives in this case.(1)

People do sometimes want to make other sorts of distributions.
Automake allows this through the use of options. `dist-bzip2' Add a `dist-bzip2' target, which creates a `.tar.bz2' file. These files are frequently smaller than the corresponding `.tar.gz' file. `dist-shar' Add a `dist-shar' target, which creates a `shar' archive. `dist-zip' Add a `dist-zip' target, which creates a `zip' file. These files are popular for Windows distributions. `dist-tarZ' Add a `dist-tarZ' target, which creates a `.tar.Z' file. This exists mostly for die-hard old-time Unix hackers; the rest of the world has moved on to `gzip' or `bzip2'. ---------- Footnotes ---------- (1) By default, GNU `tar' can create non-portable archives in certain (rare) situations. To be safe, Automake arranges to use the `-o' compatibility flag when GNU `tar' is used.  File: autobook.info, Node: What goes in, Next: The distcheck rule, Prev: Introduction to Distributions, Up: Rolling Distribution Tarballs 13.2 What goes in ================= Automake tries to make creating a distribution as easy as possible. The rules are set up by default to distribute those things which Automake knows belong in a distribution. For instance, Automake always distributes your `configure' script and your `NEWS' file. All the files Automake automatically distributes are shown by `automake --help': $ automake --help ... Files which are automatically distributed, if found: ABOUT-GNU README config.guess ltconfig ABOUT-NLS THANKS config.h.bot ltmain.sh AUTHORS TODO config.h.top mdate-sh BACKLOG acconfig.h config.sub missing COPYING acinclude.m4 configure mkinstalldirs COPYING.LIB aclocal.m4 configure.in stamp-h.in ChangeLog ansi2knr.1 elisp-comp stamp-vti INSTALL ansi2knr.c install-sh texinfo.tex NEWS compile libversion.in ylwrap ... Automake also distributes some files about which it has no built-in knowledge, but about which it learns from your `Makefile.am'. For instance, the source files listed in a `_SOURCES' variable go into the distribution. This is why you ought to list uninstalled header files in the `_SOURCES' variable: otherwise you'll just have to introduce another variable to distribute them - Automake will only know about them if you tell it. Not all primaries are distributed by default. The rule is arbitrary, but pretty simple: of all the primaries, only `_TEXINFOS' and `_HEADERS' are distributed by default. (Sources that make up programs and libraries are also distributed by default, but, perhaps confusingly, `_SOURCES' is not considered a primary.) While there is no rhyme, there is a reason: defaults were chosen based on feedback from users. Typically, `enough' reports of the form `I auto-generate my `_SCRIPTS'. How do I prevent them from ending up in the distribution?' would cause a change in the default. Although the defaults are adequate in many situations, sometimes you have to distribute files which aren't covered automatically. It is easy to add additional files to a distribution; simply list them in the macro `EXTRA_DIST'. You can list files in subdirectories here. You can also list a directory's name here and the entire contents will be copied into the distribution by `make dist'. Use this last feature with care. A typical failure is that you'll put a `temporary' file in the directory and then it will end up in the distribution when you forget to remove it. Similarly, version control files, such as a `CVS' subdirectory, can easily end up in a distribution this way. 
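As a sketch of what such an assignment can look like (the file and directory names here are purely illustrative, not taken from any real package), a `Makefile.am' might contain:

     EXTRA_DIST = bootstrap doc/hello.sic examples

Here the first two entries name individual files - one of them in a subdirectory - to copy into the distribution, while the last names a whole directory whose entire contents will be copied by `make dist', subject to the caveats just mentioned.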
If a primary is not distributed by default, but in your case it ought to be, you can easily correct it with `EXTRA_DIST': EXTRA_DIST = $(bin_SCRIPTS) The next major Automake release (1) will have a better method for controlling whether primaries do or do not go into the distribution. In 1.5 you will be able to use the `dist' and `nodist' prefixes to control distribution on a per-variable basis. You will even be able to simultaneously use both prefixes with a given primary to include some files and omit others: dist_bin_SCRIPTS = distribute-this nodist_bin_SCRIPTS = but-not-this ---------- Footnotes ---------- (1) Probably numbered 1.5.  File: autobook.info, Node: The distcheck rule, Next: Some caveats, Prev: What goes in, Up: Rolling Distribution Tarballs 13.3 The distcheck rule ======================= The `make dist' documentation sounds nice, and `make dist' did do something, but how do you know it really works? It is a terrible feeling when you realize your carefully crafted distribution is missing a file and won't compile on a user's machine. I wouldn't write such an introduction unless Automake provided a solution. The solution is a smoke test known as `make distcheck'. This rule performs a `make dist' as usual, but it doesn't stop there. Instead, it then proceeds to untar the new archive into a fresh directory, build it in a fresh build directory separate from the source directory, install it into a third fresh directory, and finally run `make check' in the build tree. If any step fails, `distcheck' aborts, leaving you to fix the problem before it will create a distribution. While not a complete test - it only tries one architecture, after all - `distcheck' nevertheless catches most packaging errors (as opposed to portability bugs), and its use is highly recommended.  File: autobook.info, Node: Some caveats, Next: Implementation, Prev: The distcheck rule, Up: Rolling Distribution Tarballs 13.4 Some caveats ================= Earlier, if you were awake, you noticed that I recommended the use of `make' before `make dist' or `make distcheck'. This practice ensures that all the generated files are newer than their inputs. It also solves some problems related to dependency tracking (*note Advanced GNU Automake Usage::). Note that currently Automake will allow you to make a distribution when maintainer mode is off, or when you do not have all the required maintainer tools. That is, you can make a subtly broken distribution if you are motivated or unlucky. This will be addressed in a future version of Automake.  File: autobook.info, Node: Implementation, Prev: Some caveats, Up: Rolling Distribution Tarballs 13.5 Implementation =================== In order to understand how to use the more advanced `dist'-related features, you must first understand how `make dist' is implemented. For most packages, what we've already covered will suffice. Few packages will need the more advanced features, though I note that many use them anyway. The `dist' rules work by building a copy of the source tree and then archiving that copy. This copy is made in stages: a `Makefile' in a particular directory updates the corresponding directory in the shadow tree. In some cases, `automake' is run to create a new `Makefile.in' in the new distribution tree. After each directory's `Makefile' has had a chance to update the distribution directory, the appropriate command is run to create the archive. Finally, the temporary directory is removed. 
If your `Makefile.am' defines a `dist-hook' rule, then Automake will arrange to run this rule when the copying work for this directory is finished.  This rule can do literally anything to the distribution directory, so some care is required - careless use will result in an unusable distribution.  For instance, Automake will create the shadow tree using links, if possible.  This means that it is inadvisable to modify the files in the `dist' tree in a dist hook.  One common use for this rule is to remove files that erroneously end up in the distribution (in rare situations this can happen).  The variable `distdir' is defined during the `dist' process and refers to the corresponding directory in the distribution tree; `top_distdir' refers to the root of the distribution tree.

Here is an example of removing a file from a distribution:

     dist-hook:
         -rm $(distdir)/remove-this-file


File: autobook.info, Node: Installing and Uninstalling, Next: Writing Portable C, Prev: Rolling Distribution Tarballs, Up: Top

14 Installing and Uninstalling Configured Packages
**************************************************

Have you ever seen a package where, once built, you were expected to keep the build tree around forever, and always `cd' there before running the tool?  You might have to cast your mind way, way back to the bad old days of 1988 to remember such a horrible thing.  The GNU Autotools provides a canned solution to this problem.  While not without flaws, it does provide a reasonable and easy-to-use framework.  In this chapter we discuss the GNU Autotools installation model, how to convince `automake' to install files where you want them, and finally we conclude with some information about uninstalling, including a brief discussion of its flaws.

14.1 Where files are installed
==============================

If you've ever run `configure --help', you've probably been frightened by the huge number of options offered.  Although nobody ever uses more than two or three of these, they are still important to understand when writing your package; their proper use will help you figure out where each file should be installed.  For a background on these standard directories and their uses, refer to *Note Invoking configure::.

We do recommend using the standard directories as described.  While most package builders only use `--prefix' or perhaps `--exec-prefix', some packages (e.g. GNU/Linux distributions) require more control.  For instance, if your package `quux' puts a file into `localstatedir', then in the default configuration it will end up in `/usr/local/var'.  However, for a GNU/Linux distribution it would make more sense to configure with `--localstatedir=/var/quux'.

Automake makes it very easy to use the standard directories.  Each directory, such as `bindir', is mapped onto a `Makefile' variable of the same name.  Automake adds three useful variables to the standard list:

`pkgincludedir'
     This is a convenience variable whose value is `$(includedir)/$(PACKAGE)'.

`pkgdatadir'
     A convenience variable whose value is `$(datadir)/$(PACKAGE)'.

`pkglibdir'
     A variable whose value is `$(libdir)/$(PACKAGE)'.

These cannot be set on the `configure' command line but are always defined as above.(1)

In Automake, a directory variable's name, without the `dir' suffix, can be used as a prefix to a primary to indicate install location.  Confused yet?  An example will help: items listed in `bin_PROGRAMS' are installed in `bindir'.

Automake's rules are actually a bit more precise than this: the directory and the primary must agree.
It doesn't make sense to install a library in `datadir', so Automake won't let you. Here is a complete list showing primaries and the directories which can be used with them: `PROGRAMS' `bindir', `sbindir', `libexecdir', `pkglibdir'. `LIBRARIES' `libdir', `pkglibdir'. `LTLIBRARIES' `libdir', `pkglibdir'. `SCRIPTS' `bindir', `sbindir', `libexecdir', `pkgdatadir'. `DATA' `datadir', `sysconfdir', `sharedstatedir', `localstatedir', `pkgdatadir'. `HEADERS' `includedir', `oldincludedir', `pkgincludedir'. `TEXINFOS' `infodir'. `MANS' `man', `man0', `man1', `man2', `man3', `man4', `man5', `man6', `man7', `man8', `man9', `mann', `manl'. There are two other useful prefixes which, while not directory names, can be used in their place. These prefixes are valid with any primary. The first of these is `noinst'. This prefix tells Automake that the listed objects should not be installed, but should be built anyway. For instance, you can use `noinst_PROGRAMS' to list programs which will not be installed. The second such non-directory prefix is `check'. This prefix tells Automake that this object should not be installed, and furthermore that it should only be built when the user runs `make check'. Early in Automake history we discovered that even Automake's extended built-in list of directories was not enough - basically anyone who had written a `Makefile.am' sent in a bug report about this. Now Automake lets you extend the list of directories. First you must define your own directory variable. This is a macro whose name ends in `dir'. Define this variable however you like. We suggest that you define it relative to an autoconf directory variable; this gives the user some control over the value. Don't hardcode it to something like `/etc'; absolute hardcoded paths are rarely portable. Now you can attach the base part of the new variable to a primary just as you can with the built-in directories: foodir = $(datadir)/foo foo_DATA = foo.txt Automake lets you attach such a variable to any primary, so you can do things you ordinarily wouldn't want to do or be allowed to do. For instance, Automake won't diagnose this piece of code that tries to install a program in an architecture-independent location: foodir = $(datadir)/foo foo_PROGRAMS = foo 14.2 Fine-grained control of install ==================================== The second most common way (2) to configure a package is to set `prefix' and `exec-prefix' to different values. This way, a system administrator on a heterogeneous network can arrange to have the architecture-independent files shared by all platforms. Typically this doesn't save very much space, but it does make in-place bug fixing or platform-independent runtime configuration a lot easier. To this end, Automake provides finer control to the user than a simple `make install'. For instance, the user can strip all the package executables at install time by running `make install-strip' (though we recommend setting the various `INSTALL' environment variables instead; this is discussed later). More importantly, Automake provides a way to install the architecture-dependent and architecture-independent parts of a package independently. In the above scenario, installing the architecture-independent files more than once is just a waste of time. Our hypothetical administrator can install those pieces exactly once, with `make install-data', and then on each type of build machine install only the architecture-dependent files with `make install-exec'. 
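As a sketch of that workflow (the host names and shell prompts here are hypothetical), the administrator might run something like:

     server$ make install-data       # architecture-independent files, done once
     server$ make install-exec       # architecture-dependent files for this host
     client$ make install-exec       # only the architecture-dependent files here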
Nonstandard directories specified in `Makefile.am' are also separated along `data' and `exec' lines, giving the user complete control over installation.  If, and only if, the directory variable name contains the string `exec', then items ending up in that directory will be installed by `install-exec' and not `install-data'.

At some sites, the paths referred to by software at runtime differ from those used to actually install the software.  For instance, suppose `/usr/local' is mounted read-only throughout the network.  On the server, where new packages are built, the file system is available read-write as `/w/usr/local' - a directory which is not mounted anywhere else.  In this situation the sysadmin can configure and build using the _runtime_ values, but use the `DESTDIR' trick to temporarily change the paths at install time:

     ./configure --prefix=/usr/local
     make
     make DESTDIR=/w install

Note that `DESTDIR' operates as a prefix only.  Sometimes this isn't enough.  In this situation you can explicitly override each directory variable:

     ./configure --prefix=/usr/local
     make
     make prefix=/w/usr/local datadir=/w/usr/share install

Here is a full example (3) showing how you can unpack, configure, and build a typical GNU program on multiple machines at the same time:

     sunos$ tar zxf foo-0.1.tar.gz
     sunos$ mkdir sunos linux

In one window:

     sunos$ cd sunos
     sunos$ ../foo-0.1/configure --prefix=/usr/local \
     > --exec-prefix=/usr/local/sunos
     sunos$ make
     sunos$ make install

And in another window:

     sunos$ rsh linux
     linux$ cd ~/linux
     linux$ ../foo-0.1/configure --prefix=/usr/local \
     > --exec-prefix=/usr/local/linux
     linux$ make
     linux$ make install-exec

In this example we install everything on the `sunos' machine, but we only install the platform-dependent files on the `linux' machine.  We use a different `exec-prefix', so for example GNU/Linux executables will end up in `/usr/local/linux/bin/'.

14.3 Install hooks
==================

As with `dist', the install process allows for generic targets which can be used when the existing install functionality is not enough.  There are two types of targets which can be used: local rules and hooks.

A local rule is named either `install-exec-local' or `install-data-local', and is run during the course of the normal install procedure.  This rule can be used to install things in ways that Automake usually does not support.  For instance, in `libgcj' we generate a number of header files, one per Java class.  We want to install them in `pkgincludedir', but we want to preserve the hierarchical structure of the headers (e.g., we want `java/lang/String.h' to be installed as `$(pkgincludedir)/java/lang/String.h', not `$(pkgincludedir)/String.h'), and Automake does not currently support this.  So we resort to a local rule, which is a bit more complicated than you might expect:

     install-data-local:
         @for f in $(nat_headers) $(extra_headers); do \
     ## Compute the install directory at runtime.
           d="`echo $$f | sed -e 's,/[^/]*$$,,'`"; \
     ## Make the install directory.
           $(mkinstalldirs) $(DESTDIR)$(includedir)/$$d; \
     ## Find the header file -- in our case it might be in srcdir or
     ## it might be in the build directory.  "p" is the variable that
     ## names the actual file we will install.
           if test -f $(srcdir)/$$f; then p=$(srcdir)/$$f; else p=$$f; fi; \
     ## Actually install the file.
           $(INSTALL_DATA) $$p $(DESTDIR)$(includedir)/$$f; \
         done

A hook is guaranteed to run after the install of objects in this directory has completed.  This can be used to modify files after they have been installed.
There are two install hooks, named `install-data-hook' and `install-exec-hook'.  For instance, suppose you have written a program which must be `setuid' root.  You can accomplish this by changing the permissions after the program has been installed:

     bin_PROGRAMS = su
     su_SOURCES = su.c

     install-exec-hook:
         chown root $(bindir)/su
         chmod u+s $(bindir)/su

Unlike an install hook, an install rule is not guaranteed to be run after all other install rules are run.  This lets it be run in parallel with other install rules when a parallel `make' is used.  Ordinarily this is not very important, and in practice you almost always see local hooks and not local rules.

The biggest caveat to using a local rule or an install hook is to make sure that it will work when the source and build directories are not the same--many people forget to do this.  This means being sure to look in `$(srcdir)' when the file is a source file.  It is also very important to make sure that you do not use a local rule when install order is important - in this case, your `Makefile' will succeed on some machines and fail on others.

14.4 Uninstall
==============

As if things aren't confusing enough, there is still one more major installation-related feature which we haven't mentioned: uninstall.  Automake adds an `uninstall' target to your `Makefile' which does the reverse of `install': it deletes the newly installed package.  Unlike `install', there is no `uninstall-data' or `uninstall-exec'; while possible in theory we don't think this would be useful enough to actually use.  Like `install', you can write `uninstall-local' or `uninstall-hook' rules.

In our experience, `uninstall' is not a very useful feature.  Automake implements it because it is mandated by the GNU Standards, but it doesn't work reliably across packages.  Maintainers who write install hooks typically neglect to write uninstall hooks.  Also, since it can't reliably uninstall a _previously_ installed version of a package, it isn't useful for what most people would want to use it for anyway.  We recommend using a real packaging system, several of which are freely available.  In particular, GNU Stow, RPM, and the Debian packaging system seem like good choices.

---------- Footnotes ----------

(1) There has been some debate in the Autoconf community about extending Autoconf to allow new directories to be set on the `configure' command line.  Currently the consensus seems to be that there are too many arguments to `configure' already.

(2) The most common way being to simply set `prefix'.

(3) This example assumes the use of GNU tar when extracting; this is standard on Linux but does not come with Solaris.


File: autobook.info, Node: Writing Portable C, Next: Writing Portable C++, Prev: Installing and Uninstalling, Up: Top

15 Writing Portable C with GNU Autotools
****************************************

GNU Autotools permits you to write highly portable programs.  However, using GNU Autotools is not by itself enough to make your programs portable.  You must also write them portably.

In this chapter we will give an introduction to writing portable programs in C.  We will start with some notes on portable use of the C language itself.  We will then discuss cross-Unix portability.  We will finish with some notes on portability between Unix and Windows.

Portability is a big topic, and we can not cover everything in this chapter.  The basic rule of portable code is to remember that every system is in some ways unique.  Do not assume that every other system is like yours.
It is very helpful to be familiar with relevant standards, such as the ISO C standard and the POSIX.1 standard. Finally, there is no substitute for experience; if you have the opportunity to build and test your program on different systems, do so. * Menu: * C Language Portability:: * Cross-Unix Portability:: * Unix/Windows Portability::  File: autobook.info, Node: C Language Portability, Next: Cross-Unix Portability, Up: Writing Portable C 15.1 C Language Portability =========================== The C language makes it easy to write non-portable code. In this section we discuss these portability issues, and how to avoid them. We concentrate on differences that can arise on systems in common use today. For example, all common systems today define `char' to be 8 bits, and define a pointer to hold the address of an 8-bit byte. We do not discuss the more exotic possibilities found on historical machines or on certain supercomputers. If your program needs to run in unusual settings, make sure you understand the characteristics of those systems; the system documentation should include a C portability guide describing the problems you are likely to encounter. * Menu: * ISO C:: * C Data Type Sizes:: * C Endianness:: * C Structure Layout:: * C Floating Point:: * GNU cc Extensions::  File: autobook.info, Node: ISO C, Next: C Data Type Sizes, Up: C Language Portability 15.1.1 ISO C ------------ The ISO C standard first appeared in 1989 (the standard is often called ANSI C). It added several new features to the C language, most notably function prototypes. This led to many years of portability issues when deciding whether to use ISO C features. We think that programs written today can assume the presence of an ISO C compiler. Therefore, we will not discuss issues related to the differences between ISO C compilers and older compilers--often called K&R compilers, from the first book on C by Kernighan and Ritchie. You may see these differences handled in older programs. There is a newer C standard called `C9X'. Because compilers that support it are not widely available as of this writing, this discussion does not cover it.  File: autobook.info, Node: C Data Type Sizes, Next: C Endianness, Prev: ISO C, Up: C Language Portability 15.1.2 C Data Type Sizes ------------------------ The C language defines data types in terms of a minimum size, rather than an exact size. As of this writing, this mainly matters for the types `int' and `long'. A variable of type `int' must be at least 16 bits, and is often 32 bits. A variable of type `long' must be at least 32 bits, and is sometimes 64 bits. The range of a 16 bit number is -32768 to 32767 for a signed number, or 0 to 65535 for an unsigned number. If a variable may hold numbers larger than 16 bits, use `long' rather than `int'. Never assume that `int' or `long' have a specific size, or that they will overflow at a particular point. When appropriate, use variables of system defined types rather than `int' or `long': `size_t' Use this to hold the size of an object, as returned by `sizeof'. `ptrdiff_t' Use this to hold the difference between two pointers into the same array. `time_t' Use this to hold a time value as returned by the `time' function. `off_t' On a Unix system, use this to hold a file position as returned by `lseek'. `ssize_t' Use this to hold the result of the Unix `read' or `write' functions. Some books on C recommend using typedefs to specify types of particular sizes, and then adjusting those typedefs on specific systems. 
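As a sketch of that approach (the typedef names are invented for illustration, and the choices shown are only valid on a system where `short' is exactly 16 bits and `int' is exactly 32 bits), such a header might contain:

     /* Fixed-size integer typedefs, adjusted by hand for each system.  */
     typedef short          my_int16;   /* assumed: short is 16 bits here */
     typedef unsigned short my_uint16;
     typedef int            my_int32;   /* assumed: int is 32 bits here */
     typedef unsigned int   my_uint32;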
GNU Autotools supports this using the `AC_CHECK_SIZEOF' macro.  However, while we agree with using typedefs for clarity, we do not recommend using them purely for portability.  It is safest to rely only on the minimum size assumptions made by the C language, rather than to assume that a type of a specific size will always be available.  Also, most C compilers will define `int' to be the most efficient type for the system, so it is normally best to simply use `int' when possible.


File: autobook.info, Node: C Endianness, Next: C Structure Layout, Prev: C Data Type Sizes, Up: C Language Portability

15.1.3 C Endianness
-------------------

When a number longer than a single byte is stored in memory, it must be stored in some particular format.  Modern systems do this by storing the number byte by byte such that the bytes can simply be concatenated into the final number.  However, the order of storage varies: some systems store the least significant byte at the lowest address in memory, while some store the most significant byte there.  These are referred to as "little-endian" and "big-endian" systems, respectively.(1)

This difference means that portable code may not make any assumptions about the order of storage of a number.  For example, code like this will act differently on different systems:

     /* Example of non-portable code; don't do this */
     int i = 4;
     char c = *(char *) &i;

Although that was a contrived example, real problems arise when writing numeric data in a file or across a network connection.  If the file or network connection may be read on a different type of system, numeric data must be written in a format which can be unambiguously recovered.  It is not portable to simply do something like

     /* Example of non-portable code; don't do this */
     write (fd, &i, sizeof i);

This example is non-portable both because of endianness and because it assumes that the size of the type of `i' is the same on both systems.

Instead, do something like this:

     int j;
     char buf[4];

     for (j = 0; j < 4; ++j)
       buf[j] = (i >> (j * 8)) & 0xff;
     write (fd, buf, 4); /* In real code, check the return value */

This unambiguously writes out a little endian 4 byte value.  The code will work on any system, and the result can be read unambiguously on any system.

Another approach to handling endianness is to use the `htonS' and `ntohS' functions available on most systems.  These functions convert between "network endianness" and host endianness.  Network endianness is big-endian; it has that name because the standard TCP/IP network protocols use big-endian ordering.

These functions come in two sizes: `htonl' and `ntohl' operate on 4-byte quantities, and `htons' and `ntohs' operate on 2-byte quantities.  The `hton' functions convert host endianness to network endianness.  The `ntoh' functions convert network endianness to host endianness.  On big-endian systems, these functions simply return their arguments; on little-endian systems, they return their arguments after swapping the bytes.

Although these functions are used in a lot of existing code, they can be difficult to use in highly portable code, because they require knowing the exact size of your data types.  If you know that the type `int' is exactly 4 bytes long, then it is possible to write code like the following:

     int j;

     j = htonl (i);
     write (fd, &j, 4);

However, if `int' is not exactly 4 bytes long, this example will not work correctly on all systems.

---------- Footnotes ----------

(1) These names come from `Gulliver's Travels'.

File: autobook.info, Node: C Structure Layout, Next: C Floating Point, Prev: C Endianness, Up: C Language Portability

15.1.4 C Structure Layout
-------------------------

C compilers on different systems lay out structures differently.  In some cases there can even be layout differences between different C compilers on the same system.  Compilers add gaps between fields, and these gaps have different sizes and are at different locations.  You can normally assume that there are no gaps between fields of type `char' or array of `char'.  However, you can not make any assumptions about gaps between fields of any larger type.  You also can not make any assumptions about the layout of bitfield types.

These structure layout issues mean that it is difficult to portably use a C struct to define the format of data which may be read on another type of system, such as data in a file or sent over a network connection.  Portable code must read and write such data field by field, rather than trying to read an entire struct at once.

Here is an example of non-portable code when reading data which may have been written to a file or a network connection on another type of system.  Don't do this.

     /* Example of non-portable code; don't do this */
     struct {
       short i;
       int j;
     } s;
     read (fd, &s, sizeof s);

Instead, do something like this (the struct `s' is assumed to be the same as above):

     unsigned char buf[6];

     read (fd, buf, sizeof buf); /* Should check return value */
     s.i = buf[0] | (buf[1] << 8);
     s.j = buf[2] | (buf[3] << 8) | (buf[4] << 16) | (buf[5] << 24);

Naturally the code to write out the structure should be similar.


File: autobook.info, Node: C Floating Point, Next: GNU cc Extensions, Prev: C Structure Layout, Up: C Language Portability

15.1.5 C Floating Point
-----------------------

Most modern systems handle floating point following the IEEE 754 standard.  However, there are still portability issues.

Most processors use 64 bits of precision when computing floating point values.  However, the widely used Intel x86 series of processors compute temporary values using 80 bits of precision, as do most instances of the Motorola 68k series.  Some other processors, such as the PowerPC, provide fused multiply-add instructions which perform a multiplication and an addition using high precision for the intermediate value.  Optimizing compilers will generate such instructions based on sequences of C operations.

For almost all programs, these differences do not matter.  However, for programs which do intensive floating point operations, the differences can be significant.  It is possible to write floating point loops which terminate on one sort of processor but not on another.

Unfortunately, there is no rule of thumb that can be used to avoid these problems.  Most compilers provide an option to disable the use of extended precision (for GNU cc, the option is `-ffloat-store').  However, on the one hand, this merely shifts the portability problem elsewhere, and, on the other, the extended precision is often good rather than bad.  Although these portability problems can not be easily avoided, you should at least be aware of them if you write programs which require very precise floating point operations.

The IEEE 754 standard specifies certain flags which the floating point processor should make available (e.g., overflow, underflow, inexact), and specifies that there should be some control over the floating point rounding mode.  Most processors make these flags and controls available; however, there is no portable way to access them.
A portable program should not assume that it will have this degree of control over floating point operations.  File: autobook.info, Node: GNU cc Extensions, Prev: C Floating Point, Up: C Language Portability 15.1.6 GNU cc Extensions ------------------------ The GNU `cc' compiler has several useful extensions, which are documented in the GNU `cc' manual. A program which must be portable to other C compilers must naturally avoid these extensions; the `-pedantic' option may be used to warn about any accidental use of an extension. However, the GNU cc compiler is itself highly portable, and it runs on all modern Unix platforms as well as on Windows. Depending upon your portability requirements, you may be able to simply assume that GNU cc is available, in which case your program may use extensions when they are useful. Note that some extensions are inherently non-portable, such as inline assembler code, or using attributes to specify a particular section for a function or a global variable.  File: autobook.info, Node: Cross-Unix Portability, Next: Unix/Windows Portability, Prev: C Language Portability, Up: Writing Portable C 15.2 Cross-Unix Portability =========================== In the previous section, we discussed issues related to the C language. Here we will discuss the portability of C programs across different Unix implementations. All modern Unix systems conform to the POSIX.1 (1990 edition) and POSIX.2 (1992 edition) standards. They also all support the sockets interface for networking code. However, there are still significant differences between systems which can affect portability. We will not discuss portability to older Unix systems which do not conform to the POSIX standards. If you need this sort of portability, you can often find some valuable hints in the set of macros defined by `autoconf', and in the `configure.in' files of older programs which use `autoconf'. * Menu: * Cross-Unix Function Calls:: * Cross-Unix System Interfaces::  File: autobook.info, Node: Cross-Unix Function Calls, Next: Cross-Unix System Interfaces, Up: Cross-Unix Portability 15.2.1 Cross-Unix Function Calls -------------------------------- Functions not mentioned in POSIX.1 may not be available on all systems. If you want to use one of these functions, you should normally check for its presence by using `AC_CHECK_FUNCS' in your `configure.in' script, and adapt to its absence if possible. Here is a list of some popular functions which are available on many, but not all, modern Unix systems: `alloca' There are several portability issues with `alloca'. See the description of `AC_FUNC_ALLOCA' in the autoconf manual. Although this function can be very convenient, it is normally best to avoid it in highly portable code. `dlopen' GNU libtool provides a portable alternate interface to `dlopen'. *Note Dynamic Loading::. `getline' In some cases `fgets' may be used as a fallback. In others, you will need to provide your own version of this function. `getpagesize' On some systems, the page size is available as the macro `PAGE_SIZE' in the header file `sys/param.h'. On others, the page size is available via the `sysconf' function. If none of those work, you must generally simply guess a value such as `4096'. `gettimeofday' When this is not available, fall back to a less precise function such as `time' or `ftime' (which itself is not available on all systems). `mmap' In some cases you can use either `mmap' or ordinary file I/O. In others, a program which uses `mmap' will simply not be portable to all Unix systems. 
Note that `mmap' is an optional part of the 1996 version of POSIX.1, so it is likely to be added to all Unix systems over time. `ptrace' Unix systems without `ptrace' generally provide some other mechanism for debugging subprocesses, such as `/proc'. However, there is no widely portable method for controlling subprocesses, as evidenced by the source code to the GNU debugger, `gdb'. `setuid' Different Unix systems handle this differently. On some systems, any program can switch between the effective user ID of the executable and the real user ID. On others, switching to the real user ID is final; some of those systems provide the `setreuid' function instead to switch the effective and real user ID. The effect when a program run by the superuser calls `setuid' varies among systems. `snprintf' If this is not available, then in some cases it will be reasonable to simply use `sprintf', and in others you will need to write a little routine to estimate the required length and allocate an appropriate buffer before calling `sprintf'. `strcasecmp' `strdup' `strncasecmp' You can normally provide your own version of these simple functions. `valloc' When this is not available, just use `malloc' instead. `vfork' When this is not available, just use `fork' instead.  File: autobook.info, Node: Cross-Unix System Interfaces, Prev: Cross-Unix Function Calls, Up: Cross-Unix Portability 15.2.2 Cross-Unix System Interfaces ----------------------------------- There are several Unix system interfaces which have associated portability issues. We do not have the space here to discuss all of these in detail across all Unix systems. However, we mention them here to indicate issues where you may need to consider portability. `curses' `termcap' `terminfo' Many Unix systems provide the `curses' interface for simple graphical terminal access, but the name of the library varies. Typical names are `-lcurses' or `-lncurses'. Some Unix systems do not provide `curses', but do provide the `-ltermcap' or `-lterminfo' library. The latter libraries only provide an interface to the `termcap' file or `terminfo' files. These files contain information about specific terminals, the difference being mainly the manner in which they are stored. `proc file system' The `/proc' file system is not available on all Unix systems, and when it is available the actual set of files and their format varies. `pseudo terminals' All Unix systems provide pseudo terminals, but the interface to obtain them varies widely. We recommend examining the configuration of an existing program which uses them, such as GNU emacs or Expect. `shared libraries' Shared libraries differ across Unix systems. The GNU libtool program was written to provide an interface to hide the differences. *Note Introducing GNU Libtool::. `termios' `termio' `tty' The `termios' interface to terminals is standard on modern Unix systems. Avoid the older, non-portable, `termio' and `tty' interfaces (these interfaces are defined in `termio.h' and `sgtty.h', respectively). `threads' Many, but not all, Unix systems support multiple threads in a single process, but the interfaces differ. One thread interface, pthreads, was standardized in the 1996 edition of POSIX.1, so Unix systems are likely to converge on that interface over time. `utmp' `wtmp' Most Unix systems maintain the `utmp' and `wtmp' files to record information about which users are logged onto the system. 
However, the format of the information in the files varies across Unix systems, as does the exact location of the files and the functions which some systems provide to access the information. Programs which merely need to obtain login information will be more portable if they invoke a program such as `w'. Programs which need to update the login information must be prepared to handle a range of portability issues. `X Window System' Version 11 of the X Window System is widely available across Unix systems. The actual release number varies somewhat, as does the set of available programs and window managers. Extensions such as OpenGL are not available on all systems.  File: autobook.info, Node: Unix/Windows Portability, Prev: Cross-Unix Portability, Up: Writing Portable C 15.3 Unix/Windows Portability ============================= Unix and Windows are very different operating systems, with very different APIs and functionality. However, it is possible to write programs which run on both Unix and Windows, with significant extra work and some sacrifice in functionality. For more information on how GNU Autotools can help you write programs which run on both Unix and Windows, see *Note Integration with Cygnus Cygwin::. * Menu: * Unix/Windows Emulation:: * Unix/Windows Portable Scripting Language:: * Unix/Windows User Interface Library:: * Unix/Windows Specific Code:: * Unix/Windows Issues::  File: autobook.info, Node: Unix/Windows Emulation, Next: Unix/Windows Portable Scripting Language, Up: Unix/Windows Portability 15.3.1 Unix/Windows Emulation ----------------------------- The simplest way to write a program which runs on both Unix and Windows is to use an emulation layer. This generally results in a program which runs, but does not really feel like other programs for the operating system in question. For example, the Cygwin package, which is freely available from Cygnus Solutions(1), provides a Unix API which works on Windows. This permits Unix programs to be compiled to run on Windows. It is even possible to run an X server in the Cygwin environment, so graphical programs will work as well, although they will not have the Windows look and feel. The Cygwin package is discussed in more detail in *note Integration with Cygnus Cygwin::. There are also commercial packages available to compile Unix programs for Windows (e.g., Interix) and to compile Windows programs on Unix (e.g., Bristol Technology). The main disadvantage with using an emulation layer is that the resulting programs have the wrong look and feel. They do not behave as users expect, so they are awkward to use. This is generally not acceptable for high quality programs. ---------- Footnotes ---------- (1) `http://sourceware.cygnus.com/cygwin/'  File: autobook.info, Node: Unix/Windows Portable Scripting Language, Next: Unix/Windows User Interface Library, Prev: Unix/Windows Emulation, Up: Unix/Windows Portability 15.3.2 Unix/Windows Portable Scripting Language ----------------------------------------------- Another approach to Unix/Windows portability is to develop the program using a portable scripting language. An example of such a scripting language is Tcl/Tk(1). Programs written in Tcl/Tk will work on both Unix and Windows (and on the Apple Macintosh operating system as well, for that matter). Graphical programs will more or less follow the look and feel for the platform upon which they are run. 
Since Tcl/Tk was originally developed on Unix, graphical Tcl/Tk programs will typically not look quite right to experienced Windows users, but they will be usable and of reasonable quality. Other portable scripting languages are Perl, Python, and Guile. One disadvantage of this approach is that scripting languages tend to be less efficient than straight C code, but it is often possible to recode important routines in C. Another disadvantage is the need to learn a new language, one which furthermore may not be well designed for large programming projects. ---------- Footnotes ---------- (1) `http://www.scriptics.com/'  File: autobook.info, Node: Unix/Windows User Interface Library, Next: Unix/Windows Specific Code, Prev: Unix/Windows Portable Scripting Language, Up: Unix/Windows Portability 15.3.3 Unix/Windows User Interface Library ------------------------------------------ Some programs' main interaction with the operating system is drawing on the screen. It is often possible to write such programs using a cross platform user interface library. A cross-platform user interface library is a library providing basic windowing functions which has been implemented separately for Unix and Windows. The program calls generic routines which are translated into the appropriate calls on each platform. These libraries generally provide a good look and feel on each platform, so this can be a reasonable approach for programs which do not require additional services from the system. The main disadvantage is the least common denominator effect: the libraries often only provide functionality which is available on both Unix and Windows. Features specific to either Unix or Windows may be very useful for the program, but they may not be available via the library.  File: autobook.info, Node: Unix/Windows Specific Code, Next: Unix/Windows Issues, Prev: Unix/Windows User Interface Library, Up: Unix/Windows Portability 15.3.4 Unix/Windows Specific Code --------------------------------- When writing a program which should run on both Unix and Windows, it is possible to simply write different code for the two platforms. This requires a careful separation of the operating system interface, including the graphical user interface, from the rest of the program. An API must be designed to provide the system needs, and that API must be implemented separately on Unix and Windows. The API should be set at an appropriate level to avoid the least common denominator effect. This approach can be useful for a program which has significant platform independent computation as well as significant user interface or other system needs. It generally produces better results than the other approaches discussed above. The disadvantage is that this approach requires much more work that the others discussed above.  File: autobook.info, Node: Unix/Windows Issues, Prev: Unix/Windows Specific Code, Up: Unix/Windows Portability 15.3.5 Unix/Windows Issues -------------------------- Whatever approach is used to support the program on both Unix and Windows, there are certain issues which may affect the design of the program, or many specific areas of the program. * Menu: * Unix/Windows Text/Binary:: * Unix/Windows Filesystems:: * Unix/Windows Miscellaneous::  File: autobook.info, Node: Unix/Windows Text/Binary, Next: Unix/Windows Filesystems, Up: Unix/Windows Issues 15.3.5.1 Text and Binary Files .............................. Windows supports two different types of files: text files and binary files. On Unix, there is no such distinction. 
On Windows, any program which uses files must know whether each file is text or binary, and open and use them accordingly. In a text file on Windows, each line is terminated with a carriage return character followed by a line feed character. When the file is read by a C program in text mode, the C library converts each carriage return/line feed pair into a single line feed character. If the file is read in binary mode, the program will see both the carriage return and the line feed. You may have seen this distinction when transferring files between Unix and Window systems via FTP. You need to set the FTP program into binary or text mode as appropriate for the file you want to transfer. When transferring a binary file, the FTP program simply transfers the data unchanged. When transferring a text file, the FTP program must convert each carriage return/line feed pair into a single line feed. When using the C standard library, a binary file is indicated by adding `b' after the `r', `w', or `a' in the call to `fopen'. When reading a text file, the program can not simply count characters and use that when computing arguments to `fseek'.  File: autobook.info, Node: Unix/Windows Filesystems, Next: Unix/Windows Miscellaneous, Prev: Unix/Windows Text/Binary, Up: Unix/Windows Issues 15.3.5.2 File system Issues ........................... There are several differences between the file systems used on Unix and Windows, mainly in the areas of what names can be used for files. The program `doschk', which can be found in the gcc distribution, may be used on Unix to check for filenames which are not permitted on DOS or Windows. * Menu: * DOS Filename Restrictions:: * Windows File Name Case:: * Windows Whitespace in File Names:: * Windows Separators and Drive Letters::  File: autobook.info, Node: DOS Filename Restrictions, Next: Windows File Name Case, Up: Unix/Windows Filesystems 15.3.5.3 DOS Filename Restrictions .................................. The older DOS FAT file systems have severe limitations on file names. These limitations no longer apply to Windows, but they do apply to DOS based systems such as DJGPP. A file name may consist of no more than 8 characters, followed by an optional extension of no more than 3 characters. This is commonly referred to as an 8.3 file name. Filenames are case insensitive. There are a couple of filenames which are treated specially. You can not name a file `aux' or `prn'. In some cases, you can not even use an extension, such as `aux.c'. These restrictions apply to DOS and also to at least some versions of Windows.  File: autobook.info, Node: Windows File Name Case, Next: Windows Whitespace in File Names, Prev: DOS Filename Restrictions, Up: Unix/Windows Filesystems 15.3.5.4 Windows File Name Case ............................... Windows normally folds case when referring to files, unlike Unix. That is, on Windows, the file names `file', `File', and `FiLe' all refer to the same file. You must be aware of this when porting Unix programs to Windows, as the Unix programs may expect that using different case is reflected in the file system. For example, the procedure used to build the program `perl' from source relies on distinguishing between the files `PERL' and `perl'. This fails on Windows. As a matter of interest, the Windows file system stores files under the name with which they were created. The DOS shell displays the names in all upper case. The `Explorer' shell displays them with each word in the file name capitalized.  
File: autobook.info, Node: Windows Whitespace in File Names, Next: Windows Separators and Drive Letters, Prev: Windows File Name Case, Up: Unix/Windows Filesystems 15.3.5.5 Whitespace in File Names ................................. Both Unix and Windows file systems permit whitespace in file names. However, Unix users rarely take advantage of this, while Windows users often do. For example, many Windows systems use a directory named `Program Files', whose name has an embedded space. This is a clash of conventions. Many programs developed on Unix unintentionally assume that there will be no spaces in file and directory names, and behave mysteriously if any are encountered. On Unix these bugs will almost never be seen. On Windows, they will pop up immediately. When writing a program which must run on Windows, consider these issues. Don't forget to test it on directories and files with embedded spaces.  File: autobook.info, Node: Windows Separators and Drive Letters, Prev: Windows Whitespace in File Names, Up: Unix/Windows Filesystems 15.3.5.6 Windows Separators and Drive Letters ............................................. On Unix, directories in a file name are separated by a forward slash (`/'). On Windows, directories are separated by a backward slash (`\'). For example, the Unix file `dir/file' on Windows would be `dir\file'.(1) On Unix, a list of directories is normally separated by a colon (`:'). On Windows, a list of directories is normally separated by a semicolon (`;'). For example, a simple Unix search path might look like this: `/bin:/usr/bin'. The same search path on Windows would probably look like this: `c:\bin;c:\usr\bin'. On Unix, the file system is a single tree rooted at the directory simply named `/'. On Windows, there are multiple file system trees. Absolute file names often start with a drive letter followed by a colon. Windows maintains a default drive, and a default directory on each drive, which can make it difficult for a program to convert a relative file name into the absolute file name intended by the user. Windows permits referring to files on other systems by using a file name which starts with two slashes followed by a system name. ---------- Footnotes ---------- (1) Windows does permit a program to use a forward slash to separate directories when calling routines such as `fopen'. However, Windows users do not expect to type forward slashes when they enter file names, and they do not expect to see forward slashes when a file name is printed.  File: autobook.info, Node: Unix/Windows Miscellaneous, Prev: Unix/Windows Filesystems, Up: Unix/Windows Issues 15.3.5.7 Miscellaneous Issues ............................. Windows shared libraries (DLLs) are different from typical Unix shared libraries. They require special declarations for global variables declared in a shared library. Programs which use shared libraries must generally use special macros in their header files to define these appropriately. GNU libtool can help with some shared library issues, but not all. There are some Unix system features which are not supported under Windows: pseudo terminals, effective user ID, file modes with user/group/other permission, named FIFOs, an executable overriding functions called by shared libraries, `select' on anything other than sockets. There are some Windows system features which are not supported under Unix: the Windows event loop, many graphical capabilities, some aspects of the rich set of interthread communication mechanisms, the `WSAAsyncSelect' function. 
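To make the special declarations for DLL globals mentioned above a little more concrete, here is a minimal sketch of the kind of macro a header might provide. It is only an illustration: the `FOO_API' and `FOO_BUILDING_DLL' names are hypothetical, and are not part of any particular package or of the GNU Autotools.

     /* foo.h -- decorate exported globals for Windows DLLs (sketch only). */
     #if defined _WIN32 && defined FOO_BUILDING_DLL
     #  define FOO_API __declspec(dllexport)  /* compiling the DLL itself */
     #elif defined _WIN32
     #  define FOO_API __declspec(dllimport)  /* compiling a client of the DLL */
     #else
     #  define FOO_API                        /* Unix: no decoration needed */
     #endif

     /* A global variable exported from the shared library. */
     extern FOO_API int foo_verbose;

On Unix the macro expands to nothing, so the same header can be shared by both platforms.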
You should keep these issues in mind when designing and writing a program which should run on both Unix and Windows.  File: autobook.info, Node: Writing Portable C++, Next: Dynamic Loading, Prev: Writing Portable C, Up: Top 16 Writing Portable C++ with GNU Autotools ****************************************** My first task in industry was to port a large C++ application from one Unix platform to another. My colleagues immediately offered their sympathies and I remember my initial reaction-`what's the big deal?'. After all, this application used the C++ standard library, a modest subset of common Unix system calls and C++ was approaching ISO standardization. Little did I know what lay ahead--endless hurdles imposed by differences to C++ implementations in use on those platforms. Being essentially a superset of the C programming language, C++ suffers from all of the machine-level portability issues described in *Note Writing Portable C::. In addition to this, variability in the language and standard libraries present additional trouble when writing portable C++ programs. There have been comprehensive guides written on C++ portability (*note Further Reading::). This chapter will attempt to draw attention to the less portable areas of the C++ language and describe how the GNU Autotools can help you overcome these (*note How GNU Autotools Can Help::). In many instances, the best approach to multi-platform C++ portability is to simply re-express your programs using more widely supported language constructs. Fortunately, this book has been written at a time when the C++ language standard has been ratified and C++ implementations are rapidly conforming. Gladly, as time goes on the necessity for this chapter will diminish. * Menu: * Brief History of C++:: * Changeable C++:: * Compiler Quirks:: * How GNU Autotools Can Help:: * Further Reading::  File: autobook.info, Node: Brief History of C++, Next: Changeable C++, Up: Writing Portable C++ 16.1 Brief History of C++ ========================= C++ was developed in 1983 by Bjarne Stroustrup at AT&T. Stroustrup was seeking a new object-oriented language with which to write simulations. C++ has now become a mainstream systems programming language and is increasingly being used to implement free software packages. C++ underwent a lengthy standardization process and was ratified as an ISO standard in 1998. The first specification of C++ was available in a book titled `The Annotated C++ Reference Manual' by Stroustrup and Ellis, also known as the `ARM'. Since this initial specification, C++ has developed in some areas. These developments will be discussed in *Note Changeable C++::. The first C++ compiler, known as "cfront", was produced by Stroustrup at AT&T. Because of its strong ties to C and because C is such a general purpose systems language, cfront consisted of a translator from C++ to C. After translation, an existing C compiler was used to compile the intermediate C code down to machine code for almost any machine you care to mention. C++ permits overloaded functions--that is, functions with the same name but different argument lists, so cfront implemented a _name mangling_ algorithm (*note Name Mangling::) to give each function a unique name in the linker's symbol table. In 1989, the first true C++ compiler, G++, was written by Michael Tiemann of Cygnus Support. G++ mostly consisted of a new front-end to the GCC portable compiler, so G++ was able to produce code for most of the targets that GCC already supported. 
In the years following, a number of new C++ compilers were produced. Unfortunately many were unable to keep pace with the development of the language being undertaken by the standards committee. This divergence of implementations is the fundamental cause of non-portable C++ programs.  File: autobook.info, Node: Changeable C++, Next: Compiler Quirks, Prev: Brief History of C++, Up: Writing Portable C++ 16.2 Changeable C++ =================== The C++ standard encompasses the language and the interface to the standard library, including the Standard Template Library (*note Standard Template Library::). The language has evolved somewhat since the ARM was published; mostly driven by the experience of early C++ users. In this section, the newer features of C++ will be briefly explained. Alternatives to these features, where available, will be presented when compiler support is lacking. The alternatives may be used if you need to make your code work with older C++ compilers or to avoid these features until the compilers you are concerned with are mature. If you are releasing a free software package to the wider community, you may need to specify a minimum level of standards conformance for the end-user's C++ compiler, or use the unappealing alternative of using lowest-common denominator C++ features. In covering these, we'll address the following language features: * Built-in `bool' type * Exceptions * Casts * Variable scoping in `for' loops * Namespaces * The `explicit' keyword * The `mutable' keyword * The `typename' keyword * Runtime Type Identification (RTTI) * Templates * Default template arguments * Standard library headers * Standard Template Library (STL) * Menu: * Built-in bool type:: * Exceptions:: * Casts:: * Variable Scoping in For Loops:: * Namespaces:: * The explicit Keyword:: * The mutable Keyword:: * The typename Keyword:: * Runtime Type Identification (RTTI):: * Templates:: * Default template arguments:: * Standard library headers:: * Standard Template Library::  File: autobook.info, Node: Built-in bool type, Next: Exceptions, Up: Changeable C++ 16.2.1 Built-in bool type ------------------------- C++ introduced a built-in boolean data type called `bool'. The presence of this new type makes it unnecessary to use an `int' with the values `0' and `1' and improves type safety. The two possible values of a `bool' are `true' and `false'-these are reserved words. The compiler knows how to coerce a `bool' into an `int' and vice-versa. If your compiler does not have the `bool' type and `false' and `true' keywords, an alternative is to produce such a type using a `typedef' of an enumeration representing the two possible values: enum boolvals { false, true }; typedef enum boolvals bool; What makes this simple alternative attractive is that it prevents having to adjust the prolific amount of code that might use `bool' objects once your compiler supports the built-in type.  File: autobook.info, Node: Exceptions, Next: Casts, Prev: Built-in bool type, Up: Changeable C++ 16.2.2 Exceptions ----------------- Exception handling is a language feature present in other modern programming languages. Ada and Java both have exception handling mechanisms. In essence, exception handling is a means of propagating a classified error by unwinding the procedure call stack until the error is caught by a higher procedure in the procedure call chain. A procedure indicates its willingness to handle a kind of error by _catching_ it: void foo (); void func () { try { foo (); } catch (...) { cerr << "foo failed!" 
<< endl; } }

Conversely, a procedure can throw an exception when something goes wrong:

     typedef int io_error;

     void
     init ()
     {
       int fd;

       fd = open ("/etc/passwd", O_RDONLY);
       if (fd < 0)
         {
           throw io_error (errno);
         }
     }

C++ compilers tend to implement exception handling in full, or not at all. If any C++ compiler you may be concerned with does not implement exception handling, you may wish to take the lowest common denominator approach and eliminate such code from your project.

File: autobook.info, Node: Casts, Next: Variable Scoping in For Loops, Prev: Exceptions, Up: Changeable C++

16.2.3 Casts
------------

C++ introduced a collection of _named_ casting operators to replace the conventional C-style cast of the form `(type) expr'. The new casting operators are `static_cast', `reinterpret_cast', `dynamic_cast' and `const_cast'. They are reserved words. These refined casting operators are vastly preferred over conventional C casts for C++ programming. In fact, even Stroustrup recommends that the older style of C casts be banished from programming projects where at all possible (`The C++ Programming Language', 3rd edition). Reasons for preferring the new named casting operators include:

   - They provide the programmer with a mechanism for more explicitly specifying the kind of type conversion. This assists the compiler in identifying incorrect conversions.

   - They are easier to locate in source code, due to their unique syntax: `X_cast(expr)'.

If your compiler does not support the new casting operators, you may have to continue to use C-style casts--and carefully! I have seen one project agree to use macros such as the one shown below to encourage those involved in the project to adopt the new operators. While the syntax does not match that of the genuine operators, these macros make it easy to later locate and alter the casts where they appear in source code.

     #define static_cast(T,e) (T) e

File: autobook.info, Node: Variable Scoping in For Loops, Next: Namespaces, Prev: Casts, Up: Changeable C++

16.2.4 Variable Scoping in For Loops
------------------------------------

C++ has always permitted the declaration of a control variable in the initializer section of `for' loops:

     for (int i = 0; i < 100; i++)
       {
         ...
       }

The original language specification allowed the control variable to remain live until the end of the scope of the loop itself:

     for (int i = 0; i < j; i++)
       {
         if (some condition)
           break;
       }

     if (i < j)   // loop terminated early

In a later specification of the language, the control variable's scope only exists within the body of the `for' loop. The simple resolution to this incompatible change is to not use the older style. If a control variable needs to be used outside of the loop body, then the variable should be defined before the loop:

     int i;
     for (i = 0; i < j; i++)
       {
         if (some condition)
           break;
       }

     if (i < j)   // loop terminated early

File: autobook.info, Node: Namespaces, Next: The explicit Keyword, Prev: Variable Scoping in For Loops, Up: Changeable C++

16.2.5 Namespaces
-----------------

C++ namespaces are a facility for expressing a relationship between a set of related declarations such as a set of constants. Namespaces also assist in constraining _names_ so that they will not collide with other identical names in a program. Namespaces were introduced to the language in 1993 and some early compilers were known to have incorrectly implemented namespaces. Here's a small example of namespace usage:

     namespace Animals {
       class Bird {
       public:
         void fly () {}          // fly, my fine feathered friend!
       };
     };

     // Instantiate a bird.
     Animals::Bird b;

For compilers which do not correctly support namespaces it is possible to achieve a similar effect by placing related declarations into an enveloping structure. Note that this utilises the fact that C++ structure members have public protection by default:

     struct Animals {
       class Bird {
       public:
         void fly () {}          // fly, my fine feathered friend!
       };

     protected:
       // Prohibit construction.
       Animals ();
     };

     // Instantiate a bird.
     Animals::Bird b;

File: autobook.info, Node: The explicit Keyword, Next: The mutable Keyword, Prev: Namespaces, Up: Changeable C++

16.2.6 The `explicit' Keyword
-----------------------------

C++ added a new `explicit' keyword to the language. This keyword is a qualifier used when declaring constructors. When a constructor is declared as `explicit', the compiler will never call that constructor implicitly as part of a type conversion. This allows the compiler to perform stricter type checking and to prevent simple programming errors. If your compiler does not support the `explicit' keyword, you should avoid it and do without the benefits that it provides.

File: autobook.info, Node: The mutable Keyword, Next: The typename Keyword, Prev: The explicit Keyword, Up: Changeable C++

16.2.7 The `mutable' Keyword
----------------------------

C++ classes can be designed so that they behave correctly when `const' objects of those types are declared. Methods which do not alter internal object state can be qualified as `const':

     class String {
     public:
       String (const char* s);
       ~String ();

       size_t Length () const { return strlen (buffer); }

     private:
       char* buffer;
     };

This simple, though incomplete, class provides a `Length' method which guarantees, by virtue of its `const' qualifier, to never modify the object state. Thus, `const' objects of this class can be instantiated and the compiler will permit callers to use such objects' `Length' method. The `mutable' keyword enables classes to be implemented where the concept of constant objects is sensible, but details of the implementation make it difficult to declare essential methods as `const'. A common application of the `mutable' keyword is to implement classes that perform caching of internal object data. A method may not modify the logical state of the object, but it may need to update a cache--an implementation detail. The data members used to implement the cache storage need to be declared as `mutable' in order for `const' methods to alter them. Let's alter our rather farfetched `String' class so that it implements a primitive cache that avoids needing to call the `strlen' library function on each invocation of `Length ()':

     class String {
     public:
       String (const char* s) : length (-1) { /* copy string, etc. */ }
       ~String ();

       size_t Length () const
         {
           if (length < 0)
             length = strlen (buffer);
           return length;
         }

     private:
       char* buffer;
       mutable int length;
     };

When the `mutable' keyword is not available, your alternatives are to avoid implementing classes that need to alter internal data, like our caching string class, or to use the `const_cast' casting operator (*note Casts::) to cast away the `constness' of the object.

File: autobook.info, Node: The typename Keyword, Next: Runtime Type Identification (RTTI), Prev: The mutable Keyword, Up: Changeable C++

16.2.8 The `typename' Keyword
-----------------------------

The `typename' keyword was added to C++ after the initial specification and is not recognized by all compilers. It is a hint to the compiler that a name following the keyword is the name of a type.
In the usual case, the compiler has sufficient context to know that a symbol is a defined type, as it must have been encountered earlier in the compilation:

     class Foo {
     public:
       typedef int map_t;
     };

     void
     func ()
     {
       Foo::map_t m;
     }

Here, `map_t' is a type defined in class `Foo'. However, if `func' happened to be a function template, the class which contains the `map_t' type may be a template parameter. In this case, the compiler simply needs to be guided by qualifying `T::map_t' as a _type name_:

     class Foo {
     public:
       typedef int map_t;
     };

     template <class T>
     void
     func ()
     {
       typename T::map_t t;
     }

File: autobook.info, Node: Runtime Type Identification (RTTI), Next: Templates, Prev: The typename Keyword, Up: Changeable C++

16.2.9 Runtime Type Identification (RTTI)
-----------------------------------------

Run-time Type Identification, or RTTI, is a mechanism for interrogating the type of an object at runtime. Such a mechanism is useful for avoiding the dreaded _switch-on-type_ technique used before RTTI was incorporated into the language. Until recently, some C++ compilers did not support RTTI, so it is necessary to assume that it may not be widely available. Switch-on-type involves giving all classes a method that returns a special type token that an object can use to discover its own type. For example:

     class Shape {
     public:
       enum types { TYPE_CIRCLE, TYPE_SQUARE };
       virtual enum types type () = 0;
     };

     class Circle : public Shape {
     public:
       enum types type () { return TYPE_CIRCLE; }
     };

     class Square : public Shape {
     public:
       enum types type () { return TYPE_SQUARE; }
     };

Although switch-on-type is not elegant, RTTI isn't particularly object-oriented either. Given the limited number of times you ought to be using RTTI, the switch-on-type technique may be reasonable.

File: autobook.info, Node: Templates, Next: Default template arguments, Prev: Runtime Type Identification (RTTI), Up: Changeable C++

16.2.10 Templates
-----------------

Templates--known in other languages as _generic types_--permit you to write C++ classes which represent parameterized data types. A common application for _class templates_ is container classes. That is, classes which implement data structures that can contain data of any type. For instance, a well-implemented binary tree is not interested in the type of data in its nodes. Templates have undergone a number of changes since their initial inclusion in the ARM. They are a particularly troublesome C++ language element in that it is difficult to implement templates well in a C++ compiler. Here is a fictitious and overly simplistic C++ class template that implements a fixed-sized stack. It provides a pair of methods for setting (and getting) the element at the bottom of the stack. It uses the modern C++ template syntax, including the new `typename' keyword (*note The typename Keyword::).

     template <typename T>
     class Stack
     {
     public:
       T first () { return stack[9]; }
       void set_first (T t) { stack[9] = t; }

     private:
       T stack[10];
     };

C++ permits this class to be instantiated for any type you like, using calling code that looks something like this:

     int
     main ()
     {
       Stack<int> s;
       s.set_first (7);
       cout << s.first () << endl;
       return 0;
     }

An old trick for fashioning class templates is to use the C preprocessor. Here is our limited `Stack' class, rewritten to avoid C++ templates:

     #define Stack(T)                                  \
     class Stack__##T##__LINE__                        \
     {                                                 \
     public:                                           \
       T first () { return stack[0]; }                 \
       void set_first (T t) { stack[0] = t; }          \
                                                       \
     private:                                          \
       T stack[10];                                    \
     }

There are a couple of subtleties being used here that should be highlighted.
This generic class declaration uses the C preprocessor operator `##' to generate a type name which is unique amongst stacks of any type. The `__LINE__' macro is defined by the preprocessor and is used here to maintain unique names when the template is instantiated multiple times. The trailing semicolon that must follow a class declaration has been omitted from the macro.

     int
     main ()
     {
       Stack (int) s;
       s.set_first (7);
       cout << s.first () << endl;
       return 0;
     }

The syntax for instantiating a `Stack' is slightly different to modern C++, but it does work relatively well, since the C++ compiler still applies type checking after the preprocessor has expanded the macro. The main problem is that unless you go to great lengths, the generated type name (such as `Stack__int') could collide with other instances of the same type in the program.

File: autobook.info, Node: Default template arguments, Next: Standard library headers, Prev: Templates, Up: Changeable C++

16.2.11 Default template arguments
----------------------------------

A later refinement to C++ templates was the concept of _default template arguments_. Templates allow C++ types to be _parameterized_ and as such, the parameter is in essence a variable that the programmer must specify when instantiating the template. This refinement allows defaults to be specified for the template parameters. This feature is used extensively throughout the Standard Template Library (*note Standard Template Library::) to relieve the programmer from having to specify a comparison function for sorted container classes. In most circumstances, the default less-than operator for the type in question is sufficient. If your compiler does not support default template arguments, you may have to suffer without them and require that users of your class and function templates provide the default parameters themselves. Depending on how inconvenient this is, you might begrudgingly seek some assistance from the C preprocessor and define some preprocessor macros.

File: autobook.info, Node: Standard library headers, Next: Standard Template Library, Prev: Default template arguments, Up: Changeable C++

16.2.12 Standard library headers
--------------------------------

Newer C++ implementations provide a new set of standard library header files. These are distinguished from older incompatible header files by their filenames--the new headers omit the conventional `.h' extension. Classes and other declarations in the new headers are placed in the `std' namespace. Detecting the kind of header files present on any given system is an ideal application of Autoconf. For instance, the header `<vector>' declares the class `std::vector'. However, if it is not available, `<vector.h>' declares the class `vector' in the global namespace.

File: autobook.info, Node: Standard Template Library, Prev: Standard library headers, Up: Changeable C++

16.2.13 Standard Template Library
---------------------------------

The Standard Template Library (STL) is a library of containers, iterators and algorithms. I tend to think of the STL in terms of the container classes it provides, with algorithms and iterators necessary to make these containers useful. By segregating these roles, the STL becomes a powerful library--containers can store any kind of data and algorithms can use iterators to traverse the containers. There are about half a dozen STL implementations. Since the STL relies so heavily on templates, these implementations tend to inline all of their method definitions.
Thus, there are no precompiled STL libraries, and as an added bonus, you're guaranteed to get the source code to your STL implementation. Hewlett-Packard and SGI produce freely redistributable STL implementations. It is widely known that the STL can be implemented with complex C++ constructs and is a certain workout for any C++ compiler. The best policy for choosing an STL is to use a modern compiler such as GCC 2.95 or to use the STL that your vendor may have provided as part of their compiler. Unfortunately, using the STL is pretty much an `all or nothing' proposition. If it is not available on a particular system, there are no viable alternatives. There is a macro in the Autoconf macro archive (*note Autoconf macro archive::) that can test for a working STL.

File: autobook.info, Node: Compiler Quirks, Next: How GNU Autotools Can Help, Prev: Changeable C++, Up: Writing Portable C++

16.3 Compiler Quirks
====================

C++ compilers are complex pieces of software. Sadly, sometimes the details of a compiler's implementations leak out and bother the application programmer. The two aspects of C++ compiler implementation that have caused grief in the past are efficient template instantiation and name mangling. Both of these aspects will be explained.

* Menu:

* Template Instantiation::
* Name Mangling::

File: autobook.info, Node: Template Instantiation, Next: Name Mangling, Up: Compiler Quirks

16.3.1 Template Instantiation
-----------------------------

The problem with template instantiation exists because of a number of complex constraints:

   - The compiler should only generate an instance of a template once, to speed the compilation process.

   - The linker needs to be smart about where to locate the object code for instantiations produced by the compiler.

This problem is exacerbated by separate compilation--that is, the method bodies for `List<T>' may be located in a header file or in a separate compilation unit. These files may even be in a different directory than the current directory! Life is easy for the compiler when the template definition appears in the same compilation unit as the site of the instantiation--everything that is needed is known:

     template <class T>
     class List
     {
     private:
       T* head;
       T* current;
     };

     List<int> li;

This becomes significantly more difficult when the site of a template instantiation and the template definition are split between two different compilation units. In `Linkers and Loaders', Levine describes in detail how the compiler driver deals with this by iteratively attempting to link a final executable and noting, from `undefined symbol' errors produced by the linker, which template instantiations must be performed to successfully link the program. In large projects where templates may be instantiated in multiple locations, the compiler may generate instantiations multiple times for the same type. Not only does this slow down compilation, but it can result in some difficult problems for linkers which refuse to link object files containing duplicate symbols. Suppose there is the following directory layout:

     src
      |
      `--- core
      |      `--- core.cxx
      `--- modules
      |      `--- http.cxx
      `--- lib
             `--- stack.h

If the compiler generates `core.o' in the `core' directory and `libhttp.a' in the `http' directory, the final link may fail because `libhttp.a' and the final executable may contain duplicate symbols--those symbols generated as a result of both `http.cxx' and `core.cxx' instantiating, say, a `Stack<int>'.
Linkers, such as that provided with AIX will allow duplicate symbols during a link, but many will not. Some compilers have solved this problem by maintaining a template repository of template instantiations. Usually, the entire template definition is expanded with the specified type parameters and compiled into the repository, leaving the linker to collect the required object files at link time. The main concerns about non-portability with repositories center around getting your compiler to do the right thing about maintaining a single repository across your entire project. This often requires a vendor-specific command line option to the compiler, which can detract from portability. It is conceivable that Libtool could come to the rescue here in the future.  File: autobook.info, Node: Name Mangling, Prev: Template Instantiation, Up: Compiler Quirks 16.3.2 Name Mangling -------------------- Early C++ compilers mangled the names of C++ symbols so that existing linkers could be used without modification. The cfront C++ translator also mangled names so that information from the original C++ program would not be lost in the translation to C. Today, name mangling remains important for enabling overloaded function names and link-time type checking. Here is an example C++ source file which illustrates name mangling in action: class Foo { public: Foo (); void go (); void go (int where); private: int pos; }; Foo::Foo () { pos = 0; } void Foo::go () { go (0); } void Foo::go (int where) { pos = where; } int main () { Foo f; f.go (10); } $ g++ -Wall example.cxx -o example.o $ nm --defined-only example.o 00000000 T __3Foo 00000000 ? __FRAME_BEGIN__ 00000000 t gcc2_compiled. 0000000c T go__3Foo 0000002c T go__3Fooi 00000038 T main Even though `Foo' contains two methods with the same name, their argument lists (one taking an `int', one taking no arguments) help to differentiate them once their names are mangled. The `go__3Fooi' is the version which takes an `int' argument. The `__3Foo' symbol is the constructor for `Foo'. The GNU binutils package includes a utility called `c++filt' that can demangle names. Other proprietary tools sometimes include a similar utility, although with a bit of imagination, you can often demangle names in your head. $ nm --defined-only example.o | c++filt 00000000 T Foo::Foo(void) 00000000 ? __FRAME_BEGIN__ 00000000 t gcc2_compiled. 0000000c T Foo::go(void) 0000002c T Foo::go(int) 00000038 T main Name mangling algorithms differ between C++ implementations so that object files assembled by one tool chain may not be linked by another if there are legitimate reasons to prohibit linking. This is a deliberate move, as other aspects of the object file may make them incompatible--such as the calling convention used for making function calls. This implies that C++ libraries and packages cannot be practically distributed in binary form. Of course, you were intending to distribute the source code to your package anyway, weren't you?  File: autobook.info, Node: How GNU Autotools Can Help, Next: Further Reading, Prev: Compiler Quirks, Up: Writing Portable C++ 16.4 How GNU Autotools Can Help =============================== Each of the GNU Autotools contribute to C++ portability. Now that you are familiar with the issues, the following subsections will outline precisely how each tool contributes to achieving C++ portability. 
* Menu: * Testing C++ Implementations with Autoconf:: * Automake C++ support:: * Libtool C++ support::  File: autobook.info, Node: Testing C++ Implementations with Autoconf, Next: Automake C++ support, Up: How GNU Autotools Can Help 16.4.1 Testing C++ Implementations with Autoconf ------------------------------------------------ Of the GNU Autotools, perhaps the most valuable contribution to the portability of your C++ programs will come from Autoconf. All of the portability issues raised in *Note Changeable C++:: can be detected using Autoconf macros. Luc Maisonobe has written a large suite of macros for this purpose and they can be found in the Autoconf macro archive (*note Autoconf macro archive::). If any of these macros become important enough, they may become incorporated into the core Autoconf release. These macros perform their tests by compiling small fragments of C++ code to ensure that the compiler accepts them. As a side effect, these macros typically use `AC_DEFINE' to define preprocessor macros of the form `HAVE_feature', which may then be exploited through conditional compilation.  File: autobook.info, Node: Automake C++ support, Next: Libtool C++ support, Prev: Testing C++ Implementations with Autoconf, Up: How GNU Autotools Can Help 16.4.2 Automake C++ support --------------------------- Automake provides support for compiling C++ programs. In fact, it makes it practically trivial: files listed in a `SOURCES' primary may include `.c++', `.cc', `.cpp', `.cxx' or `.C' extensions and Automake will know to use the C++ compiler to build them. For a project containing C++ source code, it is necessary to invoke the `AC_PROG_CXX' macro in `configure.in' so that Automake knows how to run the most suitable compiler. Fortunately, when little details like this happen to escape you, `automake' will produce a warning: $ automake automake: Makefile.am: C++ source seen but CXX not defined in automake: Makefile.am: `configure.in'  File: autobook.info, Node: Libtool C++ support, Prev: Automake C++ support, Up: How GNU Autotools Can Help 16.4.3 Libtool C++ support -------------------------- At the moment, Libtool is the weak link in the chain when it comes to working with C++. It is very easy to naively build a shared library from C++ source using `libtool': $ libtool -mode=link g++ -o libfoo.la -rpath /usr/local/lib foo.c++ This works admirably for trivial examples, but with real code, there are several things that can go wrong: - On many architectures, for a variety of reasons, `libtool' needs to perform object linking using `ld'. Unfortunately, the C++ compiler often links in standard libraries at this stage, and using `ld' causes them to be dropped. This can be worked around (at the expense of portability) by explicitly adding these missing libraries to the link line in your `Makefile'. You could even write an Autoconf macro to probe the host machine to discover likely candidates. - The C++ compiler likes to instantiate static constructors in the library objects, which C++ programmers often rely on. Linking with `ld' will cause this to fail. The only reliable way to work around this currently is to not write C++ that relies on static constructors in libraries. You might be lucky enough to be able to link with `LD=$CXX' in your environment with some projects, but it would be prone to stop working as your project develops. 
- Libtool's inter-library dependency analysis can fail when it can't find the special runtime library dependencies added to a shared library by the C++ compiler at link time. The best way around this problem is to explicitly add these dependencies to `libtool''s link line: $ libtool -mode=link g++ -o libfoo.la -rpath /usr/local/lib foo.cxx \ -lstdc++ -lg++ Now that C++ compilers on Unix are beginning to see widespread acceptance and are converging on the ISO standard, it is becoming unacceptable for Libtool to impose such limits. There is work afoot to provide generalized multi-language and multi-compiler support into Libtool---currently slated to arrive in Libtool 1.5. Much of the work for supporting C++ is already finished at the time of writing, pending beta testing and packaging(1). ---------- Footnotes ---------- (1) Visit the Libtool home page at `http://www.gnu.org/software/libtool' for breaking news.  File: autobook.info, Node: Further Reading, Prev: How GNU Autotools Can Help, Up: Writing Portable C++ 16.5 Further Reading ==================== A number of books have been published which are devoted to the topic of C++ portability. Unfortunately, the problem with printed publications that discuss the state of C++ is that they date quickly. These publications may also fail to cover inadequacies of your particular compiler, since portability know-how is something that can only be acquired by collective experience. Instead, online guides such as the Mozilla C++ Portability Guide (1) tend to be a more useful resource. An online guide such as this can accumulate the knowledge of a wider developer community and can be readily updated as new facts are discovered. Interestingly, the Mozilla guide is aggressive in its recommendations for achieving true C++ portability: item 3, for instance, states `Don't use exceptions'. While you may not choose to follow each recommendation, there is certainly a lot of useful experience captured in this document. ---------- Footnotes ---------- (1) `http://www.mozilla.org/hacking/portable-cpp.html'  File: autobook.info, Node: Dynamic Loading, Next: Using GNU libltdl, Prev: Writing Portable C++, Up: Top 17 Dynamic Loading ****************** An increasingly popular way of adding functionality to a project is to give a program the ability to dynamically load plugins, or modules. By doing this your users can extend your project in new ways, which even you perhaps hadn't envisioned. "Dynamic Loading", then, is the process of loading compiled objects into a running program and executing some or all of the code from the loaded objects in the same context as the main executable. This chapter begins with a discussion of the mechanics of dynamic modules and how they are used, and ends with example code for very simple module loading on GNU/Linux, along with the example code for a complementary dynamically loadable module. Once you have read this chapter and understand the principles of dynamic loading, the next chapter will explain how to use GNU Autotools to write portable dynamic module loading code and address some of the shortcomings of native dynamic loading APIs. 
* Menu: * Dynamic Modules:: * Module Access Functions:: * Finding a Module:: * A Simple GNU/Linux Module Loader:: * A Simple GNU/Linux Dynamic Module::  File: autobook.info, Node: Dynamic Modules, Next: Module Access Functions, Up: Dynamic Loading 17.1 Dynamic Modules ==================== In order to dynamically load some code into your executable, that code must be compiled in some special but architecture dependent fashion. Depending on the compiler you use and the platform you are compiling for, there are different conventions you must observe in the code for the module, and for the particular combination of compiler options you need to select if the resulting objects are to be suitable for use in a dynamic module. For the rest of this chapter I will concentrate on the conventions used when compiling dynamic modules with GCC on GNU/Linux, which although peculiar to this particular combination of compiler and host architecture, are typical of the sorts of conventions you would need to observe on other architectures or with a different compiler. With GCC on GNU/Linux, you must compile each of the source files with `-fPIC'(1), the resulting objects must be linked into a loadable module with `gcc''s `-shared' option: $ gcc -fPIC -c foo.c $ gcc -fPIC -c bar.c $ gcc -shared -o baz.so foo.o bar.o This is pretty similar to how you might go about linking a shared library, except that the `baz.so' module will never be linked with a `-lbaz' option, so the `lib' prefix isn't necessary. In fact, it would probably be confusing if you used the prefix. Similarly, there is no constraint to use any particular filename suffix, but it is sensible to use the target's native shared library suffix (GNU/Linux uses `.so') to make it obvious that the compiled file is some sort of shared object, and not a normal executable. Apart from that, the only difference between a shared library built for linking at compile-time and a dynamic module built for loading at run-time is that the module must provide known "entry points" for the main executable to call. That is, when writing code destined for a dynamic module, you must provide functions or variables with known names and semantics that the main executable can use to access the functionality of the module. This _is_ different to the function and variable names in a regular library, which are already known when you write the client code, since the libraries are always written _before_ the code that uses them; a runtime module loading system must, by definition, be able to cope with modules that are written _after_ the code that uses those modules. ---------- Footnotes ---------- (1) Not essential but will be slower without this option, see *Note Position Independent Code::.  File: autobook.info, Node: Module Access Functions, Next: Finding a Module, Prev: Dynamic Modules, Up: Dynamic Loading 17.2 Module Access Functions ============================ In order to access the functionality of dynamic modules, different architectures provide various APIs to bring the code from the module into the address space of the loading program, and to access the symbols exported by that module. GNU/Linux uses the dynamic module API introduced by Sun's Solaris operating system, and widely adopted (and adapted!) by the majority of modern Unices(1). The interface consists of four functions. In practice, you really ought not to use these functions, since you would be locking your project into this single API, and the class of machines that supports it. 
This description is over-simplified to serve as a comparison with the fully portable libltdl API described in *Note Using GNU libltdl::. The minutiae are not discussed, because therein lie the implementation peculiarities that spoil the portability of this API. As they stand, these descriptions give a good overview of how the functions work at a high level, and are broadly applicable to the various implementations in use. If you are curious, the details of your machine's particular dynamic loading API will be available in its system manual pages.

 -- Function: void * dlopen (const char *FILENAME, int FLAG)
     This function brings the code from a named module into the address space of the running program that calls it, and returns a handle which is used by the other API functions. If FILENAME is not an absolute path, GNU/Linux will search for it in directories named in the `LD_LIBRARY_PATH' environment variable, and then in the standard library directories before giving up.

     The flag argument is made by `OR'ing together various flag bits defined in the system headers. On GNU/Linux, these flags are defined in `dlfcn.h':

    `RTLD_LAZY'
          Resolve undefined symbols when they are first used.

    `RTLD_NOW'
          If all symbols cannot be resolved when the module is loaded, `dlopen' will fail and return `NULL'.

    `RTLD_GLOBAL'
          All of the global symbols in the loaded module will be available to resolve undefined symbols in subsequently loaded modules.

 -- Function: void * dlsym (void *HANDLE, char *NAME)
     Returns the address of the named symbol in the module which returned HANDLE when it was `dlopen'ed. You must cast the returned address to a known type before using it.

 -- Function: int dlclose (void *HANDLE)
     When you are finished with a particular module, it can be removed from memory using this function.

 -- Function: const char * dlerror (void)
     If any of the other three API calls fails, this function returns a string which describes the last error that occurred.

In order to use these functions on GNU/Linux, you must `#include <dlfcn.h>' for the function prototypes, and link with `-ldl' to provide the API implementation. Other Unices use `-ldld' or provide the implementation of the API inside the standard C library.

---------- Footnotes ----------

(1) HP-UX being the most notable exception.

File: autobook.info, Node: Finding a Module, Next: A Simple GNU/Linux Module Loader, Prev: Module Access Functions, Up: Dynamic Loading

17.3 Finding a Module
=====================

When you are writing a program that will load dynamic modules, a major stumbling block is writing the code to find the modules you wish to load. If you are worried about portability (which you must be, or you wouldn't be reading this book!), you can't rely on the default search algorithm of the vendor `dlopen' function, since it varies from implementation to implementation. You can't even rely on the name of the module, since the module suffix will vary according to the conventions of the target host (though you could insist on a particular suffix for modules you are willing to load). Unfortunately, this means that you will need to implement your own searching algorithm and always use an absolute pathname when you call `dlopen'. A widely adopted mechanism is to look for each module in directories listed in an environment variable specific to your application, allowing your users to inform the application of the location of any modules they have written.
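As a rough sketch of such an environment variable search (the variable name `FOO_MODULE_PATH', the buffer size and the hard coded `.so' suffix are all assumptions made for this illustration, not part of any real package), the loader might walk a colon separated path like this:

     #include <stdlib.h>
     #include <string.h>
     #include <dlfcn.h>

     /* Try to dlopen NAME from each directory listed in the
        hypothetical FOO_MODULE_PATH environment variable. */
     void *
     module_open (const char *name)
     {
       const char *path = getenv ("FOO_MODULE_PATH");
       char buffer[1024];
       void *module = NULL;

       while (path && *path && !module)
         {
           size_t len = strcspn (path, ":");

           /* Build "<directory>/<name>.so" and attempt to load it. */
           if (len && len + strlen (name) + 5 < sizeof buffer)
             {
               sprintf (buffer, "%.*s/%s.so", (int) len, path, name);
               module = dlopen (buffer, RTLD_NOW);
             }

           path += len;
           if (*path == ':')
             ++path;
         }

       return module;
     }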
If a suitable module is not yet found, the application would then default to looking in a list of standard locations - say, in a subdirectory of the user's home directory, and finally a subdirectory of the application installation tree. For application `foo', you might use `/usr/lib/foo/module.so' - that is, `$(pkglibdir)/module.so' if you are using Automake. This algorithm can be further improved: * If you try different module suffixes to the named module for every directory in the search path, which will avoid locking your code into a subset of machines that use the otherwise hardcoded module suffix. With this in place you could ask the module loader for module `foomodule', and if it was not found in the first search directory, the module loader could try `foomodule.so', `foomodule.sl' and `foomodule.dll' before moving on to the next directory. * You might also provide command line options to your application which will preload modules before starting the program proper or to modify the module search path. For example, GNU M4, version 1.5, will have the following dynamic loading options: $ m4 --help Usage: m4 [OPTION]... [FILE]... ... Dynamic loading features: -M, --module-directory=DIRECTORY add DIRECTORY to the search path -m, --load-module=MODULE load dynamic MODULE from M4MODPATH ... Report bugs to .  File: autobook.info, Node: A Simple GNU/Linux Module Loader, Next: A Simple GNU/Linux Dynamic Module, Prev: Finding a Module, Up: Dynamic Loading 17.4 A Simple GNU/Linux Module Loader ===================================== Something to be aware of, is that when your users write dynamic modules for your application, they are subject to the interface you design. It is very important to design a dynamic module interface that is clean and functional before other people start to write modules for your code. If you ever need to change the interface, your users will need to rewrite their modules. Of course you can carefully change the interface to retain backwards compatibility to save your users the trouble of rewriting their modules, but that is no substitute for designing a good interface from the outset. If you do get it wrong, and subsequently discover that the design you implemented is misconceived (this is the voice of experience speaking!), you will be left with a difficult choice: try to tweak the broken API so that it does work while retaining backwards compatibility, and the maintenance and performance penalty that brings? Or start again with a fresh design born of the experience gained last time, and rewrite all of the modules you have so far? If there are other applications which have similar module requirements to you, it is worth writing a loader that uses the same interface and semantics. That way, you will (hopefully) be building from a known good API design, and you will have access to all the modules for that other application too, and vice versa. For the sake of clarity, I have sidestepped any issues of API design for the following example, by choosing this minimal interface: -- Function: int run (const char *ARGUMENT) When the module is successfully loaded a function with the following prototype is called with the argument given on the command line. If this entry point is found and called, but returns `-1', an error message is displayed by the calling program. 
Here's a simplistic but complete dynamic module loading application you can build for this interface with the GNU/Linux dynamic loading API:

     #include <stdio.h>
     #include <stdlib.h>
     #ifndef EXIT_FAILURE
     # define EXIT_FAILURE 1
     # define EXIT_SUCCESS 0
     #endif

     #include <limits.h>
     #ifndef PATH_MAX
     # define PATH_MAX 255
     #endif

     #include <string.h>
     #include <unistd.h>

     #include <dlfcn.h>
     /* This is missing from very old Linux libc. */
     #ifndef RTLD_NOW
     # define RTLD_NOW 2
     #endif

     typedef int entrypoint (const char *argument);

     /* Save and return a copy of the dlerror() error message,
        since the next API call may overwrite the original. */
     static char *dlerrordup (char *errormsg);

     int
     main (int argc, const char *argv[])
     {
       char modulepath[1+ PATH_MAX];
       char *errormsg = NULL;
       void *module = NULL;
       entrypoint *run = NULL;
       int errors = 0;

       if (argc != 3)
         {
           fprintf (stderr, "USAGE: main MODULENAME ARGUMENT\n");
           exit (EXIT_FAILURE);
         }

       /* Set the module search path. */
       getcwd (modulepath, PATH_MAX);
       strcat (modulepath, "/");
       strcat (modulepath, argv[1]);

       /* Load the module. */
       module = dlopen (modulepath, RTLD_NOW);
       if (!module)
         {
           strcat (modulepath, ".so");
           module = dlopen (modulepath, RTLD_NOW);
         }
       if (!module)
         errors = 1;

       /* Find the entry point. */
       if (!errors)
         {
           run = (entrypoint *) dlsym (module, "run");

           /* In principle, run might legitimately be NULL, so
              I don't use run == NULL as an error indicator. */
           errormsg = dlerrordup (errormsg);
           if (errormsg != NULL)
             errors = dlclose (module);
         }

       /* Call the entry point function. */
       if (!errors)
         {
           int result = (*run) (argv[2]);
           if (result < 0)
             errormsg = strdup ("module entry point execution failed");
           else
             printf ("\t=> %d\n", result);
         }

       /* Unload the module, now that we are done with it. */
       if (!errors)
         errors = dlclose (module);

       if (errors)
         {
           /* Diagnose the encountered error. */
           errormsg = dlerrordup (errormsg);

           if (!errormsg)
             {
               fprintf (stderr, "%s: dlerror() failed.\n", argv[0]);
               return EXIT_FAILURE;
             }
         }

       if (errormsg)
         {
           fprintf (stderr, "%s: %s.\n", argv[0], errormsg);
           free (errormsg);
           return EXIT_FAILURE;
         }

       return EXIT_SUCCESS;
     }

     /* Be careful to save a copy of the error message,
        since the next API call may overwrite the original. */
     static char *
     dlerrordup (char *errormsg)
     {
       char *error = (char *) dlerror ();
       if (error && !errormsg)
         errormsg = strdup (error);
       return errormsg;
     }

You would compile this on a GNU/Linux machine like so:

     $ gcc -o simple-loader simple-loader.c -ldl

However, despite making a reasonable effort with this loader, and ignoring features which could easily be added, it still has some seemingly insoluble problems:

  1. It will fail if the user's platform doesn't have the `dlopen'
     API.  This also includes platforms which have no shared libraries.

  2. It relies on the implementation to provide a working self-opening
     mechanism.  `dlopen (NULL, RTLD_NOW)' is very often unimplemented,
     or buggy, and without that, it is impossible to access the symbols
     of the main program through the `dlsym' mechanism.

  3. It is quite difficult to figure out at compile time whether the
     target host needs `libdl.so' to be linked.

I will use GNU Autotools to tackle these problems in the next chapter.
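In the meantime, problem 3 can at least be eased by hand with a small Autoconf test.  The following `configure.in' fragment is only a sketch (the `LIBADD_DLOPEN' output variable is a name invented for this illustration, not a standard macro):

     # Is dlopen in the C library, or do we need to add -ldl?  (sketch)
     AC_CHECK_FUNC(dlopen, [],
       [AC_CHECK_LIB(dl, dlopen, [LIBADD_DLOPEN="-ldl"])])
     AC_SUBST(LIBADD_DLOPEN)

A `Makefile.am' link line could then use `@LIBADD_DLOPEN@' where the compilation command above hard-coded `-ldl'.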

File: autobook.info, Node: A Simple GNU/Linux Dynamic Module, Prev: A Simple GNU/Linux Module Loader, Up: Dynamic Loading

17.5 A Simple GNU/Linux Dynamic Module
======================================

As an appetiser for working with dynamic loadable modules, here is a minimal module written for the interface used by the loader in the previous section:

     #include <stdio.h>

     int
     run (const char *argument)
     {
       printf ("Hello, %s!\n", argument);
       return 0;
     }

Again, to compile on a GNU/Linux machine:

     $ gcc -fPIC -c simple-module.c
     $ gcc -shared -o simple-module.so simple-module.o

Having compiled both loader and module, a test run looks like this:

     $ ./simple-loader simple-module World
     Hello, World!
             => 0

If you have a GNU/Linux system, you should experiment with the simple examples from this chapter to get a feel for the relationship between a dynamic module loader and its modules - tweak the interface a little; try writing another simple module.  If you have a machine with a different dynamic loading API, try porting these examples to that machine to get a feel for the kinds of problems you would encounter if you wanted a module system that would work with both APIs.

The next chapter will do just that, and develop these examples into a fully portable module loading system with the aid of GNU Autotools.  In *Note A Module Loading Subsystem::, I will add a more realistic module loader into the Sic project last discussed in *Note A Large GNU Autotools Project::.


File: autobook.info, Node: Using GNU libltdl, Next: Advanced GNU Automake Usage, Prev: Dynamic Loading, Up: Top

18 Using GNU libltdl
********************

Now that you are conversant with the mechanics and advantages of using dynamic run time modules in your projects, you can probably already imagine a hundred and one uses for a plugin architecture.  As I described in the last chapter, there are several gratuitously different architecture dependent dynamic loading APIs, and yet several more shortcomings in many of those.  If you have Libtool installed on your machine, then you almost certainly have libltdl, which has shipped as part of the standard Libtool distribution since release 1.3.  In this chapter I will describe "GNU libltdl", the *L*ib*T*ool *D*ynamic *L*oading *lib*rary, and explain some of its features and how to make use of them.

* Menu:

* Introducing libltdl::
* Using libltdl::
* Portable Library Design::
* dlpreopen Loading::
* User Module Loaders::


File: autobook.info, Node: Introducing libltdl, Next: Using libltdl, Up: Using GNU libltdl

18.1 Introducing libltdl
========================

Probably the best known and supported Unix run time linking API is the `dlopen' interface, used by Solaris and GNU/Linux amongst others, and discussed earlier in *Note Dynamic Loading::.  libltdl is based on the `dlopen' API, with a few small differences and several enhancements.  The following libltdl API functions are declared in `ltdl.h':

 -- Function: lt_dlhandle lt_dlopen (const char *FILENAME)
     This function brings the code from a named module into the address
     space of the running program that calls it, and returns a handle
     which is used by the other API functions.  If FILENAME is not an
     absolute path, libltdl will search for it in directories named in
     the `LTDL_LIBRARY_PATH' environment variable, and then in the
     standard library directories before giving up.  It is safe to call
     this function many times; libltdl will keep track of the number of
     calls made, but will require the same number of calls to
     `lt_dlclose' to actually unload the module.
 -- Function: lt_ptr_t lt_dlsym (lt_dlhandle HANDLE, const char *NAME)
     Returns the address of the named symbol in the module which
     returned HANDLE when it was `lt_dlopen'ed.  You must cast the
     returned address to a known type before using it.

 -- Function: int lt_dlclose (lt_dlhandle HANDLE)
     When you are finished with a particular module, it can be removed
     from memory using this function.

 -- Function: const char * lt_dlerror (void)
     If any of the libltdl API calls fail, this function returns a
     string which describes the last error that occurred.

In order to use these functions, you must `#include <ltdl.h>' for the function prototypes, and link with `-lltdl' to provide the API implementation.  Assuming you link your application with `libtool', and that you call the necessary macros from your `configure.in' (*note Using libltdl::), then any host specific dependent libraries (for example, `libdl' on GNU/Linux) will automatically be added to the final link line by `libtool'.

You are not limited to using only Libtool compiled modules when you use libltdl.  If you write the module loader carefully, it will be able to load native modules too--although you will not be able to preload non-Libtool modules (*note dlpreopen Loading::).  The loader in *Note Module Loader: libltdl Module Loader. is written in this way.  It is useful to be able to load modules flexibly like this, because you don't tie your users into using Libtool for any modules they write.

Compare the descriptions of the functions above with the API described in *Note Module Access Functions::.  You will notice that they are very similar.

Back-linking is the process of resolving any remaining symbols by referencing back into the application that loads the library at runtime - a mechanism implemented on almost all modern Unices.  For instance, your main application may provide some utility function, `my_function', which you want a module to have access to.  There are two ways to do that:

   * You could use Libtool to link your application, using the
     `-export-dynamic' option to ensure that the global application
     symbols are available to modules.  When libltdl loads a module
     into an application compiled like this, it will "back-link"
     symbols from the application to resolve any otherwise undefined
     symbols in a module.  When the module is `lt_dlopen'ed, libltdl
     will arrange for calls to `my_function' in the module to execute
     the `my_function' implementation in the application.

     If you need this functionality, relying on back-linking is the
     simplest way to achieve it.  Unfortunately, this simplicity is at
     the expense of portability: some platforms have no support for
     back-linking at all, and others will not allow a module to be
     created with unresolved symbols.  Nevertheless, libltdl allows you
     to do this if you want to.

   * You could split the code that implements the symbols you need to
     share with modules into a separate library.  This library would
     then be used to resolve the symbols you wish to share, by linking
     it into modules and application alike.

     The definition of `my_function' would be compiled separately into
     a library, `libmy_function.la'.  References to `my_function' from
     the application would be resolved by linking it with
     `libmy_function.la', and the library would be installed so that
     modules which need to call `my_function' would be able to resolve
     the symbol by linking with `-lmy_function'.

     This method requires support for neither back-linking nor
     unresolved link time symbols from the host platform.
The disadvantage is that when you realise you need this functionality, it may be quite complicated to extract the shared functionality from the application to be compiled in a stand alone library. On those platforms which support "back-linking", libltdl can be configured to resolve external symbol references in a dynamic module with any global symbols already present in the main application. This has two implications for the libltdl API: * There is no need to pass `RTLD_GLOBAL' (or equivalent) to `lt_dlopen' as might be necessary with the native module loading API. * You should be aware that your application will not work on some platforms--most notably, Windows and AIX--if you rely on a back-linking. Similarly, there is no need to specify whether the module should be integrated into the application core before `lt_dlopen' returns, or else when the symbols it provides are first referenced. libltdl will use "lazy loading" if it is supported, since this is a slight performance enhancement, or else fall back to loading everything immediately. Between this feature and the support of back-linking, there is no need to pass flags into `lt_dlopen' as there is with most native `dlopen' APIs. There are a couple of other important API functions which you will need when using libltdl: -- Function: int lt_dlinit (void) You must call this function to initialise libltdl before calling any of the other libltdl API functions. It is safe to call this function many times, libltdl will keep track of the number of calls made, but will require the same number of calls to `lt_dlexit' to actually recycle the library resources. If you don't call `lt_dlinit' before any other API call, the other calls, including `lt_dlerror', will return their respective failure codes (`NULL' or `1', as appropriate). -- Function: int lt_dlexit (void) When you are done with libltdl and all dynamic modules have been unloaded you can call this function to finalise the library, and recycle its resources. If you forget to unload any modules, the call to `lt_dlexit' will `lt_dlclose' them for you. Another useful departure that the libltdl API makes from a vanilla `dlopen' implementation is that it also will work correctly with old K&R C compilers, by virtue of not relying on `void *' pointers. libltdl uses `lt_dlhandle's to pass references to loaded modules, and this also improves ANSI C compiler's type checking compared to the untyped addresses typically used by native `dlopen' APIs.  File: autobook.info, Node: Using libltdl, Next: Portable Library Design, Prev: Introducing libltdl, Up: Using GNU libltdl 18.2 Using libltdl ================== Various aspects of libltdl are addressed in the following subsections, starting with a step by step guide to adding libltdl to your own GNU Autotools projects (*note Configury: libltdl Configury.) and an explanation of how to initialise libltdl's memory management (*note Memory Management: libltdl Memory Management.). After this comes a simple libltdl module loader which you can use as the basis for a module loader in your own projects (*note Module Loader: libltdl Module Loader.), including an explanation of how libltdl finds and links any native dynamic module library necessary for the host platform. The next subsection (*note Dependent Libraries: libltdl Dependent Libraries.) deals with the similar problem of dynamic modules which depend on other libraries - take care not to confuse the problems discussed in the previous two subsections. 
Following that, the source code for and use of a simple dynamic module for use with this section's module loader is detailed (*note Dynamic Module: libltdl Dynamic Module.). * Menu: * libltdl Configury:: * libltdl Memory Management:: * libltdl Module Loader:: * libltdl Dependent Libraries:: * libltdl Dynamic Module::  File: autobook.info, Node: libltdl Configury, Next: libltdl Memory Management, Up: Using libltdl 18.2.1 Configury ---------------- Because libltdl supports so many different platforms(1) it needs to be configured for the host platform before it can be used. The path of least resistance to successfully integrating libltdl into your own project, dictates that the project use Libtool for linking its module loader with libltdl. This is certainly the method I use and recommend, and is the method discussed in this chapter. However, I have seen projects which did not use Libtool (specifically because Libtool's poor C++ support made it difficult to adopt), but which wanted the advantages of libltdl. It is possible to use libltdl entirely without Libtool, provided you take care to use the configuration macros described here, and use the results of those running these macros to determine how to link your application with libltdl. The easiest way to add libltdl support to your own projects is with the following simple steps: 1. You must add the libltdl sources to your project distribution. If you are not already using Libtool in some capacity for your project, you should add `AC_PROG_LIBTOOL'(2) to your `configure.in'. That done, move to the top level directory of the project, and execute: $ libtoolize --ltdl $ ls -F aclocal.m4 configure.in libltdl/ $ ls libltdl/ COPYING.LIB README aclocal.m4 configure.in stamp-h.in Makefile.am acconfig.h config.h.in ltdl.c Makefile.in acinclude.m4 configure ltdl.h 2. libltdl has its own configuration to run in addition to the configuration for your project, so you must be careful to call the subdirectory configuration from your top level `configure.in': AC_CONFIG_SUBDIRS(libltdl) And you must ensure that Automake knows that it must descend into the libltdl source directory at make time, by adding the name of that subdirectory to the `SUBDIRS' macro in your top level `Makefile.am': SUBDIRS = libltdl src 3. You must also arrange for the code of libltdl to be linked into your application. There are two ways to do this: as a regular Libtool library; or as a convenience library (*note Creating Convenience Libraries: Creating Convenience Libraries with libtool.). Either way there are catches to be aware of, which will be addressed in a future release. Until libltdl is present on the average user's machine, I recommend building a convenience library. You can do that in `configure.in': AC_LIBLTDL_CONVENIENCE AC_PROG_LIBTOOL The main thing to be aware of when you follow these steps, is that you can only have one copy of the code from libltdl in any application. Once you link the objects into a library, that library will not work with any other library which has also linked with libltdl, or any application which has its own copy of the objects. If you were to try, the libltdl symbol names would clash. The alternative is to substitute `AC_LIBLTDL_CONVENIENCE' with `AC_LIBLTDL_INSTALLABLE'. Unfortunately there are currently many potential problems with this approach. 
This macro will try to find an already installed libltdl and use that, or else the embedded libltdl will be built as a standard shared library, which must be installed along with any libraries or applications that use it. There is no testing for version compatibility, so it is possible that two or more applications that use this method will overwrite one anothers copies of the installed libraries and headers. Also, the code which searches for the already installed version of libltdl tends not to find the library on many hosts, due to the native libraries it depends on being difficult to predict. Both of the `AC_LIBLTDL_...' macros set the values of `INCLTDL' and `LIBLTDL' so that they can be used to add the correct include and library flags to the compiler in your Makefiles. They are not substituted by default. If you need to use them you must also add the following macros to your `configure.in': AC_SUBST(INCLTDL) AC_SUBST(LIBLTDL) 4. Many of the libltdl supported hosts require that a separate shared library be linked into any application that uses dynamic runtime loading. libltdl is wrapped around this native implementation on these hosts, so it is important to link that library too. Adding support for module loading through the wrapped native implementation is independent of Libtool's determination of how shared objects are compiled. On GNU/Linux, you would need to link your program with libltdl and `libdl', for example. Libtool installs a macro, `AC_LIBTOOL_DLOPEN', which adds tests to your `configure' that will search for this native library. Whenever you use libltdl you should add this macro to your `configure.in' before `AC_PROG_LIBTOOL': AC_LIBTOOL_DLOPEN AC_LIBLTDL_CONVENIENCE AC_PROG_LIBTOOL ... AC_SUBST(INCLTDL) AC_SUBST(LIBLTDL) `AC_LIBTOOL_DLOPEN' takes care to substitute a suitable value of `LIBADD_DL' into your `Makefile.am', so that your code will compile correctly wherever the implementation library is discovered: INCLUDES += @INCLTDL@ bin_PROGRAMS = your_app your_app_SOURCES = main.c support.c your_app_LDADD = @LIBLTDL@ @LIBADD_DL@ Libtool 1.4 has much improved inter-library dependency tracking code which no longer requires `@LIBADD_DL@' be explicitly referenced in your `Makefile.am'. When you install libltdl, Libtool 1.4 (or better) will make a note of any native library that libltdl depends on - linking it automatically, provided that you link `libltdl.la' with `libtool'. You might want to omit the `@LIBADD_DL@' from your `Makefile.am' in this case, if seeing the native library twice (once as a dependee of libltdl, and again as an expansion of `@LIBADD_DL@') on the link line bothers you. Beyond this basic configury setup, you will also want to write some code to form a module loading subsystem for your project, and of course some modules! That process is described in *Note Module Loader: libltdl Module Loader. and *Note Dynamic Module: libltdl Dynamic Module. respectively. ---------- Footnotes ---------- (1) As I always like to say, `from BeOS to Windows!'. And yes, I do think that it is a better catchphrase than `from AIX to Xenix'! (2) Use `AM_PROG_LIBTOOL' if you have `automake' version 1.4 or older or a version of `libtool' earlier than 1.4.  File: autobook.info, Node: libltdl Memory Management, Next: libltdl Module Loader, Prev: libltdl Configury, Up: Using libltdl 18.2.2 Memory Management ------------------------ Internally, libltdl maintains a list of loaded modules and symbols on the heap. 
If you find that you want to use it with a project that has an unusual memory management API, or if you simply want to use a debugging `malloc', libltdl provides hook functions for you to set the memory routines it should call.  The way to use these hooks is to point them at the memory allocation routines you want libltdl to use before calling any of its API functions:

     lt_dlmalloc = (lt_ptr_t (*) PARAMS((size_t))) mymalloc;
     lt_dlfree   = (void (*) PARAMS((lt_ptr_t))) myfree;

Notice that the function names need to be cast to the correct type before assigning them to the hook symbols.  You need to do this because the prototypes of the functions you want libltdl to use will vary slightly from libltdl's own function pointer types--libltdl uses `lt_ptr_t' for compatibility with K&R compilers, for example.


File: autobook.info, Node: libltdl Module Loader, Next: libltdl Dependent Libraries, Prev: libltdl Memory Management, Up: Using libltdl

18.2.3 Module Loader
--------------------

This section contains a fairly minimal libltdl based dynamic module loader that you can use as a base for your own code.  It implements the same API as the simple module loader in *Note A Simple GNU/Linux Module Loader::, and, because of the way libltdl is written, it is able to load modules written for that loader, too.

The only part of this code which is arguably more complex than the equivalent from the previous example loader is that `lt_dlinit' and `lt_dlexit' must be called in the appropriate places.  In contrast, the module search path initialisation is much simplified thanks to another small improvement in the libltdl API:

 -- Function: int lt_dlsetsearchpath (const char *PATH)
     This function takes a colon separated list of directories, which
     will be the first directories libltdl will search when trying to
     locate a dynamic module.

Another new API function is used to actually load the module:

 -- Function: lt_dlhandle lt_dlopenext (const char *FILENAME)
     This function is used in precisely the same way as `lt_dlopen'.
     However, if the search for the named module by exact match against
     FILENAME fails, it will try again with a `.la' extension, and then
     the native shared library extension (`.sl' on HP-UX, for example).

The advantage of using `lt_dlopenext' to load dynamic modules is that it will work equally well when loading modules not compiled with Libtool.  Also, by passing the module name parameter with no extension, this function allows module coders to manage without Libtool.

     #include <stdio.h>
     #include <stdlib.h>
     #ifndef EXIT_FAILURE
     # define EXIT_FAILURE 1
     # define EXIT_SUCCESS 0
     #endif

     #include <limits.h>
     #ifndef PATH_MAX
     # define PATH_MAX 255
     #endif

     #include <string.h>
     #include <ltdl.h>

     #ifndef MODULE_PATH_ENV
     # define MODULE_PATH_ENV "MODULE_PATH"
     #endif

     typedef int entrypoint (const char *argument);

     /* Save and return a copy of the dlerror() error message,
        since the next API call may overwrite the original. */
     static char *dlerrordup (char *errormsg);

     int
     main (int argc, const char *argv[])
     {
       char *errormsg = NULL;
       lt_dlhandle module = NULL;
       entrypoint *run = NULL;
       int errors = 0;

       if (argc != 3)
         {
           fprintf (stderr, "USAGE: main MODULENAME ARGUMENT\n");
           exit (EXIT_FAILURE);
         }

       /* Initialise libltdl. */
       errors = lt_dlinit ();

       /* Set the module search path. */
       if (!errors)
         {
           const char *path = getenv (MODULE_PATH_ENV);

           if (path != NULL)
             errors = lt_dlsetsearchpath (path);
         }

       /* Load the module. */
       if (!errors)
         module = lt_dlopenext (argv[1]);

       /* Find the entry point. */
       if (module)
         {
           run = (entrypoint *) lt_dlsym (module, "run");

           /* In principle, run might legitimately be NULL, so
              I don't use run == NULL as an error indicator
              in general. */
           errormsg = dlerrordup (errormsg);
           if (errormsg != NULL)
             {
               errors = lt_dlclose (module);
               module = NULL;
             }
         }
       else
         errors = 1;

       /* Call the entry point function. */
       if (!errors)
         {
           int result = (*run) (argv[2]);
           if (result < 0)
             errormsg = strdup ("module entry point execution failed");
           else
             printf ("\t=> %d\n", result);
         }

       /* Unload the module, now that we are done with it. */
       if (!errors)
         errors = lt_dlclose (module);

       if (errors)
         {
           /* Diagnose the encountered error. */
           errormsg = dlerrordup (errormsg);

           if (!errormsg)
             {
               fprintf (stderr, "%s: dlerror() failed.\n", argv[0]);
               return EXIT_FAILURE;
             }
         }

       /* Finished with ltdl now. */
       if (!errors)
         if (lt_dlexit () != 0)
           errormsg = dlerrordup (errormsg);

       if (errormsg)
         {
           fprintf (stderr, "%s: %s.\n", argv[0], errormsg);
           free (errormsg);
           exit (EXIT_FAILURE);
         }

       return EXIT_SUCCESS;
     }

     /* Be careful to save a copy of the error message,
        since the next API call may overwrite the original. */
     static char *
     dlerrordup (char *errormsg)
     {
       char *error = (char *) lt_dlerror ();
       if (error && !errormsg)
         errormsg = strdup (error);
       return errormsg;
     }

This file must be compiled with `libtool', so that the dependent libraries (`libdl.so' on my GNU/Linux machine) are handled correctly, and so that the dlpreopen support is compiled in correctly (*note dlpreopen Loading::):

     $ libtool --mode=link gcc -g -o ltdl-loader -dlopen self \
       -rpath /tmp/lib ltdl-loader.c -lltdl
     gcc -g -o ltdl-loader -Wl,--rpath,/tmp/lib ltdl-loader.c -lltdl -ldl

By using _both_ `lt_dlopenext' and `lt_dlsetsearchpath', this module loader will make a valiant attempt at loading anything you pass to it - including the module I wrote for the simple GNU/Linux module loader earlier (*note A Simple GNU/Linux Dynamic Module::).  Here, you can see the new `ltdl-loader' loading and using the `simple-module' module from *Note A Simple GNU/Linux Dynamic Module:::

     $ ltdl-loader simple-module World
     Hello, World!
             => 0


File: autobook.info, Node: libltdl Dependent Libraries, Next: libltdl Dynamic Module, Prev: libltdl Module Loader, Up: Using libltdl

18.2.4 Dependent Libraries
--------------------------

On modern Unices(1), the shared library architecture is smart enough to encode all of the other libraries that a dynamic module depends on as part of the module file itself.  On these architectures, when you `lt_dlopen' a module, if any shared libraries it depends on are not already loaded into the main application, the system runtime loader will ensure that they too are loaded so that all of the module's symbols are satisfied.

Less well endowed systems(2) cannot do this by themselves.  Since Libtool release 1.4, libltdl uses the record of inter-library dependencies in the libtool pseudo-library (*note Introducing GNU Libtool::) to manually load dependent libraries as part of the `lt_dlopen' call.

An example of the sort of difficulties that can arise from trying to load a module that has a complex library dependency chain is typified by a problem I encountered with GNU Guile a few years ago: earlier releases of the libXt Athena widget wrapper library for GNU Guile failed to load on my a.out based GNU/Linux system.  When I tried to load the module into a running Guile interpreter, it couldn't resolve any of the symbols that referred to libXt.
I soon discovered that the libraries that the module depended upon were not loaded by virtue of loading the module itself.  I needed to build the interpreter itself with libXt and rely on back-linking to resolve the `Xt' references when I loaded the module.  This pretty much defeated the whole point of having the wrapper library as a module.  Had Libtool been around in those days, it would have been able to load libXt as part of the process of loading the module.

If you program with the X window system, you will know that the list of libraries you need to link into your applications soon grows to be very large.  Worse, if you want to load an X extension module into a non-X aware application, you will encounter the problems I found with Guile, unless you link your module with `libtool' and dynamically load it with libltdl.

At the moment, the various X Window libraries are not built with libtool, so you must be sure to list all of the dependencies when you link a module.  By doing this, Libtool can use the list to check that all of the libraries required by a module are loaded correctly as part of the call to `lt_dlopen', like this:

     $ libtool --mode=link gcc -o module.so -module -avoid-version \
       source.c -L/usr/X11R6/lib -lXt -lX11
     ...
     $ file .libs/module.so
     .libs/module.so: ELF 32-bit LSB shared object, Intel 80386,
     version 1, not stripped
     $ ldd .libs/module.so
         libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x4012f00)
         libXt.so.6 => /usr/X11R6/lib/libXt.so.6 (0x4014500)

Or, if you are using Automake:

     ...
     lib_LTLIBRARIES    = module.la
     module_la_SOURCES  = source.c
     module_la_LDFLAGS  = -module -avoid-version -L$(X11LIBDIR)
     module_la_LIBADD   = -lXt -lX11
     ...

It is especially important to be aware of this if you develop on a modern platform which correctly handles these dependencies natively (as in the example above), since the code may still work on your machine even if you don't correctly note all of the dependencies.  It will only break if someone tries to use it on a machine that needs Libtool's help for it to work, thus reducing the portability of your project.

   ---------- Footnotes ----------

   (1) Architectures which use ELF and ECOFF binary format, for example.

   (2) Those which use a.out binary format, for example.


File: autobook.info, Node: libltdl Dynamic Module, Prev: libltdl Dependent Libraries, Up: Using libltdl

18.2.5 Dynamic Module
---------------------

Writing a module for use with the libltdl based dynamic module loader is no more involved than before: it must provide the correct entry points, as expected by the simple API I designed - the `run' entry point described in *Note A Simple GNU/Linux Module Loader::.  Here is such a module, `ltdl-module.c':

     #include <stdio.h>
     #include <stdlib.h>
     #include <math.h>

     #define run ltdl_module_LTX_run

     int
     run (const char *argument)
     {
       char *end = NULL;
       long number;

       if (!argument || *argument == '\0')
         {
           fprintf (stderr, "error: invalid argument, \"%s\".\n",
                    argument ? argument : "(null)");
           return -1;
         }

       number = strtol (argument, &end, 0);
       if (end && *end != '\0')
         {
           fprintf (stderr, "warning: trailing garbage \"%s\".\n", end);
         }

       printf ("Square root of %s is %f\n", argument, sqrt (number));

       return 0;
     }

To take full advantage of the new module loader, the module itself *must* be compiled with Libtool.  Otherwise dependent libraries will not have been stored when libltdl tries to load the module on an architecture that doesn't load them natively, or which doesn't have shared libraries at all (*note dlpreopen Loading::).
$ libtool --mode=compile gcc -c ltdl-module.c rm -f .libs/ltdl-module.lo gcc -c ltdl-module.c -fPIC -DPIC -o .libs/ltdl-module.lo gcc -c ltdl-module.c -o ltdl-module.o >/dev/null 2>&1 mv -f .libs/ltdl-module.lo ltdl-module.lo $ libtool --mode=link gcc -g -o ltdl-module.la -rpath `pwd` \ -no-undefined -module -avoid-version ltdl-module.lo -lm rm -fr .libs/ltdl-module.la .libs/ltdl-module.* .libs/ltdl-module.* gcc -shared ltdl-module.lo -lm -lc -Wl,-soname \ -Wl,ltdl-module.so -o .libs/ltdl-module.so ar cru .libs/ltdl-module.a ltdl-module.o creating ltdl-module.la (cd .libs && rm -f ltdl-module.la && ln -s ../ltdl-module.la \ ltdl-module.la) You can see from the interaction below that `ltdl-loader' does not load the math library, `libm', and that the shared part of the Libtool module, `ltdl-module', does have a reference to it. The pseudo-library also has a note of the `libm' dependency so that libltdl will be able to load it even on architectures that can't do it natively: $ libtool --mode=execute ldd ltdl-loader libltdl.so.0 => /usr/lib/libltdl.so.0 (0x4001a000) libdl.so.2 => /lib/libdl.so.2 (0x4001f000) libc.so.6 => /lib/libc.so.6 (0x40023000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) $ ldd .libs/ltdl-module.so libm.so.6 => /lib/libm.so.6 (0x40008000) libc.so.6 => /lib/libc.so.6 (0x40025000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000) $ fgrep depend ltdl-module.la # Libraries that this one depends upon. dependency_libs=' -lm' This module is now ready to load from `ltdl-loader': $ ltdl-loader ltdl-module 9 Square root of 9 is 3.000000 => 0  File: autobook.info, Node: Portable Library Design, Next: dlpreopen Loading, Prev: Using libltdl, Up: Using GNU libltdl 18.3 Portable Library Design ============================ When partitioning the functionality of your project into libraries, and particularly loadable modules, it easy to inadvertently rely on modern shared library features such as _back-linking_ or _dependent library loading_. If you do accidentally use any of these features, you probably won't find out about it until someone first tries to use your project on an older or less featureful host. I have already used the `-module' and `-avoid-version' libtool linking options when compiling the libltdl module in the last section, the others are useful to know also. All of these are used with the `link' mode of `libtool' (`libtool --mode=link'): `-module' This option tells `libtool' that the target is a dynamically loadable module (as opposed to a conventional shared library) and as such need not have the `lib' prefix. `-avoid-version' When linking a dynamic module, this option can be used instead of the `-version-info' option, so that the module is not subject to the usual shared library version number suffixes. `-no-undefined' This is an extremely important option when you are aiming for maximum portability. It declares that all of the symbols required by the target are resolved at link time. Some shared library architectures do not allow undefined symbols by default (Tru64 Unix), and others do not allow them at all (AIX). By using this switch, and ensuring that all symbols really are resolved at link time, your libraries will work on even these platforms. *Note Creating Libtool Libraries with Automake::. `-export-dynamic' Almost the opposite of `-no-undefined', this option will compile the target so that the symbols it exports can be used to satisfy unresolved symbols in subsequently loaded modules. 
Not all shared library architectures support this feature, and many that do support it do so by default regardless of whether this option is supplied.  If you rely on this feature, then you should use this option, in the knowledge that your project will not work correctly on architectures that have no support for the feature.  For maximum portability, you should neither rely on this feature nor use the `-export-dynamic' option - but, on the occasions you do need the feature, this option is necessary to ensure that the linker is called correctly.

When you have the option to do so, I recommend that you design your project so that each of the libraries and modules is self contained, except for a minimal number of dependent libraries, arranged in a directional graph shaped like a tree.  Put another way: relying on back-linking, or on mutual or cyclic dependencies, reduces the portability of your project.  In the diagrams below, an arrow indicates that the compilation object relies on symbols from the objects that it points to:

          main                .---> main                  main
            |                 |       |                     |
       .----+----,            |  .----+----,           .----+----,
       v         v            |  v         v           v         v
     liba      libb           | liba     libb        liba<-----libb
       |         |            |   |                    |         ^
       v         v            |   v                    v         |
          libc                '--libc                 libc-------'

       Tree: good          Backlinking: bad           Cyclic: bad


File: autobook.info, Node: dlpreopen Loading, Next: User Module Loaders, Prev: Portable Library Design, Up: Using GNU libltdl

18.4 dlpreopen Loading
======================

On machines which do not have any facility for shared libraries or dynamic modules, libltdl allows an application to `lt_dlopen' modules, provided that the modules are known at link time.  This works by linking the code for the modules into the application in advance, and then looking up the addresses of the already loaded symbols when `lt_dlsym' is called.  We call this mechanism "dlpreopening" - so named because the modules must be loaded at link time, not because the API to use modules loaded in this way is any different.

This feature is extremely useful for debugging, allowing you to make a fully statically linked application from the executable and module objects, without changing any source code to work around the module loading calls.  As far as the code outside the libltdl API can tell, these modules really are being loaded dynamically.  Driving a symbolic debugger across module boundaries is, however, much easier when blocks of code aren't moving in and out of memory during execution.

You may have wondered about the purpose of the following line in the dynamic module code in *Note Dependent Libraries: libltdl Dependent Libraries.:

     #define run ltdl_module_LTX_run

The reason for redefining the entry point symbol in this way is to prevent a symbol clash when two or more modules that provide identically named entry point functions are preloaded into an executable.  It would be otherwise impossible to preload both `simple-module.c' and `ltdl-module.c', for example, since each defines the symbol `run'.

To allow us to write dynamic modules that are potentially preloaded, `lt_dlsym' will first try to look up the address of a named symbol with a prefix consisting of the canonicalized name of the module being searched, followed by the characters `_LTX_'.  The module name part of this prefix is canonicalized by replacing all non-alphanumeric characters with an underscore.  If that fails, `lt_dlsym' resorts to the unadorned symbol name, which is how `run' was found in `simple-module.la' by `ltdl-loader' earlier.
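For example, in a module shipped as `my-module.la' (a name invented for this illustration), the canonicalized prefix would be `my_module_LTX_', so a preload-safe version of the familiar `run' entry point looks like this:

     #include <stdio.h>

     /* "my-module" canonicalizes to "my_module", so lt_dlsym will look
        for my_module_LTX_run before falling back to the plain name
        "run".  */
     #define run my_module_LTX_run

     int
     run (const char *argument)
     {
       printf ("Hello again, %s!\n", argument);
       return 0;
     }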
Supporting this feature in your module loading code is a simple matter of initialising the address lookup table, and `ltdl.h' defines a convenient macro to do exactly that: -- Macro: LTDL_SET_PRELOADED_SYMBOLS () Add this macro to the code of your module loading code, before the first call to a libltdl function, to ensure that the dlopen address lookup table is populated. Now change the contents of `ltdl-loader.c', and add a call to this macro, so that it looks like this: /* Initialise preloaded symbol lookup table. */ LTDL_SET_PRELOADED_SYMBOLS(); /* Initialise libltdl. */ errors = lt_dlinit (); Libtool will now be able to fall back to using preloaded static modules if you tell it to, or if the host platform doesn't support native dynamic loading. If you use `LTDL_SET_PRELOADED_SYMBOLS' in your module loader, you *must* also specify something to preload to avoid compilation failure due to undefined `lt_preloaded_symbols'. You can name modules on the Libtool link command line using one of `-dlopen' or `-dlpreopen'. This includes support for accessing the symbols of the main executable opened with `lt_dlopen(NULL)'--you can ask Libtool to fall back to preopening the main modules like this: $ libtool gcc -g -o ltdl-loader -dlopen self -rpath /tmp/lib \ ltdl-loader.c -lltdl rm -f .libs/ltdl-loader.nm .libs/ltdl-loader.nmS \ .libs/ltdl-loader.nmT creating .libs/ltdl-loaderS.c (cd .libs && gcc -c -fno-builtin -fno-rtti -fno-exceptions "ltdl-loaderS.c") rm -f .libs/ltdl-loaderS.c .libs/ltdl-loader.nm .libs/ltdl-loader.nmS .libs/ltdl-loader.nmT gcc -o ltdl-loader .libs/ltdl-loaderS.o ltdl-loader.c -Wl,--export-dynamic /usr/lib/libltdl.so -ldl -Wl,--rpath -Wl,/tmp/lib rm -f .libs/ltdl-loaderS.o It doesn't make sense to add preloaded module support to a project, when you have no modules to preopen, so the compilation failure in that case is actually a feature of sorts. The `LTDL_SET_PRELOADED_SYMBOLS' macro does not interfere with the normal operation of the code when modules are dynamically loaded, provided you use the `-dlopen' option on the link line. The advantage of referencing the macro by default is that you can recompile the application with or without preloaded module, and all without editing the sources. If you have no modules to link in by default, you can force Libtool to populate the preload symbol table by using the `-dlopen force' option. This is the option used to preload the symbols of the main executable so that you can subsequently call `lt_dlopen(NULL)'. Multiple modules can be preloaded, although at the time of writing only Libtool compiled modules can be used. If there is a demand, Libtool will be extended to include native library preloading in a future revision. 
To illustrate, I have recompiled the `simple-module.c' module with `libtool': $ libtool --mode=compile gcc -c simple-module.c rm -f .libs/simple-module.lo gcc -c simple-module.c -fPIC -DPIC -o .libs/simple-module.lo gcc -c simple-module.c -o simple-module.o >/dev/null 2>&1 mv -f .libs/simple-module.lo simple-module.lo $ libtool --mode=link gcc -g -o simple-module.la -rpath `pwd` -no-undefined -module -avoid-version simple-module.lo rm -fr .libs/simple-module.la .libs/simple-module.* .libs/simple-module.* gcc -shared simple-module.lo -lc -Wl,-soname \ -Wl,simple-module.so -o .libs/simple-module.so ar cru .libs/simple-module.a simple-module.o creating simple-module.la (cd .libs && rm -f simple-module.la && ln -s ../simple-module.la \ simple-module.la) The names of the modules that may be subsequently `lt_dlopen'ed are added to the application link line. I am using the `-static' option to force a static only link, which must use dlpreopened modules by definition. I am only specifying this because my host has native dynamic loading, and Libtool will use that unless I force a static only link, like this: $ libtool --mode=link gcc -static -g -o ltdl-loader ltdl-loader.c \ -lltdl -dlopen ltdl-module.la -dlopen simple-module.la rm -f .libs/ltdl-loader.nm .libs/ltdl-loader.nmS \ .libs/ltdl-loader.nmT creating .libs/ltdl-loaderS.c extracting global C symbols from ./.libs/ltdl-module.a extracting global C symbols from ./.libs/simple-module.a (cd .libs && gcc -c -fno-builtin -fno-rtti -fno-exceptions \ "ltdl-loaderS.c") rm -f .libs/ltdl-loaderS.c .libs/ltdl-loader.nm \ .libs/ltdl-loader.nmS .libs/ltdl-loader.nmT gcc -g -o ltdl-loader ltdl-loader.c .libs/ltdl-loaderS.o \ ./.libs/ltdl-module.a -lm ./.libs/simple-module.a \ /usr/lib/libltdl.a -ldl rm -f .libs/ltdl-loaderS.o $ ./ltdl-loader ltdl-module 345 Square root of 345 is 18.574176 => 0 $ ./ltdl-loader simple-module World Hello, World! => 0 Note that the current release of Libtool requires that the pseudo-library be present for any libltdl loaded module, even preloaded ones. Once again, if there is sufficient demand, this may be fixed in a future release. Until then, if the pseudo-library was deleted or cannot be found, this will happen: $ rm -f simple-module.la $ ./ltdl-loader simple-module World ./ltdl-loader: file not found. A side effect of using the `LTDL_SET_PRELOADED_SYMBOLS' macro is that if you subsequently link the application without Libtool, you will get an undefined symbol for the Libtool supplied `lt_preloaded_symbols'. If you need to link in this fashion, you will need to provide a stub that supplies the missing definition. Conversely, you must be careful not to link the stub file when you _do_ link with Libtool, because it will clash with the Libtool generated table it is supposed to replace: #include const lt_dlsymlist lt_preloaded_symbols[] = { { 0, 0 } }; Of course, if you use this stub, and link the application without the benefits of Libtool, you will not be able to use any preloaded modules - even if you statically link them, since there is no preloaded symbol lookup table in this case.  
File: autobook.info, Node: User Module Loaders, Prev: dlpreopen Loading, Up: Using GNU libltdl 18.5 User Module Loaders ======================== While writing the module loading code for GNU M4 1.5, I found that libltdl did not provide a way for loading modules in exactly the way I required: As good as the preloading feature of libltdl may be, and as useful as it is for simplifying debugging, it doesn't have all the functionality of full dynamic module loading when the host platform is limited to static linking. After all, you can only ever load modules that were specified at link time, so for access to user supplied modules the whole application must be relinked to preload these new modules before `lt_dlopen' will be able to make use of the additional module code. In this situation, it would be useful to be able to automate this process. That is, if a libltdl using process is unable to `lt_dlopen' a module in any other fashion, but can find a suitable static archive in the module search path, it should relink itself along with the static archive (using `libtool' to preload the module), and then `exec' the new executable. Assuming all of this is successful, the attempt to `lt_dlopen' can be tried again - if the `suitable' static archive was chosen correctly it should now be possible to access the preloaded code. * Menu: * libltdl Loader Mechanism:: * libltdl Loader Management:: * libltdl Loader Errors::  File: autobook.info, Node: libltdl Loader Mechanism, Next: libltdl Loader Management, Up: User Module Loaders 18.5.1 Loader Mechanism ----------------------- Since Libtool 1.4, libltdl has provided a generalized method for loading modules, which can be extended by the user. libltdl has a default built in list of module loading mechanisms, some of which are peculiar to a given platform, others of which are more general. When the `libltdl' subdirectory of a project is configured, the list is narrowed to include only those _mechanisms_, or simply "loaders", which can work on the host architecture. When `lt_dlopen' is called, the loaders in this list are tried, in order, until the named module has loaded, or all of the loaders in the list have been exhausted. The entries in the final list of loaders each have a unique name, although there may be several candidate loaders for a single name before the list is narrowed. For example, the `dlopen' loader is implemented differently on BeOS and Solaris - for a single host, there can be only one implementation of any named loader. The name of a module loader is something entirely different to the name of a loaded module, something that should become clearer as you read on. In addition to the loaders supplied with libltdl, your project can add more loaders of its own. New loaders can be added to the end of the existing list, or immediately before any other particular loader, thus giving you complete control of the relative priorities of all of the active loaders in your project. In your module loading API, you might even support the dynamic loading of user supplied loaders: that is your users would be able to create dynamic modules which added more loading mechanisms to the existing list of loaders! Version 1.4 of Libtool has a default list that potentially contains an implementation of the following loaders (assuming all are supported by the host platform): `dlpreopen' If the named module was preloaded, use the preloaded symbol table for subsequent `lt_dlsym' calls. 
`dlopen' If the host machine has a native dynamic loader API use that to try and load the module. `dld' If the host machine has GNU dld(1), use that to try and load the module. Note that loader names with a `dl' prefix are reserved for future use by Libtool, so you should choose something else for your own module names to prevent a name clash with future Libtool releases. ---------- Footnotes ---------- (1) `http://www.gnu.org/software/dld'  File: autobook.info, Node: libltdl Loader Management, Next: libltdl Loader Errors, Prev: libltdl Loader Mechanism, Up: User Module Loaders 18.5.2 Loader Management ------------------------ The API supplies all of the functions you need to implement your own module loading mechanisms to solve problems just like this: -- Function: lt_dlloader_t * lt_dlloader_find (const char *LOADER_NAME) Each of the module loaders implemented by libltdl is stored according to a unique name, which can be used to lookup the associated handle. These handles operate in much the same way as `lt_dlhandle's: They are used for passing references to modules in and out of the API, except that they represent a kind of _module loading method_, as opposed to a loaded module instance. This function finds the `lt_dlloader_t' handle associated with the unique name passed as the only argument, or else returns `NULL' if there is no such module loader registered. -- Function: int lt_dlloader_add (lt_dlloader_t *PLACE, lt_user_dlloader *DLLOADER, const char *LOADER_NAME) This function is used to register your own module loading mechanisms with libltdl. If PLACE is given it must be a handle for an already registered module loader, which the new loader DLLOADER will be placed in front of for the purposes of which order to try loaders in. If PLACE is `NULL', on the other hand, the new DLLOADER will be added to the end of the list of loaders to try when loading a module instance. In either case LOADER_NAME must be a unique name for use with `lt_dlloader_find'. The DLLOADER argument must be a C structure of the following format, populated with suitable function pointers which determine the functionality of your module loader: struct lt_user_dlloader { const char *sym_prefix; lt_module_open_t *module_open; lt_module_close_t *module_close; lt_find_sym_t *find_sym; lt_dlloader_exit_t *dlloader_exit; lt_dlloader_data_t dlloader_data; }; -- Function: int lt_dlloader_remove (const char *LOADER_NAME) When there are no more loaded modules that were opened by the given module loader, the loader itself can be removed using this function. When you come to set the fields in the `lt_user_dlloader' structure, they must each be of the correct type, as described below: -- Type: const char * sym_prefix If a particular module loader relies on a prefix to each symbol being looked up (for example, the Windows module loader necessarily adds a `_' prefix to each symbol name passed to `lt_dlsym'), it should be recorded in the `sym_prefix' field. -- Type: lt_module_t lt_module_open_t (lt_dlloader_data_t LOADER_DATA, const char *MODULE_NAME) When `lt_dlopen' has reached your registered module loader when attempting to load a dynamic module, this is the type of the `module_open' function that will be called. The name of the module that libltdl is attempting to load, along with the module loader instance data associated with the loader being used currently, are passed as arguments to such a function call. 
The `lt_module_t' returned by functions of this type can be anything at all that can be recognised as unique to a successfully loaded module instance when passed back into the `module_close' or `find_sym' functions in the `lt_user_dlloader' module loader structure. -- Type: int lt_module_close_t (lt_dlloader_data_t LOADER_DATA, lt_module_t MODULE) In a similar vein, a function of this type will be called by `lt_dlclose', where MODULE is the returned value from the `module_open' function which loaded this dynamic module instance. -- Type: lt_ptr_t lt_find_sym_t (lt_dlloader_data_t LOADER_DATA, lt_module_t MODULE, const char *SYMBOL_NAME) In a similar vein once more, a function of this type will be called by `lt_dlsym', and must return the address of SYMBOL_NAME in MODULE. -- Type: int lt_dlloader_exit_t (lt_dlloader_data_t LOADER_DATA) When a user module loader is `lt_dlloader_remove'd, a function of this type will be called. That function is responsible for releasing any resources that were allocated during the initialisation of the loader, so that they are not `leaked' when the `lt_user_dlloader' structure is recycled. Note that there is no initialisation function type: the initialisation of a user module loader should be performed before the loader is registered with `lt_dlloader_add'. -- Type: lt_dlloader_data_t dlloader_data The DLLOADER_DATA is a spare field which can be used to store or pass any data specific to a particular module loader. That data will always be passed as the value of the first argument to each of the implementation functions above.  File: autobook.info, Node: libltdl Loader Errors, Prev: libltdl Loader Management, Up: User Module Loaders 18.5.3 Loader Errors -------------------- When writing the code to fill out each of the functions needed to populate the `lt_user_dlloader' structure, you will often need to raise an error of some sort. The set of standard errors which might be raised by the internal module loaders are available for use in your own loaders, and should be used where possible for the sake of uniformity if nothing else. On the odd occasion where that is not possible, libltdl has API calls to register and set your own error messages, so that users of your module loader will be able to call `lt_dlerror' and have the error message you set returned: -- Function: int lt_dlseterror (int ERRORCODE) By calling this function with one of the error codes enumerated in the header file, `ltdl.h', `lt_dlerror' will return the associated diagnostic until the error code is changed again. -- Function: int lt_dladderror (const char *DIAGNOSTIC) Often you will find that the existing error diagnostics do not describe the failure you have encountered. By using this function you can register a more suitable diagnostic with libltdl, and subsequently use the returned integer as an argument to `lt_dlseterror'. libltdl provides several other functions which you may find useful when writing a custom module loader. These are covered in the Libtool manual, along with more detailed descriptions of the functions described in the preceding paragraphs. In the next chapter, we will discuss the more complex features of Automake, before moving on to show you how to use those features and add libltdl module loading to the Sic project from *Note A Large GNU Autotools Project:: in the chapter after that.  
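Before leaving the subject of user module loaders, here is a rough sketch of how the pieces from the last two subsections fit together: a deliberately useless loader that fails every request, so that `lt_dlopen' simply falls through to the next loader in the list.  The `null_*' names are invented for this illustration, and the types and signatures follow the descriptions above - check the declarations in your own `ltdl.h', since they have changed between Libtool releases:

     #include <ltdl.h>

     /* Each function just reports failure; a real loader would do its
        work here.  Returning 0 from the open function tells libltdl
        that this loader could not load the named module.  */
     static lt_module_t
     null_open (lt_dlloader_data_t loader_data, const char *module_name)
     {
       return (lt_module_t) 0;
     }

     static int
     null_close (lt_dlloader_data_t loader_data, lt_module_t module)
     {
       return 0;
     }

     static lt_ptr_t
     null_sym (lt_dlloader_data_t loader_data, lt_module_t module,
               const char *symbol_name)
     {
       return (lt_ptr_t) 0;
     }

     /* Fields in the order given by struct lt_user_dlloader:
        sym_prefix, module_open, module_close, find_sym,
        dlloader_exit, dlloader_data.  */
     static struct lt_user_dlloader null_loader =
       { 0, null_open, null_close, null_sym, 0, 0 };

     /* Call this after lt_dlinit; a PLACE of 0 appends the new loader
        to the end of the list of loaders to try.  */
     static int
     register_null_loader (void)
     {
       return lt_dlloader_add (0, &null_loader, "null-loader");
     }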
File: autobook.info, Node: Advanced GNU Automake Usage, Next: A Complex GNU Autotools Project, Prev: Using GNU libltdl, Up: Top 19 Advanced GNU Automake Usage ****************************** This chapter covers a few seemingly unrelated Automake features which are commonly considered `advanced': conditionals, user-added language support, and automatic dependency tracking. * Menu: * Automake Conditionals:: * Language support:: * Automatic dependency tracking::  File: autobook.info, Node: Automake Conditionals, Next: Language support, Up: Advanced GNU Automake Usage 19.1 Conditionals ================= Automake conditionals are a way to omit or include different parts of the `Makefile' depending on what `configure' discovers. A conditional is introduced in `configure.in' using the `AM_CONDITIONAL' macro. This macro takes two arguments: the first is the name of the condition, and the second is a shell expression which returns true when the condition is true. For instance, here is how to make a condition named `TRUE' which is always true: AM_CONDITIONAL(TRUE, true) As another example, here is how to make a condition named `DEBUG' which is true when the user has given the `--enable-debug' option to `configure': AM_CONDITIONAL(DEBUG, test "$enable_debug" = yes) Once you've defined a condition in `configure.in', you can refer to it in your `Makefile.am' using the `if' statement. Here is a part of a sample `Makefile.am' that uses the conditions defined above: if TRUE ## This is always used. bin_PROGRAMS = foo endif if DEBUG AM_CFLAGS = -g -DDEBUG endif It's important to remember that Automake conditionals are _configure-time_ conditionals. They don't rely on any special feature of `make', and there is no way for the user to affect the conditionals from the `make' command line. Automake conditionals work by rewriting the `Makefile' - `make' is unaware that these conditionals even exist. Traditionally, Automake conditionals have been considered an advanced feature. However, practice has shown that they are often easier to use and understand than other approaches to solving the same problem. I now recommend the use of conditionals to everyone. For instance, consider this example: bin_PROGRAMS = echo if FULL_ECHO echo_SOURCES = echo.c extras.c getopt.c else echo_SOURCES = echo.c endif In this case, the equivalent code without conditionals is more confusing and correspondingly more difficult for the new Automake user to figure out: bin_PROGRAMS = echo echo_SOURCES = echo.c echo_LDADD = @echo_extras@ EXTRA_echo_SOURCES = extras.c getopt.c Automake conditionals have some limitations. One known problem is that conditionals don't interact properly with `+=' assignment. For instance, consider this code: bin_PROGRAMS = z z_SOURCES = z.c if SOME_CONDITION z_SOURCES += cond.c endif This code appears to have an unambiguous meaning, but Automake 1.4 doesn't implement this and will give an error. This bug will be fixed in the next major Automake release.  File: autobook.info, Node: Language support, Next: Automatic dependency tracking, Prev: Automake Conditionals, Up: Advanced GNU Automake Usage 19.2 Language support ===================== Automake comes with built-in knowledge of the most common compiled languages: C, C++, Objective C, Yacc, Lex, assembly, and Fortran. However, programs are sometimes written in an unusual language, or in a custom language that is translated into something more common. Automake lets you handle these cases in a natural way. 
Automake's notion of a `language' is tied to the suffix appended to each source file written in that language. You must inform Automake of each new suffix you introduce. This is done by listing them in the `SUFFIXES' macro. For instance, suppose you are writing part of your program in the language `M', which is compiled to object code by a program named `mc'. The typical suffix for an `M' source file is `.m'. In your `Makefile.am' you would write:

     SUFFIXES = .m

This differs from ordinary `make' usage, where you would use the special `.SUFFIXES' target to list suffixes. Now you need to tell Automake (and `make') how to compile a `.m' file to a `.o' file. You do this by writing an ordinary `make' suffix rule:

     MC = mc
     .m.o:
             $(MC) $(MCFLAGS) $(AM_MCFLAGS) -c $<

Note that we introduced the `MC', `MCFLAGS', and `AM_MCFLAGS' variables. While not required, this is good style in case you want to override any of these later (for instance from the command line). Automake understands enough about suffix rules to recognize that `.m' files can be treated just like any file it already understands, so now you can write:

     bin_PROGRAMS = myprogram
     myprogram_SOURCES = foo.c something.m

Note that Automake does not really understand chained suffix rules; however, frequently the right thing will happen anyway. For instance, if you have a `.m.c' rule, Automake will naively assume that `.m' files should be turned into `.o' files - and then it will proceed to rely on `make' to do the real work. If, however, the translation takes three steps--from `.m' to `.x', then from `.x' to `.c', and finally to `.o'--then Automake's simplistic approach will break. Fortunately, these cases are very rare.

File: autobook.info, Node: Automatic dependency tracking, Prev: Language support, Up: Advanced GNU Automake Usage

19.3 Automatic dependency tracking
==================================

Keeping track of dependencies for a large program is tedious and error-prone. Many edits require the programmer to update dependencies, but for some changes, such as adding a `#include' to an existing header, the change is large enough that the programmer simply refuses to make it (or makes it incorrectly). To fix this problem, Automake supports automatic dependency tracking. The implementation of automatic dependency tracking in Automake 1.4 requires `gcc' and GNU `make'. These programs are only required for maintainers; the `Makefile's generated by `make dist' are completely portable. If you can't use `gcc' or GNU `make' for your project, then you are simply out of luck; you have to disable dependency tracking. Automake 1.5 will include a completely new dependency tracking implementation. This new implementation will work with any compiler and any version of `make'. Another limitation of the current scheme is that the dependencies included into the portable `Makefile's by `make dist' are derived from the current build environment. First, this means that you must use `make all' before you can meaningfully run `make dist' (otherwise the dependencies won't have been created). Second, this means that any files not built in your current tree will not have dependencies in the distributed `Makefile's. The new implementation will avoid both of these shortcomings as well. Automatic dependency tracking is on by default; you don't have to do anything special to get it. To turn it off, either run `automake -i' instead of plain `automake', or put `no-dependencies' into the `AUTOMAKE_OPTIONS' macro in each `Makefile.am'.
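Returning briefly to the conditionals at the start of this chapter: the `DEBUG' condition tested the shell variable `$enable_debug', but nothing was said about how that variable comes to be set. Here is a hedged sketch of the usual pairing of `AC_ARG_ENABLE' with `AM_CONDITIONAL' in `configure.in' (the help text is illustrative only, not taken from a real package):

     AC_ARG_ENABLE(debug,
     [  --enable-debug          build with debugging information],
     [enable_debug=$enableval], [enable_debug=no])
     AM_CONDITIONAL(DEBUG, test "$enable_debug" = yes)

With this in place, running `./configure --enable-debug' makes the `DEBUG' conditional true, and the `AM_CFLAGS' assignment shown earlier is copied into the generated `Makefile'.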
File: autobook.info, Node: A Complex GNU Autotools Project, Next: M4, Prev: Advanced GNU Automake Usage, Up: Top 20 A Complex GNU Autotools Project ********************************** This chapter polishes the worked example I introduced in *Note A Small GNU Autotools Project::, and developed in *Note A Large GNU Autotools Project::. As always, the ideas presented here are my own views and not necessarily the only way to do things. Everything I present here has, however, served me well for quite some time, and you should find plenty of interesting ideas for your own projects. Herein, I will add a libltdl module loading system to Sic, as well as some sample modules to illustrate how extensible such a project can be. I will also explain how to integrate the `dmalloc' library into the development of a project, and show why this is important. If you noticed that, as it stands, Sic is only useful as an interactive shell unable to read commands from a file, then go to the top of the class! In order for it to be of genuine use, I will extend it to interpret commands from a file too. * Menu: * A Module Loading Subsystem:: * A Loadable Module:: * Interpreting Commands from a File:: * Integrating Dmalloc::  File: autobook.info, Node: A Module Loading Subsystem, Next: A Loadable Module, Up: A Complex GNU Autotools Project 20.1 A Module Loading Subsystem =============================== As you saw in *Note Using GNU libltdl::, I need to put an invocation of the macro `AC_LIBTOOL_DLOPEN' just before `AC_PROG_LIBTOOL', in the file `configure.in'. But, as well as being able to use `libtoolize --ltdl', which adds libltdl in a subdirectory with its own subconfigure, you can also manually copy just the ltdl source files into your project(1), and use `AC_LIB_LTDL' in your existing `configure.in'. At the time of writing, this is still a very new and (as yet) undocumented feature, with a few kinks that need to be ironed out. In any case you probably shouldn't use this method to add `ltdl.lo' to a C++ library, since `ltdl.c' is written in C. If you do want to use libltdl with a C++ library, things will work much better if you build it in a subdirectory generated with `libtoolize --ltdl'. For this project, lets: $ cp /usr/share/libtool/libltdl/ltdl.[ch] sic/ The Sic module loader is probably as complicated as any you will ever need to write, since it must support two kinds of modules: modules which contain additional built-in commands for the interpreter; and modules which extend the Sic syntax table. A single module can also provide both syntax extensions _and_ additional built-in commands. * Menu: * Initialising the Module Loader:: * Managing Module Loader Errors:: * Loading a Module:: * Unloading a Module:: ---------- Footnotes ---------- (1) If you have an early 1.3c snapshot of Libtool, you will also need to copy the `ltdl.m4' file into your distribution.  File: autobook.info, Node: Initialising the Module Loader, Next: Managing Module Loader Errors, Up: A Module Loading Subsystem 20.1.1 Initialising the Module Loader ------------------------------------- Before using this code (or any other libltdl based module loader for that matter), a certain amount of initialisation is required: * libltdl itself requires initialisation. 1. libltdl should be told to use the same memory allocation routines used by the rest of Sic. 2. Any preloaded modules (*note dlpreopen Loading::) need to be initialised with `LTDL_SET_PRELOADED_SYMBOLS()'. 3. `ltdl_init()' must be called. * The module search path needs to be set. 
Here I allow the installer to specify a default search path to correspond with the installed Sic modules at compile time, but search the directories in the runtime environment variable `SIC_MODULES_PATH' first. * The internal error handling needs to be initialised. Here is the start of the module loader, `sic/module.c', including the initialisation code for libltdl: #if HAVE_CONFIG_H # include #endif #include "common.h" #include "builtin.h" #include "eval.h" #include "ltdl.h" #include "module.h" #include "sic.h" #ifndef SIC_MODULE_PATH_ENV # define SIC_MODULE_PATH_ENV "SIC_MODULE_PATH" #endif int module_init (void) { static int initialised = 0; int errors = 0; /* Only perform the initialisation once. */ if (!initialised) { /* ltdl should use the same mallocation as us. */ lt_dlmalloc = (lt_ptr_t (*) (size_t)) xmalloc; lt_dlfree = (void (*) (lt_ptr_t)) free; /* Make sure preloaded modules are initialised. */ LTDL_SET_PRELOADED_SYMBOLS(); last_error = NULL; /* Call ltdl initialisation function. */ errors = lt_dlinit(); /* Set up the module search directories. */ if (errors == 0) { const char *path = getenv (SIC_MODULE_PATH_ENV); if (path != NULL) errors = lt_dladdsearchdir(path); } if (errors == 0) errors = lt_dladdsearchdir(MODULE_PATH); if (errors != 0) last_error = lt_dlerror (); ++initialised; return errors ? SIC_ERROR : SIC_OKAY; } last_error = multi_init_error; return SIC_ERROR; }  File: autobook.info, Node: Managing Module Loader Errors, Next: Loading a Module, Prev: Initialising the Module Loader, Up: A Module Loading Subsystem 20.1.2 Managing Module Loader Errors ------------------------------------ The error handling is a very simplistic wrapper for the libltdl error functions, with the addition of a few extra errors specific to this module loader code(1). Here are the error messages from `module.c': static char multi_init_error[] = "module loader initialised more than once"; static char no_builtin_table_error[] = "module has no builtin or syntax table"; static char builtin_unload_error[] = "builtin table failed to unload"; static char syntax_unload_error[] = "syntax table failed to unload"; static char module_not_found_error[] = "no such module"; static char module_not_unloaded_error[] = "module not unloaded"; static const char *last_error = NULL; const char * module_error (void) { return last_error; } ---------- Footnotes ---------- (1) This is very different to the way errors are managed when writing a custom loader for libltdl. Compare this section with *Note Loader Errors: libltdl Loader Errors.  File: autobook.info, Node: Loading a Module, Next: Unloading a Module, Prev: Managing Module Loader Errors, Up: A Module Loading Subsystem 20.1.3 Loading a Module ----------------------- Individual modules are managed by finding specified "entry points" (prescribed exported symbols) in the module: -- Variable: const Builtin * builtin_table An array of names of built-in commands implemented by a module, with associated handler functions. -- Function: void module_init (Sic *SIC) If present, this function will be called when the module is loaded. -- Function: void module_finish (Sic *SIC) If supplied, this function will be called just before the module is unloaded. -- Variable: const Syntax * syntax_table An array of syntactically significant symbols, and associated handler functions. -- Function: int syntax_init (Sic *SIC) If specified, this function will be called by Sic before the syntax of each input line is analysed. 
-- Function: int syntax_finish (Sic *SIC, BufferIn *IN, BufferOut *OUT) Similarly, this function will be call after the syntax analysis of each line has completed. All of the hard work in locating and loading the module, and extracting addresses for the symbols described above is performed by libltdl. The `module_load' function below simply registers these symbols with the Sic interpreter so that they are called at the appropriate times - or diagnoses any errors if things don't go according to plan: int module_load (Sic *sic, const char *name) { lt_dlhandle module; Builtin *builtin_table; Syntax *syntax_table; int status = SIC_OKAY; last_error = NULL; module = lt_dlopenext (name); if (module) { builtin_table = (Builtin*) lt_dlsym (module, "builtin_table"); syntax_table = (Syntax *) lt_dlsym (module, "syntax_table"); if (!builtin_table && !syntax_table) { lt_dlclose (module); last_error = no_builtin_table_error; module = NULL; } } if (module) { ModuleInit *init_func = (ModuleInit *) lt_dlsym (module, "module_init"); if (init_func) (*init_func) (sic); } if (module) { SyntaxFinish *syntax_finish = (SyntaxFinish *) lt_dlsym (module, "syntax_finish"); SyntaxInit *syntax_init = (SyntaxInit *) lt_dlsym (module, "syntax_init"); if (syntax_finish) sic->syntax_finish = list_cons (list_new (syntax_finish), sic->syntax_finish); if (syntax_init) sic->syntax_init = list_cons (list_new (syntax_init), sic->syntax_init); } if (module) { if (builtin_table) status = builtin_install (sic, builtin_table); if (syntax_table && status == SIC_OKAY) status = syntax_install (sic, module, syntax_table); return status; } last_error = lt_dlerror(); if (!last_error) last_error = module_not_found_error; return SIC_ERROR; } Notice that the generalised `List' data type introduced earlier (*note A Small GNU Autotools Project::) is reused to keep a list of accumulated module initialisation and finalisation functions.  File: autobook.info, Node: Unloading a Module, Prev: Loading a Module, Up: A Module Loading Subsystem 20.1.4 Unloading a Module ------------------------- When unloading a module, several things must be done: * Any built-in commands implemented by this module must be unregistered so that Sic doesn't try to call them after the implementation has been removed. * Any syntax extensions implemented by this module must be similarly unregistered, including `syntax_init' and `syntax_finish' functions. * If there is a finalisation entry point in the module, `module_finish' (*note Loading a Module::), it must be called. My first cut implementation of a module subsystem kept a list of the entry points associated with each module so that they could be looked up and removed when the module was subsequently unloaded. It also kept track of multiply loaded modules so that a module wasn't unloaded prematurely. libltdl already does all of this though, and it is wasteful to duplicate all of that work. This system uses `lt_dlforeach' and `lt_dlgetinfo' to access libltdls records of loaded modules, and save on duplication. These two functions are described fully in*Note Libltdl interface: (Libtool)Libltdl interface. static int unload_ltmodule (lt_dlhandle module, lt_ptr_t data); struct unload_data { Sic *sic; const char *name; }; int module_unload (Sic *sic, const char *name) { struct unload_data data; last_error = NULL; data.sic = sic; data.name = name; /* Stopping might be an error, or we may have unloaded the module. 
*/ if (lt_dlforeach (unload_ltmodule, (lt_ptr_t) &data) != 0) if (!last_error) return SIC_OKAY; if (!last_error) last_error = module_not_found_error; return SIC_ERROR; } This function asks libltdl to call the function `unload_ltmodule' for each of the modules it has loaded, along with some details of the module it wants to unload. The tricky part of the callback function below is recalculating the entry point addresses for the module to be unloaded and then removing all matching addresses from the appropriate internal structures. Otherwise, the balance of this callback is involved in informing the calling `lt_dlforeach' loop of whether a matching module has been found and handled: static int userdata_address_compare (List *elt, void *match); /* This callback returns 0 if the module was not yet found. If there is an error, LAST_ERROR will be set, otherwise the module was successfully unloaded. */ static int unload_ltmodule (lt_dlhandle module, void *data) { struct unload_data *unload = (struct unload_data *) data; const lt_dlinfo *module_info = lt_dlgetinfo (module); if ((unload == NULL) || (unload->name == NULL) || (module_info == NULL) || (module_info->name == NULL) || (strcmp (module_info->name, unload->name) != 0)) { /* No match, return 0 to keep searching */ return 0; } if (module) { /* Fetch the addresses of the entrypoints into the module. */ Builtin *builtin_table = (Builtin*) lt_dlsym (module, "builtin_table"); Syntax *syntax_table = (Syntax *) lt_dlsym (module, "syntax_table"); void *syntax_init_address = (void *) lt_dlsym (module, "syntax_init"); void **syntax_finish_address = (void *) lt_dlsym (module, "syntax_finish"); List *stale; /* Remove all references to these entry points in the internal data structures, before actually unloading the module. */ stale = list_remove (&unload->sic->syntax_init, syntax_init_address, userdata_address_compare); XFREE (stale); stale = list_remove (&unload->sic->syntax_finish, syntax_finish_address, userdata_address_compare); XFREE (stale); if (builtin_table && builtin_remove (unload->sic, builtin_table) != SIC_OKAY) { last_error = builtin_unload_error; module = NULL; } if (syntax_table && SIC_OKAY != syntax_remove (unload->sic, module, syntax_table)) { last_error = syntax_unload_error; module = NULL; } } if (module) { ModuleFinish *finish_func = (ModuleFinish *) lt_dlsym (module, "module_finish"); if (finish_func) (*finish_func) (unload->sic); } if (module) { if (lt_dlclose (module) != 0) module = NULL; } /* No errors? Stop the search! */ if (module) return 1; /* Find a suitable diagnostic. */ if (!last_error) last_error = lt_dlerror(); if (!last_error) last_error = module_not_unloaded_error; /* Error diagnosed. Stop the search! */ return -1; } static int userdata_address_compare (List *elt, void *match) { return (int) (elt->userdata - match); } The `userdata_address_compare' helper function at the end is used to compare the address of recalculated entry points against the already registered functions and handlers to find which items need to be unregistered. 
There is also a matching header file to export the module interface, so that the code for loadable modules can make use of it: #ifndef SIC_MODULE_H #define SIC_MODULE_H 1 #include #include #include BEGIN_C_DECLS typedef void ModuleInit (Sic *sic); typedef void ModuleFinish (Sic *sic); extern const char *module_error (void); extern int module_init (void); extern int module_load (Sic *sic, const char *name); extern int module_unload (Sic *sic, const char *name); END_C_DECLS #endif /* !SIC_MODULE_H */ This header also includes some of the other Sic headers, so that in most cases, the source code for a module need only `#include '. To make the module loading interface useful, I have added built-ins for `load' and `unload'. Naturally, these must be compiled into the bare `sic' executable, so that it is able to load additional modules: #if HAVE_CONFIG_H # include #endif #include "module.h" #include "sic_repl.h" /* List of built in functions. */ #define builtin_functions \ BUILTIN(exit, 0, 1) \ BUILTIN(load, 1, 1) \ BUILTIN(unload, 1, -1) BUILTIN_DECLARATION (load) { int status = SIC_ERROR; if (module_load (sic, argv[1]) < 0) { sic_result_clear (sic); sic_result_append (sic, "module \"", argv[1], "\" not loaded: ", module_error (), NULL); } else status = SIC_OKAY; return status; } BUILTIN_DECLARATION (unload) { int status = SIC_ERROR; int i; for (i = 1; argv[i]; ++i) if (module_unload (sic, argv[i]) != SIC_OKAY) { sic_result_clear (sic); sic_result_append (sic, "module \"", argv[1], "\" not unloaded: ", module_error (), NULL); } else status = SIC_OKAY; return status; } These new built-in commands are simply wrappers around the module loading code in `module.c'. As with `dlopen', you can use libltdl to `lt_dlopen' the main executable, and then lookup _its_ symbols. I have simplified the initialisation of Sic by replacing the `sic_init' function in `src/sic.c' by `loading' the executable itself as a module. This works because I was careful to use the same format in `sic_builtin.c' and `sic_syntax.c' as would be required for a genuine loadable module, like so: /* initialise the module subsystem */ if (module_init () != SIC_OKAY) sic_fatal ("module initialisation failed"); if (module_load (sic, NULL) != SIC_OKAY) sic_fatal ("sic initialisation failed");  File: autobook.info, Node: A Loadable Module, Next: Interpreting Commands from a File, Prev: A Module Loading Subsystem, Up: A Complex GNU Autotools Project 20.2 A Loadable Module ====================== A feature of the Sic interpreter is that it will use the `unknown' built-in to handle any command line which is not handled by any of the other registered built-in callback functions. This mechanism is very powerful, and allows me to lookup unhandled built-ins in the user's `PATH', for instance. Before adding any modules to the project, I have created a separate subdirectory, `modules', to put the module source code into. Not forgetting to list this new subdirectory in the `AC_OUTPUT' macro in `configure.in', and the `SUBDIRS' macro in the top level `Makefile.am', a new `Makefile.am' is needed to build the loadable modules: ## Makefile.am -- Process this file with automake to produce Makefile.in INCLUDES = -I$(top_builddir) -I$(top_srcdir) \ -I$(top_builddir)/sic -I$(top_srcdir)/sic \ -I$(top_builddir)/src -I$(top_srcdir)/src pkglib_LTLIBRARIES = unknown.la `pkglibdir' is a Sic specific directory where modules will be installed, *Note Installing and Uninstalling Configured Packages: Installing and Uninstalling. 
For a library to be maximally portable, it should be written so that it does not require back-linking(1) to resolve its own symbols. That is, if at all possible you should design all of your libraries (not just dynamic modules) so that all of their symbols can be resolved at linktime. Sometimes, it is impossible or undesirable to architect your libraries and modules in this way. In that case you sacrifice the portability of your project to platforms such as AIX and Windows. The key to building modules with libtool is in the options that are specified when the module is linked. This is doubly true when the module must work with libltdl's dlpreopening mechanism. unknown_la_SOURCES = unknown.c unknown_la_LDFLAGS = -no-undefined -module -avoid-version unknown_la_LIBADD = $(top_builddir)/sic/libsic.la Sic modules are built without a `lib' prefix (`-module'), and without version suffixes (`-avoid-version'). All of the undefined symbols are resolved at linktime by `libsic.la', hence `-no-undefined'. Having added `ltdl.c' to the `sic' subdirectory, and called the `AC_LIB_LTDL' macro in `configure.in', `libsic.la' cannot build correctly on those architectures which do not support back-linking. This is because `ltdl.c' simply abstracts the native `dlopen' API with a common interface, and that local interface often requires that a special library be linked - `-ldl' on linux, for example. `AC_LIB_LTDL' probes the system to determine the name of any such dlopen library, and allows you to depend on it in a portable way by using the configure substitution macro, `@LIBADD_DL@'. If I were linking a `libtool' compiled libltdl at this juncture, the system library details would have already been taken care of. In this project, I have bypassed that mechanism by compiling and linking `ltdl.c' myself, so I have altered `sic/Makefile.am' to use `@LIBADD_DL@': lib_LTLIBRARIES = libcommon.la libsic.la libsic_la_LIBADD = $(top_builddir)/replace/libreplace.la \ libcommon.la @LIBADD_DL@ libsic_la_SOURCES = builtin.c error.c eval.c list.c ltdl.c \ module.c sic.c syntax.c Having put all this infrastructure in place, the code for the `unknown' module is a breeze (helper functions omitted for brevity): #if HAVE_CONFIG_H # include #endif #include #include #include #define builtin_table unknown_LTX_builtin_table static char *path_find (const char *command); static int path_execute (Sic *sic, const char *path, char *const argv[]); /* Generate prototype. */ SIC_BUILTIN (builtin_unknown); Builtin builtin_table[] = { { "unknown", builtin_unknown, 0, -1 }, { 0, 0, -1, -1 } }; BUILTIN_DECLARATION(unknown) { char *path = path_find (argv[0]); int status = SIC_ERROR; if (!path) sic_result_append (sic, "command \"", argv[0], "\" not found", NULL); else if (path_execute (sic, path, argv) != SIC_OKAY) sic_result_append (sic, "command \"", argv[0],"\" failed: ", strerror (errno), NULL); else status = SIC_OKAY; return status; } In the first instance, notice that I have used the preprocessor to redefine the entry point functions to be compatible with libltdls `dlpreopen', hence the `unknown_LTX_builtin_table' `cpp' macro. The `unknown' handler function itself looks for a suitable executable in the user's path, and if something suitable _is_ found, executes it. 
Notice that Libtool doesn't relink dependent libraries (`libsic' depends on `libcommon', for example) on my GNU/Linux system, since they are not required for the static library in any case, and because the dependencies are also encoded directly into the shared archive, `libsic.so', by the original link. On the other hand, Libtool _will_ relink the dependent libraries if that is necessary for the target host. $ make /bin/sh ../libtool --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I.. \ -I.. -I.. -I../sic -I../sic -I../src -I../src -g -O2 -c unknown.c mkdir .libs gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I.. -I../sic -I../sic -I../src \ -I../src -g -O2 -Wp,-MD,.deps/unknown.pp -c unknown.c -fPIC -DPIC \ -o .libs/unknown.lo gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I.. -I../sic -I../sic -I../src \ I../src -g -O2 -Wp,-MD,.deps/unknown.pp -c unknown.c -o unknown.o \ >/dev/null 2>&1 mv -f .libs/unknown.lo unknown.lo /bin/sh ../libtool --mode=link gcc -g -O2 -o unknown.la -rpath \ /usr/local/lib/sic -no-undefined -module -avoid-version unknown.lo \ ../sic/libsic.la rm -fr .libs/unknown.la .libs/unknown.* .libs/unknown.* gcc -shared unknown.lo -L/tmp/sic/sic/.libs ../sic/.libs/libsic.so \ -lc -Wl,-soname -Wl,unknown.so -o .libs/unknown.so ar cru .libs/unknown.a unknown.o creating unknown.la (cd .libs && rm -f unknown.la && ln -s ../unknown.la unknown.la) $ ./libtool --mode=execute ldd ./unknown.la libsic.so.0 => /tmp/sic/.libs/libsic.so.0 (0x40002000) libc.so.6 => /lib/libc.so.6 (0x4000f000) libcommon.so.0 => /tmp/sic/.libs/libcommon.so.0 (0x400ec000) libdl.so.2 => /lib/libdl.so.2 (0x400ef000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000) After compiling the rest of the tree, I can now use the `unknown' module: $ SIC_MODULE_PATH=`cd ../modules; pwd` ./sic ] echo hello! command "echo" not found. ] load unknown ] echo hello! hello! ] unload unknown ] echo hello! command "echo" not found. ] exit $ ---------- Footnotes ---------- (1) *Note Introducing libltdl::  File: autobook.info, Node: Interpreting Commands from a File, Next: Integrating Dmalloc, Prev: A Loadable Module, Up: A Complex GNU Autotools Project 20.3 Interpreting Commands from a File ====================================== For all practical purposes, any interpreter is pretty useless if it only works interactively. I have added a `source' built-in command to `sic_builtin.c' which takes lines of input from a file and evaluates them using `sic_repl.c' in much the same way as lines typed at the prompt are evaluated otherwise. Here is the built-in handler: /* List of built in functions. */ #define builtin_functions \ BUILTIN(exit, 0, 1) \ BUILTIN(load, 1, 1) \ BUILTIN(source, 1, -1) \ BUILTIN(unload, 1, -1) BUILTIN_DECLARATION (source) { int status = SIC_OKAY; int i; for (i = 1; status == SIC_OKAY && argv[i]; ++i) status = source (sic, argv[i]); return status; } And the `source' function from `sic_repl.c': int source (Sic *sic, const char *path) { FILE *stream; int result = SIC_OKAY; int save_interactive = is_interactive; SIC_ASSERT (sic && path); is_interactive = 0; if ((stream = fopen (path, "rt")) == NULL) { sic_result_clear (sic); sic_result_append (sic, "cannot source \"", path, "\": ", strerror (errno), NULL); result = SIC_ERROR; } else result = evalstream (sic, stream); is_interactive = save_interactive; return result; } The reason for separating the `source' function in this way, is that it makes it easy for the startup sequence in `main' to evaluate a startup file. 
In traditional Unix fashion, the startup file is named `.sicrc', and is evaluated if it is present in the user's home directory:

     static int
     evalsicrc (Sic *sic)
     {
       int result = SIC_OKAY;
       char *home = getenv ("HOME");
       char *sicrcpath, *separator = "";
       int len;

       if (!home)
         home = "";

       len = strlen (home);
       if (len && home[len -1] != '/')
         separator = "/";

       len += strlen (separator) + strlen (SICRCFILE) + 1;

       sicrcpath = XMALLOC (char, len);
       sprintf (sicrcpath, "%s%s%s", home, separator, SICRCFILE);

       if (access (sicrcpath, R_OK) == 0)
         result = source (sic, sicrcpath);

       return result;
     }

File: autobook.info, Node: Integrating Dmalloc, Prev: Interpreting Commands from a File, Up: A Complex GNU Autotools Project

20.4 Integrating Dmalloc
========================

A huge number of bugs in C and C++ code are caused by mismanagement of memory. Using the wrapper functions described earlier (*note Memory Management::), or their equivalent, can help immensely in reducing the occurrence of such bugs. Ultimately, you will introduce a difficult-to-diagnose memory bug in spite of these measures. That is where Dmalloc(1) comes in. I recommend using it routinely in all of your projects -- you will find all sorts of leaks and bugs that might otherwise have lain dormant for some time. Automake has explicit support for Dmalloc to make using it in your own projects as painless as possible. The first step is to add the macro `AM_WITH_DMALLOC' to `configure.in'. Citing this macro adds a `--with-dmalloc' option to `configure', which, when specified by the user, adds `-ldmalloc' to `LIBS' and defines `WITH_DMALLOC'. The usefulness of Dmalloc is much increased by compiling an entire project with the header, `dmalloc.h' - easily achieved in Sic by conditionally adding it to `common-h.in':

     BEGIN_C_DECLS

     #define XCALLOC(type, num) \
             ((type *) xcalloc ((num), sizeof(type)))
     #define XMALLOC(type, num) \
             ((type *) xmalloc ((num) * sizeof(type)))
     #define XREALLOC(type, p, num) \
             ((type *) xrealloc ((p), (num) * sizeof(type)))
     #define XFREE(stale) do { \
             if (stale) { free ((void *) stale); stale = 0; } \
             } while (0)

     extern void *xcalloc (size_t num, size_t size);
     extern void *xmalloc (size_t num);
     extern void *xrealloc (void *p, size_t num);
     extern char *xstrdup (const char *string);

     END_C_DECLS

     #if WITH_DMALLOC
     #  include <dmalloc.h>
     #endif

I have been careful to include the `dmalloc.h' header at the end of this file so that it overrides my own _definitions_ without renaming the function _prototypes_. Similarly, I must be careful to accommodate Dmalloc's redefinition of the mallocation routines in `sic/xmalloc.c' and `sic/xstrdup.c', by putting each file inside an `#ifndef WITH_DMALLOC'. That way, when compiling the project, if `--with-dmalloc' is specified and the `WITH_DMALLOC' preprocessor symbol is defined, then Dmalloc's debugging definitions of `xstrdup' et al. will be used in place of the versions I wrote. Enabling Dmalloc is now simply a matter of reconfiguring the whole package using the `--with-dmalloc' option, and disabling it again is a matter of reconfiguring without that option. The use of Dmalloc is beyond the scope of this book, and is in any case described very well in the documentation that comes with the package. I strongly recommend you become familiar with it - the time you invest here will pay dividends many times over in the time you save debugging. This chapter completes the description of the Sic library project, and indeed this part of the book.
All of the infrastructure for building an advanced command line shell is in place now - you need only add the builtin and syntax function definitions to create a complete shell of your own. Each of the chapters in the next part of the book explores a more specialised application of the GNU Autotools, starting with a discussion of M4, a major part of the implementation of Autoconf. ---------- Footnotes ---------- (1) Dmalloc is distributed from `http://www.dmalloc.com'.  File: autobook.info, Node: M4, Next: Writing Portable Bourne Shell, Prev: A Complex GNU Autotools Project, Up: Top 21 M4 ***** M4 is a general purpose tool for processing text and has existed on Unix systems of all kinds for many years, rarely catching the attention of users. Text generation through macro processing is not a new concept. Originally M4 was designed as the preprocessor for the Rational FORTRAN system and was influenced by the General Purpose Macro generator, GPM, first described by Stratchey in 1965! GNU M4 is the GNU project's implementation of M4 and was written by Rene' Seindal in 1990. In recent years, awareness of M4 has grown through its use by popular free software packages. The Sendmail package incorporates a configuration system that uses M4 to generate its complex `sendmail.cf' file from a simple specification of the desired configuration. Autoconf uses M4 to generate output files such as a `configure' script. It is somewhat unfortunate that users of GNU Autotools need to know so much about M4, because it has been too exposed. Many of these tools' implementation details were simply left up to M4, forcing the user to know about M4 in order to use them. It is a well-known problem and there is a movement amongst the development community to improve this shortcoming in the future. This deficiency is the primary reason that this chapter exists--it is important to have a good working knowledge of M4 in order to use the GNU Autotools and to extend it with your own macros (*note Writing New Macros for Autoconf::). The GNU M4 manual provides a thorough tutorial on M4. Please refer to it for additional information. * Menu: * What does M4 do? :: * How GNU Autotools uses M4 :: * Fundamentals of M4 processing :: * Features of M4 :: * Writing macros within the GNU Autotools framework ::  File: autobook.info, Node: What does M4 do?, Next: How GNU Autotools uses M4, Up: M4 21.1 What does M4 do? ===================== `m4' is a general purpose tool suitable for all kinds of text processing applications--not unlike the C preprocessor, `cpp', with which you are probably familiar. Its obvious application is as a front-end for a compiler--`m4' is in many ways superior to `cpp'. Briefly, `m4' reads text from the input and writes processed text to the output. Symbolic macros may be defined which have replacement text. As macro invocations are encountered in the input, they are replaced (`expanded') with the macro's definition. Macros may be defined with a set of parameters and the definition can specify where the actual parameters will appear in the expansion. These concepts will be elaborated on in *Note Fundamentals of M4 processing::. M4 includes a set of pre-defined macros that make it substantially more useful. The most important ones will be discussed in *Note Features of M4::. These macros perform functions such as arithmetic, conditional expansion, string manipulation and running external shell commands.  
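If you have never run `m4' directly, the following short session shows this cycle of definition and replacement in action (the file name is invented for illustration):

     $ cat hello.m4
     define(`TARGET', `world')
     Hello TARGET!
     $ m4 hello.m4

     Hello world!

Note the blank line at the start of the output: the `define' invocation expands to the empty string, but the newline following it is ordinary text and is copied through. The `dnl' macro, described later in this chapter, is the usual way to discard that newline.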
File: autobook.info, Node: How GNU Autotools uses M4, Next: Fundamentals of M4 processing, Prev: What does M4 do?, Up: M4

21.2 How GNU Autotools uses M4
==============================

The GNU Autotools may all appear to use M4, but in actual fact, it all boils down to `autoconf', which invokes `m4' to generate your `configure' script. You might be surprised to learn that the shell code in `configure' does not use `m4' to generate a final `Makefile' from `Makefile.in'. Instead, it uses `sed', since that is more likely to be present on an end-user's system and thereby removes the dependency on `m4'. Automake and Libtool include a lot of M4 input files. These are macros provided with each package that you can use directly (or indirectly) from your `configure.in'. These packages don't invoke `m4' themselves. If you have already installed Autoconf on your system, you may have encountered problems due to its strict M4 requirements. Autoconf _demands_ to use GNU M4, mostly because it exceeds limitations present in other M4 implementations. As noted by the Autoconf manual, this is not an onerous requirement, as it only affects package maintainers who must regenerate `configure' scripts. Autoconf's own `Makefile' will freeze some of the Autoconf `.m4' files containing macros as it builds Autoconf. When M4 freezes an input file, it produces another file which represents the internal state of the M4 processor so that the input file does not need to be parsed again. This helps to reduce the startup time for `autoconf'.

File: autobook.info, Node: Fundamentals of M4 processing, Next: Features of M4, Prev: How GNU Autotools uses M4, Up: M4

21.3 Fundamentals of M4 processing
==================================

When properly understood, M4 seems like child's play. However, it is common to learn M4 in a piecemeal fashion and to have an incomplete or inaccurate understanding of certain concepts. Ultimately, this leads to hours of furious debugging. It is important to understand the fundamentals well before progressing to the details.

* Menu:

* Token scanning ::
* Macros and macro expansion ::
* Quoting ::

File: autobook.info, Node: Token scanning, Next: Macros and macro expansion, Up: Fundamentals of M4 processing

21.3.1 Token scanning
---------------------

`m4' scans its input stream, generating (often, just copying) text to the output stream. The first step that `m4' performs in processing is to recognize _tokens_. There are three kinds of tokens:

Names
     A name is a sequence of characters that starts with a letter or an underscore and may be followed by additional letters, digits and underscores. The end of a name is recognized by the occurrence of a character which is not any of the permitted characters--for example, a period. A name is always a candidate for macro expansion (*Note Macros and macro expansion::), whereby the name will be replaced in the output by a macro definition of the same name.

Quoted strings
     A sequence of characters may be _quoted_ (*Note Quoting::) with a starting quote at the beginning of the string and a terminating quote at the end. The default M4 quote characters are ``' and `'', however Autoconf reassigns them to `[' and `]', respectively. Suffice it to say, M4 will remove the quote characters and pass the inner string to the output (*Note Quoting::).

Other tokens
     All other tokens are those single characters which are not recognized as belonging to any of the other token types. They are passed through to the output unaltered.
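A small, invented example shows these token rules at work. The name token ends at the first character that cannot belong to a name (here, the period), so the name is expanded; the quoted form is read as a quoted-string token instead, so only the quotes are stripped:

     define(`file', `configure')
     file.in
     =>configure.in
     `file'.in
     =>file.in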
Like most programming languages, M4 allows you to write comments in the input which will be ignored. Comments are delimited by the `#' character and by the end of a line. Comments in M4 differ from most languages, though, in that the text within the comment, including delimiters, is passed through to the output unaltered. Although the comment delimiting characters can be reassigned by the user, this is highly discouraged, as it may break GNU Autotools macros which rely on this fact to pass Bourne shell comment lines-which share the same comment delimiters-through to the output unaffected.  File: autobook.info, Node: Macros and macro expansion, Next: Quoting, Prev: Token scanning, Up: Fundamentals of M4 processing 21.3.2 Macros and macro expansion --------------------------------- Macros are definitions of replacement text and are identified by a name--as defined by the syntax rules given in *Note Token scanning::. M4 maintains an internal table of macros, some of which are built-ins defined when `m4' starts. When a name is found in the input that matches a name registered in M4's macro table, the macro _invocation_ in the input is replaced by the macro's definition in the output. This process is known as _expansion_--even if the new text may be shorter! Many beginners to M4 confuse themselves the moment they start to use phrases like `I am going to call this particular macro, which returns this value'. As you will see, macros differ significantly from _functions_ in other programming languages, regardless of how similar their syntax may seem. You should instead use phrases like `If I invoke this macro, it will expand to this text'. Suppose M4 knows about a simple macro called `foo' that is defined to be `bar'. Given the following input, `m4' would produce the corresponding output: That is one big foo. =>That is one big bar. The period character at the end of this sentence is not permitted in macro names, thus `m4' knows when to stop scanning the `foo' token and consult the table of macro definitions for a macro named `foo'. Curiously, macros are defined to `m4' using the built-in macro `define'. The example shown above would be defined to `m4' with the following input: define(`foo', `bar') Since `define' is itself a macro, it too must have an expansion--by definition, it is the empty string, or _void_. Thus, `m4' will appear to consume macro invocations like these from the input. The ``' and `'' characters are M4's default quote characters and play an important role (*Note Quoting::). Additional built-in macros exist for managing macro definitions (*Note Macro management::). We've explored the simplest kind of macros that exist in M4. To make macros substantially more useful, M4 extends the concept to macros which accept a number of arguments (1). If a macro is given arguments, the macro may address its arguments using the special macro names `$1' through to `$n', where `n' is the maximum number of arguments that the macro cares to reference. When such a macro is invoked, the argument list must be delimited by commas and enclosed in parentheses. Any whitespace that precedes an argument is discarded, but trailing whitespace (for example, before the next comma) is preserved. Here is an example of a macro which expands to its third argument: define(`foo', `$3') That is one big foo(3, `0x', `beef'). =>That is one big beef. Arguments in M4 are simply text, so they have no type. 
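The whitespace rule mentioned above is easy to see with another invented macro: the space before the second argument is discarded when the argument is collected, but the space before the closing parenthesis is kept as part of that argument:

     define(`args', `<$1><$2>')
     args(a,  b )
     =><a><b >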
If a macro which accepts arguments is invoked, `m4' will expand the macro regardless of how many arguments are provided. M4 will not produce errors due to conditions such as a mismatched number of arguments, or arguments with malformed values/types. It is the responsibility of the macro to validate the argument list and this is an important practice when writing GNU Autotools macros. Some common M4 idioms have developed for this purpose and are covered in *Note Conditionals::. A macro that expects arguments can still be invoked without arguments--the number of arguments seen by the macro will be zero:

     This is still one big foo.
     =>This is still one big .

A macro invoked with an empty argument list is not empty at all, but rather is considered to be a single empty string:

     This is one big empty foo().
     =>This is one big empty .

It is also important to understand how macros are expanded. It is here that you will see why an M4 macro is not the same as a function in any other programming language. The explanation you've been reading about macro expansion thus far is a little bit simplistic: macros are not exactly matched in the input and expanded in the output. In actual fact, the macro's expansion replaces the invocation in the input stream and it is _rescanned_ for further expansions until there are none remaining. Here is an illustrative example:

     define(`foobar', `FUBAR')
     define(`f', `foo')
     f()bar
     =>FUBAR

When `f()' is expanded to `foo', the expansion is pushed back onto the input stream and rescanned; the rescan reads `foo' followed immediately by `bar' as the single name token `foobar', which in turn expands to `FUBAR'. In the same way, if `a1' were defined to expand to `a2', `a2' to `a3', and `a3' to `a4', then whenever the token `a1' was found in the input, `m4' would replace it with `a2' in the input stream and rescan. This would continue until no definition could be found for `a4', at which point the literal text `a4' would be sent to the output. This is _by far the biggest point of misunderstanding_ for new M4 users. The same principles apply for the collection of arguments to macros which accept arguments. Before a macro's actual arguments are handed to the macro, they are expanded until there are no more expansions left. Here is an example which highlights the consequences of this, using the built-in `define' macro (where the problems are no different). Normally, `define' will redefine any existing macro:

     define(foo, bar)
     define(foo, baz)

In this example, we expect `foo' to be defined to `bar' and then redefined to `baz'. Instead, we've defined a new macro `bar' that is defined to be `baz'! Why? The second `define' invocation has its arguments expanded prior to expanding the `define' macro itself. At this stage, the name `foo' is expanded to its original definition, `bar'. In effect, we've stated:

     define(foo, bar)
     define(bar, baz)

Sometimes this can be a very useful property, but mostly it serves to thoroughly confuse the GNU Autotools macro writer. The key is to know that `m4' will expand as much text as it can as early as possible in its processing. Expansion can be prevented by quoting (2) and is discussed in detail in the following section.

---------- Footnotes ----------

(1) GNU M4 permits an unlimited number of arguments, whereas other versions of M4 limit the number of addressable arguments to nine.

(2) Which is precisely what the ``' and `'' characters in all of the examples in this section are.

File: autobook.info, Node: Quoting, Prev: Macros and macro expansion, Up: Fundamentals of M4 processing

21.3.3 Quoting
--------------

It has been shown how `m4' expands macros when it encounters a name that matches a defined macro in the input. There are times, however, when you wish to defer expansion.
Principally, there are three situations when this is so: Free-form text There may be free-form text that you wish to appear at the output-and as such, be unaltered by any macros that may be inadvertently invoked in the input. It is not always possible to know if some particular name is defined as a macro, so it should be quoted. Overcoming syntax rules Sometimes you may wish to form strings which would violate M4's syntax rules - for example, you might wish to use leading whitespace or a comma in a macro argument. The solution is to quote the entire string. Macro arguments This is the most common situation for quoting: when arguments to macros are to be taken literally and not expanded as the arguments are collected. In the previous section, an example was given that demonstrates the effects of not quoting the first argument to `define'. Quoting macro arguments is considered a good practice that you should emulate. Strings are quoted by surrounding the quoted text with the ``' and `'' characters. When `m4' encounters a quoted string-as a type of token (*Note Token scanning::)-the quoted string is expanded to the string itself, with the outermost quote characters removed. Here is an example of a string that is triple quoted: ```foo''' =>``foo'' A more concrete example uses quoting to demonstrate how to prevent unwanted expansion within macro definitions: define(`foo', ``bar'')dnl define(`bar', `zog')dnl foo =>bar When the macro `foo' is defined, `m4' strips off the outermost quotes and registers the definition ``bar''. The `dnl' text has a special purpose, too, which will be covered in *Note Discarding input::. As the macro `foo' is expanded, the next pair of quote characters are stripped off and the string is expanded to `bar'. Since the expansion of the quoted string is the string itself (minus the quote characters), we have prevented unwanted expansion from the string `bar' to `zog'. As mentioned in *Note Token scanning::, the default M4 quote characters are ``' and `''. Since these are two commonly used characters in Bourne shell programming (1), Autoconf reassigns these to the `[' and `]' characters-a symmetric looking pair of characters least likely to cause problems when writing GNU Autotools macros. From this point forward, we shall use `[' and `]' as the quote characters and you can forget about the default M4 quotes. Autoconf uses M4's built-in `changequote' macro to perform this reassignment and, in fact, this built-in is still available to you. In recent years, the common practice when needing to use the quote characters `[' or `]' or to quote a string with an legitimately imbalanced number of the quote characters has been to invoke `changequote' and temporarily reassign them around the affected area: dnl Uh-oh, we need to use the apostrophe! And even worse, we have two dnl opening quote marks and no closing quote marks. changequote(<<, >>)dnl perl -e 'print "$]\n";' changequote([, ])dnl This leads to a few potential problems, the least of which is that it's easy to reassign the quote characters and then forget to reset them, leading to total chaos! Moreover, it is possible to entirely disable M4's quoting mechanism by blindly changing the quote characters to a pair of empty strings. In hindsight, the overwhelming conclusion is that using `changequote' within the GNU Autotools framework is a bad idea. Instead, leave the quote characters assigned as `[' and `]' and use the special strings `@<:@' and `@:>@' anywhere you want real square brackets to appear in your output. 
This is an easy practice to adopt, because it's faster and less error prone than using `changequote':

     perl -e 'print "$@:>@\n";'

This, and other guidelines for using M4 in the GNU Autotools framework are covered in detail in *Note Writing macros within the GNU Autotools framework::.

---------- Footnotes ----------

(1) The ``' character is used for command substitution and `'' is the shell's own quote character!

File: autobook.info, Node: Features of M4, Next: Writing macros within the GNU Autotools framework, Prev: Fundamentals of M4 processing, Up: M4

21.4 Features of M4
===================

M4 includes a number of pre-defined macros that make it a powerful preprocessor. We will take a tour of the most important features provided by these macros. Although some of these features are not very relevant to GNU Autotools users, Autoconf is implemented using most of them. For this reason, it is useful to understand the features to better understand Autoconf's behavior and for debugging your own `configure' scripts.

* Menu:

* Discarding input ::
* Macro management ::
* Conditionals ::
* Looping ::
* Diversions ::
* Including files ::

File: autobook.info, Node: Discarding input, Next: Macro management, Up: Features of M4

21.4.1 Discarding input
-----------------------

A macro called `dnl' discards text from the input. The `dnl' macro takes no arguments and expands to the empty string, but it has the side effect of discarding all input up to and including the next newline character. Here is an example of `dnl' from the Autoconf source code:

     # AC_LANG_POP
     # -----------
     # Restore the previous language.
     define([AC_LANG_POP],
     [popdef([_AC_LANG])dnl
     ifelse(_AC_LANG, [_AC_LANG], [AC_FATAL([too many $0])])dnl
     AC_LANG(_AC_LANG)])

It is important to remember `dnl''s behavior: it discards the newline character, which can have unexpected effects on generated `configure' scripts! If you want a newline to appear in the output, you must add an extra blank line to compensate. `dnl' need not appear in the first column of a given line - it will begin discarding input at any point that it is invoked in the input file. However, be aware of the newline eating problem again! In the `AC_LANG_POP' example above, note the deliberate use of `dnl' to remove surplus newline characters. In general, `dnl' makes sense for macro invocations that appear on a single line, where you would expect the whole line to simply vanish from the output. In the following subsections, `dnl' will be used to illustrate where it makes sense to use it.

File: autobook.info, Node: Macro management, Next: Conditionals, Prev: Discarding input, Up: Features of M4

21.4.2 Macro management
-----------------------

A number of built-in macros exist in M4 to manage macros. We shall examine the most common ones that you're likely to encounter. There are others and you should consult the GNU M4 manual for further information. The most obvious one is `define', which defines a macro. It expands to the empty string:

     define([foo], [bar])dnl
     define([combine], [$1 and $2])dnl

It is worth highlighting again the liberal use of quoting. We wish to define a pair of macros whose names are _literally_ `foo' and `combine'. If another macro had been previously defined with either of these names, `m4' would have expanded the macro immediately and passed the expansion of `foo' to `define', giving unexpected results. The `undefine' macro will remove a macro's definition from M4's macro table.
It also expands to the empty string: undefine([foo])dnl undefine([combine])dnl Recall that once removed from the macro table, unmatched text will once more be passed through to the output. The `defn' macro expands to the definition of a macro, named by the single argument to `defn'. It is quoted, so that it can be used as the body of a new, renamed macro: define([newbie], defn([foo]))dnl undefine([foo])dnl The `ifdef' macro can be used to determine if a macro name has an existing definition. If it does exist, `ifdef' expands to the second argument, otherwise it expands to the third: ifdef([foo], [yes], [no])dnl Again, `yes' and `no' have been quoted to prevent expansion due to any pre-existing macros with those names. _Always_ consider this a real possibility! Finally, a word about built-in macros: these macros are all defined for you when `m4' is started. One common problem with these macros is that they are not in any kind of name space, so it's easier to accidentally invoke them or want to define a macro with an existing name. One solution is to use the `define' and `defn' combination shown above to rename all of the macros, one by one. This is how Autoconf makes the distinction clear.  File: autobook.info, Node: Conditionals, Next: Looping, Prev: Macro management, Up: Features of M4 21.4.3 Conditionals ------------------- Macros which can expand to different strings based on runtime tests are extremely useful-they are used extensively throughout macros in GNU Autotools and third party macros. The macro that we will examine closely is `ifelse'. This macro compares two strings and expands to a different string based on the result of the comparison. The first form of `ifelse' is akin to the `if'/`then'/`else' construct in other programming languages: ifelse(string1, string2, equal, not-equal) The other form is unusual to a beginner because it actually resembles a `case' statement from other programming languages: ifelse(string1, string2, equala, string3, string4, equalb, default) If `string1' and `string2' are equal, this macro expands to `equala'. If they are not equal, `m4' will shift the argument list three positions to the left and try again: ifelse(string3, string4, equalb, default) If `string3' and `string4' are equal, this macro expands to `equalb'. If they are not equal, it expands to `default'. The number of cases that may be in the argument list is unbounded. As it has been mentioned in *Note Macros and macro expansion::, macros that accept arguments may access their arguments through specially named macros like `$1'. If a macro has been defined, no checking of argument counts is performed before it is expanded and the macro may examine the number of arguments given through the `$#' macro. This has a useful result: you may invoke a macro with too few (or too many) arguments and the macro will still be expanded. In the example below, `$2' will expand to the empty string. define([foo], [$1 and $2])dnl foo([a]) =>a and This is useful because `m4' will expand the macro and give the macro the opportunity to test each argument for the empty string. In effect, we have the equivalent of default arguments from other programming languages. The macro can use `ifelse' to provide a default value if, say, `$2' is the empty string. You will notice in much of the documentation for existing Autoconf macros that arguments may be left blank to accept the default value. This is an important idiom that you should practice in your own macros. 
In this example, we wish to accept the default shell code fragment for the case where `/etc/passwd' is found in the build system's file system, but output `Big trouble!' if it is not. AC_CHECK_FILE([/etc/passwd], [], [echo "Big trouble!"])  File: autobook.info, Node: Looping, Next: Diversions, Prev: Conditionals, Up: Features of M4 21.4.4 Looping -------------- There is no support in M4 for doing traditional iterations (ie. `for-do' loops), however macros may invoke themselves. Thus, it is possible to iterate using recursion. The recursive definition can use conditionals (*Note Conditionals::) to terminate the loop at its completion by providing a trivial case. The GNU M4 manual provides some clever recursive definitions, including a definition for a `forloop' macro that emulates a `for-do' loop. It is conceivable that you might wish to use these M4 constructs when writing macros to generate large amounts of in-line shell code or arbitrarily nested `if; then; fi' statements.  File: autobook.info, Node: Diversions, Next: Including files, Prev: Looping, Up: Features of M4 21.4.5 Diversions ----------------- Diversions are a facility in M4 for diverting text from the input stream into a holding buffer. There is a large number of diversion buffers in GNU M4, limited only by available memory. Text can be diverted into any one of these buffers and then `undiverted' back to the output (diversion number 0) at a later stage. Text is diverted and undiverted using the `divert' and `undivert' macros. They expand to the empty string, with the side effect of setting the diversion. Here is an illustrative example: divert(1)dnl This goes at the end. divert(0)dnl This goes at the beginning. undivert(1)dnl =>This goes at the beginning. =>This goes at the end. It is unlikely that you will want to use diversions in your own macros, and it is difficult to do reliably without understanding the internals of Autoconf. However, it is interesting to note that this is how `autoconf' generates fragments of shell code on-the-fly that must precede shell code at the current point in the `configure' script.  File: autobook.info, Node: Including files, Prev: Diversions, Up: Features of M4 21.4.6 Including files ---------------------- M4 permits you to include files into the input stream using the `include' and `sinclude' macros. They simply expand to the contents of the named file. Of course, the expansion will be rescanned as the normal rules dictate (*Note Fundamentals of M4 processing::). The difference between `include' and `sinclude' is subtle: if the filename given as an argument to `include' is not present, an error will be raised. The `sinclude' macro will instead expand to the empty string--presumably the `s' stands for `silent'. Older GNU Autotools macros that tried to be modular would use the `include' and `sinclude' macros to import libraries of macros from other sources. While this is still a workable mechanism, there is an active effort within the GNU Autotools development community to improve the packaging system for macros. An `--install' option is being developed to improve the mechanism for importing macros from a library.  File: autobook.info, Node: Writing macros within the GNU Autotools framework, Prev: Features of M4, Up: M4 21.5 Writing macros within the GNU Autotools framework ====================================================== With a good grasp of M4 concepts, we may turn our attention to applying these principles to writing `configure.in' files and new `.m4' macro files. 
There are some differences between writing generic M4 input files and macros within the GNU Autotools framework and these will be covered in this section, along with some useful hints on working within the framework. This section ties in closely with *Note Writing New Macros for Autoconf::. Now that you are familiar with the capabilities of M4, you can forget about the names of the built-in M4 macros-they should be avoided in the GNU Autotools framework. Where appropriate, the framework provides a collection of macros that are laid on top of the M4 built-ins. For instance, the macros in the `AC_' family are just regular M4 macros that take a number of arguments and rely on an extensive library of `AC_' support macros. * Menu: * Syntactic conventions :: * Debugging with M4 ::  File: autobook.info, Node: Syntactic conventions, Next: Debugging with M4, Up: Writing macros within the GNU Autotools framework 21.5.1 Syntactic conventions ---------------------------- Some conventions have grown over the life of the GNU Autotools, mostly as a disciplined way of avoiding M4 pitfalls. These conventions are designed to make your macros more robust, your code easier to read and, most importantly, improve your chances for getting things to work the first time! A brief list of recommended conventions appears below: - Do not use the M4 built-in `changequote'. Any good macro will already perform sufficient quoting. - Never use the argument macros (e.g. `$1') within shell comments and dnl remarks. If such a comment were to be placed within a macro definition, M4 will expand the argument macros leading to strange results. Instead, quote the argument number to prevent unwanted expansion. For instance, you would use `$[1]' in the comment. - Quote the M4 comment character, `#'. This can appear often in shell code fragments and can have undesirable effects if M4 ignores any expansions in the text between the `#' and the next newline. - In general, macros invoked from `configure.in' should be placed one per line. Many of the GNU Autotools macros conclude their definitions with a `dnl' to prevent unwanted whitespace from accumulating in `configure'. - Many of the `AC_' macros, and others which emulate their good behavior, permit default values for unspecified arguments. It is considered good style to explicitly show your intention to use an empty argument by using a pair of quotes, such as `[]'. - Always quote the names of macros used within the definitions of other macros. - When writing new macros, generate a small `configure.in' that uses (and abuses!) the macro--particularly with respect to quoting. Generate a `configure' script with `autoconf' and inspect the results.  File: autobook.info, Node: Debugging with M4, Prev: Syntactic conventions, Up: Writing macros within the GNU Autotools framework 21.5.2 Debugging with M4 ------------------------ After writing a new macro or a `configure.in' template, the generated `configure' script may not contain what you expect. Frequently this is due to a problem in quoting (*note Quoting::), but the interactions between macros can be complex. When you consider that the arguments to GNU Autotools macros are often shell scripts, things can get rather hairy. A number of techniques exist for helping you to debug these kinds of problems. Expansion problems due to over-quoting and under-quoting can be difficult to pinpoint. 
Autoconf half-heartedly tries to detect this condition by scanning the generated `configure' script for any remaining invocations of the `AC_' and `AM_' families of macros. However, this only works for the `AC_' and `AM_' macros and not for third party macros. M4 provides a comprehensive facility for tracing expansions. This makes it possible to see how macro arguments are expanded and how a macro is finally expanded. Often, this can be half the battle in discovering if the macro definition or the invocation is at fault. Autoconf 2.15 will include this tracing mechanism. To trace the generation of `configure', Autoconf can be invoked like so: $ autoconf --trace=AC_PROG_CC Autoconf provides fine control over which macros are traced and the format of the trace output. You should refer to the Autoconf manual for further details. GNU `m4' also provides a debugging mode that can be helpful in discovering problems such as infinite recursion. This mode is activated with the `-d' option. In order to pass options to `m4', invoke Autoconf like so: $ M4='m4 -dV' autoconf Another situation that can arise is the presence of shell syntax errors in the generated `configure' script. These errors are usually obvious, as the shell will abort `configure' when the syntax error is encountered. The task of then locating the troublesome shell code in the input files can be potentially quite difficult. If the erroneous shell code appears in `configure.in', it should be easy to spot-presumably because you wrote it recently! If the code is imported from a third party macro, though, it may only be present because you invoked that macro. A trick to help locate these kinds of errors is to place some magic text (`__MAGIC__') throughout `configure.in': AC_INIT AC_PROG_CC __MAGIC__ MY_SUSPECT_MACRO __MAGIC__ AC_OUTPUT(Makefile) After `autoconf' has generated `configure', you can search through it for the magic text to determine the extremities of the suspect macro. If your erroneous code appears within the magic text markers, you've found the culprit! Don't be afraid to hack up `configure'. It can easily be regenerated. Finally, due to an error on your part, `m4' may generate a `configure' script that contains semantic errors. Something as simple as inverted logic may lead to a nonsense test result: checking for /etc/passwd... no Semantic errors of this kind are usually easy to solve once you can spot them. A fast and simple way of tracing the shell execution is to use the shell's `-x' and `-v' options to turn on its own tracing. This can be done by explicitly placing the required `set' commands into `configure.in': AC_INIT AC_PROG_CC set -x -v MY_BROKEN_MACRO set +x +v AC_OUTPUT(Makefile) This kind of tracing is invaluable in debugging shell code containing semantic errors.  File: autobook.info, Node: Writing Portable Bourne Shell, Next: Writing New Macros for Autoconf, Prev: M4, Up: Top 22 Writing Portable Bourne Shell ******************************** This chapter is a whistle stop tour of the accumulated wisdom of the free software community, with respect to best practices for portable shell scripting, as encoded in the sources for Autoconf and Libtool, as interpreted and filtered by me. It is by no means comprehensive - entire books have been devoted to the subject - though it is, I hope, authoritative. 
* Menu:

* Why Use the Bourne Shell?::
* Sh Implementation::
* Environment::
* Utilities::

File: autobook.info, Node: Why Use the Bourne Shell?, Next: Sh Implementation, Up: Writing Portable Bourne Shell

22.1 Why Use the Bourne Shell?
==============================

Unix has been around for more than thirty years and has splintered into hundreds of small and not so small variants, *Note The Diversity of Unix Systems: Unix Diversity.  Much of the subject matter of this book is concerned with how best to approach writing programs which will work on as many of these variants as possible.

One of the few programming tools that is absolutely guaranteed to be present on every flavour of Unix in use today is Steve Bourne's original shell, `sh' - the Bourne Shell.  That is why Libtool is written as a Bourne Shell script, and why the `configure' files generated by Autoconf are Bourne Shell scripts: they can be executed on all known Unix flavours, and as a bonus on most POSIX based non-Unix operating systems too.

However, there are complications.  Over the years, OS vendors have improved Steve Bourne's original shell or have reimplemented it in an almost, but not quite, compatible way.  There are also a great number of Bourne compatible shells which are often used as a system's default `/bin/sh': `ash', `bash', `bsh', `ksh', `sh5' and `zsh' are some that you may come across.  For the rest of this chapter, when I say `shell', I mean a Bourne compatible shell.

This leads us to the black art known as "portable shell programming", the art of writing a single script which will run correctly through all of these varying implementations of `/bin/sh'.  Of course, Unix systems are constantly evolving and new variations are being introduced all the time (and very old systems which have fallen into disuse can perhaps be ignored by the pragmatic).  The amount of system knowledge required to write a truly portable shell script is vast, and a great deal of the information that sets a precedent for a given idiom is necessarily second or third (or tenth) hand.  Practically, this means that some of the knowledge accumulated in popular portable shell scripts is very probably folklore - but that doesn't really matter too much; the important thing is that if you adhere to these idioms, you shouldn't have any problems from people who can't run your program on their system.

File: autobook.info, Node: Sh Implementation, Next: Environment, Prev: Why Use the Bourne Shell?, Up: Writing Portable Bourne Shell

22.2 Implementation
===================

By their very nature, a sizeable part of the functionality of shell scripts is provided by the many utility programs that they routinely call to perform important subsidiary tasks.  Addressing the portability of the script involves issues of portability in the host operating system environment, and portability of the utility programs as well as the portability of the shell implementation itself.

This section discusses differences between shell implementations to which you must cater when writing a portable script.  It is broken into several subsections, each covering a single aspect of shell programming that needs to be approached carefully to avoid pitfalls with unexpected behaviour in some shell implementations.  The following section discusses how to cope with the host environment in a portable fashion.  The last section in this chapter addresses the portability of common shell utilities.
* Menu:

* Size Limitations::
* Magic Numbers::
* Colon::
* Functions::
* Source::
* Test::
* Variables::
* Pattern Matching::

File: autobook.info, Node: Size Limitations, Next: Magic Numbers, Up: Sh Implementation

22.2.1 Size Limitations
-----------------------

Quite a lot of the Unix vendor implementations of the Bourne shell have a fixed buffer for storing command lines, as small as 512 characters in the worst cases.  You may see an error akin to this:

     $ ls -d /usr/bin/* | wc -l
     sh: error: line too long

Notice that the limit applies to the _expanded_ command line, not just the characters typed in for the line.  A portable way to write this would be:

     $ ( cd /usr/bin && ls | wc -l )
     1556

File: autobook.info, Node: Magic Numbers, Next: Colon, Prev: Size Limitations, Up: Sh Implementation

22.2.2 #!
---------

When the kernel executes a program from the file system, it checks the first few bytes of the file, and compares them with its internal list of known "magic numbers", which encode how the file can be executed.  This is a similar, but distinct, system to the `/etc/magic' magic number list used by user space programs.

Having determined that the file is a script by examining its magic number, the kernel finds the path of the interpreter by removing the `#!' and any intervening space from the first line of the script.  One optional argument is allowed (additional arguments are not ignored; they constitute a syntax error), and the resulting command line is executed.  There is a 32 character limit to the significant part of the `#!' line, so you must ensure that the full path to the interpreter plus any switches you need to pass to it do not exceed this limit.  Also, the interpreter must be a real binary program; it cannot be a `#!' file itself.

It used to be thought that different kernels' idea of the magic number for the start of an interpreted script varied slightly between implementations.  In actual fact, all look for `#!' in the first two bytes - in spite of commonly held beliefs, there is no evidence that there are others which require `#! /'.

A portable script must give an absolute path to the interpreter, which causes problems when, say, some machines have a better version of the Bourne shell in an unusual directory - say `/usr/sysv/bin/sh'.  See *Note (): Functions. for a way to re-execute the script with a better interpreter.

For example, imagine a script file called `/tmp/foo.pl' with the following first line:

     #! /usr/local/bin/perl

Now, the script can be executed from the `/tmp' directory, with the following sequence of commands:

     $ cd /tmp
     $ ./foo.pl

When executing these commands, the kernel will actually execute the following from the `/tmp' directory:

     /usr/local/bin/perl ./foo.pl

This can pose problems of its own though.  A script such as the one described above will not work on a machine where the perl interpreter is installed as `/usr/bin/perl'.  There is a way to circumvent this problem, by using the `env' program to find the interpreter by looking in the user's `PATH' environment variable.  Change the first line of `foo.pl' to read as follows:

     #! /usr/bin/env perl

This idiom does rely on the `env' command being installed as `/usr/bin/env', and that, in this example, `perl' can be found in the user's `PATH'.  But that is indeed the case on the great majority of machines.  In contrast, perl is installed in `/usr/local/bin' as often as `/usr/bin', so using `env' like this is a net win overall.
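For instance, a short perl script written this way runs unchanged whether the interpreter lives in `/usr/bin', `/usr/local/bin' or somewhere else on the user's `PATH' (the script below is only an illustrative sketch):

     $ cat hello.pl
     #! /usr/bin/env perl
     print "hello\n";
     $ ./hello.pl
     hello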
You can also use this method to get around the 32 character limit if the path to the interpreter is too long.  Unfortunately, you lose the ability to pass an option flag to the interpreter if you choose to use `env'.  For example, you can't do the following, since it requires two arguments:

     #! /usr/bin/env guile -s

File: autobook.info, Node: Colon, Next: Functions, Prev: Magic Numbers, Up: Sh Implementation

22.2.3 :
--------

In the beginning, the magic number for Bourne shell scripts used to be a colon followed by a newline.  Most Unices still support this, and will correctly pass a file with a single colon as its first line to `/bin/sh' for interpretation.  Nobody uses this any more and I suspect some very new Unices may have forgotten about it entirely, so you should stick to the more usual `#! /bin/sh' syntax for your own scripts.  You may occasionally come across a very old script that starts with a `:' though, and it is nice to know why!

In addition, all known Bourne compatible shells have a builtin command, `:', which always returns success.  It is equivalent to the system command `/bin/true', but can be used from a script without the overhead of starting another process.  When setting a shell variable as a flag, it is good practice to use the commands `:' and `false' as values, and to choose the sense of the variable to be `:' in the common case: when you come to test the value of the variable, you will avoid the overhead of additional processes most of the time.

     var=:
     if $var; then
       foo
     fi

The `:' command described above can take any number of arguments, which it will fastidiously ignore.  This allows the `:' character to double up as a comment leader of sorts.  Be aware that the characters that follow are not discarded; they are still interpreted by the shell, so metacharacters can have unexpected effects:

     $ cat foo
     :
     : echo foo
     : `echo bar`
     : `echo baz >&2`
     $ ./foo
     baz

You may find very old shell scripts that are commented using `:', or new scripts that exploit this behavior in some esoteric fashion.  My advice is, don't: it will bite you later.

File: autobook.info, Node: Functions, Next: Source, Prev: Colon, Up: Sh Implementation

22.2.4 ()
---------

There are still a great number of shells that, like Steve Bourne's original implementation, do not have functions!  So, strictly speaking, you can't use shell functions in your scripts.  Luckily, in this day and age, even though `/bin/sh' itself may not support shell functions, it is not too far from the truth to say that almost every machine will have _some_ shell that does.

Taking this assumption to its logical conclusion, it is a simple matter of writing your script to find a suitable shell, and then feed itself to that shell so that the rest of the script can use functions with impunity:

     #! /bin/sh
     # Zsh is not Bourne compatible without the following:
     if test -n "$ZSH_VERSION"; then
       emulate sh
       NULLCMD=:
     fi
     # Bash is not POSIX compliant without the following:
     test -n "$BASH_VERSION" && set -o posix

     SHELL="${SHELL-/bin/sh}"
     if test x"$1" = x--re-executed; then
       # Functional shell was found.  Remove option and continue
       shift
     elif "$SHELL" -c 'foo () { exit 0; }; foo' 2>/dev/null; then
       # The current shell works already!
       :
     else
       # Try alternative shells that (sometimes) support functions
       for cmd in sh bash ash bsh ksh zsh sh5; do
         set `IFS=:; X="$PATH:/bin:/usr/bin:/usr/afsws/bin:/usr/ucb"; echo $X`
         for dir
         do
           shell="$dir/$cmd"
           if (test -f "$shell" || test -f "$shell.exe") &&
             "$shell" -c 'foo () { exit 0; }; foo' 2>/dev/null
           then
             # Re-execute with discovered functional shell
             SHELL="$shell" exec "$shell" "$0" --re-executed ${1+"$@"}
           fi
         done
       done
       echo "Unable to locate a shell interpreter with function support" >&2
       exit 1
     fi

     foo () {
       echo "$SHELL: ta da!"
     }

     foo

     exit 0

Note that this script finds a shell that supports functions of the following syntax, since the use of the `function' keyword is much less widely supported:

     foo () { ... }

A notable exception to the assertion that all machines have a shell that can handle functions is 4.3BSD, which has only a single shell: a Bourne shell deprived of functions.  There are two ways you can deal with this:

  1. Ask 4.3BSD users of your script to install a more featureful shell such as bash, so that the technique above will work.

  2. Have your script run itself through `sed', chopping itself into pieces, with each function written to its own script file, and then feed what's left into the original shell.  Whenever a function call is encountered, one of the fragments from the original script will be executed in a subshell.

If you decide to split the script with `sed', you will need to be careful not to rely on shell variables to communicate between functions, since each `function' will be executed in its own subshell.

File: autobook.info, Node: Source, Next: Test, Prev: Functions, Up: Sh Implementation

22.2.5 .
--------

The semantics of `.' are rather peculiar to say the least.  Here is a simple script - it just displays its positional parameters:

     #! /bin/sh
     echo "$0" ${1+"$@"}

Put this in a file, `foo'.  Here is another simple script - it calls the first script.  Put this in another file, `wrapper':

     #! /bin/sh
     . ./foo
     . ./foo bar baz

Observe what happens when you run this from the command line:

     $ ./wrapper
     ./wrapper
     ./wrapper bar baz

So `$0' is inherited from the calling script, and the positional parameters are as passed to the command.  Observe what happens when you call the wrapper script with arguments:

     $ ./wrapper 1 2 3
     ./wrapper 1 2 3
     ./wrapper bar baz

So the sourced script has access to the calling script's positional parameters, _unless you override them in the `.' command_.  This can cause no end of trouble if you are not expecting it, so you must either be careful to omit all parameters to any `.' command, or else don't reference the parameters inside the sourced script.  If you are re-executing your script with a shell that understands functions, the best use for the `.' command is to load libraries of functions which can subsequently be used in the calling script.

Most importantly, don't forget that, if you call the `exit' command in a script that you load with `.', it will cause the calling script to exit too!

File: autobook.info, Node: Test, Next: Variables, Prev: Source, Up: Sh Implementation

22.2.6 [
--------

Although technically equivalent, `test' is preferable to `[' in shell code written in conjunction with Autoconf, since `[' is also used for M4 quoting in Autoconf.  Your code will be much easier to read (and write) if you abstain from the use of `['.  Except in the most degenerate shells, `test' is a shell builtin to save the overhead of starting another process, and is no slower than `['.
It does mean, however, that there is a huge range of features which are not implemented widely enough for you to be able to use them freely within a truly portable script.  The less obvious ones to avoid are `-a' and `-o' - the logical `and' and `or' operations.  A good litmus test for the portability of any shell feature is to see whether that feature is used in the source of Autoconf, and it turns out that `-a' and `-o' _are_ used here and there, but never more than once in a single command.  All the same, to avoid any confusion, I always avoid them entirely.  I would not use the following, for example:

     test foo -a bar

Instead I would run test twice, like this:

     test foo && test bar

The negation operator of `test' is quite portable and can be used in portable shell scripts.  For example:

     if test ! foo; then bar; fi

The negation operator of `if' is not at all portable and should be avoided.  The following would generate a syntax error on some shell implementations:

     if ! test foo; then bar; fi

An implication of this axiom is that when you need to branch if a command fails, and that command is not `test', you cannot use the negation operator.  The easiest way to work around this is to use the `else' clause of the un-negated `if', like this:

     if foo; then :; else bar; fi

Notice the use of the `:' builtin as a null operation when `foo' doesn't fail.

The `test' command does not cope with missing or additional arguments, so you must take care to ensure that the shell does not remove arguments or introduce new ones during variable and quote expansions.  The best way to do that is to enclose any variables in double quotes.  You should also add a single character prefix to both sides in case the value of the expansion is a valid option to `test':

     $ for foo in "" "!" "bar" "baz quux"; do
     >   test x"$foo" = x"bar" && echo 1 || echo 0
     > done
     0
     0
     1
     0

Here, you can see that using the `x' prefix for the first operand saves `test' from interpreting the `!' argument as a real option, or from choking on an empty string - something you must always be aware of, or else the following behaviour will ensue:

     $ foo=!
     $ test "$foo" = "bar" && echo 1 || echo 0
     test: argument expected
     0
     $ foo=""
     $ test "$foo" = "bar" && echo 1 || echo 0
     test: argument expected
     0

Also, the double quote marks help `test' cope with strings that contain whitespace.  Without the double quotes, you will see errors like this:

     $ foo="baz quux"
     $ test x$foo = "bar" && echo 1 || echo 0
     test: too many arguments
     0

You shouldn't rely on the default behaviour of `test' (to return `true' if its single argument has non-zero length); use the `-n' option to force that behaviour if it is what you want.

Beyond that, the other thing you need to know about `test' is that if you use operators other than those below, you are reducing the portability of your code:

`-n' STRING
     STRING is non-empty.

`-z' STRING
     STRING is empty.

STRING1 = STRING2
     Both strings are identical.

STRING1 != STRING2
     The strings are not the same.

`-d' FILE
     FILE exists and is a directory.

`-f' FILE
     FILE exists and is a regular file.

You can also use the following, provided that you don't mix them within a single invocation of `test':

EXPRESSION `-a' EXPRESSION
     Both expressions evaluate to `true'.

EXPRESSION `-o' EXPRESSION
     At least one of the expressions evaluates to `true'.
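Here is a short sketch combining the advice above (the variable name and directory are hypothetical): prefix both operands with a literal character, double quote the expansions, and run `test' twice rather than using `-a':

     if test x"$prefix" = x"/usr/local" && test -d "$prefix"; then
       echo "installing below $prefix"
     else
       :
     fi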
File: autobook.info, Node: Variables, Next: Pattern Matching, Prev: Test, Up: Sh Implementation 22.2.7 $ -------- When using shell variables in your portable scripts, you need to write them in a somewhat stylised fashion to maximise the number of shell implementations that will interpret your code as expected: * Convenient though it is, the POSIX `$(command parameters)' syntax for command substitution is not remotely portable. Despite it being more difficult to nest, you must use ``command parameters`' instead. * The most portable way to set a default value for a shell variable is: $ echo ${no_such_var-"default value"} default value If there is any whitespace in the default value, as there is here, you must be careful to quote the entire value, since some shells will raise an error: $ echo ${no_such_var-default value} sh: bad substitution * The `unset' command is not available in many of the degenerate Bourne shell implementations. Generally, it is not too difficult to get by without it, but following the logic that led to the shell script in *Note (): Functions, it would be trivial to extend the test case for confirming a shell's suitability to include a check for `unset'. Although it has not been put to the test, the theory is that all the interesting machines in use today have _some_ shell that supports `unset'. * Be religious about double quoting variable expansions. Using `"$foo"' will avoid trouble with unexpected spaces in filenames, and compression of all whitespace to a single space in unquoted variable expansions. * To avoid accidental interpretation of variable expansions as command options you can use the following technique: $ foo=-n $ echo $foo $ echo x"$foo" | sed -e 's/^x//' -n * If it is set, `IFS' splits words on whitespace by default. If you change it, be sure to put it back when you're done, or the shell may behave very strangely from that point. For example, when you need to examine each element of `$PATH' in turn: # The whitespace at the end of the following line is a space # followed by literal tab and newline characters. save_IFS="${IFS= }"; IFS=":" set dummy $PATH IFS="$save_IFS" shift Alternatively, you can take advantage of the fact that command substitutions occur in a separate subshell, and do not corrupt the environment of the calling shell: set dummy `IFS=:; echo $PATH` shift Strictly speaking, the `dummy' argument is required to stop the `set' command from interpreting the first word of the expanded backquote expression as a command option. Realistically, no one is going to have `-x', for example, as the first element of their `PATH' variable, so the `dummy' could be omitted - as I did earlier in the script in *Note (): Functions. * Some shells expand `$@' to the empty string, even when there are no actual parameters (`$#' is 0). If you need to replicate the parameters that were passed to the executing script, when feeding the script to a more suitable interpreter for example, you must use the following: ${1+"$@"} Similarly, although all known shells do correctly use `$@' as the default argument to a `for' command, you must write it like this: for arg do stuff done When you rely on implicit `$@' like this, it is important to write the `do' keyword on a separate line. Some degenerate shells can not parse the following: for arg; do stuff done  File: autobook.info, Node: Pattern Matching, Prev: Variables, Up: Sh Implementation 22.2.8 * versus .* ------------------ This section compares "file globbing" with "regular expression matching". 
There are many Unix commands which are regularly used from shell scripts, and which provide some sort of pattern matching mechanism: `expr', `egrep' and `sed', to name a few.  Unfortunately they each have different quoting rules regarding whether particular meta-characters must be backslash escaped to revert to their literal meaning and vice-versa.  There is no real logic to the particular dialect of regular expressions accepted by these commands.  To confirm the correctness of each regular expression, you should always check them from the shell prompt with the relevant tool before committing to a script, so I won't belabour the specifics.

Shell globbing however is much more regular (no pun intended), and provides a reasonable and sometimes more CPU efficient solution to many shell matching problems.  The key is to make good use of the `case' command, which is easier to use (because it uses globbing rules) and doesn't require additional processes to be spawned.  Unfortunately, GNU Bash doesn't handle backslashes correctly in glob character classes - the backslash must be the first character in the class, or else it will never match.  For example, if you want to detect absolute directory paths on Unix and Windows using `case', you should write the code like this:

     case $dir in
       [\\/]* | ?:[\\/]* ) echo absolute ;;
       * )                 echo relative ;;
     esac

Even though `expr' uses regular expressions rather than shell globbing, it is often(1) a shell builtin, so using it to extract sections of strings can be faster than spawning a sed process to do the same.  As with `echo' and `set', for example, you must be careful that variable or command expansions for the first argument to `expr' are not accidentally interpreted as reserved keywords.  As with `echo', you can work around this problem by prefixing any expansions with a literal `x', as follows:

     $ foo=substr
     $ expr $foo : '.*\(str\)'
     expr: syntax error
     $ expr x$foo : '.*\(str\)'
     str

---------- Footnotes ----------

(1) Notable exceptions are GNU Bash, and both Ksh and the Bourne shell on Solaris.

File: autobook.info, Node: Environment, Next: Utilities, Prev: Sh Implementation, Up: Writing Portable Bourne Shell

22.3 Environment
================

In addition to the problems with portability in shell implementations discussed in the previous section, the behaviour of the shell can also be drastically affected by the contents of certain environment variables, and the operating environment provided by the host machine.  It is important to be aware of the behavior of some of the operating systems within which your shell script might run.  Although not directly related to the implementation of the shell interpreter, the characteristics of some of the target architectures do influence what is considered to be portable.  To ensure your script will work on as many shell implementations as possible, you must observe the following points.

SCO Unix doesn't like `LANG=C' and friends, but without `LC_MESSAGES=C', Solaris will translate variable values in `set'!  Similarly, without `LC_CTYPE=C', compiled C code can behave unexpectedly.  The trick is to set the values to `C', except when they are not already set at all:

     for var in LANG LC_ALL LC_MESSAGES LC_CTYPE LANGUAGES
     do
       if eval test x"\${$var+set}" = xset; then
         eval $var=C; eval export $var
       fi
     done

HP-UX `ksh' and all POSIX shells print the target directory to standard output if `CDPATH' is set.

     if test x"${CDPATH+set}" = xset; then CDPATH=:; export CDPATH; fi

The target architecture file system may impose limits on your scripts.
If you want your scripts to run on the architectures which impose these limits, then your script must adhere to these limits:

   * The ISO9660 filesystem, as used on most CD-ROMs, limits nesting of directories to a maximum depth of twelve levels.

   * Many old Unix filesystems place a 14 character limit on the length of any filename.  If you care about portability to DOS, _that_ has an 8 character limit with an optional extension of 3 or fewer characters (known as 8.3 notation).

A useful idiom when you need to determine whether a particular pathname is relative or absolute, and which also works for DOS targets, follows:

     case "$file" in
       [\\/]* | ?:[\\/]*) echo absolute ;;
       *)                 echo default ;;
     esac

File: autobook.info, Node: Utilities, Prev: Environment, Up: Writing Portable Bourne Shell

22.4 Utilities
==============

The utility programs commonly executed by shell scripts can have a huge impact on the portability of shell scripts, and it is important to know which utilities are universally available, and any differences certain implementations of these utilities may exhibit.  According to the GNU standards document, you can rely on having access to these utilities from your scripts:

     cat cmp cp diff echo egrep expr false grep install-info ln ls
     mkdir mv pwd rm rmdir sed sleep sort tar test touch true

Here are some things that you must be aware of when using some of the tools listed above:

`cat'
     Host architectures supply `cat' implementations with conflicting interpretations of, or entirely missing, the various command line options.  You should avoid using any command line options to this command.

`cp' and `mv'
     Unconditionally duplicated or otherwise open file descriptors cannot be deleted on many operating systems, and worse, on Windows the destination files cannot even be moved.  Constructs like the following must be avoided, for example:

          exec > foo
          mv foo bar

`echo'
     The `echo' command has at least two flavors: one takes a `-n' option to suppress the automatic newline at the end of the echoed string; the other uses an embedded `\c' notation as the last character in the echoed string for the same purpose.

     If you need to emit a string without a trailing newline character, you can use the following script fragment to discover which flavor of `echo' you are using:

          case `echo "testing\c"`,`echo -n testing` in
            *c*,-n*) echo_n= echo_c='(1)
          ' ;;
            *c*,*)   echo_n=-n echo_c= ;;
            *)       echo_n= echo_c='\c' ;;
          esac

     Any `echo' command after the shell fragment above, which shouldn't move the cursor to a new line, can now be written like so:

          echo $echo_n "prompt:$echo_c"

     In addition, you should try to avoid backslashes in `echo' arguments unless they are expanded by the shell.  Some implementations interpret them and effectively perform another backslash expansion pass, where equally many implementations do not.  This can become a really hairy problem if you need to have an `echo' command which doesn't perform backslash expansion, and in fact the first 150 lines of the `ltconfig' script distributed with Libtool are devoted to finding such a command.

`ln'
     Not all systems support soft links.  You should use the Autoconf macro `AC_PROG_LN_S' to discover what the target architecture supports, and assign the result of that test to a variable.  Whenever you subsequently need to create a link you can use the command stored in the variable to do so.

          LN_S=@LN_S@
          ...
          $LN_S $top_srcdir/foo $dist_dir/foo

     Also, you cannot rely on support for the `-f' option from all implementations of `ln'.  Use `rm' before calling `ln' instead.
`mkdir'
     Unfortunately, `mkdir -p' is not as portable as we might like.  You must either create each directory in the path in turn, or use the `mkinstalldirs' script supplied by Automake.

`sed'
     When you resort to using `sed' (rather, use `case' or `expr' if you can), there is no need to introduce command line scripts using the `-e' option.  Even when you want to supply more than one script, you can use `;' as a command separator.  The following two lines are equivalent, though the latter is cleaner:

          $ sed -e 's/foo/bar/g' -e '12q' < infile > outfile
          $ sed 's/foo/bar/g;12q' < infile > outfile

     Some portability zealots still go to great lengths to avoid "here documents" of more than twelve lines.  The twelve line limit is actually a limitation in some implementations of `sed', which has gradually seeped into the portable shell folklore as a general limit in all here documents.  Autoconf, however, includes many here documents with far more than twelve lines, and has not generated any complaints from users.  This is testament to the fact that at worst the limit is only encountered in very obscure cases - and most likely that it is not a real limit after all.

     Also, be aware that branch labels of more than eight characters are not portable to some implementations of `sed'.

"Here documents" are a way of redirecting literal strings into the standard input of a command.  You have certainly seen them before if you have looked at other people's shell scripts, though you may not have realised what they were called:

     cat >> /tmp/file$$ << _EOF_
     This is the text of a "here document"
     _EOF_

Something else to be aware of is that the temporary files created by your scripts can become a security problem if they are left in `/tmp' or if the names are predictable.  A simple way around this is to create a directory in `/tmp' that is unique to the process and owned by the process user.  Some machines have a utility program for just this purpose - `mktemp -d' - or else you can always fall back to `umask 077 && mkdir /tmp/$$'.  Having created this directory, all of the temporary files for this process should be written to that directory, and its contents removed as soon as possible.

Armed with the knowledge of how to write portable shell code from this chapter, in combination with the M4 details from the last chapter, you are ready for the next chapter, which covers the specifics of combining the two to write your own Autoconf macros.

---------- Footnotes ----------

(1) This is a literal newline.

File: autobook.info, Node: Writing New Macros for Autoconf, Next: Migrating Existing Packages, Prev: Writing Portable Bourne Shell, Up: Top

23 Writing New Macros for Autoconf
**********************************

Autoconf is an extensible system which permits new macros to be written and shared between Autoconf users.  Although it is possible to perform custom tests by placing fragments of shell code into your `configure.in' file, it is better practice to encapsulate that test in a macro.  This encourages macro authors to make their macros more general purpose, easier to test and easier to share with other users.

This chapter presents some guidelines for designing and implementing good Autoconf macros.  It will conclude with a discussion of the approaches being considered by the Autoconf development community for improving the creation and distribution of macros.  A more general discussion of macros can be found in *Note Macros and macro expansion::.
* Menu:

* Autoconf Preliminaries::
* Reusing Existing Macros::
* Guidelines for writing macros::
* Implementation specifics::
* Future directions for macro writers::

File: autobook.info, Node: Autoconf Preliminaries, Next: Reusing Existing Macros, Up: Writing New Macros for Autoconf

23.1 Autoconf Preliminaries
===========================

In a small package which only uses Autoconf, your own macros are placed in the `aclocal.m4' file - this includes macros that you may have obtained from third parties such as the Autoconf macro archive (*note Autoconf macro archive::).  If your package additionally uses Automake, then these macros should be placed in `acinclude.m4'.  The `aclocal' program from Automake reads in macro definitions from `acinclude.m4' when generating `aclocal.m4'.  When using Automake, for instance, `aclocal.m4' will include the definitions of `AM_' macros needed by Automake.

In larger projects, it's advisable to keep your custom macros in a more organized structure.  Autoconf version 2.15 will introduce a new facility to explicitly include files from your `configure.in' file.  The details have not solidified yet, but it will almost certainly include a mechanism for automatically including files with the correct filename extension from a subdirectory, say `m4/'.

File: autobook.info, Node: Reusing Existing Macros, Next: Guidelines for writing macros, Prev: Autoconf Preliminaries, Up: Writing New Macros for Autoconf

23.2 Reusing Existing Macros
============================

It goes without saying that it makes sense to reuse macros where possible - indeed, a search of the Autoconf macro archive might turn up a macro which does exactly what you want, alleviating the need to write a macro at all (*note Autoconf macro archive::).

It's more likely, though, that there will be generic, parameterized tests available that you can use to help you get your job done.  Autoconf's `generic' tests provide one such collection of macros.  A macro that wants to test for support of a new language keyword, for example, should rely on the `AC_TRY_COMPILE' macro.  This macro can be used to attempt to compile a small program and detect a failure due to, say, a syntax error.

In any case, it is good practice when reusing macros to adhere to their publicized interface - do not rely on implementation details such as shell variables used to record the test's result unless this is explicitly mentioned as part of the macro's behavior.  Macros in the Autoconf core can, and do, change their implementation from time to time.

Reusing a macro does not imply that the macro is necessarily invoked from within the definition of your macro.  Sometimes you might just want to rely on some action performed by a macro earlier in the configuration run - this is still a form of reuse.  In these cases, it is necessary to ensure that this macro has indeed run at least once before your macro is invoked.  It is possible to state such a dependency by invoking the `AC_REQUIRE' macro at the beginning of your macro's definition.

Should you need to write a macro from scratch, the following sections will provide guidelines for writing better macros.
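Before moving on, here is a minimal sketch of the `AC_REQUIRE' idiom described above (the macro name `MY_CHECK_FEATURE' and its body are hypothetical, shown only for illustration):

     AC_DEFUN([MY_CHECK_FEATURE],
     [AC_REQUIRE([AC_PROG_CC])dnl
     AC_MSG_CHECKING([for some feature])
     # ... perform the test using $CC ...
     AC_MSG_RESULT([yes])
     ])

`AC_REQUIRE' ensures that `AC_PROG_CC' has been expanded at least once before the body of `MY_CHECK_FEATURE' runs, without expanding it a second time if `configure.in' has already invoked it.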