This is autobook.info, produced by makeinfo version 4.7 from autobook.texi.

INFO-DIR-SECTION GNU programming tools
START-INFO-DIR-ENTRY
* Autoconf, Automake, Libtool: (autobook).   Using the GNU autotools.
END-INFO-DIR-ENTRY

This file documents GNU Autoconf, Automake and Libtool.

Copyright (C) 1999, 2000 Gary V. Vaughan, Ben Elliston, Tom Tromey, Ian Lance Taylor

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation.


File: autobook.info, Node: Removing --foreign, Next: Installing Header Files, Prev: Using Libtool Libraries, Up: A Large GNU Autotools Project

12.2 Removing `--foreign'
=========================

Now that I have the bulk of the project in place, I want it to adhere to the GNU standard layout.  By removing the `--foreign' option from the call to `automake' in the `bootstrap' file, `automake' is able to warn me about missing, or in some cases(1), malformed files, as follows:

     $ ./bootstrap
     + aclocal -I config
     + libtoolize --force --copy
     Putting files in AC_CONFIG_AUX_DIR, config.
     + autoheader
     + automake --add-missing --copy
     automake: Makefile.am: required file ./NEWS not found
     automake: Makefile.am: required file ./README not found
     automake: Makefile.am: required file ./AUTHORS not found
     automake: Makefile.am: required file ./THANKS not found
     + autoconf

The GNU standards book(2) describes the contents of these files in more detail.  Alternatively, take a look at a few other GNU packages from `ftp://ftp.gnu.org/gnu'.

---------- Footnotes ----------

(1) For example, when I come to using the `make dist' rule.

(2) The GNU standard is distributed from `http://www.gnu.org/prep/standards.html'.


File: autobook.info, Node: Installing Header Files, Next: Including Texinfo Documentation, Prev: Removing --foreign, Up: A Large GNU Autotools Project

12.3 Installing Header Files
============================

One of the more difficult problems with GNU Autotools driven projects is that each of them depends on `config.h' (or its equivalent) and the project specific symbols that it defines.  The purpose of this file is to be `#include'd from all of the project source files.  The preprocessor can then tailor the code in these files to the target environment.

It is often difficult, and sometimes impossible, to avoid introducing a dependency on `config.h' from one of the project's installable header files.  It would be nice if you could simply install the generated `config.h', but even if you name it carefully or install it to a subdirectory to avoid filename problems, the macros it defines will clash with those from any other GNU Autotools based project which also installs _its_ `config.h'.
For example, if Sic installed its `config.h' as `/usr/include/sic/config.h', and had `#include <sic/config.h>' in the installed `common.h', when another GNU Autotools based project came to use the Sic library it might begin like this:

     #if HAVE_CONFIG_H
     #  include <config.h>
     #endif

     #if HAVE_SIC_H
     #  include <sic.h>
     #endif

     static const char version_number[] = VERSION;

But, `sic.h' says `#include <sic/common.h>', which in turn says `#include <sic/config.h>'.  Even though the other project has the correct value for `VERSION' in its own `config.h', by the time the preprocessor reaches the `version_number' definition, it has been redefined to the value in `sic/config.h'.  Imagine the mess you could get into if you were using several libraries which each installed their own `config.h' definitions.  GCC issues a warning when a macro is redefined to a different value, which would help you to catch this error.  Some compilers do not issue a warning, and perhaps worse, other compilers will warn even if the repeated definitions have the same value, flooding you with hundreds of warnings for each source file that reads multiple `config.h' headers.

The Autoconf macro `AC_OUTPUT_COMMANDS'(1) provides a way to solve this problem.  The idea is to generate a system specific but installable header from the results of the various tests performed by `configure'.  There is a 1-to-1 mapping between the preprocessor code that relied on the configure results written to `config.h', and the new shell code that relies on the configure results saved in `config.cache'.  The following code is a snippet from `configure.in', in the body of the `AC_OUTPUT_COMMANDS' macro:

     # Add the code to include these headers only if autoconf has
     # shown them to be present.
     if test x$ac_cv_header_stdlib_h = xyes; then
       echo '#include <stdlib.h>' >> $tmpfile
     fi
     if test x$ac_cv_header_unistd_h = xyes; then
       echo '#include <unistd.h>' >> $tmpfile
     fi
     if test x$ac_cv_header_sys_wait_h = xyes; then
       echo '#include <sys/wait.h>' >> $tmpfile
     fi
     if test x$ac_cv_header_errno_h = xyes; then
       echo '#include <errno.h>' >> $tmpfile
     fi
     cat >> $tmpfile << '_EOF_'
     #ifndef errno
     /* Some systems #define this! */
     extern int errno;
     #endif
     _EOF_
     if test x$ac_cv_header_string_h = xyes; then
       echo '#include <string.h>' >> $tmpfile
     elif test x$ac_cv_header_strings_h = xyes; then
       echo '#include <strings.h>' >> $tmpfile
     fi
     if test x$ac_cv_header_assert_h = xyes; then
       cat >> $tmpfile << '_EOF_'
     #include <assert.h>
     #define SIC_ASSERT assert
     _EOF_
     else
       echo '#define SIC_ASSERT(expr) ((void) 0)' >> $tmpfile
     fi

Compare this with the equivalent C pre-processor code from `sic/common.h', which it replaces:

     #if STDC_HEADERS || HAVE_STDLIB_H
     #  include <stdlib.h>
     #endif

     #if HAVE_UNISTD_H
     #  include <unistd.h>
     #endif

     #if HAVE_SYS_WAIT_H
     #  include <sys/wait.h>
     #endif

     #if HAVE_ERRNO_H
     #  include <errno.h>
     #endif
     #ifndef errno
     /* Some systems #define this! */
     extern int errno;
     #endif

     #if HAVE_STRING_H
     #  include <string.h>
     #else
     #  if HAVE_STRINGS_H
     #    include <strings.h>
     #  endif
     #endif

     #if HAVE_ASSERT_H
     #  include <assert.h>
     #  define SIC_ASSERT assert
     #else
     #  define SIC_ASSERT(expr) ((void) 0)
     #endif

Apart from the mechanical process of translating the preprocessor code, there is some plumbing needed to ensure that the `common.h' file generated by the new code in `configure.in' is functionally equivalent to the old code, and is generated in a correct and timely fashion.

Taking my lead from some of the Automake generated `make' rules to regenerate `Makefile' from `Makefile.in' by calling `config.status', I have added some similar rules to `sic/Makefile.am' to regenerate `common.h' from `common-h.in'.

     # Regenerate common.h with config.status whenever common-h.in changes.
     common.h: stamp-common
         @:

     stamp-common: $(srcdir)/common-h.in $(top_builddir)/config.status
         cd $(top_builddir) \
           && CONFIG_FILES= CONFIG_HEADERS= CONFIG_OTHER=sic/common.h \
           $(SHELL) ./config.status
         echo timestamp > $@

The way that `AC_OUTPUT_COMMANDS' works is to copy the contained code into `config.status' (*note Generated File Dependencies::).  It is actually `config.status' that creates the generated files - for example, `automake' generated `Makefile's are able to regenerate themselves from corresponding `Makefile.in's by calling `config.status' if they become out of date.  Unfortunately, this means that `config.status' doesn't have direct access to the cache values generated while `configure' was running (because it has finished its work by the time `config.status' is called).  It is tempting to read in the cache file at the top of the code inside `AC_OUTPUT_COMMANDS', but that only works if you know where the cache file is saved.  Also, the package installer can use the `--cache-file' option of `configure' to change the location of the file, or turn off caching entirely with `--cache-file=/dev/null'.

`AC_OUTPUT_COMMANDS' accepts a second argument which can be used to pass the variable settings discovered by `configure' into `config.status'.  It's not pretty, and is a little error prone.  In the first argument to `AC_OUTPUT_COMMANDS', you must be careful to check that *every single* configure variable referenced is correctly set somewhere in the second argument.  A slightly stripped down example from the sic project `configure.in' looks like this:

     # ----------------------------------------------------------------------
     # Add code to config.status to create an installable host dependent
     # configuration file.
     # ----------------------------------------------------------------------
     AC_OUTPUT_COMMANDS([
     if test -n "$CONFIG_FILES" && test -n "$CONFIG_HEADERS"; then
       # If both these vars are non-empty, then config.status wasn't run by
       # automake rules (which always set one or the other to empty).
       CONFIG_OTHER=${CONFIG_OTHER-sic/common.h}
     fi
     case "$CONFIG_OTHER" in
     *sic/common.h*)
       outfile=sic/common.h
       stampfile=sic/stamp-common
       tmpfile=${outfile}T
       dirname="sed s,^.*/,,g"

       echo creating $outfile
       cat > $tmpfile << _EOF_
     /* -*- Mode: C -*-
      * --------------------------------------------------------------------
      * DO NOT EDIT THIS FILE!  It has been automatically generated
      * from:    configure.in and `echo $outfile|$dirname`.in
      * on host: `(hostname || uname -n) 2>/dev/null | sed 1q`
      * --------------------------------------------------------------------
      */

     #ifndef SIC_COMMON_H
     #define SIC_COMMON_H 1

     #include <stdio.h>
     #include <sys/types.h>
     _EOF_

       if test x$ac_cv_func_bzero = xno && \
          test x$ac_cv_func_memset = xyes; then
         cat >> $tmpfile << '_EOF_'
     #define bzero(buf, bytes)  ((void) memset (buf, 0, bytes))
     _EOF_
       fi
       if test x$ac_cv_func_strchr = xno; then
         echo '#define strchr index' >> $tmpfile
       fi
       if test x$ac_cv_func_strrchr = xno; then
         echo '#define strrchr rindex' >> $tmpfile
       fi

       # The ugly but portable cpp stuff comes from here
       infile=$srcdir/sic/`echo $outfile | sed 's,.*/,,g;s,\..*$,,g'`-h.in
       sed '/^##.*$/d' $infile >> $tmpfile
     ],[
       srcdir=$srcdir
       ac_cv_func_bzero=$ac_cv_func_bzero
       ac_cv_func_memset=$ac_cv_func_memset
       ac_cv_func_strchr=$ac_cv_func_strchr
       ac_cv_func_strrchr=$ac_cv_func_strrchr
     ])

You will notice that the contents of `common-h.in' are copied into `common.h' verbatim as it is generated.
It's just an easy way of collecting together the code that belongs in `common.h', but which doesn't rely on configuration tests, without cluttering `configure.in' any more than necessary.

I should point out that, although this method has served me well for a number of years now, it is inherently fragile because it relies on undocumented internals of both Autoconf and Automake.  There is a very real possibility that if you also track the latest releases of GNU Autotools, it may stop working.  Future releases of GNU Autotools will address the interface problems that force us to use code like this, for lack of a better way to do things.

---------- Footnotes ----------

(1) This is for Autoconf version 2.13.  Autoconf version 2.50 recommends `AC_CONFIG_COMMANDS'.


File: autobook.info, Node: Including Texinfo Documentation, Next: Adding a Test Suite, Prev: Installing Header Files, Up: A Large GNU Autotools Project

12.4 Including Texinfo Documentation
====================================

Automake provides a few facilities to make the maintenance of Texinfo documentation within projects much simpler than it used to be.  Writing a `Makefile.am' for Texinfo documentation is extremely straightforward:

     ## Process this file with automake to produce Makefile.in

     MAINTAINERCLEANFILES = Makefile.in

     info_TEXINFOS = sic.texi

The `TEXINFOS' primary will not only create rules for generating `.info' files suitable for browsing with the GNU info reader, but also for generating `.dvi' and `.ps' documentation for printing.

You can also create other formats of documentation by adding the appropriate `make' rules to `Makefile.am'.  For example, because the more recent Texinfo distributions have begun to support generation of HTML documentation from the `.texi' format master document, I have added the appropriate rules to the `Makefile.am':

     SUFFIXES = .html

     html_docs = sic.html

     .texi.html:
         $(MAKEINFO) --html $<

     .PHONY: html
     html: version.texi $(html_docs)

For ease of maintenance, these `make' rules employ a suffix rule which describes how to generate HTML from equivalent `.texi' source - this involves telling make about the `.html' suffix using the automake `SUFFIXES' macro.  I haven't defined `MAKEINFO' explicitly (though I could have done) because I know that Automake has already defined it for use in the `.info' generation rules.

The `html' target is for convenience; typing `make html' is a little easier than typing `make sic.html'.  I have also added a `.PHONY' target so that featureful `make' programs will know that the `html' target doesn't actually generate a file called, literally, `html'.  As it stands, this code is not quite complete, since the toplevel `Makefile.am' doesn't know how to call the `html' rule in the `doc' subdirectory.  There is no need to provide a general solution here in the way Automake does for its `dvi' target, for example.  A simple recursive call to `doc/Makefile' suffices:

     docdir = $(top_builddir)/doc

     html:
         @echo Making $@ in $(docdir)
         @cd $(docdir) && make $@

Another useful management function that Automake can perform for you with respect to Texinfo documentation is to automatically generate the version numbers for your Texinfo documents.
It will add `make' rules to generate a suitable `version.texi', so long as `automake' sees `@include version.texi' in the body of the Texinfo source:

     \input texinfo   @c -*-texinfo-*-
     @c %**start of header
     @setfilename sic.info
     @settitle Dynamic Modular Interpreter Prototyping
     @setchapternewpage odd
     @c %**end of header
     @headings double

     @include version.texi

     @dircategory Programming
     @direntry
     * sic: (sic).    The dynamic, modular, interpreter prototyping tool.
     @end direntry

     @ifinfo
     This file documents sic.
     @end ifinfo

     @titlepage
     @sp 10
     @title Sic
     @subtitle Edition @value{EDITION}, @value{UPDATED}
     @subtitle $Id: sic.texi,v 1.1 2004/03/16 07:08:18 joostvb Exp $
     @author Gary V. Vaughan
     @page
     @vskip 0pt plus 1filll
     @end titlepage

`version.texi' sets Texinfo variables, `VERSION', `EDITION' and `UPDATED', which can be expanded elsewhere in the main Texinfo documentation by using `@value{EDITION}' for example.  This makes use of another auxiliary file, `mdate-sh', which will be added to the scripts in the `$ac_aux_dir' subdirectory by Automake after adding the `version.texi' reference to `sic.texi':

     $ ./bootstrap
     + aclocal -I config
     + libtoolize --force --copy
     Putting files in AC_CONFIG_AUX_DIR, config.
     + autoheader
     + automake --add-missing --copy
     doc/Makefile.am:22: installing config/mdate-sh
     + autoconf
     $ make html
     /bin/sh ./config.status --recheck
     ...
     Making html in ./doc
     make[1]: Entering directory /tmp/sic/doc
     Updating version.texi
     makeinfo --html sic.texi
     make[1]: Leaving directory /tmp/sic/doc

Hopefully, it now goes without saying that I also need to add the `doc' subdirectory to `AC_OUTPUT' in `configure.in' and to `SUBDIRS' in the top-level `Makefile.am'.


File: autobook.info, Node: Adding a Test Suite, Prev: Including Texinfo Documentation, Up: A Large GNU Autotools Project

12.5 Adding a Test Suite
========================

Automake has very flexible support for automated test-suites within a project distribution, which are discussed more fully in the Automake manual.  I have added a simple shell script based testing facility to Sic using this support - this kind of testing mechanism is perfectly adequate for command line projects.  The tests themselves simply feed prescribed input to the uninstalled `sic' interpreter and compare the actual output with what is expected.  Here is one of the test scripts:

     ## -*- sh -*-
     ## incomplete.test -- Test incomplete command handling

     # Common definitions
     if test -z "$srcdir"; then
         srcdir=`echo "$0" | sed 's,[^/]*$,,'`
         test "$srcdir" = "$0" && srcdir=.
         test -z "$srcdir" && srcdir=.
         test "${VERBOSE+set}" != set && VERBOSE=1
     fi
     . $srcdir/defs

     # this is the test script
     cat <<\EOF > in.sic
     echo "1 2 3"
     EOF

     # this is the output we should expect to see
     cat <<\EOF >ok
     1 2 3
     EOF

     cat <<\EOF >errok
     EOF

     # Run the test saving stderr to a file, and showing stdout
     # if VERBOSE == 1
     $RUNSIC in.sic  2> err | tee -i out >&2

     # Test against expected output
     if ${CMP} -s out ok; then
         :
     else
         echo "ok:" >&2
         cat ok >&2
         exit 1
     fi

     # Munge error output to remove leading directories, `lt-' or
     # trailing `.exe'
     sed -e "s,^[^:]*[lt-]*sic[.ex]*:,sic:," err >sederr && mv sederr err

     # Show stderr if it doesn't match expected output, if VERBOSE == 1
     if "$CMP" -s err errok; then
         :
     else
         echo "err:" >&2
         cat err >&2
         echo "errok:" >&2
         cat errok >&2
         exit 1
     fi

The tricky part of this script is the first part which discovers the location of (and loads) `$srcdir/defs'.
It is a little convoluted because it needs to work if the user has compiled the project in a separate build tree - in which case the `defs' file is in a separate source tree and not in the actual directory in which the test is executed.

The `defs' file allows me to factor out the common definitions from each of the test files so that it can be maintained once in a single file that is read by all of the tests:

     #! /bin/sh

     # Make sure srcdir is an absolute path.  Supply the variable
     # if it does not exist.  We want to be able to run the tests
     # stand-alone!!
     #
     srcdir=${srcdir-.}
     if test ! -d $srcdir ; then
         echo "defs: installation error" 1>&2
         exit 1
     fi

     # If the source directory is a Unix or a DOS root directory, ...
     #
     case "$srcdir" in
         /* | [A-Za-z]:\\*) ;;
         *) srcdir=`\cd $srcdir && pwd` ;;
     esac

     case "$top_builddir" in
         /* | [A-Za-z]:\\*) ;;
         *) top_builddir=`\cd ${top_builddir-..} && pwd` ;;
     esac

     progname=`echo "$0" | sed 's,^.*/,,'`
     testname=`echo "$progname" | sed 's,-.*$,,'`
     testsubdir=${testsubdir-testSubDir}

     SIC_MODULE_PATH=$top_builddir/modules
     export SIC_MODULE_PATH

     # User can set VERBOSE to prevent output redirection
     case x$VERBOSE in
         xNO | xno | x0 | x)
             exec > /dev/null 2>&1
             ;;
     esac

     rm -rf $testsubdir > /dev/null 2>&1
     mkdir $testsubdir
     cd $testsubdir \
        || { echo "Cannot make or change into $testsubdir"; exit 1; }

     echo "=== Running test $progname"

     CMP="${CMP-cmp}"
     RUNSIC="${top_builddir}/src/sic"

Having written a few more test scripts, and made sure that they are working by running them from the command line, all that remains is to write a suitable `Makefile.am' so that `automake' can run the test suite automatically.

     ## Makefile.am -- Process this file with automake to produce Makefile.in

     EXTRA_DIST = defs $(TESTS)
     MAINTAINERCLEANFILES = Makefile.in

     testsubdir = testSubDir

     TESTS_ENVIRONMENT = top_builddir=$(top_builddir)

     TESTS = \
         empty-eval.test \
         empty-eval-2.test \
         empty-eval-3.test \
         incomplete.test \
         multicmd.test

     distclean-local:
         -rm -rf $(testsubdir)

I have used the `testsubdir' macro to run the tests in their own subdirectory so that the directory containing the actual test scripts is not polluted with lots of fallout files generated by running the tests.  For completeness I have used a "hook target"(1) to remove this subdirectory when the user types:

     $ make distclean
     ...
     rm -rf testSubDir
     ...

Adding more tests is accomplished by creating a new test script and adding it to the list in `TESTS'.  Remembering to add the new `tests' subdirectory to `configure.in' and the top-level `Makefile.am', and reconfiguring the project to propagate the changes into the various generated files, I can run the whole test suite from the top directory with:

     $ make check

It is often useful to run tests in isolation, either when developing new tests, or to examine more closely why a test has failed unexpectedly.  Having set this test suite up as I did, individual tests can be executed with:

     $ VERBOSE=1 make check TESTS=incomplete.test
     make  check-TESTS
     make[1]: Entering directory /tmp/sic/tests
     === Running test incomplete.test
     1 2 3
     PASS: incomplete.test
     ==================
     All 1 tests passed
     ==================
     make[1]: Leaving directory /tmp/sic/tests

     $ ls testSubDir/
     err    errok    in.sic    ok    out

The `testSubDir' subdirectory now contains the expected and actual output from that particular test for both `stdout' and `stderr', and the input file which generated the actual output.
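A quick way to see exactly how a failing test's actual output diverged from what was expected is a straightforward comparison of those files; the invocation below is only an illustration (the test suite itself uses `cmp' via the `$CMP' variable):

     $ diff testSubDir/ok testSubDir/out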
Had the test failed, I would be able to look at these files to decide whether there is a bug in the program or simply a bug in the test script.  Being able to examine individual tests like this is invaluable, especially when the test suite becomes very large - because you will, naturally, add tests every time you add features to a project or find and fix a bug.

An alternative to the pure shell based test mechanism I have presented here is the Autotest facility by François Pinard, as used in Autoconf after release 2.13.

Later in *Note A Complex GNU Autotools Project::, the Sic project will be revisited to take advantage of some of the more advanced features of GNU Autotools.  But first these advanced features will be discussed in the next several chapters - starting, in the next chapter, with a discussion of how GNU Autotools can help you to make a tarred distribution of your own projects.

---------- Footnotes ----------

(1) This is a sort of callback function which will be called by the `make' rules generated by Automake.


File: autobook.info, Node: Rolling Distribution Tarballs, Next: Installing and Uninstalling, Prev: A Large GNU Autotools Project, Up: Top

13 Rolling Distribution Tarballs
********************************

There's something about the word `tarballs' that makes you want to avoid them altogether, let alone get involved in the disgusting process of rolling one.  And, in the past, that was apparently the attitude of most developers, as witnessed by the strange ways distribution tar archives were created and unpacked.  Automake largely automates this tedious process, in a sense providing you with the obliviousness you crave.

* Menu:

* Introduction to Distributions::
* What goes in::
* The distcheck rule::
* Some caveats::
* Implementation::


File: autobook.info, Node: Introduction to Distributions, Next: What goes in, Up: Rolling Distribution Tarballs

13.1 Introduction to Distributions
==================================

The basic approach to creating a tar distribution is to run

     make
     make dist

The generated tar file is named PACKAGE-VERSION.tar.gz, and will unpack into a directory named PACKAGE-VERSION.  These two rules are mandated by the GNU Coding Standards, and are just good ideas in any case, because it is convenient for the end user to have the version information easily accessible while building a package.  It removes any doubt when she goes back to an old tree after some time away from it.  Unpacking into a fresh directory is always a good idea - in the old days some packages would unpack into the current directory, requiring an annoying clean-up job for the unwary system administrator.

The unpacked archive is completely portable, to the extent of Automake's ability to enforce this.  That is, all the generated files (e.g., `configure') are newer than their inputs (e.g., `configure.in'), and the distributed `Makefile.in' files should work with any version of `make'.  Of course, some of the responsibility for portability lies with you: you are free to introduce non-portable code into your `Makefile.am', and Automake can't diagnose this.  No special tools beyond the minimal tool list (*note Minimal Tool List: (standards)Utilities in Makefiles.), plus whatever your own `Makefile' and `configure' additions use, will be required for the end user to build the package.

By default Automake creates a `.tar.gz' file.  It notices if you are using GNU `tar' and arranges to create portable archives in this case.(1)

People do sometimes want to make other sorts of distributions.
Automake allows this through the use of options. `dist-bzip2' Add a `dist-bzip2' target, which creates a `.tar.bz2' file. These files are frequently smaller than the corresponding `.tar.gz' file. `dist-shar' Add a `dist-shar' target, which creates a `shar' archive. `dist-zip' Add a `dist-zip' target, which creates a `zip' file. These files are popular for Windows distributions. `dist-tarZ' Add a `dist-tarZ' target, which creates a `.tar.Z' file. This exists mostly for die-hard old-time Unix hackers; the rest of the world has moved on to `gzip' or `bzip2'. ---------- Footnotes ---------- (1) By default, GNU `tar' can create non-portable archives in certain (rare) situations. To be safe, Automake arranges to use the `-o' compatibility flag when GNU `tar' is used.  File: autobook.info, Node: What goes in, Next: The distcheck rule, Prev: Introduction to Distributions, Up: Rolling Distribution Tarballs 13.2 What goes in ================= Automake tries to make creating a distribution as easy as possible. The rules are set up by default to distribute those things which Automake knows belong in a distribution. For instance, Automake always distributes your `configure' script and your `NEWS' file. All the files Automake automatically distributes are shown by `automake --help': $ automake --help ... Files which are automatically distributed, if found: ABOUT-GNU README config.guess ltconfig ABOUT-NLS THANKS config.h.bot ltmain.sh AUTHORS TODO config.h.top mdate-sh BACKLOG acconfig.h config.sub missing COPYING acinclude.m4 configure mkinstalldirs COPYING.LIB aclocal.m4 configure.in stamp-h.in ChangeLog ansi2knr.1 elisp-comp stamp-vti INSTALL ansi2knr.c install-sh texinfo.tex NEWS compile libversion.in ylwrap ... Automake also distributes some files about which it has no built-in knowledge, but about which it learns from your `Makefile.am'. For instance, the source files listed in a `_SOURCES' variable go into the distribution. This is why you ought to list uninstalled header files in the `_SOURCES' variable: otherwise you'll just have to introduce another variable to distribute them - Automake will only know about them if you tell it. Not all primaries are distributed by default. The rule is arbitrary, but pretty simple: of all the primaries, only `_TEXINFOS' and `_HEADERS' are distributed by default. (Sources that make up programs and libraries are also distributed by default, but, perhaps confusingly, `_SOURCES' is not considered a primary.) While there is no rhyme, there is a reason: defaults were chosen based on feedback from users. Typically, `enough' reports of the form `I auto-generate my `_SCRIPTS'. How do I prevent them from ending up in the distribution?' would cause a change in the default. Although the defaults are adequate in many situations, sometimes you have to distribute files which aren't covered automatically. It is easy to add additional files to a distribution; simply list them in the macro `EXTRA_DIST'. You can list files in subdirectories here. You can also list a directory's name here and the entire contents will be copied into the distribution by `make dist'. Use this last feature with care. A typical failure is that you'll put a `temporary' file in the directory and then it will end up in the distribution when you forget to remove it. Similarly, version control files, such as a `CVS' subdirectory, can easily end up in a distribution this way. 
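As a sketch of what such an assignment can look like (the file and directory names here are purely illustrative, not taken from any real package), a `Makefile.am' might contain:

     EXTRA_DIST = bootstrap doc/hello.sic examples

Here the first two entries name individual files - one of them in a subdirectory - to copy into the distribution, while the last names a whole directory whose entire contents will be copied by `make dist', subject to the caveats just mentioned.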
If a primary is not distributed by default, but in your case it ought to be, you can easily correct it with `EXTRA_DIST': EXTRA_DIST = $(bin_SCRIPTS) The next major Automake release (1) will have a better method for controlling whether primaries do or do not go into the distribution. In 1.5 you will be able to use the `dist' and `nodist' prefixes to control distribution on a per-variable basis. You will even be able to simultaneously use both prefixes with a given primary to include some files and omit others: dist_bin_SCRIPTS = distribute-this nodist_bin_SCRIPTS = but-not-this ---------- Footnotes ---------- (1) Probably numbered 1.5.  File: autobook.info, Node: The distcheck rule, Next: Some caveats, Prev: What goes in, Up: Rolling Distribution Tarballs 13.3 The distcheck rule ======================= The `make dist' documentation sounds nice, and `make dist' did do something, but how do you know it really works? It is a terrible feeling when you realize your carefully crafted distribution is missing a file and won't compile on a user's machine. I wouldn't write such an introduction unless Automake provided a solution. The solution is a smoke test known as `make distcheck'. This rule performs a `make dist' as usual, but it doesn't stop there. Instead, it then proceeds to untar the new archive into a fresh directory, build it in a fresh build directory separate from the source directory, install it into a third fresh directory, and finally run `make check' in the build tree. If any step fails, `distcheck' aborts, leaving you to fix the problem before it will create a distribution. While not a complete test - it only tries one architecture, after all - `distcheck' nevertheless catches most packaging errors (as opposed to portability bugs), and its use is highly recommended.  File: autobook.info, Node: Some caveats, Next: Implementation, Prev: The distcheck rule, Up: Rolling Distribution Tarballs 13.4 Some caveats ================= Earlier, if you were awake, you noticed that I recommended the use of `make' before `make dist' or `make distcheck'. This practice ensures that all the generated files are newer than their inputs. It also solves some problems related to dependency tracking (*note Advanced GNU Automake Usage::). Note that currently Automake will allow you to make a distribution when maintainer mode is off, or when you do not have all the required maintainer tools. That is, you can make a subtly broken distribution if you are motivated or unlucky. This will be addressed in a future version of Automake.  File: autobook.info, Node: Implementation, Prev: Some caveats, Up: Rolling Distribution Tarballs 13.5 Implementation =================== In order to understand how to use the more advanced `dist'-related features, you must first understand how `make dist' is implemented. For most packages, what we've already covered will suffice. Few packages will need the more advanced features, though I note that many use them anyway. The `dist' rules work by building a copy of the source tree and then archiving that copy. This copy is made in stages: a `Makefile' in a particular directory updates the corresponding directory in the shadow tree. In some cases, `automake' is run to create a new `Makefile.in' in the new distribution tree. After each directory's `Makefile' has had a chance to update the distribution directory, the appropriate command is run to create the archive. Finally, the temporary directory is removed. 
If your `Makefile.am' defines a `dist-hook' rule, then Automake will arrange to run this rule when the copying work for this directory is finished.  This rule can do literally anything to the distribution directory, so some care is required - careless use will result in an unusable distribution.  For instance, Automake will create the shadow tree using links, if possible.  This means that it is inadvisable to modify the files in the `dist' tree in a dist hook.  One common use for this rule is to remove files that erroneously end up in the distribution (in rare situations this can happen).  The variable `distdir' is defined during the `dist' process and refers to the corresponding directory in the distribution tree; `top_distdir' refers to the root of the distribution tree.

Here is an example of removing a file from a distribution:

     dist-hook:
         -rm $(distdir)/remove-this-file


File: autobook.info, Node: Installing and Uninstalling, Next: Writing Portable C, Prev: Rolling Distribution Tarballs, Up: Top

14 Installing and Uninstalling Configured Packages
**************************************************

Have you ever seen a package where, once built, you were expected to keep the build tree around forever, and always `cd' there before running the tool?  You might have to cast your mind way, way back to the bad old days of 1988 to remember such a horrible thing.  The GNU Autotools provides a canned solution to this problem.  While not without flaws, it does provide a reasonable and easy-to-use framework.  In this chapter we discuss the GNU Autotools installation model, how to convince `automake' to install files where you want them, and finally we conclude with some information about uninstalling, including a brief discussion of its flaws.

14.1 Where files are installed
==============================

If you've ever run `configure --help', you've probably been frightened by the huge number of options offered.  Although nobody ever uses more than two or three of these, they are still important to understand when writing your package; their proper use will help you figure out where each file should be installed.  For a background on these standard directories and their uses, refer to *Note Invoking configure::.

We do recommend using the standard directories as described.  While most package builders only use `--prefix' or perhaps `--exec-prefix', some packages (e.g. GNU/Linux distributions) require more control.  For instance, if your package `quux' puts a file into `localstatedir', then in the default configuration it will end up in `/usr/local/var'.  However, for a GNU/Linux distribution it would make more sense to configure with `--localstatedir=/var/quux'.

Automake makes it very easy to use the standard directories.  Each directory, such as `bindir', is mapped onto a `Makefile' variable of the same name.  Automake adds three useful variables to the standard list:

`pkgincludedir'
     This is a convenience variable whose value is `$(includedir)/$(PACKAGE)'.

`pkgdatadir'
     A convenience variable whose value is `$(datadir)/$(PACKAGE)'.

`pkglibdir'
     A variable whose value is `$(libdir)/$(PACKAGE)'.

These cannot be set on the `configure' command line but are always defined as above.(1)

In Automake, a directory variable's name, without the `dir' suffix, can be used as a prefix to a primary to indicate install location.  Confused yet?  An example will help: items listed in `bin_PROGRAMS' are installed in `bindir'.

Automake's rules are actually a bit more precise than this: the directory and the primary must agree.
It doesn't make sense to install a library in `datadir', so Automake won't let you. Here is a complete list showing primaries and the directories which can be used with them: `PROGRAMS' `bindir', `sbindir', `libexecdir', `pkglibdir'. `LIBRARIES' `libdir', `pkglibdir'. `LTLIBRARIES' `libdir', `pkglibdir'. `SCRIPTS' `bindir', `sbindir', `libexecdir', `pkgdatadir'. `DATA' `datadir', `sysconfdir', `sharedstatedir', `localstatedir', `pkgdatadir'. `HEADERS' `includedir', `oldincludedir', `pkgincludedir'. `TEXINFOS' `infodir'. `MANS' `man', `man0', `man1', `man2', `man3', `man4', `man5', `man6', `man7', `man8', `man9', `mann', `manl'. There are two other useful prefixes which, while not directory names, can be used in their place. These prefixes are valid with any primary. The first of these is `noinst'. This prefix tells Automake that the listed objects should not be installed, but should be built anyway. For instance, you can use `noinst_PROGRAMS' to list programs which will not be installed. The second such non-directory prefix is `check'. This prefix tells Automake that this object should not be installed, and furthermore that it should only be built when the user runs `make check'. Early in Automake history we discovered that even Automake's extended built-in list of directories was not enough - basically anyone who had written a `Makefile.am' sent in a bug report about this. Now Automake lets you extend the list of directories. First you must define your own directory variable. This is a macro whose name ends in `dir'. Define this variable however you like. We suggest that you define it relative to an autoconf directory variable; this gives the user some control over the value. Don't hardcode it to something like `/etc'; absolute hardcoded paths are rarely portable. Now you can attach the base part of the new variable to a primary just as you can with the built-in directories: foodir = $(datadir)/foo foo_DATA = foo.txt Automake lets you attach such a variable to any primary, so you can do things you ordinarily wouldn't want to do or be allowed to do. For instance, Automake won't diagnose this piece of code that tries to install a program in an architecture-independent location: foodir = $(datadir)/foo foo_PROGRAMS = foo 14.2 Fine-grained control of install ==================================== The second most common way (2) to configure a package is to set `prefix' and `exec-prefix' to different values. This way, a system administrator on a heterogeneous network can arrange to have the architecture-independent files shared by all platforms. Typically this doesn't save very much space, but it does make in-place bug fixing or platform-independent runtime configuration a lot easier. To this end, Automake provides finer control to the user than a simple `make install'. For instance, the user can strip all the package executables at install time by running `make install-strip' (though we recommend setting the various `INSTALL' environment variables instead; this is discussed later). More importantly, Automake provides a way to install the architecture-dependent and architecture-independent parts of a package independently. In the above scenario, installing the architecture-independent files more than once is just a waste of time. Our hypothetical administrator can install those pieces exactly once, with `make install-data', and then on each type of build machine install only the architecture-dependent files with `make install-exec'. 
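As a sketch of that workflow (the host names and shell prompts here are hypothetical), the administrator might run something like:

     server$ make install-data       # architecture-independent files, done once
     server$ make install-exec       # architecture-dependent files for this host
     client$ make install-exec       # only the architecture-dependent files here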
Nonstandard directories specified in `Makefile.am' are also separated along `data' and `exec' lines, giving the user complete control over installation.  If, and only if, the directory variable name contains the string `exec', then items ending up in that directory will be installed by `install-exec' and not `install-data'.

At some sites, the paths referred to by software at runtime differ from those used to actually install the software.  For instance, suppose `/usr/local' is mounted read-only throughout the network.  On the server, where new packages are built, the file system is available read-write as `/w/usr/local' - a directory which is not mounted anywhere else.  In this situation the sysadmin can configure and build using the _runtime_ values, but use the `DESTDIR' trick to temporarily change the paths at install time:

     ./configure --prefix=/usr/local
     make
     make DESTDIR=/w install

Note that `DESTDIR' operates as a prefix only.  Sometimes this isn't enough.  In this situation you can explicitly override each directory variable:

     ./configure --prefix=/usr/local
     make
     make prefix=/w/usr/local datadir=/w/usr/share install

Here is a full example (3) showing how you can unpack, configure, and build a typical GNU program on multiple machines at the same time:

     sunos$ tar zxf foo-0.1.tar.gz
     sunos$ mkdir sunos linux

In one window:

     sunos$ cd sunos
     sunos$ ../foo-0.1/configure --prefix=/usr/local \
     > --exec-prefix=/usr/local/sunos
     sunos$ make
     sunos$ make install

And in another window:

     sunos$ rsh linux
     linux$ cd ~/linux
     linux$ ../foo-0.1/configure --prefix=/usr/local \
     > --exec-prefix=/usr/local/linux
     linux$ make
     linux$ make install-exec

In this example we install everything on the `sunos' machine, but we only install the platform-dependent files on the `linux' machine.  We use a different `exec-prefix', so for example GNU/Linux executables will end up in `/usr/local/linux/bin/'.

14.3 Install hooks
==================

As with `dist', the install process allows for generic targets which can be used when the existing install functionality is not enough.  There are two types of targets which can be used: local rules and hooks.

A local rule is named either `install-exec-local' or `install-data-local', and is run during the course of the normal install procedure.  This rule can be used to install things in ways that Automake usually does not support.  For instance, in `libgcj' we generate a number of header files, one per Java class.  We want to install them in `pkgincludedir', but we want to preserve the hierarchical structure of the headers (e.g., we want `java/lang/String.h' to be installed as `$(pkgincludedir)/java/lang/String.h', not `$(pkgincludedir)/String.h'), and Automake does not currently support this.  So we resort to a local rule, which is a bit more complicated than you might expect:

     install-data-local:
         @for f in $(nat_headers) $(extra_headers); do \
     ## Compute the install directory at runtime.
           d="`echo $$f | sed -e 's,/[^/]*$$,,'`"; \
     ## Make the install directory.
           $(mkinstalldirs) $(DESTDIR)$(includedir)/$$d; \
     ## Find the header file -- in our case it might be in srcdir or
     ## it might be in the build directory.  "p" is the variable that
     ## names the actual file we will install.
           if test -f $(srcdir)/$$f; then p=$(srcdir)/$$f; else p=$$f; fi; \
     ## Actually install the file.
           $(INSTALL_DATA) $$p $(DESTDIR)$(includedir)/$$f; \
         done

A hook is guaranteed to run after the install of objects in this directory has completed.  This can be used to modify files after they have been installed.
There are two install hooks, named `install-data-hook' and `install-exec-hook'.  For instance, suppose you have written a program which must be `setuid' root.  You can accomplish this by changing the permissions after the program has been installed:

     bin_PROGRAMS = su
     su_SOURCES = su.c

     install-exec-hook:
         chown root $(bindir)/su
         chmod u+s $(bindir)/su

Unlike an install hook, an install rule is not guaranteed to be run after all other install rules are run.  This lets it be run in parallel with other install rules when a parallel `make' is used.  Ordinarily this is not very important, and in practice you almost always see local hooks and not local rules.

The biggest caveat to using a local rule or an install hook is to make sure that it will work when the source and build directories are not the same--many people forget to do this.  This means being sure to look in `$(srcdir)' when the file is a source file.  It is also very important to make sure that you do not use a local rule when install order is important - in this case, your `Makefile' will succeed on some machines and fail on others.

14.4 Uninstall
==============

As if things aren't confusing enough, there is still one more major installation-related feature which we haven't mentioned: uninstall.  Automake adds an `uninstall' target to your `Makefile' which does the reverse of `install': it deletes the newly installed package.  Unlike `install', there is no `uninstall-data' or `uninstall-exec'; while possible in theory we don't think this would be useful enough to actually use.  Like `install', you can write `uninstall-local' or `uninstall-hook' rules.

In our experience, `uninstall' is not a very useful feature.  Automake implements it because it is mandated by the GNU Standards, but it doesn't work reliably across packages.  Maintainers who write install hooks typically neglect to write uninstall hooks.  Also, since it can't reliably uninstall a _previously_ installed version of a package, it isn't useful for what most people would want to use it for anyway.  We recommend using a real packaging system, several of which are freely available.  In particular, GNU Stow, RPM, and the Debian packaging system seem like good choices.

---------- Footnotes ----------

(1) There has been some debate in the Autoconf community about extending Autoconf to allow new directories to be set on the `configure' command line.  Currently the consensus seems to be that there are too many arguments to `configure' already.

(2) The most common way being to simply set `prefix'.

(3) This example assumes the use of GNU tar when extracting; this is standard on Linux but does not come with Solaris.


File: autobook.info, Node: Writing Portable C, Next: Writing Portable C++, Prev: Installing and Uninstalling, Up: Top

15 Writing Portable C with GNU Autotools
****************************************

GNU Autotools permits you to write highly portable programs.  However, using GNU Autotools is not by itself enough to make your programs portable.  You must also write them portably.

In this chapter we will give an introduction to writing portable programs in C.  We will start with some notes on portable use of the C language itself.  We will then discuss cross-Unix portability.  We will finish with some notes on portability between Unix and Windows.

Portability is a big topic, and we can not cover everything in this chapter.  The basic rule of portable code is to remember that every system is in some ways unique.  Do not assume that every other system is like yours.
It is very helpful to be familiar with relevant standards, such as the ISO C standard and the POSIX.1 standard. Finally, there is no substitute for experience; if you have the opportunity to build and test your program on different systems, do so. * Menu: * C Language Portability:: * Cross-Unix Portability:: * Unix/Windows Portability::  File: autobook.info, Node: C Language Portability, Next: Cross-Unix Portability, Up: Writing Portable C 15.1 C Language Portability =========================== The C language makes it easy to write non-portable code. In this section we discuss these portability issues, and how to avoid them. We concentrate on differences that can arise on systems in common use today. For example, all common systems today define `char' to be 8 bits, and define a pointer to hold the address of an 8-bit byte. We do not discuss the more exotic possibilities found on historical machines or on certain supercomputers. If your program needs to run in unusual settings, make sure you understand the characteristics of those systems; the system documentation should include a C portability guide describing the problems you are likely to encounter. * Menu: * ISO C:: * C Data Type Sizes:: * C Endianness:: * C Structure Layout:: * C Floating Point:: * GNU cc Extensions::  File: autobook.info, Node: ISO C, Next: C Data Type Sizes, Up: C Language Portability 15.1.1 ISO C ------------ The ISO C standard first appeared in 1989 (the standard is often called ANSI C). It added several new features to the C language, most notably function prototypes. This led to many years of portability issues when deciding whether to use ISO C features. We think that programs written today can assume the presence of an ISO C compiler. Therefore, we will not discuss issues related to the differences between ISO C compilers and older compilers--often called K&R compilers, from the first book on C by Kernighan and Ritchie. You may see these differences handled in older programs. There is a newer C standard called `C9X'. Because compilers that support it are not widely available as of this writing, this discussion does not cover it.  File: autobook.info, Node: C Data Type Sizes, Next: C Endianness, Prev: ISO C, Up: C Language Portability 15.1.2 C Data Type Sizes ------------------------ The C language defines data types in terms of a minimum size, rather than an exact size. As of this writing, this mainly matters for the types `int' and `long'. A variable of type `int' must be at least 16 bits, and is often 32 bits. A variable of type `long' must be at least 32 bits, and is sometimes 64 bits. The range of a 16 bit number is -32768 to 32767 for a signed number, or 0 to 65535 for an unsigned number. If a variable may hold numbers larger than 16 bits, use `long' rather than `int'. Never assume that `int' or `long' have a specific size, or that they will overflow at a particular point. When appropriate, use variables of system defined types rather than `int' or `long': `size_t' Use this to hold the size of an object, as returned by `sizeof'. `ptrdiff_t' Use this to hold the difference between two pointers into the same array. `time_t' Use this to hold a time value as returned by the `time' function. `off_t' On a Unix system, use this to hold a file position as returned by `lseek'. `ssize_t' Use this to hold the result of the Unix `read' or `write' functions. Some books on C recommend using typedefs to specify types of particular sizes, and then adjusting those typedefs on specific systems. 
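As a sketch of that approach (the typedef names are invented for illustration, and the choices shown are only valid on a system where `short' is exactly 16 bits and `int' is exactly 32 bits), such a header might contain:

     /* Fixed-size integer typedefs, adjusted by hand for each system.  */
     typedef short          my_int16;   /* assumed: short is 16 bits here */
     typedef unsigned short my_uint16;
     typedef int            my_int32;   /* assumed: int is 32 bits here */
     typedef unsigned int   my_uint32;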
GNU Autotools supports this using the `AC_CHECK_SIZEOF' macro.  However, while we agree with using typedefs for clarity, we do not recommend using them purely for portability.  It is safest to rely only on the minimum size assumptions made by the C language, rather than to assume that a type of a specific size will always be available.  Also, most C compilers will define `int' to be the most efficient type for the system, so it is normally best to simply use `int' when possible.


File: autobook.info, Node: C Endianness, Next: C Structure Layout, Prev: C Data Type Sizes, Up: C Language Portability

15.1.3 C Endianness
-------------------

When a number longer than a single byte is stored in memory, it must be stored in some particular format.  Modern systems do this by storing the number byte by byte such that the bytes can simply be concatenated into the final number.  However, the order of storage varies: some systems store the least significant byte at the lowest address in memory, while some store the most significant byte there.  These are referred to as "little-endian" and "big-endian" systems, respectively.(1)

This difference means that portable code may not make any assumptions about the order of storage of a number.  For example, code like this will act differently on different systems:

     /* Example of non-portable code; don't do this */
     int i = 4;
     char c = *(char *) &i;

Although that was a contrived example, real problems arise when writing numeric data in a file or across a network connection.  If the file or network connection may be read on a different type of system, numeric data must be written in a format which can be unambiguously recovered.  It is not portable to simply do something like

     /* Example of non-portable code; don't do this */
     write (fd, &i, sizeof i);

This example is non-portable both because of endianness and because it assumes that the size of the type of `i' is the same on both systems.

Instead, do something like this:

     int j;
     char buf[4];

     for (j = 0; j < 4; ++j)
       buf[j] = (i >> (j * 8)) & 0xff;
     write (fd, buf, 4); /* In real code, check the return value */

This unambiguously writes out a little endian 4 byte value.  The code will work on any system, and the result can be read unambiguously on any system.

Another approach to handling endianness is to use the `htonS' and `ntohS' functions available on most systems.  These functions convert between "network endianness" and host endianness.  Network endianness is big-endian; it has that name because the standard TCP/IP network protocols use big-endian ordering.

These functions come in two sizes: `htonl' and `ntohl' operate on 4-byte quantities, and `htons' and `ntohs' operate on 2-byte quantities.  The `hton' functions convert host endianness to network endianness.  The `ntoh' functions convert network endianness to host endianness.  On big-endian systems, these functions simply return their arguments; on little-endian systems, they return their arguments after swapping the bytes.

Although these functions are used in a lot of existing code, they can be difficult to use in highly portable code, because they require knowing the exact size of your data types.  If you know that the type `int' is exactly 4 bytes long, then it is possible to write code like the following:

     int j;

     j = htonl (i);
     write (fd, &j, 4);

However, if `int' is not exactly 4 bytes long, this example will not work correctly on all systems.

---------- Footnotes ----------

(1) These names come from `Gulliver's Travels'.

File: autobook.info, Node: C Structure Layout, Next: C Floating Point, Prev: C Endianness, Up: C Language Portability

15.1.4 C Structure Layout
-------------------------

C compilers on different systems lay out structures differently.  In some cases there can even be layout differences between different C compilers on the same system.  Compilers add gaps between fields, and these gaps have different sizes and are at different locations.  You can normally assume that there are no gaps between fields of type `char' or array of `char'.  However, you can not make any assumptions about gaps between fields of any larger type.  You also can not make any assumptions about the layout of bitfield types.

These structure layout issues mean that it is difficult to portably use a C struct to define the format of data which may be read on another type of system, such as data in a file or sent over a network connection.  Portable code must read and write such data field by field, rather than trying to read an entire struct at once.

Here is an example of non-portable code when reading data which may have been written to a file or a network connection on another type of system.  Don't do this.

     /* Example of non-portable code; don't do this */
     struct {
       short i;
       int j;
     } s;
     read (fd, &s, sizeof s);

Instead, do something like this (the struct `s' is assumed to be the same as above):

     unsigned char buf[6];

     read (fd, buf, sizeof buf); /* Should check return value */
     s.i = buf[0] | (buf[1] << 8);
     s.j = buf[2] | (buf[3] << 8) | (buf[4] << 16) | (buf[5] << 24);

Naturally the code to write out the structure should be similar.


File: autobook.info, Node: C Floating Point, Next: GNU cc Extensions, Prev: C Structure Layout, Up: C Language Portability

15.1.5 C Floating Point
-----------------------

Most modern systems handle floating point following the IEEE 754 standard.  However, there are still portability issues.

Most processors use 64 bits of precision when computing floating point values.  However, the widely used Intel x86 series of processors compute temporary values using 80 bits of precision, as do most instances of the Motorola 68k series.  Some other processors, such as the PowerPC, provide fused multiply-add instructions which perform a multiplication and an addition using high precision for the intermediate value.  Optimizing compilers will generate such instructions based on sequences of C operations.

For almost all programs, these differences do not matter.  However, for programs which do intensive floating point operations, the differences can be significant.  It is possible to write floating point loops which terminate on one sort of processor but not on another.

Unfortunately, there is no rule of thumb that can be used to avoid these problems.  Most compilers provide an option to disable the use of extended precision (for GNU cc, the option is `-ffloat-store').  However, on the one hand, this merely shifts the portability problem elsewhere, and, on the other, the extended precision is often good rather than bad.  Although these portability problems can not be easily avoided, you should at least be aware of them if you write programs which require very precise floating point operations.

The IEEE 754 standard specifies certain flags which the floating point processor should make available (e.g., overflow, underflow, inexact), and specifies that there should be some control over the floating point rounding mode.  Most processors make these flags and controls available; however, there is no portable way to access them.
A portable program should not assume that it will have this degree of control over floating point operations.  File: autobook.info, Node: GNU cc Extensions, Prev: C Floating Point, Up: C Language Portability 15.1.6 GNU cc Extensions ------------------------ The GNU `cc' compiler has several useful extensions, which are documented in the GNU `cc' manual. A program which must be portable to other C compilers must naturally avoid these extensions; the `-pedantic' option may be used to warn about any accidental use of an extension. However, the GNU cc compiler is itself highly portable, and it runs on all modern Unix platforms as well as on Windows. Depending upon your portability requirements, you may be able to simply assume that GNU cc is available, in which case your program may use extensions when they are useful. Note that some extensions are inherently non-portable, such as inline assembler code, or using attributes to specify a particular section for a function or a global variable.  File: autobook.info, Node: Cross-Unix Portability, Next: Unix/Windows Portability, Prev: C Language Portability, Up: Writing Portable C 15.2 Cross-Unix Portability =========================== In the previous section, we discussed issues related to the C language. Here we will discuss the portability of C programs across different Unix implementations. All modern Unix systems conform to the POSIX.1 (1990 edition) and POSIX.2 (1992 edition) standards. They also all support the sockets interface for networking code. However, there are still significant differences between systems which can affect portability. We will not discuss portability to older Unix systems which do not conform to the POSIX standards. If you need this sort of portability, you can often find some valuable hints in the set of macros defined by `autoconf', and in the `configure.in' files of older programs which use `autoconf'. * Menu: * Cross-Unix Function Calls:: * Cross-Unix System Interfaces::  File: autobook.info, Node: Cross-Unix Function Calls, Next: Cross-Unix System Interfaces, Up: Cross-Unix Portability 15.2.1 Cross-Unix Function Calls -------------------------------- Functions not mentioned in POSIX.1 may not be available on all systems. If you want to use one of these functions, you should normally check for its presence by using `AC_CHECK_FUNCS' in your `configure.in' script, and adapt to its absence if possible. Here is a list of some popular functions which are available on many, but not all, modern Unix systems: `alloca' There are several portability issues with `alloca'. See the description of `AC_FUNC_ALLOCA' in the autoconf manual. Although this function can be very convenient, it is normally best to avoid it in highly portable code. `dlopen' GNU libtool provides a portable alternate interface to `dlopen'. *Note Dynamic Loading::. `getline' In some cases `fgets' may be used as a fallback. In others, you will need to provide your own version of this function. `getpagesize' On some systems, the page size is available as the macro `PAGE_SIZE' in the header file `sys/param.h'. On others, the page size is available via the `sysconf' function. If none of those work, you must generally simply guess a value such as `4096'. `gettimeofday' When this is not available, fall back to a less precise function such as `time' or `ftime' (which itself is not available on all systems). `mmap' In some cases you can use either `mmap' or ordinary file I/O. In others, a program which uses `mmap' will simply not be portable to all Unix systems. 
Note that `mmap' is an optional part of the 1996 version of POSIX.1, so it is likely to be added to all Unix systems over time. `ptrace' Unix systems without `ptrace' generally provide some other mechanism for debugging subprocesses, such as `/proc'. However, there is no widely portable method for controlling subprocesses, as evidenced by the source code to the GNU debugger, `gdb'. `setuid' Different Unix systems handle this differently. On some systems, any program can switch between the effective user ID of the executable and the real user ID. On others, switching to the real user ID is final; some of those systems provide the `setreuid' function instead to switch the effective and real user ID. The effect when a program run by the superuser calls `setuid' varies among systems. `snprintf' If this is not available, then in some cases it will be reasonable to simply use `sprintf', and in others you will need to write a little routine to estimate the required length and allocate an appropriate buffer before calling `sprintf'. `strcasecmp' `strdup' `strncasecmp' You can normally provide your own version of these simple functions. `valloc' When this is not available, just use `malloc' instead. `vfork' When this is not available, just use `fork' instead.  File: autobook.info, Node: Cross-Unix System Interfaces, Prev: Cross-Unix Function Calls, Up: Cross-Unix Portability 15.2.2 Cross-Unix System Interfaces ----------------------------------- There are several Unix system interfaces which have associated portability issues. We do not have the space here to discuss all of these in detail across all Unix systems. However, we mention them here to indicate issues where you may need to consider portability. `curses' `termcap' `terminfo' Many Unix systems provide the `curses' interface for simple graphical terminal access, but the name of the library varies. Typical names are `-lcurses' or `-lncurses'. Some Unix systems do not provide `curses', but do provide the `-ltermcap' or `-lterminfo' library. The latter libraries only provide an interface to the `termcap' file or `terminfo' files. These files contain information about specific terminals, the difference being mainly the manner in which they are stored. `proc file system' The `/proc' file system is not available on all Unix systems, and when it is available the actual set of files and their format varies. `pseudo terminals' All Unix systems provide pseudo terminals, but the interface to obtain them varies widely. We recommend examining the configuration of an existing program which uses them, such as GNU emacs or Expect. `shared libraries' Shared libraries differ across Unix systems. The GNU libtool program was written to provide an interface to hide the differences. *Note Introducing GNU Libtool::. `termios' `termio' `tty' The `termios' interface to terminals is standard on modern Unix systems. Avoid the older, non-portable, `termio' and `tty' interfaces (these interfaces are defined in `termio.h' and `sgtty.h', respectively). `threads' Many, but not all, Unix systems support multiple threads in a single process, but the interfaces differ. One thread interface, pthreads, was standardized in the 1996 edition of POSIX.1, so Unix systems are likely to converge on that interface over time. `utmp' `wtmp' Most Unix systems maintain the `utmp' and `wtmp' files to record information about which users are logged onto the system. 
However, the format of the information in the files varies across Unix systems, as does the exact location of the files and the functions which some systems provide to access the information. Programs which merely need to obtain login information will be more portable if they invoke a program such as `w'. Programs which need to update the login information must be prepared to handle a range of portability issues. `X Window System' Version 11 of the X Window System is widely available across Unix systems. The actual release number varies somewhat, as does the set of available programs and window managers. Extensions such as OpenGL are not available on all systems.  File: autobook.info, Node: Unix/Windows Portability, Prev: Cross-Unix Portability, Up: Writing Portable C 15.3 Unix/Windows Portability ============================= Unix and Windows are very different operating systems, with very different APIs and functionality. However, it is possible to write programs which run on both Unix and Windows, with significant extra work and some sacrifice in functionality. For more information on how GNU Autotools can help you write programs which run on both Unix and Windows, see *Note Integration with Cygnus Cygwin::. * Menu: * Unix/Windows Emulation:: * Unix/Windows Portable Scripting Language:: * Unix/Windows User Interface Library:: * Unix/Windows Specific Code:: * Unix/Windows Issues::  File: autobook.info, Node: Unix/Windows Emulation, Next: Unix/Windows Portable Scripting Language, Up: Unix/Windows Portability 15.3.1 Unix/Windows Emulation ----------------------------- The simplest way to write a program which runs on both Unix and Windows is to use an emulation layer. This generally results in a program which runs, but does not really feel like other programs for the operating system in question. For example, the Cygwin package, which is freely available from Cygnus Solutions(1), provides a Unix API which works on Windows. This permits Unix programs to be compiled to run on Windows. It is even possible to run an X server in the Cygwin environment, so graphical programs will work as well, although they will not have the Windows look and feel. The Cygwin package is discussed in more detail in *note Integration with Cygnus Cygwin::. There are also commercial packages available to compile Unix programs for Windows (e.g., Interix) and to compile Windows programs on Unix (e.g., Bristol Technology). The main disadvantage with using an emulation layer is that the resulting programs have the wrong look and feel. They do not behave as users expect, so they are awkward to use. This is generally not acceptable for high quality programs. ---------- Footnotes ---------- (1) `http://sourceware.cygnus.com/cygwin/'  File: autobook.info, Node: Unix/Windows Portable Scripting Language, Next: Unix/Windows User Interface Library, Prev: Unix/Windows Emulation, Up: Unix/Windows Portability 15.3.2 Unix/Windows Portable Scripting Language ----------------------------------------------- Another approach to Unix/Windows portability is to develop the program using a portable scripting language. An example of such a scripting language is Tcl/Tk(1). Programs written in Tcl/Tk will work on both Unix and Windows (and on the Apple Macintosh operating system as well, for that matter). Graphical programs will more or less follow the look and feel for the platform upon which they are run. 
Since Tcl/Tk was originally developed on Unix, graphical Tcl/Tk programs will typically not look quite right to experienced Windows users, but they will be usable and of reasonable quality. Other portable scripting languages are Perl, Python, and Guile. One disadvantage of this approach is that scripting languages tend to be less efficient than straight C code, but it is often possible to recode important routines in C. Another disadvantage is the need to learn a new language, one which furthermore may not be well designed for large programming projects. ---------- Footnotes ---------- (1) `http://www.scriptics.com/'  File: autobook.info, Node: Unix/Windows User Interface Library, Next: Unix/Windows Specific Code, Prev: Unix/Windows Portable Scripting Language, Up: Unix/Windows Portability 15.3.3 Unix/Windows User Interface Library ------------------------------------------ Some programs' main interaction with the operating system is drawing on the screen. It is often possible to write such programs using a cross platform user interface library. A cross-platform user interface library is a library providing basic windowing functions which has been implemented separately for Unix and Windows. The program calls generic routines which are translated into the appropriate calls on each platform. These libraries generally provide a good look and feel on each platform, so this can be a reasonable approach for programs which do not require additional services from the system. The main disadvantage is the least common denominator effect: the libraries often only provide functionality which is available on both Unix and Windows. Features specific to either Unix or Windows may be very useful for the program, but they may not be available via the library.  File: autobook.info, Node: Unix/Windows Specific Code, Next: Unix/Windows Issues, Prev: Unix/Windows User Interface Library, Up: Unix/Windows Portability 15.3.4 Unix/Windows Specific Code --------------------------------- When writing a program which should run on both Unix and Windows, it is possible to simply write different code for the two platforms. This requires a careful separation of the operating system interface, including the graphical user interface, from the rest of the program. An API must be designed to provide the system needs, and that API must be implemented separately on Unix and Windows. The API should be set at an appropriate level to avoid the least common denominator effect. This approach can be useful for a program which has significant platform independent computation as well as significant user interface or other system needs. It generally produces better results than the other approaches discussed above. The disadvantage is that this approach requires much more work that the others discussed above.  File: autobook.info, Node: Unix/Windows Issues, Prev: Unix/Windows Specific Code, Up: Unix/Windows Portability 15.3.5 Unix/Windows Issues -------------------------- Whatever approach is used to support the program on both Unix and Windows, there are certain issues which may affect the design of the program, or many specific areas of the program. * Menu: * Unix/Windows Text/Binary:: * Unix/Windows Filesystems:: * Unix/Windows Miscellaneous::  File: autobook.info, Node: Unix/Windows Text/Binary, Next: Unix/Windows Filesystems, Up: Unix/Windows Issues 15.3.5.1 Text and Binary Files .............................. Windows supports two different types of files: text files and binary files. On Unix, there is no such distinction. 
On Windows, any program which uses files must know whether each file is text or binary, and open and use them accordingly. In a text file on Windows, each line is terminated with a carriage return character followed by a line feed character. When the file is read by a C program in text mode, the C library converts each carriage return/line feed pair into a single line feed character. If the file is read in binary mode, the program will see both the carriage return and the line feed. You may have seen this distinction when transferring files between Unix and Window systems via FTP. You need to set the FTP program into binary or text mode as appropriate for the file you want to transfer. When transferring a binary file, the FTP program simply transfers the data unchanged. When transferring a text file, the FTP program must convert each carriage return/line feed pair into a single line feed. When using the C standard library, a binary file is indicated by adding `b' after the `r', `w', or `a' in the call to `fopen'. When reading a text file, the program can not simply count characters and use that when computing arguments to `fseek'.  File: autobook.info, Node: Unix/Windows Filesystems, Next: Unix/Windows Miscellaneous, Prev: Unix/Windows Text/Binary, Up: Unix/Windows Issues 15.3.5.2 File system Issues ........................... There are several differences between the file systems used on Unix and Windows, mainly in the areas of what names can be used for files. The program `doschk', which can be found in the gcc distribution, may be used on Unix to check for filenames which are not permitted on DOS or Windows. * Menu: * DOS Filename Restrictions:: * Windows File Name Case:: * Windows Whitespace in File Names:: * Windows Separators and Drive Letters::  File: autobook.info, Node: DOS Filename Restrictions, Next: Windows File Name Case, Up: Unix/Windows Filesystems 15.3.5.3 DOS Filename Restrictions .................................. The older DOS FAT file systems have severe limitations on file names. These limitations no longer apply to Windows, but they do apply to DOS based systems such as DJGPP. A file name may consist of no more than 8 characters, followed by an optional extension of no more than 3 characters. This is commonly referred to as an 8.3 file name. Filenames are case insensitive. There are a couple of filenames which are treated specially. You can not name a file `aux' or `prn'. In some cases, you can not even use an extension, such as `aux.c'. These restrictions apply to DOS and also to at least some versions of Windows.  File: autobook.info, Node: Windows File Name Case, Next: Windows Whitespace in File Names, Prev: DOS Filename Restrictions, Up: Unix/Windows Filesystems 15.3.5.4 Windows File Name Case ............................... Windows normally folds case when referring to files, unlike Unix. That is, on Windows, the file names `file', `File', and `FiLe' all refer to the same file. You must be aware of this when porting Unix programs to Windows, as the Unix programs may expect that using different case is reflected in the file system. For example, the procedure used to build the program `perl' from source relies on distinguishing between the files `PERL' and `perl'. This fails on Windows. As a matter of interest, the Windows file system stores files under the name with which they were created. The DOS shell displays the names in all upper case. The `Explorer' shell displays them with each word in the file name capitalized.  
File: autobook.info, Node: Windows Whitespace in File Names, Next: Windows Separators and Drive Letters, Prev: Windows File Name Case, Up: Unix/Windows Filesystems 15.3.5.5 Whitespace in File Names ................................. Both Unix and Windows file systems permit whitespace in file names. However, Unix users rarely take advantage of this, while Windows users often do. For example, many Windows systems use a directory named `Program Files', whose name has an embedded space. This is a clash of conventions. Many programs developed on Unix unintentionally assume that there will be no spaces in file and directory names, and behave mysteriously if any are encountered. On Unix these bugs will almost never be seen. On Windows, they will pop up immediately. When writing a program which must run on Windows, consider these issues. Don't forget to test it on directories and files with embedded spaces.  File: autobook.info, Node: Windows Separators and Drive Letters, Prev: Windows Whitespace in File Names, Up: Unix/Windows Filesystems 15.3.5.6 Windows Separators and Drive Letters ............................................. On Unix, directories in a file name are separated by a forward slash (`/'). On Windows, directories are separated by a backward slash (`\'). For example, the Unix file `dir/file' on Windows would be `dir\file'.(1) On Unix, a list of directories is normally separated by a colon (`:'). On Windows, a list of directories is normally separated by a semicolon (`;'). For example, a simple Unix search path might look like this: `/bin:/usr/bin'. The same search path on Windows would probably look like this: `c:\bin;c:\usr\bin'. On Unix, the file system is a single tree rooted at the directory simply named `/'. On Windows, there are multiple file system trees. Absolute file names often start with a drive letter followed by a colon. Windows maintains a default drive, and a default directory on each drive, which can make it difficult for a program to convert a relative file name into the absolute file name intended by the user. Windows permits referring to files on other systems by using a file name which starts with two slashes followed by a system name. ---------- Footnotes ---------- (1) Windows does permit a program to use a forward slash to separate directories when calling routines such as `fopen'. However, Windows users do not expect to type forward slashes when they enter file names, and they do not expect to see forward slashes when a file name is printed.  File: autobook.info, Node: Unix/Windows Miscellaneous, Prev: Unix/Windows Filesystems, Up: Unix/Windows Issues 15.3.5.7 Miscellaneous Issues ............................. Windows shared libraries (DLLs) are different from typical Unix shared libraries. They require special declarations for global variables declared in a shared library. Programs which use shared libraries must generally use special macros in their header files to define these appropriately. GNU libtool can help with some shared library issues, but not all. There are some Unix system features which are not supported under Windows: pseudo terminals, effective user ID, file modes with user/group/other permission, named FIFOs, an executable overriding functions called by shared libraries, `select' on anything other than sockets. There are some Windows system features which are not supported under Unix: the Windows event loop, many graphical capabilities, some aspects of the rich set of interthread communication mechanisms, the `WSAAsyncSelect' function. 
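To make the special declarations for DLL globals mentioned above a little more concrete, here is a minimal sketch of the kind of macro a header might provide. It is only an illustration: the `FOO_API' and `FOO_BUILDING_DLL' names are hypothetical, and are not part of any particular package or of the GNU Autotools.

     /* foo.h -- decorate exported globals for Windows DLLs (sketch only). */
     #if defined _WIN32 && defined FOO_BUILDING_DLL
     #  define FOO_API __declspec(dllexport)  /* compiling the DLL itself */
     #elif defined _WIN32
     #  define FOO_API __declspec(dllimport)  /* compiling a client of the DLL */
     #else
     #  define FOO_API                        /* Unix: no decoration needed */
     #endif

     /* A global variable exported from the shared library. */
     extern FOO_API int foo_verbose;

On Unix the macro expands to nothing, so the same header can be shared by both platforms.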
You should keep these issues in mind when designing and writing a program which should run on both Unix and Windows.  File: autobook.info, Node: Writing Portable C++, Next: Dynamic Loading, Prev: Writing Portable C, Up: Top 16 Writing Portable C++ with GNU Autotools ****************************************** My first task in industry was to port a large C++ application from one Unix platform to another. My colleagues immediately offered their sympathies and I remember my initial reaction-`what's the big deal?'. After all, this application used the C++ standard library, a modest subset of common Unix system calls and C++ was approaching ISO standardization. Little did I know what lay ahead--endless hurdles imposed by differences to C++ implementations in use on those platforms. Being essentially a superset of the C programming language, C++ suffers from all of the machine-level portability issues described in *Note Writing Portable C::. In addition to this, variability in the language and standard libraries present additional trouble when writing portable C++ programs. There have been comprehensive guides written on C++ portability (*note Further Reading::). This chapter will attempt to draw attention to the less portable areas of the C++ language and describe how the GNU Autotools can help you overcome these (*note How GNU Autotools Can Help::). In many instances, the best approach to multi-platform C++ portability is to simply re-express your programs using more widely supported language constructs. Fortunately, this book has been written at a time when the C++ language standard has been ratified and C++ implementations are rapidly conforming. Gladly, as time goes on the necessity for this chapter will diminish. * Menu: * Brief History of C++:: * Changeable C++:: * Compiler Quirks:: * How GNU Autotools Can Help:: * Further Reading::  File: autobook.info, Node: Brief History of C++, Next: Changeable C++, Up: Writing Portable C++ 16.1 Brief History of C++ ========================= C++ was developed in 1983 by Bjarne Stroustrup at AT&T. Stroustrup was seeking a new object-oriented language with which to write simulations. C++ has now become a mainstream systems programming language and is increasingly being used to implement free software packages. C++ underwent a lengthy standardization process and was ratified as an ISO standard in 1998. The first specification of C++ was available in a book titled `The Annotated C++ Reference Manual' by Stroustrup and Ellis, also known as the `ARM'. Since this initial specification, C++ has developed in some areas. These developments will be discussed in *Note Changeable C++::. The first C++ compiler, known as "cfront", was produced by Stroustrup at AT&T. Because of its strong ties to C and because C is such a general purpose systems language, cfront consisted of a translator from C++ to C. After translation, an existing C compiler was used to compile the intermediate C code down to machine code for almost any machine you care to mention. C++ permits overloaded functions--that is, functions with the same name but different argument lists, so cfront implemented a _name mangling_ algorithm (*note Name Mangling::) to give each function a unique name in the linker's symbol table. In 1989, the first true C++ compiler, G++, was written by Michael Tiemann of Cygnus Support. G++ mostly consisted of a new front-end to the GCC portable compiler, so G++ was able to produce code for most of the targets that GCC already supported. 
In the years following, a number of new C++ compilers were produced. Unfortunately many were unable to keep pace with the development of the language being undertaken by the standards committee. This divergence of implementations is the fundamental cause of non-portable C++ programs.  File: autobook.info, Node: Changeable C++, Next: Compiler Quirks, Prev: Brief History of C++, Up: Writing Portable C++ 16.2 Changeable C++ =================== The C++ standard encompasses the language and the interface to the standard library, including the Standard Template Library (*note Standard Template Library::). The language has evolved somewhat since the ARM was published; mostly driven by the experience of early C++ users. In this section, the newer features of C++ will be briefly explained. Alternatives to these features, where available, will be presented when compiler support is lacking. The alternatives may be used if you need to make your code work with older C++ compilers or to avoid these features until the compilers you are concerned with are mature. If you are releasing a free software package to the wider community, you may need to specify a minimum level of standards conformance for the end-user's C++ compiler, or use the unappealing alternative of using lowest-common denominator C++ features. In covering these, we'll address the following language features: * Built-in `bool' type * Exceptions * Casts * Variable scoping in `for' loops * Namespaces * The `explicit' keyword * The `mutable' keyword * The `typename' keyword * Runtime Type Identification (RTTI) * Templates * Default template arguments * Standard library headers * Standard Template Library (STL) * Menu: * Built-in bool type:: * Exceptions:: * Casts:: * Variable Scoping in For Loops:: * Namespaces:: * The explicit Keyword:: * The mutable Keyword:: * The typename Keyword:: * Runtime Type Identification (RTTI):: * Templates:: * Default template arguments:: * Standard library headers:: * Standard Template Library::  File: autobook.info, Node: Built-in bool type, Next: Exceptions, Up: Changeable C++ 16.2.1 Built-in bool type ------------------------- C++ introduced a built-in boolean data type called `bool'. The presence of this new type makes it unnecessary to use an `int' with the values `0' and `1' and improves type safety. The two possible values of a `bool' are `true' and `false'-these are reserved words. The compiler knows how to coerce a `bool' into an `int' and vice-versa. If your compiler does not have the `bool' type and `false' and `true' keywords, an alternative is to produce such a type using a `typedef' of an enumeration representing the two possible values: enum boolvals { false, true }; typedef enum boolvals bool; What makes this simple alternative attractive is that it prevents having to adjust the prolific amount of code that might use `bool' objects once your compiler supports the built-in type.  File: autobook.info, Node: Exceptions, Next: Casts, Prev: Built-in bool type, Up: Changeable C++ 16.2.2 Exceptions ----------------- Exception handling is a language feature present in other modern programming languages. Ada and Java both have exception handling mechanisms. In essence, exception handling is a means of propagating a classified error by unwinding the procedure call stack until the error is caught by a higher procedure in the procedure call chain. A procedure indicates its willingness to handle a kind of error by _catching_ it: void foo (); void func () { try { foo (); } catch (...) { cerr << "foo failed!" 
<< endl; } }

Conversely, a procedure can throw an exception when something goes wrong:

     typedef int io_error;

     void
     init ()
     {
       int fd;

       fd = open ("/etc/passwd", O_RDONLY);
       if (fd < 0)
         {
           throw io_error (errno);
         }
     }

C++ compilers tend to implement exception handling in full, or not at all. If any C++ compiler you may be concerned with does not implement exception handling, you may wish to take the lowest common denominator approach and eliminate such code from your project.

File: autobook.info, Node: Casts, Next: Variable Scoping in For Loops, Prev: Exceptions, Up: Changeable C++

16.2.3 Casts
------------

C++ introduced a collection of _named_ casting operators to replace the conventional C-style cast of the form `(type) expr'. The new casting operators are `static_cast', `reinterpret_cast', `dynamic_cast' and `const_cast'. They are reserved words. These refined casting operators are vastly preferred over conventional C casts for C++ programming. In fact, even Stroustrup recommends that the older style of C casts be banished from programming projects where at all possible (`The C++ Programming Language', 3rd edition). Reasons for preferring the new named casting operators include:

   - They provide the programmer with a mechanism for more explicitly specifying the kind of type conversion. This assists the compiler in identifying incorrect conversions.

   - They are easier to locate in source code, due to their unique syntax: `X_cast(expr)'.

If your compiler does not support the new casting operators, you may have to continue to use C-style casts--and carefully! I have seen one project agree to use macros such as the one shown below to encourage those involved in the project to adopt the new operators. While the syntax does not match that of the genuine operators, these macros make it easy to later locate and alter the casts where they appear in source code.

     #define static_cast(T,e) (T) e

File: autobook.info, Node: Variable Scoping in For Loops, Next: Namespaces, Prev: Casts, Up: Changeable C++

16.2.4 Variable Scoping in For Loops
------------------------------------

C++ has always permitted the declaration of a control variable in the initializer section of `for' loops:

     for (int i = 0; i < 100; i++)
       {
         ...
       }

The original language specification allowed the control variable to remain live until the end of the scope of the loop itself:

     for (int i = 0; i < j; i++)
       {
         if (some condition)
           break;
       }

     if (i < j)   // loop terminated early

In a later specification of the language, the control variable's scope only exists within the body of the `for' loop. The simple resolution to this incompatible change is to not use the older style. If a control variable needs to be used outside of the loop body, then the variable should be defined before the loop:

     int i;
     for (i = 0; i < j; i++)
       {
         if (some condition)
           break;
       }

     if (i < j)   // loop terminated early

File: autobook.info, Node: Namespaces, Next: The explicit Keyword, Prev: Variable Scoping in For Loops, Up: Changeable C++

16.2.5 Namespaces
-----------------

C++ namespaces are a facility for expressing a relationship between a set of related declarations such as a set of constants. Namespaces also assist in constraining _names_ so that they will not collide with other identical names in a program. Namespaces were introduced to the language in 1993 and some early compilers were known to have incorrectly implemented namespaces. Here's a small example of namespace usage:

     namespace Animals {
       class Bird {
       public:
         void fly () {}          // fly, my fine feathered friend!
       };
     };

     // Instantiate a bird.
     Animals::Bird b;

For compilers which do not correctly support namespaces it is possible to achieve a similar effect by placing related declarations into an enveloping structure. Note that this utilises the fact that C++ structure members have public protection by default:

     struct Animals {
       class Bird {
       public:
         void fly () {}          // fly, my fine feathered friend!
       };

     protected:
       // Prohibit construction.
       Animals ();
     };

     // Instantiate a bird.
     Animals::Bird b;

File: autobook.info, Node: The explicit Keyword, Next: The mutable Keyword, Prev: Namespaces, Up: Changeable C++

16.2.6 The `explicit' Keyword
-----------------------------

C++ added a new `explicit' keyword to the language. This keyword is a qualifier used when declaring constructors. When a constructor is declared as `explicit', the compiler will never call that constructor implicitly as part of a type conversion. This allows the compiler to perform stricter type checking and to prevent simple programming errors. If your compiler does not support the `explicit' keyword, you should avoid it and do without the benefits that it provides.

File: autobook.info, Node: The mutable Keyword, Next: The typename Keyword, Prev: The explicit Keyword, Up: Changeable C++

16.2.7 The `mutable' Keyword
----------------------------

C++ classes can be designed so that they behave correctly when `const' objects of those types are declared. Methods which do not alter internal object state can be qualified as `const':

     class String {
     public:
       String (const char* s);
       ~String ();

       size_t Length () const { return strlen (buffer); }

     private:
       char* buffer;
     };

This simple, though incomplete, class provides a `Length' method which guarantees, by virtue of its `const' qualifier, to never modify the object state. Thus, `const' objects of this class can be instantiated and the compiler will permit callers to use such objects' `Length' method. The `mutable' keyword enables classes to be implemented where the concept of constant objects is sensible, but details of the implementation make it difficult to declare essential methods as `const'. A common application of the `mutable' keyword is to implement classes that perform caching of internal object data. A method may not modify the logical state of the object, but it may need to update a cache--an implementation detail. The data members used to implement the cache storage need to be declared as `mutable' in order for `const' methods to alter them. Let's alter our rather farfetched `String' class so that it implements a primitive cache that avoids needing to call the `strlen' library function on each invocation of `Length ()':

     class String {
     public:
       String (const char* s) : length (-1) { /* copy string, etc. */ }
       ~String ();

       size_t Length () const
         {
           if (length < 0)
             length = strlen (buffer);
           return length;
         }

     private:
       char* buffer;
       mutable int length;
     };

When the `mutable' keyword is not available, your alternatives are to avoid implementing classes that need to alter internal data, like our caching string class, or to use the `const_cast' casting operator (*note Casts::) to cast away the `constness' of the object.

File: autobook.info, Node: The typename Keyword, Next: Runtime Type Identification (RTTI), Prev: The mutable Keyword, Up: Changeable C++

16.2.8 The `typename' Keyword
-----------------------------

The `typename' keyword was added to C++ after the initial specification and is not recognized by all compilers. It is a hint to the compiler that a name following the keyword is the name of a type.
In the usual case, the compiler has sufficient context to know that a symbol is a defined type, as it must have been encountered earlier in the compilation:

     class Foo {
     public:
       typedef int map_t;
     };

     void
     func ()
     {
       Foo::map_t m;
     }

Here, `map_t' is a type defined in class `Foo'. However, if `func' happened to be a function template, the class which contains the `map_t' type may be a template parameter. In this case, the compiler simply needs to be guided by qualifying `T::map_t' as a _type name_:

     class Foo {
     public:
       typedef int map_t;
     };

     template <class T>
     void
     func ()
     {
       typename T::map_t t;
     }

File: autobook.info, Node: Runtime Type Identification (RTTI), Next: Templates, Prev: The typename Keyword, Up: Changeable C++

16.2.9 Runtime Type Identification (RTTI)
-----------------------------------------

Run-time Type Identification, or RTTI, is a mechanism for interrogating the type of an object at runtime. Such a mechanism is useful for avoiding the dreaded _switch-on-type_ technique used before RTTI was incorporated into the language. Until recently, some C++ compilers did not support RTTI, so it is necessary to assume that it may not be widely available. Switch-on-type involves giving all classes a method that returns a special type token that an object can use to discover its own type. For example:

     class Shape {
     public:
       enum types { TYPE_CIRCLE, TYPE_SQUARE };
       virtual enum types type () = 0;
     };

     class Circle : public Shape {
     public:
       enum types type () { return TYPE_CIRCLE; }
     };

     class Square : public Shape {
     public:
       enum types type () { return TYPE_SQUARE; }
     };

Although switch-on-type is not elegant, RTTI isn't particularly object-oriented either. Given the limited number of times you ought to be using RTTI, the switch-on-type technique may be reasonable.

File: autobook.info, Node: Templates, Next: Default template arguments, Prev: Runtime Type Identification (RTTI), Up: Changeable C++

16.2.10 Templates
-----------------

Templates--known in other languages as _generic types_--permit you to write C++ classes which represent parameterized data types. A common application for _class templates_ is container classes. That is, classes which implement data structures that can contain data of any type. For instance, a well-implemented binary tree is not interested in the type of data in its nodes. Templates have undergone a number of changes since their initial inclusion in the ARM. They are a particularly troublesome C++ language element in that it is difficult to implement templates well in a C++ compiler. Here is a fictitious and overly simplistic C++ class template that implements a fixed-sized stack. It provides a pair of methods for setting (and getting) the element at the bottom of the stack. It uses the modern C++ template syntax, including the new `typename' keyword (*note The typename Keyword::).

     template <typename T>
     class Stack
     {
     public:
       T first () { return stack[9]; }
       void set_first (T t) { stack[9] = t; }

     private:
       T stack[10];
     };

C++ permits this class to be instantiated for any type you like, using calling code that looks something like this:

     int
     main ()
     {
       Stack<int> s;
       s.set_first (7);
       cout << s.first () << endl;
       return 0;
     }

An old trick for fashioning class templates is to use the C preprocessor. Here is our limited `Stack' class, rewritten to avoid C++ templates:

     #define Stack(T)                                  \
     class Stack__##T##__LINE__                        \
     {                                                 \
     public:                                           \
       T first () { return stack[0]; }                 \
       void set_first (T t) { stack[0] = t; }          \
                                                       \
     private:                                          \
       T stack[10];                                    \
     }

There are a couple of subtleties being used here that should be highlighted.
This generic class declaration uses the C preprocessor operator `##' to generate a type name which is unique amongst stacks of any type. The `__LINE__' macro is defined by the preprocessor and is used here to maintain unique names when the template is instantiated multiple times. The trailing semicolon that must follow a class declaration has been omitted from the macro.

     int
     main ()
     {
       Stack (int) s;
       s.set_first (7);
       cout << s.first () << endl;
       return 0;
     }

The syntax for instantiating a `Stack' is slightly different to modern C++, but it does work relatively well, since the C++ compiler still applies type checking after the preprocessor has expanded the macro. The main problem is that unless you go to great lengths, the generated type name (such as `Stack__int') could collide with other instances of the same type in the program.

File: autobook.info, Node: Default template arguments, Next: Standard library headers, Prev: Templates, Up: Changeable C++

16.2.11 Default template arguments
----------------------------------

A later refinement to C++ templates was the concept of _default template arguments_. Templates allow C++ types to be _parameterized_ and as such, the parameter is in essence a variable that the programmer must specify when instantiating the template. This refinement allows defaults to be specified for the template parameters. This feature is used extensively throughout the Standard Template Library (*note Standard Template Library::) to relieve the programmer from having to specify a comparison function for sorted container classes. In most circumstances, the default less-than operator for the type in question is sufficient. If your compiler does not support default template arguments, you may have to suffer without them and require that users of your class and function templates provide the default parameters themselves. Depending on how inconvenient this is, you might begrudgingly seek some assistance from the C preprocessor and define some preprocessor macros.

File: autobook.info, Node: Standard library headers, Next: Standard Template Library, Prev: Default template arguments, Up: Changeable C++

16.2.12 Standard library headers
--------------------------------

Newer C++ implementations provide a new set of standard library header files. These are distinguished from older incompatible header files by their filenames--the new headers omit the conventional `.h' extension. Classes and other declarations in the new headers are placed in the `std' namespace. Detecting the kind of header files present on any given system is an ideal application of Autoconf. For instance, the header `<vector>' declares the class `std::vector'. However, if it is not available, `<vector.h>' declares the class `vector' in the global namespace.

File: autobook.info, Node: Standard Template Library, Prev: Standard library headers, Up: Changeable C++

16.2.13 Standard Template Library
---------------------------------

The Standard Template Library (STL) is a library of containers, iterators and algorithms. I tend to think of the STL in terms of the container classes it provides, with algorithms and iterators necessary to make these containers useful. By segregating these roles, the STL becomes a powerful library--containers can store any kind of data and algorithms can use iterators to traverse the containers. There are about half a dozen STL implementations. Since the STL relies so heavily on templates, these implementations tend to inline all of their method definitions.
Thus, there are no precompiled STL libraries, and as an added bonus, you're guaranteed to get the source code to your STL implementation. Hewlett-Packard and SGI produce freely redistributable STL implementations. It is widely known that the STL can be implemented with complex C++ constructs and is a certain workout for any C++ compiler. The best policy for choosing an STL is to use a modern compiler such as GCC 2.95 or to use the STL that your vendor may have provided as part of their compiler. Unfortunately, using the STL is pretty much an `all or nothing' proposition. If it is not available on a particular system, there are no viable alternatives. There is a macro in the Autoconf macro archive (*note Autoconf macro archive::) that can test for a working STL.

File: autobook.info, Node: Compiler Quirks, Next: How GNU Autotools Can Help, Prev: Changeable C++, Up: Writing Portable C++

16.3 Compiler Quirks
====================

C++ compilers are complex pieces of software. Sadly, sometimes the details of a compiler's implementations leak out and bother the application programmer. The two aspects of C++ compiler implementation that have caused grief in the past are efficient template instantiation and name mangling. Both of these aspects will be explained.

* Menu:

* Template Instantiation::
* Name Mangling::

File: autobook.info, Node: Template Instantiation, Next: Name Mangling, Up: Compiler Quirks

16.3.1 Template Instantiation
-----------------------------

The problem with template instantiation exists because of a number of complex constraints:

   - The compiler should only generate an instance of a template once, to speed the compilation process.

   - The linker needs to be smart about where to locate the object code for instantiations produced by the compiler.

This problem is exacerbated by separate compilation--that is, the method bodies for `List<T>' may be located in a header file or in a separate compilation unit. These files may even be in a different directory than the current directory! Life is easy for the compiler when the template definition appears in the same compilation unit as the site of the instantiation--everything that is needed is known:

     template <class T>
     class List
     {
     private:
       T* head;
       T* current;
     };

     List<int> li;

This becomes significantly more difficult when the site of a template instantiation and the template definition are split between two different compilation units. In `Linkers and Loaders', Levine describes in detail how the compiler driver deals with this by iteratively attempting to link a final executable and noting, from `undefined symbol' errors produced by the linker, which template instantiations must be performed to successfully link the program. In large projects where templates may be instantiated in multiple locations, the compiler may generate instantiations multiple times for the same type. Not only does this slow down compilation, but it can result in some difficult problems for linkers which refuse to link object files containing duplicate symbols. Suppose there is the following directory layout:

     src
      |
      `--- core
      |      `--- core.cxx
      `--- modules
      |      `--- http.cxx
      `--- lib
             `--- stack.h

If the compiler generates `core.o' in the `core' directory and `libhttp.a' in the `http' directory, the final link may fail because `libhttp.a' and the final executable may contain duplicate symbols--those symbols generated as a result of both `http.cxx' and `core.cxx' instantiating, say, a `Stack<int>'.
Linkers, such as that provided with AIX will allow duplicate symbols during a link, but many will not. Some compilers have solved this problem by maintaining a template repository of template instantiations. Usually, the entire template definition is expanded with the specified type parameters and compiled into the repository, leaving the linker to collect the required object files at link time. The main concerns about non-portability with repositories center around getting your compiler to do the right thing about maintaining a single repository across your entire project. This often requires a vendor-specific command line option to the compiler, which can detract from portability. It is conceivable that Libtool could come to the rescue here in the future.  File: autobook.info, Node: Name Mangling, Prev: Template Instantiation, Up: Compiler Quirks 16.3.2 Name Mangling -------------------- Early C++ compilers mangled the names of C++ symbols so that existing linkers could be used without modification. The cfront C++ translator also mangled names so that information from the original C++ program would not be lost in the translation to C. Today, name mangling remains important for enabling overloaded function names and link-time type checking. Here is an example C++ source file which illustrates name mangling in action: class Foo { public: Foo (); void go (); void go (int where); private: int pos; }; Foo::Foo () { pos = 0; } void Foo::go () { go (0); } void Foo::go (int where) { pos = where; } int main () { Foo f; f.go (10); } $ g++ -Wall example.cxx -o example.o $ nm --defined-only example.o 00000000 T __3Foo 00000000 ? __FRAME_BEGIN__ 00000000 t gcc2_compiled. 0000000c T go__3Foo 0000002c T go__3Fooi 00000038 T main Even though `Foo' contains two methods with the same name, their argument lists (one taking an `int', one taking no arguments) help to differentiate them once their names are mangled. The `go__3Fooi' is the version which takes an `int' argument. The `__3Foo' symbol is the constructor for `Foo'. The GNU binutils package includes a utility called `c++filt' that can demangle names. Other proprietary tools sometimes include a similar utility, although with a bit of imagination, you can often demangle names in your head. $ nm --defined-only example.o | c++filt 00000000 T Foo::Foo(void) 00000000 ? __FRAME_BEGIN__ 00000000 t gcc2_compiled. 0000000c T Foo::go(void) 0000002c T Foo::go(int) 00000038 T main Name mangling algorithms differ between C++ implementations so that object files assembled by one tool chain may not be linked by another if there are legitimate reasons to prohibit linking. This is a deliberate move, as other aspects of the object file may make them incompatible--such as the calling convention used for making function calls. This implies that C++ libraries and packages cannot be practically distributed in binary form. Of course, you were intending to distribute the source code to your package anyway, weren't you?  File: autobook.info, Node: How GNU Autotools Can Help, Next: Further Reading, Prev: Compiler Quirks, Up: Writing Portable C++ 16.4 How GNU Autotools Can Help =============================== Each of the GNU Autotools contribute to C++ portability. Now that you are familiar with the issues, the following subsections will outline precisely how each tool contributes to achieving C++ portability. 
* Menu: * Testing C++ Implementations with Autoconf:: * Automake C++ support:: * Libtool C++ support::  File: autobook.info, Node: Testing C++ Implementations with Autoconf, Next: Automake C++ support, Up: How GNU Autotools Can Help 16.4.1 Testing C++ Implementations with Autoconf ------------------------------------------------ Of the GNU Autotools, perhaps the most valuable contribution to the portability of your C++ programs will come from Autoconf. All of the portability issues raised in *Note Changeable C++:: can be detected using Autoconf macros. Luc Maisonobe has written a large suite of macros for this purpose and they can be found in the Autoconf macro archive (*note Autoconf macro archive::). If any of these macros become important enough, they may become incorporated into the core Autoconf release. These macros perform their tests by compiling small fragments of C++ code to ensure that the compiler accepts them. As a side effect, these macros typically use `AC_DEFINE' to define preprocessor macros of the form `HAVE_feature', which may then be exploited through conditional compilation.  File: autobook.info, Node: Automake C++ support, Next: Libtool C++ support, Prev: Testing C++ Implementations with Autoconf, Up: How GNU Autotools Can Help 16.4.2 Automake C++ support --------------------------- Automake provides support for compiling C++ programs. In fact, it makes it practically trivial: files listed in a `SOURCES' primary may include `.c++', `.cc', `.cpp', `.cxx' or `.C' extensions and Automake will know to use the C++ compiler to build them. For a project containing C++ source code, it is necessary to invoke the `AC_PROG_CXX' macro in `configure.in' so that Automake knows how to run the most suitable compiler. Fortunately, when little details like this happen to escape you, `automake' will produce a warning: $ automake automake: Makefile.am: C++ source seen but CXX not defined in automake: Makefile.am: `configure.in'  File: autobook.info, Node: Libtool C++ support, Prev: Automake C++ support, Up: How GNU Autotools Can Help 16.4.3 Libtool C++ support -------------------------- At the moment, Libtool is the weak link in the chain when it comes to working with C++. It is very easy to naively build a shared library from C++ source using `libtool': $ libtool -mode=link g++ -o libfoo.la -rpath /usr/local/lib foo.c++ This works admirably for trivial examples, but with real code, there are several things that can go wrong: - On many architectures, for a variety of reasons, `libtool' needs to perform object linking using `ld'. Unfortunately, the C++ compiler often links in standard libraries at this stage, and using `ld' causes them to be dropped. This can be worked around (at the expense of portability) by explicitly adding these missing libraries to the link line in your `Makefile'. You could even write an Autoconf macro to probe the host machine to discover likely candidates. - The C++ compiler likes to instantiate static constructors in the library objects, which C++ programmers often rely on. Linking with `ld' will cause this to fail. The only reliable way to work around this currently is to not write C++ that relies on static constructors in libraries. You might be lucky enough to be able to link with `LD=$CXX' in your environment with some projects, but it would be prone to stop working as your project develops. 
- Libtool's inter-library dependency analysis can fail when it can't find the special runtime library dependencies added to a shared library by the C++ compiler at link time. The best way around this problem is to explicitly add these dependencies to `libtool''s link line: $ libtool -mode=link g++ -o libfoo.la -rpath /usr/local/lib foo.cxx \ -lstdc++ -lg++ Now that C++ compilers on Unix are beginning to see widespread acceptance and are converging on the ISO standard, it is becoming unacceptable for Libtool to impose such limits. There is work afoot to provide generalized multi-language and multi-compiler support into Libtool---currently slated to arrive in Libtool 1.5. Much of the work for supporting C++ is already finished at the time of writing, pending beta testing and packaging(1). ---------- Footnotes ---------- (1) Visit the Libtool home page at `http://www.gnu.org/software/libtool' for breaking news.  File: autobook.info, Node: Further Reading, Prev: How GNU Autotools Can Help, Up: Writing Portable C++ 16.5 Further Reading ==================== A number of books have been published which are devoted to the topic of C++ portability. Unfortunately, the problem with printed publications that discuss the state of C++ is that they date quickly. These publications may also fail to cover inadequacies of your particular compiler, since portability know-how is something that can only be acquired by collective experience. Instead, online guides such as the Mozilla C++ Portability Guide (1) tend to be a more useful resource. An online guide such as this can accumulate the knowledge of a wider developer community and can be readily updated as new facts are discovered. Interestingly, the Mozilla guide is aggressive in its recommendations for achieving true C++ portability: item 3, for instance, states `Don't use exceptions'. While you may not choose to follow each recommendation, there is certainly a lot of useful experience captured in this document. ---------- Footnotes ---------- (1) `http://www.mozilla.org/hacking/portable-cpp.html'  File: autobook.info, Node: Dynamic Loading, Next: Using GNU libltdl, Prev: Writing Portable C++, Up: Top 17 Dynamic Loading ****************** An increasingly popular way of adding functionality to a project is to give a program the ability to dynamically load plugins, or modules. By doing this your users can extend your project in new ways, which even you perhaps hadn't envisioned. "Dynamic Loading", then, is the process of loading compiled objects into a running program and executing some or all of the code from the loaded objects in the same context as the main executable. This chapter begins with a discussion of the mechanics of dynamic modules and how they are used, and ends with example code for very simple module loading on GNU/Linux, along with the example code for a complementary dynamically loadable module. Once you have read this chapter and understand the principles of dynamic loading, the next chapter will explain how to use GNU Autotools to write portable dynamic module loading code and address some of the shortcomings of native dynamic loading APIs. 
* Menu: * Dynamic Modules:: * Module Access Functions:: * Finding a Module:: * A Simple GNU/Linux Module Loader:: * A Simple GNU/Linux Dynamic Module::  File: autobook.info, Node: Dynamic Modules, Next: Module Access Functions, Up: Dynamic Loading 17.1 Dynamic Modules ==================== In order to dynamically load some code into your executable, that code must be compiled in some special but architecture dependent fashion. Depending on the compiler you use and the platform you are compiling for, there are different conventions you must observe in the code for the module, and for the particular combination of compiler options you need to select if the resulting objects are to be suitable for use in a dynamic module. For the rest of this chapter I will concentrate on the conventions used when compiling dynamic modules with GCC on GNU/Linux, which although peculiar to this particular combination of compiler and host architecture, are typical of the sorts of conventions you would need to observe on other architectures or with a different compiler. With GCC on GNU/Linux, you must compile each of the source files with `-fPIC'(1), the resulting objects must be linked into a loadable module with `gcc''s `-shared' option: $ gcc -fPIC -c foo.c $ gcc -fPIC -c bar.c $ gcc -shared -o baz.so foo.o bar.o This is pretty similar to how you might go about linking a shared library, except that the `baz.so' module will never be linked with a `-lbaz' option, so the `lib' prefix isn't necessary. In fact, it would probably be confusing if you used the prefix. Similarly, there is no constraint to use any particular filename suffix, but it is sensible to use the target's native shared library suffix (GNU/Linux uses `.so') to make it obvious that the compiled file is some sort of shared object, and not a normal executable. Apart from that, the only difference between a shared library built for linking at compile-time and a dynamic module built for loading at run-time is that the module must provide known "entry points" for the main executable to call. That is, when writing code destined for a dynamic module, you must provide functions or variables with known names and semantics that the main executable can use to access the functionality of the module. This _is_ different to the function and variable names in a regular library, which are already known when you write the client code, since the libraries are always written _before_ the code that uses them; a runtime module loading system must, by definition, be able to cope with modules that are written _after_ the code that uses those modules. ---------- Footnotes ---------- (1) Not essential but will be slower without this option, see *Note Position Independent Code::.  File: autobook.info, Node: Module Access Functions, Next: Finding a Module, Prev: Dynamic Modules, Up: Dynamic Loading 17.2 Module Access Functions ============================ In order to access the functionality of dynamic modules, different architectures provide various APIs to bring the code from the module into the address space of the loading program, and to access the symbols exported by that module. GNU/Linux uses the dynamic module API introduced by Sun's Solaris operating system, and widely adopted (and adapted!) by the majority of modern Unices(1). The interface consists of four functions. In practice, you really ought not to use these functions, since you would be locking your project into this single API, and the class of machines that supports it. 
This description is over-simplified to serve as a comparison with the fully portable libltdl API described in *Note Using GNU libltdl::. The minutiae are not discussed, because therein lie the implementation peculiarities that spoil the portability of this API. As they stand, these descriptions give a good overview of how the functions work at a high level, and are broadly applicable to the various implementations in use. If you are curious, the details of your machine's particular dynamic loading API will be available in its system manual pages.

 -- Function: void * dlopen (const char *FILENAME, int FLAG)
     This function brings the code from a named module into the address space of the running program that calls it, and returns a handle which is used by the other API functions. If FILENAME is not an absolute path, GNU/Linux will search for it in directories named in the `LD_LIBRARY_PATH' environment variable, and then in the standard library directories before giving up.

     The flag argument is made by `OR'ing together various flag bits defined in the system headers. On GNU/Linux, these flags are defined in `dlfcn.h':

    `RTLD_LAZY'
          Resolve undefined symbols when they are first used.

    `RTLD_NOW'
          If all symbols cannot be resolved when the module is loaded, `dlopen' will fail and return `NULL'.

    `RTLD_GLOBAL'
          All of the global symbols in the loaded module will be available to resolve undefined symbols in subsequently loaded modules.

 -- Function: void * dlsym (void *HANDLE, char *NAME)
     Returns the address of the named symbol in the module which returned HANDLE when it was `dlopen'ed. You must cast the returned address to a known type before using it.

 -- Function: int dlclose (void *HANDLE)
     When you are finished with a particular module, it can be removed from memory using this function.

 -- Function: const char * dlerror (void)
     If any of the other three API calls fails, this function returns a string which describes the last error that occurred.

In order to use these functions on GNU/Linux, you must `#include <dlfcn.h>' for the function prototypes, and link with `-ldl' to provide the API implementation. Other Unices use `-ldld' or provide the implementation of the API inside the standard C library.

---------- Footnotes ----------

(1) HP-UX being the most notable exception.

File: autobook.info, Node: Finding a Module, Next: A Simple GNU/Linux Module Loader, Prev: Module Access Functions, Up: Dynamic Loading

17.3 Finding a Module
=====================

When you are writing a program that will load dynamic modules, a major stumbling block is writing the code to find the modules you wish to load. If you are worried about portability (which you must be, or you wouldn't be reading this book!), you can't rely on the default search algorithm of the vendor `dlopen' function, since it varies from implementation to implementation. You can't even rely on the name of the module, since the module suffix will vary according to the conventions of the target host (though you could insist on a particular suffix for modules you are willing to load). Unfortunately, this means that you will need to implement your own searching algorithm and always use an absolute pathname when you call `dlopen'. A widely adopted mechanism is to look for each module in directories listed in an environment variable specific to your application, allowing your users to inform the application of the location of any modules they have written.
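As a rough sketch of such an environment variable search (the variable name `FOO_MODULE_PATH', the buffer size and the hard coded `.so' suffix are all assumptions made for this illustration, not part of any real package), the loader might walk a colon separated path like this:

     #include <stdlib.h>
     #include <string.h>
     #include <dlfcn.h>

     /* Try to dlopen NAME from each directory listed in the
        hypothetical FOO_MODULE_PATH environment variable. */
     void *
     module_open (const char *name)
     {
       const char *path = getenv ("FOO_MODULE_PATH");
       char buffer[1024];
       void *module = NULL;

       while (path && *path && !module)
         {
           size_t len = strcspn (path, ":");

           /* Build "<directory>/<name>.so" and attempt to load it. */
           if (len && len + strlen (name) + 5 < sizeof buffer)
             {
               sprintf (buffer, "%.*s/%s.so", (int) len, path, name);
               module = dlopen (buffer, RTLD_NOW);
             }

           path += len;
           if (*path == ':')
             ++path;
         }

       return module;
     }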
If a suitable module is not yet found, the application would then default to looking in a list of standard locations - say, in a subdirectory of the user's home directory, and finally a subdirectory of the application installation tree. For application `foo', you might use `/usr/lib/foo/module.so' - that is, `$(pkglibdir)/module.so' if you are using Automake. This algorithm can be further improved: * If you try different module suffixes to the named module for every directory in the search path, which will avoid locking your code into a subset of machines that use the otherwise hardcoded module suffix. With this in place you could ask the module loader for module `foomodule', and if it was not found in the first search directory, the module loader could try `foomodule.so', `foomodule.sl' and `foomodule.dll' before moving on to the next directory. * You might also provide command line options to your application which will preload modules before starting the program proper or to modify the module search path. For example, GNU M4, version 1.5, will have the following dynamic loading options: $ m4 --help Usage: m4 [OPTION]... [FILE]... ... Dynamic loading features: -M, --module-directory=DIRECTORY add DIRECTORY to the search path -m, --load-module=MODULE load dynamic MODULE from M4MODPATH ... Report bugs to .  File: autobook.info, Node: A Simple GNU/Linux Module Loader, Next: A Simple GNU/Linux Dynamic Module, Prev: Finding a Module, Up: Dynamic Loading 17.4 A Simple GNU/Linux Module Loader ===================================== Something to be aware of, is that when your users write dynamic modules for your application, they are subject to the interface you design. It is very important to design a dynamic module interface that is clean and functional before other people start to write modules for your code. If you ever need to change the interface, your users will need to rewrite their modules. Of course you can carefully change the interface to retain backwards compatibility to save your users the trouble of rewriting their modules, but that is no substitute for designing a good interface from the outset. If you do get it wrong, and subsequently discover that the design you implemented is misconceived (this is the voice of experience speaking!), you will be left with a difficult choice: try to tweak the broken API so that it does work while retaining backwards compatibility, and the maintenance and performance penalty that brings? Or start again with a fresh design born of the experience gained last time, and rewrite all of the modules you have so far? If there are other applications which have similar module requirements to you, it is worth writing a loader that uses the same interface and semantics. That way, you will (hopefully) be building from a known good API design, and you will have access to all the modules for that other application too, and vice versa. For the sake of clarity, I have sidestepped any issues of API design for the following example, by choosing this minimal interface: -- Function: int run (const char *ARGUMENT) When the module is successfully loaded a function with the following prototype is called with the argument given on the command line. If this entry point is found and called, but returns `-1', an error message is displayed by the calling program. 
Here's a simplistic but complete dynamic module loading application you can build for this interface with the GNU/Linux dynamic loading API:

     #include <stdio.h>
     #include <stdlib.h>
     #ifndef EXIT_FAILURE
     # define EXIT_FAILURE 1
     # define EXIT_SUCCESS 0
     #endif

     #include <limits.h>
     #ifndef PATH_MAX
     # define PATH_MAX 255
     #endif

     #include <string.h>
     #include <unistd.h>

     #include <dlfcn.h>
     /* This is missing from very old Linux libc. */
     #ifndef RTLD_NOW
     # define RTLD_NOW 2
     #endif

     typedef int entrypoint (const char *argument);

     /* Save and return a copy of the dlerror() error message,
        since the next API call may overwrite the original. */
     static char *dlerrordup (char *errormsg);

     int
     main (int argc, const char *argv[])
     {
       char modulepath[1+ PATH_MAX];
       char *errormsg = NULL;
       void *module = NULL;
       entrypoint *run = NULL;
       int errors = 0;

       if (argc != 3)
         {
           fprintf (stderr, "USAGE: main MODULENAME ARGUMENT\n");
           exit (EXIT_FAILURE);
         }

       /* Set the module search path. */
       getcwd (modulepath, PATH_MAX);
       strcat (modulepath, "/");
       strcat (modulepath, argv[1]);

       /* Load the module. */
       module = dlopen (modulepath, RTLD_NOW);
       if (!module)
         {
           strcat (modulepath, ".so");
           module = dlopen (modulepath, RTLD_NOW);
         }
       if (!module)
         errors = 1;

       /* Find the entry point. */
       if (!errors)
         {
           run = (entrypoint *) dlsym (module, "run");

           /* In principle, run might legitimately be NULL, so
              I don't use run == NULL as an error indicator. */
           errormsg = dlerrordup (errormsg);
           if (errormsg != NULL)
             errors = dlclose (module);
         }

       /* Call the entry point function. */
       if (!errors)
         {
           int result = (*run) (argv[2]);
           if (result < 0)
             errormsg = strdup ("module entry point execution failed");
           else
             printf ("\t=> %d\n", result);
         }

       /* Unload the module, now that we are done with it. */
       if (!errors)
         errors = dlclose (module);

       if (errors)
         {
           /* Diagnose the encountered error. */
           errormsg = dlerrordup (errormsg);

           if (!errormsg)
             {
               fprintf (stderr, "%s: dlerror() failed.\n", argv[0]);
               return EXIT_FAILURE;
             }
         }

       if (errormsg)
         {
           fprintf (stderr, "%s: %s.\n", argv[0], errormsg);
           free (errormsg);
           return EXIT_FAILURE;
         }

       return EXIT_SUCCESS;
     }

     /* Be careful to save a copy of the error message,
        since the next API call may overwrite the original. */
     static char *
     dlerrordup (char *errormsg)
     {
       char *error = (char *) dlerror ();
       if (error && !errormsg)
         errormsg = strdup (error);
       return errormsg;
     }

You would compile this on a GNU/Linux machine like so:

     $ gcc -o simple-loader simple-loader.c -ldl

However, despite making a reasonable effort with this loader, and ignoring features which could easily be added, it still has some seemingly insoluble problems:

  1. It will fail if the user's platform doesn't have the `dlopen'
     API.  This also includes platforms which have no shared libraries.

  2. It relies on the implementation to provide a working self-opening
     mechanism.  `dlopen (NULL, RTLD_NOW)' is very often unimplemented,
     or buggy, and without that, it is impossible to access the symbols
     of the main program through the `dlsym' mechanism.

  3. It is quite difficult to figure out at compile time whether the
     target host needs `libdl.so' to be linked.

I will use GNU Autotools to tackle these problems in the next chapter.
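In the meantime, problem 3 can at least be eased by hand with a small Autoconf test.  The following `configure.in' fragment is only a sketch (the `LIBADD_DLOPEN' output variable is a name invented for this illustration, not a standard macro):

     # Is dlopen in the C library, or do we need to add -ldl?  (sketch)
     AC_CHECK_FUNC(dlopen, [],
       [AC_CHECK_LIB(dl, dlopen, [LIBADD_DLOPEN="-ldl"])])
     AC_SUBST(LIBADD_DLOPEN)

A `Makefile.am' link line could then use `@LIBADD_DLOPEN@' where the compilation command above hard-coded `-ldl'.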

File: autobook.info, Node: A Simple GNU/Linux Dynamic Module, Prev: A Simple GNU/Linux Module Loader, Up: Dynamic Loading

17.5 A Simple GNU/Linux Dynamic Module
======================================

As an appetiser for working with dynamic loadable modules, here is a minimal module written for the interface used by the loader in the previous section:

     #include <stdio.h>

     int
     run (const char *argument)
     {
       printf ("Hello, %s!\n", argument);
       return 0;
     }

Again, to compile on a GNU/Linux machine:

     $ gcc -fPIC -c simple-module.c
     $ gcc -shared -o simple-module.so simple-module.o

Having compiled both loader and module, a test run looks like this:

     $ ./simple-loader simple-module World
     Hello, World!
             => 0

If you have a GNU/Linux system, you should experiment with the simple examples from this chapter to get a feel for the relationship between a dynamic module loader and its modules - tweak the interface a little; try writing another simple module.  If you have a machine with a different dynamic loading API, try porting these examples to that machine to get a feel for the kinds of problems you would encounter if you wanted a module system that would work with both APIs.

The next chapter will do just that, and develop these examples into a fully portable module loading system with the aid of GNU Autotools.  In *Note A Module Loading Subsystem::, I will add a more realistic module loader into the Sic project last discussed in *Note A Large GNU Autotools Project::.


File: autobook.info, Node: Using GNU libltdl, Next: Advanced GNU Automake Usage, Prev: Dynamic Loading, Up: Top

18 Using GNU libltdl
********************

Now that you are conversant with the mechanics and advantages of using dynamic run time modules in your projects, you can probably already imagine a hundred and one uses for a plugin architecture.  As I described in the last chapter, there are several gratuitously different architecture dependent dynamic loading APIs, and yet several more shortcomings in many of those.  If you have Libtool installed on your machine, then you almost certainly have libltdl, which has shipped as part of the standard Libtool distribution since release 1.3.  In this chapter I will describe "GNU libltdl", the *L*ib*T*ool *D*ynamic *L*oading *lib*rary, and explain some of its features and how to make use of them.

* Menu:

* Introducing libltdl::
* Using libltdl::
* Portable Library Design::
* dlpreopen Loading::
* User Module Loaders::


File: autobook.info, Node: Introducing libltdl, Next: Using libltdl, Up: Using GNU libltdl

18.1 Introducing libltdl
========================

Probably the best known and supported Unix run time linking API is the `dlopen' interface, used by Solaris and GNU/Linux amongst others, and discussed earlier in *Note Dynamic Loading::.  libltdl is based on the `dlopen' API, with a few small differences and several enhancements.  The following libltdl API functions are declared in `ltdl.h':

 -- Function: lt_dlhandle lt_dlopen (const char *FILENAME)
     This function brings the code from a named module into the address
     space of the running program that calls it, and returns a handle
     which is used by the other API functions.  If FILENAME is not an
     absolute path, libltdl will search for it in directories named in
     the `LTDL_LIBRARY_PATH' environment variable, and then in the
     standard library directories before giving up.  It is safe to call
     this function many times; libltdl will keep track of the number of
     calls made, but will require the same number of calls to
     `lt_dlclose' to actually unload the module.
 -- Function: lt_ptr_t lt_dlsym (lt_dlhandle HANDLE, const char *NAME)
     Returns the address of the named symbol in the module which
     returned HANDLE when it was `lt_dlopen'ed.  You must cast the
     returned address to a known type before using it.

 -- Function: int lt_dlclose (lt_dlhandle HANDLE)
     When you are finished with a particular module, it can be removed
     from memory using this function.

 -- Function: const char * lt_dlerror (void)
     If any of the libltdl API calls fail, this function returns a
     string which describes the last error that occurred.

In order to use these functions, you must `#include <ltdl.h>' for the function prototypes, and link with `-lltdl' to provide the API implementation.  Assuming you link your application with `libtool', and that you call the necessary macros from your `configure.in' (*note Using libltdl::), then any host specific dependent libraries (for example, `libdl' on GNU/Linux) will automatically be added to the final link line by `libtool'.

You are not limited to using only Libtool compiled modules when you use libltdl.  If you write the module loader carefully, it will be able to load native modules too--although you will not be able to preload non-Libtool modules (*note dlpreopen Loading::).  The loader in *Note Module Loader: libltdl Module Loader. is written in this way.  It is useful to be able to load modules flexibly like this, because you don't tie your users into using Libtool for any modules they write.

Compare the descriptions of the functions above with the API described in *Note Module Access Functions::.  You will notice that they are very similar.

Back-linking is the process of resolving any remaining symbols by referencing back into the application that loads the library at runtime - a mechanism implemented on almost all modern Unices.  For instance, your main application may provide some utility function, `my_function', which you want a module to have access to.  There are two ways to do that:

   * You could use Libtool to link your application, using the
     `-export-dynamic' option to ensure that the global application
     symbols are available to modules.  When libltdl loads a module
     into an application compiled like this, it will "back-link"
     symbols from the application to resolve any otherwise undefined
     symbols in a module.  When the module is `lt_dlopen'ed, libltdl
     will arrange for calls to `my_function' in the module to execute
     the `my_function' implementation in the application.

     If you need this functionality, relying on back-linking is the
     simplest way to achieve it.  Unfortunately, this simplicity is at
     the expense of portability: some platforms have no support for
     back-linking at all, and others will not allow a module to be
     created with unresolved symbols.  Nevertheless, libltdl allows you
     to do this if you want to.

   * You could split the code that implements the symbols you need to
     share with modules into a separate library.  This library would
     then be used to resolve the symbols you wish to share, by linking
     it into modules and application alike.

     The definition of `my_function' would be compiled separately into
     a library, `libmy_function.la'.  References to `my_function' from
     the application would be resolved by linking it with
     `libmy_function.la', and the library would be installed so that
     modules which need to call `my_function' would be able to resolve
     the symbol by linking with `-lmy_function'.

     This method requires support for neither back-linking nor
     unresolved link time symbols from the host platform.
The disadvantage is that when you realise you need this functionality, it may be quite complicated to extract the shared functionality from the application to be compiled in a stand alone library. On those platforms which support "back-linking", libltdl can be configured to resolve external symbol references in a dynamic module with any global symbols already present in the main application. This has two implications for the libltdl API: * There is no need to pass `RTLD_GLOBAL' (or equivalent) to `lt_dlopen' as might be necessary with the native module loading API. * You should be aware that your application will not work on some platforms--most notably, Windows and AIX--if you rely on a back-linking. Similarly, there is no need to specify whether the module should be integrated into the application core before `lt_dlopen' returns, or else when the symbols it provides are first referenced. libltdl will use "lazy loading" if it is supported, since this is a slight performance enhancement, or else fall back to loading everything immediately. Between this feature and the support of back-linking, there is no need to pass flags into `lt_dlopen' as there is with most native `dlopen' APIs. There are a couple of other important API functions which you will need when using libltdl: -- Function: int lt_dlinit (void) You must call this function to initialise libltdl before calling any of the other libltdl API functions. It is safe to call this function many times, libltdl will keep track of the number of calls made, but will require the same number of calls to `lt_dlexit' to actually recycle the library resources. If you don't call `lt_dlinit' before any other API call, the other calls, including `lt_dlerror', will return their respective failure codes (`NULL' or `1', as appropriate). -- Function: int lt_dlexit (void) When you are done with libltdl and all dynamic modules have been unloaded you can call this function to finalise the library, and recycle its resources. If you forget to unload any modules, the call to `lt_dlexit' will `lt_dlclose' them for you. Another useful departure that the libltdl API makes from a vanilla `dlopen' implementation is that it also will work correctly with old K&R C compilers, by virtue of not relying on `void *' pointers. libltdl uses `lt_dlhandle's to pass references to loaded modules, and this also improves ANSI C compiler's type checking compared to the untyped addresses typically used by native `dlopen' APIs.  File: autobook.info, Node: Using libltdl, Next: Portable Library Design, Prev: Introducing libltdl, Up: Using GNU libltdl 18.2 Using libltdl ================== Various aspects of libltdl are addressed in the following subsections, starting with a step by step guide to adding libltdl to your own GNU Autotools projects (*note Configury: libltdl Configury.) and an explanation of how to initialise libltdl's memory management (*note Memory Management: libltdl Memory Management.). After this comes a simple libltdl module loader which you can use as the basis for a module loader in your own projects (*note Module Loader: libltdl Module Loader.), including an explanation of how libltdl finds and links any native dynamic module library necessary for the host platform. The next subsection (*note Dependent Libraries: libltdl Dependent Libraries.) deals with the similar problem of dynamic modules which depend on other libraries - take care not to confuse the problems discussed in the previous two subsections. 
Following that, the source code for and use of a simple dynamic module for use with this section's module loader is detailed (*note Dynamic Module: libltdl Dynamic Module.). * Menu: * libltdl Configury:: * libltdl Memory Management:: * libltdl Module Loader:: * libltdl Dependent Libraries:: * libltdl Dynamic Module::  File: autobook.info, Node: libltdl Configury, Next: libltdl Memory Management, Up: Using libltdl 18.2.1 Configury ---------------- Because libltdl supports so many different platforms(1) it needs to be configured for the host platform before it can be used. The path of least resistance to successfully integrating libltdl into your own project, dictates that the project use Libtool for linking its module loader with libltdl. This is certainly the method I use and recommend, and is the method discussed in this chapter. However, I have seen projects which did not use Libtool (specifically because Libtool's poor C++ support made it difficult to adopt), but which wanted the advantages of libltdl. It is possible to use libltdl entirely without Libtool, provided you take care to use the configuration macros described here, and use the results of those running these macros to determine how to link your application with libltdl. The easiest way to add libltdl support to your own projects is with the following simple steps: 1. You must add the libltdl sources to your project distribution. If you are not already using Libtool in some capacity for your project, you should add `AC_PROG_LIBTOOL'(2) to your `configure.in'. That done, move to the top level directory of the project, and execute: $ libtoolize --ltdl $ ls -F aclocal.m4 configure.in libltdl/ $ ls libltdl/ COPYING.LIB README aclocal.m4 configure.in stamp-h.in Makefile.am acconfig.h config.h.in ltdl.c Makefile.in acinclude.m4 configure ltdl.h 2. libltdl has its own configuration to run in addition to the configuration for your project, so you must be careful to call the subdirectory configuration from your top level `configure.in': AC_CONFIG_SUBDIRS(libltdl) And you must ensure that Automake knows that it must descend into the libltdl source directory at make time, by adding the name of that subdirectory to the `SUBDIRS' macro in your top level `Makefile.am': SUBDIRS = libltdl src 3. You must also arrange for the code of libltdl to be linked into your application. There are two ways to do this: as a regular Libtool library; or as a convenience library (*note Creating Convenience Libraries: Creating Convenience Libraries with libtool.). Either way there are catches to be aware of, which will be addressed in a future release. Until libltdl is present on the average user's machine, I recommend building a convenience library. You can do that in `configure.in': AC_LIBLTDL_CONVENIENCE AC_PROG_LIBTOOL The main thing to be aware of when you follow these steps, is that you can only have one copy of the code from libltdl in any application. Once you link the objects into a library, that library will not work with any other library which has also linked with libltdl, or any application which has its own copy of the objects. If you were to try, the libltdl symbol names would clash. The alternative is to substitute `AC_LIBLTDL_CONVENIENCE' with `AC_LIBLTDL_INSTALLABLE'. Unfortunately there are currently many potential problems with this approach. 
This macro will try to find an already installed libltdl and use that, or else the embedded libltdl will be built as a standard shared library, which must be installed along with any libraries or applications that use it. There is no testing for version compatibility, so it is possible that two or more applications that use this method will overwrite one anothers copies of the installed libraries and headers. Also, the code which searches for the already installed version of libltdl tends not to find the library on many hosts, due to the native libraries it depends on being difficult to predict. Both of the `AC_LIBLTDL_...' macros set the values of `INCLTDL' and `LIBLTDL' so that they can be used to add the correct include and library flags to the compiler in your Makefiles. They are not substituted by default. If you need to use them you must also add the following macros to your `configure.in': AC_SUBST(INCLTDL) AC_SUBST(LIBLTDL) 4. Many of the libltdl supported hosts require that a separate shared library be linked into any application that uses dynamic runtime loading. libltdl is wrapped around this native implementation on these hosts, so it is important to link that library too. Adding support for module loading through the wrapped native implementation is independent of Libtool's determination of how shared objects are compiled. On GNU/Linux, you would need to link your program with libltdl and `libdl', for example. Libtool installs a macro, `AC_LIBTOOL_DLOPEN', which adds tests to your `configure' that will search for this native library. Whenever you use libltdl you should add this macro to your `configure.in' before `AC_PROG_LIBTOOL': AC_LIBTOOL_DLOPEN AC_LIBLTDL_CONVENIENCE AC_PROG_LIBTOOL ... AC_SUBST(INCLTDL) AC_SUBST(LIBLTDL) `AC_LIBTOOL_DLOPEN' takes care to substitute a suitable value of `LIBADD_DL' into your `Makefile.am', so that your code will compile correctly wherever the implementation library is discovered: INCLUDES += @INCLTDL@ bin_PROGRAMS = your_app your_app_SOURCES = main.c support.c your_app_LDADD = @LIBLTDL@ @LIBADD_DL@ Libtool 1.4 has much improved inter-library dependency tracking code which no longer requires `@LIBADD_DL@' be explicitly referenced in your `Makefile.am'. When you install libltdl, Libtool 1.4 (or better) will make a note of any native library that libltdl depends on - linking it automatically, provided that you link `libltdl.la' with `libtool'. You might want to omit the `@LIBADD_DL@' from your `Makefile.am' in this case, if seeing the native library twice (once as a dependee of libltdl, and again as an expansion of `@LIBADD_DL@') on the link line bothers you. Beyond this basic configury setup, you will also want to write some code to form a module loading subsystem for your project, and of course some modules! That process is described in *Note Module Loader: libltdl Module Loader. and *Note Dynamic Module: libltdl Dynamic Module. respectively. ---------- Footnotes ---------- (1) As I always like to say, `from BeOS to Windows!'. And yes, I do think that it is a better catchphrase than `from AIX to Xenix'! (2) Use `AM_PROG_LIBTOOL' if you have `automake' version 1.4 or older or a version of `libtool' earlier than 1.4.  File: autobook.info, Node: libltdl Memory Management, Next: libltdl Module Loader, Prev: libltdl Configury, Up: Using libltdl 18.2.2 Memory Management ------------------------ Internally, libltdl maintains a list of loaded modules and symbols on the heap. 
If you find that you want to use it with a project that has an unusual memory management API, or if you simply want to use a debugging `malloc', libltdl provides hook functions for you to set the memory routines it should call.  The way to use these hooks is to point them at the memory allocation routines you want libltdl to use before calling any of its API functions:

     lt_dlmalloc = (lt_ptr_t (*) PARAMS((size_t))) mymalloc;
     lt_dlfree   = (void (*) PARAMS((lt_ptr_t))) myfree;

Notice that the function names need to be cast to the correct type before assigning them to the hook symbols.  You need to do this because the prototypes of the functions you want libltdl to use will vary slightly from libltdl's own function pointer types--libltdl uses `lt_ptr_t' for compatibility with K&R compilers, for example.


File: autobook.info, Node: libltdl Module Loader, Next: libltdl Dependent Libraries, Prev: libltdl Memory Management, Up: Using libltdl

18.2.3 Module Loader
--------------------

This section contains a fairly minimal libltdl based dynamic module loader that you can use as a base for your own code.  It implements the same API as the simple module loader in *Note A Simple GNU/Linux Module Loader::, and, because of the way libltdl is written, it is able to load modules written for that loader, too.

The only part of this code which is arguably more complex than the equivalent from the previous example loader is that `lt_dlinit' and `lt_dlexit' must be called in the appropriate places.  In contrast, the module search path initialisation is much simplified thanks to another small improvement in the libltdl API:

 -- Function: int lt_dlsetsearchpath (const char *PATH)
     This function takes a colon separated list of directories, which
     will be the first directories libltdl will search when trying to
     locate a dynamic module.

Another new API function is used to actually load the module:

 -- Function: lt_dlhandle lt_dlopenext (const char *FILENAME)
     This function is used in precisely the same way as `lt_dlopen'.
     However, if the search for the named module by exact match against
     FILENAME fails, it will try again with a `.la' extension, and then
     the native shared library extension (`.sl' on HP-UX, for example).

The advantage of using `lt_dlopenext' to load dynamic modules is that it will work equally well when loading modules not compiled with Libtool.  Also, by passing the module name parameter with no extension, this function allows module coders to manage without Libtool.

     #include <stdio.h>
     #include <stdlib.h>
     #ifndef EXIT_FAILURE
     # define EXIT_FAILURE 1
     # define EXIT_SUCCESS 0
     #endif

     #include <limits.h>
     #ifndef PATH_MAX
     # define PATH_MAX 255
     #endif

     #include <string.h>
     #include <ltdl.h>

     #ifndef MODULE_PATH_ENV
     # define MODULE_PATH_ENV "MODULE_PATH"
     #endif

     typedef int entrypoint (const char *argument);

     /* Save and return a copy of the dlerror() error message,
        since the next API call may overwrite the original. */
     static char *dlerrordup (char *errormsg);

     int
     main (int argc, const char *argv[])
     {
       char *errormsg = NULL;
       lt_dlhandle module = NULL;
       entrypoint *run = NULL;
       int errors = 0;

       if (argc != 3)
         {
           fprintf (stderr, "USAGE: main MODULENAME ARGUMENT\n");
           exit (EXIT_FAILURE);
         }

       /* Initialise libltdl. */
       errors = lt_dlinit ();

       /* Set the module search path. */
       if (!errors)
         {
           const char *path = getenv (MODULE_PATH_ENV);

           if (path != NULL)
             errors = lt_dlsetsearchpath (path);
         }

       /* Load the module. */
       if (!errors)
         module = lt_dlopenext (argv[1]);

       /* Find the entry point. */
       if (module)
         {
           run = (entrypoint *) lt_dlsym (module, "run");

           /* In principle, run might legitimately be NULL, so
              I don't use run == NULL as an error indicator
              in general. */
           errormsg = dlerrordup (errormsg);
           if (errormsg != NULL)
             {
               errors = lt_dlclose (module);
               module = NULL;
             }
         }
       else
         errors = 1;

       /* Call the entry point function. */
       if (!errors)
         {
           int result = (*run) (argv[2]);
           if (result < 0)
             errormsg = strdup ("module entry point execution failed");
           else
             printf ("\t=> %d\n", result);
         }

       /* Unload the module, now that we are done with it. */
       if (!errors)
         errors = lt_dlclose (module);

       if (errors)
         {
           /* Diagnose the encountered error. */
           errormsg = dlerrordup (errormsg);

           if (!errormsg)
             {
               fprintf (stderr, "%s: dlerror() failed.\n", argv[0]);
               return EXIT_FAILURE;
             }
         }

       /* Finished with ltdl now. */
       if (!errors)
         if (lt_dlexit () != 0)
           errormsg = dlerrordup (errormsg);

       if (errormsg)
         {
           fprintf (stderr, "%s: %s.\n", argv[0], errormsg);
           free (errormsg);
           exit (EXIT_FAILURE);
         }

       return EXIT_SUCCESS;
     }

     /* Be careful to save a copy of the error message,
        since the next API call may overwrite the original. */
     static char *
     dlerrordup (char *errormsg)
     {
       char *error = (char *) lt_dlerror ();
       if (error && !errormsg)
         errormsg = strdup (error);
       return errormsg;
     }

This file must be compiled with `libtool', so that the dependent libraries (`libdl.so' on my GNU/Linux machine) are handled correctly, and so that the dlpreopen support is compiled in correctly (*note dlpreopen Loading::):

     $ libtool --mode=link gcc -g -o ltdl-loader -dlopen self \
       -rpath /tmp/lib ltdl-loader.c -lltdl
     gcc -g -o ltdl-loader -Wl,--rpath,/tmp/lib ltdl-loader.c -lltdl -ldl

By using _both_ `lt_dlopenext' and `lt_dlsetsearchpath', this module loader will make a valiant attempt at loading anything you pass to it - including the module I wrote for the simple GNU/Linux module loader earlier (*note A Simple GNU/Linux Dynamic Module::).  Here, you can see the new `ltdl-loader' loading and using the `simple-module' module from *Note A Simple GNU/Linux Dynamic Module:::

     $ ltdl-loader simple-module World
     Hello, World!
             => 0


File: autobook.info, Node: libltdl Dependent Libraries, Next: libltdl Dynamic Module, Prev: libltdl Module Loader, Up: Using libltdl

18.2.4 Dependent Libraries
--------------------------

On modern Unices(1), the shared library architecture is smart enough to encode all of the other libraries that a dynamic module depends on as part of the module file itself.  On these architectures, when you `lt_dlopen' a module, if any shared libraries it depends on are not already loaded into the main application, the system runtime loader will ensure that they too are loaded so that all of the module's symbols are satisfied.

Less well endowed systems(2) cannot do this by themselves.  Since Libtool release 1.4, libltdl uses the record of inter-library dependencies in the libtool pseudo-library (*note Introducing GNU Libtool::) to manually load dependent libraries as part of the `lt_dlopen' call.

An example of the sort of difficulties that can arise from trying to load a module that has a complex library dependency chain is typified by a problem I encountered with GNU Guile a few years ago: earlier releases of the libXt Athena widget wrapper library for GNU Guile failed to load on my a.out based GNU/Linux system.  When I tried to load the module into a running Guile interpreter, it couldn't resolve any of the symbols that referred to libXt.
I soon discovered that the libraries that the module depended upon were not loaded by virtue of loading the module itself.  I needed to build the interpreter itself with libXt and rely on back-linking to resolve the `Xt' references when I loaded the module.  This pretty much defeated the whole point of having the wrapper library as a module.  Had Libtool been around in those days, it would have been able to load libXt as part of the process of loading the module.

If you program with the X window system, you will know that the list of libraries you need to link into your applications soon grows to be very large.  Worse, if you want to load an X extension module into a non-X aware application, you will encounter the problems I found with Guile, unless you link your module with `libtool' and dynamically load it with libltdl.

At the moment, the various X Window libraries are not built with libtool, so you must be sure to list all of the dependencies when you link a module.  By doing this, Libtool can use the list to check that all of the libraries required by a module are loaded correctly as part of the call to `lt_dlopen', like this:

     $ libtool --mode=link gcc -o module.so -module -avoid-version \
       source.c -L/usr/X11R6/lib -lXt -lX11
     ...
     $ file .libs/module.so
     .libs/module.so: ELF 32-bit LSB shared object, Intel 80386,
     version 1, not stripped
     $ ldd .libs/module.so
         libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x4012f00)
         libXt.so.6 => /usr/X11R6/lib/libXt.so.6 (0x4014500)

Or, if you are using Automake:

     ...
     lib_LTLIBRARIES    = module.la
     module_la_SOURCES  = source.c
     module_la_LDFLAGS  = -module -avoid-version -L$(X11LIBDIR)
     module_la_LIBADD   = -lXt -lX11
     ...

It is especially important to be aware of this if you develop on a modern platform which correctly handles these dependencies natively (as in the example above), since the code may still work on your machine even if you don't correctly note all of the dependencies.  It will only break if someone tries to use it on a machine that needs Libtool's help for it to work, thus reducing the portability of your project.

   ---------- Footnotes ----------

   (1) Architectures which use ELF and ECOFF binary format, for example.

   (2) Those which use a.out binary format, for example.


File: autobook.info, Node: libltdl Dynamic Module, Prev: libltdl Dependent Libraries, Up: Using libltdl

18.2.5 Dynamic Module
---------------------

Writing a module for use with the libltdl based dynamic module loader is no more involved than before: it must provide the correct entry points, as expected by the simple API I designed - the `run' entry point described in *Note A Simple GNU/Linux Module Loader::.  Here is such a module, `ltdl-module.c':

     #include <stdio.h>
     #include <stdlib.h>
     #include <math.h>

     #define run ltdl_module_LTX_run

     int
     run (const char *argument)
     {
       char *end = NULL;
       long number;

       if (!argument || *argument == '\0')
         {
           fprintf (stderr, "error: invalid argument, \"%s\".\n",
                    argument ? argument : "(null)");
           return -1;
         }

       number = strtol (argument, &end, 0);
       if (end && *end != '\0')
         {
           fprintf (stderr, "warning: trailing garbage \"%s\".\n", end);
         }

       printf ("Square root of %s is %f\n", argument, sqrt (number));

       return 0;
     }

To take full advantage of the new module loader, the module itself *must* be compiled with Libtool.  Otherwise dependent libraries will not have been stored when libltdl tries to load the module on an architecture that doesn't load them natively, or which doesn't have shared libraries at all (*note dlpreopen Loading::).
$ libtool --mode=compile gcc -c ltdl-module.c rm -f .libs/ltdl-module.lo gcc -c ltdl-module.c -fPIC -DPIC -o .libs/ltdl-module.lo gcc -c ltdl-module.c -o ltdl-module.o >/dev/null 2>&1 mv -f .libs/ltdl-module.lo ltdl-module.lo $ libtool --mode=link gcc -g -o ltdl-module.la -rpath `pwd` \ -no-undefined -module -avoid-version ltdl-module.lo -lm rm -fr .libs/ltdl-module.la .libs/ltdl-module.* .libs/ltdl-module.* gcc -shared ltdl-module.lo -lm -lc -Wl,-soname \ -Wl,ltdl-module.so -o .libs/ltdl-module.so ar cru .libs/ltdl-module.a ltdl-module.o creating ltdl-module.la (cd .libs && rm -f ltdl-module.la && ln -s ../ltdl-module.la \ ltdl-module.la) You can see from the interaction below that `ltdl-loader' does not load the math library, `libm', and that the shared part of the Libtool module, `ltdl-module', does have a reference to it. The pseudo-library also has a note of the `libm' dependency so that libltdl will be able to load it even on architectures that can't do it natively: $ libtool --mode=execute ldd ltdl-loader libltdl.so.0 => /usr/lib/libltdl.so.0 (0x4001a000) libdl.so.2 => /lib/libdl.so.2 (0x4001f000) libc.so.6 => /lib/libc.so.6 (0x40023000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) $ ldd .libs/ltdl-module.so libm.so.6 => /lib/libm.so.6 (0x40008000) libc.so.6 => /lib/libc.so.6 (0x40025000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000) $ fgrep depend ltdl-module.la # Libraries that this one depends upon. dependency_libs=' -lm' This module is now ready to load from `ltdl-loader': $ ltdl-loader ltdl-module 9 Square root of 9 is 3.000000 => 0  File: autobook.info, Node: Portable Library Design, Next: dlpreopen Loading, Prev: Using libltdl, Up: Using GNU libltdl 18.3 Portable Library Design ============================ When partitioning the functionality of your project into libraries, and particularly loadable modules, it easy to inadvertently rely on modern shared library features such as _back-linking_ or _dependent library loading_. If you do accidentally use any of these features, you probably won't find out about it until someone first tries to use your project on an older or less featureful host. I have already used the `-module' and `-avoid-version' libtool linking options when compiling the libltdl module in the last section, the others are useful to know also. All of these are used with the `link' mode of `libtool' (`libtool --mode=link'): `-module' This option tells `libtool' that the target is a dynamically loadable module (as opposed to a conventional shared library) and as such need not have the `lib' prefix. `-avoid-version' When linking a dynamic module, this option can be used instead of the `-version-info' option, so that the module is not subject to the usual shared library version number suffixes. `-no-undefined' This is an extremely important option when you are aiming for maximum portability. It declares that all of the symbols required by the target are resolved at link time. Some shared library architectures do not allow undefined symbols by default (Tru64 Unix), and others do not allow them at all (AIX). By using this switch, and ensuring that all symbols really are resolved at link time, your libraries will work on even these platforms. *Note Creating Libtool Libraries with Automake::. `-export-dynamic' Almost the opposite of `-no-undefined', this option will compile the target so that the symbols it exports can be used to satisfy unresolved symbols in subsequently loaded modules. 
Not all shared library architectures support this feature, and many that do support it do so by default regardless of whether this option is supplied.  If you rely on this feature, then you should use this option, in the knowledge that your project will not work correctly on architectures that have no support for the feature.  For maximum portability, you should neither rely on this feature nor use the `-export-dynamic' option - but, on the occasions you do need the feature, this option is necessary to ensure that the linker is called correctly.

When you have the option to do so, I recommend that you design your project so that each of the libraries and modules is self contained, except for a minimal number of dependent libraries, arranged in a directional graph shaped like a tree.  Put another way: relying on back-linking, or on mutual or cyclic dependencies, reduces the portability of your project.  In the diagrams below, an arrow indicates that the compilation object relies on symbols from the objects that it points to:

          main                .---> main                  main
            |                 |       |                     |
       .----+----,            |  .----+----,           .----+----,
       v         v            |  v         v           v         v
     liba      libb           | liba     libb        liba<-----libb
       |         |            |   |                    |         ^
       v         v            |   v                    v         |
          libc                '--libc                 libc-------'

       Tree: good          Backlinking: bad           Cyclic: bad


File: autobook.info, Node: dlpreopen Loading, Next: User Module Loaders, Prev: Portable Library Design, Up: Using GNU libltdl

18.4 dlpreopen Loading
======================

On machines which do not have any facility for shared libraries or dynamic modules, libltdl allows an application to `lt_dlopen' modules, provided that the modules are known at link time.  This works by linking the code for the modules into the application in advance, and then looking up the addresses of the already loaded symbols when `lt_dlsym' is called.  We call this mechanism "dlpreopening" - so named because the modules must be loaded at link time, not because the API to use modules loaded in this way is any different.

This feature is extremely useful for debugging, allowing you to make a fully statically linked application from the executable and module objects, without changing any source code to work around the module loading calls.  As far as the code outside the libltdl API can tell, these modules really are being loaded dynamically.  Driving a symbolic debugger across module boundaries is, however, much easier when blocks of code aren't moving in and out of memory during execution.

You may have wondered about the purpose of the following line in the dynamic module code in *Note Dependent Libraries: libltdl Dependent Libraries.:

     #define run ltdl_module_LTX_run

The reason for redefining the entry point symbol in this way is to prevent a symbol clash when two or more modules that provide identically named entry point functions are preloaded into an executable.  It would be otherwise impossible to preload both `simple-module.c' and `ltdl-module.c', for example, since each defines the symbol `run'.

To allow us to write dynamic modules that are potentially preloaded, `lt_dlsym' will first try to look up the address of a named symbol with a prefix consisting of the canonicalized name of the module being searched, followed by the characters `_LTX_'.  The module name part of this prefix is canonicalized by replacing all non-alphanumeric characters with an underscore.  If that fails, `lt_dlsym' resorts to the unadorned symbol name, which is how `run' was found in `simple-module.la' by `ltdl-loader' earlier.
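For example, in a module shipped as `my-module.la' (a name invented for this illustration), the canonicalized prefix would be `my_module_LTX_', so a preload-safe version of the familiar `run' entry point looks like this:

     #include <stdio.h>

     /* "my-module" canonicalizes to "my_module", so lt_dlsym will look
        for my_module_LTX_run before falling back to the plain name
        "run".  */
     #define run my_module_LTX_run

     int
     run (const char *argument)
     {
       printf ("Hello again, %s!\n", argument);
       return 0;
     }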
Supporting this feature in your module loading code is a simple matter of initialising the address lookup table, and `ltdl.h' defines a convenient macro to do exactly that: -- Macro: LTDL_SET_PRELOADED_SYMBOLS () Add this macro to the code of your module loading code, before the first call to a libltdl function, to ensure that the dlopen address lookup table is populated. Now change the contents of `ltdl-loader.c', and add a call to this macro, so that it looks like this: /* Initialise preloaded symbol lookup table. */ LTDL_SET_PRELOADED_SYMBOLS(); /* Initialise libltdl. */ errors = lt_dlinit (); Libtool will now be able to fall back to using preloaded static modules if you tell it to, or if the host platform doesn't support native dynamic loading. If you use `LTDL_SET_PRELOADED_SYMBOLS' in your module loader, you *must* also specify something to preload to avoid compilation failure due to undefined `lt_preloaded_symbols'. You can name modules on the Libtool link command line using one of `-dlopen' or `-dlpreopen'. This includes support for accessing the symbols of the main executable opened with `lt_dlopen(NULL)'--you can ask Libtool to fall back to preopening the main modules like this: $ libtool gcc -g -o ltdl-loader -dlopen self -rpath /tmp/lib \ ltdl-loader.c -lltdl rm -f .libs/ltdl-loader.nm .libs/ltdl-loader.nmS \ .libs/ltdl-loader.nmT creating .libs/ltdl-loaderS.c (cd .libs && gcc -c -fno-builtin -fno-rtti -fno-exceptions "ltdl-loaderS.c") rm -f .libs/ltdl-loaderS.c .libs/ltdl-loader.nm .libs/ltdl-loader.nmS .libs/ltdl-loader.nmT gcc -o ltdl-loader .libs/ltdl-loaderS.o ltdl-loader.c -Wl,--export-dynamic /usr/lib/libltdl.so -ldl -Wl,--rpath -Wl,/tmp/lib rm -f .libs/ltdl-loaderS.o It doesn't make sense to add preloaded module support to a project, when you have no modules to preopen, so the compilation failure in that case is actually a feature of sorts. The `LTDL_SET_PRELOADED_SYMBOLS' macro does not interfere with the normal operation of the code when modules are dynamically loaded, provided you use the `-dlopen' option on the link line. The advantage of referencing the macro by default is that you can recompile the application with or without preloaded module, and all without editing the sources. If you have no modules to link in by default, you can force Libtool to populate the preload symbol table by using the `-dlopen force' option. This is the option used to preload the symbols of the main executable so that you can subsequently call `lt_dlopen(NULL)'. Multiple modules can be preloaded, although at the time of writing only Libtool compiled modules can be used. If there is a demand, Libtool will be extended to include native library preloading in a future revision. 
To illustrate, I have recompiled the `simple-module.c' module with `libtool': $ libtool --mode=compile gcc -c simple-module.c rm -f .libs/simple-module.lo gcc -c simple-module.c -fPIC -DPIC -o .libs/simple-module.lo gcc -c simple-module.c -o simple-module.o >/dev/null 2>&1 mv -f .libs/simple-module.lo simple-module.lo $ libtool --mode=link gcc -g -o simple-module.la -rpath `pwd` -no-undefined -module -avoid-version simple-module.lo rm -fr .libs/simple-module.la .libs/simple-module.* .libs/simple-module.* gcc -shared simple-module.lo -lc -Wl,-soname \ -Wl,simple-module.so -o .libs/simple-module.so ar cru .libs/simple-module.a simple-module.o creating simple-module.la (cd .libs && rm -f simple-module.la && ln -s ../simple-module.la \ simple-module.la) The names of the modules that may be subsequently `lt_dlopen'ed are added to the application link line. I am using the `-static' option to force a static only link, which must use dlpreopened modules by definition. I am only specifying this because my host has native dynamic loading, and Libtool will use that unless I force a static only link, like this: $ libtool --mode=link gcc -static -g -o ltdl-loader ltdl-loader.c \ -lltdl -dlopen ltdl-module.la -dlopen simple-module.la rm -f .libs/ltdl-loader.nm .libs/ltdl-loader.nmS \ .libs/ltdl-loader.nmT creating .libs/ltdl-loaderS.c extracting global C symbols from ./.libs/ltdl-module.a extracting global C symbols from ./.libs/simple-module.a (cd .libs && gcc -c -fno-builtin -fno-rtti -fno-exceptions \ "ltdl-loaderS.c") rm -f .libs/ltdl-loaderS.c .libs/ltdl-loader.nm \ .libs/ltdl-loader.nmS .libs/ltdl-loader.nmT gcc -g -o ltdl-loader ltdl-loader.c .libs/ltdl-loaderS.o \ ./.libs/ltdl-module.a -lm ./.libs/simple-module.a \ /usr/lib/libltdl.a -ldl rm -f .libs/ltdl-loaderS.o $ ./ltdl-loader ltdl-module 345 Square root of 345 is 18.574176 => 0 $ ./ltdl-loader simple-module World Hello, World! => 0 Note that the current release of Libtool requires that the pseudo-library be present for any libltdl loaded module, even preloaded ones. Once again, if there is sufficient demand, this may be fixed in a future release. Until then, if the pseudo-library was deleted or cannot be found, this will happen: $ rm -f simple-module.la $ ./ltdl-loader simple-module World ./ltdl-loader: file not found. A side effect of using the `LTDL_SET_PRELOADED_SYMBOLS' macro is that if you subsequently link the application without Libtool, you will get an undefined symbol for the Libtool supplied `lt_preloaded_symbols'. If you need to link in this fashion, you will need to provide a stub that supplies the missing definition. Conversely, you must be careful not to link the stub file when you _do_ link with Libtool, because it will clash with the Libtool generated table it is supposed to replace: #include const lt_dlsymlist lt_preloaded_symbols[] = { { 0, 0 } }; Of course, if you use this stub, and link the application without the benefits of Libtool, you will not be able to use any preloaded modules - even if you statically link them, since there is no preloaded symbol lookup table in this case.  
File: autobook.info, Node: User Module Loaders, Prev: dlpreopen Loading, Up: Using GNU libltdl 18.5 User Module Loaders ======================== While writing the module loading code for GNU M4 1.5, I found that libltdl did not provide a way for loading modules in exactly the way I required: As good as the preloading feature of libltdl may be, and as useful as it is for simplifying debugging, it doesn't have all the functionality of full dynamic module loading when the host platform is limited to static linking. After all, you can only ever load modules that were specified at link time, so for access to user supplied modules the whole application must be relinked to preload these new modules before `lt_dlopen' will be able to make use of the additional module code. In this situation, it would be useful to be able to automate this process. That is, if a libltdl using process is unable to `lt_dlopen' a module in any other fashion, but can find a suitable static archive in the module search path, it should relink itself along with the static archive (using `libtool' to preload the module), and then `exec' the new executable. Assuming all of this is successful, the attempt to `lt_dlopen' can be tried again - if the `suitable' static archive was chosen correctly it should now be possible to access the preloaded code. * Menu: * libltdl Loader Mechanism:: * libltdl Loader Management:: * libltdl Loader Errors::  File: autobook.info, Node: libltdl Loader Mechanism, Next: libltdl Loader Management, Up: User Module Loaders 18.5.1 Loader Mechanism ----------------------- Since Libtool 1.4, libltdl has provided a generalized method for loading modules, which can be extended by the user. libltdl has a default built in list of module loading mechanisms, some of which are peculiar to a given platform, others of which are more general. When the `libltdl' subdirectory of a project is configured, the list is narrowed to include only those _mechanisms_, or simply "loaders", which can work on the host architecture. When `lt_dlopen' is called, the loaders in this list are tried, in order, until the named module has loaded, or all of the loaders in the list have been exhausted. The entries in the final list of loaders each have a unique name, although there may be several candidate loaders for a single name before the list is narrowed. For example, the `dlopen' loader is implemented differently on BeOS and Solaris - for a single host, there can be only one implementation of any named loader. The name of a module loader is something entirely different to the name of a loaded module, something that should become clearer as you read on. In addition to the loaders supplied with libltdl, your project can add more loaders of its own. New loaders can be added to the end of the existing list, or immediately before any other particular loader, thus giving you complete control of the relative priorities of all of the active loaders in your project. In your module loading API, you might even support the dynamic loading of user supplied loaders: that is your users would be able to create dynamic modules which added more loading mechanisms to the existing list of loaders! Version 1.4 of Libtool has a default list that potentially contains an implementation of the following loaders (assuming all are supported by the host platform): `dlpreopen' If the named module was preloaded, use the preloaded symbol table for subsequent `lt_dlsym' calls. 
`dlopen' If the host machine has a native dynamic loader API use that to try and load the module. `dld' If the host machine has GNU dld(1), use that to try and load the module. Note that loader names with a `dl' prefix are reserved for future use by Libtool, so you should choose something else for your own module names to prevent a name clash with future Libtool releases. ---------- Footnotes ---------- (1) `http://www.gnu.org/software/dld'  File: autobook.info, Node: libltdl Loader Management, Next: libltdl Loader Errors, Prev: libltdl Loader Mechanism, Up: User Module Loaders 18.5.2 Loader Management ------------------------ The API supplies all of the functions you need to implement your own module loading mechanisms to solve problems just like this: -- Function: lt_dlloader_t * lt_dlloader_find (const char *LOADER_NAME) Each of the module loaders implemented by libltdl is stored according to a unique name, which can be used to lookup the associated handle. These handles operate in much the same way as `lt_dlhandle's: They are used for passing references to modules in and out of the API, except that they represent a kind of _module loading method_, as opposed to a loaded module instance. This function finds the `lt_dlloader_t' handle associated with the unique name passed as the only argument, or else returns `NULL' if there is no such module loader registered. -- Function: int lt_dlloader_add (lt_dlloader_t *PLACE, lt_user_dlloader *DLLOADER, const char *LOADER_NAME) This function is used to register your own module loading mechanisms with libltdl. If PLACE is given it must be a handle for an already registered module loader, which the new loader DLLOADER will be placed in front of for the purposes of which order to try loaders in. If PLACE is `NULL', on the other hand, the new DLLOADER will be added to the end of the list of loaders to try when loading a module instance. In either case LOADER_NAME must be a unique name for use with `lt_dlloader_find'. The DLLOADER argument must be a C structure of the following format, populated with suitable function pointers which determine the functionality of your module loader: struct lt_user_dlloader { const char *sym_prefix; lt_module_open_t *module_open; lt_module_close_t *module_close; lt_find_sym_t *find_sym; lt_dlloader_exit_t *dlloader_exit; lt_dlloader_data_t dlloader_data; }; -- Function: int lt_dlloader_remove (const char *LOADER_NAME) When there are no more loaded modules that were opened by the given module loader, the loader itself can be removed using this function. When you come to set the fields in the `lt_user_dlloader' structure, they must each be of the correct type, as described below: -- Type: const char * sym_prefix If a particular module loader relies on a prefix to each symbol being looked up (for example, the Windows module loader necessarily adds a `_' prefix to each symbol name passed to `lt_dlsym'), it should be recorded in the `sym_prefix' field. -- Type: lt_module_t lt_module_open_t (lt_dlloader_data_t LOADER_DATA, const char *MODULE_NAME) When `lt_dlopen' has reached your registered module loader when attempting to load a dynamic module, this is the type of the `module_open' function that will be called. The name of the module that libltdl is attempting to load, along with the module loader instance data associated with the loader being used currently, are passed as arguments to such a function call. 
The `lt_module_t' returned by functions of this type can be anything at all that can be recognised as unique to a successfully loaded module instance when passed back into the `module_close' or `find_sym' functions in the `lt_user_dlloader' module loader structure. -- Type: int lt_module_close_t (lt_dlloader_data_t LOADER_DATA, lt_module_t MODULE) In a similar vein, a function of this type will be called by `lt_dlclose', where MODULE is the returned value from the `module_open' function which loaded this dynamic module instance. -- Type: lt_ptr_t lt_find_sym_t (lt_dlloader_data_t LOADER_DATA, lt_module_t MODULE, const char *SYMBOL_NAME) In a similar vein once more, a function of this type will be called by `lt_dlsym', and must return the address of SYMBOL_NAME in MODULE. -- Type: int lt_dlloader_exit_t (lt_dlloader_data_t LOADER_DATA) When a user module loader is `lt_dlloader_remove'd, a function of this type will be called. That function is responsible for releasing any resources that were allocated during the initialisation of the loader, so that they are not `leaked' when the `lt_user_dlloader' structure is recycled. Note that there is no initialisation function type: the initialisation of a user module loader should be performed before the loader is registered with `lt_dlloader_add'. -- Type: lt_dlloader_data_t dlloader_data The DLLOADER_DATA is a spare field which can be used to store or pass any data specific to a particular module loader. That data will always be passed as the value of the first argument to each of the implementation functions above.  File: autobook.info, Node: libltdl Loader Errors, Prev: libltdl Loader Management, Up: User Module Loaders 18.5.3 Loader Errors -------------------- When writing the code to fill out each of the functions needed to populate the `lt_user_dlloader' structure, you will often need to raise an error of some sort. The set of standard errors which might be raised by the internal module loaders are available for use in your own loaders, and should be used where possible for the sake of uniformity if nothing else. On the odd occasion where that is not possible, libltdl has API calls to register and set your own error messages, so that users of your module loader will be able to call `lt_dlerror' and have the error message you set returned: -- Function: int lt_dlseterror (int ERRORCODE) By calling this function with one of the error codes enumerated in the header file, `ltdl.h', `lt_dlerror' will return the associated diagnostic until the error code is changed again. -- Function: int lt_dladderror (const char *DIAGNOSTIC) Often you will find that the existing error diagnostics do not describe the failure you have encountered. By using this function you can register a more suitable diagnostic with libltdl, and subsequently use the returned integer as an argument to `lt_dlseterror'. libltdl provides several other functions which you may find useful when writing a custom module loader. These are covered in the Libtool manual, along with more detailed descriptions of the functions described in the preceding paragraphs. In the next chapter, we will discuss the more complex features of Automake, before moving on to show you how to use those features and add libltdl module loading to the Sic project from *Note A Large GNU Autotools Project:: in the chapter after that.  
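Before leaving the subject of user module loaders, here is a rough sketch of how the pieces from the last two subsections fit together: a deliberately useless loader that fails every request, so that `lt_dlopen' simply falls through to the next loader in the list.  The `null_*' names are invented for this illustration, and the types and signatures follow the descriptions above - check the declarations in your own `ltdl.h', since they have changed between Libtool releases:

     #include <ltdl.h>

     /* Each function just reports failure; a real loader would do its
        work here.  Returning 0 from the open function tells libltdl
        that this loader could not load the named module.  */
     static lt_module_t
     null_open (lt_dlloader_data_t loader_data, const char *module_name)
     {
       return (lt_module_t) 0;
     }

     static int
     null_close (lt_dlloader_data_t loader_data, lt_module_t module)
     {
       return 0;
     }

     static lt_ptr_t
     null_sym (lt_dlloader_data_t loader_data, lt_module_t module,
               const char *symbol_name)
     {
       return (lt_ptr_t) 0;
     }

     /* Fields in the order given by struct lt_user_dlloader:
        sym_prefix, module_open, module_close, find_sym,
        dlloader_exit, dlloader_data.  */
     static struct lt_user_dlloader null_loader =
       { 0, null_open, null_close, null_sym, 0, 0 };

     /* Call this after lt_dlinit; a PLACE of 0 appends the new loader
        to the end of the list of loaders to try.  */
     static int
     register_null_loader (void)
     {
       return lt_dlloader_add (0, &null_loader, "null-loader");
     }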
File: autobook.info, Node: Advanced GNU Automake Usage, Next: A Complex GNU Autotools Project, Prev: Using GNU libltdl, Up: Top 19 Advanced GNU Automake Usage ****************************** This chapter covers a few seemingly unrelated Automake features which are commonly considered `advanced': conditionals, user-added language support, and automatic dependency tracking. * Menu: * Automake Conditionals:: * Language support:: * Automatic dependency tracking::  File: autobook.info, Node: Automake Conditionals, Next: Language support, Up: Advanced GNU Automake Usage 19.1 Conditionals ================= Automake conditionals are a way to omit or include different parts of the `Makefile' depending on what `configure' discovers. A conditional is introduced in `configure.in' using the `AM_CONDITIONAL' macro. This macro takes two arguments: the first is the name of the condition, and the second is a shell expression which returns true when the condition is true. For instance, here is how to make a condition named `TRUE' which is always true: AM_CONDITIONAL(TRUE, true) As another example, here is how to make a condition named `DEBUG' which is true when the user has given the `--enable-debug' option to `configure': AM_CONDITIONAL(DEBUG, test "$enable_debug" = yes) Once you've defined a condition in `configure.in', you can refer to it in your `Makefile.am' using the `if' statement. Here is a part of a sample `Makefile.am' that uses the conditions defined above: if TRUE ## This is always used. bin_PROGRAMS = foo endif if DEBUG AM_CFLAGS = -g -DDEBUG endif It's important to remember that Automake conditionals are _configure-time_ conditionals. They don't rely on any special feature of `make', and there is no way for the user to affect the conditionals from the `make' command line. Automake conditionals work by rewriting the `Makefile' - `make' is unaware that these conditionals even exist. Traditionally, Automake conditionals have been considered an advanced feature. However, practice has shown that they are often easier to use and understand than other approaches to solving the same problem. I now recommend the use of conditionals to everyone. For instance, consider this example: bin_PROGRAMS = echo if FULL_ECHO echo_SOURCES = echo.c extras.c getopt.c else echo_SOURCES = echo.c endif In this case, the equivalent code without conditionals is more confusing and correspondingly more difficult for the new Automake user to figure out: bin_PROGRAMS = echo echo_SOURCES = echo.c echo_LDADD = @echo_extras@ EXTRA_echo_SOURCES = extras.c getopt.c Automake conditionals have some limitations. One known problem is that conditionals don't interact properly with `+=' assignment. For instance, consider this code: bin_PROGRAMS = z z_SOURCES = z.c if SOME_CONDITION z_SOURCES += cond.c endif This code appears to have an unambiguous meaning, but Automake 1.4 doesn't implement this and will give an error. This bug will be fixed in the next major Automake release.  File: autobook.info, Node: Language support, Next: Automatic dependency tracking, Prev: Automake Conditionals, Up: Advanced GNU Automake Usage 19.2 Language support ===================== Automake comes with built-in knowledge of the most common compiled languages: C, C++, Objective C, Yacc, Lex, assembly, and Fortran. However, programs are sometimes written in an unusual language, or in a custom language that is translated into something more common. Automake lets you handle these cases in a natural way. 
Automake's notion of a `language' is tied to the suffix appended to each source file written in that language. You must inform Automake of each new suffix you introduce. This is done by listing them in the `SUFFIXES' macro. For instance, suppose you are writing part of your program in the language `M', which is compiled to object code by a program named `mc'. The typical suffix for an `M' source file is `.m'. In your `Makefile.am' you would write:

     SUFFIXES = .m

This differs from ordinary `make' usage, where you would use the special `.SUFFIXES' target to list suffixes. Now you need to tell Automake (and `make') how to compile a `.m' file to a `.o' file. You do this by writing an ordinary `make' suffix rule:

     MC = mc
     .m.o:
             $(MC) $(MCFLAGS) $(AM_MCFLAGS) -c $<

Note that we introduced the `MC', `MCFLAGS', and `AM_MCFLAGS' variables. While not required, this is good style in case you want to override any of these later (for instance from the command line). Automake understands enough about suffix rules to recognize that `.m' files can be treated just like any file it already understands, so now you can write:

     bin_PROGRAMS = myprogram
     myprogram_SOURCES = foo.c something.m

Note that Automake does not really understand chained suffix rules; however, frequently the right thing will happen anyway. For instance, if you have a `.m.c' rule, Automake will naively assume that `.m' files should be turned into `.o' files - and then it will proceed to rely on `make' to do the real work. If, however, the translation takes three steps--from `.m' to `.x', then from `.x' to `.c', and finally to `.o'--then Automake's simplistic approach will break. Fortunately, these cases are very rare.

File: autobook.info, Node: Automatic dependency tracking, Prev: Language support, Up: Advanced GNU Automake Usage

19.3 Automatic dependency tracking
==================================

Keeping track of dependencies for a large program is tedious and error-prone. Many edits require the programmer to update dependencies, but for some changes, such as adding a `#include' to an existing header, the change is large enough that the programmer simply refuses to make it (or makes it incorrectly). To fix this problem, Automake supports automatic dependency tracking. The implementation of automatic dependency tracking in Automake 1.4 requires `gcc' and GNU `make'. These programs are only required for maintainers; the `Makefile's generated by `make dist' are completely portable. If you can't use `gcc' or GNU `make' for your project, then you are simply out of luck; you have to disable dependency tracking. Automake 1.5 will include a completely new dependency tracking implementation. This new implementation will work with any compiler and any version of `make'. Another limitation of the current scheme is that the dependencies included into the portable `Makefile's by `make dist' are derived from the current build environment. First, this means that you must use `make all' before you can meaningfully run `make dist' (otherwise the dependencies won't have been created). Second, this means that any files not built in your current tree will not have dependencies in the distributed `Makefile's. The new implementation will avoid both of these shortcomings as well. Automatic dependency tracking is on by default; you don't have to do anything special to get it. To turn it off, either run `automake -i' instead of plain `automake', or put `no-dependencies' into the `AUTOMAKE_OPTIONS' macro in each `Makefile.am'.
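Returning briefly to the conditionals at the start of this chapter: the `DEBUG' condition tested the shell variable `$enable_debug', but nothing was said about how that variable comes to be set. Here is a hedged sketch of the usual pairing of `AC_ARG_ENABLE' with `AM_CONDITIONAL' in `configure.in' (the help text is illustrative only, not taken from a real package):

     AC_ARG_ENABLE(debug,
     [  --enable-debug          build with debugging information],
     [enable_debug=$enableval], [enable_debug=no])
     AM_CONDITIONAL(DEBUG, test "$enable_debug" = yes)

With this in place, running `./configure --enable-debug' makes the `DEBUG' conditional true, and the `AM_CFLAGS' assignment shown earlier is copied into the generated `Makefile'.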
File: autobook.info, Node: A Complex GNU Autotools Project, Next: M4, Prev: Advanced GNU Automake Usage, Up: Top 20 A Complex GNU Autotools Project ********************************** This chapter polishes the worked example I introduced in *Note A Small GNU Autotools Project::, and developed in *Note A Large GNU Autotools Project::. As always, the ideas presented here are my own views and not necessarily the only way to do things. Everything I present here has, however, served me well for quite some time, and you should find plenty of interesting ideas for your own projects. Herein, I will add a libltdl module loading system to Sic, as well as some sample modules to illustrate how extensible such a project can be. I will also explain how to integrate the `dmalloc' library into the development of a project, and show why this is important. If you noticed that, as it stands, Sic is only useful as an interactive shell unable to read commands from a file, then go to the top of the class! In order for it to be of genuine use, I will extend it to interpret commands from a file too. * Menu: * A Module Loading Subsystem:: * A Loadable Module:: * Interpreting Commands from a File:: * Integrating Dmalloc::  File: autobook.info, Node: A Module Loading Subsystem, Next: A Loadable Module, Up: A Complex GNU Autotools Project 20.1 A Module Loading Subsystem =============================== As you saw in *Note Using GNU libltdl::, I need to put an invocation of the macro `AC_LIBTOOL_DLOPEN' just before `AC_PROG_LIBTOOL', in the file `configure.in'. But, as well as being able to use `libtoolize --ltdl', which adds libltdl in a subdirectory with its own subconfigure, you can also manually copy just the ltdl source files into your project(1), and use `AC_LIB_LTDL' in your existing `configure.in'. At the time of writing, this is still a very new and (as yet) undocumented feature, with a few kinks that need to be ironed out. In any case you probably shouldn't use this method to add `ltdl.lo' to a C++ library, since `ltdl.c' is written in C. If you do want to use libltdl with a C++ library, things will work much better if you build it in a subdirectory generated with `libtoolize --ltdl'. For this project, lets: $ cp /usr/share/libtool/libltdl/ltdl.[ch] sic/ The Sic module loader is probably as complicated as any you will ever need to write, since it must support two kinds of modules: modules which contain additional built-in commands for the interpreter; and modules which extend the Sic syntax table. A single module can also provide both syntax extensions _and_ additional built-in commands. * Menu: * Initialising the Module Loader:: * Managing Module Loader Errors:: * Loading a Module:: * Unloading a Module:: ---------- Footnotes ---------- (1) If you have an early 1.3c snapshot of Libtool, you will also need to copy the `ltdl.m4' file into your distribution.  File: autobook.info, Node: Initialising the Module Loader, Next: Managing Module Loader Errors, Up: A Module Loading Subsystem 20.1.1 Initialising the Module Loader ------------------------------------- Before using this code (or any other libltdl based module loader for that matter), a certain amount of initialisation is required: * libltdl itself requires initialisation. 1. libltdl should be told to use the same memory allocation routines used by the rest of Sic. 2. Any preloaded modules (*note dlpreopen Loading::) need to be initialised with `LTDL_SET_PRELOADED_SYMBOLS()'. 3. `ltdl_init()' must be called. * The module search path needs to be set. 
Here I allow the installer to specify a default search path to correspond with the installed Sic modules at compile time, but search the directories in the runtime environment variable `SIC_MODULES_PATH' first. * The internal error handling needs to be initialised. Here is the start of the module loader, `sic/module.c', including the initialisation code for libltdl: #if HAVE_CONFIG_H # include #endif #include "common.h" #include "builtin.h" #include "eval.h" #include "ltdl.h" #include "module.h" #include "sic.h" #ifndef SIC_MODULE_PATH_ENV # define SIC_MODULE_PATH_ENV "SIC_MODULE_PATH" #endif int module_init (void) { static int initialised = 0; int errors = 0; /* Only perform the initialisation once. */ if (!initialised) { /* ltdl should use the same mallocation as us. */ lt_dlmalloc = (lt_ptr_t (*) (size_t)) xmalloc; lt_dlfree = (void (*) (lt_ptr_t)) free; /* Make sure preloaded modules are initialised. */ LTDL_SET_PRELOADED_SYMBOLS(); last_error = NULL; /* Call ltdl initialisation function. */ errors = lt_dlinit(); /* Set up the module search directories. */ if (errors == 0) { const char *path = getenv (SIC_MODULE_PATH_ENV); if (path != NULL) errors = lt_dladdsearchdir(path); } if (errors == 0) errors = lt_dladdsearchdir(MODULE_PATH); if (errors != 0) last_error = lt_dlerror (); ++initialised; return errors ? SIC_ERROR : SIC_OKAY; } last_error = multi_init_error; return SIC_ERROR; }  File: autobook.info, Node: Managing Module Loader Errors, Next: Loading a Module, Prev: Initialising the Module Loader, Up: A Module Loading Subsystem 20.1.2 Managing Module Loader Errors ------------------------------------ The error handling is a very simplistic wrapper for the libltdl error functions, with the addition of a few extra errors specific to this module loader code(1). Here are the error messages from `module.c': static char multi_init_error[] = "module loader initialised more than once"; static char no_builtin_table_error[] = "module has no builtin or syntax table"; static char builtin_unload_error[] = "builtin table failed to unload"; static char syntax_unload_error[] = "syntax table failed to unload"; static char module_not_found_error[] = "no such module"; static char module_not_unloaded_error[] = "module not unloaded"; static const char *last_error = NULL; const char * module_error (void) { return last_error; } ---------- Footnotes ---------- (1) This is very different to the way errors are managed when writing a custom loader for libltdl. Compare this section with *Note Loader Errors: libltdl Loader Errors.  File: autobook.info, Node: Loading a Module, Next: Unloading a Module, Prev: Managing Module Loader Errors, Up: A Module Loading Subsystem 20.1.3 Loading a Module ----------------------- Individual modules are managed by finding specified "entry points" (prescribed exported symbols) in the module: -- Variable: const Builtin * builtin_table An array of names of built-in commands implemented by a module, with associated handler functions. -- Function: void module_init (Sic *SIC) If present, this function will be called when the module is loaded. -- Function: void module_finish (Sic *SIC) If supplied, this function will be called just before the module is unloaded. -- Variable: const Syntax * syntax_table An array of syntactically significant symbols, and associated handler functions. -- Function: int syntax_init (Sic *SIC) If specified, this function will be called by Sic before the syntax of each input line is analysed. 
-- Function: int syntax_finish (Sic *SIC, BufferIn *IN, BufferOut *OUT) Similarly, this function will be call after the syntax analysis of each line has completed. All of the hard work in locating and loading the module, and extracting addresses for the symbols described above is performed by libltdl. The `module_load' function below simply registers these symbols with the Sic interpreter so that they are called at the appropriate times - or diagnoses any errors if things don't go according to plan: int module_load (Sic *sic, const char *name) { lt_dlhandle module; Builtin *builtin_table; Syntax *syntax_table; int status = SIC_OKAY; last_error = NULL; module = lt_dlopenext (name); if (module) { builtin_table = (Builtin*) lt_dlsym (module, "builtin_table"); syntax_table = (Syntax *) lt_dlsym (module, "syntax_table"); if (!builtin_table && !syntax_table) { lt_dlclose (module); last_error = no_builtin_table_error; module = NULL; } } if (module) { ModuleInit *init_func = (ModuleInit *) lt_dlsym (module, "module_init"); if (init_func) (*init_func) (sic); } if (module) { SyntaxFinish *syntax_finish = (SyntaxFinish *) lt_dlsym (module, "syntax_finish"); SyntaxInit *syntax_init = (SyntaxInit *) lt_dlsym (module, "syntax_init"); if (syntax_finish) sic->syntax_finish = list_cons (list_new (syntax_finish), sic->syntax_finish); if (syntax_init) sic->syntax_init = list_cons (list_new (syntax_init), sic->syntax_init); } if (module) { if (builtin_table) status = builtin_install (sic, builtin_table); if (syntax_table && status == SIC_OKAY) status = syntax_install (sic, module, syntax_table); return status; } last_error = lt_dlerror(); if (!last_error) last_error = module_not_found_error; return SIC_ERROR; } Notice that the generalised `List' data type introduced earlier (*note A Small GNU Autotools Project::) is reused to keep a list of accumulated module initialisation and finalisation functions.  File: autobook.info, Node: Unloading a Module, Prev: Loading a Module, Up: A Module Loading Subsystem 20.1.4 Unloading a Module ------------------------- When unloading a module, several things must be done: * Any built-in commands implemented by this module must be unregistered so that Sic doesn't try to call them after the implementation has been removed. * Any syntax extensions implemented by this module must be similarly unregistered, including `syntax_init' and `syntax_finish' functions. * If there is a finalisation entry point in the module, `module_finish' (*note Loading a Module::), it must be called. My first cut implementation of a module subsystem kept a list of the entry points associated with each module so that they could be looked up and removed when the module was subsequently unloaded. It also kept track of multiply loaded modules so that a module wasn't unloaded prematurely. libltdl already does all of this though, and it is wasteful to duplicate all of that work. This system uses `lt_dlforeach' and `lt_dlgetinfo' to access libltdls records of loaded modules, and save on duplication. These two functions are described fully in*Note Libltdl interface: (Libtool)Libltdl interface. static int unload_ltmodule (lt_dlhandle module, lt_ptr_t data); struct unload_data { Sic *sic; const char *name; }; int module_unload (Sic *sic, const char *name) { struct unload_data data; last_error = NULL; data.sic = sic; data.name = name; /* Stopping might be an error, or we may have unloaded the module. 
*/ if (lt_dlforeach (unload_ltmodule, (lt_ptr_t) &data) != 0) if (!last_error) return SIC_OKAY; if (!last_error) last_error = module_not_found_error; return SIC_ERROR; } This function asks libltdl to call the function `unload_ltmodule' for each of the modules it has loaded, along with some details of the module it wants to unload. The tricky part of the callback function below is recalculating the entry point addresses for the module to be unloaded and then removing all matching addresses from the appropriate internal structures. Otherwise, the balance of this callback is involved in informing the calling `lt_dlforeach' loop of whether a matching module has been found and handled: static int userdata_address_compare (List *elt, void *match); /* This callback returns 0 if the module was not yet found. If there is an error, LAST_ERROR will be set, otherwise the module was successfully unloaded. */ static int unload_ltmodule (lt_dlhandle module, void *data) { struct unload_data *unload = (struct unload_data *) data; const lt_dlinfo *module_info = lt_dlgetinfo (module); if ((unload == NULL) || (unload->name == NULL) || (module_info == NULL) || (module_info->name == NULL) || (strcmp (module_info->name, unload->name) != 0)) { /* No match, return 0 to keep searching */ return 0; } if (module) { /* Fetch the addresses of the entrypoints into the module. */ Builtin *builtin_table = (Builtin*) lt_dlsym (module, "builtin_table"); Syntax *syntax_table = (Syntax *) lt_dlsym (module, "syntax_table"); void *syntax_init_address = (void *) lt_dlsym (module, "syntax_init"); void **syntax_finish_address = (void *) lt_dlsym (module, "syntax_finish"); List *stale; /* Remove all references to these entry points in the internal data structures, before actually unloading the module. */ stale = list_remove (&unload->sic->syntax_init, syntax_init_address, userdata_address_compare); XFREE (stale); stale = list_remove (&unload->sic->syntax_finish, syntax_finish_address, userdata_address_compare); XFREE (stale); if (builtin_table && builtin_remove (unload->sic, builtin_table) != SIC_OKAY) { last_error = builtin_unload_error; module = NULL; } if (syntax_table && SIC_OKAY != syntax_remove (unload->sic, module, syntax_table)) { last_error = syntax_unload_error; module = NULL; } } if (module) { ModuleFinish *finish_func = (ModuleFinish *) lt_dlsym (module, "module_finish"); if (finish_func) (*finish_func) (unload->sic); } if (module) { if (lt_dlclose (module) != 0) module = NULL; } /* No errors? Stop the search! */ if (module) return 1; /* Find a suitable diagnostic. */ if (!last_error) last_error = lt_dlerror(); if (!last_error) last_error = module_not_unloaded_error; /* Error diagnosed. Stop the search! */ return -1; } static int userdata_address_compare (List *elt, void *match) { return (int) (elt->userdata - match); } The `userdata_address_compare' helper function at the end is used to compare the address of recalculated entry points against the already registered functions and handlers to find which items need to be unregistered. 
There is also a matching header file to export the module interface, so that the code for loadable modules can make use of it: #ifndef SIC_MODULE_H #define SIC_MODULE_H 1 #include #include #include BEGIN_C_DECLS typedef void ModuleInit (Sic *sic); typedef void ModuleFinish (Sic *sic); extern const char *module_error (void); extern int module_init (void); extern int module_load (Sic *sic, const char *name); extern int module_unload (Sic *sic, const char *name); END_C_DECLS #endif /* !SIC_MODULE_H */ This header also includes some of the other Sic headers, so that in most cases, the source code for a module need only `#include '. To make the module loading interface useful, I have added built-ins for `load' and `unload'. Naturally, these must be compiled into the bare `sic' executable, so that it is able to load additional modules: #if HAVE_CONFIG_H # include #endif #include "module.h" #include "sic_repl.h" /* List of built in functions. */ #define builtin_functions \ BUILTIN(exit, 0, 1) \ BUILTIN(load, 1, 1) \ BUILTIN(unload, 1, -1) BUILTIN_DECLARATION (load) { int status = SIC_ERROR; if (module_load (sic, argv[1]) < 0) { sic_result_clear (sic); sic_result_append (sic, "module \"", argv[1], "\" not loaded: ", module_error (), NULL); } else status = SIC_OKAY; return status; } BUILTIN_DECLARATION (unload) { int status = SIC_ERROR; int i; for (i = 1; argv[i]; ++i) if (module_unload (sic, argv[i]) != SIC_OKAY) { sic_result_clear (sic); sic_result_append (sic, "module \"", argv[1], "\" not unloaded: ", module_error (), NULL); } else status = SIC_OKAY; return status; } These new built-in commands are simply wrappers around the module loading code in `module.c'. As with `dlopen', you can use libltdl to `lt_dlopen' the main executable, and then lookup _its_ symbols. I have simplified the initialisation of Sic by replacing the `sic_init' function in `src/sic.c' by `loading' the executable itself as a module. This works because I was careful to use the same format in `sic_builtin.c' and `sic_syntax.c' as would be required for a genuine loadable module, like so: /* initialise the module subsystem */ if (module_init () != SIC_OKAY) sic_fatal ("module initialisation failed"); if (module_load (sic, NULL) != SIC_OKAY) sic_fatal ("sic initialisation failed");  File: autobook.info, Node: A Loadable Module, Next: Interpreting Commands from a File, Prev: A Module Loading Subsystem, Up: A Complex GNU Autotools Project 20.2 A Loadable Module ====================== A feature of the Sic interpreter is that it will use the `unknown' built-in to handle any command line which is not handled by any of the other registered built-in callback functions. This mechanism is very powerful, and allows me to lookup unhandled built-ins in the user's `PATH', for instance. Before adding any modules to the project, I have created a separate subdirectory, `modules', to put the module source code into. Not forgetting to list this new subdirectory in the `AC_OUTPUT' macro in `configure.in', and the `SUBDIRS' macro in the top level `Makefile.am', a new `Makefile.am' is needed to build the loadable modules: ## Makefile.am -- Process this file with automake to produce Makefile.in INCLUDES = -I$(top_builddir) -I$(top_srcdir) \ -I$(top_builddir)/sic -I$(top_srcdir)/sic \ -I$(top_builddir)/src -I$(top_srcdir)/src pkglib_LTLIBRARIES = unknown.la `pkglibdir' is a Sic specific directory where modules will be installed, *Note Installing and Uninstalling Configured Packages: Installing and Uninstalling. 
For a library to be maximally portable, it should be written so that it does not require back-linking(1) to resolve its own symbols. That is, if at all possible you should design all of your libraries (not just dynamic modules) so that all of their symbols can be resolved at linktime. Sometimes, it is impossible or undesirable to architect your libraries and modules in this way. In that case you sacrifice the portability of your project to platforms such as AIX and Windows. The key to building modules with libtool is in the options that are specified when the module is linked. This is doubly true when the module must work with libltdl's dlpreopening mechanism. unknown_la_SOURCES = unknown.c unknown_la_LDFLAGS = -no-undefined -module -avoid-version unknown_la_LIBADD = $(top_builddir)/sic/libsic.la Sic modules are built without a `lib' prefix (`-module'), and without version suffixes (`-avoid-version'). All of the undefined symbols are resolved at linktime by `libsic.la', hence `-no-undefined'. Having added `ltdl.c' to the `sic' subdirectory, and called the `AC_LIB_LTDL' macro in `configure.in', `libsic.la' cannot build correctly on those architectures which do not support back-linking. This is because `ltdl.c' simply abstracts the native `dlopen' API with a common interface, and that local interface often requires that a special library be linked - `-ldl' on linux, for example. `AC_LIB_LTDL' probes the system to determine the name of any such dlopen library, and allows you to depend on it in a portable way by using the configure substitution macro, `@LIBADD_DL@'. If I were linking a `libtool' compiled libltdl at this juncture, the system library details would have already been taken care of. In this project, I have bypassed that mechanism by compiling and linking `ltdl.c' myself, so I have altered `sic/Makefile.am' to use `@LIBADD_DL@': lib_LTLIBRARIES = libcommon.la libsic.la libsic_la_LIBADD = $(top_builddir)/replace/libreplace.la \ libcommon.la @LIBADD_DL@ libsic_la_SOURCES = builtin.c error.c eval.c list.c ltdl.c \ module.c sic.c syntax.c Having put all this infrastructure in place, the code for the `unknown' module is a breeze (helper functions omitted for brevity): #if HAVE_CONFIG_H # include #endif #include #include #include #define builtin_table unknown_LTX_builtin_table static char *path_find (const char *command); static int path_execute (Sic *sic, const char *path, char *const argv[]); /* Generate prototype. */ SIC_BUILTIN (builtin_unknown); Builtin builtin_table[] = { { "unknown", builtin_unknown, 0, -1 }, { 0, 0, -1, -1 } }; BUILTIN_DECLARATION(unknown) { char *path = path_find (argv[0]); int status = SIC_ERROR; if (!path) sic_result_append (sic, "command \"", argv[0], "\" not found", NULL); else if (path_execute (sic, path, argv) != SIC_OKAY) sic_result_append (sic, "command \"", argv[0],"\" failed: ", strerror (errno), NULL); else status = SIC_OKAY; return status; } In the first instance, notice that I have used the preprocessor to redefine the entry point functions to be compatible with libltdls `dlpreopen', hence the `unknown_LTX_builtin_table' `cpp' macro. The `unknown' handler function itself looks for a suitable executable in the user's path, and if something suitable _is_ found, executes it. 
Notice that Libtool doesn't relink dependent libraries (`libsic' depends on `libcommon', for example) on my GNU/Linux system, since they are not required for the static library in any case, and because the dependencies are also encoded directly into the shared archive, `libsic.so', by the original link. On the other hand, Libtool _will_ relink the dependent libraries if that is necessary for the target host. $ make /bin/sh ../libtool --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I.. \ -I.. -I.. -I../sic -I../sic -I../src -I../src -g -O2 -c unknown.c mkdir .libs gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I.. -I../sic -I../sic -I../src \ -I../src -g -O2 -Wp,-MD,.deps/unknown.pp -c unknown.c -fPIC -DPIC \ -o .libs/unknown.lo gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I.. -I../sic -I../sic -I../src \ I../src -g -O2 -Wp,-MD,.deps/unknown.pp -c unknown.c -o unknown.o \ >/dev/null 2>&1 mv -f .libs/unknown.lo unknown.lo /bin/sh ../libtool --mode=link gcc -g -O2 -o unknown.la -rpath \ /usr/local/lib/sic -no-undefined -module -avoid-version unknown.lo \ ../sic/libsic.la rm -fr .libs/unknown.la .libs/unknown.* .libs/unknown.* gcc -shared unknown.lo -L/tmp/sic/sic/.libs ../sic/.libs/libsic.so \ -lc -Wl,-soname -Wl,unknown.so -o .libs/unknown.so ar cru .libs/unknown.a unknown.o creating unknown.la (cd .libs && rm -f unknown.la && ln -s ../unknown.la unknown.la) $ ./libtool --mode=execute ldd ./unknown.la libsic.so.0 => /tmp/sic/.libs/libsic.so.0 (0x40002000) libc.so.6 => /lib/libc.so.6 (0x4000f000) libcommon.so.0 => /tmp/sic/.libs/libcommon.so.0 (0x400ec000) libdl.so.2 => /lib/libdl.so.2 (0x400ef000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000) After compiling the rest of the tree, I can now use the `unknown' module: $ SIC_MODULE_PATH=`cd ../modules; pwd` ./sic ] echo hello! command "echo" not found. ] load unknown ] echo hello! hello! ] unload unknown ] echo hello! command "echo" not found. ] exit $ ---------- Footnotes ---------- (1) *Note Introducing libltdl::  File: autobook.info, Node: Interpreting Commands from a File, Next: Integrating Dmalloc, Prev: A Loadable Module, Up: A Complex GNU Autotools Project 20.3 Interpreting Commands from a File ====================================== For all practical purposes, any interpreter is pretty useless if it only works interactively. I have added a `source' built-in command to `sic_builtin.c' which takes lines of input from a file and evaluates them using `sic_repl.c' in much the same way as lines typed at the prompt are evaluated otherwise. Here is the built-in handler: /* List of built in functions. */ #define builtin_functions \ BUILTIN(exit, 0, 1) \ BUILTIN(load, 1, 1) \ BUILTIN(source, 1, -1) \ BUILTIN(unload, 1, -1) BUILTIN_DECLARATION (source) { int status = SIC_OKAY; int i; for (i = 1; status == SIC_OKAY && argv[i]; ++i) status = source (sic, argv[i]); return status; } And the `source' function from `sic_repl.c': int source (Sic *sic, const char *path) { FILE *stream; int result = SIC_OKAY; int save_interactive = is_interactive; SIC_ASSERT (sic && path); is_interactive = 0; if ((stream = fopen (path, "rt")) == NULL) { sic_result_clear (sic); sic_result_append (sic, "cannot source \"", path, "\": ", strerror (errno), NULL); result = SIC_ERROR; } else result = evalstream (sic, stream); is_interactive = save_interactive; return result; } The reason for separating the `source' function in this way, is that it makes it easy for the startup sequence in `main' to evaluate a startup file. 
In traditional Unix fashion, the startup file is named `.sicrc', and is evaluated if it is present in the user's home directory:

     static int
     evalsicrc (Sic *sic)
     {
       int result = SIC_OKAY;
       char *home = getenv ("HOME");
       char *sicrcpath, *separator = "";
       int len;

       if (!home)
         home = "";

       len = strlen (home);
       if (len && home[len -1] != '/')
         separator = "/";

       len += strlen (separator) + strlen (SICRCFILE) + 1;

       sicrcpath = XMALLOC (char, len);
       sprintf (sicrcpath, "%s%s%s", home, separator, SICRCFILE);

       if (access (sicrcpath, R_OK) == 0)
         result = source (sic, sicrcpath);

       return result;
     }

File: autobook.info, Node: Integrating Dmalloc, Prev: Interpreting Commands from a File, Up: A Complex GNU Autotools Project

20.4 Integrating Dmalloc
========================

A huge number of bugs in C and C++ code are caused by mismanagement of memory. Using the wrapper functions described earlier (*note Memory Management::), or their equivalent, can help immensely in reducing the occurrence of such bugs. Ultimately, you will introduce a difficult-to-diagnose memory bug in spite of these measures. That is where Dmalloc(1) comes in. I recommend using it routinely in all of your projects -- you will find all sorts of leaks and bugs that might otherwise have lain dormant for some time. Automake has explicit support for Dmalloc to make using it in your own projects as painless as possible. The first step is to add the macro `AM_WITH_DMALLOC' to `configure.in'. Citing this macro adds a `--with-dmalloc' option to `configure', which, when specified by the user, adds `-ldmalloc' to `LIBS' and defines `WITH_DMALLOC'. The usefulness of Dmalloc is much increased by compiling an entire project with the header, `dmalloc.h' - easily achieved in Sic by conditionally adding it to `common-h.in':

     BEGIN_C_DECLS

     #define XCALLOC(type, num) \
             ((type *) xcalloc ((num), sizeof(type)))
     #define XMALLOC(type, num) \
             ((type *) xmalloc ((num) * sizeof(type)))
     #define XREALLOC(type, p, num) \
             ((type *) xrealloc ((p), (num) * sizeof(type)))
     #define XFREE(stale) do { \
             if (stale) { free ((void *) stale); stale = 0; } \
             } while (0)

     extern void *xcalloc (size_t num, size_t size);
     extern void *xmalloc (size_t num);
     extern void *xrealloc (void *p, size_t num);
     extern char *xstrdup (const char *string);

     END_C_DECLS

     #if WITH_DMALLOC
     #  include <dmalloc.h>
     #endif

I have been careful to include the `dmalloc.h' header at the end of this file so that it overrides my own _definitions_ without renaming the function _prototypes_. Similarly, I must be careful to accommodate Dmalloc's redefinition of the mallocation routines in `sic/xmalloc.c' and `sic/xstrdup.c', by putting each file inside an `#ifndef WITH_DMALLOC'. That way, when compiling the project, if `--with-dmalloc' is specified and the `WITH_DMALLOC' preprocessor symbol is defined, then Dmalloc's debugging definitions of `xstrdup' et al. will be used in place of the versions I wrote. Enabling Dmalloc is now simply a matter of reconfiguring the whole package using the `--with-dmalloc' option, and disabling it again is a matter of reconfiguring without that option. The use of Dmalloc is beyond the scope of this book, and is in any case described very well in the documentation that comes with the package. I strongly recommend you become familiar with it - the time you invest here will pay dividends many times over in the time you save debugging. This chapter completes the description of the Sic library project, and indeed this part of the book.
All of the infrastructure for building an advanced command line shell is in place now - you need only add the builtin and syntax function definitions to create a complete shell of your own. Each of the chapters in the next part of the book explores a more specialised application of the GNU Autotools, starting with a discussion of M4, a major part of the implementation of Autoconf. ---------- Footnotes ---------- (1) Dmalloc is distributed from `http://www.dmalloc.com'.  File: autobook.info, Node: M4, Next: Writing Portable Bourne Shell, Prev: A Complex GNU Autotools Project, Up: Top 21 M4 ***** M4 is a general purpose tool for processing text and has existed on Unix systems of all kinds for many years, rarely catching the attention of users. Text generation through macro processing is not a new concept. Originally M4 was designed as the preprocessor for the Rational FORTRAN system and was influenced by the General Purpose Macro generator, GPM, first described by Stratchey in 1965! GNU M4 is the GNU project's implementation of M4 and was written by Rene' Seindal in 1990. In recent years, awareness of M4 has grown through its use by popular free software packages. The Sendmail package incorporates a configuration system that uses M4 to generate its complex `sendmail.cf' file from a simple specification of the desired configuration. Autoconf uses M4 to generate output files such as a `configure' script. It is somewhat unfortunate that users of GNU Autotools need to know so much about M4, because it has been too exposed. Many of these tools' implementation details were simply left up to M4, forcing the user to know about M4 in order to use them. It is a well-known problem and there is a movement amongst the development community to improve this shortcoming in the future. This deficiency is the primary reason that this chapter exists--it is important to have a good working knowledge of M4 in order to use the GNU Autotools and to extend it with your own macros (*note Writing New Macros for Autoconf::). The GNU M4 manual provides a thorough tutorial on M4. Please refer to it for additional information. * Menu: * What does M4 do? :: * How GNU Autotools uses M4 :: * Fundamentals of M4 processing :: * Features of M4 :: * Writing macros within the GNU Autotools framework ::  File: autobook.info, Node: What does M4 do?, Next: How GNU Autotools uses M4, Up: M4 21.1 What does M4 do? ===================== `m4' is a general purpose tool suitable for all kinds of text processing applications--not unlike the C preprocessor, `cpp', with which you are probably familiar. Its obvious application is as a front-end for a compiler--`m4' is in many ways superior to `cpp'. Briefly, `m4' reads text from the input and writes processed text to the output. Symbolic macros may be defined which have replacement text. As macro invocations are encountered in the input, they are replaced (`expanded') with the macro's definition. Macros may be defined with a set of parameters and the definition can specify where the actual parameters will appear in the expansion. These concepts will be elaborated on in *Note Fundamentals of M4 processing::. M4 includes a set of pre-defined macros that make it substantially more useful. The most important ones will be discussed in *Note Features of M4::. These macros perform functions such as arithmetic, conditional expansion, string manipulation and running external shell commands.  
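If you have never run `m4' directly, the following short session shows this cycle of definition and replacement in action (the file name is invented for illustration):

     $ cat hello.m4
     define(`TARGET', `world')
     Hello TARGET!
     $ m4 hello.m4

     Hello world!

Note the blank line at the start of the output: the `define' invocation expands to the empty string, but the newline following it is ordinary text and is copied through. The `dnl' macro, described later in this chapter, is the usual way to discard that newline.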
File: autobook.info, Node: How GNU Autotools uses M4, Next: Fundamentals of M4 processing, Prev: What does M4 do?, Up: M4

21.2 How GNU Autotools uses M4
==============================

The GNU Autotools may all appear to use M4, but in actual fact, it all boils down to `autoconf', which invokes `m4' to generate your `configure' script. You might be surprised to learn that the shell code in `configure' does not use `m4' to generate a final `Makefile' from `Makefile.in'. Instead, it uses `sed', since that is more likely to be present on an end-user's system and thereby removes the dependency on `m4'. Automake and Libtool include a lot of M4 input files. These are macros provided with each package that you can use directly (or indirectly) from your `configure.in'. These packages don't invoke `m4' themselves. If you have already installed Autoconf on your system, you may have encountered problems due to its strict M4 requirements. Autoconf _demands_ to use GNU M4, mostly because it exceeds limitations present in other M4 implementations. As noted by the Autoconf manual, this is not an onerous requirement, as it only affects package maintainers who must regenerate `configure' scripts. Autoconf's own `Makefile' will freeze some of the Autoconf `.m4' files containing macros as it builds Autoconf. When M4 freezes an input file, it produces another file which represents the internal state of the M4 processor so that the input file does not need to be parsed again. This helps to reduce the startup time for `autoconf'.

File: autobook.info, Node: Fundamentals of M4 processing, Next: Features of M4, Prev: How GNU Autotools uses M4, Up: M4

21.3 Fundamentals of M4 processing
==================================

When properly understood, M4 seems like child's play. However, it is common to learn M4 in a piecemeal fashion and to have an incomplete or inaccurate understanding of certain concepts. Ultimately, this leads to hours of furious debugging. It is important to understand the fundamentals well before progressing to the details.

* Menu:

* Token scanning ::
* Macros and macro expansion ::
* Quoting ::

File: autobook.info, Node: Token scanning, Next: Macros and macro expansion, Up: Fundamentals of M4 processing

21.3.1 Token scanning
---------------------

`m4' scans its input stream, generating (often, just copying) text to the output stream. The first step that `m4' performs in processing is to recognize _tokens_. There are three kinds of tokens:

Names
     A name is a sequence of characters that starts with a letter or an underscore and may be followed by additional letters, digits and underscores. The end of a name is recognized by the occurrence of a character which is not any of the permitted characters--for example, a period. A name is always a candidate for macro expansion (*Note Macros and macro expansion::), whereby the name will be replaced in the output by a macro definition of the same name.

Quoted strings
     A sequence of characters may be _quoted_ (*Note Quoting::) with a starting quote at the beginning of the string and a terminating quote at the end. The default M4 quote characters are ``' and `'', however Autoconf reassigns them to `[' and `]', respectively. Suffice it to say, M4 will remove the quote characters and pass the inner string to the output (*Note Quoting::).

Other tokens
     All other tokens are those single characters which are not recognized as belonging to any of the other token types. They are passed through to the output unaltered.
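A small, invented example shows these token rules at work. The name token ends at the first character that cannot belong to a name (here, the period), so the name is expanded; the quoted form is read as a quoted-string token instead, so only the quotes are stripped:

     define(`file', `configure')
     file.in
     =>configure.in
     `file'.in
     =>file.in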
Like most programming languages, M4 allows you to write comments in the input which will be ignored. Comments are delimited by the `#' character and by the end of a line. Comments in M4 differ from most languages, though, in that the text within the comment, including delimiters, is passed through to the output unaltered. Although the comment delimiting characters can be reassigned by the user, this is highly discouraged, as it may break GNU Autotools macros which rely on this fact to pass Bourne shell comment lines-which share the same comment delimiters-through to the output unaffected.  File: autobook.info, Node: Macros and macro expansion, Next: Quoting, Prev: Token scanning, Up: Fundamentals of M4 processing 21.3.2 Macros and macro expansion --------------------------------- Macros are definitions of replacement text and are identified by a name--as defined by the syntax rules given in *Note Token scanning::. M4 maintains an internal table of macros, some of which are built-ins defined when `m4' starts. When a name is found in the input that matches a name registered in M4's macro table, the macro _invocation_ in the input is replaced by the macro's definition in the output. This process is known as _expansion_--even if the new text may be shorter! Many beginners to M4 confuse themselves the moment they start to use phrases like `I am going to call this particular macro, which returns this value'. As you will see, macros differ significantly from _functions_ in other programming languages, regardless of how similar their syntax may seem. You should instead use phrases like `If I invoke this macro, it will expand to this text'. Suppose M4 knows about a simple macro called `foo' that is defined to be `bar'. Given the following input, `m4' would produce the corresponding output: That is one big foo. =>That is one big bar. The period character at the end of this sentence is not permitted in macro names, thus `m4' knows when to stop scanning the `foo' token and consult the table of macro definitions for a macro named `foo'. Curiously, macros are defined to `m4' using the built-in macro `define'. The example shown above would be defined to `m4' with the following input: define(`foo', `bar') Since `define' is itself a macro, it too must have an expansion--by definition, it is the empty string, or _void_. Thus, `m4' will appear to consume macro invocations like these from the input. The ``' and `'' characters are M4's default quote characters and play an important role (*Note Quoting::). Additional built-in macros exist for managing macro definitions (*Note Macro management::). We've explored the simplest kind of macros that exist in M4. To make macros substantially more useful, M4 extends the concept to macros which accept a number of arguments (1). If a macro is given arguments, the macro may address its arguments using the special macro names `$1' through to `$n', where `n' is the maximum number of arguments that the macro cares to reference. When such a macro is invoked, the argument list must be delimited by commas and enclosed in parentheses. Any whitespace that precedes an argument is discarded, but trailing whitespace (for example, before the next comma) is preserved. Here is an example of a macro which expands to its third argument: define(`foo', `$3') That is one big foo(3, `0x', `beef'). =>That is one big beef. Arguments in M4 are simply text, so they have no type. 
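The whitespace rule mentioned above is easy to see with another invented macro: the space before the second argument is discarded when the argument is collected, but the space before the closing parenthesis is kept as part of that argument:

     define(`args', `<$1><$2>')
     args(a,  b )
     =><a><b >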
If a macro which accepts arguments is invoked, `m4' will expand the macro regardless of how many arguments are provided. M4 will not produce errors due to conditions such as a mismatched number of arguments, or arguments with malformed values/types. It is the responsibility of the macro to validate the argument list and this is an important practice when writing GNU Autotools macros. Some common M4 idioms have developed for this purpose and are covered in *Note Conditionals::. A macro that expects arguments can still be invoked without arguments--the number of arguments seen by the macro will be zero:

     This is still one big foo.
     =>This is still one big .

A macro invoked with an empty argument list is not empty at all, but rather is considered to be a single empty string:

     This is one big empty foo().
     =>This is one big empty .

It is also important to understand how macros are expanded. It is here that you will see why an M4 macro is not the same as a function in any other programming language. The explanation you've been reading about macro expansion thus far is a little bit simplistic: macros are not exactly matched in the input and expanded in the output. In actual fact, the macro's expansion replaces the invocation in the input stream and it is _rescanned_ for further expansions until there are none remaining. Here is an illustrative example:

     define(`foobar', `FUBAR')
     define(`f', `foo')
     f()bar
     =>FUBAR

When `f()' is expanded to `foo', the expansion is pushed back onto the input stream and rescanned; the rescan reads `foo' followed immediately by `bar' as the single name token `foobar', which in turn expands to `FUBAR'. In the same way, if `a1' were defined to expand to `a2', `a2' to `a3', and `a3' to `a4', then whenever the token `a1' was found in the input, `m4' would replace it with `a2' in the input stream and rescan. This would continue until no definition could be found for `a4', at which point the literal text `a4' would be sent to the output. This is _by far the biggest point of misunderstanding_ for new M4 users. The same principles apply for the collection of arguments to macros which accept arguments. Before a macro's actual arguments are handed to the macro, they are expanded until there are no more expansions left. Here is an example which highlights the consequences of this, using the built-in `define' macro (where the problems are no different). Normally, `define' will redefine any existing macro:

     define(foo, bar)
     define(foo, baz)

In this example, we expect `foo' to be defined to `bar' and then redefined to `baz'. Instead, we've defined a new macro `bar' that is defined to be `baz'! Why? The second `define' invocation has its arguments expanded prior to expanding the `define' macro itself. At this stage, the name `foo' is expanded to its original definition, `bar'. In effect, we've stated:

     define(foo, bar)
     define(bar, baz)

Sometimes this can be a very useful property, but mostly it serves to thoroughly confuse the GNU Autotools macro writer. The key is to know that `m4' will expand as much text as it can as early as possible in its processing. Expansion can be prevented by quoting (2) and is discussed in detail in the following section.

---------- Footnotes ----------

(1) GNU M4 permits an unlimited number of arguments, whereas other versions of M4 limit the number of addressable arguments to nine.

(2) Which is precisely what the ``' and `'' characters in all of the examples in this section are.

File: autobook.info, Node: Quoting, Prev: Macros and macro expansion, Up: Fundamentals of M4 processing

21.3.3 Quoting
--------------

It has been shown how `m4' expands macros when it encounters a name that matches a defined macro in the input. There are times, however, when you wish to defer expansion.
Principally, there are three situations when this is so: Free-form text There may be free-form text that you wish to appear at the output-and as such, be unaltered by any macros that may be inadvertently invoked in the input. It is not always possible to know if some particular name is defined as a macro, so it should be quoted. Overcoming syntax rules Sometimes you may wish to form strings which would violate M4's syntax rules - for example, you might wish to use leading whitespace or a comma in a macro argument. The solution is to quote the entire string. Macro arguments This is the most common situation for quoting: when arguments to macros are to be taken literally and not expanded as the arguments are collected. In the previous section, an example was given that demonstrates the effects of not quoting the first argument to `define'. Quoting macro arguments is considered a good practice that you should emulate. Strings are quoted by surrounding the quoted text with the ``' and `'' characters. When `m4' encounters a quoted string-as a type of token (*Note Token scanning::)-the quoted string is expanded to the string itself, with the outermost quote characters removed. Here is an example of a string that is triple quoted: ```foo''' =>``foo'' A more concrete example uses quoting to demonstrate how to prevent unwanted expansion within macro definitions: define(`foo', ``bar'')dnl define(`bar', `zog')dnl foo =>bar When the macro `foo' is defined, `m4' strips off the outermost quotes and registers the definition ``bar''. The `dnl' text has a special purpose, too, which will be covered in *Note Discarding input::. As the macro `foo' is expanded, the next pair of quote characters are stripped off and the string is expanded to `bar'. Since the expansion of the quoted string is the string itself (minus the quote characters), we have prevented unwanted expansion from the string `bar' to `zog'. As mentioned in *Note Token scanning::, the default M4 quote characters are ``' and `''. Since these are two commonly used characters in Bourne shell programming (1), Autoconf reassigns these to the `[' and `]' characters-a symmetric looking pair of characters least likely to cause problems when writing GNU Autotools macros. From this point forward, we shall use `[' and `]' as the quote characters and you can forget about the default M4 quotes. Autoconf uses M4's built-in `changequote' macro to perform this reassignment and, in fact, this built-in is still available to you. In recent years, the common practice when needing to use the quote characters `[' or `]' or to quote a string with an legitimately imbalanced number of the quote characters has been to invoke `changequote' and temporarily reassign them around the affected area: dnl Uh-oh, we need to use the apostrophe! And even worse, we have two dnl opening quote marks and no closing quote marks. changequote(<<, >>)dnl perl -e 'print "$]\n";' changequote([, ])dnl This leads to a few potential problems, the least of which is that it's easy to reassign the quote characters and then forget to reset them, leading to total chaos! Moreover, it is possible to entirely disable M4's quoting mechanism by blindly changing the quote characters to a pair of empty strings. In hindsight, the overwhelming conclusion is that using `changequote' within the GNU Autotools framework is a bad idea. Instead, leave the quote characters assigned as `[' and `]' and use the special strings `@<:@' and `@:>@' anywhere you want real square brackets to appear in your output. 
This is an easy practice to adopt, because it's faster and less error prone than using `changequote':

     perl -e 'print "$@:>@\n";'

This, and other guidelines for using M4 in the GNU Autotools framework are covered in detail in *Note Writing macros within the GNU Autotools framework::.

---------- Footnotes ----------

(1) The ``' character is used for command substitution and `'' is the shell's own quote character!

File: autobook.info, Node: Features of M4, Next: Writing macros within the GNU Autotools framework, Prev: Fundamentals of M4 processing, Up: M4

21.4 Features of M4
===================

M4 includes a number of pre-defined macros that make it a powerful preprocessor. We will take a tour of the most important features provided by these macros. Although some of these features are not very relevant to GNU Autotools users, Autoconf is implemented using most of them. For this reason, it is useful to understand the features to better understand Autoconf's behavior and for debugging your own `configure' scripts.

* Menu:

* Discarding input ::
* Macro management ::
* Conditionals ::
* Looping ::
* Diversions ::
* Including files ::

File: autobook.info, Node: Discarding input, Next: Macro management, Up: Features of M4

21.4.1 Discarding input
-----------------------

A macro called `dnl' discards text from the input. The `dnl' macro takes no arguments and expands to the empty string, but it has the side effect of discarding all input up to and including the next newline character. Here is an example of `dnl' from the Autoconf source code:

     # AC_LANG_POP
     # -----------
     # Restore the previous language.
     define([AC_LANG_POP],
     [popdef([_AC_LANG])dnl
     ifelse(_AC_LANG, [_AC_LANG], [AC_FATAL([too many $0])])dnl
     AC_LANG(_AC_LANG)])

It is important to remember `dnl''s behavior: it discards the newline character, which can have unexpected effects on generated `configure' scripts! If you want a newline to appear in the output, you must add an extra blank line to compensate. `dnl' need not appear in the first column of a given line - it will begin discarding input at any point that it is invoked in the input file. However, be aware of the newline eating problem again! In the `AC_LANG_POP' example above, note the deliberate use of `dnl' to remove surplus newline characters. In general, `dnl' makes sense for macro invocations that appear on a single line, where you would expect the whole line to simply vanish from the output. In the following subsections, `dnl' will be used to illustrate where it makes sense to use it.

File: autobook.info, Node: Macro management, Next: Conditionals, Prev: Discarding input, Up: Features of M4

21.4.2 Macro management
-----------------------

A number of built-in macros exist in M4 to manage macros. We shall examine the most common ones that you're likely to encounter. There are others and you should consult the GNU M4 manual for further information. The most obvious one is `define', which defines a macro. It expands to the empty string:

     define([foo], [bar])dnl
     define([combine], [$1 and $2])dnl

It is worth highlighting again the liberal use of quoting. We wish to define a pair of macros whose names are _literally_ `foo' and `combine'. If another macro had been previously defined with either of these names, `m4' would have expanded the macro immediately and passed the expansion of `foo' to `define', giving unexpected results. The `undefine' macro will remove a macro's definition from M4's macro table.
It also expands to the empty string: undefine([foo])dnl undefine([combine])dnl Recall that once removed from the macro table, unmatched text will once more be passed through to the output. The `defn' macro expands to the definition of a macro, named by the single argument to `defn'. It is quoted, so that it can be used as the body of a new, renamed macro: define([newbie], defn([foo]))dnl undefine([foo])dnl The `ifdef' macro can be used to determine if a macro name has an existing definition. If it does exist, `ifdef' expands to the second argument, otherwise it expands to the third: ifdef([foo], [yes], [no])dnl Again, `yes' and `no' have been quoted to prevent expansion due to any pre-existing macros with those names. _Always_ consider this a real possibility! Finally, a word about built-in macros: these macros are all defined for you when `m4' is started. One common problem with these macros is that they are not in any kind of name space, so it's easier to accidentally invoke them or want to define a macro with an existing name. One solution is to use the `define' and `defn' combination shown above to rename all of the macros, one by one. This is how Autoconf makes the distinction clear.  File: autobook.info, Node: Conditionals, Next: Looping, Prev: Macro management, Up: Features of M4 21.4.3 Conditionals ------------------- Macros which can expand to different strings based on runtime tests are extremely useful-they are used extensively throughout macros in GNU Autotools and third party macros. The macro that we will examine closely is `ifelse'. This macro compares two strings and expands to a different string based on the result of the comparison. The first form of `ifelse' is akin to the `if'/`then'/`else' construct in other programming languages: ifelse(string1, string2, equal, not-equal) The other form is unusual to a beginner because it actually resembles a `case' statement from other programming languages: ifelse(string1, string2, equala, string3, string4, equalb, default) If `string1' and `string2' are equal, this macro expands to `equala'. If they are not equal, `m4' will shift the argument list three positions to the left and try again: ifelse(string3, string4, equalb, default) If `string3' and `string4' are equal, this macro expands to `equalb'. If they are not equal, it expands to `default'. The number of cases that may be in the argument list is unbounded. As it has been mentioned in *Note Macros and macro expansion::, macros that accept arguments may access their arguments through specially named macros like `$1'. If a macro has been defined, no checking of argument counts is performed before it is expanded and the macro may examine the number of arguments given through the `$#' macro. This has a useful result: you may invoke a macro with too few (or too many) arguments and the macro will still be expanded. In the example below, `$2' will expand to the empty string. define([foo], [$1 and $2])dnl foo([a]) =>a and This is useful because `m4' will expand the macro and give the macro the opportunity to test each argument for the empty string. In effect, we have the equivalent of default arguments from other programming languages. The macro can use `ifelse' to provide a default value if, say, `$2' is the empty string. You will notice in much of the documentation for existing Autoconf macros that arguments may be left blank to accept the default value. This is an important idiom that you should practice in your own macros. 
In this example, we wish to accept the default shell code fragment for the case where `/etc/passwd' is found in the build system's file system, but output `Big trouble!' if it is not. AC_CHECK_FILE([/etc/passwd], [], [echo "Big trouble!"])  File: autobook.info, Node: Looping, Next: Diversions, Prev: Conditionals, Up: Features of M4 21.4.4 Looping -------------- There is no support in M4 for doing traditional iterations (ie. `for-do' loops), however macros may invoke themselves. Thus, it is possible to iterate using recursion. The recursive definition can use conditionals (*Note Conditionals::) to terminate the loop at its completion by providing a trivial case. The GNU M4 manual provides some clever recursive definitions, including a definition for a `forloop' macro that emulates a `for-do' loop. It is conceivable that you might wish to use these M4 constructs when writing macros to generate large amounts of in-line shell code or arbitrarily nested `if; then; fi' statements.  File: autobook.info, Node: Diversions, Next: Including files, Prev: Looping, Up: Features of M4 21.4.5 Diversions ----------------- Diversions are a facility in M4 for diverting text from the input stream into a holding buffer. There is a large number of diversion buffers in GNU M4, limited only by available memory. Text can be diverted into any one of these buffers and then `undiverted' back to the output (diversion number 0) at a later stage. Text is diverted and undiverted using the `divert' and `undivert' macros. They expand to the empty string, with the side effect of setting the diversion. Here is an illustrative example: divert(1)dnl This goes at the end. divert(0)dnl This goes at the beginning. undivert(1)dnl =>This goes at the beginning. =>This goes at the end. It is unlikely that you will want to use diversions in your own macros, and it is difficult to do reliably without understanding the internals of Autoconf. However, it is interesting to note that this is how `autoconf' generates fragments of shell code on-the-fly that must precede shell code at the current point in the `configure' script.  File: autobook.info, Node: Including files, Prev: Diversions, Up: Features of M4 21.4.6 Including files ---------------------- M4 permits you to include files into the input stream using the `include' and `sinclude' macros. They simply expand to the contents of the named file. Of course, the expansion will be rescanned as the normal rules dictate (*Note Fundamentals of M4 processing::). The difference between `include' and `sinclude' is subtle: if the filename given as an argument to `include' is not present, an error will be raised. The `sinclude' macro will instead expand to the empty string--presumably the `s' stands for `silent'. Older GNU Autotools macros that tried to be modular would use the `include' and `sinclude' macros to import libraries of macros from other sources. While this is still a workable mechanism, there is an active effort within the GNU Autotools development community to improve the packaging system for macros. An `--install' option is being developed to improve the mechanism for importing macros from a library.  File: autobook.info, Node: Writing macros within the GNU Autotools framework, Prev: Features of M4, Up: M4 21.5 Writing macros within the GNU Autotools framework ====================================================== With a good grasp of M4 concepts, we may turn our attention to applying these principles to writing `configure.in' files and new `.m4' macro files. 
There are some differences between writing generic M4 input files and macros within the GNU Autotools framework and these will be covered in this section, along with some useful hints on working within the framework. This section ties in closely with *Note Writing New Macros for Autoconf::. Now that you are familiar with the capabilities of M4, you can forget about the names of the built-in M4 macros-they should be avoided in the GNU Autotools framework. Where appropriate, the framework provides a collection of macros that are laid on top of the M4 built-ins. For instance, the macros in the `AC_' family are just regular M4 macros that take a number of arguments and rely on an extensive library of `AC_' support macros. * Menu: * Syntactic conventions :: * Debugging with M4 ::  File: autobook.info, Node: Syntactic conventions, Next: Debugging with M4, Up: Writing macros within the GNU Autotools framework 21.5.1 Syntactic conventions ---------------------------- Some conventions have grown over the life of the GNU Autotools, mostly as a disciplined way of avoiding M4 pitfalls. These conventions are designed to make your macros more robust, your code easier to read and, most importantly, improve your chances for getting things to work the first time! A brief list of recommended conventions appears below: - Do not use the M4 built-in `changequote'. Any good macro will already perform sufficient quoting. - Never use the argument macros (e.g. `$1') within shell comments and dnl remarks. If such a comment were to be placed within a macro definition, M4 will expand the argument macros leading to strange results. Instead, quote the argument number to prevent unwanted expansion. For instance, you would use `$[1]' in the comment. - Quote the M4 comment character, `#'. This can appear often in shell code fragments and can have undesirable effects if M4 ignores any expansions in the text between the `#' and the next newline. - In general, macros invoked from `configure.in' should be placed one per line. Many of the GNU Autotools macros conclude their definitions with a `dnl' to prevent unwanted whitespace from accumulating in `configure'. - Many of the `AC_' macros, and others which emulate their good behavior, permit default values for unspecified arguments. It is considered good style to explicitly show your intention to use an empty argument by using a pair of quotes, such as `[]'. - Always quote the names of macros used within the definitions of other macros. - When writing new macros, generate a small `configure.in' that uses (and abuses!) the macro--particularly with respect to quoting. Generate a `configure' script with `autoconf' and inspect the results.  File: autobook.info, Node: Debugging with M4, Prev: Syntactic conventions, Up: Writing macros within the GNU Autotools framework 21.5.2 Debugging with M4 ------------------------ After writing a new macro or a `configure.in' template, the generated `configure' script may not contain what you expect. Frequently this is due to a problem in quoting (*note Quoting::), but the interactions between macros can be complex. When you consider that the arguments to GNU Autotools macros are often shell scripts, things can get rather hairy. A number of techniques exist for helping you to debug these kinds of problems. Expansion problems due to over-quoting and under-quoting can be difficult to pinpoint. 
Autoconf half-heartedly tries to detect this condition by scanning the generated `configure' script for any remaining invocations of the `AC_' and `AM_' families of macros. However, this only works for the `AC_' and `AM_' macros and not for third party macros. M4 provides a comprehensive facility for tracing expansions. This makes it possible to see how macro arguments are expanded and how a macro is finally expanded. Often, this can be half the battle in discovering if the macro definition or the invocation is at fault. Autoconf 2.15 will include this tracing mechanism. To trace the generation of `configure', Autoconf can be invoked like so: $ autoconf --trace=AC_PROG_CC Autoconf provides fine control over which macros are traced and the format of the trace output. You should refer to the Autoconf manual for further details. GNU `m4' also provides a debugging mode that can be helpful in discovering problems such as infinite recursion. This mode is activated with the `-d' option. In order to pass options to `m4', invoke Autoconf like so: $ M4='m4 -dV' autoconf Another situation that can arise is the presence of shell syntax errors in the generated `configure' script. These errors are usually obvious, as the shell will abort `configure' when the syntax error is encountered. The task of then locating the troublesome shell code in the input files can be potentially quite difficult. If the erroneous shell code appears in `configure.in', it should be easy to spot-presumably because you wrote it recently! If the code is imported from a third party macro, though, it may only be present because you invoked that macro. A trick to help locate these kinds of errors is to place some magic text (`__MAGIC__') throughout `configure.in': AC_INIT AC_PROG_CC __MAGIC__ MY_SUSPECT_MACRO __MAGIC__ AC_OUTPUT(Makefile) After `autoconf' has generated `configure', you can search through it for the magic text to determine the extremities of the suspect macro. If your erroneous code appears within the magic text markers, you've found the culprit! Don't be afraid to hack up `configure'. It can easily be regenerated. Finally, due to an error on your part, `m4' may generate a `configure' script that contains semantic errors. Something as simple as inverted logic may lead to a nonsense test result: checking for /etc/passwd... no Semantic errors of this kind are usually easy to solve once you can spot them. A fast and simple way of tracing the shell execution is to use the shell's `-x' and `-v' options to turn on its own tracing. This can be done by explicitly placing the required `set' commands into `configure.in': AC_INIT AC_PROG_CC set -x -v MY_BROKEN_MACRO set +x +v AC_OUTPUT(Makefile) This kind of tracing is invaluable in debugging shell code containing semantic errors.  File: autobook.info, Node: Writing Portable Bourne Shell, Next: Writing New Macros for Autoconf, Prev: M4, Up: Top 22 Writing Portable Bourne Shell ******************************** This chapter is a whistle stop tour of the accumulated wisdom of the free software community, with respect to best practices for portable shell scripting, as encoded in the sources for Autoconf and Libtool, as interpreted and filtered by me. It is by no means comprehensive - entire books have been devoted to the subject - though it is, I hope, authoritative. 
* Menu:

* Why Use the Bourne Shell?::
* Sh Implementation::
* Environment::
* Utilities::

File: autobook.info, Node: Why Use the Bourne Shell?, Next: Sh Implementation, Up: Writing Portable Bourne Shell

22.1 Why Use the Bourne Shell?
==============================

Unix has been around for more than thirty years and has splintered into hundreds of small and not so small variants, *Note The Diversity of Unix Systems: Unix Diversity.  Much of the subject matter of this book is concerned with how best to approach writing programs which will work on as many of these variants as possible.

One of the few programming tools that is absolutely guaranteed to be present on every flavour of Unix in use today is Steve Bourne's original shell, `sh' - the Bourne Shell.  That is why Libtool is written as a Bourne Shell script, and why the `configure' files generated by Autoconf are Bourne Shell scripts: they can be executed on all known Unix flavours, and as a bonus on most POSIX based non-Unix operating systems too.

However, there are complications.  Over the years, OS vendors have improved Steve Bourne's original shell or have reimplemented it in an almost, but not quite, compatible way.  There are also a great number of Bourne compatible shells which are often used as a system's default `/bin/sh': `ash', `bash', `bsh', `ksh', `sh5' and `zsh' are some that you may come across.  For the rest of this chapter, when I say `shell', I mean a Bourne compatible shell.

This leads us to the black art known as "portable shell programming", the art of writing a single script which will run correctly through all of these varying implementations of `/bin/sh'.  Of course, Unix systems are constantly evolving and new variations are being introduced all the time (and very old systems which have fallen into disuse can perhaps be ignored by the pragmatic).  The amount of system knowledge required to write a truly portable shell script is vast, and a great deal of the information that sets a precedent for a given idiom is necessarily second or third (or tenth) hand.  Practically, this means that some of the knowledge accumulated in popular portable shell scripts is very probably folklore - but that doesn't really matter too much; the important thing is that if you adhere to these idioms, you shouldn't have any problems from people who can't run your program on their system.

File: autobook.info, Node: Sh Implementation, Next: Environment, Prev: Why Use the Bourne Shell?, Up: Writing Portable Bourne Shell

22.2 Implementation
===================

By their very nature, a sizeable part of the functionality of shell scripts is provided by the many utility programs that they routinely call to perform important subsidiary tasks.  Addressing the portability of the script involves issues of portability in the host operating system environment, and portability of the utility programs as well as the portability of the shell implementation itself.

This section discusses differences between shell implementations to which you must cater when writing a portable script.  It is broken into several subsections, each covering a single aspect of shell programming that needs to be approached carefully to avoid pitfalls with unexpected behaviour in some shell implementations.  The following section discusses how to cope with the host environment in a portable fashion.  The last section in this chapter addresses the portability of common shell utilities.
* Menu:

* Size Limitations::
* Magic Numbers::
* Colon::
* Functions::
* Source::
* Test::
* Variables::
* Pattern Matching::

File: autobook.info, Node: Size Limitations, Next: Magic Numbers, Up: Sh Implementation

22.2.1 Size Limitations
-----------------------

Quite a lot of the Unix vendor implementations of the Bourne shell have a fixed buffer for storing command lines, as small as 512 characters in the worst cases.  You may see an error akin to this:

     $ ls -d /usr/bin/* | wc -l
     sh: error: line too long

Notice that the limit applies to the _expanded_ command line, not just the characters typed in for the line.  A portable way to write this would be:

     $ ( cd /usr/bin && ls | wc -l )
     1556

File: autobook.info, Node: Magic Numbers, Next: Colon, Prev: Size Limitations, Up: Sh Implementation

22.2.2 #!
---------

When the kernel executes a program from the file system, it checks the first few bytes of the file, and compares them with its internal list of known "magic numbers", which encode how the file can be executed.  This is a similar, but distinct, system to the `/etc/magic' magic number list used by user space programs.

Having determined that the file is a script by examining its magic number, the kernel finds the path of the interpreter by removing the `#!' and any intervening space from the first line of the script.  One optional argument is allowed (additional arguments are not ignored; they constitute a syntax error), and the resulting command line is executed.  There is a 32 character limit to the significant part of the `#!' line, so you must ensure that the full path to the interpreter plus any switches you need to pass to it do not exceed this limit.  Also, the interpreter must be a real binary program; it cannot be a `#!' file itself.

It used to be thought that different kernels' idea of the magic number for the start of an interpreted script varied slightly between implementations.  In actual fact, all look for `#!' in the first two bytes - in spite of commonly held beliefs, there is no evidence that there are others which require `#! /'.

A portable script must give an absolute path to the interpreter, which causes problems when, say, some machines have a better version of the Bourne shell in an unusual directory - say `/usr/sysv/bin/sh'.  See *Note (): Functions. for a way to re-execute the script with a better interpreter.

For example, imagine a script file called `/tmp/foo.pl' with the following first line:

     #! /usr/local/bin/perl

Now, the script can be executed from the `/tmp' directory, with the following sequence of commands:

     $ cd /tmp
     $ ./foo.pl

When executing these commands, the kernel will actually execute the following from the `/tmp' directory:

     /usr/local/bin/perl ./foo.pl

This can pose problems of its own though.  A script such as the one described above will not work on a machine where the perl interpreter is installed as `/usr/bin/perl'.  There is a way to circumvent this problem, by using the `env' program to find the interpreter by looking in the user's `PATH' environment variable.  Change the first line of `foo.pl' to read as follows:

     #! /usr/bin/env perl

This idiom does rely on the `env' command being installed as `/usr/bin/env', and that, in this example, `perl' can be found in the user's `PATH'.  But that is indeed the case on the great majority of machines.  In contrast, perl is installed in `/usr/local/bin' as often as `/usr/bin', so using `env' like this is a net win overall.
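For instance, a short perl script written this way runs unchanged whether the interpreter lives in `/usr/bin', `/usr/local/bin' or somewhere else on the user's `PATH' (the script below is only an illustrative sketch):

     $ cat hello.pl
     #! /usr/bin/env perl
     print "hello\n";
     $ ./hello.pl
     hello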
You can also use this method to get around the 32 character limit if the path to the interpreter is too long.  Unfortunately, you lose the ability to pass an option flag to the interpreter if you choose to use `env'.  For example, you can't do the following, since it requires two arguments:

     #! /usr/bin/env guile -s

File: autobook.info, Node: Colon, Next: Functions, Prev: Magic Numbers, Up: Sh Implementation

22.2.3 :
--------

In the beginning, the magic number for Bourne shell scripts used to be a colon followed by a newline.  Most Unices still support this, and will correctly pass a file with a single colon as its first line to `/bin/sh' for interpretation.  Nobody uses this any more and I suspect some very new Unices may have forgotten about it entirely, so you should stick to the more usual `#! /bin/sh' syntax for your own scripts.  You may occasionally come across a very old script that starts with a `:' though, and it is nice to know why!

In addition, all known Bourne compatible shells have a builtin command, `:', which always returns success.  It is equivalent to the system command `/bin/true', but can be used from a script without the overhead of starting another process.  When setting a shell variable as a flag, it is good practice to use the commands `:' and `false' as values, and to choose the sense of the variable to be `:' in the common case: when you come to test the value of the variable, you will avoid the overhead of additional processes most of the time.

     var=:
     if $var; then
       foo
     fi

The `:' command described above can take any number of arguments, which it will fastidiously ignore.  This allows the `:' character to double up as a comment leader of sorts.  Be aware that the characters that follow are not discarded; they are still interpreted by the shell, so metacharacters can have unexpected effects:

     $ cat foo
     :
     : echo foo
     : `echo bar`
     : `echo baz >&2`
     $ ./foo
     baz

You may find very old shell scripts that are commented using `:', or new scripts that exploit this behavior in some esoteric fashion.  My advice is, don't: it will bite you later.

File: autobook.info, Node: Functions, Next: Source, Prev: Colon, Up: Sh Implementation

22.2.4 ()
---------

There are still a great number of shells that, like Steve Bourne's original implementation, do not have functions!  So, strictly speaking, you can't use shell functions in your scripts.  Luckily, in this day and age, even though `/bin/sh' itself may not support shell functions, it is not too far from the truth to say that almost every machine will have _some_ shell that does.

Taking this assumption to its logical conclusion, it is a simple matter of writing your script to find a suitable shell, and then feed itself to that shell so that the rest of the script can use functions with impunity:

     #! /bin/sh
     # Zsh is not Bourne compatible without the following:
     if test -n "$ZSH_VERSION"; then
       emulate sh
       NULLCMD=:
     fi
     # Bash is not POSIX compliant without the following:
     test -n "$BASH_VERSION" && set -o posix

     SHELL="${SHELL-/bin/sh}"
     if test x"$1" = x--re-executed; then
       # Functional shell was found.  Remove option and continue
       shift
     elif "$SHELL" -c 'foo () { exit 0; }; foo' 2>/dev/null; then
       # The current shell works already!
       :
     else
       # Try alternative shells that (sometimes) support functions
       for cmd in sh bash ash bsh ksh zsh sh5; do
         set `IFS=:; X="$PATH:/bin:/usr/bin:/usr/afsws/bin:/usr/ucb"; echo $X`
         for dir
         do
           shell="$dir/$cmd"
           if (test -f "$shell" || test -f "$shell.exe") &&
             "$shell" -c 'foo () { exit 0; }; foo' 2>/dev/null
           then
             # Re-execute with discovered functional shell
             SHELL="$shell" exec "$shell" "$0" --re-executed ${1+"$@"}
           fi
         done
       done
       echo "Unable to locate a shell interpreter with function support" >&2
       exit 1
     fi

     foo () {
       echo "$SHELL: ta da!"
     }

     foo

     exit 0

Note that this script finds a shell that supports functions of the following syntax, since the use of the `function' keyword is much less widely supported:

     foo () { ... }

A notable exception to the assertion that all machines have a shell that can handle functions is 4.3BSD, which has only a single shell: a Bourne shell deprived of functions.  There are two ways you can deal with this:

  1. Ask 4.3BSD users of your script to install a more featureful shell such as bash, so that the technique above will work.

  2. Have your script run itself through `sed', chopping itself into pieces, with each function written to its own script file, and then feed what's left into the original shell.  Whenever a function call is encountered, one of the fragments from the original script will be executed in a subshell.

If you decide to split the script with `sed', you will need to be careful not to rely on shell variables to communicate between functions, since each `function' will be executed in its own subshell.

File: autobook.info, Node: Source, Next: Test, Prev: Functions, Up: Sh Implementation

22.2.5 .
--------

The semantics of `.' are rather peculiar to say the least.  Here is a simple script - it just displays its positional parameters:

     #! /bin/sh
     echo "$0" ${1+"$@"}

Put this in a file, `foo'.  Here is another simple script - it calls the first script.  Put this in another file, `wrapper':

     #! /bin/sh
     . ./foo
     . ./foo bar baz

Observe what happens when you run this from the command line:

     $ ./wrapper
     ./wrapper
     ./wrapper bar baz

So `$0' is inherited from the calling script, and the positional parameters are as passed to the command.  Observe what happens when you call the wrapper script with arguments:

     $ ./wrapper 1 2 3
     ./wrapper 1 2 3
     ./wrapper bar baz

So the sourced script has access to the calling script's positional parameters, _unless you override them in the `.' command_.  This can cause no end of trouble if you are not expecting it, so you must either be careful to omit all parameters to any `.' command, or else don't reference the parameters inside the sourced script.  If you are re-executing your script with a shell that understands functions, the best use for the `.' command is to load libraries of functions which can subsequently be used in the calling script.

Most importantly, don't forget that, if you call the `exit' command in a script that you load with `.', it will cause the calling script to exit too!

File: autobook.info, Node: Test, Next: Variables, Prev: Source, Up: Sh Implementation

22.2.6 [
--------

Although technically equivalent, `test' is preferable to `[' in shell code written in conjunction with Autoconf, since `[' is also used for M4 quoting in Autoconf.  Your code will be much easier to read (and write) if you abstain from the use of `['.  Except in the most degenerate shells, `test' is a shell builtin to save the overhead of starting another process, and is no slower than `['.
It does mean, however, that there is a huge range of features which are not implemented widely enough for you to be able to use them freely within a truly portable script.  The less obvious ones to avoid are `-a' and `-o' - the logical `and' and `or' operations.  A good litmus test for the portability of any shell feature is to see whether that feature is used in the source of Autoconf, and it turns out that `-a' and `-o' _are_ used here and there, but never more than once in a single command.  All the same, to avoid any confusion, I always avoid them entirely.  I would not use the following, for example:

     test foo -a bar

Instead I would run test twice, like this:

     test foo && test bar

The negation operator of `test' is quite portable and can be used in portable shell scripts.  For example:

     if test ! foo; then bar; fi

The negation operator of `if' is not at all portable and should be avoided.  The following would generate a syntax error on some shell implementations:

     if ! test foo; then bar; fi

An implication of this axiom is that when you need to branch if a command fails, and that command is not `test', you cannot use the negation operator.  The easiest way to work around this is to use the `else' clause of the un-negated `if', like this:

     if foo; then :; else bar; fi

Notice the use of the `:' builtin as a null operation when `foo' doesn't fail.

The `test' command does not cope with missing or additional arguments, so you must take care to ensure that the shell does not remove arguments or introduce new ones during variable and quote expansions.  The best way to do that is to enclose any variables in double quotes.  You should also add a single character prefix to both sides in case the value of the expansion is a valid option to `test':

     $ for foo in "" "!" "bar" "baz quux"; do
     >   test x"$foo" = x"bar" && echo 1 || echo 0
     > done
     0
     0
     1
     0

Here, you can see that using the `x' prefix for the first operand saves `test' from interpreting the `!' argument as a real option, or from choking on an empty string - something you must always be aware of, or else the following behaviour will ensue:

     $ foo=!
     $ test "$foo" = "bar" && echo 1 || echo 0
     test: argument expected
     0
     $ foo=""
     $ test "$foo" = "bar" && echo 1 || echo 0
     test: argument expected
     0

Also, the double quote marks help `test' cope with strings that contain whitespace.  Without the double quotes, you will see errors like this:

     $ foo="baz quux"
     $ test x$foo = "bar" && echo 1 || echo 0
     test: too many arguments
     0

You shouldn't rely on the default behaviour of `test' (to return `true' if its single argument has non-zero length); use the `-n' option to force that behaviour if it is what you want.

Beyond that, the other thing you need to know about `test' is that if you use operators other than those below, you are reducing the portability of your code:

`-n' STRING
     STRING is non-empty.

`-z' STRING
     STRING is empty.

STRING1 = STRING2
     Both strings are identical.

STRING1 != STRING2
     The strings are not the same.

`-d' FILE
     FILE exists and is a directory.

`-f' FILE
     FILE exists and is a regular file.

You can also use the following, provided that you don't mix them within a single invocation of `test':

EXPRESSION `-a' EXPRESSION
     Both expressions evaluate to `true'.

EXPRESSION `-o' EXPRESSION
     At least one of the expressions evaluates to `true'.
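Here is a short sketch combining the advice above (the variable name and directory are hypothetical): prefix both operands with a literal character, double quote the expansions, and run `test' twice rather than using `-a':

     if test x"$prefix" = x"/usr/local" && test -d "$prefix"; then
       echo "installing below $prefix"
     else
       :
     fi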
File: autobook.info, Node: Variables, Next: Pattern Matching, Prev: Test, Up: Sh Implementation 22.2.7 $ -------- When using shell variables in your portable scripts, you need to write them in a somewhat stylised fashion to maximise the number of shell implementations that will interpret your code as expected: * Convenient though it is, the POSIX `$(command parameters)' syntax for command substitution is not remotely portable. Despite it being more difficult to nest, you must use ``command parameters`' instead. * The most portable way to set a default value for a shell variable is: $ echo ${no_such_var-"default value"} default value If there is any whitespace in the default value, as there is here, you must be careful to quote the entire value, since some shells will raise an error: $ echo ${no_such_var-default value} sh: bad substitution * The `unset' command is not available in many of the degenerate Bourne shell implementations. Generally, it is not too difficult to get by without it, but following the logic that led to the shell script in *Note (): Functions, it would be trivial to extend the test case for confirming a shell's suitability to include a check for `unset'. Although it has not been put to the test, the theory is that all the interesting machines in use today have _some_ shell that supports `unset'. * Be religious about double quoting variable expansions. Using `"$foo"' will avoid trouble with unexpected spaces in filenames, and compression of all whitespace to a single space in unquoted variable expansions. * To avoid accidental interpretation of variable expansions as command options you can use the following technique: $ foo=-n $ echo $foo $ echo x"$foo" | sed -e 's/^x//' -n * If it is set, `IFS' splits words on whitespace by default. If you change it, be sure to put it back when you're done, or the shell may behave very strangely from that point. For example, when you need to examine each element of `$PATH' in turn: # The whitespace at the end of the following line is a space # followed by literal tab and newline characters. save_IFS="${IFS= }"; IFS=":" set dummy $PATH IFS="$save_IFS" shift Alternatively, you can take advantage of the fact that command substitutions occur in a separate subshell, and do not corrupt the environment of the calling shell: set dummy `IFS=:; echo $PATH` shift Strictly speaking, the `dummy' argument is required to stop the `set' command from interpreting the first word of the expanded backquote expression as a command option. Realistically, no one is going to have `-x', for example, as the first element of their `PATH' variable, so the `dummy' could be omitted - as I did earlier in the script in *Note (): Functions. * Some shells expand `$@' to the empty string, even when there are no actual parameters (`$#' is 0). If you need to replicate the parameters that were passed to the executing script, when feeding the script to a more suitable interpreter for example, you must use the following: ${1+"$@"} Similarly, although all known shells do correctly use `$@' as the default argument to a `for' command, you must write it like this: for arg do stuff done When you rely on implicit `$@' like this, it is important to write the `do' keyword on a separate line. Some degenerate shells can not parse the following: for arg; do stuff done  File: autobook.info, Node: Pattern Matching, Prev: Variables, Up: Sh Implementation 22.2.8 * versus .* ------------------ This section compares "file globbing" with "regular expression matching". 
There are many Unix commands which are regularly used from shell scripts, and which provide some sort of pattern matching mechanism: `expr', `egrep' and `sed', to name a few.  Unfortunately they each have different quoting rules regarding whether particular meta-characters must be backslash escaped to revert to their literal meaning and vice-versa.  There is no real logic to the particular dialect of regular expressions accepted by these commands.  To confirm the correctness of each regular expression, you should always check them from the shell prompt with the relevant tool before committing to a script, so I won't belabour the specifics.

Shell globbing however is much more regular (no pun intended), and provides a reasonable and sometimes more CPU efficient solution to many shell matching problems.  The key is to make good use of the `case' command, which is easier to use (because it uses globbing rules) and doesn't require additional processes to be spawned.  Unfortunately, GNU Bash doesn't handle backslashes correctly in glob character classes - the backslash must be the first character in the class, or else it will never match.  For example, if you want to detect absolute directory paths on Unix and Windows using `case', you should write the code like this:

     case $dir in
       [\\/]* | ?:[\\/]* ) echo absolute ;;
       * )                 echo relative ;;
     esac

Even though `expr' uses regular expressions rather than shell globbing, it is often(1) a shell builtin, so using it to extract sections of strings can be faster than spawning a sed process to do the same.  As with `echo' and `set', for example, you must be careful that variable or command expansions for the first argument to `expr' are not accidentally interpreted as reserved keywords.  As with `echo', you can work around this problem by prefixing any expansions with a literal `x', as follows:

     $ foo=substr
     $ expr $foo : '.*\(str\)'
     expr: syntax error
     $ expr x$foo : '.*\(str\)'
     str

---------- Footnotes ----------

(1) Notable exceptions are GNU Bash, and both Ksh and the Bourne shell on Solaris.

File: autobook.info, Node: Environment, Next: Utilities, Prev: Sh Implementation, Up: Writing Portable Bourne Shell

22.3 Environment
================

In addition to the problems with portability in shell implementations discussed in the previous section, the behaviour of the shell can also be drastically affected by the contents of certain environment variables, and the operating environment provided by the host machine.  It is important to be aware of the behavior of some of the operating systems within which your shell script might run.  Although not directly related to the implementation of the shell interpreter, the characteristics of some of the target architectures do influence what is considered to be portable.  To ensure your script will work on as many shell implementations as possible, you must observe the following points.

SCO Unix doesn't like `LANG=C' and friends, but without `LC_MESSAGES=C', Solaris will translate variable values in `set'!  Similarly, without `LC_CTYPE=C', compiled C code can behave unexpectedly.  The trick is to set the values to `C', except when they are not already set at all:

     for var in LANG LC_ALL LC_MESSAGES LC_CTYPE LANGUAGES
     do
       if eval test x"\${$var+set}" = xset; then
         eval $var=C; eval export $var
       fi
     done

HP-UX `ksh' and all POSIX shells print the target directory to standard output if `CDPATH' is set.

     if test x"${CDPATH+set}" = xset; then CDPATH=:; export CDPATH; fi

The target architecture file system may impose limits on your scripts.
If you want your scripts to run on the architectures which impose these limits, then your script must adhere to these limits:

   * The ISO9660 filesystem, as used on most CD-ROMs, limits nesting of directories to a maximum depth of twelve levels.

   * Many old Unix filesystems place a 14 character limit on the length of any filename.  If you care about portability to DOS, _that_ has an 8 character limit with an optional extension of 3 or fewer characters (known as 8.3 notation).

A useful idiom when you need to determine whether a particular pathname is relative or absolute, and which also works for DOS targets, follows:

     case "$file" in
       [\\/]* | ?:[\\/]*) echo absolute ;;
       *)                 echo default ;;
     esac

File: autobook.info, Node: Utilities, Prev: Environment, Up: Writing Portable Bourne Shell

22.4 Utilities
==============

The utility programs commonly executed by shell scripts can have a huge impact on the portability of shell scripts, and it is important to know which utilities are universally available, and any differences certain implementations of these utilities may exhibit.  According to the GNU standards document, you can rely on having access to these utilities from your scripts:

     cat cmp cp diff echo egrep expr false grep install-info ln ls
     mkdir mv pwd rm rmdir sed sleep sort tar test touch true

Here are some things that you must be aware of when using some of the tools listed above:

`cat'
     Host architectures supply `cat' implementations with conflicting interpretations of, or entirely missing, the various command line options.  You should avoid using any command line options to this command.

`cp' and `mv'
     Unconditionally duplicated or otherwise open file descriptors cannot be deleted on many operating systems, and worse, on Windows the destination files cannot even be moved.  Constructs like the following must be avoided, for example:

          exec > foo
          mv foo bar

`echo'
     The `echo' command has at least two flavors: one takes a `-n' option to suppress the automatic newline at the end of the echoed string; the other uses an embedded `\c' notation as the last character in the echoed string for the same purpose.

     If you need to emit a string without a trailing newline character, you can use the following script fragment to discover which flavor of `echo' you are using:

          case `echo "testing\c"`,`echo -n testing` in
            *c*,-n*) echo_n= echo_c='(1)
          ' ;;
            *c*,*)   echo_n=-n echo_c= ;;
            *)       echo_n= echo_c='\c' ;;
          esac

     Any `echo' command after the shell fragment above, which shouldn't move the cursor to a new line, can now be written like so:

          echo $echo_n "prompt:$echo_c"

     In addition, you should try to avoid backslashes in `echo' arguments unless they are expanded by the shell.  Some implementations interpret them and effectively perform another backslash expansion pass, where equally many implementations do not.  This can become a really hairy problem if you need to have an `echo' command which doesn't perform backslash expansion, and in fact the first 150 lines of the `ltconfig' script distributed with Libtool are devoted to finding such a command.

`ln'
     Not all systems support soft links.  You should use the Autoconf macro `AC_PROG_LN_S' to discover what the target architecture supports, and assign the result of that test to a variable.  Whenever you subsequently need to create a link you can use the command stored in the variable to do so.

          LN_S=@LN_S@
          ...
          $LN_S $top_srcdir/foo $dist_dir/foo

     Also, you cannot rely on support for the `-f' option from all implementations of `ln'.  Use `rm' before calling `ln' instead.
`mkdir'
     Unfortunately, `mkdir -p' is not as portable as we might like.  You must either create each directory in the path in turn, or use the `mkinstalldirs' script supplied by Automake.

`sed'
     When you resort to using `sed' (rather, use `case' or `expr' if you can), there is no need to introduce command line scripts using the `-e' option.  Even when you want to supply more than one script, you can use `;' as a command separator.  The following two lines are equivalent, though the latter is cleaner:

          $ sed -e 's/foo/bar/g' -e '12q' < infile > outfile
          $ sed 's/foo/bar/g;12q' < infile > outfile

     Some portability zealots still go to great lengths to avoid "here documents" of more than twelve lines.  The twelve line limit is actually a limitation in some implementations of `sed', which has gradually seeped into the portable shell folklore as a general limit in all here documents.  Autoconf, however, includes many here documents with far more than twelve lines, and has not generated any complaints from users.  This is testament to the fact that at worst the limit is only encountered in very obscure cases - and most likely that it is not a real limit after all.

     Also, be aware that branch labels of more than eight characters are not portable to some implementations of `sed'.

"Here documents" are a way of redirecting literal strings into the standard input of a command.  You have certainly seen them before if you have looked at other people's shell scripts, though you may not have realised what they were called:

     cat >> /tmp/file$$ << _EOF_
     This is the text of a "here document"
     _EOF_

Something else to be aware of is that the temporary files created by your scripts can become a security problem if they are left in `/tmp' or if the names are predictable.  A simple way around this is to create a directory in `/tmp' that is unique to the process and owned by the process user.  Some machines have a utility program for just this purpose - `mktemp -d' - or else you can always fall back to `umask 077 && mkdir /tmp/$$'.  Having created this directory, all of the temporary files for this process should be written to that directory, and its contents removed as soon as possible.

Armed with the knowledge of how to write portable shell code from this chapter, in combination with the M4 details from the last chapter, you are ready for the next chapter, which covers the specifics of combining the two to write your own Autoconf macros.

---------- Footnotes ----------

(1) This is a literal newline.

File: autobook.info, Node: Writing New Macros for Autoconf, Next: Migrating Existing Packages, Prev: Writing Portable Bourne Shell, Up: Top

23 Writing New Macros for Autoconf
**********************************

Autoconf is an extensible system which permits new macros to be written and shared between Autoconf users.  Although it is possible to perform custom tests by placing fragments of shell code into your `configure.in' file, it is better practice to encapsulate that test in a macro.  This encourages macro authors to make their macros more general purpose, easier to test and easier to share with other users.

This chapter presents some guidelines for designing and implementing good Autoconf macros.  It will conclude with a discussion of the approaches being considered by the Autoconf development community for improving the creation and distribution of macros.  A more general discussion of macros can be found in *Note Macros and macro expansion::.
* Menu:

* Autoconf Preliminaries::
* Reusing Existing Macros::
* Guidelines for writing macros::
* Implementation specifics::
* Future directions for macro writers::

File: autobook.info, Node: Autoconf Preliminaries, Next: Reusing Existing Macros, Up: Writing New Macros for Autoconf

23.1 Autoconf Preliminaries
===========================

In a small package which only uses Autoconf, your own macros are placed in the `aclocal.m4' file - this includes macros that you may have obtained from third parties such as the Autoconf macro archive (*note Autoconf macro archive::).  If your package additionally uses Automake, then these macros should be placed in `acinclude.m4'.  The `aclocal' program from Automake reads in macro definitions from `acinclude.m4' when generating `aclocal.m4'.  When using Automake, for instance, `aclocal.m4' will include the definitions of `AM_' macros needed by Automake.

In larger projects, it's advisable to keep your custom macros in a more organized structure.  Autoconf version 2.15 will introduce a new facility to explicitly include files from your `configure.in' file.  The details have not solidified yet, but it will almost certainly include a mechanism for automatically including files with the correct filename extension from a subdirectory, say `m4/'.

File: autobook.info, Node: Reusing Existing Macros, Next: Guidelines for writing macros, Prev: Autoconf Preliminaries, Up: Writing New Macros for Autoconf

23.2 Reusing Existing Macros
============================

It goes without saying that it makes sense to reuse macros where possible - indeed, a search of the Autoconf macro archive might turn up a macro which does exactly what you want, alleviating the need to write a macro at all (*note Autoconf macro archive::).

It's more likely, though, that there will be generic, parameterized tests available that you can use to help you get your job done.  Autoconf's `generic' tests provide one such collection of macros.  A macro that wants to test for support of a new language keyword, for example, should rely on the `AC_TRY_COMPILE' macro.  This macro can be used to attempt to compile a small program and detect a failure due to, say, a syntax error.

In any case, it is good practice when reusing macros to adhere to their publicized interface - do not rely on implementation details such as shell variables used to record the test's result unless this is explicitly mentioned as part of the macro's behavior.  Macros in the Autoconf core can, and do, change their implementation from time to time.

Reusing a macro does not imply that the macro is necessarily invoked from within the definition of your macro.  Sometimes you might just want to rely on some action performed by a macro earlier in the configuration run - this is still a form of reuse.  In these cases, it is necessary to ensure that this macro has indeed run at least once before your macro is invoked.  It is possible to state such a dependency by invoking the `AC_REQUIRE' macro at the beginning of your macro's definition.

Should you need to write a macro from scratch, the following sections will provide guidelines for writing better macros.
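Before moving on, here is a minimal sketch of the `AC_REQUIRE' idiom described above (the macro name `MY_CHECK_FEATURE' and its body are hypothetical, shown only for illustration):

     AC_DEFUN([MY_CHECK_FEATURE],
     [AC_REQUIRE([AC_PROG_CC])dnl
     AC_MSG_CHECKING([for some feature])
     # ... perform the test using $CC ...
     AC_MSG_RESULT([yes])
     ])

`AC_REQUIRE' ensures that `AC_PROG_CC' has been expanded at least once before the body of `MY_CHECK_FEATURE' runs, without expanding it a second time if `configure.in' has already invoked it.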