
25.3.1 Text vs Binary Modes

As discussed in 15.3.5.1 Text and Binary Files, text and binary files are different on Windows. Lines in a Windows text file end in a carriage return/line feed pair, but a C program reading the file in text mode will see a single line feed.
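
To make the translation concrete, here is a minimal sketch of what a text-mode read effectively does to the bytes on disk. The helper name `crlf_to_lf' is hypothetical and is not part of any C library; a real implementation performs this step inside stdio rather than on a caller's buffer.

```c
#include <stddef.h>

/* Collapse every CR LF pair in buf[0..len) to a single LF, in place,
   and return the new length.  A CR that is not followed by LF is
   left untouched.  Hypothetical helper, for illustration only.  */
size_t
crlf_to_lf (char *buf, size_t len)
{
  size_t in, out = 0;

  for (in = 0; in < len; ++in)
    {
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
        continue;               /* drop the CR; the LF is copied next */
      buf[out++] = buf[in];
    }
  return out;
}
```

Given the ten bytes of "one\r\ntwo\r\n", the function leaves the eight bytes of "one\ntwo\n" at the start of the buffer, which is what a program reading the same file in text mode would see.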

Cygwin has several ways to hide this dichotomy, and the solution(s) you choose will depend on how you plan to use your program. I will outline the relative tradeoffs you make with each choice:

mounting
Before installing an operating system to your hard drive, you must first organise the disk into partitions. Under Windows, you might only have a single partition on the disk, which would be called `C:'. Provided that some media is present, Windows allows you to access the contents of any drive letter -- that is, you can access `A:' when there is a floppy disk in the drive, and `F:' provided you divided your available drives into sufficient partitions for that letter to be in use. With Unix, things are somewhat different: hard disks are still divided into partitions (typically several), but there is only a single filesystem mounted under the root directory. You can use the mount command to hook a partition (or floppy drive or CD-ROM, etc.) into a subdirectory of the root filesystem:

 
$ mount /dev/fd0 /mnt/floppy
$ cd /mnt/floppy

Until the directory is unmounted, the contents of the floppy disk will be available as part of the single Unix filesystem in the directory, `/mnt/floppy'. This is in contrast with Windows' multiple root directories which can be accessed by changing filesystem root -- to access the contents of a floppy disk:

 
C:\WINDOWS\> A:
A:> DIR
...

Cygwin has a mounting facility to allow Cygwin applications to see a single unified file system starting at the root directory, by mounting drive letters to subdirectories. When mounting a directory you can set a flag to determine whether TEXT and BINARY mode files in that partition are treated identically. Mounting a file system to treat TEXT files the same as BINARY files means that Cygwin programs can behave in the same way as they might on Unix and treat all files as equal. Mounting a file system to treat TEXT files properly will cause Cygwin programs to translate between Windows CR-LF line end sequences and Unix LF line endings, which plays havoc with file seeking, and with the many programs which make assumptions about the size of a char in a FILE stream. `binmode' is the default method because it is the only way to interoperate between Windows binaries and Cygwin binaries. You can get a list of which drive letters are mounted to which directories, and the modes they are mounted with, by running the mount command without arguments:

 
BASH.EXE-2.04$ mount
Device              Directory            Type        flags
C:\cygwin           /                    user        binmode
C:\cygwin\bin       /usr/bin             user        binmode
C:\cygwin\lib       /usr/lib             user        binmode
D:\home             /home                user        binmode

As you can see, the Cygwin mount command allows you to `mount' arbitrary Windows directories as well as simple drive letters into the single filesystem seen by Cygwin applications.
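
The seeking problem mentioned above comes from the fact that, under text-mode translation, a character's position as the program sees it no longer matches its byte offset on disk. The following sketch, using a hypothetical helper named `disk_offset', maps one to the other for a buffer holding the raw contents of a file; it is an illustration of the mismatch, not code taken from Cygwin.

```c
#include <stddef.h>

/* raw[0..len) holds the bytes of a file exactly as stored on disk.
   Return the on-disk offset of the character that a text-mode reader
   would see at logical position text_pos, or len if text_pos lies
   beyond the end of the translated text.  Hypothetical helper.  */
size_t
disk_offset (const char *raw, size_t len, size_t text_pos)
{
  size_t i, seen = 0;

  for (i = 0; i < len; ++i)
    {
      /* The CR of a CR LF pair is invisible to a text-mode reader. */
      if (raw[i] == '\r' && i + 1 < len && raw[i + 1] == '\n')
        continue;
      if (seen == text_pos)
        return i;
      ++seen;
    }
  return len;
}
```

For the eight raw bytes "ab\r\ncd\r\n", the character `c' is at logical position 3 but at disk offset 4, and the gap grows by one for every line that precedes the position of interest. A program that computes offsets by counting the characters it has read will therefore seek to the wrong place on a text-mode mount.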

binmode
The CYGWIN environment variable holds a space-separated list of setup options which exert some minor control over the way `cygwin1.dll' (or `cygwinb19.dll' etc.) behaves. One such option is the `binmode' setting: if CYGWIN contains the `binmode' option, files which are opened through `cygwin1.dll' without an explicit text or binary mode will default to binary mode, which is closest to how Unix behaves.
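
If your own program needs to know whether the option is set, a whole-word search of the list is enough. The sketch below uses a hypothetical function named `list_has_option'; it shows one way to scan a space-separated list and makes no claim about how `cygwin1.dll' itself parses the variable.

```c
#include <string.h>

/* Return 1 if the space-separated list contains opt as a whole word,
   and 0 otherwise.  A NULL list (such as getenv returns for an unset
   variable) is treated as empty.  Hypothetical helper.  */
int
list_has_option (const char *list, const char *opt)
{
  size_t optlen = strlen (opt);

  if (list == NULL || optlen == 0)
    return 0;

  while (*list)
    {
      size_t wordlen;

      while (*list == ' ')      /* skip separators */
        ++list;
      wordlen = strcspn (list, " ");
      if (wordlen == optlen && strncmp (list, opt, optlen) == 0)
        return 1;
      list += wordlen;
    }
  return 0;
}
```

A caller would typically write list_has_option (getenv ("CYGWIN"), "binmode") after including <stdlib.h>. Matching whole words matters here: a plain substring search would wrongly report a match inside a longer word that merely contains `binmode'.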

system calls
`cygwin1.dll', GNU libc and other modern C API implementations accept extra flags for fopen and open calls to determine in which mode a file is opened. On Unix the flags make no difference, and sadly most Unix programmers are not aware of this subtlety, so this tends to be the first thing that needs to be fixed when porting a Unix program to Cygwin. The best way to use these calls portably is to use the following macros in a package's `configure.in' to be sure that the extra arguments are available:

 
# _AB_AC_FUNC_FOPEN(b | t, USE_FOPEN_BINARY | USE_FOPEN_TEXT)
# -----------------------------------------------------------
define([_AB_AC_FUNC_FOPEN],
[AC_CACHE_CHECK([whether fopen accepts "$1" mode], [ab_cv_func_fopen_$1],
[AC_TRY_RUN([#include <stdio.h>
int
main ()
{
   FILE *fp = fopen ("conftest.bin", "w$1");
   fprintf (fp, "\n");
   fclose (fp);
   return 0;
}],
            [ab_cv_func_fopen_$1=yes],
            [ab_cv_func_fopen_$1=no],
            [ab_cv_func_fopen_$1=no])])
if test x$ab_cv_func_fopen_$1 = xyes; then
  AC_DEFINE([$2], 1,
            [Define this if we can use the "$1" mode for fopen safely.])
fi[]dnl
])# _AB_AC_FUNC_FOPEN

# AB_AC_FUNC_FOPEN_BINARY
# -----------------------
# Test whether fopen accepts a "b" in the mode string for binary file
# opening.  This makes no difference on most unices, but some OSes
# convert every newline written to a file to two bytes (CR LF), and
# every CR LF read from a file is silently converted to a newline.
AC_DEFUN([AB_AC_FUNC_FOPEN_BINARY], [_AB_AC_FUNC_FOPEN(b, USE_FOPEN_BINARY)])

# AB_AC_FUNC_FOPEN_TEXT
# ---------------------
# Test whether fopen accepts a "t" in the mode string for text file
# opening.  This makes no difference on most unices, but other OSes
# use it to assert that every newline written to a file writes two
# bytes (CR LF), and every CR LF read from a file is silently
# converted to a newline.
AC_DEFUN([AB_AC_FUNC_FOPEN_TEXT],   [_AB_AC_FUNC_FOPEN(t, USE_FOPEN_TEXT)])


# _AB_AC_FUNC_OPEN(O_BINARY|O_TEXT)
# ---------------------------------
AC_DEFUN([_AB_AC_FUNC_OPEN],
[AC_CACHE_CHECK([whether fcntl.h defines $1], [ab_cv_header_fcntl_h_$1],
[AC_EGREP_CPP([$1],
              [#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
$1
],
              [ab_cv_header_fcntl_h_$1=no],
              [ab_cv_header_fcntl_h_$1=yes])
if test "x$ab_cv_header_fcntl_h_$1" = xno; then
  AC_EGREP_CPP([_$1],
               [#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
_$1
],
                [ab_cv_header_fcntl_h_$1=0],
                [ab_cv_header_fcntl_h_$1=_$1])
fi])
if test "x$ab_cv_header_fcntl_h_$1" != xyes; then
  AC_DEFINE_UNQUOTED([$1], [$ab_cv_header_fcntl_h_$1],
    [Define this to a usable value if the system provides none])
fi[]dnl
])# _AB_AC_FUNC_OPEN


# AB_AC_FUNC_OPEN_BINARY
# ----------------------
# Test whether fcntl.h provides O_BINARY for the flags argument of
# open, for binary file opening.  This makes no difference on most
# unices, but some OSes convert every newline written to a file to two
# bytes (CR LF), and every CR LF read from a file is silently
# converted to a newline.
#
AC_DEFUN([AB_AC_FUNC_OPEN_BINARY], [_AB_AC_FUNC_OPEN([O_BINARY])])


# AB_AC_FUNC_OPEN_TEXT
# --------------------
# Test whether fcntl.h provides O_TEXT for the flags argument of
# open, for text file opening.  This makes no difference on most
# unices, but other OSes use it to assert that every newline written
# to a file writes two bytes (CR LF), and every CR LF read from a
# file is silently converted to a newline.
#
AC_DEFUN([AB_AC_FUNC_OPEN_TEXT],   [_AB_AC_FUNC_OPEN([O_TEXT])])
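
Once the open checks have run, `config.h' supplies a harmless value of 0 for O_BINARY on systems whose `fcntl.h' has neither O_BINARY nor _O_BINARY, so sources can pass the flag unconditionally. The following sketch shows such a call. The function name `count_bytes' is hypothetical, the fallback definition is repeated inline only so that the example stands alone, and a POSIX-like environment with `unistd.h' is assumed.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* config.h would normally provide this fallback; it is repeated here
   only so that the sketch compiles on its own.  */
#ifndef O_BINARY
#  define O_BINARY 0
#endif

/* Return the number of bytes stored in the file at path, read through
   a descriptor opened in binary mode, or -1 on error.  */
long
count_bytes (const char *path)
{
  char buf[512];
  long total = 0;
  ssize_t n;
  int fd = open (path, O_RDONLY | O_BINARY);

  if (fd < 0)
    return -1;

  while ((n = read (fd, buf, sizeof buf)) > 0)
    total += n;

  close (fd);
  return n < 0 ? -1 : total;
}
```

Because the flag is ORed in, a value of 0 leaves the call unchanged on Unix, while on systems that distinguish the two modes the carriage returns are counted along with everything else.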


Add the following preprocessor code to a common header file that will be included by any sources that use fopen calls:

 
#define fopen	rpl_fopen

Save the following function to a file, and link that into your program so that in combination with the preprocessor magic above, you can always specify text or binary mode to open and fopen, and let this code take care of removing the flags on machines which do not support them:

 
#if HAVE_CONFIG_H
#  include <config.h>
#endif

#include <stdio.h>

/* Use the system size_t if it has one, or fallback to config.h */
#if STDC_HEADERS || HAVE_STDDEF_H
#  include <stddef.h>
#endif
#if HAVE_SYS_TYPES_H
#  include <sys/types.h>
#endif

/* One of the following headers will have prototypes for malloc
   and free on most systems.  If not, we don't add explicit
   prototypes which may generate a compiler warning in some
   cases -- explicit  prototypes would certainly cause
   compilation to fail with a type clash on some platforms. */
#if STDC_HEADERS || HAVE_STDLIB_H
#  include <stdlib.h>
#endif
#if HAVE_MEMORY_H
#  include <memory.h>
#endif

#if HAVE_STRING_H
#  include <string.h>
#else
#  if HAVE_STRINGS_H
#    include <strings.h>
#  endif /* !HAVE_STRINGS_H */
#endif /* !HAVE_STRING_H */

#if ! HAVE_STRCHR

/* BSD based systems have index() instead of strchr() */
#  if HAVE_INDEX
#    define strchr index
#  else /* ! HAVE_INDEX */

/* Very old C libraries have neither index() nor strchr() */
#    define strchr rpl_strchr

static inline const char *strchr (const char *str, int ch);

static inline const char *
strchr (const char *str, int ch)
{
  const char *p = str;
  while (p && *p && *p != (char) ch)
    {
      ++p;
    }

  return (*p == (char) ch) ? p : 0;
}
#  endif /* ! HAVE_INDEX */

#endif /* ! HAVE_STRCHR */

/* BSD based systems have bcopy() instead of strcpy() */
#if ! HAVE_STRCPY
# define strcpy(dest, src)        bcopy(src, dest, strlen(src) + 1)
#endif

/* Very old C libraries have no strdup(). */
#if ! HAVE_STRDUP
# define strdup(str)                strcpy(malloc(strlen(str) + 1), str)
#endif

FILE *
rpl_fopen (const char *pathname, const char *mode)
{
    FILE *result = NULL;
    const char *p = mode;

    /* Scan to the end of mode until we find 'b' or 't'. */ 
    while (*p && *p != 'b' && *p != 't')
      {
        ++p;
      }

    if (!*p)
      {
        fprintf(stderr,
            "*WARNING* rpl_fopen called without mode 'b' or 't'\n");
      }

#if USE_FOPEN_BINARY && USE_FOPEN_TEXT
    result = fopen(pathname, mode);
#else
    {
        char ignore[3]= "bt";
        char *newmode = strdup(mode);
        char *q       = newmode;

        p = newmode;

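#  if USE_FOPEN_TEXT
        /* "t" is accepted, so only "b" needs to be removed. */
        strcpy(ignore, "b");
#  endif
#  if USE_FOPEN_BINARY
        /* "b" is accepted, so only "t" needs to be removed. */
        strcpy(ignore, "t");
#  endif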

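        /* Copy characters from mode to newmode missing out
           b and/or t.  Each character is tested only while *p
           is non-zero, because strchr also matches the NUL
           that terminates ignore. */
        while (*p)
          {
            if (!strchr(ignore, *p))
              {
                *q++ = *p;
              }
            ++p;
          }
        *q = '\0';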

        result = fopen(pathname, newmode);

        free(newmode);
    }
#endif /* USE_FOPEN_BINARY && USE_FOPEN_TEXT */

    return result;
}

The correct operation of the file above relies on several things having been checked by the configure script, so you will also need to ensure that the following macros are present in your `configure.in' before you use this code:

 
# configure.in -- Process this file with autoconf to produce configure
AC_INIT(rpl_fopen.c)

AC_PROG_CC
AC_HEADER_STDC
AC_CHECK_HEADERS(string.h strings.h, break)
AC_CHECK_HEADERS(stdlib.h stddef.h sys/types.h memory.h)

AC_C_CONST
AC_C_INLINE
AC_TYPE_SIZE_T

AC_CHECK_FUNCS(strchr index strcpy strdup)
AB_AC_FUNC_FOPEN_BINARY
AB_AC_FUNC_FOPEN_TEXT
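
With the checks and the replacement function in place, the rest of the package can state the mode it wants in every fopen call. The sketch below shows what such a caller might look like. The function name `copy_binary' is hypothetical, and plain fopen is used so that the example stands alone; in a real package the common header would redirect these calls to rpl_fopen.

```c
#include <stdio.h>

/* Copy the file src to dst byte for byte, naming binary mode
   explicitly in both fopen calls.  Returns 0 on success and -1 on
   failure.  Hypothetical helper, for illustration only.  */
int
copy_binary (const char *src, const char *dst)
{
  char buf[4096];
  size_t n;
  int status = 0;
  FILE *in = fopen (src, "rb");
  FILE *out;

  if (in == NULL)
    return -1;

  out = fopen (dst, "wb");
  if (out == NULL)
    {
      fclose (in);
      return -1;
    }

  while ((n = fread (buf, 1, sizeof buf, in)) > 0)
    {
      if (fwrite (buf, 1, n, out) != n)
        {
          status = -1;
          break;
        }
    }

  if (ferror (in))
    status = -1;
  fclose (in);
  if (fclose (out) != 0)
    status = -1;

  return status;
}
```

Naming the mode in both calls means the copy is exact whichever way the file system is mounted and whatever the CYGWIN variable contains, and the same source still builds on systems where the extra mode character has to be stripped.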


This document was generated by Joost van Baal on August, 23 2005 using texi2html