Type System

Programming languages usually come with a type-system, a term for some algebraic structure whose elements are the types of data that can be manipulated in the language, together with a mapping from the set of objects involved in defining the semantics of the language into the typesystem.

Static vs Dynamic Typing

When the types are known in advance and decisions about them are made at compile-time, the typing is said to be static; when the types can be known only when code is executed at run-time, the typing is said to be dynamic. (Some people from the Scheme community talk about "latent" vs "manifest" types rather than static vs dynamic types, but this vocabulary is not widely recognized.)

Strong vs Weak Typing

When the types detected or declared are strictly enforced by the language's semantics, the language is strongly-typed; when the semantics of the language allows for inconsistencies between the compile-time type and the run-time type, the language is weakly-typed. Sometimes, languages are weakly-typed at one level, but strongly-typed at another. Weak typing means that the invariants (program properties) expressible in the type system are (sometimes much) weaker.

Type Declaration vs Type Inference

Some typed language such as Pascal or Java require that types be explicitly declared for all functions and variables in a program. Other typed languages, such as ML or Haskell, will infer the type of an expression depending on its structure and context, without the need for programmer declaration, except maybe in difficult or ambiguous cases. This is also referred to as explicit vs implicit static typing.

Orthogonal vs Interdependent Typing

In some languages (notably CAML, or Cecil), the semantics and the typesystem can be given independently, the typesystem serving as a restriction of the set of valid programs, that allows eliminating bugs or enabling various compiler optimizations. The typesystem could then be called orthogonal to the dynamic semantics. In other languages (notably Lisp, the C or Java), it is impossible to define the dynamic semantics of the language without involving its typesystem; the typesystem could thus be called "interdependent" with the dynamic semantics of the language. These terms are not very common, as far as I (Fare) know.

Discussion

Strong type systems allow catching a lot of common programming mistakes (typos, thinkos, lack of propagation of program modifications) that usually require a lot of debugging time to catch. static type systems particularly allow catching them quickly, as early as compile-time, without having to even worry about them. Type inference allows for such bug-catching to have no associated development costs, as long as the typesystem is powerful enough to express the considered program without the need for abuse or contortions.

Some argue that you can always determine static or dynamic information about a program, and call it is type. For them, the distinction is between trivial static typing and rich static typing, between trivial dynamic typing and rich dynamic typing. Trivial here means that the compiler (resp. the runtime system) cannot use static (resp. dynamic) information about an expressions' type so as to avoid checks and make early decisions about program behaviour, whereas rich means that it can.

Often, otherwise strongly typed languages have escapes allowing for weak typing (pointer/array dereference in the C or C++, typecasting in C, C++ or Java; magic in OCAML). Such languages can usually still be neatly divided into weakly typed and strongly typed. Indeed, in the case of C and C++, the weak typing features are essential to any reasonably sized project, and their implicit rather than explicit character prevents any easy syntactic distinction between programs that abuse or don't abuse the type-system; hence C and C++ are definitely weakly-typed languages. In the case of OCAML, such abuse is not an essential feature to normal user programs, and made explicit through the use of the keyword "magic" (or of special linking to routines written in C or other languages) that can be easily detected and avoided, - so OCAML is definitely a strongly-typed language. In the case of Java, the "weaknesses" in the type-system are explicit typecasts (but a runtime check ensures that an object can only be cast to a superclass of its class, of which it is a member, and for which there can be no static method discrepancy), and by the special NULL pointer (but the semantics again is well defined, although different for "static" and other methods); Java is thus strongly typed, although the invariants attached to its types may be less obvious and less nicely explained than in other strongly typed languages.

On a different note, otherwise statically typed languages sometimes have escapes that allow for dynamic typing. For instance the CLOS MOP, or RTTI in C++, Java or Mercury, allow programs to inspect and modify objects whose type is not statically known. This enables the development of some generic software infrastructure, but at the same time forces the implementation to keep at runtime information that might otherwise have been thrown away at compile-time (as it is in OCAML for instance), with all the associated space and time overhead. dynamically typed languages have this overhead, anyway.

Inferred Static Strong Types, such as exist in ML, Haskell, Clean or Mercury, are those that bring the most information on programs out of the least programmer efforts, allowing to express the stronger and richer invariants. Because they do not usually require any type declaration, they remove the burden that people used to dynamic programming languages usually feel about rich typing. However, because of language design and implementation issues (notably the simplicity of the typesystem and the guarantee that the process of typing will terminate promptly), most languages with strong static inferred types also restrict the expressiveness of their typesystem in such ways that advanced users of dynamically typed language will find them impossible to use (notably those who like dynamic interactive systems and/or Lisp macros). A notable exception, Cayenne, is an extension Haskell with dependent types (types that depend on values), that allows to express arbitrary computations within the type system at compile-time; unhappily, it is more designed as a proof of concept than as a really usable programming language.

Compilers for languages without a rich static type system can nevertheless infer types from programs that follow common programming styles, and use these types to produce more efficient code, or emit warnings that help catch bugs. A case in point is CMUCL (a Common Lisp compiler). Common Lisp, a strongly dynamically typed language, allows the programmer to declare static types for expressions, so as to help the compiler. CMUCL takes advantage of these declarations as well as of information it can infer from the language's semantics so as to construct precise types for expressions and produce efficient code that avoids runtime type dispatch. The situation of CMUCL is rather peculiar, since, depending on its the programmer-accessible safety settings, the declarations can be used either to tighten the type-safety of the language, or on the contrary to make it weakly-typed and provide an escape for low-level optimizations. Because this escape requires explicit and easily toggled safety settings, and is not essential to normal programs, CMUCL, like Common Lisp in general, is still a strongly dynamically typed language.

Dylan, a fully object-oriented language in the Lisp/Scheme family, has a typing system which operates in a similar fashion, except there is no facility to disable type-checking. In Dylan, all entities are instances of some subclass of a root class which is named <object>. By default, variables and method parameters are assumed only to be an instance of <object>, but the user may specialize variables and method parameters on classes (or more-sophisticated types, such as union types) if they desire. The Dylan compiler will use such user-supplied type information to determine correctness at compile/run-time and to optimize method calls.


Pages in this topic: Dependent Types   Type Systems Varieties  


Also linked from: Axiom   Credits   DML   EuLisp   Felix   Goedel   Gont   Hume   Lego   Luca Cardelli   Self   Sheep   Static   Stella language   Vault   Yarrow