How to Design Data Definitions (HtDD)

This web page summarizes the process of developing a data definition, with a particular focus on the interaction between the different shapes of domain information, program data, test cases and templates.


Data Definitions

A data definition establishes the relationship between problem domain and program, explaining how we intend to represent(/interpret) information(/data):

  • Information in the problem domain is represented by data in the program.
  • Data in the program can be interpreted as information in the problem.

A data definition must describe how to make data that satisfies the data definition and how to tell whether a data value satisfies the data definition. It must also describe how to interpret a data value as information in the program's domain.

Data definitions are the foundation for the HtDP process:

  • Data definitions for the parameters are required to write the signature of a function.
  • A data definition for the WorldState is required to design a World program.
  • In addition, the data definition interacts with the How to Design Functions (HtDF) process for the tests (functional examples) and the template/inventory. The shape of the data definition determines how many tests are required to achieve complete coverage; it also determines the template or inventory of what the body of the function has to work with.

Data definitions come in a number of different kinds, and each kind has different implications for the HtDF recipe. This column goes through the different kinds of data definition, the column to the right discusses how each kind affects the HtDF recipe.

A data definition consists of four or five parts:

  • A possible structure definition (define-struct) (after week 3)
  • A types comment that describes how the information is represented as data.
  • An interpretation comment that describes how data of that form should be interpreted as information.
  • One or more examples of the data.
  • A template for a function operating on the data.

One of the most important points of this course is that:

  • the form of information in the problem's domain determines the form of the data definition,
  • which in turn determines the form of the examples and the templates,
  • and therefore much of the final program design.

We will study several forms of information and data in the course. The remainder of this page presents those different forms of data in the order we will cover them.


  1. Simple Atomic Data
  2. Enumerations
  3. Intervals
  4. Itemizations (Mixed Data/Unions)
  5. Compound data (Structures)
  6. References to other data definitions
  7. Self-referential or recursive


This column indicates connections between a data definition and the process of a designing function to process that data. As refernce, here are the steps again of:

How to Design Functions (HtDF)

Maintain a wish list of desired functions, starting with your initial one.

Recall, the design of a function proceeds in several steps.

  1. Signature, purpose statement, and stub.
  2. Set up functional examples using check-expect, check-error...
  3. Template/inventory. What does the function have to work with?
  4. Code the body. While doing so add new functions to the wish lists to follow the one function per task rule.
  5. Test and debug until working.

As the structure of the data definition changes, the main affected parts of the HtDF recipe are the examples and the template.

Simple Atomic Data

Use simple atomic data when the information to be represented is itself atomic in form, such as the elapsed time since the start of the animation or the x coordinate of a car.

;; Time is a (natural) Number
;; interpretation: ticks since start of game

(define START-TIME 0)
(define OLD-TIME (* 60 60 24 100)) ; 100 years of seconds

(define (time-func t)
  (... t ...))

Template: For simple atomic data the template just includes the parameter.

Examples/Tests: For simple atomic data at least one test case will be required. Be on the lookout for cases where a number is an interval in disguise (see below).

When creating examples/tests for a specific function, write at least two test cases.


Use an enumeration when the information to be represented consists of a fixed number of distinct values, such as colors, letter grades etc. In the case of enumerations it is sometimes redundant to provide an interpretation and nearly always redundant to provide examples. The example below includes the interpretation but not the examples.

;; LightState is one of:
;;  - "red"
;;  - "yellow"
;;  - "green"
;; interp. the current color of the light

(define (light-state-func a-ls)
  (cond [(string=? "red" a-ls)  (...)]
        [(string=? "yellow" a-ls) (...)]
        [(string=? "green" a-ls) (...)]))

Template: For enumerations the template is a cond with one case for each case of the enumeration. Use equality predicates (string=?, key=?, =) to construct the questions of each clause.

Note that some enumerations contain a very large number of elements. A canonical example is KeyEvent provided as part of big-bang. KeyEvent includes all the letters of the alphabet as well as other keys you can press on the keyboard. It is not necessary to write out all the cases for such a data definition. Instead write one or two, as well as a comment saying what the others are, where they are defined etc.

Defer writing templates for such large enumerations until a template is needed for a specific function. At that point include the specific cases that function cares about. Be sure to include an else clause in the template to handle the other cases. As an example, some functions operating on KeyEvent may only care about the space key and justignore all other keys, the following would be an appropriate template for such functions.

Examples/Tests: Functions operating on enumerations should generally have at least as many tests as there are cases in the enumeration.

When creating tests and templates for a particular function include the specific cases that the particular function cares about. Be sure to include an else case to handle the others in the template and to test the else case.


Use an interval when the information to be represented is numbers within a certain range. Integer[0, 10] is all the integers from 0 to 10 inclusive; Number[0, 10) is all the numbers from 0 inclusive to 10 exclusive.

Intervals often appear in itemizations (see below), but can also appear alone, as in:

;; Countdown is an Integer[0, 10]
;; interp. the current point in a countdown clock

(define CLOCK-START 10)
(define CLOCK-MIDDLE 5)
(define CLOCK-END 0)

(define (countdown-func a-cd ...)
  (... a-cd ...))

Template: For the template of an itemization of multiple intervals use a cond with one clause for each interval. As with enumerations, some functions may only care about some of the subclasses of an interval data definition, in which case the tests and cond clauses are reduced.

When writing tests for functions operating on intervals, include cases for the endpoints of closed intervals as well as between the endpoints.

Itemizations (Mixed Data/Unions)

An itemization describes data comprised of two or more subclasses, at least one of which is not a distinct data item (as opposed to enumerations in which the subclasses are all distinct data items). In an itemization the template is similar to that for enumerations: a cond with one clause per subclass. Use an appropriate predicate to construct a question for each clause - isolating the corresponding case of the data definition. In cases where the subclass of data has its own data definition (i.e. there is an "arrow"), the answer part of the cond clause includes a call to a helper template, in other cases it just includes the parameter.

;; Location is one of:
;;  - Number
;;  - Posn
;; interp. location on a number line or a cartesian plane

(define LOC1 3)
(define LOC2 (make-posn 2 3)) 

(define (location-func a-loc ...)
  (cond [(number? a-loc) (... a-loc ...)]
        [(posn? a-loc) (... (posn-func a-loc) ...)]))

A common case is for the itemization to be comprised of two or more intervals. In this case many functions operating on the data definition will want to test all the boundaries and points between the boundaries which may make it worthwhile to define examples for all such values, but that is not absolutely necessary.

;; Reading is one of:
;;  - Number[0, 10)     low
;;  - Number[10, 20)    medium
;;  - Number[20, 30]    high
;; interp. the reading from the pressure gauge, high/medium/low as above

(define R1 0)
(define R2 5)
(define R3 10)
(define R4 15)
(define R5 20)
(define R6 25)
(define R7 30)

(define (reading-func r ...)
  (cond [(and (<= 0 r) (< r 10))   (... r ...)]
        [(and (<= 10 r) (< r 20))  (... r ...)]
        [(and (<= 20 r) (<= r 30)) (... r ...)]))


Functional examples/test cases for itemizations should include an example for each subclass of data.


Use structures when two or more values naturally belong together. The structure definition goes at the beginning of the data definition. The template for function operating on structures consists of selectors for all the structure's fields.

(define-struct ball (x y dx dy))
;; Ball is (make-ball Number Number Number Number)
;; interp. a ball at position x, y moving with velocity dx, dy

(define BALL-1 (make-ball 0 10 1 3))

(define (ball-func a-ball ...)
  (... (ball-x a-ball) ...
   ... (ball-y a-ball) ...
   ... (ball-dx a-ball) ...
   ... (ball-dy a-ball) ...))

References to other data definitions

Some data definitions contain references to other data definitions you have defined. (Imagine an arrow to the other data definition.) One common case is for a compound data definition to comprise other named data definitions. (Or, once lists are introduced, for a list to contain elements that are described by another data definition.) In these cases the template of the first data definition should contain calls to the second data definition's template function wherever the second data appears. For example:

(define-struct game (ball score))
;; Game is (make-game Ball Number)
;; interp. the current ball and score of the game

(define GAME-1 (make-game (make-ball 1 5 7 11) 2))

(define (game-func a-game ...)
  (... (ball-func (game-ball a-game)) ...
   ... (game-score a-game) ... ))



Self-referential or recursive

When the information in the problem domain is of arbitrary size, a self-referential data definition is needed.

In order to be valid, a self-referential data definition must:

  1. have at least two clauses, and
  2. at least one clause must not refer back to the data definition itself.

(At least one clause must refer back to the data definition itself or else it is not self-referential.)

The template contains a base case corresponding to the non-self-referential clause(s) as well as one or more natural recursions corresponding to the self-referential clauses.

;; ListOfString is one of:
;;  - empty
;;  - (cons String ListOfString)
;; interp. a list of strings

(define LOS-1 empty)
(define LOS-2 (cons "a" empty))
(define LOS-3 (cons "b" (cons "c" empty)))

(define (los-func a-los)
  (cond [(empty? a-los) (...)]                       ; BASE CASE
        [(cons? a-los) (... (first a-los) ...
                        ... (los-func (rest a-los)) ...)])) ; NATURAL RECURSION  

Notice that the above template contains:

  • elements of the itemization template (since the data definition has the form of an itemization)
  • elements of the structure template, since conses are like structures (cons is like make-pair, first is like pair-first, rest is pair-rest)
  • the natural recursion which is particular to recursive data definitions

In some cases a types comment can have both self-reference and reference to another type.

(define-struct dot (x y))
;; A Dot is (make-dot Integer Integer)
;; interp. A dot on the screen, w/ x and y coordinates.

(define D1 (make-dot 10 30))

(define (dot-func a-dot)
  (... (dot-x a-dot) ...
   ... (dot-y a-dot) ...))

;; A ListOfDot is one of:
;;  - empty
;;  - (cons Dot ListOfDot)
;; interp. a list of Dot

(define LOD1 empty)
(define LOD2 (cons (make-dot 10 20) (cons (make-dot 3 6) empty)))

(define (lod-func a-lod)
  (cond [(empty? a-lod) (...)]
        [(cons? a-lod)
         (... (dot-func (first a-lod))
              (lod-func (rest a-lod)))]))
Provide examples/test cases for each clause of the data definition. Use examples that extend the data one level to help you understand what the natural recursion gives you ("magically"), and what you need to do to piece together the dots.


(This document adapted from Gregor Kiczales, UBC CPSC 110 with permission.)
Printable copy