With the help of some friends from graduate school, I recently got access to some of my thesis code. There were two main programming projects that became my thesis:

  • Java Analysis: extended existing security-typed language compiler to handle less annotations (added inference)
  • Graph Analysis: perform a graph cut, dominator computation, determine how many different cuts exist in a graph, etc.

I initially wrote the second project in OCaml (it was eventually rewritten in C++ for reasons I may go into later). Its main task was to read XML that was output from Java and transform it into a graph. Unfortunately I wrote this code in a hurry.

Going through old code is liberating: no matter how great I think that my code is great now, it sure was awful two years ago!

I’m not going to assume any knowledge of OCaml – the badness of the below code should hopefully be apparent!

How Not To Write an Interface

If you’ve never programmed in OCaml, a signature is essentially a Java interface. You apply signatures to modules and then interact with the module through the signature. (It gets a little more advanced but I won’t get into that here.)

Here’s the code for the constraint parser – it would read XML generated by the compiler and then output useful information based on that while populating a group of data structures that would be used later. (The “nv” in variable names is domain specific and not important [1].)

module type XMLCONSTRAINTREADER =
  sig
    (* data structures in signature *)
    val nvToExpMap : (string, string) Hashtbl.t
    val nvToPosMap : (string, string) Hashtbl.t
    val nvToWeightMap : (string, int) Hashtbl.t
    val nvNoDeclassify : StringSet.t ref

    (* really bad function type! *)
    val xml_file_to_constraint_set : string ->
               (Simp.cons list * Simp.LatticeGraph.t *
               ((string * int) list))
  end

Smell 1 – Global variables

Making references to the hashtable and string set in the XMLCONSTRAINTREADER basically makes them global variables.

It’s even more awkward to refer to the nvNoDeclassify variable because it’s an immutable data structure that I just created a reference to. That means that any reference to nvNoDeclassify has to be accompanied by a dereference, i.e. !nvNoDeclassify (essentially *nvNoDeclassify for you C-head).

Smell 2 – What is that function doing?

The function’s type is incomprehensible to me. I’m willing to accept that a constraint reader should return a list of constraints and a lattice graph, but what is (string * int) list also doing being returned? [2]

Why is the function called xml_file_to_constraint_set when it is taking a string? (it probably opens the file handle itself)</li>

In statically typed languages, type annotations can be invaluable in finding functions that are doing too much or doing wrong thing. A lack of clarity in the function signature can indicate a lack of clarity about the underlying functionality of the code.

Smell 3 – Why is this a signature?

Taking away all the bad data structures, we are left with a single function whose implementation is likely to be complicated. There is no data to be hidden, making a signature a bad tool to use for this particular abstraction. (This is somewhat counter to how interfaces are used in OO languages.)

If I Had To Write It Again

This is probably how I’d approach a refactoring on this signature (and its corresponding modules’) functionality.

module type PARSEDCONSTRAINTFILE =
  sig
      (* actual type of t must be defined by
         implementation (module) *)
      type t

      (* create a constraint file *)
      val create : unit -> t

      (* replace hashtables -- parsed file might not
         mention an expression *)
      val getExpression : t -> string -> string option
      val getPosition : t -> string -> string option
      val getWeight : t -> string -> string option

      (* imperative update *)
      val addExpression : t -> string -> string -> unit
      val addPosition  : t -> string -> string -> unit
      val addWeight : t -> string -> string -> unit

      (* replace set *)
      val shouldDeclassify : t -> string -> bool

      (* also need to add iterators, etc *)
  end

(* ParsedConstraintFileImpl is an implementation of
    PARSEDCONSTRAINTFILE *)

val parse_xml_constraint_file
       : file_handle -> ParsedConstraintFileImpl.t

Ideally when writing parse_xml_constraint_file that would only use features of PARSEDCONSTRAINTFILE, allowing us to put parse_xml_constraint_file into a functor, but that is a little beyond this post’s scope.

[1] In the Jif compiler/Andrew Myers’ thesis, the NV of an expression is the security label of observing its value.

[2] The returned (string * int) list turned out to be a list of procedures along with the number of constraints generated per procedure. (Heftier methods generated more constraints, as well as those higher up the call graph.)