Published: April 19 2016

General purpose OO languages - like Java or C# - come with a library for collection type objects like arrays, lists, and sets. If you throw in a static type system those languages usually differentiate between collection interfaces (List, Queue, Set) and implementing classes (ArrayList, PriorityQueue, HashSet). It's considered good practice to only expose the interface types in your API and use a suitable implementation under the hood. So far so good.

The Problem

Sometimes however, when designing an API, you feel there's an interface missing that would better epress the semantics of what you want to communicate. One of the more common examples is an OrderedSet. You want to specify that the collection you return contains any value only once: set semantics. But you also want to tell the user in which order those values should be processed. Here's a concrete example taken from JUnit 5's TestIdentifier:

public class TestIdentifier...
    public Set getTags() {
      return tags;
    }

Tests, which can usually be mapped to a method in a test class, may have tags. Those tags can come from an annotation on the method itself, but also from an annotation at class level or even from a superclass. Since we can imagine tags that contradict each other, e.g. "fast" vs "slow", a consumer of <code>getTags</code> is interested in knowing the order in which those tags were added - from superclass to class to method itself - so that they can derive tag evaluation priority. In Java there even exists a set implementation that provides a predictable order of values added to a set: LinkedHashSet. All that seems missing is an OrderedSet interface.

There is no such thing as an unambiguous ordered set

Neither Java nor C# provide an ordered set interface and there is a very good reason for it: Set semantics and ordering semantics do not go well together. Here is why:

  • Set semantics require that any value in a set can be there only once.
  • Order by entry semantics require that a) any value that was added earlier is ordered before any value that was added later and b) any value that was added later is ordered after any value that was added earlier.
  • Now consider the "ordered set" [a, b, c]. What's the ordering after you re-add value a. Is it still [a, b, c] or should it change to [b, c, a]?

If you want to keep set semantics in place you have to break either part a) or part b) of ordering semantics.

What's the Solution, Then?

The solution is simple but might not feel satisfactory: You have to decide for either dropping ordering in your API or keeping duplicate entries. If you can go with the former, it's a simplification of your contract and thereby reduces coupling in your code. If you go with the latter, the client of your code will have to figure out an unambiguous way how to deal with duplicate values and their order.

A variation of keeping the duplicates is to introduce a domain specific interface, e.g. PrioritizedTagSet, and define the concrete and unambiguous semantics yourself, but not for all ordered sets, just for your specific use case of it.

What you should not do is to just have a Set interface and document (in Javadoc or elsewhere) that the underlying implementation will take care of some kind of ordering. That way, you'd be hiding an important part of your contract in documentation only.

What about SortedSet?

Sorting is a different beast, although it sounds similiar. Sorting means "ordering a given collection by an external criterion". In our case we might want to display all tags in alphanumeric "order". Well, natural languages are a mess when it comes to non-ambiguity.

TL;DR

Set semantics and order by entry semantics contradict each other. Choose one.

blog comments powered by Disqus