Tuesday, September 7, 2021

Heterogenous Maps and Dynamic Case Classes in Scala 3

Scala 3 released not too long ago, and it comes packed with lots of tools to introspect your code and generate new code. One of these tools is Mirror, a new given type that can give you information about case classes. Mirror can tell you the name of a case class, the names of its fields, and the types of its fields. This is very powerful considering that you can use it without dipping your toes into scala 3's compile time reflection, and it hints at being able to generate new types programatically that have similar structure to your case class. In order to generate these new types and use them like a normal case class, we need two things: Heterogenous Maps and Dynamic types.

Heterogenous Maps

A heterogenous map is one that contains keys and values of myriad types. With a heterogenous map, you can give the map a key with type K, and if a matching entry is found in the map, a value of type V will be returned. A heterogenous map can store myriad key value type pairs without requiring they share a common subtype, and without losing type information. With a heterogenous map, code like below is possible:


These types were possible in Scala 2 with shapeless, but now they are possible in Scala 3 without any macro usage, just Scala 3's new features. For the purposes of this blog post, I will demonstrate how to create a Heterogenous Map with the following constraints:
  1. Keys must be Singleton types
  2. An HMap that already has a key has to refuse to store new data associated with said key
  3. All operations on the HMap that result in failure must do so at compile time
Our HMap will be backed with a Map[Any, Any] and will store information about its contents in a generic T that's a subtype of Tuple.


The first piece: Improved tuples

For our HMap to be of any use, we need to be able to add data to it. Scala 3 has improved tuples a great amount, and these improvements allow us to write a function for adding new data quite easily:


This new add function takes two types, K and V, and it requires that K is a subtype of Singleton (to meet constraint 1). It returns an HMap[(K,V) *: T], the new HMap with the data added. The type (K,V) *: T is tuple cons in scala 3. Just like how 5 :: list prepends 5 to a list, (5,6) *: (("A", "B")) == ((5,6), ("A", "B")). The *: operator is both a type level and regular operator in scala 3, so we can use it in types to indicate adding data to a tuple.

The second piece: Match types

The add function added to the HMap works well for adding data, but it has no guards to meet constraint 2: We do not want a key that's been stored in the HMap to be allowed to be readded with new data. If you run HMap.empty.add("hello", "world").add("hello", "goodbye"), there is no compile time error at all. Worse yet, the original data associated with "hello" has been thrown out. In order to meet constraint 2 and 3, we'll need a new capability of scala 3: Match Types.


Above, we've added a Match Type called NoMatch, and a using parameter to our add function. The NoMatch type can be used to check if there's a key match in a tuple by deconstructing it and returning the singleton types true and false. The first line of it is similar to a type definition in scala 2, except that we use the match keyword after the Tup generic. Following the first line are the cases for the transformation of the types. In the first case, if we check to see if the tuple is shaped like (K *: ? *: EmptyTuple) *: ?.  The aforementioned shape matches if the head of the tuple is a tuple pair, and the first element of that pair matches the singleton type K we'd like to add. If this case is selected, the NoMatch type transforms into the Singleton type false at compile time, as there is a match in the tuple. The second case handles if the tuple is not the empty tuple, and does not have a K singleton in the tuple at its head. This case results in a recursive usage of NoMatch, with the head of the tuple discarded, and the rest of the tuple checked for K matches. The final case only matches if the Tup generic is an EmptyTuple, indicating we have proven there were no K matches in the original tuple. This case results in the NoMatch type transforming to the singleton type true. 

We use the transformation of the NoMatch type and the using clause together to limit at compile time whether or not a key can be added to the HMap. The compiler will throw an error if evidence cannot be found that NoMatch[T,K] =:= true (for example, because NoMatch resolved to false).

These two additions to our code makes our add function complete. It obeys all 3 constraints: The keys have to be singleton types, they cannot be readded to an HMap, and trying to do so results in a compile-time error.

The third piece: More Match types, and our get functio

We are now able to add data to our HMap, and adding data the wrong way results in compile time errors. Everything is looking great, but we're about to hit our first roadblock: the get function.

The start of our get function is pretty simple. We take a k with type K, and K should be a singleton. However, how do we define the result type? The result type should be the V type associated with our key K, but that can be anywhere inside the tuple type that defines what is and isn't in our HMap. 

Furthermore, what should happen if the HMap has no data associated with the key used to query it? In a normal map we'd return an Option type, but we don't want to do that. We want compile time errors!

Our answer lies with our NoMatch type and a new, special match type:


This match type is like NoMatch, but it has a few important differences. First of all, the EmptyTuple case returns Nothing as the resulting type. The case that matches on our singleton key K also takes the value type, written v here. Finally, the last case recurses if a match was not found. This type is not used with a using clause, but as the return type for get. When get is called with a singleton key K, the tuple type of the HMap is scanned, and the value type v is returned if a match is found. Nothing indicates no match was found, but this is a result that can't happen, as get has a using clause that checks if NoMatch[T,K] resolves to false. If the user passes in a singleton type K that is not in the HMap, evidence cannot be found for NoMatch[T,K] =:= false at compile time, ensuring that trying to retrieve data from an HMap that doesn't exist in the HMap is a compile time failure once again.

Something extra: The delete function

We have our HMap defined and we can add and query it now. However, constraint 2 can be a bit pesky. Imagine you were passing a fully formed HMap and you wanted to replace some data? We can't do that because constraint 2 prevents add from replacing data automatically. However, we can add a delete function and still meet our constraints. A delete function is a bit weird though. For add, we just appended our new data to the front of our tuple of type info, but a delete operation might remove data from any part of this tuple. How do we represent such a return type? A match type once again:


This DropMatch match type returns an empty tuple when Tup is an EmptyTuple, the tail of the tuple when it encounters a k match, and (k, t) *: DropMatch[rest,K] when the head tuple is not a match. In essence it recursively scans the tuple and removes pairs that have a singleton key K. The function's return type is therefore HMap[DropMatch[T,K]], the HMap with its data tuple scrubbed of singleton K. The NoMatch using constraint also makes sure that trying to delete a singleton key K that doesn't exist in our HMap is a compile time error.

More extras: Opaque types and erased

Our HMap is in a useable state now, and we could move on from here. Or we could try to make the type as lightweight as possible. Our HMap only stores one piece of runtime data, the Map m. In scala 2, this class could be made a value class, and then we wouldn't construct an HMap instance all of the time, scala would just use the Map type at runtime. Value types had a number of shortcomings however (had to actually be instantiated in many surprising instances), and so Scala 3 has a concept that replaces Value types in many cases: Opaque types. Changing the code to use opaque types looks like below:




The line opaque type M[T<:Tuple] = Map[Any, Any] creates a new type M that only exists at compile time. This type provides extra information for our map (namely, what's in it), and this type information is erased at runtime.

Considering our opaque HMap will not result in extra object instantiation, we could call it a day here, but we could also use the experimental scala keyword erased. In add, get, and delete, we use given instances as evidence that users are doing the right things with our functions. This evidence is instantiated at runtime and passed into our functions. Because of the generic nature of this evidence, it will almost certainly result in extra object instantiation for each function call. Since we don't actually use the givens though, we can use the erased keyword. This makes the data unavailable at runtime and should in theory mean we do not need to instantiate evidence like NoMatch[T, K] =:= true.

Putting it all together: Dynamic

Now that we have our HMap, it's time to create a pseudo-case class. The Dynamic trait has existed for a long time in Scala 2, but it's not seen much use. We're in a static, strongly-typed language, and the Dynamic trait seems to throw away static typing. This is not the case however.


The Dynamic type has a special method called selectDynamic, and how you define it is pretty unconstrained. The big rule is that it has a single input parameter, which strings can be passed to. When you do a field access on a class that extends Dynamic, but said field does not exist on the class, selectDynamic is used. An example would be "a.foo". If the class A does not have a foo field, selectDynamic("foo") is used. The way selectDynamic is used here, we treat the string as a singleton, look for evidence that said singleton is available in our HMap, and if it is, we get the value associated with the singleton. In essence PseudoCaseClass(HMap.empty.add("hello", 5))is akin to a case class with the field hello defined, and said field returns 5. If you tried to pcc.hi you would get a compile-time error that there's no such field.

4 comments:

  1. This was a very informative article, thanks a lot! One question however, regarding the second constraint:
    "An HMap that already has a key has to refuse to store new data associated with said key"
    Not having run this myself, so from just reading the code, it seems to me that HMap is a bit more restrictive than that in that it doesn't seem it will allow more than one key of the same type, not just the same value. Is that correct? If so, is there a way to overcome that? If not, how is NoMatch comparing specific values of a type to prevent that? Or what am I missing?

    Thanks!

    ReplyDelete
    Replies
    1. "it doesn't seem it will allow more than one key of the same type"

      You're right that the HMap only allows one key of the same type, but it's a bit more restrictive than even that. This HMap design only allows singleton types and values to be keys. 5 is a singleton type and value, and can be used as a key in this HMap. "hello" is a singleton type and value and can be used as a key in this HMap. An instance of Any would not be useable as a key, as Any is not a Singleton type.

      "If so, is there a way to overcome that?" it depends. Part of the strength of this design is that we can determine what the result type for a get or drop at compile time. This is achieved by mapping from type to type. If we allowed non-singleton keys, then determining what the return type of get or drop should be would be impossible except at runtime, since determining which instance of Any you're dealing with would require a runtime equality check.

      Delete
    2. Ah right, the Singleton distinction. That clears it up, thanks!

      Delete
  2. This comment has been removed by the author.

    ReplyDelete