ModuleLayer#layers() and Configuration#configurations() produces non-topological ordering

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

ModuleLayer#layers() and Configuration#configurations() produces non-topological ordering

Luke Hutchison
Hi, I noticed that the method ModuleLayer#layers() uses the following code
to get the module layers in resolution order:

    /**
     * Returns an ordered stream of layers. The first element is this layer,
     * the remaining elements are the parent layers in DFS order.
     *
     * @implNote For now, the assumption is that the number of elements will
     * be very low and so this method does not use a specialized
spliterator.
     */
    Stream<ModuleLayer> layers() {
        List<ModuleLayer> allLayers = this.allLayers;
        if (allLayers != null)
            return allLayers.stream();

        allLayers = new ArrayList<>();
        Set<ModuleLayer> visited = new HashSet<>();
        Deque<ModuleLayer> stack = new ArrayDeque<>();
        visited.add(this);
        stack.push(this);

        while (!stack.isEmpty()) {
            ModuleLayer layer = stack.pop();
            allLayers.add(layer);

            // push in reverse order
            for (int i = layer.parents.size() - 1; i >= 0; i--) {
                ModuleLayer parent = layer.parents.get(i);
                if (!visited.contains(parent)) {
                    visited.add(parent);
                    stack.push(parent);
                }
            }
        }

        this.allLayers = allLayers =
Collections.unmodifiableList(allLayers);
        return allLayers.stream();
    }

This code represents a preorder DFS. The resulting order is *not* a
topological sort ordering. This can lead to an ancestral layer being listed
in the final ordering before a descendant layer. For example, consider
these four layer -> parent relationships:

    A -> [ B, C ]
    B -> [ D ]
    C -> [ D ]
    D -> [ ]

The ordering code as given in ModuleLayer will produce the order [ A, B, D,
C ], causing the layers D and C to be out of order. The order is not a
toplological sort ordering, which guarantees that all descendants of a node
are listed before the node, and all parents of the node are listed after it.

To fix this, the search order should be changed to a *reversed postorder
DFS*. The current code uses a stack to implement iterative DFS. However,
the recursive postorder DFS algorithm needs to do some work after the
recursive call returns (specifically adding the current node to the
beginning of the output ordering), so in effect, the current iterative code
cannot be directly modified to yield a postorder traversal, without having
two different types of nodes pushed onto the stack ("recurse to parent"
nodes and "return from child" nodes), effectively mimicking Continuation
Passing Style. It is simpler therefore to change the code to make it
recursive.

The list allLayers should be changed to a LinkedList, allowing nodes to be
pushed onto the beginning of the list, so that the ordering doesn't have to
be reversed at the end (as required by topological sorting). The
allLayers.add(layer) line should be moved after the iteration returns (to
give postorder traversal), and should be changed to allLayers.push(layer) (to
reverse the output ordering). Also, the loop through parent nodes should no
longer be inverted, since we're not using a FIFO stack to process parents
anymore.

The fixed algorithm is as follows:

    /** Recursively find the topological sort order of ancestral layers. */
    private void findLayerOrder(Set<ModuleLayer> visited,
            LinkedList<ModuleLayer> allLayersOut) {
        if (visited.add(this)) {
            for (int i = 0; i < parents.size(); i++) {
                ModuleLayer parent = parents.get(i);
                parent.findLayerOrder(visited, allLayersOut);
            }
            allLayersOut.push(this);
        }
    }

    /**
     * Returns an ordered stream of layers. The first element is this layer,
     * the remaining elements are the parent layers in topological sort
order.
     *
     * @implNote For now, the assumption is that the number of elements will
     * be very low and so this method does not use a specialized
spliterator.
     */
    Stream<ModuleLayer> layers() {
        if (this.allLayers == null) {
            LinkedList<ModuleLayer> allLayers = new LinkedList<>();
            findLayerOrder(new HashSet<ModuleLayer>(), allLayers);
            this.allLayers = Collections.unmodifiableList(allLayers);
        }
        return this.allLayers.stream();
    }

The same changes should also be made to Configuration#configurations() ,
since it uses the same broken DFS algorithm.

Is there anywhere else in the java.lang.module code that needs to be fixed
to properly perform topological sort?
Reply | Threaded
Open this post in threaded view
|

Re: ModuleLayer#layers() and Configuration#configurations() produces non-topological ordering

David Lloyd
On Thu, Jun 14, 2018 at 10:37 PM, Luke Hutchison <[hidden email]> wrote:

> The list allLayers should be changed to a LinkedList, allowing nodes to be
> pushed onto the beginning of the list, so that the ordering doesn't have to

A small quibble: LinkedList should normally never be used.  You can
push things on to the beginning of an ArrayDeque (it is a double-ended
queue after all).

> longer be inverted, since we're not using a FIFO stack to process parents

A stack would be LIFO.  But at any rate the above comment applies:
anything LinkedList can do, ArrayDeque can probably do better.
--
- DML
Reply | Threaded
Open this post in threaded view
|

Re: ModuleLayer#layers() and Configuration#configurations() produces non-topological ordering

Alan Bateman
In reply to this post by Luke Hutchison
On 15/06/2018 04:37, Luke Hutchison wrote:

> :
>
> This code represents a preorder DFS. The resulting order is *not* a
> topological sort ordering. This can lead to an ancestral layer being listed
> in the final ordering before a descendant layer. For example, consider
> these four layer -> parent relationships:
>
>      A -> [ B, C ]
>      B -> [ D ]
>      C -> [ D ]
>      D -> [ ]
>
> The ordering code as given in ModuleLayer will produce the order [ A, B, D,
> C ], causing the layers D and C to be out of order.
Multi parent configurations is a somewhat niche area but it is specified
to be DFS. If there someone in spec that has lead to you to assume the
above is incorrect? I'm just wondering if there is wording that need to
be improved or fixed somewhere.

-Alan
Reply | Threaded
Open this post in threaded view
|

Re: ModuleLayer#layers() and Configuration#configurations() produces non-topological ordering

Luke Hutchison
(Sorry Alan for the dup -- I forgot to Reply-All to the list)

On Fri, Jun 15, 2018 at 8:50 AM Alan Bateman <[hidden email]>
wrote:

> > The ordering code as given in ModuleLayer will produce the order [ A, B,
> D,
> > C ], causing the layers D and C to be out of order.
> Multi parent configurations is a somewhat niche area but it is specified
> to be DFS. If there someone in spec that has lead to you to assume the
> above is incorrect? I'm just wondering if there is wording that need to
> be improved or fixed somewhere.
>

The alternative algorithm I suggested is still DFS -- it is just
postorder+reversed in its output, rather than pretorder.

It is not in reference to the spec that I make this suggestion, it is due
to the logical argument that the existing code is written not just to
handle tree-structured multi-parenting, but fully DAG-structured ancestral
graphs -- and because the existing code appears to attempt to ensure that
(1) a node is listed before its parent in the output, and (2) if there are
multiple parents, those parents are listed in the same relative order in
the output. These two criteria together would turn the partial ordering of
the DAG into a total ordering without any ambiguity. However, criterion (1)
is violated by the current code in DAG-structured ancestral graphs, in all
but the first directed path to the most distant common ancestor node.

I'm assuming that since more than one module may define classes in the same
package, it is possible to have the same class defined multiple times in
different layers. Therefore, if the intent of specifying that DFS should be
used for module resolution is that class definitions in shallower layers
should mask definitions of the same class in deeper layers, then
topological ordering does in fact matter -- because otherwise, as shown in
the example I gave, it is possible for deeper layers to be reached before
shallower layers.

On Fri, Jun 15, 2018 at 8:26 AM David Lloyd <[hidden email]> wrote:

> > The list allLayers should be changed to a LinkedList, allowing nodes to
> be
> > pushed onto the beginning of the list, so that the ordering doesn't have
> to
>
> A small quibble: LinkedList should normally never be used.  You can
> push things on to the beginning of an ArrayDeque (it is a double-ended
> queue after all).
>

Thank you, I always assumed that ArrayDeque had O(size()) time for
insertion at the head -- but I just looked at the implementation and it's
O(1) -- very clever :-)


On Fri, Jun 15, 2018 at 8:50 AM Alan Bateman <[hidden email]>
wrote:

> On 15/06/2018 04:37, Luke Hutchison wrote:
> > :
> >
> > This code represents a preorder DFS. The resulting order is *not* a
> > topological sort ordering. This can lead to an ancestral layer being
> listed
> > in the final ordering before a descendant layer. For example, consider
> > these four layer -> parent relationships:
> >
> >      A -> [ B, C ]
> >      B -> [ D ]
> >      C -> [ D ]
> >      D -> [ ]
> >
> > The ordering code as given in ModuleLayer will produce the order [ A, B,
> D,
> > C ], causing the layers D and C to be out of order.
> Multi parent configurations is a somewhat niche area but it is specified
> to be DFS. If there someone in spec that has lead to you to assume the
> above is incorrect? I'm just wondering if there is wording that need to
> be improved or fixed somewhere.
>
> -Alan
>
Reply | Threaded
Open this post in threaded view
|

Re: ModuleLayer#layers() and Configuration#configurations() produces non-topological ordering

Luke Hutchison
Hi Alan, I was just wondering if you could please comment on this issue I
raised:

On Mon, Jun 25, 2018 at 6:00 PM Luke Hutchison <[hidden email]> wrote:

> I'm assuming that since more than one module may define classes in the
> same package, it is possible to have the same class defined multiple times
> in different layers. Therefore, if the intent of specifying that DFS should
> be used for module resolution is that class definitions in shallower layers
> should mask definitions of the same class in deeper layers, then
> topological ordering does in fact matter -- because otherwise, as shown in
> the example I gave, it is possible for deeper layers to be reached before
> shallower layers.
>

My specific questions are:

(1) Is it indeed possible for a class to be defined multiple times
(potentially differently) in different module layers, using the same
fully-qualified class name; and

(2) If this is possible, are the intended semantics in this case for the
class definition in descendant (child) layers to mask the class definition
in ancestral (parent) layers -- similarly to how a class definition found
earlier in the traditional classpath will mask other definitions of the
same class later in the classpath?