The split package problem


The split package problem

Jochen Theodorou
Hi all,

I often read about this "split package problem", but I have never seen
a proper explanation of why it matters so much to Jigsaw that split
packages are not allowed. Can somebody enlighten me?

bye Jochen

Re: The split package problem

Remi Forax
There are two issues with split packages:
- if the same class exists in each part of the package, the behavior of your program depends on the order of the classpath.
  I've experienced this kind of bug with two different libraries requiring different versions of ASM: at runtime, a class of the older version was calling a class of the newer version :(
- security: if you allow split packages, you allow anybody to insert any classes into any package.
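
The first point can be illustrated with a toy model of class-path lookup. This is only a sketch under stated assumptions: the jar names (`lib-1.jar`, `lib-2.jar`) and class names are hypothetical, and a real class loader searches archives rather than a map. It shows that with first-match-wins lookup, one class can be taken from the old jar while another is taken from the new one.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model: the first class-path entry that contains a class wins.
final class ClassPathOrderDemo {

    // Hypothetical contents of two versions of the same library.
    static final Map<String, Set<String>> JARS = Map.of(
            "lib-1.jar", Set.of("lib.Parser", "lib.Node"),
            "lib-2.jar", Set.of("lib.Parser", "lib.Node", "lib.Visitor"));

    // Returns the first jar on the class path that contains className.
    static Optional<String> resolve(List<String> classPath, String className) {
        return classPath.stream()
                .filter(jar -> JARS.getOrDefault(jar, Set.of()).contains(className))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> oldFirst = List.of("lib-1.jar", "lib-2.jar");
        // Parser comes from the old jar, Visitor from the new one: a mixed library.
        System.out.println(resolve(oldFirst, "lib.Parser"));  // Optional[lib-1.jar]
        System.out.println(resolve(oldFirst, "lib.Visitor")); // Optional[lib-2.jar]
    }
}
```

Reversing the order of the two entries changes which `lib.Parser` is found, which is the order dependence described above.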

regards,
Rémi

----- Original Message -----
> From: "Jochen Theodorou" <[hidden email]>
> To: [hidden email]
> Sent: Friday, 4 November 2016 09:11:51
> Subject: The split package problem

> Hi all,
>
> I do often read about this "split package problem", but I never did see
> a proper explanation about why it matters to jigsaw so much that we do
> not allow it. Can somebody enlighten me?
>
> bye Jochen

Re: The split package problem

Jochen Theodorou
On 04.11.2016 09:25, Remi Forax wrote:
> There are two issues with split packages,
> - if you have the same class in each part of the package, the behavior of your problem depend on the order in the classpath,
>    i've experienced this kind of bugs with two different libraries requiring different version of ASM, at runtime, a class of the older version was calling a class of the newer version :(
> - security, if you allow split packages, you allow anybody to insert any classes in any packages.

ok, not sure if I agree that these are reason enough for the annoyance,
but at least I know the proper reason now ;)

bye Jochen

Re: The split package problem

Alan Bateman
On 04/11/2016 09:22, Jochen Theodorou wrote:

> On 04.11.2016 09:25, Remi Forax wrote:
>> There are two issues with split packages,
>> - if you have the same class in each part of the package, the
>> behavior of your problem depend on the order in the classpath,
>>    i've experienced this kind of bugs with two different libraries
>> requiring different version of ASM, at runtime, a class of the older
>> version was calling a class of the newer version :(
>> - security, if you allow split packages, you allow anybody to insert
>> any classes in any packages.
>
> ok, not sure if I agree that these are reason enough for the
> annoyance, but at least I know the proper reason now ;)
This is all part of reliable configuration where you can prove
correctness by construction. Alex's "Under the Hood" session from
JavaOne 2016 [1] is a great resource for understanding the science.

-Alan

[1] http://openjdk.java.net/projects/jigsaw/talks/#j1-2016

Re: The split package problem

Alessio Stalla
In reply to this post by Jochen Theodorou
Also, I think there is a problem related to how class loading works. In the
vanilla world of Java apps with a single classloader, if you load classes
in the same (source) package from different sources, they end up in the same
(runtime) package. But if you have different classloaders at play, and load
a class p.C1 from classloader 1 and p.C2 from classloader 2, even if they
share the same (source) package, p, they end up in different (runtime)
packages, let's call them p(1) and p(2). Therefore, C2 cannot access the
protected members of C1. That can be confusing.
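
A minimal sketch of that effect, assuming JDK 9 or later. To avoid needing a compiler, it hand-assembles the class files of two empty classes, `p.C1` and `p.C2` (both names are made up for illustration), and defines them in separate loaders. The helper `sameRuntimePackage` is our own; it encodes the rule that a runtime package is identified by the package name together with the defining class loader.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class RuntimePackageDemo {

    // A loader that defines classes itself instead of delegating to its parent.
    static final class DefiningLoader extends ClassLoader {
        Class<?> define(String binaryName, byte[] bytes) {
            return defineClass(binaryName, bytes, 0, bytes.length);
        }
    }

    // Hand-assembles the class file of an empty public class, e.g. "p/C1".
    static byte[] emptyClass(String internalName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor version
            out.writeShort(52);                                  // major version (Java 8 format)
            out.writeShort(5);                                   // constant pool count (entries 1..4)
            out.writeByte(7); out.writeShort(2);                 // #1 Class -> #2
            out.writeByte(1); out.writeUTF(internalName);        // #2 Utf8: this class
            out.writeByte(7); out.writeShort(4);                 // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8: superclass
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                   // this_class
            out.writeShort(3);                                   // super_class
            out.writeShort(0);                                   // no interfaces
            out.writeShort(0);                                   // no fields
            out.writeShort(0);                                   // no methods
            out.writeShort(0);                                   // no attributes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Class<?> define(DefiningLoader loader, String binaryName) {
        return loader.define(binaryName, emptyClass(binaryName.replace('.', '/')));
    }

    // Same runtime package = same package name AND same defining loader.
    static boolean sameRuntimePackage(Class<?> a, Class<?> b) {
        return a.getPackageName().equals(b.getPackageName())
                && a.getClassLoader() == b.getClassLoader();
    }

    public static void main(String[] args) {
        Class<?> c1 = define(new DefiningLoader(), "p.C1");
        Class<?> c2 = define(new DefiningLoader(), "p.C2");
        System.out.println(c1.getPackageName() + " / " + c2.getPackageName());
        System.out.println("same runtime package: " + sameRuntimePackage(c1, c2));
    }
}
```

Both classes report the package name `p`, yet they do not share a runtime package, so package-level access between them is denied by the JVM.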

On 4 November 2016 at 10:22, Jochen Theodorou <[hidden email]> wrote:

> On 04.11.2016 09:25, Remi Forax wrote:
>
>> There are two issues with split packages,
>> - if you have the same class in each part of the package, the behavior of
>> your problem depend on the order in the classpath,
>>    i've experienced this kind of bugs with two different libraries
>> requiring different version of ASM, at runtime, a class of the older
>> version was calling a class of the newer version :(
>> - security, if you allow split packages, you allow anybody to insert any
>> classes in any packages.
>>
>
> ok, not sure if I agree that these are reason enough for the annoyance,
> but at least I know the proper reason now ;)
>
> bye Jochen
>

Re: The split package problem

Jochen Theodorou
In reply to this post by Alan Bateman

On 04.11.2016 10:33, Alan Bateman wrote:
[...]
> This is all part of reliable configuration where you can prove
> correctness by construction. Alex's "Under the Hood" session from
> JavaOne 2016 [1] is a great resource for understanding the science.
>
> -Alan
>
> [1] http://openjdk.java.net/projects/jigsaw/talks/#j1-2016

I could now go on at length about why I do not believe in proven
correctness by construction, but that is probably too off-topic here ;)

What I mostly see is that all the problems you have now at the class
level, you will later have again at the module level... for example, needing
two versions of the same library active at the same time... and
the movement away from the static module descriptor to a dynamic module
loading system with layers and all kinds of shenanigans that will bypass
these efforts again in the end.

bye Jochen

Re: The split package problem

Alan Bateman
On 04/11/2016 10:04, Jochen Theodorou wrote:

> :
>
> What I see mostly is that all the problems you have now on a per class
> level, you later have on a per module level again... for example
> needing two versions of the same library being active at the same
> time... and the movement away from the static module descriptor to a
> dynamic module loading system with layers and all kinds of shenanigans
> that will bypass these efforts in the end again
I'm not sure what you mean here, but reliable configuration extends to
layers of modules too. When creating the configuration for a layer,
you can't have a module M that reads two other modules (irrespective of
which layer they are in) that export the same package to M.
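
A small sketch of that rule using the `java.lang.module` API (JDK 9 or later). The module names `m`, `x` and `y`, the package names, and the `finderOf` helper are all made up for illustration; the helper serves in-memory descriptors in place of real modular jars, on the assumption that resolution only needs descriptors.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.lang.module.ResolutionException;
import java.net.URI;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class SplitPackageResolution {

    // Hypothetical helper: a finder over in-memory descriptors, no jars needed.
    static ModuleFinder finderOf(ModuleDescriptor... descriptors) {
        Map<String, ModuleReference> refs = new LinkedHashMap<>();
        for (ModuleDescriptor d : descriptors) {
            refs.put(d.name(), new ModuleReference(d, URI.create("module:/" + d.name())) {
                @Override public ModuleReader open() {
                    throw new UnsupportedOperationException("descriptor-only module");
                }
            });
        }
        return new ModuleFinder() {
            @Override public Optional<ModuleReference> find(String name) {
                return Optional.ofNullable(refs.get(name));
            }
            @Override public Set<ModuleReference> findAll() {
                return new HashSet<>(refs.values());
            }
        };
    }

    // Module m requires x and y; x exports p, y exports packageOfY.
    // Returns the resolver's error message, or empty if resolution succeeds.
    static Optional<String> resolutionError(String packageOfY) {
        ModuleDescriptor m = ModuleDescriptor.newModule("m").requires("x").requires("y").build();
        ModuleDescriptor x = ModuleDescriptor.newModule("x").exports("p").build();
        ModuleDescriptor y = ModuleDescriptor.newModule("y").exports(packageOfY).build();
        try {
            ModuleLayer.boot().configuration()
                    .resolve(finderOf(m, x, y), ModuleFinder.of(), Set.of("m"));
            return Optional.empty();
        } catch (ResolutionException e) {
            return Optional.of(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println("y exports q: " + resolutionError("q")); // resolves
        System.out.println("y exports p: " + resolutionError("p")); // rejected
    }
}
```

The failure happens while the configuration is being created, before any class is loaded, which is the sense in which the check is made "by construction".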

So what is the background to this thread? Are you running into a split
package issue when trying to migrate something to modules?

-Alan

Re: The split package problem

Andrew Dinn
In reply to this post by Jochen Theodorou
On 04/11/16 10:04, Jochen Theodorou wrote:
> What I see mostly is that all the problems you have now on a per class
> level, you later have on a per module level again... for example needing
> two versions of the same library being active at the same time... and
> the movement away from the static module descriptor to a dynamic module
> loading system with layers and all kinds of shenanigans that will bypass
> these efforts in the end again

Whatever complexity might be involved in managing module layers I think
it's an overstatement to say that the situation will return to the
status quo:

 - Modules stop you providing two versions of a package in the same
layer, a problem for classpath deployment which, as Remi noted, can
easily lead to you mixing classes from two different versions of a library.

 - You can indeed use a dynamic module loading system based on layers to
introduce versions of the same classes in different layers. However, the
unique package ownership requirement means that those layers cannot
partake in an ancestor relationship, again with the result that duplicate
classes cannot then be conflated in the way Remi described.

So, it seems from the above that layers are indeed a structured way to
avoid one of the major causes of 'classloader hell' precisely because
they detect and reject deployments which introduce these ambiguities in
resolution.

Are you suggesting that modules will not provide this specific benefit
or that some other problem will render it of no importance? If the
former can you explain why? If the latter can you explain what that
other problem is?

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

Re: The split package problem

Andrew Dinn
In reply to this post by Alessio Stalla


On 04/11/16 09:33, Alessio Stalla wrote:
> Also, I think there is a problem related to how class loading works. In the
> vanilla world of Java apps with a single classloader, if you load classes
> in the same (source) package from different sources, the end up in the same
> (runtime) package. But if you have different classloaders at play, and load
> a class p.C1 from classloader 1 and p.C2 from classloader 2, even if they
> share the same (source) package, p, they end up in different (runtime)
> packages, let's call them p(1) and p(2). Therefore, C2 cannot access the
> protected members of C1. That can be confusing.

Modules are not going to stop you having two different runtime versions
of a class. You can do that by loading the same module in two unrelated
module layers. However, what is guaranteed is that those two versions
cannot be conflated as a result of resolution-time ambiguity.

n.b. the above only applies to code in named modules; classpath
deployment still allows split packages and hence is still susceptible to
resolution-time ambiguity.
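
A sketch of "the same module in two unrelated layers", assuming JDK 9 or later. The module name `b`, the package `p` and the `finderOf` helper are made up for illustration, and the module has no classes; the point is only that each sibling layer ends up with its own `Module` object and its own class loader.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.net.URI;
import java.util.Optional;
import java.util.Set;

final class SiblingLayersDemo {

    // Hypothetical helper: a finder that serves a single in-memory descriptor.
    static ModuleFinder finderOf(ModuleDescriptor d) {
        ModuleReference ref = new ModuleReference(d, URI.create("module:/" + d.name())) {
            @Override public ModuleReader open() {
                throw new UnsupportedOperationException("descriptor-only module");
            }
        };
        return new ModuleFinder() {
            @Override public Optional<ModuleReference> find(String name) {
                return name.equals(d.name()) ? Optional.of(ref) : Optional.empty();
            }
            @Override public Set<ModuleReference> findAll() {
                return Set.of(ref);
            }
        };
    }

    // Creates a new child layer of the boot layer that contains module "b",
    // exporting package p, defined to a fresh class loader.
    static ModuleLayer layerWithB() {
        ModuleDescriptor b = ModuleDescriptor.newModule("b").exports("p").build();
        ModuleLayer boot = ModuleLayer.boot();
        Configuration cf = boot.configuration()
                .resolve(finderOf(b), ModuleFinder.of(), Set.of("b"));
        ClassLoader loader = new ClassLoader() { };
        return boot.defineModules(cf, moduleName -> loader);
    }

    public static void main(String[] args) {
        Module first = layerWithB().findModule("b").get();
        Module second = layerWithB().findModule("b").get();
        // Same name, same package, but two distinct runtime modules.
        System.out.println(first.getName() + " vs " + second.getName());
        System.out.println("same module object: " + (first == second));
    }
}
```

Neither layer is an ancestor of the other, so no module resolved in one layer can read the copy of `b` that lives in the other.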

regards,


Andrew Dinn
-----------


Re: The split package problem

Cédric Champeau
In reply to this post by Alan Bateman
>
>
>
> So what is the background to this thread, are you running into a split
> package issue when trying to migrate something to modules?
>
> -Alan
>
I cannot speak for Jochen but at least this is something we have been
discussing for a long time on the Groovy MLs since we know Jigsaw will
prevent split packages. Basically our problem is binary backwards
compatibility. Groovy has a long history, so before Groovy 2.0, everything
was found in a single jar. We provided several versions of Groovy (we still
do), but that is more a dependency management issue.

The real problem is more that after Groovy 2 we started splitting Groovy
into different jars. But for backwards compatibility we didn't change the
packages. So you can find classes in package groovy.util in both the
`groovy-core` jar and the `groovy-xml` jar. Those jars would be obvious
candidates for modules but as split packages are not allowed this is not
possible. This doesn't give us many options. I was in favor of breaking
everything, since Jigsaw is a long journey in any case, and leverage that
to have Groovy 3 use different packages. Of course not everybody is happy
with this, because it's a huge issue for all existing libraries that rely
on "old" packages. We would basically break every Groovy program out there.
Both at the source and binary level.

It's a very big issue, typically all Gradle plugins written in Groovy would
break. Gradle itself would break. There are things to mitigate that, like
rewriting classes at load time, or an easier solution which is to have a
single Groovy module for all (and come back to the monolith era, sigh...).

We haven't made any decision yet. It's not something we can take lightly.

Re: The split package problem

Alan Bateman


On 04/11/2016 11:06, Cédric Champeau wrote:
> :
>
> We haven't made any decision yet. Not something we can take easy.
Understood, and not for me to say, but I would assume the priority has
to be to get Groovy working with JDK 9 first, maybe this is what
Jochen's other thread about Unsafe is about.

-Alan

Re: The split package problem

Sander Mak
In reply to this post by Cédric Champeau

> On 04 Nov 2016, at 12:06, Cédric Champeau <[hidden email]> wrote:
>
> It's a very big issue, typically all Gradle plugins written in Groovy would
> break. Gradle itself would break. There are things to mitigate that, like
> rewriting classes at load time, or an easier solution which is to have a
> single Groovy module for all (and come back to the monolith era, sigh...).


Wouldn't a better solution be to provide a groovy-all module that 'requires transitive' the other actual groovy modules? That leaves you, as a library designer, the freedom to modularise freely behind the scenes (probably moving the split packages back together in a module), while not burdening users who just want to grab groovy and run with it. People who know they only use parts are free to require just the modules they need, without using groovy-all. Admittedly, that does not solve the problem of breaking backward compatibility for non-Java 9 users.
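
A sketch of the aggregator idea, assuming JDK 9 or later. The module names (`groovy.all`, `groovy.core`, `groovy.xml`, `app`), the exported package names and the `finderOf` helper are hypothetical and do not describe the real Groovy artifacts. Descriptors are built in memory so that the effect of `requires transitive` on readability can be observed without any jars.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleDescriptor.Requires;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.lang.module.ResolvedModule;
import java.net.URI;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

final class AggregatorModuleDemo {

    // Hypothetical helper: a finder over in-memory descriptors, no jars needed.
    static ModuleFinder finderOf(ModuleDescriptor... descriptors) {
        Map<String, ModuleReference> refs = new LinkedHashMap<>();
        for (ModuleDescriptor d : descriptors) {
            refs.put(d.name(), new ModuleReference(d, URI.create("module:/" + d.name())) {
                @Override public ModuleReader open() {
                    throw new UnsupportedOperationException("descriptor-only module");
                }
            });
        }
        return new ModuleFinder() {
            @Override public Optional<ModuleReference> find(String name) {
                return Optional.ofNullable(refs.get(name));
            }
            @Override public Set<ModuleReference> findAll() {
                return new HashSet<>(refs.values());
            }
        };
    }

    // Names of the modules that "app" reads when it only requires groovy.all.
    static Set<String> modulesReadByApp() {
        Set<Requires.Modifier> transitive = Set.of(Requires.Modifier.TRANSITIVE);
        ModuleDescriptor core = ModuleDescriptor.newModule("groovy.core").exports("groovy.lang").build();
        ModuleDescriptor xml = ModuleDescriptor.newModule("groovy.xml").exports("groovy.xml").build();
        ModuleDescriptor all = ModuleDescriptor.newModule("groovy.all")
                .requires(transitive, "groovy.core")
                .requires(transitive, "groovy.xml")
                .build();
        ModuleDescriptor app = ModuleDescriptor.newModule("app").requires("groovy.all").build();

        Configuration cf = ModuleLayer.boot().configuration()
                .resolve(finderOf(core, xml, all, app), ModuleFinder.of(), Set.of("app"));
        return cf.findModule("app").get().reads().stream()
                .map(ResolvedModule::name)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // app declares only "requires groovy.all" but also reads core and xml.
        System.out.println(modulesReadByApp());
    }
}
```

Note that this only works if each package lives in exactly one of the underlying modules, so the split packages still have to be merged or renamed first.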


Sander



Re: The split package problem

Jochen Theodorou


On 04.11.2016 12:36, Sander Mak wrote:

>
>> On 04 Nov 2016, at 12:06, Cédric Champeau <[hidden email]> wrote:
>>
>> It's a very big issue, typically all Gradle plugins written in Groovy would
>> break. Gradle itself would break. There are things to mitigate that, like
>> rewriting classes at load time, or an easier solution which is to have a
>> single Groovy module for all (and come back to the monolith era, sigh...).
>
>
> Wouldn't a better solution be to provide a groovy-all module that 'requires transitive' the other actual groovy modules.

I agree in that we should have done this... but

> (probably moving the split packages back together in a module)

That is exactly the problem. If you move them back, you drag half of the
dependencies along, which makes the modules themselves kind of obsolete.

bye Jochen

Re: The split package problem

Jochen Theodorou
In reply to this post by Andrew Dinn


On 04.11.2016 11:50, Andrew Dinn wrote:
[...]
>  - Modules stop you providing two versions of a package in the same
> layer, a problem for classpath deployment which, as Remi noted, can
> easily lead to you mixing classes from two different versions of a library.

But sometimes you have to do something similar to that. Usually you then
start with classloader magic and have to find your way around
"duplicated classes".

What modules target is the case where this happens by accident. And
frankly, my experience is that those accidental cases get
resolved pretty fast.

>  - You can indeed use a dynamic module loading system based on layers to
> introduce versions of the same classes in different layers. However, the
> unique package ownership requirements means that those layers cannot
> partake in an ancestor relationship again with the result that duplicate
> classes cannot then be conflated in the way Remi described.

Sure, nobody really needs that on the classpath... except for
poor man's patching.

> So, it seems from the above that layers are indeed a structured way to
> avoid one of the major causes of 'classloader hell' precisely because
> they detect and reject deployments which introduce these ambiguities in
> resolution.

"classloader hell" goes far beyond the classpath. The problem is rarely
caused by a classloader that loads two versions of the same library with
different classes. But it happens more easily once your classloaders
suddenly have to become forests. Frankly, I do not see how modules will
avoid that.

> Are you suggesting that modules will not provide this specific benefit
> or that some other problem will render it of no importance? If the
> former can you explain why? If the latter can you explain what that
> other problem is?

well... assume you have an application and it requires the library A,
which transitively requires B-1. the application also requires library
C, which transitively requires B-2. B-1 and B-2 are not compatible.
library A and D leak instances of classes of B-1 and B-2 to the application.

Problem number 1, how to start this if application, A, D, B-1 and B-2
are modules? You don't care that the module system does not allow for
this, you have to run it in that configuration and you have to have a
way around this. Which means you will have to load modules dynamically
in different layers. At this point the benefit has already become a burden
and is nullified as a benefit.

Problem number 2... the layers in which B-1 and B-2 reside still have
their own versions of classes, which are by name equal in B-1 and B-2.
If application is actually using classes directly from B-1 or B-2 you
get the classloader hell problem, just the same as you would without the
module system beyond considering the classpath.

bye Jochen

Re: The split package problem

Alan Bateman
On 04/11/2016 13:22, Jochen Theodorou wrote:

> :
>
> well... assume you have an application and it requires the library A,
> which transitively requires B-1. the application also requires library
> C, which transitively requires B-2. B-1 and B-2 are not compatible.
> library A and D leak instances of classes of B-1 and B-2 to the
> application.

Assuming B-1 and B-2 export the same packages, A `requires transitive
B-1` (because a method in A's API returns a B type), C `requires
transitive B-2` (because a method in C's API returns a B type) then you
will get an exception when attempting to create the configuration. The
exception will tell you that the application module reads two modules
(B-1 and B-2) that export the same package to the application.
Attempting to do this with multiple configuration + layers isn't going
to help, it's just not safe, and you'll get the same exception.

You can of course go off-piste and use the reflective API to have the
application read both B-1 and B-2, and if they are your own class loaders
then you can go crazy with split delegation, but that is not something
that you get out of the box.

-Alan

Re: The split package problem

Andrew Dinn
In reply to this post by Jochen Theodorou
On 04/11/16 13:22, Jochen Theodorou wrote:
> "classloader hell" goes far beyond the classpath. The problem is rarely
> caused by a classloader that loads two versions of the same library with
> different classes. But happens then more easily if your classloaders
> suddenly have to become forests. Frankly I do not see how modules will
> avoid that.

Yes, indeed, "classloader hell" goes far beyond the classpath. Jigsaw
provides a structured way of dealing with the specific aspect of the
classloader problem Remi mentioned, viz. detecting and rejecting
resolve-time ambiguities. That doesn't avoid the possibility of runtime
ambiguity, i.e. you can still pass an instance of Foo as loaded by layer
A to code which has its own version of Foo as loaded by layer B and get
an error. I don't think I claimed Jigsaw was Dr Quack's Patent Snake Oil
Remedy and Cure-All, rather that it would be of some help :-)

> well... assume you have an application and it requires the library A,
> which transitively requires B-1. the application also requires library
> C, which transitively requires B-2. B-1 and B-2 are not compatible.
> library A and D leak instances of classes of B-1 and B-2 to the
> application.
>
> Problem number 1, how to start this if application, A, D, B-1 and B-2
> are modules? You don't care that the module system does not allow for
> this, you have to run it in that configuration and you have to have a
> way around this. Which means you will have to load modules dynamically
> in different layers. At this point the benefit already become a burden
> and is nullified as benefit.

Is this any more of a burden than 1) having to arrange a tree of
classloaders to ensure that A sees B1 and D sees B2 and 2) then ensuring
that instances of B1 and B2 leaked from A and D don't get conflated by
the app? Would that not by the same token nullify the benefits of having
a classpath?

> Problem number 2... the layers in which B-1 and B-2 reside in still have
> their own versions of classes, which are by name equal in B-1 and B-2.
> If application is actually using classes directly from B-1 or B-2 you
> get the classloader hell problem, just the same as you would without the
> module system beyond considering the classpath.

Yes, fixing resolve-time ambiguity doesn't fix runtime ambiguity. So, if
your problem is runtime ambiguity then you'll still have to fix that.
But not being able to fix one mess doesn't mean Jigsaw is of no help for
another mess.

regards,


Andrew Dinn
-----------

Re: The split package problem

Jochen Theodorou
In reply to this post by Alan Bateman
On 04.11.2016 15:29, Alan Bateman wrote:

> On 04/11/2016 13:22, Jochen Theodorou wrote:
>
>> :
>>
>> well... assume you have an application and it requires the library A,
>> which transitively requires B-1. the application also requires library
>> C, which transitively requires B-2. B-1 and B-2 are not compatible.
>> library A and D leak instances of classes of B-1 and B-2 to the
>> application.
>
> Assuming B-1 and B-2 export the same packages, A `requires transitive
> B-1` (because a method in A's API returns a B type), C `requires
> transitive B-2` (because a method in C's API returns a B type) then you
> will get an exception when attempting to create the configuration. The
> exception will tell you that the application module reads two modules
> (B-2 and B-2) that export the same package to the application.

I can compile the application if only B-1 or only B-2 is available at
that time, regardless of whether that is actually the wrong version for A or
C. Jigsaw, after all, does not care about versions. This would mean
the application cannot be run with only B-1 or only B-2, of course. And it
means I cannot run the application normally either.

> Attempting to do this with multiple configuration + layers isn't going
> to help, it's just not safe, and you'll get the same exception.

One layer with B-1 and C and another with B-2 and D and my application
"reading" from those layers... is that something that cannot be done? So
far I had the impression that I can.

> You can of course go off-piste and use the reflective API to have the
> application read both B-1 and B-2, and if it its your own class loaders
> then you can go crazy with split delegation, but that is not something
> that you get out of the box.

You rarely get the classloader hell problem out of the box either. And
configurations like the ones above are not all that rare for me... it's just
that so far that had been with classloaders only; now it would be with
classloaders, modules and layers.

bye Jochen


Re: The split package problem

Jochen Theodorou
In reply to this post by Andrew Dinn
On 04.11.2016 15:54, Andrew Dinn wrote:
[...]

>> Problem number 1, how to start this if application, A, D, B-1 and B-2
>> are modules? You don't care that the module system does not allow for
>> this, you have to run it in that configuration and you have to have a
>> way around this. Which means you will have to load modules dynamically
>> in different layers. At this point the benefit already become a burden
>> and is nullified as benefit.
>
> Is this any more of a burden than 1) having to have arrange a tree of
> classloaders to ensure that A sees B1 and D sees B2 and 2) then ensuring
> that instances of B1 and B2 leaked from A and D don't get conflated by
> the app? Would that not by the same token nullify the benefits of having
> a classpath?

the classpath is not there to solve this.

>> Problem number 2... the layers in which B-1 and B-2 reside in still have
>> their own versions of classes, which are by name equal in B-1 and B-2.
>> If application is actually using classes directly from B-1 or B-2 you
>> get the classloader hell problem, just the same as you would without the
>> module system beyond considering the classpath.
>
> Yes, fixing resolve-time ambiguity doesn't fix runtime ambiguity. So, if
> your problem is runtime ambiguity then you'll still have to fix that.
> But not being able to fix one mess doesn't mean Jigsaw is of no help for
> another mess.

for the other mess I usually use gradle

bye Jochen


Re: The split package problem

Alan Bateman
In reply to this post by Jochen Theodorou
On 04/11/2016 16:50, Jochen Theodorou wrote:

> :
>
>> Attempting to do this with multiple configuration + layers isn't going
>> to help, it's just not safe, and you'll get the same exception.
>
> One layer with B-1 and C and another with B-2 and D and my application
> "reading" from those layers... is that something that cannot be done?
> So far I had the impression I can
The application module reads other modules rather than layers, in this
case it results in the application module reading C, B-1, D, and B-2
(assuming that the application module `requires C` and `requires D` and
both of these modules `requires transitive B`). This will fail because
the application reads B-1 and B-2, both of which export the same
packages to the application module.
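
A sketch of this scenario with the `java.lang.module` API, assuming JDK 9 or later. Two parent configurations each contain their own module `b` (standing in for B-1 and B-2) plus `c` or `d`; the application is then resolved with both configurations as parents. All module names, the package `p` and the `finderOf` helper are made up for illustration, and the sketch only checks that a `ResolutionException` is thrown, not the wording of its message.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleDescriptor.Requires;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.lang.module.ResolutionException;
import java.net.URI;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class TwoVersionsAcrossLayers {

    // Hypothetical helper: a finder over in-memory descriptors, no jars needed.
    static ModuleFinder finderOf(ModuleDescriptor... descriptors) {
        Map<String, ModuleReference> refs = new LinkedHashMap<>();
        for (ModuleDescriptor d : descriptors) {
            refs.put(d.name(), new ModuleReference(d, URI.create("module:/" + d.name())) {
                @Override public ModuleReader open() {
                    throw new UnsupportedOperationException("descriptor-only module");
                }
            });
        }
        return new ModuleFinder() {
            @Override public Optional<ModuleReference> find(String name) {
                return Optional.ofNullable(refs.get(name));
            }
            @Override public Set<ModuleReference> findAll() {
                return new HashSet<>(refs.values());
            }
        };
    }

    // A configuration holding `user`, which requires transitive its own copy of b.
    static Configuration configurationWith(String user) {
        Set<Requires.Modifier> transitive = Set.of(Requires.Modifier.TRANSITIVE);
        ModuleDescriptor b = ModuleDescriptor.newModule("b").exports("p").build();
        ModuleDescriptor u = ModuleDescriptor.newModule(user).requires(transitive, "b").build();
        return ModuleLayer.boot().configuration()
                .resolve(finderOf(u, b), ModuleFinder.of(), Set.of(user));
    }

    // Resolves "app" (requires c, requires d) on top of both configurations.
    // Returns the resolver's error message, or empty if resolution succeeds.
    static Optional<String> resolveApplication() {
        Configuration withC = configurationWith("c");
        Configuration withD = configurationWith("d");
        ModuleDescriptor app = ModuleDescriptor.newModule("app").requires("c").requires("d").build();
        try {
            Configuration.resolve(finderOf(app), List.of(withC, withD),
                    ModuleFinder.of(), Set.of("app"));
            return Optional.empty();
        } catch (ResolutionException e) {
            return Optional.of(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveApplication());
    }
}
```

Putting the two copies of `b` in separate configurations does not help, because `requires transitive` makes `app` read both of them.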

-Alan

Re: The split package problem

Jochen Theodorou
On 05.11.2016 08:39, Alan Bateman wrote:

> On 04/11/2016 16:50, Jochen Theodorou wrote:
>
>> :
>>
>>> Attempting to do this with multiple configuration + layers isn't going
>>> to help, it's just not safe, and you'll get the same exception.
>>
>> One layer with B-1 and C and another with B-2 and D and my application
>> "reading" from those layers... is that something that cannot be done?
>> So far I had the impression I can
> The application module reads other modules rather than layers, in this
> case it results in the application module reading C, B-1, D, and B-2
> (assuming that the application modules `requires C` and `requires D` and
> both of these modules `requires transitive B`). This will fail because
> the application reads B-1 and B-2, both of which export the same
> packages to the application module.

And if that is done at runtime, does it still fail?

bye Jochen
