Explicitly empty sourcepath regression

Pepper Lebeck-Jobe
I'll start with a question, and then give an opinion.

*Question*
Why must the source files which make up a module be on the source path for
the module to be compiled?

*Opinion*
Build tools, especially Gradle, attempt to make reproducible builds a
reality. One thing these tools offer is fine-grained control over the set
of java files which will be passed to javac for compilation. Historically,
we have even explicitly set the `-sourcepath` to be empty to tell the
compiler not to look for source files on the classpath or in the current
working directory. Combining this with an exact specification of which java
files should be compiled supports a few important use cases:

   1. You can exclude experimental sources (not yet ready to compile)
   easily to unblock local development without fear that they will break the
   build.
   2. You can set up build logic to dynamically include or exclude specific
   files without having to copy those files around to various directories
   which are or are not on the source path, so that you can produce different
   variations of a library from the original sources.

We feel that having to isolate the files which make up a module into
directories on the `-sourcepath` would limit the flexibility of the build
system and possibly hurt the reproducibility of builds.

*Reproduction*
In case anyone on the list doesn't understand what I mean when I say that
module source files are required to be on the source path, I've created a
tiny GitHub repo <https://github.com/eljobe/modules/blob/master/README.md>,
with instructions on reproducing what I'm seeing.

Thanks for your consideration,
Pepper

Pepper Lebeck-Jobe

Principal Engineer

Gradle Inc.


Re: Explicitly empty sourcepath regression

Alan Bateman
On 16/06/2017 05:07, Pepper Lebeck-Jobe wrote:

> I'll start with a question, and then give an opinion.
>
> *Question*
> Why must the source files which make up a module be on the source path for
> the module to be compiled?
>
Jon or Jan might want to comment on this but there does appear to be a
corner case in javac when none of the paths specified to -sourcepath
exist and the source to compile includes a module-info.java.

That said, the javac command looks fishy, and it is not clear why -cp and
-sourcepath are specified when compiling the module. Are these just
inherited from existing build code that hasn't been updated for modules?
Have you looked at using -implicit:none rather than setting empty
paths?

-Alan

Re: Explicitly empty sourcepath regression

Pepper Lebeck-Jobe
On Fri, Jun 16, 2017 at 2:49 PM Alan Bateman <[hidden email]>
wrote:

> Jon or Jan might want to comment on this but there does appear to be a
> corner case in javac when none of the paths specified to -sourcepath
> exist and the source to compile includes a module-info.java.
>
> That said, the javac command looks fishy, and it is not clear why -cp and
> -sourcepath are specified when compiling the module. Are these just
> inherited from existing build code that hasn't been updated for modules?
>

Not exactly. Let me take them each in turn.

The reason I explicitly "unset CLASSPATH" and pass "-cp ''" in the
"bin/worksJ9.sh" script is that the documentation says that when
"-sourcepath" isn't specified, the class path is searched for source
files, and that when "-cp" isn't specified, the class path defaults to the
current working directory. We consider it too "fast and loose" to
coincidentally include source files which happen to be in the current
working directory. We want to ensure that the only files seen by the
compiler are the ones we explicitly pass to it by fully qualified filesystem
paths. That way, the complete set of reachable source files has to be
included among the specified inputs to the task we are running. We can then
use that specified set to determine whether any of those files have changed
since the previous run of the build, and avoid calling javac at all if we
know that none of the inputs have changed.

I'll put it another way. Suppose we didn't explicitly empty the sourcepath,
and there was a source file in the current working directory (say,
"Parent.java") which was needed for compiling one of the sources we
explicitly passed to javac (say, "Child.java"). The first time we run the
build, it would succeed. But then suppose someone changes "Parent.java" and
runs the build again. Since only "Child.java" is explicitly known to us, and
we know that it has not changed since we last ran the task, we will
incorrectly skip running the task again. I say "incorrectly" because the
result would be a successful build, but the user would be confused as to
why the changes to "Parent.java" weren't being picked up by the build.
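
To make the "Parent.java"/"Child.java" scenario concrete, here is a minimal
sketch of an up-to-date check based only on declared inputs. It is an
illustration under assumptions, not Gradle's actual implementation: the class
name `UpToDateSketch`, the methods `fingerprint` and `canSkip`, and the use of
SHA-256 over in-memory file contents are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: decides whether a compile task can be skipped by
// fingerprinting only the inputs the build tool was explicitly told about.
public class UpToDateSketch {

    // Hashes every declared input (path -> content) in a stable path order.
    static String fingerprint(Map<String, String> declaredInputs) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> input : new TreeMap<>(declaredInputs).entrySet()) {
                digest.update(input.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between path and content
                digest.update(input.getValue().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between entries
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // The task is skipped when the declared inputs hash to the previous value.
    // A file that javac found implicitly (such as Parent.java) never enters
    // the hash, so editing it cannot invalidate the previous result.
    static boolean canSkip(String previousFingerprint, Map<String, String> declaredInputs) {
        return previousFingerprint.equals(fingerprint(declaredInputs));
    }
}
```

With only "Child.java" declared, an edit to "Parent.java" leaves the
fingerprint unchanged, which is exactly the incorrect skip described above.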

I only left "-cp ''" in "bin/failsJ9.sh" to show that I didn't change
anything between the two invocations except for explicitly setting the
"-sourcepath" to be empty. In fact, given the documentation that when
"-sourcepath" is not set the classpath will be searched for sources, I
actually don't understand why the same failure isn't happening without
explicitly setting "-sourcepath". Whatever rule is being violated by not
having the source files on the source path when compiling the module,
should logically be violated as well when the source path == the classpath
as both are set to the same empty path string.


> Have you looked at using -implicit:none rather than setting empty
> paths?
>

If I understand the documentation of that option correctly, then it doesn't
alter the scenario described above. Essentially, it would change whether or
not "Parent.class" was generated during the original execution of the
"javac" command, but we still wouldn't know to watch "Parent.java" for
changes so we could know to recompile.


> -Alan
>

Thanks for the timely response.

Re: Explicitly empty sourcepath regression

Jonathan Gibbons
In reply to this post by Pepper Lebeck-Jobe


On 06/15/2017 09:07 PM, Pepper Lebeck-Jobe wrote:
> *Question*
> Why must the source files which make up a module be on the source path for
> the module to be compiled?

There are a number of aspects to the answer for this.

First, some background.

It has been a decision of long standing that the module membership of a
compilation unit is determined by the "host system", which in the case of
command-line javac, means by the position of the file in the file system.
The alternative, which was rejected early on, was to have to edit every
single compilation unit in code that was being modularized to have a module
declaration at the top, preceding the package declaration.

While it may seem to follow that for any compilation unit, you could just
look in some appropriate enclosing directory to find the module declaration,
there are some important use cases where that is not enough. Generally,
these use cases are when the source for a module is spread across different
directory hierarchies with different directories for the package root. The
two most common cases of this are when some of the source for a module is
generated, or when the source is a combination of some platform-independent
code and some platform specific code. The ability to merge directory
hierarchies like this is managed by using paths, as in source paths or class
paths.

Now, to some more specific reasons for the design decision.

First ... consistency. It has always been the case for Jigsaw javac that
when compiling code from multiple modules together, all the source files
given on the command line had to be present on the module source path ..
meaning, on the source path for a module on the module source path. That was
always required for the reasons described earlier, to be able to determine
the module membership of each source file specified on the command line.
That was initially different from the case of compiling a single module,
which initially was more like compiling "traditional" non-modular code. In
time, it became clear that was a bad choice and it was better to have the
compilation requirements for all modular code be more consistent, whether
compiling one module or many.

Second ... to avoid obscure errors. In the initial design, when compiling a
single module, javac tried to infer the module being compiled from the
presence of a module declaration in a compilation unit specified on the
command line, or on the source path or a module declaration on the class
path. That led to the possibility of errors in which the module membership
of a compilation unit specified on the command line (as determined by its
position in the file system) could be different from the inferred module
being compiled. For example, consider the case where the source path is
empty, and the command line contains the source files for the module
declaration for one module, and the class declarations for a different
module. There is no way, in that case, for javac to detect the
misconfiguration and give a meaningful message. The solution was to require
that when compiling modular code for a single module, all source files must
appear on the source path so that javac can ensure that all source files are
part of the same module.


That all being said, I understand the concerns that this sets up the
possibility of files being implicitly compiled when that is not desired, by
virtue of being found on the source path. I also agree that -implicit:none
is not an ideal solution to the issue. In time, we could consider a new
option to better address the problem. In the meantime, the short term
options are to either synthesize a suitable source directory with links, or
to use the Compiler API to provide a custom file manager that uses an
equivalent synthetic source path.

-- Jon

Re: Explicitly empty sourcepath regression

Pepper Lebeck-Jobe
A few (fairly inconsequential) comments in-line, and then an unresolved
question at the bottom. Sorry to put the most important part last. Feel
free to skip ahead to that question, as I think it is the most important
thing in the mail.

On Mon, Jun 19, 2017 at 12:30 AM Jonathan Gibbons <
[hidden email]> wrote:

>
>
> On 06/15/2017 09:07 PM, Pepper Lebeck-Jobe wrote:
> > *Question*
> > Why must the source files which make up a module be on the source path
> for
> > the module to be compiled?
>
> There are a number of aspects to the answer for this.
>
> First, some background.
>
> It has been a decision of long standing that the module membership of a
> compilation
> unit is determined by the "host system", which in the case of
> command-line javac,
> means by the position of the file in the file system. The alternative,
> which was rejected
> early on, was to have to edit every single compilation unit in code that
> was being
> modularized to have a module declaration at the top, preceding the
> package declaration.
>

I'm wondering if another alternative was considered. Namely, require
explicit declaration in the module declaration for all packages which make
up the module (even if they aren't exported by the module.) This would
provide the same mapping as having each compilation unit declare module
membership, but it would be consolidated in the module description and not
require specification for every class declaration as membership in a
package would imply membership in the module which has declared ownership
of that package.


> While it may seem to follow that for any compilation unit, you could
> just look in some
> appropriate enclosing directory to find the module declaration, there
> are some
> important use cases where that is not enough. Generally, these use cases
> are when
> the source for a module is spread across different directory hierarchies
> with different
> directories for the package root. The two most common cases of this are
> when
> some of the source for a module is generated, or when the source is a
> combination
> of some platform-independent code and some platform specific code. The
> ability to
> merge directory hierarchies like this is managed by using paths, as in
> source paths
> or class paths.
>
> Now, to some more specific reasons for the design decision.
>
> First ... consistency. It has always been the case for Jigsaw javac that
> when compiling
> code from multiple modules together, all the source files given on the
> command line
> had to be present on the module source path .. meaning, on the source
> path for
> a module on the module source path. That was always required for the
> reasons described
> earlier, to be able to determine the module membership of each source
> file specified on
> the command line.  That was initially different from the case of
> compiling a single module,
> which initially was more like compiling "traditional" non-modular code.
> In time, it became
> clear that was a bad choice and it was better to have the compilation
> requirements
> for all modular code be more consistent, whether compiling one module or
> many.
>

But, right now, I don't think these requirements actually are consistent
between compiling one module or many. For example, imagine this directory
layout.

src/
  module-info.java
  com/
    foo/
      ModuleClass.java
otherSrc/
  com/
    bar/
      NotAModuleClass.java

If I compile with this command line:
javac -d build/classes/foo.module -sourcepath src:otherSrc $(find . -name "*.java")

Then, javac will assume that com.bar.NotAModuleClass is a member of
`foo.module` (the module declared in src/module-info.java) even though it
is not rooted in the same filesystem directory with the module declaration
file or in a directory with the same name as the declared module. Is the
reasoning behind the design of this behavior summed up by this rule:

if (there is exactly one module declaration among the sources on the command line) {
  all source files visible on the sourcepath must be part of that module
}

I'm not saying that this is a totally unreasonable rule. But it does seem
less strict than what would happen if multiple modules were being compiled
with the --module-source-path argument and there was some rogue class file
which wasn't on the --module-source-path but was specified on the
command line.

> Second ... to avoid obscure errors. In the initial design, when
> compiling a single module,
> javac tried to infer the module being compiled from the presence of a
> module declaration
> in a compilation unit specified on the command line, or on the source
> path or a module
> declaration on the class path.  That led to the possibility of errors in
> which the module
> membership  of a compilation unit specified on the command  line (as
> determined by its
> position in the file system) could be different from the inferred module
> being compiled.
> For example, consider the case where the source path is empty, and the
> command line
> contains the source files for the module declaration for one module, and
> the class
> declarations for a different module. There is no way, in that case, for
> javac to detect the
> misconfiguration and give a meaningful message. The solution was to
> require that when
> compiling modular code for a single module, all source files must appear
> on the source
> path so that javac can ensure that all source files are part of the
> same module.
>

I guess this answers my previous question. So, yes, the rule is: if a
source file is both specified on the command line with a module declaration
and in the sourcepath, then it must be meant to be part of the module being
declared.

I do believe this makes the semantics of the sourcepath difficult to
understand because it has a different/additional meaning depending on
whether or not a module-info.java file is among the sources being compiled.


> That all being said, I understand the concerns that this sets up the
> possibility of
> files being implicitly compiled when that is not desired, by virtue of
> being found
> on the source path. I also agree that -implicit:none is not an ideal
> solution to the
> issue.  In time, we could consider a new option to better address the
> problem.
> In the meantime, the short term options are to either synthesize a
> suitable source
> directory with links, or to use the Compiler API to provide a custom
> file manager that
> uses an equivalent synthetic source path.
>

In the short-term, we can probably synthesize a temporary directory with
all the source files we know about and use that directory as the only
element on the sourcepath when we know we are compiling a single module.
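
As a rough illustration of that short-term approach, the following sketch
builds a temporary directory containing links to only the known source files,
preserving their package-relative paths so the result can be passed as the
single `-sourcepath` element. The class name `SyntheticSourceRoot`, the
`create` method, and the fallback to copying when symbolic links are not
available are assumptions made for this example, not something javac or
Gradle prescribes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: mirrors a chosen subset of source files into a fresh
// directory so that only those files are visible on the source path.
public class SyntheticSourceRoot {

    // sourceRoot is the directory that holds the package hierarchy
    // (for example "src"); sources are the files the build knows about.
    static Path create(Path sourceRoot, List<Path> sources) throws IOException {
        Path root = sourceRoot.toAbsolutePath().normalize();
        Path synthetic = Files.createTempDirectory("synthetic-src");
        for (Path source : sources) {
            Path absolute = source.toAbsolutePath().normalize();
            if (!absolute.startsWith(root)) {
                throw new IllegalArgumentException(source + " is not under " + sourceRoot);
            }
            // Keep the package-relative path, e.g. com/foo/ModuleClass.java.
            Path target = synthetic.resolve(root.relativize(absolute));
            Files.createDirectories(target.getParent());
            try {
                Files.createSymbolicLink(target, absolute);
            } catch (UnsupportedOperationException | IOException e) {
                // Assumption: copying is an acceptable fallback when the
                // platform or the user's privileges do not allow symlinks.
                Files.copy(absolute, target);
            }
        }
        return synthetic;
    }
}
```

The returned directory would then be the only entry given to `-sourcepath`,
and any file that was not listed (an experimental source, say) is simply not
there for javac to find.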

> -- Jon
>

I really appreciate your detailed explanation of why the design decisions
have been made the way they have, but I feel like I'm still missing one
important piece of the puzzle.

According to this documentation:
http://docs.oracle.com/javase/9/tools/javac.htm#JSWOR627

---
If --class-path, -classpath, or -cp aren’t specified, then the user class
path is the current directory.

If the -sourcepath option isn’t specified, then the user class path is also
searched for source files.
---

So the default behavior (when -sourcepath is not specified) is essentially
to treat the classpath as the sourcepath.

Why, then, in my example project (https://github.com/eljobe/modules), does
this work:

javac -cp '' -d build/classes $(find src -name "*.java")

but this fail?

javac -sourcepath '' -cp '' -d build/classes $(find src -name "*.java")

It seems to me that, according to the documentation, the only difference
here is between an implicitly empty sourcepath (because the -cp option is
empty) and an explicitly empty sourcepath.

One possibility is that the documentation is wrong and the default
sourcepath includes some directories in addition to the classpath. In which
case, I'd like to know what those directories are.

I'm not sure what other explanation would cause this difference in behavior.

Thanks again,
Pepper

Re: Explicitly empty sourcepath regression

Pepper Lebeck-Jobe
I'm still very interested in hearing your thoughts on the issues I brought
up in the previous post to this thread, but I wanted to share an update
with this group about how the Gradle team has decided to handle the
sourcepath situation for our 4.1 release.

We have decided to just drop the `-sourcepath ""` arguments from our
`javac` invocations when we know that we are compiling for Java 9 AND that
there is a `module-info.java` on the command line.

We considered writing our own `JavaFileManager` implementation to wrap the
standard `JavaFileManager` and filter results from calls to `list()` with
`SOURCE_PATH` to eliminate any source files which were not already known to
Gradle. However, this would have meant that we would need to synthesize a
sourcepath which contained all of the source files we were passing on the
command-line so that the standard `JavaFileManager` could find the files in
the first place. Rather than tackle this complexity at this point, we
decided to settle for the implicitly empty `sourcepath` which seems to work
when compiling a single module.
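
For reference, the wrapper we considered could look roughly like the sketch
below. It is only an outline of the idea, not code we shipped: the class name
`KnownSourcesFileManager` and the choice to identify known files by URI are
assumptions, and the URI comparison only works if the known files and the
source path entries are expressed in the same path form. It also still relies
on the wrapped file manager having a source path that contains the files,
which is the complication described above.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.StandardLocation;

// Hypothetical sketch: hides any file on SOURCE_PATH that the build tool
// did not explicitly declare, and delegates everything else unchanged.
public class KnownSourcesFileManager extends ForwardingJavaFileManager<JavaFileManager> {

    private final Set<URI> knownSources;

    public KnownSourcesFileManager(JavaFileManager delegate, Set<URI> knownSources) {
        super(delegate);
        this.knownSources = knownSources;
    }

    @Override
    public Iterable<JavaFileObject> list(JavaFileManager.Location location,
                                         String packageName,
                                         Set<JavaFileObject.Kind> kinds,
                                         boolean recurse) throws IOException {
        Iterable<JavaFileObject> all = super.list(location, packageName, kinds, recurse);
        if (location != StandardLocation.SOURCE_PATH) {
            return all; // class path, platform classes, etc. are left alone
        }
        List<JavaFileObject> filtered = new ArrayList<>();
        for (JavaFileObject file : all) {
            if (knownSources.contains(file.toUri())) {
                filtered.add(file);
            }
        }
        return filtered;
    }
}
```

Such a file manager would be handed to `JavaCompiler.getTask` in place of the
standard one when compiling through the Compiler API.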

If you change the behavior of the implicitly empty `sourcepath` to match
that of the explicitly empty `sourcepath`, Gradle will not work with the
release of the JDK that makes that change. So it would be nice to get a
warning if the decision is made to change those semantics to match.

Thanks,
Pepper
