JMOD, native libraries and the packaging of JavaFX

Mike Hearn
Hello,

The JMOD format is not documented directly, but is essentially a JAR-like
format which can also contain native libraries, license texts, man pages
and config files. However, JMODs are not just JARs with extra features.
They are incompatible, because class files go under a classes/ directory.

JEP 261 says:

JMOD files can be used at compile time and link time, but not at run time.
To support them at run time would require, in general, that we be prepared
to extract and link native-code libraries on-the-fly. This is feasible on
most platforms, though it can be very tricky, and we have not seen many use
cases that require this capability, so for simplicity we have chosen to
limit the utility of JMOD files in this release.


I was a bit surprised when I first read this because JARs that contain
native libraries, along with hacky custom code to extract and load them,
are actually quite common. One example is Conscrypt but there are others.
An extended JAR format in which the JVM took care of loading the right
native library would be a very helpful thing to have. Whilst the task can
be tricky if you want to do it as efficiently as possible e.g. not save the
DLL / DSO to disk, a "good enough" approach can't be that hard because the
community has implemented it many times.
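
For readers who have not seen the extract-and-load pattern mentioned above, a minimal sketch might look like the following. The class name NativeLibExtractor and the resource path in the comment are hypothetical and are not taken from Conscrypt or any other library; only the JDK calls (Files.createTempDirectory, Files.copy, System.load) are real.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper illustrating the common extract-then-load pattern. */
final class NativeLibExtractor {

    private NativeLibExtractor() {}

    /** Copies the stream into a fresh temporary directory and returns the path of the new file. */
    static Path extractToTemp(InputStream in, String fileName) {
        try {
            Path dir = Files.createTempDirectory("native-libs");
            Path target = dir.resolve(fileName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            // deleteOnExit runs in reverse registration order, so register the
            // directory first and the file second. Cleanup is best effort only:
            // on Windows a loaded DLL stays locked and cannot be deleted.
            dir.toFile().deleteOnExit();
            target.toFile().deleteOnExit();
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Extracts a classpath resource (e.g. "native/linux-x86_64/libfoo.so") and loads it. */
    static void loadFromClasspath(Class<?> anchor, String resourcePath) {
        String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
        try (InputStream in = anchor.getResourceAsStream("/" + resourcePath)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("No such resource: " + resourcePath);
            }
            System.load(extractToTemp(in, fileName).toAbsolutePath().toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Every project that ships natives in a JAR ends up writing some variation of this, each with its own choice of temp directory, naming and cleanup, which is the duplication a standard mechanism would remove.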

I'm thinking about this issue now because I quite like JavaFX and its
future is clearly as a regular Java library, albeit a big one, distributed
either via (not ideal) an SDK or (better) a set of regular libraries
published to a Maven repository.

Publishing JavaFX as a set of modules that developers can depend on with
Maven or Gradle will require either that JavaFX include the sort of hacky
extract-to-a-temp-dir-and-load code that is standard today, or that it's
not published as ordinary JARs at all, or that the JPMS is extended in time
for Java 11 to provide a uniform solution.

As far as I can see the JMOD format is probably not going to gain adoption.
Whilst it can be used by developers in theory:

   - There's no support in Maven or Gradle.
   - The format isn't documented.
   - Using it requires jlinking, which isn't a part of the regular
   developer workflow.
   - jlink doesn't do much for the vast majority of apps that can't be
   fully modularised yet.
   - It's not clear why it's better than an extended JAR format.
   - It behaves in puzzling ways, for example the "jar" tool can print the
   file listing of a jmod but not extract it.

The bulk of the JMOD feature set could be delivered with three small
extensions to the JAR format:

   1. A common directory structure for storing native libraries. That would
   allow native library extraction and loading to be provided via a small
   library, if not provided by the JVM itself.
   2. A common directory structure for including license files (this just
   has to be announced, as nothing needs to load them).
   3. Metadata linking command line program names to main class names and
   JVM parameters.
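
To make the first item concrete, here is one way a shared directory convention could be expressed in code. The layout native/<os>-<arch>/<file name> and the class NativeLayout are purely illustrative assumptions; no such convention has been agreed on, and existing libraries each use their own.

```java
import java.util.Locale;

/** Hypothetical convention: native/<os>-<arch>/<platform-specific file name>. */
final class NativeLayout {

    private NativeLayout() {}

    /** Maps an os.name value such as "Windows 10" or "Mac OS X" to a short token. */
    static String normalizeOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) return "windows";
        if (os.startsWith("mac") || os.startsWith("darwin")) return "macos";
        if (os.startsWith("linux")) return "linux";
        return os.replaceAll("[^a-z0-9]+", "_");
    }

    /** Maps an os.arch value such as "amd64" to a single canonical spelling. */
    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) return "x86_64";
        if (arch.equals("aarch64") || arch.equals("arm64")) return "aarch64";
        return arch.replaceAll("[^a-z0-9]+", "_");
    }

    /** Platform file name for a library, given an already normalized OS token. */
    static String fileName(String os, String libName) {
        switch (os) {
            case "windows": return libName + ".dll";
            case "macos":   return "lib" + libName + ".dylib";
            default:        return "lib" + libName + ".so";
        }
    }

    /** Full resource path inside the JAR for the given library and platform. */
    static String resourcePath(String libName, String osName, String osArch) {
        String os = normalizeOs(osName);
        return "native/" + os + "-" + normalizeArch(osArch) + "/" + fileName(os, libName);
    }

    /** Convenience overload that reads the platform from the running JVM. */
    static String resourcePathForHost(String libName) {
        return resourcePath(libName, System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

With a rule like this written down once, a small loader library (or eventually the JVM) could find the right binary in any JAR without per-project configuration.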

For example the jdk.javadoc.jmod contains a program bin/javadoc which is a
native binary specific to the host platform of the JDK. But all it does is
run the JavaDoc program itself, which is written in Java. This sort of
startup program could be easily generated on the fly given startup
parameters in the same way the javapackager tool does it.
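
As a rough illustration of how little such a starter needs to contain, the sketch below renders a POSIX shell launcher from a classpath, a main class and a list of JVM arguments. The class LauncherScripts and its method are hypothetical, and the sketch assumes trusted inputs that contain no quote characters; it is not how javapackager or the JDK's own launchers work.

```java
import java.util.List;

/** Hypothetical generator for a tiny shell script that starts a Java program. */
final class LauncherScripts {

    private LauncherScripts() {}

    /** Renders a POSIX shell launcher; inputs are assumed trusted and free of quotes. */
    static String posixLauncher(String classpath, String mainClass, List<String> jvmArgs) {
        StringBuilder sb = new StringBuilder();
        sb.append("#!/bin/sh\n");
        sb.append("exec java");
        for (String arg : jvmArgs) {
            sb.append(' ').append(arg);
        }
        // The classpath is single-quoted so that a trailing "*" wildcard is
        // expanded by the java launcher rather than by the shell.
        sb.append(" -cp '").append(classpath).append("' ");
        sb.append(mainClass).append(" \"$@\"\n");
        return sb.toString();
    }
}
```

Given metadata that maps a command name to a main class and JVM parameters, an installer could write this text to bin/<name> and mark it executable; a Windows variant would emit a .cmd file instead.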

If the JAR format was extended in this way, it would become possible to
write a tool that given a Maven coordinate would resolve and then install
into a bin/ directory on the path a starter program with the suggested
name. This would be a nice feature to have (I am writing such a tool
already, but it'd be good to standardise the way to express a desired
'short name').

I realise there isn't much time left until the Java 11 ship sails. But for
JavaFX users at least, it'd be convenient to have this small extension.
Even if the JVM isn't extended to support it in 11, a common format would
allow it to take over responsibility later.

thanks,
-mike

Re: JMOD, native libraries and the packaging of JavaFX

Gregg Wonderly
Yes, I, like many others, have carried around various JNI code in JAR files (javax.comm as well as my own), copied it out of the JAR into “temp” space, and then used load() to load and bind it in class wrappers.  This works quite well, and although each implementation is highly customized, the effect is the same.  Having the module system support modules with JNI resources would be extremely beneficial, given the staleness of, and complete lack of work on, bringing more native functionality into the JVM to support things like JavaFX and even JMF, which has died on the vine because of how much users have to do to get JMF installed for the applications based on it.

It’s clear that the desktop, and anything other than web or JEE-style servers, is not interesting to Oracle, but these uses are quite actively pursued on the platform, and ignoring that fact will just continue to erode the use of Java for the portable solutions it was once trumpeted as being the key to.

Gregg

> On Apr 23, 2018, at 12:21 PM, Mike Hearn <[hidden email]> wrote:


Re: Fwd: JMOD, native libraries and the packaging of JavaFX

Mike Hearn
In reply to this post by Mike Hearn
I did a bit of experimentation to learn how different operating systems
support loading shared libraries in-memory. I also did a bit of thinking on
the topic of "native classloaders". Here's a braindump, which may lead
nowhere but at least it'll be written down.

Linux: This OS has the best support. The memfd_create syscall was added in
Linux 3.17 (released in 2014). glibc only gained a wrapper for it in version
2.27, but it is easy to invoke by hand on older systems. It creates a file
descriptor, backed by an in-memory region, that supports all normal file
operations. After creation it can be fed to the rtld using
dlopen("/proc/self/fd/<num>"). I've tried this and it works fine.

Windows: Runner-up support. An in-memory file can be created by passing
FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE to CreateFile. However,
it still occupies a location in the namespace, so permission checks probably
still apply. Additionally, the file is not truly memory-only: under memory
pressure the VMM may flush it to disk to free up resources. I haven't tried
this API myself.

macOS: Worst support. There is a deprecated NSCreateObjectFileImageFromMemory
API, but internally all it does is save the contents to a file and load it
again. The core rtld cannot load from anything except a file, and macOS does
not appear to have any way to create in-memory fds. shm_open doesn't work
because the SHM implementation is too broken: you can't write to such an fd,
and although you can mmap it, trying to open /dev/fd/x on the resulting fd
then fails.

Obviously with any such approach you face the problem of dependencies. Even
if the DLL/DSO itself is in memory, the rtld will still try to load any
dependent libraries from disk.

Playing around with this led me to start pondering something more
ambitious, namely, making native code loading customisable by Java code,
via some sort of NativeLoader API that's conceptually similar to
ClassLoader. Methods on it would be called to resolve a library given a
name and to look up symbols in the returned code module (returning a native
pointer to an entry point that uses a calling convention passed into the
lookup). System.load() would be redirected to call into this class and the
JNI linker would upcall into it. NativeLoaders could be exposed via a SPI.
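
To give the idea a shape, here is a sketch of what such an SPI might look like, together with a toy implementation backed by a lookup table. Every name in it (NativeLoader, NativeModule, CallingConvention, TableNativeLoader) is hypothetical; nothing like this exists in the JDK, and a real design would need to address unloading, security and threading.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical SPI sketch; nothing like this exists in the JDK. */
interface NativeLoader {

    /** Calling conventions a caller might request; this set is illustrative only. */
    enum CallingConvention { C_DEFAULT, JNI }

    /** Opaque handle for a resolved code module. */
    interface NativeModule {
        String name();
    }

    /** Resolves a library by name, or returns empty if this loader does not know it. */
    Optional<NativeModule> resolveLibrary(String name);

    /** Returns the address of an entry point in the module, or 0 if the symbol is missing. */
    long lookupSymbol(NativeModule module, String symbol, CallingConvention convention);
}

/** Toy loader backed by a table, standing in for one that really maps code into memory. */
final class TableNativeLoader implements NativeLoader {

    private final Map<String, Map<String, Long>> modules = new HashMap<>();

    /** Registers a fake symbol address for a library. */
    void define(String library, String symbol, long address) {
        modules.computeIfAbsent(library, k -> new HashMap<>()).put(symbol, address);
    }

    @Override
    public Optional<NativeModule> resolveLibrary(String name) {
        if (!modules.containsKey(name)) {
            return Optional.empty();
        }
        NativeModule module = () -> name;
        return Optional.of(module);
    }

    @Override
    public long lookupSymbol(NativeModule module, String symbol, CallingConvention convention) {
        Map<String, Long> symbols = modules.get(module.name());
        if (symbols == null) {
            return 0L;
        }
        // The convention is ignored here; a real loader might return a different stub per convention.
        return symbols.getOrDefault(symbol, 0L);
    }
}
```

In this picture the default implementation would simply delegate to dlopen/dlsym or LoadLibrary/GetProcAddress, while alternative loaders could unpack from a JAR, use a cache, or generate code on the fly.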

This might sound extreme, but we can see some advantages:

   - The default NativeLoader would just use the platform APIs as today,
   meaning little behaviour or code change on the JDK side.
   - Samuel could write a NativeLoader for his unpack+cache implementation
   that standardises this behaviour in an ordinary library anyone can use.
   - A Linux implementation could be written that gives faster performance
   and more robustness using memfd_create, again, it could be done by the
   community instead of HotSpot implementors.
   - It opens the possibility of the community developing a new,
   platform-independent native code format by itself. Why might this be
   interesting?
      - ELF, PE and Mach-O do not vary significantly in feature set, and
      differ only due to the underlying platforms evolving separately. From
      the perspective of a Java developer who just wants to package some
      native code for distribution the distinction is annoying and
      unnecessary. It means building the same code three times, on three
      different platforms with three different toolchains, each time the
      native code changes, even if the change itself has no platform
      specific parts.
      - A new format could potentially simplify (e.g. do we still need
      support for relocation given the relatively poor success of ASLR and
      the big success of 64 bit platforms?), but it could also have features
      Java developers might want, like:
         - The ability to have OS specific symbols. If your library differs
         between platforms only in a few places, do you really need to ship
         three artifacts for that or would one artifact that contained 3
         versions of the underlying function be sufficient?
         - The ability to smoothly up-call to Java by linking dependent
         symbols back to Java via the callback capabilities in Panama.
         - The ability to internally link symbols based on information
         discovered by the JVM, e.g. if the JVM is able and willing to use
         AVX512 the new format could simply use that information instead of
         requiring the developer to write native code to detect CPU
         capabilities.
         - Fat binaries that support different CPU architectures inside a
         single file, and/or LLVM bitcode. If you have LLVM as a supported
         "architecture" then the NativeLoader could perhaps bridge to
         Sulong, so Java code that wasn't written with the Graal/Truffle API
         in mind can still benefit from calling into JIT-compiled bitcode.
         This probably would require NativeLoader to return a MethodHandle
         of some sort rather than a 'long' pointer.
      - Alternatively write a cross platform ELF loader. The Java IO APIs
      provide everything needed already, as far as I know.
      - The Arabica project explored sandboxing JNI libraries using Google
      NativeClient. It appeared to work and calls from the sandboxed code
      into the OS were transparently redirected back into the JVM where
      normal permission checks occurred. Sandboxed, cross platform native
      code would be a powerful capability.
      http://www.cse.psu.edu/~gxt29/paper/arabica.pdf
   - It opens the possibility of "assembly spinning" as an analogue to
   "bytecode spinning", as there would be no requirement that the NativeLoader
   return a pointer to code that it loaded off disk. Java-level JIT compilers
   like Graal could potentially be used to spin little snippets of code at
   runtime, or when only small amounts are needed (e.g. to invoke a syscall
   that isn't exposed via glibc, like, say, memfd_create) the bytes can simply
   be prepared ahead of time and read from the constant pool.

Getting back to JavaFX for a moment, it sounds like it's too late for
anything to go into JDK11 at this point, which is a pity. It will be up to
the community to find a solution like Samuel's cache for now.

thanks,
-mike

Re: Fwd: JMOD, native libraries and the packaging of JavaFX

Samuel Audet
Wow, this sounds really ambitious. A lot of this overlaps with what
Graal, LLVM, and Panama are trying to do.
Which is great, but we /really/ need to come up with some sort of
roadmap and get everyone on the same page...

Anyway, here's what else is in JavaCPP today that works and provides a
good start in solving a lot of these issues:

  * Builds for multiple platforms are done by CI servers, so building
    for 3+ platforms isn't an issue with services like AppVeyor and
    Travis CI: http://bytedeco.org/builds/ BTW, building everything on a
    single platform would require compatibility with MSVC, which no one,
    not even LLVM has fully achieved yet
    <https://clang.llvm.org/docs/MSVCCompatibility.html>, while nobody
    even cares about building binaries for iOS or Mac on Linux! And even
    if you do figure out a technical way to do it, you'll need to
    navigate Apple's lawyers...
  * Access to system APIs
    <https://github.com/bytedeco/javacpp-presets/tree/master/systems#sample-usage>,
    to detect things like AVX512, among other things
    <http://bytedeco.org/news/2018/01/17/java-for-systems/>, is there.
  * Support for "fat binaries" is provided in the form of JAR files that
    can be turned easily into uber JARs with Maven: Search Central for
    "org.bytedeco.javacpp-presets opencv"
    <http://search.maven.org/#search%7Cga%7C1%7Corg.bytedeco.javacpp-presets%20opencv>
    for an example.
  * JavaCPP itself is less than 400 KB and we can use only the parts
    that we are interested in. For example, JavaFX could bundle the
    native libraries in JAR files in a custom fashion, but rely on
    JavaCPP to load them using org.bytedeco.javacpp.Loader.load()
    <http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/Loader.html#loadLibrary-java.net.URL:A-java.lang.String-java.lang.String...->,
    which has already been battle-tested on a lot of platforms by all
    users of JavaCPP <https://github.com/bytedeco/javacpp>, JavaCV
    <https://github.com/bytedeco/javacv>, and Deeplearning4j
    <https://github.com/deeplearning4j/deeplearning4j>, among others.

FWIW, I think Sulong has the right approach in bridging Java with
Graal/Truffle and LLVM.

Now let's actually start doing something about this and get everything
standardized, please. :)

Samuel

On 05/08/2018 12:39 AM, Cyprien Noel wrote:

> That's really interesting, particularly if it enables a two-phase
> deployment like Android does. Native code could be stored in a repo as
> LLVM bitcode, and compiled and cached at install time. It allows code
> memory sharing between apps, only loading code that is actually used
> at runtime, long running compilation that optimises for the local
> platform, and maybe also do sandboxing transformations and inline them
> at install time?
>
> On Mon, May 7, 2018, 4:19 AM Mike Hearn <[hidden email]
> <mailto:[hidden email]>> wrote:


Re: Fwd: JMOD, native libraries and the packaging of JavaFX

Mike Hearn
Thanks Samuel! I wasn't familiar with JavaCPP before, that sounds like a
great project.

You are right that there's a lot of overlap here with other efforts, and
that standardising some basic things like JAR locations is the right place
to begin. I suspect a JEP requires actual changes to OpenJDK to be valid,
so a JEP that just proposes whatever JavaCPP does as a convention wouldn't
go anywhere.

Perhaps integrating JavaCPP's loading mechanism with JavaFX is a good next
step, as the community can then learn about it through that and may follow
the lead of JavaFX. I suppose someone would have to convince Kevin
Rushforth.

Samuel - what you could also do is write a one-page "standards document"
that describes where exactly JavaCPP puts things on the file system, the
algorithm it uses for selecting locations and cache keys, etc, so other
projects that unpack libraries to disk can share the same cache location.
That would lay the groundwork for it either becoming a widely adopted
convention, and/or becoming a future Java standard, and/or being
encapsulated in a NativeLoader in future if such an API is added to the
Java platform. The Loader class could also be split out into a separate
module/project.
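
As an example of the kind of rule such a document could pin down, the sketch below derives a cache location from the SHA-256 of a library's bytes, so that two projects unpacking the same binary would land on the same file. This is an assumed scheme for illustration only; it is not a description of how JavaCPP actually chooses its cache directory or keys, and the class NativeCache is hypothetical.

```java
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical content-addressed cache layout: <cacheRoot>/<sha256 of bytes>/<file name>. */
final class NativeCache {

    private NativeCache() {}

    /** Lower-case hex SHA-256 of the given bytes. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Where a library with these bytes would be unpacked under the shared cache root. */
    static Path cachedLocation(Path cacheRoot, String fileName, byte[] content) {
        return cacheRoot.resolve(sha256Hex(content)).resolve(fileName);
    }
}
```

Keying on content rather than on artifact name or version means an unchanged library is unpacked only once, and a changed one can never collide with a stale copy.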

The Panama/nicl JEP makes mention of improvements to native code loading
and discovery. It seems most of the effort in Panama is currently related
to vector support. If I were Mr Rose or Mr Reinhold I'd be tempted to try
and un-bundle better loading from the rest of the nicl project so it can
ship earlier. A NativeLoader style API would be a smaller change to the JVM
than all of the binding layer together.

Re: Fwd: JMOD, native libraries and the packaging of JavaFX

Samuel Audet
Hi,

Thanks for your interest! I'm always trying to do all that I can, but
I'm pretty much just one guy working on this part-time...

Besides, this is the kind of thing that should be standardized in the
JDK, and Oracle isn't exactly low in resources ($9 billion net income
last year, wow), so what's the issue? I'm still trying to figure out the
politics and I doubt that one more page about something is going to make
a great deal of a difference...

Samuel

On 05/08/2018 09:40 PM, Mike Hearn wrote:



Re: Fwd: JMOD, native libraries and the packaging of JavaFX

Mike Hearn
I don't think there's any politics to it. This sort of thing is scoped as
part of Panama and funded already. Maybe there's a debate to be had about
ordering of tasks, but that's a separate thing.

I'm working on a side project that might be relevant to this - I'll email
you about it off list Sam.


On Wed, May 09, 2018 at 07:09:27, Samuel Audet <[hidden email]> wrote:

> Hi,
>
> Thanks for your interest! I'm always trying to do all that I can, but I'm
> pretty much just one guy working on this part-time...
>
> Besides, this is the kind of thing that should be standardized in the JDK,
> and Oracle isn't exactly low in resources ($9 billion net income last year,
> wow), so what's the issue? I'm still trying to figure out the politics and
> I doubt that one more page about something is going to make a great deal of
> a difference...
>
> Samuel
>
> On 05/08/2018 09:40 PM, Mike Hearn wrote:
>
> Thanks Samuel! I wasn't familiar with JavaCPP before, that sounds like a
> great project.
>
> You are right that there's a lot of overlap here with other efforts, and
> that standardising some basic things like JAR locations is the right place
> to begin. I suspect a JEP requires actual changes to OpenJDK to be valid,
> so a JEP that just proposes whatever JavaCPP does as a convention wouldn't
> go anywhere.
>
> Perhaps integrating JavaCPP's loading mechanism with JavaFX is a good next
> step, as the community can then learn about it through that and may follow
> the lead of JavaFX. I suppose someone would have to convince Kevin
> Rushforth.
>
> Samuel - what you could also do is write a one-page "standards document"
> that describes where exactly JavaCPP puts things on the file system, the
> algorithm it uses for selecting locations and cache keys, etc, so other
> projects that unpack libraries to disk can share the same cache location.
> That would lay the groundwork for it becoming a widely adopted
> convention, a future Java standard, or the behaviour encapsulated in a
> NativeLoader if such an API is ever added to the Java platform. The
> Loader class could also be split out into a separate module/project.
>
> The Panama/nicl JEP makes mention of improvements to native code loading
> and discovery. It seems most of the effort in Panama is currently related
> to vector support. If I were Mr Rose or Mr Reinhold I'd be tempted to try
> and un-bundle better loading from the rest of the nicl project so it can
> ship earlier. A NativeLoader style API would be a smaller change to the JVM
> than all of the binding layer together.
>
>
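As a rough illustration of the kind of thing such a one-page convention would have to pin down, here is a minimal sketch of an extract-to-cache-and-load helper. It is not JavaCPP's actual layout or algorithm: the class name NativeCache, the choice of a SHA-256 content hash as the cache key, and the cacheRoot/<key>/<fileName> directory layout are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; names and layout are illustrative assumptions.
final class NativeCache {
    private NativeCache() {}

    /** Hex-encoded SHA-256 of the library bytes, used here as the cache key. */
    static String cacheKey(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Writes content to cacheRoot/<key>/<fileName> unless it is already there. */
    static Path extract(Path cacheRoot, String fileName, byte[] content) throws IOException {
        Path dir = cacheRoot.resolve(cacheKey(content));
        Path target = dir.resolve(fileName);
        if (Files.exists(target)) {
            return target; // same key means same bytes, so reuse the cached copy
        }
        Files.createDirectories(dir);
        // Write to a temp file first so other processes never see a partial library.
        Path tmp = Files.createTempFile(dir, fileName, ".tmp");
        Files.write(tmp, content);
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }

    /** Extracts a classpath resource into the cache and loads it with System.load. */
    static void load(Class<?> owner, String resource, Path cacheRoot) throws IOException {
        try (InputStream in = owner.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("resource not found: " + resource);
            }
            byte[] content = in.readAllBytes(); // requires Java 9 or later
            String name = resource.substring(resource.lastIndexOf('/') + 1);
            System.load(extract(cacheRoot, name, content).toAbsolutePath().toString());
        }
    }
}
```

Keying the directory on a content hash means two projects that ship the same binary would share one cached file, which is the property a common cache location is meant to give. Selecting the right binary per OS and architecture, and cleaning up old entries, are deliberately left out.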

Re: JMOD, native libraries and the packaging of JavaFX

Mark Raynsford
In reply to this post by Mike Hearn
On 2018-05-07T04:19:55 -0700
Mike Hearn <[hidden email]> wrote:

> I did a bit of experimentation to learn how different operating systems
> support loading shared libraries in-memory. I also did a bit of thinking on
> the topic of "native classloaders". Here's a braindump, which may lead
> nowhere but at least it'll be written down.

How do the BSDs cope with this?

I suspect that OpenBSD will not have any support for this at all, but
FreeBSD might.

--
Mark Raynsford | http://www.io7m.com


Re: JMOD, native libraries and the packaging of JavaFX

Mark Raynsford
On 2018-05-09T18:53:32 +0100
Mark Raynsford <[hidden email]> wrote:

> On 2018-05-07T04:19:55 -0700
> Mike Hearn <[hidden email]> wrote:
>
> > I did a bit of experimentation to learn how different operating systems
> > support loading shared libraries in-memory. I also did a bit of thinking on
> > the topic of "native classloaders". Here's a braindump, which may lead
> > nowhere but at least it'll be written down.  
>
> How do the BSDs cope with this?
>
> I suspect that OpenBSD will not have any support for this at all, but
> FreeBSD might.

Apologies for the cross-post, I didn't realize both addresses were on
the reply.

--
Mark Raynsford | http://www.io7m.com


Re: JMOD, native libraries and the packaging of JavaFX

Mike Hearn
In reply to this post by Mark Raynsford
I couldn't find any support in FreeBSD, although there is "fdlopen", which
opens a shared library direct from a file descriptor. I haven't tried it.

Loading a library from a file or memory region is an obvious use case
that'd be helpful for anyone who wants to distribute programs in the form
of single files (whether jars or exes or elf binaries), but it's not well
supported. Here's a bit of background on why not.

The blame can mostly be laid at the feet of the ubiquitous performance
optimisation mmap/MapViewOfFile. The idea is, map your shared library into
memory and let the kernel lazily load only what's needed instead of the
whole thing. In the days when memory was very scarce and disks were very
slow this could be a big help. Unfortunately it imposes some strict limits
on what you can do. Kernels really want mmaps to be page-aligned at every
level, so you can't tell the kernel to map a shared library starting from
some arbitrary offset in the file. This *could* be supported, but isn't,
presumably to simplify kernel code.
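
To make the alignment constraint concrete, here is a small sketch of the arithmetic. The 4096-byte page size and the entry offset are assumptions for illustration; real systems report their own page size, and PageAlign is a hypothetical name.

```java
// Hypothetical illustration of the page-alignment arithmetic.
final class PageAlign {
    /** Assumed page size; 4096 is common, but some systems use larger pages. */
    static final long PAGE_SIZE = 4096;

    /** Rounds offset down to a page boundary (pageSize must be a power of two). */
    static long alignDown(long offset, long pageSize) {
        return offset & ~(pageSize - 1);
    }

    /** True if offset is a whole multiple of pageSize (a power of two). */
    static boolean isPageAligned(long offset, long pageSize) {
        return (offset & (pageSize - 1)) == 0;
    }

    public static void main(String[] args) {
        // Made-up offset of an uncompressed shared library inside an archive.
        long entryOffset = 123_457;
        System.out.println(isPageAligned(entryOffset, PAGE_SIZE));           // false
        System.out.println(alignDown(entryOffset, PAGE_SIZE));               // 122880
        System.out.println(entryOffset - alignDown(entryOffset, PAGE_SIZE)); // 577
    }
}
```

A mapping could start at offset 122880, but the library image would then begin 577 bytes into the first page rather than on a page boundary, which is not what a dynamic linker expects. In practice the library would have to be stored page-aligned inside the container for direct mapping to work.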

I was curious if it's still the case that mmap is so important. Putting
aside the question of OS support, would you lose a lot of performance by
just loading the file into memory all at once with regular file IO and then
adjusting the page permissions afterwards (mprotect on POSIX, VirtualProtect
on Windows)?

Shared libraries are not very large by modern standards. HotSpot libjvm is
13 MB on macOS, and the largest DSO I found in my Linux
/usr/lib/x86_64-linux-gnu directory was libicudata.so.57.1 which weighs in
at a generous 25 megabytes. ICU is an outlier (it's mostly Unicode data
tables, which are enormous). The next largest is libgs (ghostscript) which
is 16 MB. So it seems plausible that 15-20 MB is about the largest shared
library Java users are likely to want to load (that's a LOT of C++!).

Running a simple benchmark on a cheap Linode VM:

root@plan99:/usr/lib/x86_64-linux-gnu# echo 3 > /proc/sys/vm/drop_caches
root@plan99:/usr/lib/x86_64-linux-gnu# time cat libicudata.so.57.1 >/dev/null
real 0m0.046s
user 0m0.000s
sys 0m0.013s

46 msec to load a 25 megabyte DSO into memory from disk? 8 msec to do it
again when hot in the cache? mmap is surely instant, but it's not clear to
me that mmap matters much anymore if you're already paying the cost of
interpreting/JIT compiling. In an age where people routinely ship apps
as *entire operating systems* (Docker images), it feels like we're being
held back here by obsolete optimisations.
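
For anyone who wants to repeat the measurement from inside the JVM rather than with cat, a minimal sketch follows. ReadTiming is a hypothetical name, and the program does not drop the page cache itself, so the first run is only a cold read if the cache was dropped beforehand as in the shell session above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical timing harness: reads a whole file with ordinary file I/O.
final class ReadTiming {
    /** Returns {bytes read, elapsed nanoseconds} for one full read of the file. */
    static long[] timeFullRead(Path file) throws IOException {
        long start = System.nanoTime();
        byte[] data = Files.readAllBytes(file);
        long elapsed = System.nanoTime() - start;
        return new long[] { data.length, elapsed };
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a large shared library as the first argument.
        Path lib = Paths.get(args[0]);
        // Run twice: the second read is normally served from the page cache.
        for (int run = 1; run <= 2; run++) {
            long[] result = timeFullRead(lib);
            System.out.printf("run %d: %d bytes in %.3f ms%n",
                    run, result[0], result[1] / 1e6);
        }
    }
}
```

The numbers will include JVM overhead such as allocating the byte array, so they are an upper bound on the raw I/O cost, not a replacement for the cat timing.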

Unfortunately on most platforms the system dynamic linker has special
privileges. Debuggers handshake with it, and on Windows only the OS linker
can produce an HMODULE even though HMODULE is just a pointer to the base
address of the mapped image. HMODULEs are in turn required by a few old
Windows APIs. So, using a custom linker imposes some small sacrifices.

I'll leave the topic here.

Re: JMOD, native libraries and the packaging of JavaFX

Samuel Audet
The issue isn't just about /loading/ shared libraries, it's also about
/linking/ with these libraries using native toolchains. That usually
requires at least the header files, and possibly other files for
pkg-config and what not. Unless we really want to revamp how GCC, Clang,
and MSVC work as well, it's probably a good idea to stick with files.
And the JDK should provide a standard way of caching resources to files!

That's what is happening with, for example, Caffe depending on OpenCV,
OpenBLAS, and HDF5 here:
https://github.com/bytedeco/javacpp-presets/blob/master/caffe/pom.xml#L17-L37

Some developers might want to use only OpenCV, others might want to use
Caffe, which depends on OpenCV, and yet others (Cyprien, for example)
actually do need to write native code in C++ and link with *both* OpenCV
and Caffe as well as write Java classes that also use *both* OpenCV and
Caffe... That's pretty basic stuff! But I'm getting the impression that
we're not thinking about this here. Let's please keep this in mind!

Samuel

On 05/11/2018 12:05 AM, Mike Hearn wrote:

> I couldn't find any support in FreeBSD, although there is "fdlopen", which
> opens a shared library direct from a file descriptor. I haven't tried it.
>
> Loading a library from a file or memory region is an obvious use case
> that'd be helpful for anyone who wants to distribute programs in the form
> of single files (whether jars or exes or elf binaries), but it's not well
> supported. Here's a bit of background on why not.
>
> The blame can mostly be laid at the feet of the ubiquitous performance
> optimisation mmap/MapViewOfFile. The idea is, map your shared library into
> memory and let the kernel lazily load only what's needed instead of the
> whole thing. In the days when memory was very scarce and disks were very
> slow this could be a big help. Unfortunately it imposes some strict limits
> on what you can do. Kernels really want mmaps to be page-aligned at every
> level, so, you can't tell the kernel to map a shared library starting from
> some arbitrary offset in the file. This *could* be supported, but isn't,
> presumably to simplify kernel code.
>
> I was curious if it's still the case that mmap is so important. Putting
> aside the question of OS support, would you lose a lot of performance by
> just loading the file into memory all at once with regular file IO and then
> adjusting the page permissions afterwards (mprotect on POSIX, VirtualProtect
> on Windows)?
>
> Shared libraries are not very large by modern standards. HotSpot libjvm is
> 13 MB on macOS, and the largest DSO I found in my Linux
> /usr/lib/x86_64-linux-gnu directory was libicudata.so.57.1 which weighs in
> at a generous 25 megabytes. ICU is an outlier (it's mostly Unicode data
> tables, which are enormous). The next largest is libgs (ghostscript) which
> is 16 MB. So it seems plausible that 15-20 MB is about the largest shared
> library Java users are likely to want to load (that's a LOT of C++!).
>
> Running a simple benchmark on a cheap Linode VM:
>
> root@plan99:/usr/lib/x86_64-linux-gnu# echo 3 > /proc/sys/vm/drop_caches
> root@plan99:/usr/lib/x86_64-linux-gnu# time cat libicudata.so.57.1 >/dev/null
> real 0m0.046s
> user 0m0.000s
> sys 0m0.013s
>
> 46 msec to load a 25 megabyte DSO into memory from disk? 8 msec to do it
> again when hot in the cache? mmap is surely instant, but it's not clear to
> me that mmap matters much anymore if you're already paying the cost of
> interpreting/JIT compiling. In an age where people routinely ship apps
> as *entire operating systems* (Docker images), it feels like we're being
> held back here by obsolete optimisations.
>
> Unfortunately on most platforms the system dynamic linker has special
> privileges. Debuggers handshake with it, and on Windows only the OS linker
> can produce an HMODULE even though HMODULE is just a pointer to the base
> address of the mapped image. HMODULEs are in turn required by a few old
> Windows APIs. So, using a custom linker imposes some small sacrifices.
>
> I'll leave the topic here.
>