-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[WIP] Add JDK Flight Recorder support for Java native images #3070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Andrew Dinn <[email protected]> Co-authored-by: Roman Kennke <[email protected]> Co-authored-by: Josh Matsuoka <[email protected]> Co-authored-by: Tako Schotanus <[email protected]> Co-authored-by: Simon Tooke <[email protected]>
Hello jkang, thanks for contributing a PR to our project! We use the Oracle Contributor Agreement to make the copyright of contributions clear. We don't have a record of you having signed this yet, based on your email address jkang -(at)- redhat -(dot)- com. You can sign it at that link. If you think you've already signed it, please comment below and we'll check. |
Note that @jiekang is covered by the Red Hat OCA. He works for Red Hat (as do other contributors). |
Nice one guys! :) |
Thanks for the PR, we will have a look at it. Due to the imminent winter break, holidays, vacation, ... a realistic ETA for the first round of comments on our side is middle of January. I'm also glad that you explicitly mark the PR as "WIP". Still, it is a lot of work that you already did on it (about 9000 lines of code added). I do not think it is a sustainable approach to make such large contributions without discussing the overall architecture beforehand. Ideally, the architecture discussions should start before writing the first line of code. Slack and Zoom discussions upfront are far more time effective than after-the-fact reviews. |
Hi Christian, Yes, this is a work in progress -- although I do feel we have made a lot of progress to arrive where we are now. I understand why you might have preferred us to discuss this PR in advance. However, we had two problems doing that before now. First, we felt we had to have something tangible to discuss. Second, we wanted to be able to enter any discussion from a position where we understood enough of the complexities of how to implement a solution to make that discussion realistic and not just end up offloading onto you work that we had chosen to undertake. We have released this PR with a basic set of useful, fully working features, extra features being left for further work. So, this really constitutes a 'pilot' implementation i.e. one that tests the feasibility of a partial solution right through to completion and at the same time tries to establish an acceptable path forward to a full solution. We have no belief that this implementation is the only way or the right way to do this and certainly have no expectation that you will simply rubber-stamp our work. That explains why we wanted to be sure our approach really did work for those basic cases before asking you to look at it and help us determine if there is a better approach. In getting to this point we have uncovered and resolved a lot of problems shoe-horning JFR into Graal native, many to do with the change from dynamic to static (build/runtime) operation, just as many to do with the need to uncouple the close linkage between JFR and Hotspot and identify/recreate an equivalent in GraalVM. We have also understood a lot about the problems we have not yet addressed. I really don't believe we could have had a useful discussion with you without working this solution through as far as we have. So, please don't feel we are trying to push this through a hasty review with no desire to listen to your opinion or follow your advice. Of course, we want to get a working version released as soon as possible in order for our users to be able to benefit from having a monitoring solution. However, we also want to get it right, so that we can maintain the functionality we have implemented so far and add the missing capabilities that we also need. We very much welcome your feedback and hope you can help us steer this towards a better design and better implementation. |
Thanks for the PR. So far, I only had a very brief look at the code but here a couple high-level comments that will probably have a large impact on your work in progress:
As @christianwimmer already mentioned, I will have a closer look at the PR in January. |
Hi all, now that all the holidays are over we have some discussions and decisions to make. We have also worked on some initial JFR support in the last months, and we have the approvals to make it public. @christianhaeubl is working on that, and will have that ready soon. Our work is also heavy work-in-progress. Some parts are more complete than this PR (like avoiding problems with GC and safepoints), some parts are less complete. Our hope would be to combine both implementations and work together on a combined implementation. As as starting point, I suggest that we schedule a Zoom with all contributors. To find a time for that, and also to have more interactive control, we created a dedicated Slack channel in the GraalVM slack workspace: #native-image-jfr. This link should get you directly to it: https://graalvm.slack.com/archives/C01K81PSVLM |
@christianwimmer Hi Christian. Thanks for the update and enormous thanks to the Oracle team for deciding to open up your work and combine forces with the rest of the community to ensure JFR works in CE releases. Red Hat very much appreciates your offer of help and welcomes the benefit of your expertise in areas like GC and safepoints where our experience is still weak. Looking forward to the Zoom call and following up on this effort on Slack. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only a partial and not very detailed review. I skipped many classes where I think that large changes will be necessary anyways due to the differences between the HotSpot and the Native Image safepoint and VM thread handling.
Major issues:
- With the current implementation, it won't be possible to support many of the native events (i.e., the ones that are defined in the metadata.xml file).
- In Native Image, every thread can trigger JFR events. So, even if a thread is at a safepoint and executes a VM operation, it can still trigger JFR events (e.g., because it allocates a Java object). This is a major difference to HotSpot and breaks some of the assumptions that the HotSpot code has.
package com.oracle.svm.core.jdk.jfr; | ||
|
||
public class JavaJfrUtils { | ||
public static boolean isJFRClass(Class<?> clazz) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clearly separate between code that can be executed at runtime and code that is only executed at image build time, see @Platforms(Platform.HOSTED_ONLY.class)
(affects the whole PR).
import org.graalvm.compiler.serviceprovider.JavaVersionUtil; | ||
|
||
public class JfrAvailability { | ||
public static boolean withJfr = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use an existing (AllowVMInspection
) or a new HostedOption
instead.
import jdk.internal.event.Event; | ||
|
||
@AutomaticFeature | ||
public class JfrFeature implements Feature { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, JfrRuntimeAccessImpl
is always getting registered. Is there a reason for that? I think this whole feature should only be enabled conditionally:
@Override
public boolean isInConfiguration(IsInConfigurationAccess access) {
return JfrAvailability.withJfr;
}
If the whole feature is enabled conditionally, the invidual checks for JfrAvailability.withJfr
(at the begin of each method) can be removed.
Set<Module> modules = new HashSet<>(); | ||
for (Class<?> clazz : reachableClasses) { | ||
if (JfrTraceId.getTraceId(clazz) == -1) { | ||
JfrTraceId.assign(clazz); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary or can DynamicHub.getTypeID()
be used as the trace id?
@@ -581,6 +581,24 @@ | |||
"spotbugs": "false", | |||
}, | |||
|
|||
"com.oracle.svm.core.jdk.jfr.test": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JFR implementation should be in a separate project as well. Then it is easier to exclude the files from compilation when a JDK < 11 is used.
public final int bit; | ||
|
||
JfrMsg(int msg) { | ||
this.msg = msg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msg
doesn't seem to be used. I think it would be more readable if the shifted bit was passed into the constructor, either in hex or binary representation (e.g., 0x80
or 0b10000000
) as this is a well-known pattern for bit-based enums.
// is complete as it should shutdown cleanly as well | ||
t.setDaemon(true); | ||
t.start(); | ||
Jfr.excludeThread(t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excluding a thread is something that will cause issues on Native Image (i.e., JFR events will be missing). In Native Image, any thread can trigger JFR events. This even includes the VM thread when it is at a safepoint. This is a key difference to HotSpot and therefore breaks a lot of the assumptions that are in the HotSpot code.
A few examples that can happen when the JfrRecorderThread
is excluded:
- The
JfrRecorderThread
may allocate new Java objects. The JFR events for those object allocations would be suppressed. - The
JfrRecorderThread
allocates a new Java object. This triggers a GC, so theJfrRecorderThread
is temporarily promoted to the VM thread. So, all GC-related events would be suppressed.
@@ -59,6 +60,9 @@ | |||
@Inject @RecomputeFieldValue(kind = RecomputeFieldValue.Kind.Reset)// | |||
boolean wasStartedByCurrentIsolate; | |||
|
|||
@Inject @RecomputeFieldValue(kind = RecomputeFieldValue.Kind.NewInstance, declClass = JfrThreadLocal.class)// | |||
JfrThreadLocal jfrThreadLocal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to store thread-local values directly on the Java Thread
object. We use a different concept for thread-locals, see FastThreadLocal
.
* | ||
*/ | ||
|
||
public class JfrBuffer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As already mentioned, everything related to JFR storage infrastructure must be uninterruptible. So, no Java memory may be allocated and if Java objects are used, they must live in the image heap. Otherwise, it will be impossible to support many of the native events, e.g., allocation and GC-related events.
import com.oracle.svm.core.jdk.jfr.recorder.checkpoint.types.traceid.JfrTraceIdEpoch; | ||
import com.oracle.svm.core.jdk.jfr.recorder.storage.operations.JfrBufferOperations.BufferOperation; | ||
|
||
public class JfrMemorySpace<U extends JfrMemorySpaceRetrieval, V extends JfrMemorySpace.Client> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it makes sense to port the HotSpot C++ templates to Java generics. In HotSpot, the templates kill readability but at least provide good performance. In Native Image, the Java generics may reduce both the readability and the performance as they introduce virtual calls.
jkang has signed the Oracle Contributor Agreement (based on email address jkang -(at)- redhat -(dot)- com) so can contribute to this repository. |
Closing as the work has been done in #3444 |
This PR adds infrastructure to SubstrateVM supporting JDK Flight Recorder for Java native images (#3018). This is a work in progress and the intention is to welcome feedback from Oracle’s GraalVM team and the GraalVM community during the development process leading up to initial integration.
Completed Features
-H:+FlightRecorder
).jfr
) during native image run (jdk.jfr.Recording
and related classes)jdk.jfr.Event
and related classesIn Progress Features
Other Issues in progress
Along with the major features above, many minor issues that will be addressed are tracked here: https://github.com/rh-jmc-team/graal/issues Issues labelled “jfr-initial-release” are targeted for eventual completion in this PR. This issue tracker is temporary and used to cleanly separate work items for this integration in a publicly accessible location. Note, as this is WIP, new issues will surely appear and be tracked there as well. Once this PR is accepted, we will follow upstream practices for all future work. As well, more long-term additions are described in the original issue: #3018
Testing
This implementation has been tested against native images produced from sources in this repository: https://github.com/rh-jmc-team/jfr-tests At this time, tests are manually run. We intend to add automated test infrastructure to compile test sources into native images, run them, and verify the resulting JFR file. Tests against other applications have been and will continue to be run as well.
As an example from the jfr-tests repository, TestConcurrentEvents starts a recording to disk, creates 8 threads that emit 1024*1024 events each, stops the recording and prints the location of the produced .jfr file. See
substratevm/JFR.md
in this branch for instructions on building with FlightRecorder enabled.Contributors:
Andrew Dinn [email protected]
Jie Kang [email protected]
Roman Kennke [email protected]
Josh Matsuoka [email protected]
Tako Schotanus [email protected]
Simon Tooke [email protected]