-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[java] Metrics memoization tests #499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3061778
to
fb64e13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
return false; | ||
} | ||
return version.equals(that.version); | ||
return this == o; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this can be done, since we have a factory method.
Please double check, that we do not run into issues, if we serialize/deserialize instances of this class - Object deserialization would be the last possibility, where new instances are created, that this class doesn't control (though could be solved by "readResolve" implementation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a final
non-Serializable class? Curious, where is Serialization currently used in PMD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the metrics framework not yet, but later on, when we integrate it with the analysis cache.
Currently "RuleViolations" are serialized into a cache file, and - if the file has not been modified since the last pmd run (and pmd is run with the same rules) - the file is not analyzed again and the RuleViolations from the cache are simply returned.
|
||
@Override | ||
public ClassMetric getCalculator() { | ||
return new RandomMetric(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea with the RandomMetric for tests.
I like it, that you can now define such metrics adhoc for testing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like idiomatic support would be nice for these things, so assuming Java 8:
MetricKey<ClassMetric> classMetricKey = MetricKey.of(null, RandomMetric::new)
With a MetricKey static method:
static <T extends Metric> MetricKey<T> MetricKey.of(String name, Supplier<T> supplier)
Also, curious, are MetricKey intended to use identity based equality? The JavaDoc doesn't say, seems useful for a "key" object. The non-test implementations I see are enum
based.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a nice idea! It's much cleaner than the current anonymous class strategy.
Hmm well bottom line, keys should implement hashcode, which is why it's convenient to use enums. They're not used directly to index memoization maps, they're incorporated into a ParameterizedMetricKey
(metrickey + metricversion).
As parameterized metric keys are instance controlled, they can use identity based equality, but in order to find the right instance of parameterized metric key, we use the hashcode of the key (see ParameterizedMetricKey#code(MetricKey, MetricVersion)
). So I guess keys need not even implement equals, even though it's a good idea
boolean force, MetricVersion version, ResultOption option) { | ||
|
||
List<ASTMethodOrConstructorDeclaration> ops = findOperations(node, false); | ||
|
||
List<Double> values = new ArrayList<>(); | ||
for (ASTMethodOrConstructorDeclaration op : ops) { | ||
if (key.supports(op)) { | ||
if (key.getCalculator().supports(op)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it makes sense to define the "supports" method also in the MetricKey interface. Would be a minor improvement. I guess, it would need to be a bit more generic, than it is now, e.g.
boolean supports(AccessNode node);
It would then be the responsibility of the metric to return false, if the node type is not supported.
Hm... we basically have already a "supports(AccessNode node)" method in AbstractMetric.
Downside is, that each MetricKey implementation needs to delegate one additional method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think that through generifying the interfaces we could achieve an even more interesting results. If you generify Metric
into Metric<N>
, where N
stands for the type of node the metric can be computed on, that would look like so:
interface Metric<N extends Node> {
boolean supports(N node);
double computeFor(N node, MetricVersion v);
}
interface ClassMetric extends Metric<ASTAnyTypeDeclaration> {}
interface OperationMetric extends Metric<ASTMethodOrConstructorDeclaration> {}
interface MetricKey<N> {
boolean supports(N node);
Metric<N> getCalculator();
String name();
}
The downside is that no class could implement ClassMetric or OperationMetric at the same time, as that would make Metric be inherited with different type arguments. Though I think it could be good to separate ClassMetrics' implementations from their OperationMetric, as Metric objects represent behaviour (they could nearly be seen as a lambda if it wasn't for support
). The fact that some concrete metrics implement both interfaces as of now seems contrary to that design, as it mixes conceptually unrelated behaviours.
Would you let me work on that for the PR after the next ? I already have a few commits after these ones in my local repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting idea. A design question... What is the purpose of having two subtypes, ClassMetric
and OperationMetric
, given the have identical method signatures? Is the distinction between these metrics best expressed using classes/inheritence, or using other means (e.g. property of the Metric, say "type" with an enum {CLASS, OPERATION})?
Apologies for the odd question, I've not pulled the code and dug through it.
Also, I'm a bit confused by the supports
methods. In your example, it uses N
variable, and not Node
or ? extends Node
. Is it intended to compile time constrain to the specific type, and dynamically determine if different instances of the type are supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think inheritance is fine to express the differences between them. That allows us to have one type for class metrics and another for operation metrics, which can be used to put compile time constraints on what metric can be calculated on what node. For example, we can forbid calculating a class metric on an operation node (see the get
overloads in Metrics
).
The supports
method aims at even finer capability checking. For instance some class metrics may not be calculated on interfaces, even though the compile time check above would allow it (as interfaces, classes, enums and annotation type declarations are grouped under ASTAnyTypeDeclaration
). You can see the defaults of this method in AbstractMetric
.
The fact that supports
had an AccessNode
parameter was just poor design. I did that so that metrics, whether class or operation metrics, could all extend a single AbstractMetric
class. To do that, supports
had to be declared in the Metric
interface, which is why its argument's type could not be too specific. But ideally the interfaces shouldn't even hint that a class metric could support any AccessNode
, so it would be good to change that.
@@ -159,14 +159,14 @@ | |||
* | |||
* @return The result of the computation, or {@code Double.NaN} if it couldn't be performed | |||
*/ | |||
/* default */ double computeWithResultOption(OperationMetricKey key, ASTAnyTypeDeclaration node, | |||
/* default */ double computeWithResultOption(MetricKey<OperationMetric> key, ASTAnyTypeDeclaration node, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the use of boilerplate /* default */
comments for Java default scope (colloquially known as package scope) PMD style?
This is something I tend to see with newer Java programmers unfamiliar with the language default behaviors, similar to declaring interface
methods public
or abstract
, etc. I wouldn't think we would want it. Alternatively, there are cases were descriptive comments indicating why a class does not have broader scope are useful, but that's not this, and not sure it's warranted here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm that's an interesting point. I began to do that since I saw this comment on a PR for type resolution. I thought it was PMD style, and I don't think it's too noisy so I kept it.
The important thing is to have a consistent style though, so I guess that should be discussed in #361
@@ -123,7 +124,7 @@ public static double get(OperationMetricKey key, ASTMethodOrConstructorDeclarati | |||
* @return The value of the metric, or {@code Double.NaN} if the value couln't be computed or {@code option} is | |||
* {@code null} | |||
*/ | |||
public static double get(OperationMetricKey key, ASTAnyTypeDeclaration node, ResultOption option) { | |||
public static double get(MetricKey<OperationMetric> key, ASTAnyTypeDeclaration node, ResultOption option) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth pointing out that these get
overloads are working because of the use of different 2nd, 3rd, etc. argument types. This method pattern may run into issues if a ClassMetric
and OperationMetric
needed the same node and supplementary argument types due to type erasure on the MetricKey
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how an operation metric and class metric could use the same node. ASTAnyTypeDeclaration and ASTMethodOrConstructorDeclaration are unrelated types, and the compiler type checks before it erases the type arguments. Perhaps the only instance where I could see a problem happening would be using reflection to call these methods, because the type argument would have been erased, but then that's highly unlikely.
@@ -16,7 +16,7 @@ | |||
* | |||
* @author Clément Fournier | |||
*/ | |||
public final class ParameterizedMetricKey { | |||
/* default */ final class ParameterizedMetricKey { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a MetricKey
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, it's an internal structure (not part of the API) to represent a metric key coupled with a metric version. It's used to index memoization maps, so that several versions of the same metric can be cached. And because it doesn't care about where its metrickey and version comes from, it allows metrics defined outside the framework (custom ones) to be cached too
@@ -28,21 +28,21 @@ | |||
*/ | |||
public class CyclomaticComplexityRule extends AbstractJavaMetricsRule { | |||
|
|||
public static final IntegerProperty REPORT_LEVEL_DESCRIPTOR = new IntegerProperty( | |||
private static final IntegerProperty REPORT_LEVEL_DESCRIPTOR = new IntegerProperty( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would imply subclasses are not allowed to interrogate these properties using the descriptors. Is this intended? Is subclassing not permitted (the class is not final
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm well maybe we should make them protected, but I think there's no point in having them public. The class is not designed for inheritance though, so maybe we should make it final? I haven't seen many final
rules in PMD...
@Before | ||
public void setUp() { | ||
pack = new PackageStats(); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've personally never see a 2-blank line style. :)
Stop reading now if you have more important things to do...
Back in the day I used to prefer {
on a separate line, but I've since changed and come to value information density more. I even tilt my monitors to portrait mode so I can see more LOC at once. Not that I want to write huge classes or methods, but it does seem to greatly help my ability to grok code faster, scrolling/navigation is my arch-nemesis. :) I take mildly perverse pleasure whenever I write a complete meaningful class, JavaDoc and all, and it entirely fits within a single visual window on my monitor. The definition is clearly arbitrary, but let's call this OSM, the one-screen-metric! ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet another metric! You're welcome to make a feature request :)
I used to cramp up my classes too, writing { }
on one line, skipping braces for if
statements, etc. because I have a small monitor. But I found out it's much easier to skim a class and find info when you can find out if hunks of code are their own methods at a glance: two blank lines => two different methods, one blank line => inside a method. I think I might have a very tubular vision too because anything more than a few cm away from where I look on the monitor is just annoying noise to me, which is why I like to keep things clearly separated
Doing that actually helped me focus on the actual code, which is in the methods (a java class is so much boilerplate after all). For me information density is a relevant metric inside a single method, not so much a class. I totally get that feeling of "mildly perverse pleasure" when I get to write the most elegant method I could, but the more satisfaction I get, the more frustration I feel when Checkstyle bashes me with "assignments in guard clauses are not allowed" or whatever :)
Btw I had never seen the word grok before but I'll keep it, thanks!
} | ||
}, null); | ||
|
||
System.out.println(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the visitation above dumping console output such that this is needed?
|
||
for (int i = 0; i < res.size(); i++) { | ||
assertNotEquals(res.get(i), cmp.get(i)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is cmp.size() > res.size()
allowed?
} | ||
}, null); | ||
|
||
assertEquals(res, cmp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the names res
and cmp
mean something to you? I'm drawing a blank.
cmp.add((int) toplevel.compute(classMetricKey, node, true, Version.STANDARD)); | ||
return super.visit(node, data); | ||
} | ||
}, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like 4 of these anonymous classes? Do they differ besides the List
they populate and the boolean
parameter? The nature of the tests might be more obvious if there was one class (or factory method?) with whose constructor took those two arguments, and then used 4 times by these two tests. I suspect the tests are really about verifying behavior for interesting combinations of variations of those two parameters.
Without doing this, there's 2 fairly complex test cases spread over nearly 100 lines. Intent is difficult to discern, due to the size and degree of code analysis needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'll refactor that
@@ -101,7 +102,7 @@ public static double get(OperationMetricKey key, ASTMethodOrConstructorDeclarati | |||
* | |||
* @return The value of the metric, or {@code Double.NaN} if the value couln't be computed | |||
*/ | |||
public static double get(OperationMetricKey key, ASTMethodOrConstructorDeclaration node, MetricVersion version) { | |||
public static double get(MetricKey<OperationMetric> key, ASTMethodOrConstructorDeclaration node, MetricVersion version) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried finding more details on the MetricVersion
digging through /master here on github, but didn't any classes. Not in IDE right now, so discovery is merely manual. Are there usages? The Metric enums didn't seem use this per the comments on the interface. Anyway, If the metric key and version are related, I'm curious why it makes sense to separate them in argument order for some of these overloads. For example here, the argument order is Key, Node, Version
.
I have other questions, but I think I need to understand the relationships in this object model better before asking. I'll find some time to do so. Feel free to point me toward any resources I could use to enlighten myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see an example in CycloMetric
, there's a nested Version
enum defining an alternative version. In CyclomaticComplexityRule
, the user can set the version via a property.
It was not my intention to separate them, but I added versions after the fact so I just appended the version parameter to the parameter list. Overall I think it's best to keep it on the right because it's optional. I find it less confusing that invariably the two first arguments are needed here
Unfortunately I don't have much more than the javadoc to document the framework for now. There is a page in my fork's wiki which roughly describes the architecture of the framework, but most code snippets are outdated and there have been many additions to the framework since I wrote that (for my application to GSoC, around April). The part about signature matching is still accurate, so I recommend you to read it nevertheless (it's a short read). I'll try to list a few things that might interest you here:
PackageStats
represents a package hierarchy. It stores references to its subpackages (other PackageStats) and the classes it contains (ClassStats). ClassStats models a class, stores an index of its fields and a reference to the methods it contains (OperationStats
)- We can query package stats about a specific operation or class using its
QualifiedName
(see packageast
). Qualified names are great because we can iterate on their package names or class names to find the correct class stats. ThegetSubPackage
andgetClassStats
methods of PackageStats implement that navigation logic. - Signature matching (see the wiki page) is not yet used in the framework but will soon be, but for that we need [java] RFC: new layer of abstraction atop PrimaryExpressions #497
- MetricVersions are a way to use a slightly modified metric, so that it computes its results in a different way.
Metrics
is the façade of the framework. You can compute a metric using a node and a MetricKey identifying the metric you'd like to compute.Metrics
delegates the computation of metrics to the toplevel PackageStats, which delegates to classStats. ClassStats computes the metric if it's a class metric, and manages a memoization map to cache results. If it's an operation metric, then ClassStats delegates to OperationStats which stores a similar map and returns the result.- These memoization maps are indexed by ParameterizedMetricKeys, which allow several versions of the same metric to be cached (essentially a MetricKey + a MetricVersion). If no version was provided as the place of the call to
Metrics
, we useMetric.Version.STANDARD
. - Concrete metrics implement the Metric interface (essentially a marker interface), and other more specific interfaces (
ClassMetric
,OperationMetric
). They have a single abstract class for now, which eg provides defaults forsupports
. That's about to change (discussion here.
Anyway I don't know if that's the type of info you were looking for. Don't hesitate to ask questions ;)
@@ -53,7 +54,7 @@ private Metrics() { // Cannot be instantiated | |||
* | |||
* @return The value of the metric, or {@code Double.NaN} if the value couln't be computed | |||
*/ | |||
public static double get(ClassMetricKey key, ASTAnyTypeDeclaration node) { | |||
public static double get(MetricKey<ClassMetric> key, ASTAnyTypeDeclaration node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of a static
singleton pattern is concerning to me. I've come to see it as an antiquated style which causes far more problems than it's worth. I find it generally better to code cleanly without statics, and then directly address the scoping problem outside of the implementation. Thus, if PMD needs to use a single instance of this state for some scope of processing, or we need Rules to have convenient access, then directly address those problems. It's not the implementation's issue (i.e. metrics).
See PMDConfiguration
and RuleContext
as existing mechanisms for the propagation/scoping of state, and the high level coordination performed by MultiThreadProcessor
and PmdRunnable
. PMD juggles quite a bit of complex state internally, from multi-threaded source file processing to reporting rendering.
It's also usually a pain to unit test static
stuff... mocking is harder, you have to be very careful to cleanup state between tests, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get your point. Metrics
is really just a user interface, why would it need to be something other than a singleton? Btw testing for metrics is quite straightforward, see #482
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think I got your point. Your strategy may directly address a bigger problem which is the abstraction of this framework into core
so that similar frameworks can be built for other languages. Such frameworks would ideally have the same structure, but would address different AST nodes, and might represent a system differently (PackageStats is strongly fitted to Java's package system and might not be reusable for other languages as is). Having one static Façade class per language is okay, but ideally should delegate to an instance of a language specific implementation of Metrics
. Plus you're right that this singleton pattern has caused some problems when testing, because it wasn't reinitialized between each test. This looks like a much robuster solution than what's currenly used
I'll deal with the abstraction part of the project in a few weeks I think, but I'll keep that in mind, thanks :)
This test uses a metric that returns a random number to check whether the results are memoized or not.
This metric was defined ad hoc in the test class. To do that, some parts of the framework have been abstracted to use the interfaces of the API. That means it's now possible for any user to define a custom metric and use it in the standard way.
The PR includes tests for ParameterizedMetricKey too