Skip to content

[java] Metrics memoization tests #499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 19, 2017
Merged

Conversation

oowekyala
Copy link
Member

@oowekyala oowekyala commented Jul 11, 2017

This test uses a metric that returns a random number to check whether the results are memoized or not.
This metric was defined ad hoc in the test class. To do that, some parts of the framework have been abstracted to use the interfaces of the API. That means it's now possible for any user to define a custom metric and use it in the standard way.

The PR includes tests for ParameterizedMetricKey too

@oowekyala oowekyala force-pushed the memoization-tests branch from 3061778 to fb64e13 Compare July 11, 2017 19:02
@adangel adangel added the in:metrics Affects the metrics framework label Jul 12, 2017
@adangel adangel added this to the 6.0.0 milestone Jul 12, 2017
Copy link
Member

@adangel adangel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

return false;
}
return version.equals(that.version);
return this == o;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this can be done, since we have a factory method.
Please double check, that we do not run into issues, if we serialize/deserialize instances of this class - Object deserialization would be the last possibility, where new instances are created, that this class doesn't control (though could be solved by "readResolve" implementation).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a final non-Serializable class? Curious, where is Serialization currently used in PMD?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the metrics framework not yet, but later on, when we integrate it with the analysis cache.

Currently "RuleViolations" are serialized into a cache file, and - if the file has not been modified since the last pmd run (and pmd is run with the same rules) - the file is not analyzed again and the RuleViolations from the cache are simply returned.


@Override
public ClassMetric getCalculator() {
return new RandomMetric();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea with the RandomMetric for tests.
I like it, that you can now define such metrics adhoc for testing!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like idiomatic support would be nice for these things, so assuming Java 8:

MetricKey<ClassMetric> classMetricKey = MetricKey.of(null, RandomMetric::new)

With a MetricKey static method:

static <T extends Metric> MetricKey<T> MetricKey.of(String name, Supplier<T> supplier)

Also, curious, are MetricKey intended to use identity based equality? The JavaDoc doesn't say, seems useful for a "key" object. The non-test implementations I see are enum based.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice idea! It's much cleaner than the current anonymous class strategy.

Hmm well bottom line, keys should implement hashcode, which is why it's convenient to use enums. They're not used directly to index memoization maps, they're incorporated into a ParameterizedMetricKey (metrickey + metricversion).

As parameterized metric keys are instance controlled, they can use identity based equality, but in order to find the right instance of parameterized metric key, we use the hashcode of the key (see ParameterizedMetricKey#code(MetricKey, MetricVersion)). So I guess keys need not even implement equals, even though it's a good idea

boolean force, MetricVersion version, ResultOption option) {

List<ASTMethodOrConstructorDeclaration> ops = findOperations(node, false);

List<Double> values = new ArrayList<>();
for (ASTMethodOrConstructorDeclaration op : ops) {
if (key.supports(op)) {
if (key.getCalculator().supports(op)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes sense to define the "supports" method also in the MetricKey interface. Would be a minor improvement. I guess, it would need to be a bit more generic, than it is now, e.g.

boolean supports(AccessNode node);

It would then be the responsibility of the metric to return false, if the node type is not supported.
Hm... we basically have already a "supports(AccessNode node)" method in AbstractMetric.
Downside is, that each MetricKey implementation needs to delegate one additional method.

Copy link
Member Author

@oowekyala oowekyala Jul 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think that through generifying the interfaces we could achieve an even more interesting results. If you generify Metric into Metric<N>, where N stands for the type of node the metric can be computed on, that would look like so:

interface Metric<N extends Node> {
  boolean supports(N node);
  double computeFor(N node, MetricVersion v);
}

interface ClassMetric extends Metric<ASTAnyTypeDeclaration> {}
interface OperationMetric extends Metric<ASTMethodOrConstructorDeclaration> {}

interface MetricKey<N> {
  boolean supports(N node);
  Metric<N> getCalculator();
  String name();
}

The downside is that no class could implement ClassMetric or OperationMetric at the same time, as that would make Metric be inherited with different type arguments. Though I think it could be good to separate ClassMetrics' implementations from their OperationMetric, as Metric objects represent behaviour (they could nearly be seen as a lambda if it wasn't for support). The fact that some concrete metrics implement both interfaces as of now seems contrary to that design, as it mixes conceptually unrelated behaviours.

Would you let me work on that for the PR after the next ? I already have a few commits after these ones in my local repo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting idea. A design question... What is the purpose of having two subtypes, ClassMetric and OperationMetric, given the have identical method signatures? Is the distinction between these metrics best expressed using classes/inheritence, or using other means (e.g. property of the Metric, say "type" with an enum {CLASS, OPERATION})?

Apologies for the odd question, I've not pulled the code and dug through it.

Also, I'm a bit confused by the supports methods. In your example, it uses N variable, and not Node or ? extends Node. Is it intended to compile time constrain to the specific type, and dynamically determine if different instances of the type are supported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think inheritance is fine to express the differences between them. That allows us to have one type for class metrics and another for operation metrics, which can be used to put compile time constraints on what metric can be calculated on what node. For example, we can forbid calculating a class metric on an operation node (see the get overloads in Metrics).

The supports method aims at even finer capability checking. For instance some class metrics may not be calculated on interfaces, even though the compile time check above would allow it (as interfaces, classes, enums and annotation type declarations are grouped under ASTAnyTypeDeclaration). You can see the defaults of this method in AbstractMetric.

The fact that supports had an AccessNode parameter was just poor design. I did that so that metrics, whether class or operation metrics, could all extend a single AbstractMetric class. To do that, supports had to be declared in the Metric interface, which is why its argument's type could not be too specific. But ideally the interfaces shouldn't even hint that a class metric could support any AccessNode, so it would be good to change that.

@@ -159,14 +159,14 @@
*
* @return The result of the computation, or {@code Double.NaN} if it couldn't be performed
*/
/* default */ double computeWithResultOption(OperationMetricKey key, ASTAnyTypeDeclaration node,
/* default */ double computeWithResultOption(MetricKey<OperationMetric> key, ASTAnyTypeDeclaration node,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the use of boilerplate /* default */ comments for Java default scope (colloquially known as package scope) PMD style?

This is something I tend to see with newer Java programmers unfamiliar with the language default behaviors, similar to declaring interface methods public or abstract, etc. I wouldn't think we would want it. Alternatively, there are cases were descriptive comments indicating why a class does not have broader scope are useful, but that's not this, and not sure it's warranted here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that's an interesting point. I began to do that since I saw this comment on a PR for type resolution. I thought it was PMD style, and I don't think it's too noisy so I kept it.

The important thing is to have a consistent style though, so I guess that should be discussed in #361

@@ -123,7 +124,7 @@ public static double get(OperationMetricKey key, ASTMethodOrConstructorDeclarati
* @return The value of the metric, or {@code Double.NaN} if the value couln't be computed or {@code option} is
* {@code null}
*/
public static double get(OperationMetricKey key, ASTAnyTypeDeclaration node, ResultOption option) {
public static double get(MetricKey<OperationMetric> key, ASTAnyTypeDeclaration node, ResultOption option) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth pointing out that these get overloads are working because of the use of different 2nd, 3rd, etc. argument types. This method pattern may run into issues if a ClassMetric and OperationMetric needed the same node and supplementary argument types due to type erasure on the MetricKey argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how an operation metric and class metric could use the same node. ASTAnyTypeDeclaration and ASTMethodOrConstructorDeclaration are unrelated types, and the compiler type checks before it erases the type arguments. Perhaps the only instance where I could see a problem happening would be using reflection to call these methods, because the type argument would have been erased, but then that's highly unlikely.

@@ -16,7 +16,7 @@
*
* @author Clément Fournier
*/
public final class ParameterizedMetricKey {
/* default */ final class ParameterizedMetricKey {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a MetricKey ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it's an internal structure (not part of the API) to represent a metric key coupled with a metric version. It's used to index memoization maps, so that several versions of the same metric can be cached. And because it doesn't care about where its metrickey and version comes from, it allows metrics defined outside the framework (custom ones) to be cached too

@@ -28,21 +28,21 @@
*/
public class CyclomaticComplexityRule extends AbstractJavaMetricsRule {

public static final IntegerProperty REPORT_LEVEL_DESCRIPTOR = new IntegerProperty(
private static final IntegerProperty REPORT_LEVEL_DESCRIPTOR = new IntegerProperty(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would imply subclasses are not allowed to interrogate these properties using the descriptors. Is this intended? Is subclassing not permitted (the class is not final)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm well maybe we should make them protected, but I think there's no point in having them public. The class is not designed for inheritance though, so maybe we should make it final? I haven't seen many final rules in PMD...

@Before
public void setUp() {
pack = new PackageStats();
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've personally never see a 2-blank line style. :)

Stop reading now if you have more important things to do...

Back in the day I used to prefer { on a separate line, but I've since changed and come to value information density more. I even tilt my monitors to portrait mode so I can see more LOC at once. Not that I want to write huge classes or methods, but it does seem to greatly help my ability to grok code faster, scrolling/navigation is my arch-nemesis. :) I take mildly perverse pleasure whenever I write a complete meaningful class, JavaDoc and all, and it entirely fits within a single visual window on my monitor. The definition is clearly arbitrary, but let's call this OSM, the one-screen-metric! ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another metric! You're welcome to make a feature request :)

I used to cramp up my classes too, writing { } on one line, skipping braces for if statements, etc. because I have a small monitor. But I found out it's much easier to skim a class and find info when you can find out if hunks of code are their own methods at a glance: two blank lines => two different methods, one blank line => inside a method. I think I might have a very tubular vision too because anything more than a few cm away from where I look on the monitor is just annoying noise to me, which is why I like to keep things clearly separated

Doing that actually helped me focus on the actual code, which is in the methods (a java class is so much boilerplate after all). For me information density is a relevant metric inside a single method, not so much a class. I totally get that feeling of "mildly perverse pleasure" when I get to write the most elegant method I could, but the more satisfaction I get, the more frustration I feel when Checkstyle bashes me with "assignments in guard clauses are not allowed" or whatever :)

Btw I had never seen the word grok before but I'll keep it, thanks!

}
}, null);

System.out.println();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the visitation above dumping console output such that this is needed?


for (int i = 0; i < res.size(); i++) {
assertNotEquals(res.get(i), cmp.get(i));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is cmp.size() > res.size() allowed?

}
}, null);

assertEquals(res, cmp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the names res and cmp mean something to you? I'm drawing a blank.

cmp.add((int) toplevel.compute(classMetricKey, node, true, Version.STANDARD));
return super.visit(node, data);
}
}, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like 4 of these anonymous classes? Do they differ besides the List they populate and the boolean parameter? The nature of the tests might be more obvious if there was one class (or factory method?) with whose constructor took those two arguments, and then used 4 times by these two tests. I suspect the tests are really about verifying behavior for interesting combinations of variations of those two parameters.

Without doing this, there's 2 fairly complex test cases spread over nearly 100 lines. Intent is difficult to discern, due to the size and degree of code analysis needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'll refactor that

@@ -101,7 +102,7 @@ public static double get(OperationMetricKey key, ASTMethodOrConstructorDeclarati
*
* @return The value of the metric, or {@code Double.NaN} if the value couln't be computed
*/
public static double get(OperationMetricKey key, ASTMethodOrConstructorDeclaration node, MetricVersion version) {
public static double get(MetricKey<OperationMetric> key, ASTMethodOrConstructorDeclaration node, MetricVersion version) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried finding more details on the MetricVersion digging through /master here on github, but didn't any classes. Not in IDE right now, so discovery is merely manual. Are there usages? The Metric enums didn't seem use this per the comments on the interface. Anyway, If the metric key and version are related, I'm curious why it makes sense to separate them in argument order for some of these overloads. For example here, the argument order is Key, Node, Version.

I have other questions, but I think I need to understand the relationships in this object model better before asking. I'll find some time to do so. Feel free to point me toward any resources I could use to enlighten myself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see an example in CycloMetric, there's a nested Version enum defining an alternative version. In CyclomaticComplexityRule, the user can set the version via a property.

It was not my intention to separate them, but I added versions after the fact so I just appended the version parameter to the parameter list. Overall I think it's best to keep it on the right because it's optional. I find it less confusing that invariably the two first arguments are needed here

Unfortunately I don't have much more than the javadoc to document the framework for now. There is a page in my fork's wiki which roughly describes the architecture of the framework, but most code snippets are outdated and there have been many additions to the framework since I wrote that (for my application to GSoC, around April). The part about signature matching is still accurate, so I recommend you to read it nevertheless (it's a short read). I'll try to list a few things that might interest you here:

  • PackageStats represents a package hierarchy. It stores references to its subpackages (other PackageStats) and the classes it contains (ClassStats). ClassStats models a class, stores an index of its fields and a reference to the methods it contains (OperationStats)
  • We can query package stats about a specific operation or class using its QualifiedName (see package ast). Qualified names are great because we can iterate on their package names or class names to find the correct class stats. The getSubPackage and getClassStats methods of PackageStats implement that navigation logic.
  • Signature matching (see the wiki page) is not yet used in the framework but will soon be, but for that we need [java] RFC: new layer of abstraction atop PrimaryExpressions #497
  • MetricVersions are a way to use a slightly modified metric, so that it computes its results in a different way.
  • Metrics is the façade of the framework. You can compute a metric using a node and a MetricKey identifying the metric you'd like to compute. Metrics delegates the computation of metrics to the toplevel PackageStats, which delegates to classStats. ClassStats computes the metric if it's a class metric, and manages a memoization map to cache results. If it's an operation metric, then ClassStats delegates to OperationStats which stores a similar map and returns the result.
  • These memoization maps are indexed by ParameterizedMetricKeys, which allow several versions of the same metric to be cached (essentially a MetricKey + a MetricVersion). If no version was provided as the place of the call to Metrics, we use Metric.Version.STANDARD.
  • Concrete metrics implement the Metric interface (essentially a marker interface), and other more specific interfaces (ClassMetric, OperationMetric). They have a single abstract class for now, which eg provides defaults for supports. That's about to change (discussion here.

Anyway I don't know if that's the type of info you were looking for. Don't hesitate to ask questions ;)

@@ -53,7 +54,7 @@ private Metrics() { // Cannot be instantiated
*
* @return The value of the metric, or {@code Double.NaN} if the value couln't be computed
*/
public static double get(ClassMetricKey key, ASTAnyTypeDeclaration node) {
public static double get(MetricKey<ClassMetric> key, ASTAnyTypeDeclaration node) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of a static singleton pattern is concerning to me. I've come to see it as an antiquated style which causes far more problems than it's worth. I find it generally better to code cleanly without statics, and then directly address the scoping problem outside of the implementation. Thus, if PMD needs to use a single instance of this state for some scope of processing, or we need Rules to have convenient access, then directly address those problems. It's not the implementation's issue (i.e. metrics).

See PMDConfiguration and RuleContext as existing mechanisms for the propagation/scoping of state, and the high level coordination performed by MultiThreadProcessor and PmdRunnable. PMD juggles quite a bit of complex state internally, from multi-threaded source file processing to reporting rendering.

It's also usually a pain to unit test static stuff... mocking is harder, you have to be very careful to cleanup state between tests, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get your point. Metrics is really just a user interface, why would it need to be something other than a singleton? Btw testing for metrics is quite straightforward, see #482

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think I got your point. Your strategy may directly address a bigger problem which is the abstraction of this framework into core so that similar frameworks can be built for other languages. Such frameworks would ideally have the same structure, but would address different AST nodes, and might represent a system differently (PackageStats is strongly fitted to Java's package system and might not be reusable for other languages as is). Having one static Façade class per language is okay, but ideally should delegate to an instance of a language specific implementation of Metrics. Plus you're right that this singleton pattern has caused some problems when testing, because it wasn't reinitialized between each test. This looks like a much robuster solution than what's currenly used

I'll deal with the abstraction part of the project in a few weeks I think, but I'll keep that in mind, thanks :)

adangel added a commit to adangel/pmd that referenced this pull request Jul 19, 2017
@adangel adangel merged commit f44a08d into pmd:master Jul 19, 2017
@oowekyala oowekyala deleted the memoization-tests branch July 20, 2017 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in:metrics Affects the metrics framework
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants