[java] Metrics memoization tests #499

oowekyala · 2017-07-11T18:52:31Z

This test uses a metric that returns a random number to check whether the results are memoized or not.
This metric was defined ad hoc in the test class. To do that, some parts of the framework have been abstracted to use the interfaces of the API. That means it's now possible for any user to define a custom metric and use it in the standard way.

The PR includes tests for ParameterizedMetricKey too

adangel

Great!

adangel · 2017-07-12T19:30:01Z

pmd-java/src/main/java/net/sourceforge/pmd/lang/java/oom/ParameterizedMetricKey.java

-            return false;
-        }
-        return version.equals(that.version);
+        return this == o;


Right, this can be done, since we have a factory method.
Please double check, that we do not run into issues, if we serialize/deserialize instances of this class - Object deserialization would be the last possibility, where new instances are created, that this class doesn't control (though could be solved by "readResolve" implementation).

Isn't this a final non-Serializable class? Curious, where is Serialization currently used in PMD?

In the metrics framework not yet, but later on, when we integrate it with the analysis cache.

Currently "RuleViolations" are serialized into a cache file, and - if the file has not been modified since the last pmd run (and pmd is run with the same rules) - the file is not analyzed again and the RuleViolations from the cache are simply returned.

adangel · 2017-07-12T19:35:31Z

pmd-java/src/test/java/net/sourceforge/pmd/lang/java/oom/DataStructureTest.java

+
+        @Override
+        public ClassMetric getCalculator() {
+            return new RandomMetric();


Nice idea with the RandomMetric for tests.
I like it, that you can now define such metrics adhoc for testing!

Seems like idiomatic support would be nice for these things, so assuming Java 8:

MetricKey<ClassMetric> classMetricKey = MetricKey.of(null, RandomMetric::new)

With a MetricKey static method:

static <T extends Metric> MetricKey<T> MetricKey.of(String name, Supplier<T> supplier)

Also, curious, are MetricKey intended to use identity based equality? The JavaDoc doesn't say, seems useful for a "key" object. The non-test implementations I see are enum based.

That's a nice idea! It's much cleaner than the current anonymous class strategy.

Hmm well bottom line, keys should implement hashcode, which is why it's convenient to use enums. They're not used directly to index memoization maps, they're incorporated into a ParameterizedMetricKey (metrickey + metricversion).

As parameterized metric keys are instance controlled, they can use identity based equality, but in order to find the right instance of parameterized metric key, we use the hashcode of the key (see ParameterizedMetricKey#code(MetricKey, MetricVersion)). So I guess keys need not even implement equals, even though it's a good idea

adangel · 2017-07-12T19:44:08Z

pmd-java/src/main/java/net/sourceforge/pmd/lang/java/oom/ClassStats.java

                                                 boolean force, MetricVersion version, ResultOption option) {

        List<ASTMethodOrConstructorDeclaration> ops = findOperations(node, false);

        List<Double> values = new ArrayList<>();
        for (ASTMethodOrConstructorDeclaration op : ops) {
-            if (key.supports(op)) {
+            if (key.getCalculator().supports(op)) {


Maybe it makes sense to define the "supports" method also in the MetricKey interface. Would be a minor improvement. I guess, it would need to be a bit more generic, than it is now, e.g.

boolean supports(AccessNode node);

It would then be the responsibility of the metric to return false, if the node type is not supported.
Hm... we basically have already a "supports(AccessNode node)" method in AbstractMetric.
Downside is, that each MetricKey implementation needs to delegate one additional method.

Hmm I think that through generifying the interfaces we could achieve an even more interesting results. If you generify Metric into Metric<N>, where N stands for the type of node the metric can be computed on, that would look like so:

interface Metric<N extends Node> { boolean supports(N node); double computeFor(N node, MetricVersion v); } interface ClassMetric extends Metric<ASTAnyTypeDeclaration> {} interface OperationMetric extends Metric<ASTMethodOrConstructorDeclaration> {} interface MetricKey<N> { boolean supports(N node); Metric<N> getCalculator(); String name(); }

The downside is that no class could implement ClassMetric or OperationMetric at the same time, as that would make Metric be inherited with different type arguments. Though I think it could be good to separate ClassMetrics' implementations from their OperationMetric, as Metric objects represent behaviour (they could nearly be seen as a lambda if it wasn't for support). The fact that some concrete metrics implement both interfaces as of now seems contrary to that design, as it mixes conceptually unrelated behaviours.

Would you let me work on that for the PR after the next ? I already have a few commits after these ones in my local repo

This is an interesting idea. A design question... What is the purpose of having two subtypes, ClassMetric and OperationMetric, given the have identical method signatures? Is the distinction between these metrics best expressed using classes/inheritence, or using other means (e.g. property of the Metric, say "type" with an enum {CLASS, OPERATION})?

Apologies for the odd question, I've not pulled the code and dug through it.

Also, I'm a bit confused by the supports methods. In your example, it uses N variable, and not Node or ? extends Node. Is it intended to compile time constrain to the specific type, and dynamically determine if different instances of the type are supported?

I think inheritance is fine to express the differences between them. That allows us to have one type for class metrics and another for operation metrics, which can be used to put compile time constraints on what metric can be calculated on what node. For example, we can forbid calculating a class metric on an operation node (see the get overloads in Metrics).

The supports method aims at even finer capability checking. For instance some class metrics may not be calculated on interfaces, even though the compile time check above would allow it (as interfaces, classes, enums and annotation type declarations are grouped under ASTAnyTypeDeclaration). You can see the defaults of this method in AbstractMetric.

The fact that supports had an AccessNode parameter was just poor design. I did that so that metrics, whether class or operation metrics, could all extend a single AbstractMetric class. To do that, supports had to be declared in the Metric interface, which is why its argument's type could not be too specific. But ideally the interfaces shouldn't even hint that a class metric could support any AccessNode, so it would be good to change that.

ryan-gustafson · 2017-07-13T04:02:47Z

pmd-java/src/main/java/net/sourceforge/pmd/lang/java/oom/ClassStats.java

@@ -159,14 +159,14 @@
     *
     * @return The result of the computation, or {@code Double.NaN} if it couldn't be performed
     */
-    /* default */ double computeWithResultOption(OperationMetricKey key, ASTAnyTypeDeclaration node,
+    /* default */ double computeWithResultOption(MetricKey<OperationMetric> key, ASTAnyTypeDeclaration node,


Is the use of boilerplate /* default */ comments for Java default scope (colloquially known as package scope) PMD style?

This is something I tend to see with newer Java programmers unfamiliar with the language default behaviors, similar to declaring interface methods public or abstract, etc. I wouldn't think we would want it. Alternatively, there are cases were descriptive comments indicating why a class does not have broader scope are useful, but that's not this, and not sure it's warranted here.

Hmm that's an interesting point. I began to do that since I saw this comment on a PR for type resolution. I thought it was PMD style, and I don't think it's too noisy so I kept it.

The important thing is to have a consistent style though, so I guess that should be discussed in #361

ryan-gustafson · 2017-07-13T04:32:39Z

pmd-java/src/main/java/net/sourceforge/pmd/lang/java/oom/Metrics.java

@@ -123,7 +124,7 @@ public static double get(OperationMetricKey key, ASTMethodOrConstructorDeclarati
     * @return The value of the metric, or {@code Double.NaN} if the value couln't be computed or {@code option} is
     * {@code null}
     */
-    public static double get(OperationMetricKey key, ASTAnyTypeDeclaration node, ResultOption option) {
+    public static double get(MetricKey<OperationMetric> key, ASTAnyTypeDeclaration node, ResultOption option) {


It's worth pointing out that these get overloads are working because of the use of different 2nd, 3rd, etc. argument types. This method pattern may run into issues if a ClassMetric and OperationMetric needed the same node and supplementary argument types due to type erasure on the MetricKey argument.

I don't see how an operation metric and class metric could use the same node. ASTAnyTypeDeclaration and ASTMethodOrConstructorDeclaration are unrelated types, and the compiler type checks before it erases the type arguments. Perhaps the only instance where I could see a problem happening would be using reflection to call these methods, because the type argument would have been erased, but then that's highly unlikely.

ryan-gustafson · 2017-07-13T04:39:14Z

pmd-java/src/main/java/net/sourceforge/pmd/lang/java/oom/ParameterizedMetricKey.java

@@ -16,7 +16,7 @@
 *
 * @author Clément Fournier
 */
-public final class ParameterizedMetricKey {
+/* default */ final class ParameterizedMetricKey {


Is this a MetricKey ?

Nope, it's an internal structure (not part of the API) to represent a metric key coupled with a metric version. It's used to index memoization maps, so that several versions of the same metric can be cached. And because it doesn't care about where its metrickey and version comes from, it allows metrics defined outside the framework (custom ones) to be cached too

ryan-gustafson · 2017-07-13T04:44:23Z

pmd-java/src/main/java/net/sourceforge/pmd/lang/java/oom/rule/CyclomaticComplexityRule.java

@@ -28,21 +28,21 @@
 */
 public class CyclomaticComplexityRule extends AbstractJavaMetricsRule {

-    public static final IntegerProperty REPORT_LEVEL_DESCRIPTOR = new IntegerProperty(
+    private static final IntegerProperty REPORT_LEVEL_DESCRIPTOR = new IntegerProperty(


This would imply subclasses are not allowed to interrogate these properties using the descriptors. Is this intended? Is subclassing not permitted (the class is not final)?

Hmm well maybe we should make them protected, but I think there's no point in having them public. The class is not designed for inheritance though, so maybe we should make it final? I haven't seen many final rules in PMD...

ryan-gustafson · 2017-07-13T05:00:45Z

pmd-java/src/test/java/net/sourceforge/pmd/lang/java/oom/DataStructureTest.java

    @Before
    public void setUp() {
        pack = new PackageStats();
    }

+


I've personally never see a 2-blank line style. :)

Stop reading now if you have more important things to do...

Back in the day I used to prefer { on a separate line, but I've since changed and come to value information density more. I even tilt my monitors to portrait mode so I can see more LOC at once. Not that I want to write huge classes or methods, but it does seem to greatly help my ability to grok code faster, scrolling/navigation is my arch-nemesis. :) I take mildly perverse pleasure whenever I write a complete meaningful class, JavaDoc and all, and it entirely fits within a single visual window on my monitor. The definition is clearly arbitrary, but let's call this OSM, the one-screen-metric! ;)

Yet another metric! You're welcome to make a feature request :)

I used to cramp up my classes too, writing { } on one line, skipping braces for if statements, etc. because I have a small monitor. But I found out it's much easier to skim a class and find info when you can find out if hunks of code are their own methods at a glance: two blank lines => two different methods, one blank line => inside a method. I think I might have a very tubular vision too because anything more than a few cm away from where I look on the monitor is just annoying noise to me, which is why I like to keep things clearly separated

Doing that actually helped me focus on the actual code, which is in the methods (a java class is so much boilerplate after all). For me information density is a relevant metric inside a single method, not so much a class. I totally get that feeling of "mildly perverse pleasure" when I get to write the most elegant method I could, but the more satisfaction I get, the more frustration I feel when Checkstyle bashes me with "assignments in guard clauses are not allowed" or whatever :)

Btw I had never seen the word grok before but I'll keep it, thanks!

ryan-gustafson · 2017-07-13T05:02:06Z

pmd-java/src/test/java/net/sourceforge/pmd/lang/java/oom/DataStructureTest.java

+            }
+        }, null);
+
+        System.out.println();


Is the visitation above dumping console output such that this is needed?

ryan-gustafson · 2017-07-13T05:04:48Z

pmd-java/src/test/java/net/sourceforge/pmd/lang/java/oom/DataStructureTest.java

+
+        for (int i = 0; i < res.size(); i++) {
+            assertNotEquals(res.get(i), cmp.get(i));
+        }


Is cmp.size() > res.size() allowed?

ryan-gustafson · 2017-07-13T05:07:35Z

pmd-java/src/test/java/net/sourceforge/pmd/lang/java/oom/DataStructureTest.java

+            }
+        }, null);
+
+        assertEquals(res, cmp);


Do the names res and cmp mean something to you? I'm drawing a blank.

ryan-gustafson · 2017-07-13T05:19:24Z

pmd-java/src/test/java/net/sourceforge/pmd/lang/java/oom/DataStructureTest.java

+                cmp.add((int) toplevel.compute(classMetricKey, node, true, Version.STANDARD));
+                return super.visit(node, data);
+            }
+        }, null);


Looks like 4 of these anonymous classes? Do they differ besides the List they populate and the boolean parameter? The nature of the tests might be more obvious if there was one class (or factory method?) with whose constructor took those two arguments, and then used 4 times by these two tests. I suspect the tests are really about verifying behavior for interesting combinations of variations of those two parameters.

Without doing this, there's 2 fairly complex test cases spread over nearly 100 lines. Intent is difficult to discern, due to the size and degree of code analysis needed.

Hmm I'll refactor that

ryan-gustafson · 2017-07-13T05:25:13Z

pmd-java/src/main/java/net/sourceforge/pmd/lang/java/oom/Metrics.java

@@ -101,7 +102,7 @@ public static double get(OperationMetricKey key, ASTMethodOrConstructorDeclarati
     *
     * @return The value of the metric, or {@code Double.NaN} if the value couln't be computed
     */
-    public static double get(OperationMetricKey key, ASTMethodOrConstructorDeclaration node, MetricVersion version) {
+    public static double get(MetricKey<OperationMetric> key, ASTMethodOrConstructorDeclaration node, MetricVersion version) {


I tried finding more details on the MetricVersion digging through /master here on github, but didn't any classes. Not in IDE right now, so discovery is merely manual. Are there usages? The Metric enums didn't seem use this per the comments on the interface. Anyway, If the metric key and version are related, I'm curious why it makes sense to separate them in argument order for some of these overloads. For example here, the argument order is Key, Node, Version.

I have other questions, but I think I need to understand the relationships in this object model better before asking. I'll find some time to do so. Feel free to point me toward any resources I could use to enlighten myself.

You can see an example in CycloMetric, there's a nested Version enum defining an alternative version. In CyclomaticComplexityRule, the user can set the version via a property.

It was not my intention to separate them, but I added versions after the fact so I just appended the version parameter to the parameter list. Overall I think it's best to keep it on the right because it's optional. I find it less confusing that invariably the two first arguments are needed here

Unfortunately I don't have much more than the javadoc to document the framework for now. There is a page in my fork's wiki which roughly describes the architecture of the framework, but most code snippets are outdated and there have been many additions to the framework since I wrote that (for my application to GSoC, around April). The part about signature matching is still accurate, so I recommend you to read it nevertheless (it's a short read). I'll try to list a few things that might interest you here:

PackageStats represents a package hierarchy. It stores references to its subpackages (other PackageStats) and the classes it contains (ClassStats). ClassStats models a class, stores an index of its fields and a reference to the methods it contains (OperationStats)

We can query package stats about a specific operation or class using its QualifiedName (see package ast). Qualified names are great because we can iterate on their package names or class names to find the correct class stats. The getSubPackage and getClassStats methods of PackageStats implement that navigation logic.

Signature matching (see the wiki page) is not yet used in the framework but will soon be, but for that we need [java] RFC: new layer of abstraction atop PrimaryExpressions #497

MetricVersions are a way to use a slightly modified metric, so that it computes its results in a different way.

Metrics is the façade of the framework. You can compute a metric using a node and a MetricKey identifying the metric you'd like to compute. Metrics delegates the computation of metrics to the toplevel PackageStats, which delegates to classStats. ClassStats computes the metric if it's a class metric, and manages a memoization map to cache results. If it's an operation metric, then ClassStats delegates to OperationStats which stores a similar map and returns the result.

These memoization maps are indexed by ParameterizedMetricKeys, which allow several versions of the same metric to be cached (essentially a MetricKey + a MetricVersion). If no version was provided as the place of the call to Metrics, we use Metric.Version.STANDARD.

Concrete metrics implement the Metric interface (essentially a marker interface), and other more specific interfaces (ClassMetric, OperationMetric). They have a single abstract class for now, which eg provides defaults for supports. That's about to change (discussion here.

Anyway I don't know if that's the type of info you were looking for. Don't hesitate to ask questions ;)

ryan-gustafson · 2017-07-13T05:51:05Z

pmd-java/src/main/java/net/sourceforge/pmd/lang/java/oom/Metrics.java

@@ -53,7 +54,7 @@ private Metrics() { // Cannot be instantiated
     *
     * @return The value of the metric, or {@code Double.NaN} if the value couln't be computed
     */
-    public static double get(ClassMetricKey key, ASTAnyTypeDeclaration node) {
+    public static double get(MetricKey<ClassMetric> key, ASTAnyTypeDeclaration node) {


The use of a static singleton pattern is concerning to me. I've come to see it as an antiquated style which causes far more problems than it's worth. I find it generally better to code cleanly without statics, and then directly address the scoping problem outside of the implementation. Thus, if PMD needs to use a single instance of this state for some scope of processing, or we need Rules to have convenient access, then directly address those problems. It's not the implementation's issue (i.e. metrics).

See PMDConfiguration and RuleContext as existing mechanisms for the propagation/scoping of state, and the high level coordination performed by MultiThreadProcessor and PmdRunnable. PMD juggles quite a bit of complex state internally, from multi-threaded source file processing to reporting rendering.

It's also usually a pain to unit test static stuff... mocking is harder, you have to be very careful to cleanup state between tests, etc.

I don't really get your point. Metrics is really just a user interface, why would it need to be something other than a singleton? Btw testing for metrics is quite straightforward, see #482

Hmm I think I got your point. Your strategy may directly address a bigger problem which is the abstraction of this framework into core so that similar frameworks can be built for other languages. Such frameworks would ideally have the same structure, but would address different AST nodes, and might represent a system differently (PackageStats is strongly fitted to Java's package system and might not be reusable for other languages as is). Having one static Façade class per language is okay, but ideally should delegate to an instance of a language specific implementation of Metrics. Plus you're right that this singleton pattern has caused some problems when testing, because it wasn't reinitialized between each test. This looks like a much robuster solution than what's currenly used

I'll deal with the abstraction part of the project in a few weeks I think, but I'll keep that in mind, thanks :)

oowekyala added 3 commits July 10, 2017 21:05

Cleaned OperationSignature

1d5ec7f

Abstracted Metrics to use MetricKeys

341d66b

Memoization tests

fb64e13

oowekyala force-pushed the memoization-tests branch from 3061778 to fb64e13 Compare July 11, 2017 19:02

adangel added the in:metrics Affects the metrics framework label Jul 12, 2017

adangel added this to the 6.0.0 milestone Jul 12, 2017

adangel reviewed Jul 12, 2017

View reviewed changes

ryan-gustafson reviewed Jul 13, 2017

View reviewed changes

Fix memo tests

f44a08d

adangel added a commit to adangel/pmd that referenced this pull request Jul 19, 2017

Update changelog, refs pmd#499

b5316b4

adangel merged commit f44a08d into pmd:master Jul 19, 2017

oowekyala deleted the memoization-tests branch July 20, 2017 14:54

Uh oh!

[java] Metrics memoization tests #499

[java] Metrics memoization tests #499

Uh oh!

Conversation

oowekyala commented Jul 11, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adangel left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

oowekyala Jul 12, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

oowekyala commented Jul 11, 2017 •

edited

Loading

oowekyala Jul 12, 2017 •

edited

Loading