On 15 February 2012 15:28, Ian Romanick <<a href="mailto:idr@freedesktop.org">idr@freedesktop.org</a>> wrote: <div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Over the last six months a lot of feature work has happened in Mesa, and the load has been carried by a lot of different people / organization. In the process, we discovered a number of development process issues that made things more difficult than they needed to be. The biggest problem the we encountered was the number of features marked "50% done" in GL3.txt. To complete these features, a significant amount of time had to be spent figuring out what was done, and what was left to do. Since the changes had landed on master a very long time ago, this was a frustrating and error prone process. Transform feedback was the worst case of this. One developer spent several weeks trying to assess the state of development. In the process, a couple items were missed and not detected until early January. PROPOSAL: No feature will be merged to master until a vertical slice[1] of functionality is complete. To be specific, this means that some useful workload should be able to correctly execute. This doesn't mean that everything is done or even that any particular real application needs to work. At the very least there should be a demo or piglit test that uses the functionality in its intended manner that works. This means that some incomplete features may sit on branches of a long time. That's okay! It's really easy to see what has been done on a branch because you can 'git log origin/master..EXT_some_feature'. This trivializes the assessment of the state of development. We encountered similar problems with pieces of functionality that were ostensibly done. Many elements of OpenGL functionality are like Koch snowflakes: everything is a corner case. Integer textures and floating-point textures are prime cases of this. Even though the implementation was done and enabled in several drivers, we had no way to assess the quality. The same problem holds in cases of features that are known to be incomplete, even if a vertical slice is functional. PROPOSAL: No feature will be merged to master until a robust set of tests are implemented or at least documented. We don't necessarily need 10,000 tests for some feature, but there should be some. There should also be a list of "test this corner case, test that corner case, check this error condition, etc." As an example, we've come up with a list of missing test cases for EXT_framebuffer_multisample: - Test turning multisample on and off on a MSAA buffer. - Test multisample points smooth - Test multisample points non-smooth - Test multisample lines smooth - Test multisample lines non-smooth - Test multisample line stipple - Test multisample polygon smooth - Test multisample polygon stipple - Test multisample glBitmap - Test sample alpha to one - Test sample coverage - Test sample coverage invert - Test sample alpha-to-coverage, with alpha-to-one - Test sample alpha-to-coverage, without alpha-to-one - Test blit multisample-to-multisample (matching sample count) - Test blit multisample-to-multisample (non-matching sample count) - Test blit multisample-to-multisample (non-matching format) - Test blit singlesample-to-multisample - Test blit multisample-to-multisample (scaling blit) - Iterate all the above for all the sample counts - Iterate all internal formats for some basic MSAA test - "All of the above" for depth and stencil - Try really hard to hit HiZ resolves too This list represents about 30 or 40 test cases. By giving some indication of areas that need testing, we communicate the parts of the feature that are known to be incomplete or just plain broken (Keith's 1st Law: Any piece of code that isn't tested is broken.). The test list also need not necessarily come from the developer. It is completely reasonable and desirable for reviewers to suggest or implement tests. All of that information just needs to be captured, and consensus must be achieved. The onus of enforcing these proposals will fall on the reviewers. This still leaves room for sensible things to happen while preventing the problems that we encountered. For example, it should be fine to merge Marek and nobled's ARB_debug work with few tests implemented or specified. The same could not be said for the proposed implementation of ARB_uniform_buffer_object. [1] <a href="http://en.wikipedia.org/wiki/Vertical_slice" target="_blank">http://en.wikipedia.org/wiki/Vertical_slice</a> _______________________________________________ mesa-dev mailing list <a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a> <a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a> </blockquote></div> To add some concreteness to the discussion, I went back through all of my git commits from the last several months to see where, specifically, I had to make unexpected fixes to code that others had only partially finished. The majority of the cases boiled down to the following problem areas: 1. Missing checks for error conditions that the spec requires an implementation to check for. 2. Newly-added state is not being saved/restored properly by meta-ops. 3. Overzealous error checks: some error condition is not being checked for in the correct way, causing an error to be generated even for correct input. 4. API Functions neglecting to set ctx->NewState when necessary, so the driver fails to notice that state has changed. IMHO, Ian's proposals, or some variant of them, would be useful because they directly address pain points like the ones above. For instance, requiring complete vertical slices of functionality would help with 3 and 4, because it would ensure that at the time state was added to the front-end, we could easily test that things propagated properly to the back-end. And requiring a robust set of tests would help with 1 and 2, because any adequately robust set of tests for a feature would include testing error conditions and meta-op behaviour. Having said that, I am concerned about these proposals encouraging people to batch their work into even larger patch series than they do already (and, in general, I think we already batch up our patches more than we should). To that end, I'd like to propose two clarifications and an exception to Ian's proposed rules: CLARIFICATION: any patch series that affects what is output to the screen (or to a buffer) counts as a vertical slice of functionality, even if it is not an entire feature or extension. So, for example, in the work that I did on transform feedback last quarter, "make transform feedback work properly for gl_ClipVertex" would still count as a vertical slice of functionality, because it affects what is output to the transform feedback buffer when gl_ClipVertex is in use. It also counts as a vertical slice of functionality if a patch series allows drawing to take place in a situation where an error or crash would have previously occurred. CLARIFICATION: for any patch series that implements a vertical slice of functionality, we should have at least one piglit test (but hopefully many tests) which checks that the correct output is generated, and the appropriate patch in the series should be annotated with "fixes Piglit test XYZ". EXCEPTION: we allow changes that don't affect what is output to the screen (or a buffer) provided that they aren't adding functionality. For example, refactoring to prepare for later commits is fine. Performance improvements are fine. Eliminating duplicate code is fine. These kinds of changes don't require tests. However, we don't add a bunch of front end support for something that has no back-end support yet. Note: I have worked on teams in the past that would have required tests even for performance optimization and some refactoring, but those teams had strong unit testing* in place. (*By "unit testing", I mean tests that exercise individual components of the software in isolation, rather than just testing from the client side as we do with Piglit). Adding unit tests to a project that hasn't had them before often requires a fair amount of refactoring, to insert the hooks necessary to test a component in isolation. I don't want to stall progress by asking for a level of unit testing that would be hard to achieve right now. But I do hope we can improve our level of unit testing in the future--in my experience it is one of the most effective ways to catch bugs early in the development process.