
[openstack-dev] [qa][all] Branchless Tempest beyond pure-API tests, impact on backporting policy


TL;DR: branchless Tempest shouldn't impact on backporting policy, yet
makes it difficult to test new features not discoverable via APIs

Folks,

At the project/release status meeting yesterday[1], I raised the issue
that featureful backports to stable are beginning to show up[2], purely
to facilitate branchless Tempest. We had a useful exchange of views on
IRC but ran out of time, so this thread is intended to capture and
complete the discussion.

The issues, as I see it, are:

  • Tempest is expected to do double-duty as both the integration testing
    harness for upstream CI and as a tool for externally probing capabilities
    in public clouds

  • Tempest has an implicit bent towards pure API tests, yet not all
    interactions between OpenStack services that we want to test are
    mediated by APIs

  • We don't have another integration test harness other than Tempest
    that we could use to host tests that don't just make assertions
    about the correctness/presence of versioned APIs

  • We want to be able to add new features to Juno, or fix bugs of
    omission, in ways that aren't necessarily discoverable in the API;
    without backporting these patches to stable if we wouldn't have
    done so under the normal stable-maint policy[3]

  • Integrated projects are required[4] to provide Tempest coverage,
    so the rate of addition of tests to Tempest is unlikely to slow
    down anytime soon

So the specific type of test that I have in mind would be common
for Ceilometer, but also possibly for Ironic and others (a rough
sketch of such a test follows the steps below):

  1. an end-user initiates some action via an API
    (e.g. calls the cinder snapshot API)

  2. this initiates some actions behind the scenes
    (e.g. a volume is snapshot'd and a notification emitted)

  3. the test reasons over some expected side-effect
    (e.g. some metering data shows up in ceilometer)
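
A rough sketch of such a test, written in the Tempest style, might look like
the following. This is purely illustrative: the base class, client attribute
names, meter name, and polling logic are assumptions rather than existing
Tempest code.

# Illustrative sketch only -- the scenario base class, client attributes and
# meter name are assumptions, not existing Tempest code.
import time

from tempest.scenario import manager
from tempest import test


class TestVolumeSnapshotMetering(manager.ScenarioTest):

    @test.services('volume', 'telemetry')
    def test_snapshot_generates_metering_data(self):
        # 1. end-user action via an API: create a volume and snapshot it
        volume = self.create_volume()
        snapshot = self.volumes_client.create_snapshot(volume['id'])

        # 2. behind the scenes cinder emits a snapshot.* notification,
        #    which ceilometer is expected to consume

        # 3. reason over the side-effect: metering data shows up in ceilometer
        deadline = time.time() + 120
        while time.time() < deadline:
            samples = self.telemetry_client.list_samples('snapshot.size')
            if any(s['resource_id'] == snapshot['id'] for s in samples):
                return
            time.sleep(5)
        self.fail('no snapshot metering data appeared in ceilometer')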

The branchless Tempest spec envisages new features will be added and
need to be skipped when testing stable/previous, but IIUC requires
that the presence of new behaviors is externally discoverable[5].

One approach mooted for allowing these kinds of scenarios to be tested
was to split off the pure-API aspects of Tempest so that it can be used
for probing public-cloud-capabilities as well as upstream CI, and then
build project-specific mini-Tempests to test integration with other
projects.

Personally, I'm not a fan of that approach as it would require a lot
of QA expertise in each project, lead to inefficient use of CI
nodepool resources to run all the mini-Tempests, and probably lead to
a divergent hotchpotch of per-project approaches.

Another idea would be to keep all tests in Tempest, while also
micro-versioning the services such that tests can be skipped on the
basis of whether a particular feature-adding commit is present.

When this micro-versioning can't be discovered by the test (as in the
public cloud capabilities probing case), those tests would be skipped
anyway.

The final, less palatable, approach that occurs to me would be to
revert to branchful Tempest.

Any other ideas, or preferences among the options laid out above?

Cheers,
Eoghan

[1] http://eavesdrop.openstack.org/meetings/project/2014/project.2014-07-08-21.03.html
[2] https://review.openstack.org/104863
[3] https://wiki.openstack.org/wiki/StableBranch#Appropriate_Fixes
[4] https://github.com/openstack/governance/blob/master/reference/incubation-integration-requirements.rst#qa-1
[5] https://github.com/openstack/qa-specs/blob/master/specs/implemented/branchless-tempest.rst#scenario-1-new-tests-for-new-features

asked Jul 9, 2014 in openstack-dev by Eoghan_Glynn

14 Responses


On Wed, Jul 09, 2014 at 05:41:10AM -0400, Eoghan Glynn wrote:

TL;DR: branchless Tempest shouldn't impact on backporting policy, yet
makes it difficult to test new features not discoverable via APIs

Folks,

At the project/release status meeting yesterday[1], I raised the issue
that featureful backports to stable are beginning to show up[2], purely
to facilitate branchless Tempest. We had a useful exchange of views on
IRC but ran out of time, so this thread is intended to capture and
complete the discussion.

So, [2] is definitely not something that should be backported. But, doesn't it
mean that cinder snapshot notifications don't work at all in icehouse? Is this
reflected in the release notes or docs somewhere? Because it seems like
something that would be expected to work, which, I think, is actually a bigger
bug being exposed by branchless tempest. As a user, how do I know whether the
cloud I'm using supports cinder snapshot notifications?

The issues, as I see it, are:

  • Tempest is expected to do double-duty as both the integration testing
    harness for upstream CI and as a tool for externally probing capabilities
    in public clouds

  • Tempest has an implicit bent towards pure API tests, yet not all
    interactions between OpenStack services that we want to test are
    mediated by APIs

I think this is the bigger issue. If there is cross-service communication it
should have an API contract. (and probably be directly tested too) It doesn't
necessarily have to be a REST API, although in most cases that's easier. This
is probably something for the TC to discuss/mandate, though.

  • We don't have another integration test harness other than Tempest
    that we could use to host tests that don't just make assertions
    about the correctness/presence of versioned APIs

  • We want to be able to add new features to Juno, or fix bugs of
    omission, in ways that aren't necessarily discoverable in the API;
    without backporting these patches to stable if we wouldn't have
    done so under the normal stable-maint policy[3]

  • Integrated projects are required[4] to provide Tempest coverage,
    so the rate of addition of tests to Tempest is unlikely to slow
    down anytime soon

So the specific type of test that I have in mind would be common
for Ceilometer, but also possibly for Ironic and others:

  1. an end-user initiates some action via an API
    (e.g. calls the cinder snapshot API)

  2. this initiates some actions behind the scenes
    (e.g. a volume is snapshot'd and a notification emitted)

  3. the test reasons over some expected side-effect
    (e.g. some metering data shows up in ceilometer)

The branchless Tempest spec envisages new features will be added and
need to be skipped when testing stable/previous, but IIUC requires
that the presence of new behaviors is externally discoverable[5].

I think the test case you proposed is fine. I know some people will argue that
it is expanding the scope of tempest to include more whitebox like testing,
because the notifications are an internal side-effect of the api call, but I
don't see it that way. It feels like exactly the kind of testing tempest is
there to enable: a cross-project interaction using the api.

I'm pretty sure that most of the concerns around tests like this were from the
gate maintenance and debug side of things. In other words when things go wrong
how impossible will it be to debug that a notification wasn't generated or not
counted? Right now I think it would be pretty difficult to debug a notification
test failure, which is where the problem is. While I think testing like this is
definitely valid, that doesn't mean we should rush in a bunch of sloppy tests
that are impossible to debug, because that'll just make everyone sad panda.

But, there is also a slight misunderstanding here. Having a feature be
externally discoverable isn't a hard requirement for a config option in tempest,
it's just strongly recommended. Mostly because, if there isn't a way to
discover it, how are end users expected to know what will work?

For this specific case I think it's definitely fair to have an option for which
notifications services are expected to be generated. That's something that is
definitely a configurable option when setting up a deployment, and is something
that feels like a valid tempest config option, so we know which tests will work.
We already have similar feature flags for config time options in the services,
and having options like that would also get you out of that backport mess you
have right now. However, it does raise the question: as an end user, how am
I expected to know which notifications get counted? Which is why having
feature discoverability is generally a really good idea.
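
For illustration, a feature flag along those lines might be registered in
tempest's config roughly as below; the group and option names here are
assumptions, not existing tempest options. A deployer (or devstack) would set
the option to match whatever notifications their ceilometer actually consumes,
and tests could then skip themselves when the notifications they rely on
aren't listed.

# Hypothetical sketch of such a feature flag in the oslo.config style;
# the group and option names are illustrative assumptions.
from oslo.config import cfg

telemetry_feature_group = cfg.OptGroup(
    name='telemetry-feature-enabled',
    title='Enabled telemetry (ceilometer) features')

TelemetryFeaturesGroup = [
    cfg.ListOpt('consumed_notifications_volume',
                default=[],
                help='Cinder notification event types that the deployed '
                     'ceilometer is configured to consume, e.g. '
                     'snapshot.exists, snapshot.create.*'),
]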

One approach mooted for allowing these kind of scenarios to be tested
was to split off the pure-API aspects of Tempest so that it can be used
for probing public-cloud-capabilities as well as upstream CI, and then
build project-specific mini-Tempests to test integration with other
projects.

Personally, I'm not a fan of that approach as it would require a lot
of QA expertise in each project, lead to inefficient use of CI
nodepool resources to run all the mini-Tempests, and probably lead to
a divergent hotchpotch of per-project approaches.

I think the proposal here was for people interested in doing whitebox testing,
where there is a desire to test an internal project mechanism. I could see the
argument for testing notifications this way, but that would have to be for every
project individually. There are already several projects that have functional
testing like this in tree and run them as a gating job. There are definitely
certain classes of testing where doing this makes sense.

Another idea would be to keep all tests in Tempest, while also
micro-versioning the services such that tests can be skipped on the
basis of whether a particular feature-adding commit is present.

When this micro-versioning can't be discovered by the test (as in the
public cloud capabilities probing case), those tests would be skipped
anyway.

Yeah, I'm not a fan of this approach at all. It is just a bad way of
reimplementing a temporally-aware tempest. But, instead of using branches we
have arbitrary service versions. It sacrifices all the real advantages of having
branchless tempest, but adds more complexity around the version discovery to all
the projects and tempest. If we want to revert back to a temporally aware
tempest, which I don't think we should, then going back to the branched model is
what we should do.

The final, less palatable, approach that occurs to me would be to
revert to branchful Tempest.

So I don't think we're anywhere near this. I think what we're hitting here is
more a matter of projects trying to map out exactly how to test things for real
in the gate with tempest. While at the same time coming to understand that
things don't quite work as well as we expected. I think that we have to remember
that this is the first cycle with branchless tempest; it's still new for everyone,
and what we're hitting here are just some of the growing pains around it. Having
discussions like this and mapping out the requirements more completely is the
best way to work through them.

I recognize that for projects that didn't have any real testing before we
started branchless tempest it's harder to get things going with it. Especially
because in my experience the adage "if it isn't tested it's broken" tends to
hold true. So I expect there will be a lot of non-backportable fixes just to
enable testing. What this friction with branchless tempest is showing us is that
these fixes, besides fixing the bug, will also have implications for people
using OpenStack clouds. Which I feel is invaluable information to collect, and
definitely something we should gate on. The open question is how do we make it
easier to enable testing for new things.

-Matt Treinish

responded Jul 9, 2014 by Matthew_Treinish

I think we need to actually step back a little and figure out where we
are, how we got here, and what the future of validation might need to
look like in OpenStack. Because I think there has been some
communication gaps. (Also, for people I've had vigorous conversations with
about this before, realize my positions have changed somewhat,
especially on separation of concerns.)

(Also note, this is all mental stream right now, so I will not pretend
that it's an entirely coherent view of the world; my hope in getting
things down is that we can come up with that coherent view of the world together.)

== Basic History ==

In the Essex time frame Tempest was 70 tests. It was basically a barely
adequate sniff test for integration for OpenStack. So much so that our
first 3rd Party CI system, SmokeStack, used its own test suite, which
legitimately found completely different bugs than Tempest. Not
surprising, given Tempest was a really small number of integration tests.

As we got to Grizzly, Tempest had grown to 1300 tests, somewhat
organically. People were throwing a mix of tests into the fold, some
using Tempest's client, some using official clients, some trying to hit
the database doing white box testing. It had become kind of a mess and a
Rorschach test. We had some really weird design summit sessions because
many people had only looked at a piece of Tempest, and assumed the rest
was like it.

So we spent some time defining scope. Tempest couldn't really be
everything to everyone. It would be a few things:
* API testing for public APIs with a contract
* Some throughput integration scenarios to test some common flows
(these were expected to be small in number)
* 3rd Party API testing (because it had existed previously)

But importantly, Tempest isn't a generic function test suite. Focus is
important, because Tempest's mission always was highly aligned with what
eventually became called Defcore. Some way to validate some
compatibility between clouds. Be that clouds built from upstream (is the
cloud of 5 patches ago compatible with the cloud right now), clouds from
different vendors, public clouds vs. private clouds, etc.

== The Current Validation Environment ==

Today most OpenStack projects have 2 levels of validation. Unit tests &
Tempest. That's sort of like saying your house has a basement and a
roof. For sufficiently small values of house, this is fine. I don't
think our house is sufficiently small any more.

This has caused things like Neutron's unit tests, which actually bring
up a full wsgi functional stack and test plugins through http calls
through the entire wsgi stack, replicated 17 times. It's the reason that
Neutron unit tests take many GB of memory to run, and often run longer
than Tempest runs. (Maru has been doing hero's work to fix much of this.)

In the last year we made it really easy to get a devstack node of your
own, configured any way you want, to do any project level validation you
like. Swift uses it to drive their own functional testing. Neutron is
working on heading down this path.

== New Challenges with New Projects ==

When we started down this path all projects had user APIs. So all
projects were something we could think about from a tenant usage
environment. Looking at both Ironic and Ceilometer, we really have
projects that are Admin API only.

== Contracts or lack thereof ==

I think this is where we start to overlap with Eoghan's thread most.
Because branchless Tempest assumes that the tests in Tempest are governed
by a stable contract. The behavior should only change based on API
version, not on day of the week. In the case that triggered this, what
was really being tested was not an API, but the existence of a meter
that only showed up in Juno.

Ceilometer is another great instance of something that's often in a
state of huge amounts of stack tracing because it depends on some
internals interface in a project which isn't a contract. Or notification
formats, which aren't (largely) versioned.

Ironic has a Nova driver in their tree, which implements the Nova driver
internals interface. Which means they depend on something that's not a
contract. It gets broken a lot.

== Depth of reach of a test suite ==

Tempest can only reach so far into a stack given that its levers are
basically public API calls. That's ok. But it means that things like
testing a bunch of different dbs in the gate (i.e. the postgresql job)
are pretty ineffectual. Trying to exercise code 4 levels deep through
API calls is like driving a rover on Mars. You can do it, but only very
carefully.

== Replication ==

Because there is such a huge gap between unit tests, and Tempest tests,
replication of issues is often challenging. We have the ability to see
races in the gate due to volume of results, that don't show up for
developers very easily. When you do 30k runs a week, a ton of data falls
out of it.

A good instance is the live snapshot bug. It was failing on about 3% of
Tempest runs, which means that it had about a 10% chance of killing a
patch on its own. So it's definitely real. It's real enough that if we
enable that path, there are a ton of extra rechecks required by people.
However it's at a frequency that reproducing on demand is hard. And
reproducing with enough signal to make it debuggable is also hard.

== The Fail Pit ==

All of which has somewhat led us to the fail pit. Where keeping
OpenStack in a state that it can actually pass Tempest consistently is a
full time job. It's actually more than a full time job, it's a full time
program. If it were its own program it would probably be larger than 1/2
the official programs in OpenStack.

Also, when the Gate "program" is understaffed, it means that all the
rest of the OpenStack programs (possibly excepting infra and tripleo
because they aren't in the integrated gate) are slowed down
dramatically. That velocity loss has real community and people power
implications.

This is especially true of people trying to get time, review, mentoring, or
otherwise, out of the QA team. There is kind of a natural overlap
with folks that actually want us to be able to merge code, so while the
Gate is under water, getting help on Tempest issues isn't going to
happen at any really responsive rate.

Also, all the folks that have been the work horses here, myself, joe
gordon, matt treinish, matt riedemann, are pretty burnt out on this.
Every time we seem to nail one issue, 3 more crop up. Having no ending
in sight and spending all your time shoveling out other project bugs is
not a happy place to be.

== New Thinking about our validation layers ==

I feel like an ideal world would be the following:

  1. all projects have unit tests for their own internal testing, and
    these pass 100% of the time (note, most projects have races in their
    unit tests, and they don't pass 100% of the time. And they are low
    priority to fix).
  2. all projects have a functional devstack job with tests in their own tree
    that pokes their project in interesting ways. This is akin to what
    neutron is trying and what swift is doing. These are not cogating.
  3. all non public API contracts are shored up by landing contract tests
    in projects (a minimal sketch of this kind of test follows this list). We
    did this recently with Ironic in Nova -
    https://github.com/openstack/nova/blob/master/nova/tests/virt/test_ironic_api_contracts.py.

  4. all public API contracts are tested in Tempest (these are co-gating,
    and ensure a contract breakage in keystone doesn't break swift).
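
As a minimal sketch of what a level-3 contract test looks like: the idea is to
pin the signature of an internal (non-REST) interface in the project that owns
it, so an incompatible change fails in that project's own gate rather than in
an out-of-tree consumer. The method and argument list below are illustrative
only, not an exhaustive contract.

# Minimal sketch of a "contract test" on an internal interface.
import inspect
import unittest

from nova.virt import driver


class ComputeDriverContractTest(unittest.TestCase):

    def test_snapshot_signature_unchanged(self):
        # An out-of-tree consumer (e.g. the Ironic driver) implements this
        # method, so silently changing its signature breaks them.
        expected = ['self', 'context', 'instance', 'image_id',
                    'update_task_state']
        self.assertEqual(
            expected,
            inspect.getargspec(driver.ComputeDriver.snapshot).args)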

Out of these 4 levels, we currently have 2 (1 and 4). In some projects
we're making #1 cover 1 & 2. And we're making #4 cover 4, 3, and
sometimes 2. And the problem with this is it's actually pretty wasteful,
and when things fail, they fail so far away from the test that
reproducing the failure is hard.

I actually think that if we went down this path we could actually make
Tempest smaller. For instance, negative API testing is something I'd say
is really #2. While these tests don't take a ton of time, they do add a
certain amount of complexity. It might also mean that admin tests, whose
side effects are sometimes hard to understand without white/greybox
interactions, might be migrated into #2.

I also think that #3 would help expose much more surgically what the
cross project pain points are instead of proxy efforts through Tempest
for these subtle issues. Because Tempest is probably a terrible tool to
discover that notifications in nova changed. The result is some weird
failure in a ceilometer test which says some instance didn't run when it
was expected, and then you have to dig through 5 different openstack logs to
figure out that it was really a deep exception somewhere. If it was
logged, which it often isn't. (I actually challenge anyone to figure out
the reason for a ceilometer failure from a Tempest test based on its
current logging. :) )

And ensuring specific functionality earlier in the stack, and letting
Nova beat up Nova the way they think they should in a functional test
(or landing a Neutron functional test to ensure that it's doing the right
thing), would make the Tempest runs which are cogating a ton more
predictable.

== Back to Branchless Tempest ==

I think the real issue that projects are running into with Branchless
Tempest is that they are coming forward with tests not in class #4, which
fail because, while the same API existed 4 months ago as today, the
semantics of the project have changed in a non-discoverable way. Which
I'd say was bad, however until we tried the radical idea of running the
API test suite against all releases that declared they had the same API,
we didn't see it. :)

Ok, that was a lot. Hopefully it was vaguely coherent. I want to preface
that I don't consider this all fully formed, but it's a lot of what's
been rattling around in my brain.

-Sean


responded Jul 9, 2014 by Sean_Dague

Thanks for the response Matt, some comments inline.

At the project/release status meeting yesterday[1], I raised the issue
that featureful backports to stable are beginning to show up[2], purely
to facilitate branchless Tempest. We had a useful exchange of views on
IRC but ran out of time, so this thread is intended to capture and
complete the discussion.

So, [2] is definitely not something that should be backported.

Agreed. It was the avoidance of such forced backports that motivated
the thread.

But, doesn't it mean that cinder snapshot notifications don't work
at all in icehouse?

The snapshot notifications work in the sense that cinder emits them
at the appropriate points in time. What was missing in Icehouse is
that ceilometer didn't consume those notifications and translate to
metering data.

Is this reflected in the release notes or docs somewhere

Yeah it should be clear from the list of meters in the icehouse docs:

https://github.com/openstack/ceilometer/blob/stable/icehouse/doc/source/measurements.rst#volume-cinder

versus the Juno version:

https://github.com/openstack/ceilometer/blob/master/doc/source/measurements.rst#volume-cinder

because it seems like something that would be expected to work, which,
I think, is actually a bigger bug being exposed by branchless tempest.

The bigger bug being the lack of ceilometer support for consuming this
notification, or the lack of discoverability for that feature?

As a user how do I know that with the cloud I'm using whether cinder
snapshot notifications are supported?

If you depend on this as a front-end user, then you'd have to read
the documentation listing the meters being gathered.

But is this something that a front-end cloud user would actually be
directly concerned about?

  • Tempest has an implicit bent towards pure API tests, yet not all
    interactions between OpenStack services that we want to test are
    mediated by APIs

I think this is the bigger issue. If there is cross-service communication it
should have an API contract. (and probably be directly tested too) It doesn't
necessarily have to be a REST API, although in most cases that's easier. This
is probably something for the TC to discuss/mandate, though.

As I said at the PTLs meeting yesterday, I think we need to be wary
of the temptation to bend the problem-space to fit the solution.

Short of significantly increasing the polling load imposed by ceilometer,
in reality we will have to continue to depend on notifications as one
of the main ways of detecting "phase-shifts" in resource state.

Note that the notifications that capture these resource state transitions
are a long-standing mechanism in openstack that ceilometer has depended
on from the very outset. I don't think it's realistic to envisage these
interactions will be replaced by REST APIs any time soon.

The branchless Tempest spec envisages new features will be added and
need to be skipped when testing stable/previous, but IIUC requires
that the presence of new behaviors is externally discoverable[5].

I think the test case you proposed is fine. I know some people will
argue that it is expanding the scope of tempest to include more
whitebox like testing, because the notification are an internal
side-effect of the api call, but I don't see it that way. It feels
more like exactly what tempest is there to enable testing, a
cross-project interaction using the api.

In my example, APIs are only used to initiate the action in cinder
and then to check the metering data in ceilometer.

But the middle-piece, i.e. the interaction between cinder & ceilometer,
is not mediated by an API. Rather, it's carried via an unversioned
notification.

I'm pretty sure that most of the concerns around tests like this
were from the gate maintenance and debug side of things. In other
words when things go wrong how impossible will it be to debug that a
notification wasn't generated or not counted? Right now I think it
would be pretty difficult to debug a notification test failure,
which is where the problem is. While I think testing like this is
definitely valid, that doesn't mean we should rush in a bunch of
sloppy tests that are impossible to debug, because that'll just make
everyone sad panda.

It's a fair point that cross-service diagnosis is not necessarily easy,
especially as there's pressure to reduce the volume of debug logging
emitted. But notification-driven metering is an important part of what
ceilometer does, so we need to figure out some way of integration-testing
it, IMO.

But, there is also a slight misunderstanding here. Having a
feature be externally discoverable isn't a hard requirement for a
config option in tempest, it's just strongly recommended. Mostly,
because if there isn't a way to discover it how are end users
expected to know what will work.

A-ha, I missed the subtle distinction there and thought that this
discoverability was a strict requirement. So how bad a citizen would
a project be considered to be if it chose not to meet that strong
recommendation?

For this specific case I think it's definitely fair to have an
option for which notifications services are expected to be
generated. That's something that is definitely a configurable option
when setting up a deployment, and is something that feels like a
valid tempest config option, so we know which tests will work. We
already have similar feature flags for config time options in the
services, and having options like that would also get you out of
that backport mess you have right now.

So would this test configuration option have a semantic like:

"a wildcarded list of notification event types that ceilometer consumes"

then tests could be skipped on the basis of the notifications that
they depend on being unavailable, in the manner of say:

@testtools.skipUnless(
    matchesAll(CONF.telemetry_consumed_notifications.volume,
               ['snapshot.exists',
                'snapshot.create.*',
                'snapshot.delete.*',
                'snapshot.resize.*',]
    )
)
@test.services('volume')
def test_check_volume_notification(self):
    ...

Is something of that ilk what you envisaged above?

However, it does raise the question of being an end user how am I
expected to know which notifications get counted? Which is why having
the feature discoverability is generally a really good idea.

So certain things we could potentially make discoverable through the
ceilometer capabilities API. But there's a limit to how fine-grained
we can make that. Also it was primarily intended to surface lack of
feature-parity in the storage driver layer (e.g. one driver supports
stddev, but another doesn't) as opposed to the notification-handling
layer.

One approach mooted for allowing these kind of scenarios to be tested
was to split off the pure-API aspects of Tempest so that it can be used
for probing public-cloud-capabilities as well as upstream CI, and then
build project-specific mini-Tempests to test integration with other
projects.

Personally, I'm not a fan of that approach as it would require a lot
of QA expertise in each project, lead to inefficient use of CI
nodepool resources to run all the mini-Tempests, and probably lead to
a divergent hotchpotch of per-project approaches.

I think the proposal here was for people interested in doing
whitebox testing, where there is a desire to test an internal
project mechanism. I could see the argument for testing
notifications this way, but that would have to be for every project
individually. There are already several projects that have
functional testing like this in tree and run them as a gating
job. There are definitely certain classes of testing where doing
this makes sense.

I'm not sure that this would be realistic to test individually (if by
that you meant just with the ceilometer agents running alone) as it
depends on a notification emitted from cinder.

Another idea would be to keep all tests in Tempest, while also
micro-versioning the services such that tests can be skipped on the
basis of whether a particular feature-adding commit is present.

When this micro-versioning can't be discovered by the test (as in the
public cloud capabilities probing case), those tests would be skipped
anyway.

Yeah, I'm not a fan of this approach at all. It is just a bad way of
reimplementing a temporally-aware tempest. But, instead of using
branches we have arbitrary service versions. It sacrifices all the
real advantages of having branchless tempest, but adds more
complexity around the version discovery to all the projects and
tempest. If we want to revert back to a temporally aware tempest,
which I don't think we should, then going back the branched model is
what we should do.

Fair point about it giving us the worst of both worlds. Yeap, scratch
that suggestion.

The final, less palatable, approach that occurs to me would be to
revert to branchful Tempest.

So I don't think we're anywhere near this.

Agreed.

I think what we're
hitting here is more a matter of projects trying to map out exactly
how to test things for real in the gate with tempest. While at the
same time coming to understand that things don't quite work as well
as we expected. I think that we have to remember that this is the
first cycle with branchless tempest it's still new for everyone and
what we're hitting here are just some of the growing pains around
it. Having discussions like this and mapping out the requirements
more completely is the best way to work through them.

Yep, we're all learning and beginning to see the for-real implications
of branchless Tempest.

I recognize that for projects that didn't have any real testing
before we started branchless tempest it's harder to get things going
with it. Especially because in my experience the adage "if it isn't
tested it's broken" tends to hold true. So I expect there will be a
lot of non-backportable fixes just to enable testing. What this
friction with branchless tempest is showing us is that these fixes,
besides fixing the bug, will also have implications for people using
OpenStack clouds. Which I feel is invaluable information to collect,
and definitely something we should gate on. The open question is how
do we make it easier to enable testing for new things.

Yes, ceilometer unfortunately falls somewhat into that category of not
having much pre-existing Tempest coverage. We had a lot of Tempest tests
proposed during Icehouse, but much of it stalled around performance
issues in our sql-alchemy driver.

Thanks in any case for the feedback.

Cheers,
Eoghan

responded Jul 9, 2014 by Eoghan_Glynn


Thanks for the very detailed response Sean.

There's a lot in there, some of it background, some of it more focussed
on the what-next question.

I'll need to take a bit of time to digest all of that, and also discuss
at the weekly ceilometer meeting tomorrow. I'll circle back with a more
complete response after that.

Cheers,
Eoghan

responded Jul 9, 2014 by Eoghan_Glynn

On Wed, Jul 09, 2014 at 01:44:33PM -0400, Eoghan Glynn wrote:

Thanks for the response Matt, some comments inline.

At the project/release status meeting yesterday[1], I raised the issue
that featureful backports to stable are beginning to show up[2], purely
to facilitate branchless Tempest. We had a useful exchange of views on
IRC but ran out of time, so this thread is intended to capture and
complete the discussion.

So, [2] is definitely not something that should be backported.

Agreed. It was the avoidance of such forced backports that motivated
the thread.

But, doesn't it mean that cinder snapshot notifications don't work
at all in icehouse?

The snapshot notifications work in the sense that cinder emits them
at the appropriate points in time. What was missing in Icehouse is
that ceilometer didn't consume those notifications and translate to
metering data.

Yeah that's what I meant, sorry it wasn't worded clearly.

Is this reflected in the release notes or docs somewhere

Yeah it should be clear from the list of meters in the icehouse docs:

https://github.com/openstack/ceilometer/blob/stable/icehouse/doc/source/measurements.rst#volume-cinder

versus the Juno version:

https://github.com/openstack/ceilometer/blob/master/doc/source/measurements.rst#volume-cinder

because it seems like something that would be expected to work, which,
I think, is actually a bigger bug being exposed by branchless tempest.

The bigger bug being the lack of ceilometer support for consuming this
notification, or the lack of discoverability for that feature?

Well if it's in the docs as a limitation for icehouse, then it's less severe,
but it's still something on the discoverability side I guess. I think my lack
of experience with ceilometer is showing here.

As a user how do I know that with the cloud I'm using whether cinder
snapshot notifications are supported?

If you depend on this as a front-end user, then you'd have to read
the documentation listing the meters being gathered.

But is this something that a front-end cloud user would actually be
directly concerned about?

I'm not sure, probably not, but if it's exposed on the public api I think it's
totally fair to expect that someone will be depending on it.

  • Tempest has an implicit bent towards pure API tests, yet not all
    interactions between OpenStack services that we want to test are
    mediated by APIs

I think this is the bigger issue. If there is cross-service communication it
should have an API contract. (and probably be directly tested too) It doesn't
necessarily have to be a REST API, although in most cases that's easier. This
is probably something for the TC to discuss/mandate, though.

As I said at the PTLs meeting yesterday, I think we need to be wary
of the temptation to bend the problem-space to fit the solution.

Short of the polling load imposed by ceilometer significantly increasing,
in reality we will have to continue to depend on notifications as one
of the main ways of detecting "phase-shifts" in resource state.

No, I agree notifications make a lot of sense, the load from frequent polling is
too high.

Note that the notifications that capture these resource state transitions
are a long-standing mechanism in openstack that ceilometer has depended
on from the very outset. I don't think it's realistic to envisage these
interactions will be replaced by REST APIs any time soon.

I wasn't advocating doing everything over a REST API. (API is an overloaded term)
I just meant that if we're depending on notifications for communication between
projects then we should enforce a stability contract on them. Similar to what we
already have with the API stability guidelines for the REST APIs. The fact that
there is no direct enforcement on notifications, either through social policy or
testing, is what I was taking issue with.

I also think if we decide to have a policy of enforcing notification stability
then we should directly test the notifications from an external repo to block
slips. But, that's a discussion for later, if at all.

The branchless Tempest spec envisages new features will be added and
need to be skipped when testing stable/previous, but IIUC requires
that the presence of new behaviors is externally discoverable[5].

I think the test case you proposed is fine. I know some people will
argue that it is expanding the scope of tempest to include more
whitebox like testing, because the notification are an internal
side-effect of the api call, but I don't see it that way. It feels
more like exactly what tempest is there to enable testing, a
cross-project interaction using the api.

In my example, APIs are only used to initiate the action in cinder
and then to check the metering data in ceilometer.

But the middle-piece, i.e. the interaction between cinder & ceilometer,
is not mediated by an API. Rather, its carried via an unversioned
notification.

Yeah, exactly, that's why I feel it's a valid Tempest test case.

What I was referring to as the counter argument, and where the difference of
opinion was, is that the test will be making REST API calls both to trigger a
nominally internal mechanism (the notification) from the services and then to use
the ceilometer api to validate that the notification worked. But, arguably, the
real intent of these tests is to validate that internal mechanism, which is
basically a whitebox test. The argument was that by testing it in tempest we're
testing notifications poorly: because of tempest's black box limitations,
notifications will just be tested indirectly. Which I feel is a valid point, but not a
sufficient reason to exclude the notification tests from tempest.

I think the best way to move forward is to have functional whitebox tests for
the notifications as part of the individual projects generating them, and that
way we get direct validation of the notifications. But, I also feel there should
be tempest tests on top of that that verify the ceilometer side of consuming the
notification and the api exposing that information.
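
A functional whitebox test of that sort, living in the emitting project's own
tree, could be as simple as the sketch below. The snapshot_volume function is
a hypothetical stand-in for the real cinder code path; the point is only that
the emitting project directly asserts the event type and payload it sends.

# Rough sketch of an in-tree test that a notification is emitted;
# snapshot_volume is a hypothetical stand-in for the real service code.
import unittest

import mock


def snapshot_volume(notifier, volume_id):
    # Hypothetical service code path: do the work, then emit a notification.
    notifier.info({}, 'snapshot.create.end', {'volume_id': volume_id})


class SnapshotNotificationTest(unittest.TestCase):

    def test_snapshot_create_end_is_emitted(self):
        notifier = mock.Mock()
        snapshot_volume(notifier, 'vol-1')
        notifier.info.assert_called_once_with(
            {}, 'snapshot.create.end', {'volume_id': 'vol-1'})

A break in the event type or payload then fails directly in the emitting
project, instead of surfacing later as an opaque ceilometer failure in a
Tempest run.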

I'm pretty sure that most of the concerns around tests like this
were from the gate maintenance and debug side of things. In other
words when things go wrong how impossible will it be to debug that a
notification wasn't generated or not counted? Right now I think it
would be pretty difficult to debug a notification test failure,
which is where the problem is. While I think testing like this is
definitely valid, that doesn't mean we should rush in a bunch of
sloppy tests that are impossible to debug, because that'll just make
everyone sad panda.

It's a fair point that cross-service diagnosis is not necessarily easy,
especially as there's pressure to reduce the volume of debug logging
emitted. But notification-driven metering is an important part of what
ceilometer does, so we need to figure out some way of integration-testing
it, IMO.

I actually think Sean's proposal in his response to the OP addresses some
alternative approaches.

But, there is also a slight misunderstanding here. Having a
feature be externally discoverable isn't a hard requirement for a
config option in tempest, it's just strongly recommended. Mostly
because if there isn't a way to discover it, how are end users
expected to know what will work?

A-ha, I missed the subtle distinction there and thought that this
discoverability was a strict requirement. So how bad a citizen would
a project be considered to be if it chose not to meet that strong
recommendation?

You'd be far from the only ones who are doing that, for an existing example
look at anything on the nova driver feature matrix. Most of those aren't
discoverable from the API. So I think it would be ok to do that, but when we
have efforts like:

https://review.openstack.org/#/c/94473/

it'll make that more difficult. Which is why I think having discoverability
through the API is important. (it's the same public cloud question)

For this specific case I think it's definitely fair to have an
option for which notifications the services are expected to
generate. That's something that is definitely a configurable option
when setting up a deployment, and is something that feels like a
valid tempest config option, so we know which tests will work. We
already have similar feature flags for config time options in the
services, and having options like that would also get you out of
that backport mess you have right now.

So would this test configuration option have a semantic like:

"a wildcarded list of notification event types that ceilometer consumes"

then tests could be skipped on the basis of the notifications that
they depend on being unavailable, in the manner of say:

@testtools.skipUnless(
    matchesAll(CONF.telemetry_consumed_notifications.volume,
               ['snapshot.exists',
                'snapshot.create.*',
                'snapshot.delete.*',
                'snapshot.resize.*'])
)
@test.services('volume')
def test_check_volume_notification(self):
    ...

Is something of that ilk what you envisaged above?

Yeah that was my idea more or less, but I'd probably move the logic into a separate
decorator to make it a bit cleaner. Like:

@test.consumed_notifications('volumes', 'snapshot.exists', 'snapshot.create.*',
'snapshot.delete.*', 'snapshot.resize.*')

and you can just double them up if the test requires notifications from other
services.
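Just to make the shape of that concrete, here is a minimal sketch of what such a
decorator might look like, assuming the hypothetical telemetry_consumed_notifications
config group from the skip example above and naive exact matching of the configured
event-type patterns (real wildcard handling and config plumbing would need more care):

import testtools

from tempest import config

CONF = config.CONF


def consumed_notifications(service, *event_types):
    # Hypothetical option: a per-service list of the notification event
    # types (possibly wildcarded) that ceilometer is configured to consume
    # in the deployment under test.
    consumed = getattr(CONF.telemetry_consumed_notifications, service, [])
    missing = [e for e in event_types if e not in consumed]
    # testtools.skipUnless skips the decorated test when the condition is
    # false, i.e. when any required notification isn't consumed.
    return testtools.skipUnless(
        not missing,
        "notifications not consumed for %s: %s" % (service,
                                                   ", ".join(missing)))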

However, it does raise the question of how, as an end user, I am
expected to know which notifications get counted. Which is why having
the feature discoverability is generally a really good idea.

So certain things we could potentially make discoverable through the
ceilometer capabilities API. But there's a limit to how fine-grained
we can make that. Also it was primarily intended to surface lack of
feature-parity in the storage driver layer (e.g. one driver supports
stddev, but another doesn't) as opposed to the notification-handling
layer.

I think we should probably decouple the discoverability question from the
discussion about testing, because to a certain degree it's a separate problem.
It might be worth splitting the API discoverability discussion off as a separate
thread so we don't cloud either topic. (pun intended :) )

I guess it really depends on how we expect people to be consuming the API, and
whether we're presenting that expectation to people. I'll have to admit I'm not
as familiar with ceilometer as I should be, which is probably where some of my
confusion is coming from.

The naive real world example I have in my head is: someone wants to build a tool
that will generate pretty pictures from data queried from the ceilometer API,
and if they want to run this on 2+ distinct OpenStack deployments, how will they
know through the API alone which notifications are getting counted, etc.? If it's
doable through the API then the tool can adjust what gets generated based on an
API call, either up front or with an expected error response, like unsupported
request (of course a new response like unsupported would probably mean a new
API rev). Without the discoverability, manual intervention would be required each
time it's used with a new cloud. This probably isn't a valid use case and I'm
missing something critical here, in which case the discoverability discussion is
moot.

One approach mooted for allowing these kind of scenarios to be tested
was to split off the pure-API aspects of Tempest so that it can be used
for probing public-cloud-capabilities as well as upstream CI, and then
build project-specific mini-Tempests to test integration with other
projects.

Personally, I'm not a fan of that approach as it would require a lot
of QA expertise in each project, lead to inefficient use of CI
nodepool resources to run all the mini-Tempests, and probably lead to
a divergent hotchpotch of per-project approaches.

I think the proposal here was for people interested in doing
whitebox testing, where there is a desire to test an internal
project mechanism. I could see the argument for testing
notifications this way, but that would have to be for every project
individually. There are already several projects that have
functional testing like this in tree and run them as a gating
job. There are definitely certain classes of testing where doing
this makes sense.

I'm not sure that this would be realistic to test individually (if by
that you meant just with the ceilometer agents running alone) as it
depends on a notification emitted from cinder.

I wasn't actually referring to using it for this specific case, just a general
thought. Although maybe it would be useful to add notification functional
tests to Cinder. I think Sean's response on the OP outlined some interesting
alternative testing strategies which would definitely fit this use case. I
still need to finish forming my opinions about that before I respond to it.

I think what we're
hitting here is more a matter of projects trying to map out exactly
how to test things for real in the gate with tempest. While at the
same time coming to understand that things don't quite work as well
as we expected. I think that we have to remember that this is the
first cycle with branchless tempest; it's still new for everyone and
what we're hitting here are just some of the growing pains around
it. Having discussions like this and mapping out the requirements
more completely is th

Yep, we're all learning and beginning to see the for-real implications
of branchless Tempest.

Heh, I guess I forgot to finish that sentence. :)

I recognize that for projects that didn't have any real testing
before we started branchless tempest it's harder to get things going
with it. Especially because in my experience the adage "if it isn't
tested it's broken" tends to hold true. So I expect there will be a
lot of non-backportable fixes just to enable testing. What this
friction with branchless tempest is showing us is that these fixes,
besides fixing the bug, will also have implications for people using
OpenStack clouds. Which I feel is invaluable information to collect,
and definitely something we should gate on. The open question is how
do we make it easier to enable testing for new things.

Yes, ceilometer unfortunately falls somewhat into that category of not
having much pre-existing Tempest coverage. We had a lot of Tempest tests
proposed during Icehouse, but much of that work stalled around performance
issues in our sqlalchemy driver.

Yeah that's what I was referring to. The fact that all the ramp up is happening
here means you guys are hitting a lot of these issues first.

So I wrote this in one pass, sorry if bits of it are incoherent. :)

-Matt Treinish

responded Jul 9, 2014 by Matthew_Treinish

Note that the notifications that capture these resource state transitions
are a long-standing mechanism in openstack that ceilometer has depended
on from the very outset. I don't think it's realistic to envisage these
interactions will be replaced by REST APIs any time soon.

I wasn't advocating doing everything over a REST API. (API is an
overloaded term) I just meant that if we're depending on
notifications for communication between projects then we should
enforce a stability contract on them. Similar to what we already
have with the API stability guidelines for the REST APIs. The fact
that there is no direct enforcement on notifications, either through
social policy or testing, is what I was taking issue with.

I also think if we decide to have a policy of enforcing notification
stability then we should directly test the notifications from an
external repo to block slips. But, that's a discussion for later, if
at all.

A-ha, OK, got it.

I've discussed enforcing such stability with jogo on IRC last night, and
kicked off a separate thread to capture that:

http://lists.openstack.org/pipermail/openstack-dev/2014-July/039858.html

However the time-horizon for that effort would be quite a bit into the
future, compared to the test coverage that we're aiming to have in place
for juno-2.

The branchless Tempest spec envisages new features will be added and
need to be skipped when testing stable/previous, but IIUC requires
that the presence of new behaviors is externally discoverable[5].

I think the test case you proposed is fine. I know some people will
argue that it is expanding the scope of tempest to include more
whitebox-like testing, because the notifications are an internal
side-effect of the api call, but I don't see it that way. It feels
more like exactly what tempest is there to enable testing: a
cross-project interaction using the api.

In my example, APIs are only used to initiate the action in cinder
and then to check the metering data in ceilometer.

But the middle-piece, i.e. the interaction between cinder & ceilometer,
is not mediated by an API. Rather, it's carried via an unversioned
notification.

Yeah, exactly, that's why I feel it's a valid Tempest test case.

Just to clarify: you meant to type "it's a valid Tempest test case"
as opposed to "it's not a valid Tempest test case", right?

What I was referring to as the counter argument, and where the
difference of opinion was, is that the test will be making REST API
calls to trigger a nominally internal mechanism (the
notification) from the services and then using the ceilometer api to
validate the notification worked.

Yes, that's exactly the idea.

But, arguably the real intent of these tests is to validate
that internal mechanism, which is basically a whitebox test. The
argument was that by testing it in tempest we're testing
notifications poorly; because of its black-box limitations the
notifications will just be tested indirectly. Which I feel is a
valid point, but not a sufficient reason to exclude the notification
tests from tempest.

Agreed.

I think the best way to move forward is to have functional whitebox
tests for the notifications as part of the individual projects
generating them, so that way we can directly validate the
notifications. But, I also feel there should be tempest tests on top
of that that verify the ceilometer side of consuming the
notification and the api exposing that information.

Excellent. So, indeed, more fulsome coverage of the notification
logic with in-tree tests on the producer side would definitely
be welcome, and could be seen as a "phase zero" of an overall effort
to fix/improve the notification mechanism.

But, there is also a slight misunderstanding here. Having a
feature be externally discoverable isn't a hard requirement for a
config option in tempest, it's just strongly recommended. Mostly
because if there isn't a way to discover it, how are end users
expected to know what will work?

A-ha, I missed the subtle distinction there and thought that this
discoverability was a strict requirement. So how bad a citizen would
a project be considered to be if it chose not to meet that strong
recommendation?

You'd be far from the only ones who are doing that, for an existing example
look at anything on the nova driver feature matrix. Most of those aren't
discoverable from the API. So I think it would be ok to do that, but when we
have efforts like:

https://review.openstack.org/#/c/94473/

it'll make that more difficult. Which is why I think having discoverability
through the API is important. (it's the same public cloud question)

So for now, would it suffice for the master versus stable/icehouse
config to be checked-in in static form pending the completion of that
BP on tempest-conf-autogen?

Then the assumption is that this static config is replaced by auto-
generating the tempest config using some project-specific discovery
mechanisms?

For this specific case I think it's definitely fair to have an
option for which notifications the services are expected to
generate. That's something that is definitely a configurable option
when setting up a deployment, and is something that feels like a
valid tempest config option, so we know which tests will work. We
already have similar feature flags for config time options in the
services, and having options like that would also get you out of
that backport mess you have right now.

So would this test configuration option have a semantic like:

"a wildcarded list of notification event types that ceilometer consumes"

then tests could be skipped on the basis of the notifications that
they depend on being unavailable, in the manner of say:

@testtools.skipUnless(
    matchesAll(CONF.telemetry_consumed_notifications.volume,
               ['snapshot.exists',
                'snapshot.create.*',
                'snapshot.delete.*',
                'snapshot.resize.*'])
)
@test.services('volume')
def test_check_volume_notification(self):
    ...

Is something of that ilk what you envisaged above?

Yeah that was my idea more or less, but I'd probably move the logic into a
separate decorator to make it a bit cleaner. Like:

@test.consumed_notifications('volumes', 'snapshot.exists',
'snapshot.create.*',
'snapshot.delete.*', 'snapshot.resize.*')

and you can just double them up if the test requires notifications from other
services.

OK, so the remaining thing I wanted to confirm is that it's acceptable
for the skip/no-skip logic of that decorator to be driven by static
(as opposed to discoverable) config?

However, it does raise the question of how, as an end user, I am
expected to know which notifications get counted. Which is why having
the feature discoverability is generally a really good idea.

So certain things we could potentially make discoverable through the
ceilometer capabilities API. But there's a limit to how fine-grained
we can make that. Also it was primarily intended to surface lack of
feature-parity in the storage driver layer (e.g. one driver supports
stddev, but another doesn't) as opposed to the notification-handling
layer.

I think we should probably decouple the discoverability question
from the discussion about testing, because to a certain degree it's
a separate problem. It might be worth splitting the API
discoverability discussion off as a separate thread so we don't
cloud either topic. (pun intended :) )

I'm confused as to how we can completely split off the discoverability
question, if it's a strong recommendation (but not a strict requirement)
of the testing framework.

I guess it really depends on how we expect people to be consuming
the API, and whether we're presenting that expectation to
people. I'll have to admit I'm not as familiar with ceilometer as I
should be, which is probably where some of my confusion is coming
from.

The intent of the capabilities API was initially purely to give users
insight into the storage-driver-specific functionality exposed in
the primary API.

I guess we've been kicking around the idea of re-purposing that
API, so that it also allows surfacing of the list of meters being
captured.

But that idea hasn't yet been firmed up to the extent that we've
figured out how users might consume it.

The naive real world example I have in my head is: someone wants to
build a tool that will generate pretty pictures from data queried
from the ceilometer API, and if they want to run this on 2+ distinct
OpenStack deployments, how will they know through the API alone which
notifications are getting counted, etc.? If it's doable through the
API then the tool can adjust what gets generated based on an API
call, either up front or with an expected error response, like
unsupported request (of course a new response like unsupported
would probably mean a new API rev). Without the discoverability,
manual intervention would be required each time it's used with a new
cloud. This probably isn't a valid use case and I'm missing
something critical here, in which case the discoverability
discussion is moot.

That would be a case where discoverability of meters would be useful
(i.e. not just the meters currently in the metering store, but also all
the meters that this deployment of ceilometer is capable of gathering
samples for). However that feature will take a bit more thought &
discussion to firm up.

One approach mooted for allowing these kind of scenarios to be tested
was to split off the pure-API aspects of Tempest so that it can be used
for probing public-cloud-capabilities as well as upstream CI, and then
build project-specific mini-Tempests to test integration with other
projects.

Personally, I'm not a fan of that approach as it would require a lot
of QA expertise in each project, lead to inefficient use of CI
nodepool resources to run all the mini-Tempests, and probably lead to
a divergent hotchpotch of per-project approaches.

I think the proposal here was for people interested in doing
whitebox testing, where there is a desire to test an internal
project mechanism. I could see the argument for testing
notifications this way, but that would have to be for every project
individually. There are already several projects that have
functional testing like this in tree and run them as a gating
job. There are definitely certain classes of testing where doing
this makes sense.

I'm not sure that this would be realistic to test individually (if by
that you meant just with the ceilometer agents running alone) as it
depends on a notification emitted from cinder.

I wasn't actually referring to using it for this specific case, just a
general thought. Although maybe it would be useful to add
notification functional tests to Cinder. I think Sean's response on
the OP outlined some interesting alternative testing strategies
which would definitely fit this use case. I still need to
finish forming my opinions about that before I respond to it.

OK. (And so do I).

So I wrote this in one pass, sorry if bits of it are incoherent. :)

No, it was all good, thanks!

Cheers,
Eoghan

responded Jul 10, 2014 by Eoghan_Glynn

On Wed, Jul 09, 2014 at 09:16:01AM -0400, Sean Dague wrote:
I think we need to actually step back a little and figure out where we
are, how we got here, and what the future of validation might need to
look like in OpenStack. Because I think there has been some
communication gaps. (Also, for people I've had vigorous conversations
about this before, realize my positions have changed somewhat,
especially on separation of concerns.)

(Also note, this is all mental stream right now, so I will not pretend
that it's an entirely coherent view of the world, my hope in getting
things down is we can come up with that coherent view of the world together.)

== Basic History ==

In the essex time frame Tempest was 70 tests. It was basically a barely
adequate sniff test for integration for OpenStack. So much so that our
first 3rd Party CI system, SmokeStack, used its own test suite, which
legitimately found completely different bugs than Tempest. Not
surprising, Tempest was a really small number of integration tests.

As we got to Grizzly Tempest had grown to 1300 tests, somewhat
organically. People were throwing a mix of tests into the fold, some
using Tempest's client, some using official clients, some trying to hit
the database doing white box testing. It had become kind of a mess and a
Rorschach test. We had some really weird design summit sessions because
many people had only looked at a piece of Tempest, and assumed the rest
was like it.

So we spent some time defining scope. Tempest couldn't really be
everything to everyone. It would be a few things:
* API testing for public APIs with a contract
* Some throughput integration scenarios to test some common flows
(these were expected to be small in number)
* 3rd Party API testing (because it had existed previously)

But importantly, Tempest isn't a generic function test suite. Focus is
important, because Tempest's mission always was highly aligned with what
eventually became called Defcore. Some way to validate some
compatibility between clouds. Be that clouds built from upstream (is the
cloud of 5 patches ago compatible with the cloud right now), clouds from
different vendors, public clouds vs. private clouds, etc.

== The Current Validation Environment ==

Today most OpenStack projects have 2 levels of validation. Unit tests &
Tempest. That's sort of like saying your house has a basement and a
roof. For sufficiently small values of house, this is fine. I don't
think our house is sufficiently small any more.

This has caused things like Neutron's unit tests, which actually bring
up a full wsgi functional stack and test plugins through http calls
through the entire wsgi stack, replicated 17 times. It's the reason that
Neutron unit tests take many GB of memory to run, and often run longer
than Tempest runs. (Maru has been doing hero's work to fix much of this.)

In the last year we made it really easy to get a devstack node of your
own, configured any way you want, to do any project level validation you
like. Swift uses it to drive their own functional testing. Neutron is
working on heading down this path.

== New Challenges with New Projects ==

When we started down this path all projects had user APIs. So all
projects were something we could think about from a tenant usage
environment. Looking at both Ironic and Ceilometer, we really have
projects that are Admin API only.

== Contracts or lack thereof ==

I think this is where we start to overlap with Eoghan's thread most.
Because branchless Tempest assumes that the test in Tempest are governed
by a stable contract. The behavior should only change based on API
version, not on day of the week. In the case that triggered this what
was really being tested was not an API, but the existence of a meter
that only showed up in Juno.

Ceilometer is also another great instance of something that's often in a
state of huge amounts of stack tracing because it depends on some
internals interface in a project which isn't a contract. Or notification
formats, which aren't (largely) versioned.

Ironic has a Nova driver in their tree, which implements the Nova driver
internals interface. Which means they depend on something that's not a
contract. It gets broken a lot.

== Depth of reach of a test suite ==

Tempest can only reach so far into a stack given that its levers are
basically public API calls. That's ok. But it means that things like
testing a bunch of different dbs in the gate (i.e. the postgresql job)
are pretty ineffectual. Trying to exercise code 4 levels deep through
API calls is like driving a rover on Mars. You can do it, but only very
carefully.

== Replication ==

Because there is such a huge gap between unit tests, and Tempest tests,
replication of issues is often challenging. We have the ability to see
races in the gate due to volume of results, that don't show up for
developers very easily. When you do 30k runs a week, a ton of data falls
out of it.

A good instance is the live snapshot bug. It was failing on about 3% of
Tempest runs, which means that it had about a 10% chance of killing a
patch on its own. So it's definitely real. It's real enough that if we
enable that path, there are a ton of extra rechecks required by people.
However it's at a frequency that reproducing on demand is hard. And
reproducing with enough signal to make it debuggable is also hard.

== The Fail Pit ==

All of which has somewhat led us to the fail pit. Where keeping
OpenStack in a state that it can actually pass Tempest consistently is a
full time job. It's actually more than a full time job, it's a full time
program. If it was its own program it would probably be larger than 1/2
the official programs in OpenStack.

Also, when the Gate "program" is understaffed, it means that all the
rest of the OpenStack programs (possibly excepting infra and tripleo
because they aren't in the integrated gate) are slowed down
dramatically. That velocity loss has real community and people power
implications.

This is especially true of people trying to get time, review, mentoring, or
otherwise, out of the QA team. There is kind of a natural overlap
with folks that actually want us to be able to merge code, so while the
Gate is under water, getting help on Tempest issues isn't going to
happen at any really responsive rate.

Also, all the folks that have been the work horses here, myself, joe
gordon, matt treinish, matt riedemann, are pretty burnt out on this.
Every time we seem to nail one issue, 3 more crop up. Having no ending
in sight and spending all your time shoveling out other project bugs is
not a happy place to be.

== New Thinking about our validation layers ==

I feel like an ideal world would be the following:

  1. all projects have unit tests for their own internal testing, and
    these pass 100% of the time (note, most projects have races in their
    unit tests, and they don't pass 100% of the time. And they are low
    priority to fix).
  2. all projects have a functional devstack job with tests in their own
    tree that pokes their project in interesting ways. This is akin to what
    neutron is trying and what swift is doing. These are not cogating.

So I'm not sure that this should be a mandatory thing, but an opt-in. My real
concern is the manpower: who is going to take the time to write all the test
suites for all of the projects? I think it would be better to add that on-demand
as the extra testing is required. That being said, I definitely view doing this
as a good thing and something to be encouraged, because tempest won't be able to
test everything.

The other thing to also consider is duplicated effort between projects. For an
example, look at the CLI tests in Tempest: the functional testing framework for
testing CLI formatting was essentially the same between all the clients, which is
why they're in tempest. Under your proposal here, CLI tests should be moved back
to the clients. But, would that mean we have a bunch of copy-and-pasted versions
of the CLI test framework between all the projects?

I really want to avoid a situation where every project does the same basic
testing differently just in a rush to spin up functional testing. I think coming
up with a solution for a place with common test patterns and frameworks that can
be maintained independently of all the projects and consumed for project
specific testing is something we should figure out first. (I'm not sure oslo
would be the right place for this necessarily)

  3. all non public API contracts are shored up by landing contract tests
    in projects. We did this recently with Ironic in Nova -
    https://github.com/openstack/nova/blob/master/nova/tests/virt/test_ironic_api_contracts.py.

So I think that the contract unit tests work well specifically for the ironic
use case, but aren't a general solution. Mostly because the Nova driver api is an
unstable interface and there is no reason for that to change. It's also a
temporary thing because eventually the driver will be moved into Nova and then
the only cross-project interaction between Ironic and Nova will be over the
stable REST APIs.
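To make the contract-test idea concrete, here is a minimal generic sketch of the
pattern (the driver class and its signature are placeholders, not nova's actual
interface): pinning the signature of an unstable internal interface in a unit
test, so an accidental change fails in the consumer's gate rather than breaking
at runtime.

import inspect
import unittest


class FakeDriver(object):
    """Placeholder for the internal driver interface being pinned."""

    def spawn(self, context, instance, image_meta,
              network_info=None, block_device_info=None):
        pass


class DriverContractTest(unittest.TestCase):

    def test_spawn_signature(self):
        # If the internal interface changes, this assertion fails in the
        # gate of whoever depends on it, instead of surfacing as a
        # runtime break much later.
        argspec = inspect.getargspec(FakeDriver.spawn)
        self.assertEqual(
            ['self', 'context', 'instance', 'image_meta',
             'network_info', 'block_device_info'],
            argspec.args)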

I think in general we should try to avoid doing non REST API cross-project
communication. So hopefully there won't be more of this class of thing, and
if there are we can tackle them on a per-case basis. But, even if it's a non
REST API I don't think we should ever encourage or really allow any
cross-project interactions over unstable interfaces.

As a solution for notifications I'd rather see a separate notification
white/grey (or any other monochrome shade) box test suite. If as a project we
say that notifications have to be versioned for any change, we can then enforce
that easily with an external test suite that contains the definitions for all
the notifications. It then just makes a bunch of api calls and sits on RPC
verifying the notification format. (or something of that ilk)
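As a very rough sketch of the "definitions" half of such a suite (the event
types and required fields below are purely illustrative, and the harness that
triggers the API calls and captures notifications off RPC is omitted):

# Illustrative catalogue of notification contracts: event type ->
# payload fields that consumers are entitled to rely on.
NOTIFICATION_CONTRACTS = {
    'volume.snapshot.create.end': {'snapshot_id', 'volume_id',
                                   'tenant_id', 'created_at'},
    'compute.instance.create.end': {'instance_id', 'tenant_id',
                                    'state', 'launched_at'},
}


def validate_notification(notification):
    """Check a captured notification against its declared contract."""
    event_type = notification['event_type']
    contract = NOTIFICATION_CONTRACTS.get(event_type)
    if contract is None:
        raise AssertionError('no contract defined for %s' % event_type)
    missing = contract - set(notification['payload'])
    if missing:
        raise AssertionError('%s is missing fields: %s'
                             % (event_type, sorted(missing)))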

I agree that normally whitebox testing needs to be tightly coupled with the data
models in the projects, but I feel like notifications are slightly different.
Mostly, because the basic format is the same between all the projects to make
consumption simpler. So instead of duplicating the work to validate the
notifications in all the projects it would be better to just implement it once.
I also think tempest being an external audit on the API has been invaluable so
enforcing that for notifications would have similar benefits.

As an aside I think it would probably be fair if this was maintained as part of
ceilometer or the telemetry program, since that's really all notifications are
used for. (or at least AIUI) But, it would still be a co-gating test suite for
anything that emits notifications.

  4. all public API contracts are tested in Tempest (these are co-gating,
    and ensure a contract breakage in keystone doesn't break swift).

Out of these 4 levels, we currently have 2 (1 and 4). In some projects
we're making #1 cover 1 & 2. And we're making #4 cover 4, 3, and
sometimes 2. And the problem with this is it's actually pretty wasteful,
and when things fail, they fail so far away from the test, that the
reproduce is hard.

I think the only real issue in your proposal is that the boundaries between all
the test classifications aren't as well defined as they seem. I agree that
having more intermediate classes of testing is definitely a good thing to do.
Especially, since there is a great deal of hand waving on the details between
what is being run in between tempest and unit tests. But, the issue as I see it
is that without guidelines on what type of tests belong where, we'll end up with
a bunch of duplicated work.

It's the same problem we have all the time in tempest, where we get a lot of
patches that exceed the scope of tempest, despite it being arguably clearly
outlined in the developer docs. But, the complexity is higher in this situation,
because of having a bunch of different types of test suites that are available
to add a new test to. I just think before we adopt #2 as mandatory it's
important to have a better definition on the scope of the project specific
functional testing.

I actually think that if we went down this path we could actually make
Tempest smaller. For instance, negative API testing is something I'd say
is really #2. While these tests don't take a ton of time, they do add a
certain amount of complexity. It might also mean that admin tests, whose
side effects are sometimes hard to understand without white/greybox
interactions, might be migrated into #2.

I think that negative testing is still part of tempest in your proposal. I still
feel that the negative space of an API is part of the contract, and should
be externally validated. As part of tempest I think we need to revisit the
negative space solution again, because I haven't seen much growth on the
automatic test generation. We also can probably be way more targeted about what
we're running, but I don't think punting on negative testing in tempest is
something we should do.

I actually think that testing on the admin api is doubly important because of
the inadvertent side effects that they can cause. I think attempting to map that
out is useful. (I don't think we can assume that the admin api is being used in
a vacuum) I agree that in your proposal that tests for those weird interactions
might be more fitting for #2. (to avoid more heisenbugs in tempest, etc) But,
I'm on the fence about that. Mostly because I still think an admin api should
conform to the api guidelines and thus needs tempest tests for that. I know
you've expressed the opposite opinion about stability on the admin apis. But, I
fail to see the distinction between an admin api and any other api when it comes
to the stable api guarantees.

For a real world example look at the default-quotas api which was probably the
most recent example of this. (and I suspect why you mentioned it here :) ) The
reason the test was added was because it was previously removed from nova, while
horizon depended on it, which is exactly the kind of thing we should be using
tempest for. (even under your proposal since it's a co-gating rest api issue)
What's better about this example is that the test added had all the harms you
outlined about a weird cross interactions from this extension and the other
tests. I think when we weigh the complexity against the benefits for testing
admin-apis in tempest there isn't a compelling reason to pull them out of
tempest. But, as an alternative we should start attempting to get clever about
scheduling tests to avoid some strange cross interactions.

I also think that #3 would help expose much more surgically what the
cross project pain points are instead of proxy efforts through Tempest
for these subtle issues. Because Tempest is probably a terrible tool to
discover that notifications in nova changed. The result is some weird
failure in a ceilometer test which says some instance didn't run when it
was expected, then you have to dig through 5 different openstack logs to
figure out that it was really a deep exception somewhere. If it was
logged, which it often isn't. (I actually challenge anyone to figure out
the reason for a ceilometer failure from a Tempest test based on its
current logging. :) )

I agree that we should be directly testing the cross-project integration points
which aren't REST APIs. I just don't think that we should decrease the level of
api testing in tempest for something that consumes that integration point. I
just feel if we ignore the top level too much we're going to expose more api
bugs. What I think the real path forward is validating both separately.
Hopefully, that'll let us catch bugs at each level independently.

Ensuring specific functionality earlier in the stack, and letting
Nova beat up Nova the way they think they should in a functional test
(or landing a Neutron functional test to ensure that it's doing the right
thing), would make the Tempest runs which are cogating a ton more
predictable.

== Back to Branchless Tempest ==

I think the real issues that projects are running into with Branchless
Tempest is they are coming forward with tests not in class #4, which
fail, because while the same API existed 4 months ago as today, the
semantics of the project have changed in a non discoverable way. Which
I'd say was bad, however until we tried the radical idea of running the
API test suite against all releases that declared they had the same API,
we didn't see it. :)

Ok, that was a lot. Hopefully it was vaguely coherent. I want to preface
that I don't consider this all fully formed, but it's a lot of what's
been rattling around in my brain.

So here are some of my initial thoughts, I still need to stew some more on some
of the details, so certain things may be more of a knee-jerk reaction and I
might still be missing certain intricacies. Also a part of my response here is
just me playing devil's advocate. I definitely think more testing is always
better. I just want to make sure we're targeting the right things, because this
proposal is pushing for a lot of extra work for everyone. I want to make sure that
before we commit to something this large that it's the right direction.

-Matt Treinish


[1] http://eavesdrop.openstack.org/meetings/project/2014/project.2014-07-08-21.03.html
[2] https://review.openstack.org/104863
[3] https://wiki.openstack.org/wiki/StableBranch#Appropriate_Fixes
[4] https://github.com/openstack/governance/blob/master/reference/incubation-integration-requirements.rst#qa-1
[5] https://github.com/openstack/qa-specs/blob/master/specs/implemented/branchless-tempest.rst#scenario-1-new-tests-for-new-features




responded Jul 10, 2014 by Matthew_Treinish

On Thu, Jul 10, 2014 at 08:37:40AM -0400, Eoghan Glynn wrote:

Note that the notifications that capture these resource state transitions
are a long-standing mechanism in openstack that ceilometer has depended
on from the very outset. I don't think it's realistic to envisage these
interactions will be replaced by REST APIs any time soon.

I wasn't advocating doing everything over a REST API. (API is an
overloaded term) I just meant that if we're depending on
notifications for communication between projects then we should
enforce a stability contract on them. Similar to what we already
have with the API stability guidelines for the REST APIs. The fact
that there is no direct enforcement on notifications, either through
social policy or testing, is what I was taking issue with.

I also think if we decide to have a policy of enforcing notification
stability then we should directly test the notifications from an
external repo to block slips. But, that's a discussion for later, if
at all.

A-ha, OK, got it.

I've discussed enforcing such stability with jogo on IRC last night, and
kicked off a separate thread to capture that:

http://lists.openstack.org/pipermail/openstack-dev/2014-July/039858.html

However the time-horizon for that effort would be quite a bit into the
future, compared to the test coverage that we're aiming to have in place
for juno-2.

The branchless Tempest spec envisages new features will be added and
need to be skipped when testing stable/previous, but IIUC requires
that the presence of new behaviors is externally discoverable[5].

I think the test case you proposed is fine. I know some people will
argue that it is expanding the scope of tempest to include more
whitebox-like testing, because the notifications are an internal
side-effect of the api call, but I don't see it that way. It feels
more like exactly what tempest is there to enable testing: a
cross-project interaction using the api.

In my example, APIs are only used to initiate the action in cinder
and then to check the metering data in ceilometer.

But the middle-piece, i.e. the interaction between cinder & ceilometer,
is not mediated by an API. Rather, it's carried via an unversioned
notification.

Yeah, exactly, that's why I feel it's a valid Tempest test case.

Just to clarify: you meant to type "it's a valid Tempest test case"
as opposed to "it's not a valid Tempest test case", right?

Heh, yes I meant to say, "it is a valid test case".

What I was referring to as the counter argument, and where the
difference of opinion was, is that the test will be making REST API
calls to trigger a nominally internal mechanism (the
notification) from the services and then using the ceilometer api to
validate the notification worked.

Yes, that's exactly the idea.

But, arguably the real intent of these tests is to validate
that internal mechanism, which is basically a whitebox test. The
argument was that by testing it in tempest we're testing
notifications poorly; because of its black-box limitations the
notifications will just be tested indirectly. Which I feel is a
valid point, but not a sufficient reason to exclude the notification
tests from tempest.

Agreed.

I think the best way to move forward is to have functional whitebox
tests for the notifications as part of the individual projects
generating them, so that way we can directly validate the
notifications. But, I also feel there should be tempest tests on top
of that that verify the ceilometer side of consuming the
notification and the api exposing that information.

Excellent. So, indeed, more fulsome coverage of the notification
logic with in-tree tests on the producer side would definitely
be welcome, and could be seen as a "phase zero" of an overall effort
to fix/improve the notification mechanism.

But, there is also a slight misunderstanding here. Having a
feature be externally discoverable isn't a hard requirement for a
config option in tempest, it's just strongly recommended. Mostly
because if there isn't a way to discover it, how are end users
expected to know what will work?

A-ha, I missed the subtle distinction there and thought that this
discoverability was a strict requirement. So how bad a citizen would
a project be considered to be if it chose not to meet that strong
recommendation?

You'd be far from the only ones who are doing that, for an existing example
look at anything on the nova driver feature matrix. Most of those aren't
discoverable from the API. So I think it would be ok to do that, but when we
have efforts like:

https://review.openstack.org/#/c/94473/

it'll make that more difficult. Which is why I think having discoverability
through the API is important. (it's the same public cloud question)

So for now, would it suffice for the master versus stable/icehouse
config to be checked-in in static form pending the completion of that
BP on tempest-conf-autogen?

Yeah, I think that'll be fine. The auto-generation stuff is far from having
complete coverage of all the tempest config options. It's more of a best effort
approach.

Then the assumption is that this static config is replaced by auto-
generating the tempest config using some project-specific discovery
mechanisms?

Not exactly, tempest will still always require a static config. The
auto-generating mechanism won't ever be used for gating, it's just to help
enable configuring and running tempest against an existing deployment, which
is apparently a popular use case.

For this specific case I think it's definitely fair to have an
option for which notifications the services are expected to
generate. That's something that is definitely a configurable option
when setting up a deployment, and is something that feels like a
valid tempest config option, so we know which tests will work. We
already have similar feature flags for config time options in the
services, and having options like that would also get you out of
that backport mess you have right now.

So would this test configuration option have a semantic like:

"a wildcarded list of notification event types that ceilometer consumes"

then tests could be skipped on the basis of the notifications that
they depend on being unavailable, in the manner of say:

@testtools.skipUnless(
    matchesAll(CONF.telemetry_consumed_notifications.volume,
               ['snapshot.exists',
                'snapshot.create.*',
                'snapshot.delete.*',
                'snapshot.resize.*'])
)
@test.services('volume')
def test_check_volume_notification(self):
    ...

Is something of that ilk what you envisaged above?

Yeah that was my idea more or less, but I'd probably move the logic into a
separate decorator to make it a bit cleaner. Like:

@test.consumed_notifications('volumes', 'snapshot.exists',
'snapshot.create.*',
'snapshot.delete.*', 'snapshot.resize.*')

and you can just double them up if the test requires notifications from other
services.

OK, so the remaining thing I wanted to confirm is that it's acceptable
for the skip/no-skip logic of that decorator to be driven by static
(as opposed to discoverable) config?

Yes, it'll always be from a static config in the tempest config file. For the
purposes of skip decisions it's always a static option.

The discoverability being used in that spec I mentioned before would be a
separate tool to aid in generating that config file if one wasn't available
beforehand.
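For what it's worth, the static form might look something like the hypothetical
tempest.conf fragment below, mirroring the telemetry_consumed_notifications
option sketched earlier (the section and option names are made up, just to
illustrate the shape of the thing):

[telemetry_consumed_notifications]
# Per-service lists of the notification event types (wildcards allowed)
# that this deployment's ceilometer is configured to consume.
volume = snapshot.exists, snapshot.create.*, snapshot.delete.*, snapshot.resize.*
compute = instance.create.*, instance.delete.*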

However, it does raise the question of how, as an end user, I am
expected to know which notifications get counted. Which is why having
the feature discoverability is generally a really good idea.

So certain things we could potentially make discoverable through the
ceilometer capabilities API. But there's a limit to how fine-grained
we can make that. Also it was primarily intended to surface lack of
feature-parity in the storage driver layer (e.g. one driver supports
stddev, but another doesn't) as opposed to the notification-handling
layer.

I think we should probably decouple the discoverability question
from the discussion about testing, because to a certain degree it's
a separate problem. It might be worth splitting the API
discoverability discussion off as a separate thread so we don't
cloud either topic. (pun intended :) )

I'm confused as to how we can completely split off the discoverability
question, if it's a strong recommendation (but not a strict requirement)
of the testing framework.

I just meant it was probably worth discussing it separately on the Ceilometer
side, as an independent topic. Because, while it's related to testing it
really can't be settled in a discussion about testing.

-Matt Treinish

I guess it really depends on how we expect people to be consuming
the API, and whether we're presenting that expectation to
people. I'll have to admit I'm not as familiar with ceilometer as I
should be, which is probably where some of my confusion is coming
from.

The intent of the capabilities API was initially purely to give users
insight into the storage-driver-specific functionality exposed in
the primary API.

I guess we've been kicking around the idea of re-purposing that
API, so that it also allows surfacing of the list of meters being
captured.

But that idea hasn't yet been firmed up to the extent that we've
figured out how users might consume it.

The naive real world example I have in my head is: someone wants to
build a tool that will generate pretty pictures from data queried
from the ceilometer API, and if they want to run this on 2+ distinct
OpenStack deployments, how will they know through the API alone which
notifications are getting counted, etc.? If it's doable through the
API then the tool can adjust what gets generated based on an API
call, either up front or with an expected error response, like
unsupported request (of course a new response like unsupported
would probably mean a new API rev). Without the discoverability,
manual intervention would be required each time it's used with a new
cloud. This probably isn't a valid use case and I'm missing
something critical here, in which case the discoverability
discussion is moot.

That would be a case where discoverability of meters would be useful
(i.e. not just the meters currently in the metering store, but also all
the meters that this deployment of ceilometer is capable of gathering
samples for). However that feature will take a bit more thought &
discussion to firm up.

One approach mooted for allowing these kind of scenarios to be tested
was to split off the pure-API aspects of Tempest so that it can be used
for probing public-cloud-capabilities as well as upstream CI, and then
build project-specific mini-Tempests to test integration with other
projects.

Personally, I'm not a fan of that approach as it would require a lot
of QA expertise in each project, lead to inefficient use of CI
nodepool resources to run all the mini-Tempests, and probably lead to
a divergent hotchpotch of per-project approaches.

I think the proposal here was for people interested in doing
whitebox testing, where there is a desire to test an internal
project mechanism. I could see the argument for testing
notifications this way, but that would have to be for every project
individually. There are already several projects that have
functional testing like this in tree and run them as a gating
job. There are definitely certain classes of testing where doing
this makes sense.

I'm not sure that this would be realistic to test individually (if by
that you meant just with the ceilometer agents running alone) as it
depends on a notification emitted from cinder.

I wasn't actually referring to using it for this specific case, just a
general thought. Although maybe it would be useful to add
notification functional tests to Cinder. I think Sean's response on
the OP outlined some interesting alternative testing strategies
which would definitely fit this use case. I still need to
finish forming my opinions about that before I respond to it.

OK. (And so do I).

So I wrote this in one pass, sorry if bits of it are incoherent. :)

No, it was all good, thanks!


responded Jul 10, 2014 by Matthew_Treinish

Hi!

There is a lot of useful information in that post (even excluding the
part brainstorming solutions) and it would be a shame if it was lost in
a sub-thread. Do you plan to make a blog post, or reference wiki page,
out of this?

Back to the content, I think a more layered testing approach (as
suggested) is a great way to reduce our gating issues, but also to
reduce the configuration matrix issue.

On the gating side, our current solution is optimized to detect rare
issues. It's a good outcome, but the main goal should really be to
detect in-project and cross-project regressions, while not preventing us
from landing patches. Rare issues detection should be a side-effect of
the data we generate, not the life-and-death issue it currently is.

So limiting co-gating tests to blackbox testing of external interfaces,
while the project would still run more whitebox tests on its own
behavior, sounds like a good idea. It would go a long way to limit the
impact a rare issue in project A has on project B velocity, which is
where most of the gate frustration comes from.

Adding another level of per-project functional testing also lets us test
more configuration options outside of co-gating tests. If we can test
that MySQL and Postgres behave the same from Nova's perspective in
Nova-specific functional whitebox testing, then we really don't need to
test both in co-gating tests. By being more specific in what we test for
each project, we can actually test more things by running fewer tests.

Sean Dague wrote:
I think we need to actually step back a little and figure out where we
are, how we got here, and what the future of validation might need to
look like in OpenStack. Because I think there has been some
communication gaps. (Also, for people I've had vigorous conversations
about this before, realize my positions have changed somewhat,
especially on separation of concerns.)

(Also note, this is all mental stream right now, so I will not pretend
that it's an entirely coherent view of the world, my hope in getting
things down is we can come up with that coherent view of the world together.)

== Basic History ==

In the essex time frame Tempest was 70 tests. It was basically a barely
adequate sniff test for integration for OpenStack. So much so that our
first 3rd Party CI system, SmokeStack, used its own test suite, which
legitimately found completely different bugs than Tempest. Not
surprising, Tempest was a really small number of integration tests.

As we got to Grizzly Tempest had grown to 1300 tests, somewhat
organically. People were throwing a mix of tests into the fold, some
using Tempest's client, some using official clients, some trying to hit
the database doing white box testing. It had become kind of a mess and a
Rorschach test. We had some really weird design summit sessions because
many people had only looked at a piece of Tempest, and assumed the rest
was like it.

So we spent some time defining scope. Tempest couldn't really be
everything to everyone. It would be a few things:
* API testing for public APIs with a contract
* Some throughput integration scenarios to test some common flows
(these were expected to be small in number)
* 3rd Party API testing (because it had existed previously)

But importantly, Tempest isn't a generic function test suite. Focus is
important, because Tempest's mission always was highly aligned with what
eventually became called Defcore. Some way to validate some
compatibility between clouds. Be that clouds built from upstream (is the
cloud of 5 patches ago compatible with the cloud right now), clouds from
different vendors, public clouds vs. private clouds, etc.

== The Current Validation Environment ==

Today most OpenStack projects have 2 levels of validation. Unit tests &
Tempest. That's sort of like saying your house has a basement and a
roof. For sufficiently small values of house, this is fine. I don't
think our house is sufficiently small any more.

This has caused things like Neutron's unit tests, which actually bring
up a full wsgi functional stack and test plugins through http calls
through the entire wsgi stack, replicated 17 times. It's the reason that
Neutron unit tests take many GB of memory to run, and often run longer
than Tempest does. (Maru has been doing heroic work to fix much of this.)

In the last year we made it really easy to get a devstack node of your
own, configured any way you want, to do any project level validation you
like. Swift uses it to drive their own functional testing. Neutron is
working on heading down this path.

== New Challenges with New Projects ==

When we started down this path all projects had user APIs. So all
projects were something we could think about from a tenant usage
environment. Looking at both Ironic and Ceilometer, we really have
projects that are Admin API only.

== Contracts or lack thereof ==

I think this is where we start to overlap with Eoghan's thread most,
because branchless Tempest assumes that the tests in Tempest are governed
by a stable contract. The behavior should only change based on API
version, not on the day of the week. In the case that triggered this
thread, what was really being tested was not an API, but the existence
of a meter that only showed up in Juno.

Ceilometer is also another great instance of something that's often in a
state of huge amounts of stack tracing because it depends on an internal
interface in another project which isn't a contract, or on notification
formats, which are (largely) unversioned.

Ironic has a Nova driver in its tree, which implements the internal Nova
driver interface. Which means it depends on something that's not a
contract. It gets broken a lot.

== Depth of reach of a test suite ==

Tempest can only reach so far into a stack given that its levers are
basically public API calls. That's ok. But it means that things like
testing a bunch of different dbs in the gate (i.e. the postgresql job)
are pretty ineffectual. Trying to exercise code 4 levels deep through
API calls is like driving a rover on Mars. You can do it, but only very
carefully.

== Replication ==

Because there is such a huge gap between unit tests and Tempest tests,
replication of issues is often challenging. We have the ability to see
races in the gate, due to the volume of results, that don't show up for
developers very easily. When you do 30k runs a week, a ton of data falls
out of it.

A good instance is the live snapshot bug. It was failing on about 3% of
Tempest runs, which means that it had about a 10% chance of killing a
patch on its own. So it's definitely real. It's real enough that if we
enable that path, there are a ton of extra rechecks required by people.
However, it happens at a frequency where reproducing it on demand is
hard. And reproducing it with enough signal to make it debuggable is
also hard.

== The Fail Pit ==

All of which has somewhat led us to the fail pit, where keeping
OpenStack in a state that it can actually pass Tempest consistently is a
full time job. It's actually more than a full time job, it's a full time
program. If it were its own program it would probably be larger than
half the official programs in OpenStack.

Also, when the Gate "program" is understaffed, it means that all the
rest of the OpenStack programs (possibly excepting infra and tripleo
because they aren't in the integrated gate) are slowed down
dramatically. That velocity loss has real community and people power
implications.

This is especially true of people trying to get time, review, mentoring,
or otherwise out of the QA team, as there is a natural overlap with the
folks who actually keep us able to merge code; so while the Gate is
under water, help on Tempest issues isn't going to come at any really
responsive rate.

Also, all the folks that have been the work horses here (myself, joe
gordon, matt treinish, matt riedemann) are pretty burnt out on this.
Every time we seem to nail one issue, 3 more crop up. Having no end
in sight and spending all your time shoveling out other projects' bugs
is not a happy place to be.

== New Thinking about our validation layers ==

I feel like an ideal world would be the following:

  1. all projects have unit tests for their own internal testing, and
    these pass 100% of the time (note, most projects have races in their
    unit tests, and they don't pass 100% of the time. And they are low
    priority to fix).
  2. all projects have a functional devstack job with tests in their own
    tree that pokes their project in interesting ways. This is akin to what
    neutron is trying and what swift is doing. These are not co-gating.
  3. all non-public API contracts are shored up by landing contract tests
    in projects. We did this recently with Ironic in Nova -
    https://github.com/openstack/nova/blob/master/nova/tests/virt/test_ironic_api_contracts.py
    (a rough sketch of what such a contract test might look like follows
    this list).

  4. all public API contracts are tested in Tempest (these are co-gating,
    and ensure a contract breakage in keystone doesn't break swift).
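
To make level #3 concrete, here is a rough, purely illustrative sketch
of the contract-test idea: the producing project asserts the signature
of an internal interface that another project depends on, so a change
shows up as a failure in the producer rather than as a cryptic
co-gating failure somewhere else. The stand-in function and its
argument list below are hypothetical, not the actual nova code.

import inspect
import unittest


def attach_interface(self, context, instance, image_meta, port_id):
    """Stand-in for an internal driver method another project depends on."""


class InternalContractTest(unittest.TestCase):
    def test_attach_interface_signature_unchanged(self):
        # If someone changes the argument list, this fails here, in the
        # producing project, with an obvious diff of the expected args.
        argspec = inspect.getargspec(attach_interface)
        self.assertEqual(
            ['self', 'context', 'instance', 'image_meta', 'port_id'],
            argspec.args)


if __name__ == '__main__':
    unittest.main()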

Out of these 4 levels, we currently have 2 (1 and 4). In some projects
we're making #1 cover 1 & 2. And we're making #4 cover 4, 3, and
sometimes 2. The problem with this is that it's actually pretty wasteful,
and when things fail, they fail so far away from the test that
reproducing the failure is hard.

I actually think that if we went down this path we could actually make
Tempest smaller. For instance, negative API testing is something I'd say
is really #2. While these tests don't take a ton of time, they do add a
certain amount of complexity. It might also mean that admin tests, whose
side effects are sometimes hard to understand without white/greybox
interactions, might migrate into #2.

I also think that #3 would help expose much more surgically what the
cross-project pain points are, instead of proxying efforts through Tempest
for these subtle issues. Because Tempest is probably a terrible tool to
discover that notifications in nova changed. The result is some weird
failure in a ceilometer test which says some instance didn't run when it
was expected, then you have to dig through 5 different openstack logs to
figure out that it was really a deep exception somewhere. If it was
logged, which it often isn't. (I actually challenge anyone to figure out
the reason for a ceilometer failure from a Tempest test based on its
current logging. :) )

And ensuring specific functionality earlier in the stack, and letting
Nova beat up Nova the way they think they should in a functional test
(or landing a Neutron functional test to ensure that it's doing the right
thing), would make the co-gating Tempest runs a ton more predictable.

== Back to Branchless Tempest ==

I think the real issue that projects are running into with Branchless
Tempest is they are coming forward with tests not in class #4, which
fail because, while the same API existed 4 months ago as today, the
semantics of the project have changed in a non-discoverable way. Which
I'd say was bad, however until we tried the radical idea of running the
API test suite against all releases that declared they had the same API,
we didn't see it. :)

Ok, that was a lot. Hopefully it was vaguely coherent. I want to preface
that I don't consider this all fully formed, but it's a lot of what's
been rattling around in my brain.

-Sean

--
Thierry Carrez (ttx)

responded Jul 10, 2014 by Thierry_Carrez (57,480 points)   3 10 16
0 votes

On 07/10/2014 08:53 AM, Matthew Treinish wrote:
On Thu, Jul 10, 2014 at 08:37:40AM -0400, Eoghan Glynn wrote:

Note that the notifications that capture these resource state transitions
are a long-standing mechanism in openstack that ceilometer has depended
on from the very outset. I don't think it's realistic to envisage these
interactions will be replaced by REST APIs any time soon.
I wasn't advocating doing everything over a REST API. (API is an
overloaded term) I just meant that if we're depending on
notifications for communication between projects then we should
enforce a stability contract on them. Similar to what we already
have with the API stability guidelines for the REST APIs. The fact
that there is no direct enforcement on notifications, either through
social policy or testing, is what I was taking issue with.

I also think if we decide to have a policy of enforcing notification
stability then we should directly test the notifications from an
external repo to block slips. But, that's a discussion for later, if
at all.
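
Purely as an illustration of that "directly test the notifications"
idea, a contract-style check on a notification payload might look
roughly like the sketch below; the event type, the expected keys and
the hard-coded payload are all hypothetical, and a real test would
capture the payload from the message bus or a stubbed notifier.

import unittest

# Hypothetical contract: keys a snapshot notification is expected to carry.
EXPECTED_SNAPSHOT_KEYS = {'tenant_id', 'user_id', 'volume_id',
                          'snapshot_id', 'status', 'created_at'}


class SnapshotNotificationContractTest(unittest.TestCase):
    def test_snapshot_create_payload_keys(self):
        # Hard-coded stand-in for a payload captured off the message bus.
        payload = {'tenant_id': 't', 'user_id': 'u', 'volume_id': 'v',
                   'snapshot_id': 's', 'status': 'creating',
                   'created_at': '2014-07-10T00:00:00Z'}
        self.assertTrue(EXPECTED_SNAPSHOT_KEYS.issubset(payload))


if __name__ == '__main__':
    unittest.main()
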
A-ha, OK, got it.

I've discussed enforcing such stability with jogo on IRC last night, and
kicked off a separate thread to capture that:

http://lists.openstack.org/pipermail/openstack-dev/2014-July/039858.html

However the time-horizon for that effort would be quite a bit into the
future, compared to the test coverage that we're aiming to have in place
for juno-2.

The branchless Tempest spec envisages new features will be added and
need to be skipped when testing stable/previous, but IIUC requires
that the presence of new behaviors is externally discoverable[5].
I think the test case you proposed is fine. I know some people will
argue that it is expanding the scope of tempest to include more
whitebox-like testing, because the notifications are an internal
side-effect of the api call, but I don't see it that way. It feels
more like exactly what tempest is there to enable testing: a
cross-project interaction using the api.
In my example, APIs are only used to initiate the action in cinder
and then to check the metering data in ceilometer.

But the middle piece, i.e. the interaction between cinder & ceilometer,
is not mediated by an API. Rather, it's carried via an unversioned
notification.
Yeah, exactly, that's why I feel it's a valid Tempest test case.
Just to clarify: you meant to type "it's a valid Tempest test case"
as opposed to "it's not a valid Tempest test case", right?
Heh, yes I meant to say, "it is a valid test case".

What I was referring to as the counter argument, and where the
difference of opinion was, is that the test will be making REST API
calls both to trigger a nominally internal mechanism (the
notification) in the services and then to validate, via the ceilometer
api, that the notification worked.
Yes, that's exactly the idea.

But, arguably the real intent of these tests is to validate that
internal mechanism, which is basically a whitebox test. The argument
was that by testing it in tempest we're testing notifications poorly,
because given tempest's blackbox limitations the notifications will
only be tested indirectly. Which I feel is a valid point, but not a
sufficient reason to exclude the notification tests from tempest.
Agreed.

I think the best way to move forward is to have functional whitebox
tests for the notifications as part of the individual projects
generating them, so that we get direct validation of the
notifications. But, I also feel there should be tempest tests on top
of that which verify the ceilometer side of consuming the
notification and the api exposing that information.
Excellent. So, indeed, fuller coverage of the notification logic
with in-tree tests on the producer side would definitely be welcome,
and could be seen as a "phase zero" of an overall effort to
fix/improve the notification mechanism.

But, there is also a slight misunderstanding here. Having a
feature be externally discoverable isn't a hard requirement for a
config option in tempest, it's just strongly recommended. Mostly
because if there isn't a way to discover it, how are end users
expected to know what will work?
A-ha, I missed the subtle distinction there and thought that this
discoverability was a strict requirement. So how bad a citizen would
a project be considered to be if it chose not to meet that strong
recommendation?
You'd be far from the only ones who are doing that, for an existing example
look at anything on the nova driver feature matrix. Most of those aren't
discoverable from the API. So I think it would be ok to do that, but when we
have efforts like:

https://review.openstack.org/#/c/94473/

it'll make that more difficult. Which is why I think having discoverability
through the API is important. (it's the same public cloud question)
So for now, would it suffice for the master versus stable/icehouse
config to be checked-in in static form pending the completion of that
BP on tempest-conf-autogen?
Yeah, I think that'll be fine. The auto-generation stuff is far from having
complete coverage of all the tempest config options. It's more of a best effort
approach.

Then the assumption is that this static config is replaced by auto-
generating the tempest config using some project-specific discovery
mechanisms?
Not exactly, tempest will still always require a static config. The
auto-generating mechanism won't ever be used for gating; it's just to
help enable configuring and running tempest against an existing
deployment, which is apparently a popular use case.

For this specific case I think it's definitely fair to have an
option for which notifications the services are expected to
generate. That's something that is definitely configurable when
setting up a deployment, and it feels like a valid tempest config
option, so we know which tests will work. We already have similar
feature flags for config-time options in the services, and having
options like that would also get you out of that backport mess you
have right now.
So would this test configuration option have a semantic like:

"a wildcarded list of notification event types that ceilometer consumes"

then tests could be skipped on the basis of the notifications that
they depend on being unavailable, in the manner of say:

@testtools.skipUnless(
    matchesAll(CONF.telemetry_consumed_notifications.volume,
               ['snapshot.exists',
                'snapshot.create.*',
                'snapshot.delete.*',
                'snapshot.resize.*'])
)
@test.services('volume')
def test_check_volume_notification(self):
    ...

Is something of that ilk what you envisaged above?
Yeah, that was my idea more or less, but I'd probably move the logic
into a separate decorator to make it a bit cleaner. Like:

@test.consumed_notifications('volumes', 'snapshot.exists', 'snapshot.create.*',
                             'snapshot.delete.*', 'snapshot.resize.*')

and you can just double them up if the test requires notifications from other
services.
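
For what it's worth, a minimal sketch of how such a skip decorator
might be implemented follows; the CONSUMED_NOTIFICATIONS mapping stands
in for the real (static) tempest config option, and all names here are
hypothetical rather than existing tempest code.

import functools
import unittest

# Hypothetical stand-in for a static tempest config option: a mapping of
# service name to the notification event types the deployment consumes.
CONSUMED_NOTIFICATIONS = {
    'volumes': ['snapshot.exists', 'snapshot.create.*',
                'snapshot.delete.*', 'snapshot.resize.*'],
}


def consumed_notifications(service, *event_types):
    """Skip the decorated test unless all event_types are consumed."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            available = set(CONSUMED_NOTIFICATIONS.get(service, []))
            missing = set(event_types) - available
            if missing:
                raise unittest.SkipTest(
                    'notifications not consumed: %s'
                    % ', '.join(sorted(missing)))
            return func(self, *args, **kwargs)
        return wrapper
    return decorator


class VolumeNotificationTest(unittest.TestCase):
    @consumed_notifications('volumes', 'snapshot.exists', 'snapshot.create.*')
    def test_check_volume_notification(self):
        pass  # a real test would exercise the cinder and ceilometer APIs


if __name__ == '__main__':
    unittest.main()
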
OK, so the remaining thing I wanted to confirm is that it's acceptable
for the skip/no-skip logic of that decorator to be driven by static
(as opposed to discoverable) config?
Yes, it'll always be from a static config in the tempest config file. For the
purposes of skip decisions it's always a static option.

The discoverability being used in that spec I mentioned before would be a
separate tool to aid in generating that config file if one wasn't available
beforehand.

Right. But making a non-discoverable, yet tempest-visible, change means
the cloud deployer or installer has to know whether the code being
deployed has the fix or not, so that tempest can be informed about the
value of this option. Unless I am missing something, this is really
saying there is time-based versioning going on, but it will be hidden
from users.

What is the objection to saying that all api capabilities should be
discoverable? I realize that any bug fix causes a "behavior change", but
we should be able to have guidelines for what behavior changes must be
discoverable, just as we do for api stability. It's really the same issue.

-David

responded Jul 10, 2014 by David_Kranz (4,560 points)   1 3 3
...