
[openstack-dev] [TripleO] Austin summit - session recap/summary


Hi all,

Some folks have requested a summary of our summit sessions, as has been
provided for some other projects.

I'll probably go into more detail on some of these topics either via
subsequent, more focused threads and/or some blog posts, but what follows is
an overview of our summit sessions[1] with notable actions or decisions
highlighted. I'm including some of my own thoughts and conclusions, folks
are welcome/encouraged to follow up with their own clarifications or
different perspectives :)

TripleO had a total of 5 sessions in Austin; I'll cover them one-by-one:


Upgrades - current status and roadmap

In this session we discussed the current state of upgrades - initial
support for full major version upgrades has been implemented, but the
implementation is monolithic, highly coupled to pacemaker, and inflexible
with regard to third-party extraconfig changes.

The main outcomes were that we will add support for more granular
definition of the upgrade lifecycle to the new composable services format,
and that we will explore moving towards the proposed lightweight HA
architecture to reduce the need for so much pacemaker specific logic.

We also agreed that investigating the use of Mistral to drive upgrade workflows
was a good idea - currently we have a mixture of scripts combined with Heat
to drive the upgrade process, and some refactoring into discrete Mistral
workflows may provide a more maintainable solution. The potential for using
the existing SoftwareDeployment approach directly via Mistral (outside of
the Heat templates) was also discussed as something to be further
investigated and prototyped.
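
To make the idea concrete, here is a minimal sketch (my own illustration, not an
agreed design) of splitting a monolithic upgrade script into discrete,
individually-invokable steps; the step names and the run_step helper are
hypothetical, and a real implementation would map each step to a Mistral task
rather than a plain Python function:

# Hypothetical sketch: the upgrade expressed as discrete steps that a workflow
# engine such as Mistral could invoke, retry, or resume individually.
UPGRADE_STEPS = [
    "quiesce_services",    # stop/disable services that must not run mid-upgrade
    "update_packages",     # apply the new release's packages
    "migrate_databases",   # run per-service DB migrations
    "restart_services",    # bring services back up in dependency order
    "validate_cluster",    # post-upgrade health checks
]

def run_step(name):
    # Placeholder for invoking one discrete upgrade step (e.g. one workflow task).
    print("running upgrade step: %s" % name)

def run_upgrade(steps=UPGRADE_STEPS):
    # Because each step is discrete, a failed step can be retried or resumed
    # without re-running the whole upgrade, unlike the current monolithic flow.
    for step in steps:
        run_step(step)

if __name__ == "__main__":
    run_upgrade()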

We also touched on the CI implications of upgrades - we've got an upgrades
job now, but we need to ensure coverage of full release-to-release upgrades
(not just commit to commit).


Containerization status/roadmap

In this session we discussed the current status of containers in TripleO
(which is to say, the container based compute node which deploys containers
via Heat onto an Atomic host node that is also deployed via Heat), and
what strategy is most appropriate to achieve a fully containerized TripleO
deployment.

Several folks from Kolla participated in the session, and there was
significant focus on where work may happen such that further collaboration
between communities is possible. To some extent this discussion of where
(as opposed to how) proved a distraction and prevented much discussion of a
supportable architectural implementation for TripleO, so what follows is
mostly my perspective on the issues that exist:

Significant uncertainty exists with regard to integration between Kolla and
TripleO - there's broad consensus that we want to consume the container images
defined by the Kolla community, but much less agreement that we can
feasibly switch to the Ansible-orchestrated deployment/config flow
supported by Kolla without breaking many of our primary operator interfaces
in a fundamentally unacceptable way, for example:

  • The Mistral based API is being implemented on the expectation that the
    primary interface to TripleO deployments is a parameters schema exposed
    by a series of Heat templates - this is no longer true in a "split stack"
    model where we have to hand off to an alternate service orchestration tool.

  • The tripleo-ui (based on the Mistral based API) consumes the Heat parameter
    schema to build its UI (see the sketch after this list), and Ansible doesn't
    support the necessary parameter schema definition (such as types and
    descriptions) to enable this pattern to be replicated. Ansible also doesn't
    provide an HTTP API, so we'd still have to maintain an API surface for the
    (non-Python) UI to consume.
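
To illustrate the pattern the UI relies on (a sketch only - the parameter names
and the field-building helper below are hypothetical, not tripleo-ui code), the
typed and described parameters exposed by a Heat template map naturally onto UI
form fields:

# Hypothetical sketch: deriving UI form fields from a Heat-style parameters
# schema. The parameter definitions mirror the shape of a Heat template's
# "parameters" section; the rendering logic is purely illustrative.
heat_parameters = {
    "ControllerCount": {
        "type": "number",
        "description": "Number of controller nodes to deploy",
        "default": 1,
    },
    "NtpServer": {
        "type": "string",
        "description": "NTP server used by the overcloud nodes",
        "default": "pool.ntp.org",
    },
}

def build_form_fields(parameters):
    # Turn typed/described Heat parameters into generic UI field specs.
    fields = []
    for name, schema in parameters.items():
        fields.append({
            "name": name,
            "label": schema.get("description", name),
            "input_type": "number" if schema.get("type") == "number" else "text",
            "default": schema.get("default"),
        })
    return fields

for field in build_form_fields(heat_parameters):
    print(field)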

We also discussed ideas around integration with Kubernetes (a hot topic on
the Kolla track this summit), but again this proved inconclusive beyond
agreement that someone should develop a PoC to stimulate further
discussion. Again, significant challenges exist:

  • We still need to maintain the Heat parameter interfaces for the API/UI,
    and there is also a strong preference to maintain puppet as a tool for
    generating service configuration (so that existing operator integrations
    via puppet continue to function) - this is a barrier to directly
    consuming the kolla-kubernetes effort.

  • A COE layer like Kubernetes is a poor fit for deployments where operators
    require strict control of service placement (e.g. exactly which nodes a
    service runs on, IP address assignments to specific nodes, etc.) - this is
    already a strong requirement for TripleO users, and we need to figure out
    if/how it's possible to control container placement per node/namespace
    (see the sketch after this list).

  • There are several uncertainties regarding the HA architecture, such as
    how we achieve fencing for nodes (which is currently provided via
    pacemaker); in particular, the HA model for stateful services such as
    rabbit/galera in real production deployments via Kubernetes is unclear.
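
For reference, Kubernetes does allow per-pod pinning via fields such as nodeName
or nodeSelector - the open question is whether that gives the strict, per-node
placement and addressing control TripleO operators expect. A purely illustrative
sketch (the pod, image and node names are placeholders, and this was not proposed
as a solution in the session):

# Illustrative only: a Kubernetes pod manifest, expressed as a Python dict,
# pinned to one specific node via spec.nodeName (bypassing the scheduler).
import json

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "rabbitmq-0", "namespace": "overcloud"},
    "spec": {
        # Pin this pod to an exact node - the closest analogue to TripleO's
        # "this service runs on exactly these nodes" requirement.
        "nodeName": "overcloud-controller-0",
        "containers": [
            {"name": "rabbitmq", "image": "kolla/centos-binary-rabbitmq:latest"},
        ],
    },
}

print(json.dumps(pod_manifest, indent=2))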

Overall a session with much discussion, but further prototyping and
discussion is required before we can define a definitive implementation
strategy (several folks are offering to be involved in this).


Work session (Composable Services and beyond)

In this session we discussed the status of the currently in-progress work
to decompose our monolithic manifests into per-service profiles[3] in
puppet-tripleo, then consume these profiles via per-service templates in
tripleo-heat-templates[4][5], and potential further work to enable fully
composable (including user defined) roles.

Overall there was agreement that the composable services work and puppet
refactoring are going well, but that we need to improve velocity and get
more reviewers helping to land the changes. There was also agreement that
a sub-team should form temporarily to drive the remaining work[6], that
we should not land any new features in the "old" template architecture, and,
relatedly, that tripleo cores should help rebase and convert currently
under-review changes to the new format where needed to ease the transition.

I described a possible approach to providing fully composable roles that
uses some template pre-processing (via jinja2)[7]; a blueprint and initial
implementation will be posted soon. Overall the response was positive,
and it may provide a workable path to fully composable roles that won't
break upgrades of existing deployments.
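
As a rough illustration of the pre-processing idea (the role names and template
fragment below are made up for this example and are not the blueprint's actual
structure), jinja2 can expand a single role-agnostic fragment into per-role
resources before the result is handed to Heat:

# Minimal jinja2 pre-processing sketch for composable roles; purely illustrative.
import jinja2

TEMPLATE = """\
resources:
{% for role in roles %}
  {{ role.name }}Servers:
    type: OS::Heat::ResourceGroup
    properties:
      count: {get_param: {{ role.name }}Count}
{% endfor %}
"""

# Hypothetical role definitions, including a user-defined storage role.
roles = [
    {"name": "Controller"},
    {"name": "Compute"},
    {"name": "CephStorage"},
]

print(jinja2.Template(TEMPLATE).render(roles=roles))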


Work session (API and TripleO UI)

In this session we discussed the current status of the TripleO UI, and the
Mistral based API implementation it depends on.

Overall it's clear there is a lot of good progress in this area, but there
are some key areas which require focus and additional work to enable a
fully functional upstream TripleO UI:

  • The undercloud requires some configuration changes to give the UI the
    necessary access to the undercloud services.

  • The UI currently depends on the previous prototype API implementation,
    and must be converted to the new Mistral based API (in progress).

  • We need to improve velocity of the Mistral based implementation (need
    more testing and reviewing), such that we can land it and folks can start
    integrating with it.

  • There was agreement that the previously proposed validation API can be
    implemented as another Mistral action (a sketch follows this list), which
    will provide a way to run validations related to the undercloud
    configuration/state.

  • There are some features we could add to Heat which would make the
    implementation cleaner (description/metadata in environment files, enabling
    multiple parameter groups).
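
As a rough sketch of the validation-as-Mistral-action idea (this assumes the
custom-action plugin interface where actions subclass Mistral's Action base class
and implement run(); the class name and checks are hypothetical, not the actual
proposal):

# Hypothetical sketch of exposing undercloud validations as a custom Mistral
# action. Assumes actions subclass mistral.actions.base.Action and implement
# run(); the validation logic itself is a placeholder.
from mistral.actions import base

class ValidateUndercloudAction(base.Action):
    def __init__(self, checks=None):
        # Which named checks to run; a hypothetical minimal default set.
        self.checks = checks or ["network_config", "disk_space"]

    def run(self):
        results = {}
        for check in self.checks:
            # Placeholder: a real action would inspect undercloud state here.
            results[check] = {"status": "passed", "detail": ""}
        return results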

The session concluded with some discussion around the requirements related
to network configuration. Currently the templates offer considerable
flexibility in this regard, and we need to decide how this is surfaced via
the API such that it's easily consumable via TripleO UX interfaces.


Work session (Reducing the CI pain)

This session covered a few topics, but mostly ended up focused on the
debate with infra regarding moving to 3rd party CI. There are arguments on
both sides here, and I'll perhaps let derekh or dprince reply with a more
detailed discussion of them, but suffice to say there wasn't a clear
conclusion, and discussion is ongoing.

The other output from this session was agreement that we'd move our jobs to
a different cloud (managed by the RDO community) ahead of a planned
relocation of our current hardware. This has advantages in terms of
maintenance overhead, and if it all goes well we can contribute our
hardware to this cloud long term vs maintaining our own infrastructure.

Overall it was an excellent week, and I thank all the session participants
for their input and discussion. Further notes can be found in the
etherpads linked from [1], but feel free to reply if specific items require
clarification (and/or if I've missed anything!)

Thanks,

Steve

[1] https://wiki.openstack.org/wiki/Design_Summit/Newton/Etherpads#TripleO
[2] https://review.openstack.org/#/c/299628/
[3] https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests
[4] https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles
[5] https://etherpad.openstack.org/p/tripleo-composable-roles-work
[6] http://lists.openstack.org/pipermail/openstack-dev/2016-April/093533.html
[7] http://paste.fedoraproject.org/360836/87416814/


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked May 3, 2016 in openstack-dev by Steven_Hardy

11 Responses



Thanks Steve for the really awesome write-up. Much of what I read here seems to have been bubbling up recently so it's nice to see it getting the attention it deserves.

Thanks,
Jason

--
Jason E. Rist
Senior Software Engineer
OpenStack User Interfaces
Red Hat, Inc.
openuc: +1.972.707.6408
mobile: +1.720.256.3933
Freenode: jrist
github/twitter: knowncitizen


responded May 3, 2016 by Jason_Rist

Thanks a ton, Steve! I have to admit, although I was at the summit in
person, I can get a lot more out of your write-up than from the sessions
themselves. (Probably because of a mixture of not being a native speaker and
being new to TripleO.)

Cheers,

Sven


responded May 4, 2016 by Sven_Anderson

On Tue, May 03, 2016 at 05:34:55PM +0100, Steven Hardy wrote:

Work session (Reducing the CI pain)

This session covered a few topics, but mostly ended up focussed on the
debate with infra regarding moving to 3rd party CI. There are arguments on
both sides here, and I'll perhaps let derekh or dprince reply with a more
detailed discussion of them, but suffice to say there wasn't a clear
conclusion, and discussion is ongoing.

It was mostly me pushing for tripleo to move to 3rd party CI. I still think it
is the right place for tripleo, however after hearing dprince's concerns I think
we have a compromise for the moment. I've gone ahead and done the work to
upgrade the tripleo-ci Jenkins slave from Fedora-22 to the centos-7 DIB[1] produced
by openstack-infra. Please take a moment to review the patch as it exposed 3
issues.

1) CentOS 7 does not support nbd out of the box, and we can't compile a new
kernel ATM. So, I've worked around the problem by converting the qcow2 image to
raw format, updating instack, and reconverting it back to qcow2. Ideally, if I can
find where the instack.qcow2 image is built, we could also produce a raw format so
we don't have to do this on every gate job.
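
For reference, the workaround amounts to something like the following (a sketch
only; the image paths are placeholders and the actual tripleo-ci script may
differ):

# Sketch of the qcow2 -> raw -> qcow2 round-trip, driven from Python via
# qemu-img. Paths are placeholders; the "update instack" step stands in for
# whatever modification the job needs to make to the raw image.
import subprocess

QCOW2 = "instack.qcow2"
RAW = "instack.raw"

# Convert the published qcow2 image to raw so it can be modified without nbd.
subprocess.check_call(["qemu-img", "convert", "-f", "qcow2", "-O", "raw", QCOW2, RAW])

# ... update the instack image here (placeholder for the actual changes) ...

# Convert the modified raw image back to qcow2 for the rest of the job.
subprocess.check_call(["qemu-img", "convert", "-f", "raw", "-O", "qcow2", RAW, QCOW2])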

2) The Jenkins slave needs more HDD space. Using centos-7 we now cache data to the
slave, mostly packages and git repos. As a result the HDD starts at 7.5GB, and
because the current slaves use 20GB we quickly run out of space. Ideally we
need 80GB[2] of space to be consistent with the other cloud providers we run
jenkins slaves on.

3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 dib,
openstack-infra has an AFS mirroring infrastructure in place. As a result,
we'll also need to launch one in tripleo-ci. For the moment, I've disabled the
logic to configure the mirror. Mirrors include pypi, npm, wheel, ubuntu trusty,
ubuntu precise, ceph. We are bringing RPM mirrors online shortly.

I'd really like to get some feedback on these 3 issues. I know they might not be
solved today because of the hardware move; however, I think we are pretty close
now to getting tripleo-ci more in line with some of the openstack-infra tooling.

[1] https://review.openstack.org/#/c/312725/
[2] https://review.openstack.org/#/c/312992/
[3] https://review.openstack.org/#/c/312058/

responded May 6, 2016 by pabelanger_at_redhat

On Fri, May 06, 2016 at 09:18:03AM -0400, Paul Belanger wrote:

It was mostly me pushing for tripleo to move to 3rd party CI. I still think it
is the right place for tripleo however after hearing dprince's concerns I think
we have a compromise for the moment. I've gone a head and done the work to
upgrade tripleo-ci jenkins slave from Fedora-22 to the centos-7 DIB[1] produced by
openstack-infra. Please take a moment to review the patch as it exposed 3
issues.

I could use another set of eyes on patch [1] below. I've done a few rechecks
now and cannot get tripleo-ci to pass consistently. It appears I'm running into
random timeouts.

1) CentOS 7 does not support nbd out of the box, and we can't compile a new
kernel ATM. So, I've worked around the problem by converting the qcow2 image to
raw format, update instack and reconverted it back to qcow2. Ideally, if I can
find where the instack.qcow2 image is build, we also produce a raw format so we
don't have to do this every gate job.

2) Jenkins slave needs more HDD space. Using centos-7 we cache data to the slave
now, mostly packages and git repos. As a result the HDD starts at 7.5GB and
because the current slaves use 20GB we quickly run out of space. Ideally we
need 80GB[2] of space to be consistent with the other cloud provides we run
jenkins slaves on.

3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 dib,
openstack-infra has an AFS mirroring infrastructure in place. As a result,
we'll also need to launch one in tripleo-ci. For the moment, I've disabled the
logic to configure the mirror. Mirrors include pypi, npm, wheel, ubuntu trusty,
ubuntu precise, ceph. We are bringing RPM mirrors online shortly.

I had a chance to look into this today. To move forward, we'd need a static
VM setup, with a public IPv4 address and about 100GB of HDD (the more the better).
We don't need much memory (2GB) or more than a single core; we are just serving
HTTP traffic. Lastly, it should be running ubuntu-trusty so our puppet manifests in
openstack-infra work correctly.

Is this something we could stand up this week?

I'd really like to get some feedback on these 3 issue, I know they might not be
solved today because of the hardware move. However, I think we are pretty close
now to getting triplo-ci more inline with some of the openstack-infra tooling.

[1] https://review.openstack.org/#/c/312725/
[2] https://review.openstack.org/#/c/312992/
[3] https://review.openstack.org/#/c/312058/

responded May 9, 2016 by pabelanger_at_redhat

On Mon, May 09, 2016 at 04:50:19PM -0400, Paul Belanger wrote:
On Fri, May 06, 2016 at 09:18:03AM -0400, Paul Belanger wrote:

It was mostly me pushing for tripleo to move to 3rd party CI. I still think it
is the right place for tripleo however after hearing dprince's concerns I think
we have a compromise for the moment. I've gone a head and done the work to
upgrade tripleo-ci jenkins slave from Fedora-22 to the centos-7 DIB[1] produced by
openstack-infra. Please take a moment to review the patch as it exposed 3
issues.

I could use another set of eyes on patch [1] below. I've done a few rechecks
now and cannot get tripleo-ci to pass consistently. It appears I'm running into
random timeouts.

This is now merged, thanks!

1) CentOS 7 does not support nbd out of the box, and we can't compile a new
kernel ATM. So, I've worked around the problem by converting the qcow2 image to
raw format, update instack and reconverted it back to qcow2. Ideally, if I can
find where the instack.qcow2 image is build, we also produce a raw format so we
don't have to do this every gate job.

2) Jenkins slave needs more HDD space. Using centos-7 we cache data to the slave
now, mostly packages and git repos. As a result the HDD starts at 7.5GB and
because the current slaves use 20GB we quickly run out of space. Ideally we
need 80GB[2] of space to be consistent with the other cloud provides we run
jenkins slaves on.

We've bumped to 25GB for now, but we're still running out of room. I have another
question regarding devstack that will hopefully address this.

3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 dib,
openstack-infra has an AFS mirroring infrastructure in place. As a result,
we'll also need to launch one in tripleo-ci. For the moment, I've disabled the
logic to configure the mirror. Mirrors include pypi, npm, wheel, ubuntu trusty,
ubuntu precise, ceph. We are bringing RPM mirrors online shortly.

I had a chance to look into this today. To move forward, we'd need a static
VM setup, with public IPv4 address and about 100GB of HDD (more the better). We
don't need much memory (2GB) and single core, we are just serving HTTP traffic.
Lastly, it should be running ubuntu-trusty so our puppet manifests in
openstack-infra work correctly.

Is this something we could stand up this week?

This is the last step we need to discuss on the list. Are there any objections
to creating the instance today / tomorrow? See my last comment about the amount
of resources we need.

Looking at the cloud today, we have d1.medium: 4GB RAM, 300GB disk, 2 VCPUs. Do we
have the resources to launch one of these?

I'd really like to get some feedback on these 3 issue, I know they might not be
solved today because of the hardware move. However, I think we are pretty close
now to getting triplo-ci more inline with some of the openstack-infra tooling.

[1] https://review.openstack.org/#/c/312725/
[2] https://review.openstack.org/#/c/312992/
[3] https://review.openstack.org/#/c/312058/

responded May 16, 2016 by pabelanger_at_redhat

On 6 May 2016 at 14:18, Paul Belanger pabelanger@redhat.com wrote:

It was mostly me pushing for tripleo to move to 3rd party CI. I still think it
is the right place for tripleo, however after hearing dprince's concerns I think
we have a compromise for the moment. I've gone ahead and done the work to
upgrade the tripleo-ci jenkins slaves from Fedora-22 to the centos-7 DIB[1] produced by
openstack-infra. Please take a moment to review the patch, as it exposed 3
issues.

1) CentOS 7 does not support nbd out of the box, and we can't compile a new
kernel ATM. So, I've worked around the problem by converting the qcow2 image to
raw format, updating instack, and reconverting it back to qcow2. Ideally, if I can
find where the instack.qcow2 image is built, we'd also produce a raw format so we
don't have to do this on every gate job.

The conversion should be ok for the moment to allow us to make progress; longer
term we'll probably need to change the libvirt domain definitions on the
testenvs in order to be able to just generate and use a raw format.

2) Jenkins slave needs more HDD space. Using centos-7 we cache data to the slave
now, mostly packages and git repos. As a result the HDD starts at 7.5GB and
because the current slaves use 20GB we quickly run out of space. Ideally we
need 80GB[2] of space to be consistent with the other cloud providers we run
jenkins slaves on.

This is where we'll likely hit the biggest problems. In order to bump
the disk space allocated to the jenkins slaves and to simultaneously
take advantage of the SSDs, we're going to have to look into using the
SSDs as a cache for the spinning disks. I haven't done this before but
I hope we can look into it soon.
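
If we go the lvmcache route, it would look roughly like the sketch below (the
device, VG and LV names are made up, just to show the shape of it; this isn't
what the testenvs currently use):

    #!/usr/bin/env python
    # Rough lvmcache sketch: put the SSD in front of the spinning-disk LV as a cache.
    # /dev/sdb (the SSD), the "slaves" VG and the "root" LV are all made-up names.
    import subprocess

    def run(cmd):
        print(cmd)
        subprocess.check_call(cmd, shell=True)

    run('pvcreate /dev/sdb')            # initialise the SSD for LVM
    run('vgextend slaves /dev/sdb')     # add it to the existing volume group
    run('lvcreate --type cache-pool -l 100%FREE -n ssd_cache slaves /dev/sdb')
    run('lvconvert --type cache --cachepool slaves/ssd_cache slaves/root')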

3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 DIB,
openstack-infra has an AFS mirroring infrastructure in place. As a result,
we'll also need to launch one in tripleo-ci. For the moment, I've disabled the
logic to configure the mirror. Mirrors include pypi, npm, wheel, ubuntu trusty,
ubuntu precise, and ceph. We are bringing RPM mirrors online shortly.

I'm not sure we'll get as much benefit from this as the devstack-based jobs do,
as some of the mirrors you mention wouldn't be used at all, while others we
would only make very light use of. Is it possible to selectively add mirrors to
the AFS mirror, or add additional things that tripleo would be interested in,
e.g. an image cache?

[1] https://review.openstack.org/#/c/312725/
[2] https://review.openstack.org/#/c/312992/
[3] https://review.openstack.org/#/c/312058/

responded May 18, 2016 by Derek_Higgins

On Wed, May 18, 2016 at 12:22:55PM +0100, Derek Higgins wrote:
On 6 May 2016 at 14:18, Paul Belanger pabelanger@redhat.com wrote:

3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 DIB,
openstack-infra has an AFS mirroring infrastructure in place. As a result,
we'll also need to launch one in tripleo-ci. For the moment, I've disabled the
logic to configure the mirror. Mirrors include pypi, npm, wheel, ubuntu trusty,
ubuntu precise, and ceph. We are bringing RPM mirrors online shortly.

I'm not sure we'll get as much benefit from this as the devstack-based jobs do,
as some of the mirrors you mention wouldn't be used at all, while others we
would only make very light use of. Is it possible to selectively add mirrors to
the AFS mirror, or add additional things that tripleo would be interested in,
e.g. an image cache?

I think you'll actually benefit from this, mostly because you no longer have to
run your own mirror / squid servers in tripleo. The way AFS mirrors work is
more like a cache.

Currently our AFS volumes in rax-dfw hold over 1TB of data, but since our
jobs only access a small fraction of that data, most AFS mirror servers are
only using about 5GB of data locally.

In the case of tripleo, it will be even less, since you are not running the
full suite of jobs in your cloud.

Right now, nothing would need to change to selectively use mirrors, because
AFS will only cache what is used. As for adding things specific to tripleo, it
could be possible; it is also likely that other jobs will need the same bits
too.

I strongly encourage us to set up an AFS mirror.

[1] https://review.openstack.org/#/c/312725/
[2] https://review.openstack.org/#/c/312992/
[3] https://review.openstack.org/#/c/312058/

responded May 18, 2016 by pabelanger_at_redhat

On 18 May 2016 at 13:34, Paul Belanger pabelanger@redhat.com wrote:

I think you'll actually benefit from this, mostly because you no longer have to
run your own mirror / squid servers in tripleo. The way AFS mirrors work is
more like a cache.

Currently our AFS volumes in rax-dfw hold over 1TB of data, but since our
jobs only access a small fraction of that data, most AFS mirror servers are
only using about 5GB of data locally.

In the case of tripleo, it will be even less, since you are not running the
full suite of jobs in your cloud.

Right now, nothing would need to change to selectively use mirrors, because
AFS will only cache what is used. As for adding things specific to tripleo, it
could be possible; it is also likely that other jobs will need the same bits
too.

I strongly encourage us to set up an AFS mirror.

Ok, I'm still a little skeptical, because our biggest bandwidth hogs
aren't mentioned in the list of things mirrored, but that's not a good
reason to get in your way; if it proves to be a help then great, if
not at least we tried. So what do you need from me to try it out? If I
create a d1.medium trusty instance with a floating IP, will that work
for you? This should allow you to test things for now; longer term
we're going to have the same problems we do with the larger jenkins
instances, so until we solve that we won't be able to consider this a
permanent part of the infrastructure.

responded May 19, 2016 by Derek_Higgins

On Wed, May 18, 2016 at 08:34:40AM -0400, Paul Belanger wrote:
On Wed, May 18, 2016 at 12:22:55PM +0100, Derek Higgins wrote:

2) Jenkins slave needs more HDD space. Using centos-7 we cache data to the slave
now, mostly packages and git repos. As a result the HDD starts at 7.5GB and
because the current slaves use 20GB we quickly run out of space. Ideally we
need 80GB[2] of space to be consistent with the other cloud providers we run
jenkins slaves on.

This is where we'll likely hit the biggest problems. In order to bump
the disk space allocated to the jenkins slaves and to simultaneously
take advantage of the SSDs, we're going to have to look into using the
SSDs as a cache for the spinning disks. I haven't done this before but
I hope we can look into it soon.

Looks like we just ran out of space again on the centos-7 DIB: 7GB for /opt/git,
10+GB for devstack-gate, and the rest is converting the image from qcow2 to raw
and back.

Is there a diagram of how the cloud and its resources are deployed? I'm having
trouble figuring out the setup of everything.

I strongly encourage us to set up an AFS mirror.

Any feedback here? I'd like to finish off this work if possible this week, but
we seem to be in a holding pattern on this.

[1] https://review.openstack.org/#/c/312725/
[2] https://review.openstack.org/#/c/312992/
[3] https://review.openstack.org/#/c/312058/

responded May 19, 2016 by pabelanger_at_redhat
0 votes

On Thu, May 19, 2016 at 03:50:15PM +0100, Derek Higgins wrote:
On 18 May 2016 at 13:34, Paul Belanger pabelanger@redhat.com wrote:

On Wed, May 18, 2016 at 12:22:55PM +0100, Derek Higgins wrote:

On 6 May 2016 at 14:18, Paul Belanger pabelanger@redhat.com wrote:

It was mostly me pushing for tripleo to move to 3rd party CI. I still think that
is the right place for tripleo; however, after hearing dprince's concerns, I
think we have a compromise for the moment. I've gone ahead and done the work to
upgrade the tripleo-ci jenkins slaves from Fedora-22 to the centos-7 DIB[1]
produced by openstack-infra. Please take a moment to review the patch, as it
exposed 3 issues.

1) CentOS 7 does not support nbd out of the box, and we can't compile a new
kernel ATM. So, I've worked around the problem by converting the qcow2 image to
raw format, updating instack, and reconverting it back to qcow2. Ideally, if I
can find where the instack.qcow2 image is built, we could also produce a raw
format so we don't have to do this in every gate job.
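
Roughly, the workaround boils down to the following; this is a sketch only,
and the image paths and the in-place update step are illustrative rather than
the exact tripleo-ci code:

    # Sketch of the nbd-free workaround: convert qcow2 -> raw, apply the
    # instack update against the raw image, then convert back to qcow2.
    import subprocess

    def convert(src, src_fmt, dst, dst_fmt):
        subprocess.check_call(
            ["qemu-img", "convert", "-f", src_fmt, "-O", dst_fmt, src, dst])

    convert("instack.qcow2", "qcow2", "instack.raw", "raw")
    # ... update the raw image here (e.g. via a loopback mount) instead of
    # using qemu-nbd ...
    convert("instack.raw", "raw", "instack.qcow2", "qcow2")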

The conversion should be ok for the moment to allow us to make progress;
longer term we'll probably need to change the libvirt domain definitions on
the testenvs in order to be able to just generate and use a raw format.
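
That change mostly amounts to swapping the disk driver type in the testenv
domain XML from qcow2 to raw. A minimal sketch, assuming a hypothetical
domain name and image path rather than the actual testenv definitions:

    # Sketch: redefine a libvirt domain so its disk uses a raw image instead
    # of qcow2. The domain name and file path below are hypothetical.
    import libvirt
    import xml.etree.ElementTree as ET

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("baremetal_0")
    root = ET.fromstring(dom.XMLDesc(0))

    for disk in root.findall("./devices/disk"):
        driver = disk.find("driver")
        source = disk.find("source")
        if (driver is not None and source is not None
                and source.get("file", "").endswith(".qcow2")):
            driver.set("type", "raw")
            source.set("file", source.get("file").replace(".qcow2", ".raw"))

    conn.defineXML(ET.tostring(root).decode())  # persist the updated definition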

2) Jenkins slaves need more HDD space. Using centos-7 we now cache data on the
slave, mostly packages and git repos. As a result the HDD starts at 7.5GB, and
because the current slaves only have 20GB we quickly run out of space. Ideally
we need 80GB[2] of space to be consistent with the other cloud providers we run
jenkins slaves on.

This is where we'll likely hit the biggest problems. In order to bump
the disk space allocated to the jenkins slaves and simultaneously
take advantage of the SSDs, we're going to have to look into using the
SSDs as a cache for the spinning disks. I haven't done this before but
I hope we can look into it soon.

3) No AFS mirror in tripleo-ci[3]. To take advantage of the new centos-7 dib,
openstack-infra has an AFS mirroring infrastructure in place. As a result,
we'll also need to launch one in tripleo-ci. For the moment, I've disabled the
logic to configure the mirror. Mirrors include pypi, npm, wheel, ubuntu trusty,
ubuntu precise, ceph. We are bringing RPM mirrors online shortly.

I'm not sure we'll get as much of a benefit from this as the devstack
based jobs do, as some of the mirrors you mention wouldn't be used
at all while others we would only make very light use of. Is it
possible to selectively add mirrors to the AFS mirror, or add
additional things that tripleo would be interested in? e.g. an image
cache.

I think you'll actually benefit from this, mostly because you no longer have to
run your own mirror / squid servers in tripleo. The way AFS mirrors work is
more like a cache.

Currently our AFS volumes in rax-dfw hold over 1TB of data, but since our
jobs only access a small fraction of that data, most AFS mirror servers are only
using about 5GB of data locally.

In the case of tripleo, it will be even less since you are not running the full
suite of jobs in your cloud.

Right now, nothing would need to change to selectively use mirrors, because
AFS will only cache what is used. As for adding things specific to tripleo, that
could be possible; it is also likely that other jobs will need the same bits
too.

I strongly encourage us to set up an AFS mirror.

Ok, I'm still a little skeptical because our biggest bandwidth hogs
aren't mentioned in the list of things mirrored, but that's not a good
reason to get in your way; if it proves to be a help then great, and if
not, at least we tried. So what do you need from me to try it out? If I
create a d1.medium trusty instance with a floating IP, will that work
for you? This should allow you to test things for now; longer term
we're going to have the same problems we do with the larger jenkins
instances, so until we solve that we won't be able to consider this a
permanent part of the infrastructure.

I just need to know the flavor we are using; I'll be using our
openstack-infra/system-config launch-node script to provision the server, since
we need to loop it into our ansible / puppet wheel.

If you are okay with d1.medium for now, I can start it.

I'd really like to get some feedback on these 3 issues; I know they might not be
solved today because of the hardware move. However, I think we are pretty close
now to getting tripleo-ci more in line with some of the openstack-infra tooling.

[1] https://review.openstack.org/#/c/312725/
[2] https://review.openstack.org/#/c/312992/
[3] https://review.openstack.org/#/c/312058/


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
responded May 19, 2016 by pabelanger_at_redhat
...