
[openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

0 votes

Hi team,

Some discussion occurred over IRC about a publicly opened bug related
to TrustedFilter [1].
I want to take the opportunity to raise my concerns about that
specific filter, explain why I dislike it and how I think we could
improve the situation, and clarify everyone's thoughts.

The current situation is this: Nova only checks whether a host is
compromised when the scheduler is called, i.e. only when
booting/migrating/evacuating/unshelving an instance (well, not exactly
in all the evacuate/live-migrate cases, but let's not discuss that
now). When the request reaches the scheduler, all the hosts are checked
against all the enabled filters, and the TrustedFilter makes an
external HTTP(S) call to the Attestation API service (not handled by
Nova) for each host to see whether the host is valid (not compromised).

To be clear, that's the only in-tree scheduler filter which explicitly
makes an external call to a separate service that Nova does not manage.
I can see at least 3 reasons why that is bad:

1: that's a terrible performance bottleneck, because we block on I/O
   N times given N hosts (we don't even multiplex the HTTP requests)

2: all the other filters check an internal Nova state for the host
   (called HostState), but the TrustedFilter does not, which means that
   conceptually we defer the decision to a 3rd-party engine

3: the Attestation API service becomes a de facto dependency for
   Nova (since it's an in-tree filter) while it's not listed as a
   dependency and thus not gated.
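
To make the first point concrete, here is a minimal Python sketch (not
Nova's actual code) of why one blocking call per host adds up.
`fake_attest` is a hypothetical stand-in for the HTTP(S) request, and
the 10 ms latency is an assumed figure chosen only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ASSUMED_LATENCY = 0.01  # seconds per attestation call (assumed value)


def fake_attest(host):
    """Hypothetical stand-in for one blocking HTTP(S) attestation call."""
    time.sleep(ASSUMED_LATENCY)
    return True  # pretend every host is trusted


def filter_sequential(hosts):
    # One blocking call per host: the total wait grows linearly with N.
    return [h for h in hosts if fake_attest(h)]


def filter_multiplexed(hosts, workers=10):
    # Same calls issued concurrently: the wait is roughly N / workers calls.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_attest, hosts))
    return [h for h, ok in zip(hosts, results) if ok]


def timed(fn, hosts):
    start = time.monotonic()
    fn(hosts)
    return time.monotonic() - start


if __name__ == "__main__":
    hosts = ["compute-%d" % i for i in range(20)]
    print("sequential:  %.3fs" % timed(filter_sequential, hosts))
    print("multiplexed: %.3fs" % timed(filter_multiplexed, hosts))
```

Multiplexing would only soften the first point; the second and third
points are unaffected by how the calls are issued.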

All of these drawbacks could be acceptable if the filter covered the
use case given in [1] (i.e. I want to make sure that if my host gets
compromised, my instances will not be running on that host), but it
just doesn't, because the check only happens at scheduling time, as
described above.

So, given that, here are my thoughts:
a/ if a host gets compromised, we can just disable its service to
prevent its election as a valid destination host. There is no need for a
specialised filter.
b/ if a host is compromised, we can assume that the instances have to
be resurrected elsewhere, i.e. we can call a nova evacuate
c/ checking whether a host is compromised is not a Nova
responsibility, since that is already done perfectly well by [2]

In other words, I consider that "security" use case analogous to the
HA use case [3], where we need a 3rd-party tool responsible for
periodically checking the state of the hosts and, if one is
compromised, calling the Nova API to fence the host and evacuate the
compromised instances.
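
As a rough illustration of that external tool, here is a minimal Python
sketch. It is only a sketch under assumptions: the three callables are
hypothetical and injected by the caller, so nothing here claims to be
the real Nova or OpenAttestation client API.

```python
def remediate(hosts, is_trusted, disable_service, evacuate):
    """Fence and evacuate every host that fails attestation.

    All three callables are hypothetical and supplied by the caller:
      is_trusted(host) -> bool      asks the attestation service
      disable_service(host, why)    wraps the Nova service-disable call
      evacuate(host)                wraps the Nova evacuate call
    """
    fenced = []
    for host in hosts:
        if is_trusted(host):
            continue
        # Fence first so the scheduler stops electing the host,
        # then move its instances elsewhere.
        disable_service(host, "attestation failed")
        evacuate(host)
        fenced.append(host)
    return fenced


if __name__ == "__main__":
    trust = {"compute-1": True, "compute-2": False}
    log = []
    fenced = remediate(
        sorted(trust),
        is_trusted=trust.get,
        disable_service=lambda h, why: log.append(("disable", h, why)),
        evacuate=lambda h: log.append(("evacuate", h)),
    )
    print(fenced)
    print(log)
```

Such a function could be run periodically (from cron or a loop); the
point is that the check happens independently of any scheduling request.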

Given that, I'm proposing to deprecate TrustedFilter and explicitly
mention that it will be dropped from the tree in a later cycle:
https://review.openstack.org/194592

Thoughts?
-Sylvain

[1] https://bugs.launchpad.net/nova/+bug/1456228
[2] https://github.com/OpenAttestation/OpenAttestation
[3] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/


OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
asked Jun 23, 2015 in openstack-dev by Sylvain_Bauza (14,100 points)   1 3 5

16 Responses

0 votes

I agree. I feel like this is another example of functionality which is
trivially implemented outside nova, and where it works much better if
we don't do it. Couldn't an admin just have a cron job which verifies
hosts, and then adds them to a compromised-hosts host aggregate if
they're owned? I assume, without having tested it, that you can migrate
instances out of a host aggregate you can't boot into?
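
A minimal Python sketch of the bookkeeping such a cron job would need,
under stated assumptions: the `attestation` dict is the hypothetical
output of whatever verifies the hosts, and releasing re-verified hosts
is a design choice of this sketch, not something proposed above.

```python
def plan_aggregate_changes(attestation, quarantined):
    """Return (to_add, to_remove) for a 'compromised-hosts' aggregate.

    attestation: dict host -> bool (True means the host verified OK).
    quarantined: set of hosts currently in the aggregate.
    """
    failed = {host for host, ok in attestation.items() if not ok}
    verified = {host for host, ok in attestation.items() if ok}
    to_add = sorted(failed - quarantined)
    # Only release hosts that were explicitly verified again; a host
    # missing from the results stays quarantined.
    to_remove = sorted(quarantined & verified)
    return to_add, to_remove


if __name__ == "__main__":
    results = {"a": True, "b": False, "c": True}
    print(plan_aggregate_changes(results, {"c", "d"}))
```

The actual aggregate updates would then go through the Nova API; they
are left out here because the exact calls depend on the client in use.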

Michael

--
Rackspace Australia


responded Jun 23, 2015 by Michael_Still (16,180 points)   3 6 13
0 votes

AFAIK, TrustedFilter uses a cache for the trusted state, which is designed to solve the performance issue mentioned here.

My concerns about deprecating it are:

1. We already have customers here in China who are using that filter. How are they going to upgrade in the future?

2. A dependency should not be a reason to deprecate a module in OpenStack. Nova is not a stand-alone module; it depends on various technologies and libraries.

Intel is setting up third-party CI for TCP/OAT in Liberty to address the concerns mentioned in this thread. Also, OAT is an open source project which is being maintained as a long-term strategy.

As for a host getting compromised: OAT determines whether a host is trusted or untrusted at boot/reboot, and it is hard for OAT to detect a host being compromised while it is running. I don't know how to detect that without the filter.
Back to Michael's question: the verification is done automatically by software when a host boots or reboots, so wouldn't a separate job be an overhead for the admin?

Thanks.
--
Shane

responded Jun 24, 2015 by Wang,_Shane (1,720 points)   1 3
0 votes

I would like to add to Shane's points below.

1) The trust filter can be treated as an API with different underlying implementations. Its default could even be "Not Implemented" and always return false.
nova.conf could then specify the OAT trust implementation. This would not break present-day users of the functionality.
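
A minimal Python sketch of that idea, assuming a hypothetical
`trust_backend` option and a hypothetical `StaticTrustBackend` that
merely stands in for an OAT-based implementation; none of these names
exist in Nova.

```python
class TrustBackend:
    """Default backend: nothing is implemented, so no host is trusted."""

    def is_trusted(self, host):
        return False


class StaticTrustBackend(TrustBackend):
    """Hypothetical backend standing in for an OAT-based implementation."""

    def __init__(self, trusted_hosts):
        self._trusted = set(trusted_hosts)

    def is_trusted(self, host):
        return host in self._trusted


# Hypothetical registry keyed by the value of a configuration option.
BACKENDS = {
    "noop": lambda conf: TrustBackend(),
    "static": lambda conf: StaticTrustBackend(conf.get("trusted_hosts", [])),
}


def load_backend(conf):
    # Fall back to the always-false default when nothing is configured.
    return BACKENDS[conf.get("trust_backend", "noop")](conf)


if __name__ == "__main__":
    backend = load_backend({"trust_backend": "static", "trusted_hosts": ["c1"]})
    print(backend.is_trusted("c1"), backend.is_trusted("c2"))
```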

2) The issue in the original bug is a VM waking up after a reboot on a host where it has not been determined beforehand whether the host is still trustworthy.
This essentially calls for a feature that checks that all constraints requested by a VM at launch still hold when it re-awakens, even though it does not go through the Nova scheduler at that point.

This holds even for aggregates that might be specified by geo, or even by reservation such as "Coke" or "Pepsi".
What if a host was reassigned from Coke to Pepsi, even without a reboot and certainly before a reboot? Then there is cross-contamination.
Perhaps we need Nova hooks that can be registered with functions that check expected aggregate values.

Better still, have libvirt functionality that makes a callback for each VM on a host to ensure its constraints are satisfied on start-up/boot, and on restart when it comes out of pause.

Using an aggregate for trust with a cron job to check for trust is inefficient in this case: trust status gets updated only on a host reboot. Intel TXT is a boot-time authentication.

Regards
Malini

responded Jun 24, 2015 by Bhandaru,_Malini_K (1,920 points)   4
0 votes

Letting a 3rd-party tool call the Nova fencing API might be better than using TrustedFilter only if all the hosts managed by OpenStack are capable of measured boot.

But if not all the hosts support measured boot, then with TrustedFilter we can schedule VMs only to measured and trusted hosts, whereas in the 3rd-party tool case only untrusted/compromised hosts will be fenced: hosts with unknown trustworthiness will still be able to run VMs, even though the owner does not want that.
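
The difference can be sketched in a few lines of Python. This is an
illustration only; the three-state model and the function names are
assumptions made for the example.

```python
TRUSTED, UNTRUSTED, UNKNOWN = "trusted", "untrusted", "unknown"


def eligible_with_filter(states):
    # Allow-list semantics: only hosts positively attested as trusted pass.
    return sorted(h for h, s in states.items() if s == TRUSTED)


def eligible_with_fencing(states):
    # Deny-list semantics: only hosts known to be untrusted are fenced,
    # so hosts that cannot be measured remain schedulable.
    return sorted(h for h, s in states.items() if s != UNTRUSTED)


if __name__ == "__main__":
    states = {"a": TRUSTED, "b": UNTRUSTED, "c": UNKNOWN}
    print(eligible_with_filter(states))
    print(eligible_with_fencing(states))
```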

So I would suggest using the 3rd-party tools as an enhancement to supplement our TCP/TrustedFilter feature. The 3rd-party tools can also call the attestation API for host attestation.

Thanks
Jimmy

responded Jun 24, 2015 by Wei,_Gang (140 points)  
0 votes

(General point: could we please try not to top-post? It makes it a
little harder to follow the conversation.)

Replies inline.

On 24/06/2015 08:15, Wei, Gang wrote:

Letting a 3rd-party tool call the Nova fencing API might be better than using TrustedFilter only if all the hosts managed by OpenStack are capable of measured boot.

But if not all the hosts support measured boot, then with TrustedFilter we can schedule VMs only to measured and trusted hosts, whereas in the 3rd-party tool case only untrusted/compromised hosts will be fenced: hosts with unknown trustworthiness will still be able to run VMs, even though the owner does not want that.

You don't need a specific filter to fence a host from being
scheduled. Calling the Nova os-services API to explicitly disable
the service (and providing a reason) makes the hosts belonging to
the service ineligible for election (thanks to the ComputeFilter).

To be clear, I would love to see the logic inverted, i.e. something
which would call the OAT service for a specific host and then fire a
service disable.
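
A simplified Python sketch of why a disabled service is enough to keep a
host out of scheduling. This is not Nova's actual ComputeFilter code;
the `HostState` attributes used here are assumptions made for the
example.

```python
class HostState:
    """Simplified stand-in for the scheduler's per-host state."""

    def __init__(self, host, service_disabled=False, service_up=True):
        self.host = host
        self.service_disabled = service_disabled
        self.service_up = service_up


def compute_host_passes(host_state):
    # A host is electable only if its compute service is enabled and alive.
    return host_state.service_up and not host_state.service_disabled


def electable(host_states):
    return [hs.host for hs in host_states if compute_host_passes(hs)]


if __name__ == "__main__":
    states = [
        HostState("a"),
        HostState("b", service_disabled=True),  # fenced by an operator/tool
        HostState("c", service_up=False),       # service not reporting in
    ]
    print(electable(states))
```

The check only reads state that Nova already holds, so no external call
is needed at scheduling time.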

So I would suggest using the 3rd-party tools as an enhancement to supplement our TCP/TrustedFilter feature. The 3rd-party tools can also call the attestation API for host attestation.

I don't see much benefit in keeping such a filter, for the reasons I
mentioned in my original mail. Again, if you want to fence a host, you
can just disable its service; that's enough.

Thanks
Jimmy

-----Original Message-----
From: Bhandaru, Malini K [mailto:malini.k.bhandaru@intel.com]
Sent: Wednesday, June 24, 2015 1:13 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

I would like to add to Shane's points below.

1) The trust filter can be treated as an API with different underlying implementations. Its default could even be "Not Implemented" and always return false.
nova.conf could then specify the OAT trust implementation. This would not break present-day users of the functionality.

Don't get me wrong, I'm not against OAT, I'm just saying that the
TrustedFilter design is wrong. Even if another alternative came up
to serve the TrustedComputePool model, it would still be bad for the
reasons I mentioned, and it wouldn't cover the use case I quoted.

2) The issue in the original bug is a VM waking up after a reboot on a host where it has not been determined beforehand whether the host is still trustworthy.
This essentially calls for a feature that checks that all constraints requested by a VM at launch still hold when it re-awakens, even though it does not go through the Nova scheduler at that point.

So I think we are in agreement that covering that use case can't
be done at the scheduler level.
Using TrustedFilter just ensures that the host is checked at instance
creation time, but it confuses people because they think it will be
enforced for the whole instance lifecycle.

  This holds even for aggregates that might be specified by geo, or even by reservation such as "Coke" or "Pepsi".
  What if a host, even without a reboot and certainly before one, was reassigned from Coke to Pepsi? There would be cross-contamination.
  Perhaps we need Nova hooks that can be registered with functions that check expected aggregate values.

I honestly don't see the point of a host aggregate here. Given that the
failure domain is a host, you only need to decide whether to trust that
host or not. Whether the host belongs to an aggregate is orthogonal to
our problem IMHO.

  Better still, have libvirt functionality that makes a callback for each VM on a host to ensure its constraints are satisfied on start-up/boot, and on restart when it comes out of pause.

Hmm, doesn't it sound weird to have the host be the source of truth?
Also, if a host gets compromised, why couldn't we assume that the
instances may be compromised too and need to be resurrected (ie.
evacuated)?

  Using an aggregate for trust, with a cron job to check for trust, is inefficient in this case: trust status gets updated only on a host reboot. Intel TXT is a boot-time authentication.

Isn't that a specific implementation of OAT? Couldn't we assume some
alternative implementations able to do live checks? I mean, however you
trigger a host check (at boot time or periodically), you can then fire
an alarm which would trigger the necessary remediation actions: fence
the host and evacuate the instances.

Regards
Malini

-----Original Message-----
From: Wang, Shane [mailto:shane.wang@intel.com]
Sent: Tuesday, June 23, 2015 9:26 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

AFAIK, TrustedFilter is using a sort of cache to cache the trusted state, which is designed to solve the performance issue mentioned here.

Fair point, although it can be a security flaw, since we know that a
cache can hold stale data.
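
To illustrate the stale-data concern, here is a minimal sketch of a time-based cache in front of an attestation check. It is not TrustedFilter's actual code: the TrustCache class, the attest callable and the ttl value are all made up for the example.

```python
import time


class TrustCache:
    """Caches per-host attestation results for `ttl` seconds (hypothetical helper)."""

    def __init__(self, attest, ttl, clock=time.monotonic):
        self._attest = attest      # assumed callable: host name -> bool (trusted?)
        self._ttl = ttl
        self._clock = clock
        self._entries = {}         # host -> (trusted, fetched_at)

    def is_trusted(self, host):
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[1] < self._ttl:
            # Served from cache: this answer may be stale if the host
            # changed state after it was fetched.
            return cached[0]
        trusted = self._attest(host)
        self._entries[host] = (trusted, now)
        return trusted
```

With such a cache, a host that becomes untrusted keeps being reported as trusted until its entry expires, so the TTL is a direct trade-off between scheduler latency and the length of that window.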

My thoughts for deprecating it are:

1. We already have customers here in China who are using that filter. How are they going to upgrade in the future?

I didn't say remove the filter, but rather deprecate it. It would
basically mean that users would get a LOG warning for the next 2 cycles
saying "this filter will be removed in the near future, please consider
using other ways".
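
As a rough sketch of what such a deprecation period looks like in code (this is not the actual Nova change; the class below is a stand-in and the message text is only an example), the filter keeps working but logs a warning once when it is loaded:

```python
import logging

LOG = logging.getLogger("trusted_filter_sketch")


class DeprecatedTrustedFilter:
    """Stand-in for a scheduler filter that is deprecated but still works."""

    _warned = False

    def __init__(self):
        # Warn once per process, not once per scheduling request.
        if not DeprecatedTrustedFilter._warned:
            LOG.warning("TrustedFilter is deprecated and will be removed in a "
                        "future release; consider fencing compromised hosts "
                        "with an external tool instead.")
            DeprecatedTrustedFilter._warned = True

    def host_passes(self, host_trusted):
        # Behaviour is unchanged during the deprecation period.
        return bool(host_trusted)
```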

Also, removing a filter from in-tree doesn't prevent you from shipping
it in a distro, since out-of-tree filters can easily be added using a
configuration flag.

2. Dependency should not be a reason to deprecate a module in OpenStack. Nova is not a stand-alone module; it depends on various technologies and libraries.

You got me wrong. The dependency issue is one of the reasons I have
serious concerns with that filter, but it's not the only one. From my
perspective, like I said, the most compelling reason is that the filter
makes a promise it can't keep (making sure that Nova fences my
compromised hosts).

Intel is setting up third-party CI for TCP/OAT in Liberty, which is meant to address the concerns mentioned in the thread. Also, OAT is an open source project which is being maintained as a long-term strategy.

Again, like I said, I have strong respect for OAT. My problem is not
with OAT but with TrustedFilter. I'm perfectly fine keeping OAT for
performing host checks.

When a host gets compromised: OAT checks trusted or untrusted status starting from boot/reboot, so it is hard for OAT to detect that a host has been compromised while it is running. I don't know how to detect that without the filter.
Back to Michael's question: the verification is done automatically by software when a host boots or reboots, so wouldn't a separate job be an overhead for the admin?

I think the real question is: "who will trigger the detection, and
when?". In my original mail, I said it can't be Nova because Nova is not
designed that way. Please consider the link I gave about HA to see how I
feel it can be done (using Pacemaker or not).

Thanks.
--
Shane

-----Original Message-----
From: Michael Still [mailto:mikal@stillhq.com]
Sent: Wednesday, June 24, 2015 7:49 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

I agree. I feel like this is another example of functionality which is trivially implemented outside nova, and where it works much better if we don't do it. Couldn't an admin just have a cron job which verifies hosts, and then adds them to a compromised-hosts host aggregate if they're owned? I assume without testing it that you can migrate instances out of a host aggregate you can't boot in?

Like I said, I see 3 steps there:
- periodically check the status of the hosts by calling OAT (this can be
done using a cron job, Nagios or whatever else, even Pacemaker or
HAProxy) => that would replace TrustedFilter
- if one host is marked compromised, fence the host => my proposal was
to disable the service, but yours is good too (move the host to a toxic
host aggregate, provided we're using a filter that explicitly excludes
hosts belonging to that aggregate from being elected)
- potentially resurrect the instances that were running on that host,
and here I proposed calling the evacuate API to rebuild the instances
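
The three steps above can be sketched as one pass of an external tool. This is only an outline under assumptions: the four callables are placeholders for whatever a deployment actually uses (an OAT query, the Nova os-services API, a server listing, the evacuate API), and none of the names come from Nova or OAT.

```python
def remediate(hosts, is_trusted, disable_service, instances_on, evacuate):
    """Run one pass of check -> fence -> evacuate; return the fenced hosts.

    All four callables are hypothetical hooks supplied by the caller.
    """
    fenced = []
    for host in hosts:
        # Step 1: ask the attestation service about this host.
        if is_trusted(host):
            continue
        # Step 2: fence first, so the scheduler stops electing this host.
        disable_service(host, reason="attestation failed")
        fenced.append(host)
        # Step 3: rebuild the instances elsewhere.
        for instance in instances_on(host):
            evacuate(instance)
    return fenced
```

A cron job or a monitoring tool would call this periodically. As far as I know, evacuate expects the source compute service to be down, so a real tool would probably have to power off or isolate the host between steps 2 and 3.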

HTH,
-Sylvain

Michael

On Tue, Jun 23, 2015 at 8:41 PM, Sylvain Bauza sbauza@redhat.com wrote:

Hi team,

Some discussion occurred over IRC about a publicly open bug related to
TrustedFilter [1]. I want to take the opportunity to raise my concerns
about that specific filter, explain why I dislike it and how I think we
could improve the situation (and clarify everyone's thoughts).

The current situation is as follows: Nova checks whether a host is
compromised only when the scheduler is called, ie. only when
booting/migrating/evacuating/unshelving an instance (well, not exactly
all the evacuate/live-migrate cases, but let's not discuss that now).
When the request goes to the scheduler, all the hosts are checked
against all the enabled filters, and the TrustedFilter makes an
external HTTP(S) call to the Attestation API service (not handled by
Nova) for each host to see if the host is valid (not compromised) or not.

To be clear, that's the only in-tree scheduler filter which explicitly
does an external call to a separate service that Nova is not managing.
I can see at least 3 reasons for thinking about why it's bad :

1 : that's a terrible bottleneck for performance, because we're
IO-blocking N times given N hosts (we're not even multiplexing the
HTTP requests)

2 : all the filters check an internal Nova state for the host
(called HostState) except the TrustedFilter, which means that
conceptually we defer the decision to a 3rd-party engine

3 : the Attestation API service becomes a de facto dependency for
Nova (since it's an in-tree filter) while it's not listed as a
dependency and thus not gated.

All of these reasons could be acceptable if that would cover the
exposed usecase given in [1] (ie. I want to make sure that if my host
gets compromised, my instances will not be running on that host) but
that just doesn't work, due to the situation I mentioned above.

So, given that, here are my thoughts :
a/ if a host gets compromised, we can just disable its service to
prevent its election as a valid destination host. There is no need for
a specialised filter.
b/ if a host is compromised, we can assume that the instances have to
resurrect elsewhere, ie. we can call a nova evacuate
c/ checking whether a host is compromised or not is not a Nova
responsibility since it's already perfectly done by [2]

In other words, I'm considering that "security" use case as something
analogous to the HA use case [3], where we need a 3rd-party tool
responsible for periodically checking the state of the hosts and, if
one is compromised, calling the Nova API to fence the host and evacuate
the compromised instances.

Given that, I'm proposing to deprecate TrustedFilter and explicitly
mention that it will be dropped from in-tree in a later cycle:
https://review.openstack.org/194592

Thoughts ?
-Sylvain

[1] https://bugs.launchpad.net/nova/+bug/1456228
[2] https://github.com/OpenAttestation/OpenAttestation
[3] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/


____ OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--
Rackspace Australia


responded Jun 24, 2015 by Sylvain_Bauza (14,100 points)   1 3 5
0 votes

-----Original Message-----
From: Sylvain Bauza [mailto:sbauza@redhat.com]
Sent: Wednesday, June 24, 2015 9:39 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)

(general point: could we please try not to top-post? It makes the
conversation a little harder to follow)

Replies inline.

On 24/06/2015 08:15, Wei, Gang wrote:

Only if all the hosts managed by OpenStack are capable of the measured
boot process might letting a 3rd-party tool call the nova fencing API be
better than using TrustedFilter.

But if not all the hosts support measured boot, then with TrustedFilter
we can schedule VMs only to measured and trusted hosts. In the 3rd-party
tool case, only untrusted/compromised hosts will be fenced; hosts with
unknown trustworthiness will still be able to run VMs, even though the
owner does not want that.

You don't need a specific filter to fence a host from being scheduled.
Just calling the Nova os-services API to explicitly disable the service
(and providing a reason) makes the hosts belonging to that service
ineligible for election (thanks to the ComputeFilter).

To be clear, I would love to see the logic inverted, ie. something which would
call the OAT service for a specific host would then fire a service disable.
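
For reference, here is a sketch of the request such a tool would send to disable a compute service with a reason. The path and body shape are as I remember them from the compute API of that era; newer API microversions use a different form, so check the API reference for the deployed version. The function name is made up, and the endpoint prefix and authentication are left out.

```python
import json


def build_disable_request(host, reason, binary="nova-compute"):
    """Return (method, path, body) for disabling a compute service with a reason.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    body = {"host": host, "binary": binary, "disabled_reason": reason}
    return "PUT", "/os-services/disable-log-reason", json.dumps(body)
```

Once the service is disabled, nothing else has to change on the scheduler side.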

So I would suggest using 3rd-party tools as a way to supplement our
TCP/TrustedFilter feature. The 3rd-party tools can also call the
attestation API for host attestation.

I don't see much benefit in keeping such a filter, for the reasons I mentioned
below. Again, if you want to fence a host, you can just disable its service;
that's enough.

This won't address the case in which you have a heterogeneous environment and you want only some important VMs to run on trusted hosts (and you don't care where the rest of the VMs run).

responded Jun 24, 2015 by Dulko,_Michal (4,760 points)   2 3 5
0 votes

On 24/06/2015 10:35, Dulko, Michal wrote:

This won't address the case in which you have a heterogeneous environment and you want only some important VMs to run on trusted hosts (and you don't care where the rest of the VMs run).

In that case, you don't care about fencing the host, but rather about
making sure that your trusted instances move. None of that needs to live
in Nova: you can just identify the instances with a specific metadata
key, say 'trusted', and ask to evacuate them.

If you want to prevent new 'trusted' instances from being booted on
compromised hosts, you perhaps have to write a filter that compares the
instance metadata with the host metadata (which can be given by an
aggregate), but none of that requires the TrustedFilter.
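
A minimal sketch of the matching logic such a filter could apply, assuming both the instance and the host aggregate expose plain key/value metadata. The function name, the 'trusted' key and the dict shapes are illustrative only; this is not an existing Nova filter.

```python
def host_passes(instance_metadata, aggregate_metadata, key="trusted"):
    """Return True if the host may run the instance.

    Instances that do not ask for `key` can go anywhere; instances that do
    only pass on hosts whose aggregate metadata carries the same value.
    """
    wanted = instance_metadata.get(key)
    if wanted is None:
        return True
    return aggregate_metadata.get(key) == wanted
```

With this logic, a host with no tag at all (unknown trustworthiness) is rejected for 'trusted' instances but stays available for everything else, and the tagging itself is left to the external tool.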

To be clear, maybe some pieces are missing for fulfilling your whole
story, but I'd rather identify what's missing within Nova for matching
instances and hosts, and leave the tagging to a 3rd-party tool, rather
than promote a filter which is very specific and doesn't really work on
its own (it needs an external dependency).

-Sylvain

Thanks
Jimmy

-----Original Message-----
From: Bhandaru, Malini K [mailto:malini.k.bhandaru@intel.com]
Sent: Wednesday, June 24, 2015 1:13 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)

Would like to add to Shane's points below.

1) The Trust filter can be treated as an API, with different underlying
implementations. Its default could even be "Not Implemented" and always
return false.
And Nova.conf could specify use the OAT trust implementation. This
would not break present day users of the functionality.

Don't get me wrong, I'm not against OAT, I'm just saying that the
TrustedFilter design is wrong. Even if another alternative would come up to
serve the TrustedComputePool model of things, it would still be bad for the
reasons I mentioned below, and wouldn't cover the usecase I quoted.

2) The issue in the original bug is a a VM waking up after a reboot on a host
that has not pre-determined whether the host is still trustable.
This is essentially begging a feature to check that all constraints
requested by a VM during launch are confirmed to hold when it re-awakens,
even if it is not
going through Nova scheduler at this point.
So I think we are in agreement that for covering that usecase, it can't be
done at the scheduler level.
Using TrustedFilter just ensures that at the instance creation time, the host is
checked but confuses people because they think it will be enforced for the
whole instance lifecyle.

   This holds even for aggregates that might be specified by geo, or even

reservation such as "Coke" or "Pepsi".
What if a host, even without a reboot and certainly before a reboot was
assigned from Coke to Pepsi, there is cross contamination.
Perhaps we need Nova hooks that can be registered with functions that
check expected aggregate values.

I don't honestly see the point of an host aggregate. Given the failure domain
is an host, you only need to trust that host or not. The fact that the host
belongs to an aggregate or not is orthogonal to our problem IMHO.

   Better still have  libvirt functionality that makes a call back for each VM

on a host to ensure its constraints are satisfied on start-up/boot, and re-start
when it comes out of pause.

Hum, doesn't it sound weird to have the host being the source of truth ?
Also, if an host gets compromised, why couldn't we assume that the
instances can be compromised too and need to be resurrected (ie.
evacuated) ?

   Using aggregate for trust with a cron job to check for trust is inefficient

in this case, trust status gets updated only on a host reboot. Intel TXT is a
boot
time authentication.
Isn't that a specific implementation of OAT ? Couldn't we assume some
alternative implementations able to do live checks ? I mean, whatever on
how you trigger an host check (at boot time or periodically), you can
then fire an alarm which would set the necessary remediation actions :
fence the host and evacuate the instances

Regards
Malini

-----Original Message-----
From: Wang, Shane [mailto:shane.wang@intel.com]
Sent: Tuesday, June 23, 2015 9:26 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)
AFAIK, TrustedFilter is using a sort of cache to cache the trusted state,
which is designed to solve the performance issue mentioned here.

Fair point, even if it can be a security flaw because we know that
caching can be having stale data.

My thoughts for deprecating it are:

1. We already have customers here in China who are using that filter. How

are they going to do upgrade in the future?

I didn't said remove the filter, but rather deprecating it. It would
basically mean that users would get a LOG warning for the next 2 cycles
saying "this filter will be removed in the next future, please consider
using other ways".

Also, removing a filter from in-tree doesn't prevent you to ship it in a
distro since out-of-tree filters can easily be added using a
configuration flag.

2. Dependency should not be a reason to deprecate a module in

OpenStack, Nova is not a stand-alone module, and it depends on various
technologies and libraries.

You made me wrong. The dependency issue is one of the reasons I have
serious concerns with that filter, but that's not the only one. From my
perspective, like I said, the main compelling reason is that the filter
makes a promise it can't sustain (make sure that Nova fences my
compromised hosts)

Intel is setting up the third party CI for TCP/OAT in Liberty, which is to
address the concerns mentioned in the thread. And also, OAT is an open
source project which is being maintained as the long-term strategy.

Again, like I said, I have strong respect for OAT. My problem is not
with OAT but with TrustedFilter. I'm perfectly fine keeping OAT for
performing host checks.

For the situation that a host gets compromised, OAT checks trusted or
untrusted from the start point of boot/reboot, it is hard for OAT to detect
whether a host gets compromised when it is running, I don't know how to
detect that without the filter?
Back to Michael's question, the process of the verification is done by
software automatically when a host boots or reboots, will that be an
overhead for the admin to have a separate job?

I think the real question is : "who will trigger the detection and
when?". Given my original thread, I said it can't be Nova because Nova
is not designed that way. Please consider the link I gave about HA to
see how I feel it can be done (using Pacemaker or not)

Thanks.
--
Shane

-----Original Message-----
From: Michael Still [mailto:mikal@stillhq.com]
Sent: Wednesday, June 24, 2015 7:49 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)
I agree. I feel like this is another example of functionality which is trivially
implemented outside nova, and where it works much better if we don't do it.
Couldn't an admin just have a cron job which verifies hosts, and then adds
them to a compromised-hosts host aggregate if they're owned? I assume,
without having tested it, that you can migrate instances out of a host
aggregate you can't boot in?

Like I said, I see 3 steps there:
- periodically check the status of the hosts by calling OAT (this can be done
using a cron job, Nagios, or anything else, even Pacemaker or HAProxy)
=> that would replace TrustedFilter
- if a host is marked compromised, fence it => my proposal was
to disable the service, but yours is good too (move the host to a toxic
host aggregate, provided we use a filter that explicitly prevents
hosts belonging to that aggregate from being elected)
- potentially resurrect the instances that were running on that host;
here I proposed calling the evacuate API to rebuild the instances
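
The three steps above can be sketched as one pass of an external monitor.
This is only a minimal illustration under stated assumptions: the four
callables (is_trusted, disable_service, list_instances, evacuate) are
hypothetical hooks that an operator would wire to OAT and to the Nova API;
none of them is an existing Nova or OAT function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class FenceReport:
    """What one monitoring pass did, for logging or alerting."""
    fenced: List[str] = field(default_factory=list)
    evacuated: Dict[str, List[str]] = field(default_factory=dict)


def monitor_pass(
    hosts: Iterable[str],
    is_trusted: Callable[[str], bool],            # hypothetical: asks the attestation service
    disable_service: Callable[[str, str], None],  # hypothetical: fences the host (host, reason)
    list_instances: Callable[[str], List[str]],   # hypothetical: instances running on a host
    evacuate: Callable[[str], None],              # hypothetical: rebuilds one instance elsewhere
) -> FenceReport:
    report = FenceReport()
    for host in hosts:
        # Step 1: periodic check against the attestation service.
        if is_trusted(host):
            continue
        # Step 2: fence the host so it can no longer be elected.
        disable_service(host, "attestation failed")
        report.fenced.append(host)
        # Step 3: resurrect the instances that were running there.
        moved = []
        for instance in list_instances(host):
            evacuate(instance)
            moved.append(instance)
        report.evacuated[host] = moved
    return report
```

Running monitor_pass from a cron job (or any scheduler) gives the periodic
behaviour; fencing happens before evacuation so that the compromised host
cannot be picked as a destination in the meantime.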

HTH,
-Sylvain

Michael




____ OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--
Rackspace Australia



responded Jun 24, 2015 by Sylvain_Bauza (14,100 points)   1 3 5
0 votes

On Tue, Jun 23, 2015 at 3:41 AM, Sylvain Bauza sbauza@redhat.com wrote:

Hi team,

Some discussion occurred over IRC about a publicly open bug
related to TrustedFilter [1].
I want to take the opportunity to raise my concerns about that specific
filter, explain why I dislike it and how I think we could improve the
situation, and clarify everyone's thoughts.

The current situation is this: Nova checks whether a host is
compromised only when the scheduler is called, i.e. only when
booting/migrating/evacuating/unshelving an instance (well, not exactly all
the evacuate/live-migrate cases, but let's not discuss that now).
When the request reaches the scheduler, all the hosts are checked against
all the enabled filters, and the TrustedFilter makes an external HTTP(S)
call to the Attestation API service (not handled by Nova) for each host
to see if the host is valid (not compromised) or not.

To be clear, that's the only in-tree scheduler filter which explicitly
makes an external call to a separate service that Nova is not managing. I
can see at least 3 reasons why that's bad:

1 : that's a terrible bottleneck for performance, because we're
IO-blocking N times given N hosts (we're not even multiplexing the HTTP
requests)

2 : all the other filters check an internal Nova state for the host
(called HostState), but the TrustedFilter doesn't, which means that
conceptually we defer the decision to a 3rd-party engine

3 : that Attestation API service becomes a de facto dependency for Nova
(since it's an in-tree filter) while it's not listed as a dependency and
thus not gated.

All of these reasons could be acceptable if that would cover the exposed
usecase given in [1] (ie. I want to make sure that if my host gets
compromised, my instances will not be running on that host) but that just
doesn't work, due to the situation I mentioned above.

So, given that, here are my thoughts:
a/ if a host gets compromised, we can just disable its service to prevent
its election as a valid destination host. There is no need for a
specialised filter.
b/ if a host is compromised, we can assume that the instances have to
be resurrected elsewhere, i.e. we can call nova evacuate
c/ checking whether a host is compromised is not a Nova responsibility
since it's already perfectly done by [2]

In other words, I consider that "security" usecase analogous to
the HA usecase [3], where we need a 3rd-party tool responsible for
periodically checking the state of the hosts and, if one is compromised,
calling the Nova API to fence the host and evacuate the compromised instances.

Given that, I'm proposing to deprecate TrustedFilter and explicitly mention
that it will be dropped from in-tree in a later cycle:
https://review.openstack.org/194592

Given people are using this, it is a negligible maintenance burden. I
think deprecating with the intention of removing is not worth it.

Although it would be very useful to further document the risks with this
filter (live migration, possible performance issues etc.)

Thoughts ?
-Sylvain

[1] https://bugs.launchpad.net/nova/+bug/1456228
[2] https://github.com/OpenAttestation/OpenAttestation
[3] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/


responded Jun 24, 2015 by Joe_Gordon (24,620 points)   2 5 8
0 votes

On 24/06/2015 19:56, Joe Gordon wrote:

On Tue, Jun 23, 2015 at 3:41 AM, Sylvain Bauza <sbauza@redhat.com> wrote:

Hi team,

Some discussion occurred over IRC about a publicly open bug
related to TrustedFilter [1].
I want to take the opportunity to raise my concerns about that
specific filter, explain why I dislike it and how I think we could
improve the situation, and clarify everyone's thoughts.

The current situation is this: Nova checks whether a host
is compromised only when the scheduler is called, i.e. only when
booting/migrating/evacuating/unshelving an instance (well, not
exactly all the evacuate/live-migrate cases, but let's not discuss
that now). When the request reaches the scheduler, all the
hosts are checked against all the enabled filters, and the
TrustedFilter makes an external HTTP(S) call to the
Attestation API service (not handled by Nova) for *each host* to
see if the host is valid (not compromised) or not.

To be clear, that's the only in-tree scheduler filter which
explicitly makes an external call to a separate service that Nova
is not managing. I can see at least 3 reasons why that's bad:

#1 : that's a terrible bottleneck for performance, because we're
IO-blocking N times given N hosts (we're not even multiplexing the
HTTP requests)
#2 : all the other filters check an internal Nova state for the
host (called HostState), but the TrustedFilter doesn't, which means
that conceptually we defer the decision to a 3rd-party engine
#3 : that Attestation API service becomes a de facto dependency
for Nova (since it's an in-tree filter) while it's not listed as a
dependency and thus not gated.
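
To make point #1 concrete, here is a small sketch contrasting one blocking
call per host with overlapping calls. It does not reproduce the
TrustedFilter code: attest is a hypothetical stand-in for the HTTP(S)
request to the attestation service, and both function names are made up for
illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def attest_sequentially(
    hosts: Iterable[str], attest: Callable[[str], bool]
) -> Dict[str, bool]:
    # One blocking call after another: total time grows with N hosts.
    return {host: attest(host) for host in hosts}


def attest_concurrently(
    hosts: Iterable[str], attest: Callable[[str], bool], max_workers: int = 8
) -> Dict[str, bool]:
    # Overlap the blocking calls in a thread pool; results keep host order.
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = list(pool.map(attest, host_list))
    return dict(zip(host_list, verdicts))
```

Both functions return the same mapping; the second only shortens the wait.
It still issues one external request per host, so it does not address
points #2 and #3.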


All of these reasons could be acceptable if that would cover the
exposed usecase given in [1] (ie. I want to make sure that if my
host gets compromised, my instances will not be running on that
host) but that just doesn't work, due to the situation I mentioned
above.

So, given that, here are my thoughts:
a/ if a host gets compromised, we can just disable its service to
prevent its election as a valid destination host. There is no need
for a specialised filter.
b/ if a host is compromised, we can assume that the instances have
to be resurrected elsewhere, i.e. we can call nova evacuate
c/ checking whether a host is compromised is not a Nova
responsibility since it's already perfectly done by [2]

In other words, I consider that "security" usecase
analogous to the HA usecase [3], where we need a 3rd-party
tool responsible for periodically checking the state of the hosts
and, if one is compromised, calling the Nova API to fence the host and
evacuate the compromised instances.

Given that, I'm proposing to deprecate TrustedFilter and explicitly
mention that it will be dropped from in-tree in a later cycle:
https://review.openstack.org/194592

Given people are using this, it is a negligible maintenance burden. I
think deprecating with the intention of removing is not worth it.

Although it would be very useful to further document the risks with
this filter (live migration, possible performance issues etc.)

Well, I can understand that customers might not agree to removing
the filter because there is no clear alternative for them. That said, I
think that marking the filter as deprecated, without saying when it will
be removed, would encourage some contributors to think about the problem
and work on a better solution, exactly like we did for the EC2 API.

To be clear, I want to freeze the filter by deprecating it and
explaining why it's wrong (by amending the devref section and emitting a
LOG warning saying it's deprecated) and then leave the filter
in-tree until we are sure that there is a good solution outside of Nova.
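
As a rough idea of what the LOG warning could look like, here is a minimal
sketch using the standard logging module. The class below is a hypothetical
stand-in, not Nova's TrustedFilter, and the message wording is only an
assumption.

```python
import logging

LOG = logging.getLogger(__name__)


class DeprecatedTrustedFilter:
    """Hypothetical stand-in for a scheduler filter; not Nova's class."""

    _warned = False  # class-level flag so the warning is logged only once

    def __init__(self):
        if not DeprecatedTrustedFilter._warned:
            LOG.warning(
                "TrustedFilter is deprecated: it only checks hosts at "
                "scheduling time and cannot fence a host that gets "
                "compromised later. Please consider other ways."
            )
            DeprecatedTrustedFilter._warned = True
```

Logging once per process keeps the message visible to operators without
flooding the scheduler log on every request.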

-Sylvain

Thoughts ?
-Sylvain



[1] https://bugs.launchpad.net/nova/+bug/1456228
[2] https://github.com/OpenAttestation/OpenAttestation
[3] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/


responded Jun 25, 2015 by Sylvain_Bauza (14,100 points)   1 3 5
0 votes

On 24 June 2015 at 09:35, Dulko, Michal michal.dulko@intel.com wrote:

-----Original Message-----
From: Sylvain Bauza [mailto:sbauza@redhat.com]
Sent: Wednesday, June 24, 2015 9:39 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)

(general point: could we please try not to top-post? It makes it a little
harder to follow the conversation)

Replies inline.

On 24/06/2015 08:15, Wei, Gang wrote:

Only if all the hosts managed by OpenStack are capable of a measured boot
process might letting a 3rd-party tool call the nova fencing API be better
than using TrustedFilter.

But if not all the hosts support measured boot, then with TrustedFilter we
can schedule VMs only to measured and trusted hosts, whereas in the 3rd-party
tool case only untrusted/compromised hosts will be fenced: a host with
unknown trustworthiness will still be able to run a VM even though the owner
is not willing to run it there.

You don't need a specific filter to fence a host from being scheduled.
Just calling the Nova os-services API to explicitly disable the service (and
providing a reason) makes the hosts belonging to that service ineligible
for election (thanks to the ComputeFilter).
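
The effect described above can be modelled in a few lines. This is a
simplified illustration, not Nova code: ComputeService, disable and
eligible_hosts are hypothetical names, and the real ComputeFilter may look
at more than the disabled flag.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ComputeService:
    """Toy model of a compute service record."""
    host: str
    disabled: bool = False
    disabled_reason: Optional[str] = None


def disable(service: ComputeService, reason: str) -> None:
    # Models "disable the service and provide a reason".
    service.disabled = True
    service.disabled_reason = reason


def eligible_hosts(services: Iterable[ComputeService]) -> List[str]:
    # Models the scheduler side: disabled services are never elected.
    return [s.host for s in services if not s.disabled]
```

For example, after disable(services[1], "attestation failed"), the host of
services[1] no longer appears in eligible_hosts(services), and no
trust-specific filter is involved.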

To be clear, I would love to see the logic inverted, i.e. something which
calls the OAT service for a specific host and then fires a service disable.

So I would suggest using the 3rd-party tools as an enhancement to
supplement our TCP/TrustedFilter feature. The 3rd-party tools can also
call the attestation API for host attestation.

I don't see much benefit in keeping such a filter, for the reasons I mentioned
below. Again, if you want to fence a host, you can just disable its service;
that's enough.

This won't address the case in which you have a heterogeneous environment and you want only some important VMs to run on trusted hosts (and you don't care about the rest of the VMs).

This is an interesting one to dig into.

I had assumed that in this case you put all the VMs that want the
attestation check on a subset of nodes that are set up for it.
You can do that using host aggregates and our existing filters.

An external system could then just disable hosts within that subset of
hosts that have the attestation check working.

Does that work for your use case?
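
A rough sketch of that setup, assuming a simplified model of aggregates and
fencing. The Aggregate class, the "trusted" metadata key and
candidate_hosts are all hypothetical names used for illustration; this is
not how Nova's aggregate filters are implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Aggregate:
    """Toy model of a host aggregate with metadata."""
    name: str
    hosts: Set[str]
    metadata: Dict[str, str] = field(default_factory=dict)


def candidate_hosts(
    all_hosts: Iterable[str],
    aggregates: Iterable[Aggregate],
    disabled: Set[str],
    required: Dict[str, str],
) -> List[str]:
    """Return hosts that match the required metadata and are not fenced."""
    aggregates = list(aggregates)
    candidates = []
    for host in all_hosts:
        if host in disabled:
            continue  # fenced by the external system
        # Merge the metadata of every aggregate the host belongs to.
        host_meta: Dict[str, str] = {}
        for agg in aggregates:
            if host in agg.hosts:
                host_meta.update(agg.metadata)
        if all(host_meta.get(k) == v for k, v in required.items()):
            candidates.append(host)
    return candidates
```

VMs that ask for {"trusted": "true"} only land on enabled hosts of the
attested aggregate, while VMs with no requirement can land on any enabled
host.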

Thanks,
John

-----Original Message-----
From: Bhandaru, Malini K [mailto:malini.k.bhandaru@intel.com]
Sent: Wednesday, June 24, 2015 1:13 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)

Would like to add to Shane's points below.

1) The Trust filter can be treated as an API with different underlying
implementations. Its default could even be "Not Implemented" and always
return false.
nova.conf could then specify the OAT trust implementation. This
would not break present-day users of the functionality.
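
One way to picture this suggestion is a small registry of trust backends
selected by name, with a default that always answers false. Everything
below is a hypothetical sketch: the class names, the registry and the
backend names are assumptions, and StaticBackend merely stands in for a
real client talking to an attestation service such as OAT.

```python
from typing import Dict, Iterable, Type


class TrustBackend:
    """Interface: report whether a host is trusted."""

    def is_trusted(self, host: str) -> bool:
        raise NotImplementedError


class NotImplementedBackend(TrustBackend):
    """Default backend: nothing is configured, so no host is trusted."""

    def is_trusted(self, host: str) -> bool:
        return False


class StaticBackend(TrustBackend):
    """Stand-in for a real attestation client; trusts a fixed set of hosts."""

    def __init__(self, trusted_hosts: Iterable[str] = ()):
        self._trusted = set(trusted_hosts)

    def is_trusted(self, host: str) -> bool:
        return host in self._trusted


# Hypothetical registry; the name would come from a configuration option.
BACKENDS: Dict[str, Type[TrustBackend]] = {
    "not_implemented": NotImplementedBackend,
    "static": StaticBackend,
}


def load_backend(name: str = "not_implemented", **kwargs) -> TrustBackend:
    return BACKENDS[name](**kwargs)
```

With no configuration, load_backend().is_trusted("c1") is False, which
matches the "always return false" default described above.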

Don't get me wrong, I'm not against OAT, I'm just saying that the
TrustedFilter design is wrong. Even if another alternative would come up to
serve the TrustedComputePool model of things, it would still be bad for the
reasons I mentioned below, and wouldn't cover the usecase I quoted.

2) The issue in the original bug is a VM waking up after a reboot on a host
without it having been determined beforehand whether the host is still
trustworthy.
This is essentially asking for a feature to check that all constraints
requested by a VM at launch are confirmed to still hold when it re-awakens,
even if it is not going through the Nova scheduler at this point.

So I think we are in agreement that covering that usecase can't be
done at the scheduler level.
Using TrustedFilter just ensures that the host is checked at instance
creation time, but it confuses people because they think it will be enforced
for the whole instance lifecycle.

This holds even for aggregates that might be specified by geo, or even by
reservation such as "Coke" or "Pepsi".
What if a host was reassigned from Coke to Pepsi, even without a reboot and
certainly before one? There is cross-contamination.
Perhaps we need Nova hooks that can be registered with functions that
check expected aggregate values.

I honestly don't see the point of a host aggregate. Given that the failure
domain is a host, you only need to trust that host or not. Whether the host
belongs to an aggregate or not is orthogonal to our problem IMHO.

Better still, have libvirt functionality that makes a callback for each VM
on a host to ensure its constraints are satisfied on start-up/boot, and
re-start when it comes out of pause.

Hmm, doesn't it sound weird to have the host be the source of truth?
Also, if a host gets compromised, why couldn't we assume that the
instances can be compromised too and need to be resurrected (i.e.
evacuated)?

Using an aggregate for trust with a cron job to check for trust is inefficient
in this case: trust status gets updated only on a host reboot. Intel TXT is a
boot-time authentication.

Isn't that a specific implementation of OAT? Couldn't we assume some
alternative implementations able to do live checks? I mean, however
you trigger a host check (at boot time or periodically), you can
then fire an alarm which would trigger the necessary remediation actions:
fence the host and evacuate the instances.

Regards
Malini

-----Original Message-----
From: Wang, Shane [mailto:shane.wang@intel.com]
Sent: Tuesday, June 23, 2015 9:26 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)

AFAIK, TrustedFilter uses a sort of cache for the trusted state,
which is designed to solve the performance issue mentioned here.

Fair point, even if it can be a security flaw, because we know that
a cache can hold stale data.
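
The staleness concern can be shown with a toy time-based cache. This is an
assumption-laden sketch: the thread does not describe how the TrustedFilter
cache actually works, and TrustCache, attest and ttl_seconds are
hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class TrustCache:
    """Caches attestation verdicts for a fixed time-to-live."""

    def __init__(
        self,
        attest: Callable[[str], bool],  # hypothetical call to the attestation service
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._attest = attest
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def is_trusted(self, host: str) -> bool:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[1] < self._ttl:
            # Served from cache: may be stale if the host changed state.
            return cached[0]
        verdict = self._attest(host)
        self._entries[host] = (verdict, now)
        return verdict
```

A host that becomes untrusted right after being cached keeps passing until
its entry expires, so the TTL trades fewer external calls for a window in
which the answer can be wrong.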

My thoughts on deprecating it are:

1. We already have customers here in China who are using that filter. How
are they going to upgrade in the future?

I didn't say to remove the filter, but rather to deprecate it. It would
basically mean that users would get a LOG warning for the next 2 cycles
saying "this filter will be removed in the near future, please consider
using other ways".

Also, removing a filter from in-tree doesn't prevent you from shipping it in a
distro, since out-of-tree filters can easily be added using a
configuration flag.

2. Dependency should not be a reason to deprecate a module in
OpenStack. Nova is not a stand-alone module, and it depends on various
technologies and libraries.

You misunderstood me. The dependency issue is one of the reasons I have
serious concerns with that filter, but it's not the only one. From my
perspective, like I said, the most compelling reason is that the filter
makes a promise it can't keep (making sure that Nova fences my
compromised hosts).

Intel is setting up the third party CI for TCP/OAT in Liberty, which is to
address the concerns mentioned in the thread. And also, OAT is an open
source project which is being maintained as the long-term strategy.

Again, like I said, I have strong respect for OAT. My problem is not
with OAT but with TrustedFilter. I'm perfectly fine keeping OAT for
performing host checks.

Regarding the situation where a host gets compromised: OAT checks whether a
host is trusted or untrusted at boot/reboot time, so it is hard for OAT to
detect that a host has been compromised while it is running. I don't know how
to detect that without the filter.
Back to Michael's question: the verification is done automatically by
software when a host boots or reboots, so wouldn't a separate job be
extra overhead for the admin?

I think the real question is: "who will trigger the detection, and
when?". As I said in my original message, it can't be Nova because Nova
is not designed that way. Please see the link I gave about HA for
how I think it can be done (using Pacemaker or not).

Thanks.
--
Shane

-----Original Message-----
From: Michael Still [mailto:mikal@stillhq.com]
Sent: Wednesday, June 24, 2015 7:49 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)

I agree. I feel like this is another example of functionality which is trivially
implemented outside nova, and where it works much better if we don't do it.
Couldn't an admin just have a cron job which verifies hosts, and then adds
them to a compromised-hosts host aggregate if they're owned? I assume,
without having tested it, that you can migrate instances out of a host
aggregate you can't boot in?

Like I said, I see 3 steps there:
- periodically check the status of the hosts by calling OAT (this can be done
using a cron job, Nagios, or anything else, even Pacemaker or HAProxy)
=> that would replace TrustedFilter
- if a host is marked compromised, fence it => my proposal was
to disable the service, but yours is good too (move the host to a toxic
host aggregate, provided we use a filter that explicitly prevents
hosts belonging to that aggregate from being elected)
- potentially resurrect the instances that were running on that host;
here I proposed calling the evacuate API to rebuild the instances

HTH,
-Sylvain

Michael





--
Rackspace Australia



responded Jun 25, 2015 by John_Garbutt (15,460 points)   3 4 5
...