02-05-2018, 02:07 AM

Apologies in advance; this really doesn't have much to do with HP calculators,

except that I'd love to develop an approximation model that would run on one..

I'm hoping for some expertise from someone smarter than me..

I'm trying to model a system that involves about 2400 hosts in a sharded

database of sorts. Users enter queries, they get distributed to the 2400

systems simultaneously and when all have finished, the user gets the results.

The overall system has an SLA (Service Level Agreement) that the queries

finish in a certain length of time, 90% of the time.

Obviously, the slowest of the 2400 systems determines how long any given

search takes to complete.

A naive approach that assumes all queries on the backend are independent

would mean we are looking for the probability that "at least one of" the

backend systems is over SLA. To find that, of course, we'd use the

complement: the probability that every query on the back end is within

SLA. If the hosts were all equal and independent, it might look

something like:

Probability of Search being within SLA == 0.99995^2400
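For what it's worth, that naive figure is easy to sanity-check (even on a calculator with y^x). Here it is in Python for a few per-host rates:

```python
# Naive independence model: the search is within SLA only if
# every one of the 2400 hosts is, each with probability p.
n_hosts = 2400

for p in (0.99995, 0.9999, 0.999):
    print(f"per-host {p}: search within SLA {p ** n_hosts:.4f}")
```

Even at 0.99995 per host, the overall result is only about 0.887, i.e. under a 90% user-facing SLA already.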

Unfortunately, the queries on the backend aren't totally independent.

Sometimes a miss is isolated, such as a query hitting a system that is

busy doing something else. Other times, though, performance rests with the

query itself (complexity, number of terms, etc.) and will impact a number

of backend systems at once.

Nevertheless, there is a long tail of systems that are always within SLA.

I'm wondering if there is a relatively simple way to model the behaviour

such that we can "play with the numbers" and get approximate results.

E.g. what percentage of the systems would we have to "fix", and by how

much, to maintain the user experience, etc.
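One simple way to play with the numbers, purely as a sketch: split the misses into a correlated "hard query" component that hits every host at once, plus independent per-host miss rates. All the parameters below are made-up knobs for experimentation, not measurements from any real system:

```python
def p_search_within_sla(p_hard_query, host_miss_rates):
    """Probability a search is within SLA, assuming it succeeds only
    if the query isn't intrinsically 'hard' (a correlated miss that
    slows every host) AND every host independently stays within SLA."""
    p_all_hosts = 1.0
    for p in host_miss_rates:
        p_all_hosts *= 1.0 - p
    return (1.0 - p_hard_query) * p_all_hosts

# Made-up example: 8% of queries are hard (slow everywhere), and each
# of the 2400 hosts independently misses 0.001% of the remaining ones.
hosts = [1e-5] * 2400
print(p_search_within_sla(0.08, hosts))   # roughly 0.90

# "What if we fix things?" -- halve the hard-query rate:
print(p_search_within_sla(0.04, hosts))   # roughly 0.94
```

Editing the `hosts` list (e.g. giving 100 hosts a 1% miss rate and the rest tiny ones) answers the "what percentage would we have to fix" question directly, since the independent part is just the product of the per-host terms.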

As a sample, the worst systems are within SLA 95% of the time, while the

best are within SLA 100% of the time. The user experience, however, is

closer to 90%, which is natural of course, as the probabilities accumulate

across all the hosts and end up worse than even the worst single one..
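Incidentally, those sample numbers already hint that the misses overlap. Assuming full independence, a worst host at 95% and a user experience of 90% would require the other 2399 hosts to be extraordinarily reliable on average:

```python
# Back-of-envelope check under full independence: if the worst host
# is within SLA 95% of the time and the user sees 90%, how reliable
# must the *other* 2399 hosts be, on average?
p_user = 0.90
p_worst = 0.95

p_rest = p_user / p_worst            # combined contribution of the rest
per_host = p_rest ** (1 / 2399)      # geometric-mean per-host probability

print(f"combined rest: {p_rest:.4f}, per remaining host: {per_host:.6f}")
```

That works out to each remaining host being within SLA about 99.998% of the time, which fits the long tail of always-good systems described above; any misses beyond that must largely be shared (correlated) events rather than extra independent per-host failures.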

Thoughts? Suggestions? Pointers? I'm stuck..

Thanks!
