ATTEMPT ALL QUESTIONS Compare and contrast Pessimistic and Optimistic Concurrency Control techniques. &

ATTEMPT ALL QUESTIONS

1. Compare and contrast Pessimistic and Optimistic Concurrency Control techniques. 5 marks
2. Describe the quorum-based protocol for distributed concurrency control. 5 marks
3. Discuss how distributed systems are used in organizations. 10 marks
4. Discuss the techniques used to facilitate distributed query processing and optimization. 10 marks

Answer 1

a.

Answer :

Optimistic concurrency control assumes that conflicts are rare, so it lets transactions proceed without locking and deals with a conflict only if one actually occurs. When a conflict is detected (typically at update or commit time), the application reacts to it in some manner, for example by retrying or by reporting the conflict to the user. Entity Framework supports this approach: you can add a column of rowversion type (timestamp in older SQL Server versions) to a database table and handle the concurrency exceptions raised on conflicting updates.

Pessimistic concurrency control is a ‘seatbelt in your car’ approach: it assumes that concurrency conflicts will happen, and happen often. It locks a database record for update access, so other users can only read the record or must wait until it is unlocked. An application built on pessimistic concurrency can be more complicated to write and manage because of the risk of deadlocks.
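The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular database implements it: the `Row` class, its `version` counter (standing in for a rowversion column), and both update functions are hypothetical names introduced here.

```python
import threading


class ConflictError(Exception):
    """Raised when an optimistic update finds the row changed since it was read."""


class Row:
    """Hypothetical in-memory record with a version counter (like a rowversion column)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()  # used only by the pessimistic path


def optimistic_update(row, read_version, new_value):
    # No lock is held between read and write; the version is validated at write time.
    # Assumption: a real database would do this check and write as one atomic step.
    if row.version != read_version:
        raise ConflictError("row changed since it was read")
    row.value = new_value
    row.version += 1


def pessimistic_update(row, compute_new_value):
    # Take the lock first; other writers wait until it is released.
    with row.lock:
        row.value = compute_new_value(row.value)
        row.version += 1


row = Row(100)

# Two users read the same version, then both try to write.
seen_by_a = row.version
seen_by_b = row.version
optimistic_update(row, seen_by_a, 90)      # first writer succeeds
try:
    optimistic_update(row, seen_by_b, 80)  # second writer holds a stale version
    conflict_detected = False
except ConflictError:
    conflict_detected = True

pessimistic_update(row, lambda v: v - 10)  # serialized by the lock
```

In the optimistic path the second writer learns about the conflict only when it tries to write; in the pessimistic path a conflicting writer would simply block on the lock.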

b.

Answer:

In the case of replicated databases, a quorum-based replica control protocol can be used to ensure that no two copies of a data item are read or written by two transactions concurrently. ... Each operation then has to obtain a read quorum (V_r) or a write quorum (V_w) to read or write a data item, respectively. These quorums are written Q_r and Q_w in the rest of this answer.

This is a concurrency control protocol for distributed database systems that is based on a distributed lock manager. It works as follows:

1. The protocol assigns a weight to each site that holds a replica.

2. For any data item, the protocol assigns a read quorum Q_r and a write quorum Q_w. Q_r and Q_w are integers (each a sum of the weights of some sites), chosen so that both of the following conditions hold:

Q_r + Q_w > S – this rule avoids read-write conflicts (i.e., two transactions cannot read and write the data item concurrently).

2 * Q_w > S – this rule avoids write-write conflicts (i.e., two transactions cannot write the data item concurrently).

Here, S is the total weight of all sites at which the data item is replicated.

How do we perform read and write on replicas?

A transaction that needs to read a data item has to lock enough sites: the sum of the weights of the locked sites must be >= Q_r. Because Q_r + Q_w > S, every read quorum intersects every write quorum.

A transaction that needs to write a data item has to lock enough sites: the sum of the weights of the locked sites must be >= Q_w.
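
The intersection property follows directly from the two conditions. As a short sketch, let w(X) denote the total weight of a set of sites X (notation introduced here for illustration), let R be any set of sites forming a read quorum, and let W, W_1, W_2 be sets forming write quorums. If R and W had no site in common, their combined weight could not exceed S:

$$R \cap W = \emptyset \;\Rightarrow\; w(R) + w(W) \le S, \quad \text{but} \quad w(R) + w(W) \ge Q_r + Q_w > S$$

This is a contradiction, so R and W must share at least one site. The same argument applies to two write quorums:

$$W_1 \cap W_2 = \emptyset \;\Rightarrow\; w(W_1) + w(W_2) \le S, \quad \text{but} \quad w(W_1) + w(W_2) \ge 2 Q_w > S$$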

Example:

(How does Quorum Consensus Protocol work?)

Let us assume a fully replicated distributed database with four sites S1, S2, S3, and S4.

1. According to the protocol, we need to assign a weight to every site. (This weight can be chosen on many factors like the availability of the site, latency etc.). For simplicity, let us assume the weight as 1 for all sites.

2. Let us choose Q_r = 2 and Q_w = 3. The total weight S is 4, and these values satisfy both conditions:

Q_r + Q_w > S => 2 + 3 > 4 True

2 * Q_w > S => 2 * 3 > 4 True

3. Now, a transaction which needs a read lock on a data item has to lock 2 sites. A transaction which needs a write lock on data item has to lock 3 sites.
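
The example above can be checked with a small Python sketch. The function names `valid_quorums` and `has_quorum` are hypothetical helpers introduced here; the weights and quorum values are the ones from the example.

```python
def valid_quorums(weights, q_r, q_w):
    """Check the two quorum conditions for a dict mapping site -> weight."""
    total = sum(weights.values())  # this is S
    return q_r + q_w > total and 2 * q_w > total


def has_quorum(weights, locked_sites, quorum):
    """True if the locked sites carry at least `quorum` total weight."""
    return sum(weights[site] for site in locked_sites) >= quorum


# Four fully replicated sites, each with weight 1, so S = 4.
weights = {"S1": 1, "S2": 1, "S3": 1, "S4": 1}
Q_R, Q_W = 2, 3

config_ok = valid_quorums(weights, Q_R, Q_W)               # 2 + 3 > 4 and 2 * 3 > 4
read_ok = has_quorum(weights, {"S1", "S2"}, Q_R)           # two sites suffice for a read
write_ok = has_quorum(weights, {"S2", "S3", "S4"}, Q_W)    # three sites suffice for a write
write_too_small = has_quorum(weights, {"S1", "S2"}, Q_W)   # two sites are not enough to write
bad_config = valid_quorums(weights, 2, 2)                  # 2 + 2 > 4 fails
```

Note that the read set {S1, S2} and the write set {S2, S3, S4} overlap at S2, which is what the condition Q_r + Q_w > S guarantees.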

c.

Answer :

Definition

A distributed system, also known as distributed computing, is a system with multiple components located on different machines that communicate and coordinate actions in order to appear as a single coherent system to the end-user.

Overview

The machines that are a part of a distributed system may be computers, physical servers, virtual machines, containers, or any other node that can connect to the network, have local memory, and communicate by passing messages.

There are two general ways that distributed systems function:

Each machine works toward a common goal and the end-user views results as one cohesive unit.
Each machine has its own end-user and the distributed system facilitates sharing resources or communication services.

Although distributed systems vary widely and can be hard to pin down, they usually have three primary characteristics: all components run concurrently, there is no global clock, and all components fail independently of each other.

Benefits and challenges of distributed systems

There are three reasons that teams generally decide to implement distributed systems:

Horizontal Scalability—Since computing happens independently on each node, it is easy and generally inexpensive to add additional nodes and functionality as necessary.
Reliability—Most distributed systems are fault-tolerant as they can be made up of hundreds of nodes that work together. The system generally doesn’t experience any disruptions if a single machine fails.
Performance: Distributed systems can be efficient because workloads can be broken up and sent to multiple machines that process them in parallel.

However, distributed systems are not without challenges. Complex architectural design, construction, and debugging processes that are required to create an effective distributed system can be overwhelming.

Three more challenges you may encounter include:

Scheduling—A distributed system has to decide which jobs need to run, when they should run, and where they should run. Schedulers ultimately have limitations, leading to underutilized hardware and unpredictable runtimes.
Latency—The more widely your system is distributed, the more latency you can experience with communications. This often leads to teams making tradeoffs between availability, consistency, and latency.
Observability—Gathering, processing, presenting, and monitoring hardware usage metrics for large clusters is a significant challenge.

How a Distributed System Works

A distributed system relies on both its hardware and its software architecture. Everything must be interconnected: CPUs via the network, and processes via the communication system.

Types of distributed systems

Distributed systems generally fall into one of four different basic architecture models:

Client-server—Clients contact the server for data, then format it and display it to the end-user. The end-user can also make a change from the client-side and commit it back to the server to make it permanent.
Three-tier—Information about the client is stored in a middle tier rather than on the client to simplify application deployment. This architecture model is most common for web applications.
n-tier—Generally used when an application or server needs to forward requests to additional enterprise services on the network.
Peer-to-peer—There are no additional machines used to provide services or manage resources. Responsibilities are uniformly distributed among machines in the system, known as peers, which can serve as either client or server.

Example of a Distributed System

Distributed systems have endless use cases, a few being electronic banking systems, massive multiplayer online games, and sensor networks.

StackPath, for example, uses a particularly large distributed system to power its content delivery network service. Each of its points of presence (PoPs) has nodes that form a worldwide distributed system. To deliver content quickly, StackPath stores the most recently and frequently requested content in the edge locations closest to where it is being used.

Distributed systems at the edge

With edge compute services such as StackPath’s virtual machines and containers, users can create their own distributed systems. Interconnecting VMs and containers at edge locations lets a system handle thousands of simultaneous requests with low latency.

The key features of a distributed system are:

Components in the system are concurrent. A distributed system allows resources, including software, to be shared by systems connected to the network at the same time.
There can be multiple components, but they will generally be autonomous in nature.
A global clock is not required in a distributed system. The systems can be spread across different geographies.
Compared to other network models, there is greater fault tolerance in a distributed model.
Price/performance ratio is much better.

Challenges for distributed systems include:

Security is a big challenge in a distributed environment, especially when using public networks.
Fault tolerance could be tough when the distributed model is built based on unreliable components.
Coordination and resource sharing can be difficult if proper protocols or policies are not in place.
Process knowledge should be put in place for the administrators and users of the distributed model.

Advantages of a Distributed System

There are a number of potential advantages to using a distributed system. One of the easiest to understand is redundancy and resiliency. If a company is serving its website from a distributed set of servers, rather than a single server, it may be able to stay up even if one server physically fails. If data is distributed between multiple servers or disks, a common occurrence in modern distributed systems, there may not be any data loss even if a storage device ceases to work.

Speed and Content Distribution

Distributed systems can also be faster than single-computer systems. One of the advantages of a distributed database is that queries can be routed to a server with a particular user's information, rather than all requests having to go to a single machine that can be overloaded.

Requests can also be routed to servers physically close or on a speedy network connection to whoever wants the data, which can mean less time and other resources allocated to dealing with network traffic and bottlenecks. That's a common occurrence in content distribution networks used for online media.
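
Routing a request to the server that holds a particular user's data can be sketched as follows. This is a minimal illustration that assumes data is partitioned by a hash of the user id; the server names and the `server_for` function are hypothetical.

```python
import hashlib

SERVERS = ["db-0", "db-1", "db-2"]  # hypothetical server names


def server_for(user_id, servers=SERVERS):
    """Map a user id to one server using a stable hash of the id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# The same user is always routed to the same server.
first = server_for("alice")
second = server_for("alice")
```

A plain modulo mapping like this reassigns many users when the number of servers changes, so it is only a starting point for showing the idea.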

Scaling and Parallelism

Once distributed systems are set up to distribute data among the servers involved, they can also be easily scalable. If they're well designed, it can be as simple as adding some new hardware and telling the network to add it to the distributed system.

Distributed systems can also be designed for parallelism. This is common in mathematical operations for things like weather modeling and scientific computing, where multiple powerful processors can divide up independent parts of complex simulations and get the answer faster than they would running them in series.
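
The idea of dividing up independent parts of a computation can be sketched on a single machine. In this illustration, worker threads stand in for separate nodes, and `simulate_cell` is a hypothetical placeholder for one independent piece of a simulation.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_cell(cell):
    """Stand-in for an independent piece of a simulation (hypothetical)."""
    return cell * cell


def run_parallel(cells, workers=4):
    # Each cell is independent, so the pieces can be handed to separate workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(simulate_cell, cells))


results = run_parallel(range(8))
```

In a real distributed setting the workers would be separate processes or machines, and the cost of sending inputs and collecting results would have to be weighed against the time saved.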

Distributed Computing Challenges

One big challenge with distributed computing is that it can be hard for programmers to reason about. There can be challenges in how to distribute data to ensure that resiliency requirements are met under various unexpected conditions.

If devices need to synchronize, there can be difficult-to-spot bugs that cause them to wait on each other to transmit data or accidentally try to read or write the same piece of data at the same time, causing errors.

Security and privacy can also become an issue with distributed systems, since people's data is stored across multiple computers, sometimes in multiple physical locations. Distributed systems can also be overkill for some tasks, using more physical resources and engineering time than is necessary.

d.

Answer :

Distributed Query Processing Architecture

In a distributed database system, processing a query involves optimization at both the global and the local level. The query enters the database system at the client or controlling site. Here, the user is validated, and the query is checked, translated, and optimized at a global level.

The process of mapping global queries to local ones can be realized as follows −

The tables required in a global query have fragments distributed across multiple sites. The local databases have information only about local data. The controlling site uses the global data dictionary to gather information about the distribution and reconstructs the global view from the fragments.
If there is no replication, the global optimizer runs local queries at the sites where the fragments are stored. If there is replication, the global optimizer selects the site based upon communication cost, workload, and server speed.
The global optimizer generates a distributed execution plan so that the least amount of data is transferred across the sites. The plan states the location of the fragments, the order in which query steps need to be executed, and the processes involved in transferring intermediate results.
The local queries are optimized by the local database servers. Finally, the local query results are merged together through union operation in case of horizontal fragments and join operation for vertical fragments.

For example, let us consider that the following Project schema is horizontally fragmented according to City, the cities being New Delhi, Kolkata and Hyderabad.

PROJECT (PId, City, Department, Status)

Suppose there is a query to retrieve details of all projects whose status is “Ongoing”.

The global query will be −

$$\sigma_{status = \text{"ongoing"}}(PROJECT)$$

Query in New Delhi’s server will be −

$$\sigma_{status = \text{"ongoing"}}(NewD\_PROJECT)$$

Query in Kolkata’s server will be −

$$\sigma_{status = \text{"ongoing"}}(Kol\_PROJECT)$$

Query in Hyderabad’s server will be −

$$\sigma_{status = \text{"ongoing"}}(Hyd\_PROJECT)$$

In order to get the overall result, we need to union the results of the three queries as follows −

$$\sigma_{status = \text{"ongoing"}}(NewD\_PROJECT) \cup \sigma_{status = \text{"ongoing"}}(Kol\_PROJECT) \cup \sigma_{status = \text{"ongoing"}}(Hyd\_PROJECT)$$
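
The same decomposition can be sketched in Python, with each city's fragment held as a list of rows. The sample rows and the function names `local_select` and `global_select` are made up for illustration; only the schema and the three cities come from the example.

```python
# Hypothetical fragments of PROJECT, one list of rows per city server.
fragments = {
    "New Delhi": [
        {"PId": 1, "City": "New Delhi", "Department": "IT", "Status": "Ongoing"},
        {"PId": 2, "City": "New Delhi", "Department": "HR", "Status": "Completed"},
    ],
    "Kolkata": [
        {"PId": 3, "City": "Kolkata", "Department": "IT", "Status": "Ongoing"},
    ],
    "Hyderabad": [
        {"PId": 4, "City": "Hyderabad", "Department": "Sales", "Status": "Completed"},
    ],
}


def local_select(rows, status):
    """Local query: selection on Status, run at the site holding the fragment."""
    return [row for row in rows if row["Status"] == status]


def global_select(all_fragments, status):
    """Global query: union of the local results (fragments are disjoint by City)."""
    result = []
    for rows in all_fragments.values():
        result.extend(local_select(rows, status))
    return result


ongoing = global_select(fragments, "Ongoing")
ongoing_ids = sorted(row["PId"] for row in ongoing)
```

Only the rows that pass the local selection leave each site, which is why the selection is pushed down to the fragments before the union.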

Distributed Query Optimization

Distributed query optimization requires evaluation of a large number of query trees, each of which produces the required results of a query. This is primarily due to the presence of large amounts of replicated and fragmented data. Because examining every tree is too expensive, the target is to find a good, near-optimal solution rather than the single best one.

The main issues for distributed query optimization are −

Optimal utilization of resources in the distributed system.
Query trading.
Reduction of solution space of the query.

Optimal Utilization of Resources in the Distributed System

A distributed system has a number of database servers in the various sites to perform the operations pertaining to a query. Following are the approaches for optimal resource utilization −

Operation Shipping − In operation shipping, the operation is run at the site where the data is stored and not at the client site. The results are then transferred to the client site. This is appropriate for operations where the operands are available at the same site. Example: Select and Project operations.

Data Shipping − In data shipping, the data fragments are transferred to the database server, where the operations are executed. This is used in operations where the operands are distributed at different sites. This is also appropriate in systems where the communication costs are low, and local processors are much slower than the client server.

Hybrid Shipping − This is a combination of data and operation shipping. Here, data fragments are transferred to the high-speed processors, where the operation runs. The results are then sent to the client site.

Query Trading

In query trading algorithm for distributed database systems, the controlling/client site for a distributed query is called the buyer and the sites where the local queries execute are called sellers. The buyer formulates a number of alternatives for choosing sellers and for reconstructing the global results. The target of the buyer is to achieve the optimal cost.

The algorithm starts with the buyer assigning sub-queries to the seller sites. The optimal plan is created from local optimized query plans proposed by the sellers combined with the communication cost for reconstructing the final result. Once the global optimal plan is formulated, the query is executed.
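
The buyer's choice can be sketched as picking, for each sub-query, the seller offer with the lowest combined cost. This is a deliberately simplified one-round version: the `choose_sellers` function, the offer format, and the cost figures are assumptions made for illustration, and any negotiation between buyer and sellers is left out.

```python
def choose_sellers(offers):
    """For each sub-query, pick the seller whose offer has the lowest total cost.

    `offers` maps sub-query name -> list of (seller, local_cost, transfer_cost).
    """
    plan = {}
    for subquery, bids in offers.items():
        # Total cost of a bid = local execution cost + cost of shipping the result.
        seller, local_cost, transfer_cost = min(bids, key=lambda bid: bid[1] + bid[2])
        plan[subquery] = (seller, local_cost + transfer_cost)
    return plan


# Hypothetical offers from seller sites for two sub-queries.
offers = {
    "q1": [("site_A", 5, 4), ("site_B", 6, 1)],
    "q2": [("site_A", 3, 2), ("site_C", 2, 6)],
}
plan = choose_sellers(offers)
total_cost = sum(cost for _, cost in plan.values())
```

Here site_B wins q1 despite a higher local cost because its communication cost is lower, which shows why both components have to be considered together.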

Reduction of Solution Space of the Query

Finding a good solution generally involves reducing the solution space so that the cost of the query and of data transfer is reduced. This can be achieved through a set of heuristic rules, similar to the heuristics used in centralized systems.

Following are some of the rules −

Perform selection and projection operations as early as possible. This reduces the data flow over the communication network.
Simplify operations on horizontal fragments by eliminating selection conditions which are not relevant to a particular site.
In case of join and union operations involving fragments located at multiple sites, transfer the fragmented data to the site where most of the data is present and perform the operation there.
Use a semi-join operation to qualify the tuples that are to be joined. This reduces the amount of data transfer, which in turn reduces communication cost.
Merge the common leaves and sub-trees in a distributed query tree.
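
The semi-join rule can be sketched as three steps between two sites. The tables, the column names, and the three helper functions are hypothetical; the point is only that site 2 ships back the rows that will actually find a join partner.

```python
def semi_join_keys(rows, key):
    """Step 1: project the join column at site 1; only these values are shipped."""
    return {row[key] for row in rows}


def reduce_by_keys(rows, key, keys):
    """Step 2: at site 2, keep only the rows that will find a join partner."""
    return [row for row in rows if row[key] in keys]


def join(left, right, key):
    """Step 3: join at site 1 using the reduced rows shipped back from site 2."""
    by_key = {}
    for rrow in right:
        by_key.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in by_key.get(lrow[key], [])]


# Hypothetical data: projects at site 1, departments at site 2.
projects = [
    {"PId": 1, "Dept": "IT"},
    {"PId": 2, "Dept": "HR"},
]
departments = [
    {"Dept": "IT", "Head": "Asha"},
    {"Dept": "HR", "Head": "Ravi"},
    {"Dept": "Sales", "Head": "Meera"},
    {"Dept": "Legal", "Head": "Arun"},
]

keys = semi_join_keys(projects, "Dept")               # small set sent to site 2
shipped = reduce_by_keys(departments, "Dept", keys)   # 2 of 4 rows are sent back
joined = join(projects, shipped, "Dept")
```

In this sample, half of the department rows never cross the network, at the price of first shipping the set of join keys in the other direction.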
