Relational database management systems (RDBMS) use queries to retrieve data from a database. A query can draw on more than one table by using equi-joins or non-equi joins. It can also filter the rows it returns with conditions in a WHERE clause, such as SELECT student_name FROM students WHERE place_of_residence = 'Chennai'. Join conditions are written in the same clause, for example SELECT a.customer_name, b.order_id FROM customer a, order b WHERE a.customer_id = b.customer_id.
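The two queries in the paragraph above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the sample rows are made up, the ORDER BY clauses are added only so that the output order is predictable, and the table name order is written as "order" because ORDER is a reserved word in SQL and must be quoted when used as a table name.

```python
import sqlite3

# In-memory database with small, made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_name TEXT, place_of_residence TEXT);
    INSERT INTO students VALUES ('Asha', 'Chennai'), ('Ravi', 'Mumbai'), ('Meena', 'Chennai');

    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    -- ORDER is a reserved word, so the table name is quoted.
    CREATE TABLE "order" (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO "order" VALUES (101, 1), (102, 1), (103, 2);
""")

# Filter rows with a WHERE condition (SQL string literals use single quotes).
chennai = conn.execute(
    "SELECT student_name FROM students "
    "WHERE place_of_residence = 'Chennai' ORDER BY student_name"
).fetchall()

# Equi-join: the WHERE clause pairs each order with its own customer.
joined = conn.execute(
    'SELECT a.customer_name, b.order_id FROM customer a, "order" b '
    "WHERE a.customer_id = b.customer_id ORDER BY b.order_id"
).fetchall()

print(chennai)  # [('Asha',), ('Meena',)]
print(joined)   # [('Acme', 101), ('Acme', 102), ('Globex', 103)]
```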

In this example, the query joins two tables, customer and order. Before a query is executed, the SQL engine computes an execution plan: a sequence of steps chosen by the query optimizer because it estimates that they will execute the query at the lowest cost. In complex queries that span many tables and specify many conditions, missing even one or two join conditions can make the query run for a very long time when the tables hold large volumes of data.
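SQLite exposes the plan its optimizer picks through EXPLAIN QUERY PLAN. The sketch below, using empty tables shaped like the customer/order example, compares the plan for the join with and without its join condition. The exact wording of the plan text varies between SQLite versions, and other RDBMS have their own EXPLAIN syntax, but with the condition one table is typically looked up by key (SEARCH) for each row of the other, while without it both tables are scanned in full.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE "order" (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
""")


def plan(sql: str) -> list[str]:
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


with_join = plan(
    'SELECT a.customer_name, b.order_id FROM customer a, "order" b '
    "WHERE a.customer_id = b.customer_id"
)
without_join = plan('SELECT a.customer_name, b.order_id FROM customer a, "order" b')

print(with_join)     # typically a SCAN of one table plus a SEARCH (key lookup) in the other
print(without_join)  # typically a SCAN of each table: every pair of rows is visited
```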

A join of two tables without a join condition is simply the cross product of the two sets of rows. If table A has 10,000 rows and table B has 5,000 rows, their cross product contains 10,000 × 5,000 = 50,000,000 rows. With a join condition on a key, for example when each row of B matches exactly one row of A, the result contains only 5,000 rows (or 10,000 if instead each row of A matches exactly one row of B). Now consider a query over 10 tables with 20 to 30 join conditions: it is easy to miss 3 or 4 of those conditions through oversight or error, and each omission turns part of the query into a cross product of two or three tables. This is a realistic risk for SQL run at large production or manufacturing sites that process high volumes of data, where both the individual tables and the number of tables are very large.
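The arithmetic can be checked with a scaled-down version of the example. This sketch uses sqlite3 with hypothetical tables a and b (smaller than in the text so that it runs quickly) and counts the rows of the cross product and of the equi-join in which each row of b references exactly one row of a.

```python
import sqlite3


def count_rows(n_a: int, n_b: int) -> tuple[int, int]:
    """Return (cross product size, equi-join size) for hypothetical tables a and b."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER)")
    conn.executemany("INSERT INTO a (id) VALUES (?)", ((i,) for i in range(n_a)))
    # Each row of b references exactly one row of a (a many-to-one key relationship).
    conn.executemany(
        "INSERT INTO b (id, a_id) VALUES (?, ?)", ((j, j % n_a) for j in range(n_b))
    )
    cross = conn.execute("SELECT COUNT(*) FROM a, b").fetchone()[0]
    joined = conn.execute("SELECT COUNT(*) FROM a, b WHERE a.id = b.a_id").fetchone()[0]
    conn.close()
    return cross, joined


print(count_rows(1_000, 500))  # (500000, 500)
print(10_000 * 5_000)          # 50000000, the cross product size for the tables in the text
```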

Consider a designer who writes a query for an update process at a production house, in which actual versus projected sales data is loaded into the database. Because the database is complex, a single query may use 20 tables that average more than 100,000 rows each. Such a query contains many joins, and if some join conditions are accidentally left out, the query searches a space on the order of 10^20 row combinations instead of about 10^5. For example, if three missing conditions split the tables into four unconnected groups of roughly 10^5 rows each, the result is their cross product, about (10^5)^4 = 10^20 rows. A program containing such a query can run for as long as 20 hours without completing its updates. By identifying and adding the missing join conditions, such as table3.column3 = table4.column7 or table7.column2 = table1.column9, the query can be made to run in an acceptable time.
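One way to spot missing join conditions, sketched here as an illustration rather than a feature of any particular database, is to treat tables as nodes and join conditions as edges, then look for disconnected groups with a union-find structure: any two groups without a join between them are combined by a cross product. The function names, the "table.column = table.column" input format, the hypothetical id columns, and the size estimate (largest table per group, multiplied across groups) are all assumptions made for this example; a real optimizer estimates cardinality differently.

```python
from math import prod


def find_groups(tables: list[str], join_conditions: list[str]) -> list[set[str]]:
    """Group tables connected (directly or indirectly) by join conditions."""
    parent = {t: t for t in tables}  # union-find: each table starts in its own group

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps the trees shallow
            t = parent[t]
        return t

    for cond in join_conditions:
        # Assumed input format: "tableX.columnY = tableZ.columnW".
        left, right = (side.strip() for side in cond.split("="))
        root_l = find(left.split(".")[0])
        root_r = find(right.split(".")[0])
        if root_l != root_r:
            parent[root_l] = root_r  # merge the two groups

    groups: dict[str, set[str]] = {}
    for t in tables:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())


def estimated_rows(groups: list[set[str]], row_counts: dict[str, int]) -> int:
    """Rough size of the search space (an assumption, not an optimizer estimate):
    each connected group is taken to be about as large as its largest table,
    and unconnected groups multiply together as a cross product."""
    return prod(max(row_counts[t] for t in g) for g in groups)


# Usage with numbers matching the text: 20 tables of about 100,000 rows each.
tables = [f"table{i}" for i in range(1, 21)]
row_counts = {t: 100_000 for t in tables}
# Hypothetical chain of 19 join conditions that connects all 20 tables.
all_joins = [f"table{i}.id = table{i + 1}.id" for i in range(1, 20)]
# The same query with three conditions left out by mistake.
missing_three = [c for k, c in enumerate(all_joins) if k not in (4, 9, 14)]

full = find_groups(tables, all_joins)
broken = find_groups(tables, missing_three)
print(len(full), f"{estimated_rows(full, row_counts):.0e}")      # 1 1e+05
print(len(broken), f"{estimated_rows(broken, row_counts):.0e}")  # 4 1e+20
```

Printing the sets in broken shows which clusters of tables still lack a join condition between them, which is where a condition such as table3.column3 = table4.column7 would need to be added.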
