※This sample exam is different from those that appear in the actual OSS-DB Exam.
In general, an index is created to improve the performance of a search. Hard disk access greatly affects the performance of the RDBMS. Simply put, the greater the amount of disk access required the greater the time it will take for a query to run. When no index exists, data in a table must be read from beginning to end. Since data files are often very large, this is not an ideal method of access. An index, may require only a portion of the index file and a portion of the corresponding data file to be read. This results in better performance as the less disk access is required.
A simple example:
CREATE TABLE table1 (id INTEGER, val VARCHAR (50));
Performing the following query on the table defined above without and index would result in all the data files representing the table to be read.
(1) SELECT * FROM table1 WHERE id = 1;
(2) SELECT * FROM table1 WHERE val = 'abc';
However, if we were to create an index on the id column using the below command:
CREATE INDEX ON table1 (id);
Execution of the SELECT statement in (1), would result in the index being utilized and result in a better performing query. However, the SELECT statement in (2), would not utilize the index as the column specified in the where clause uses a different column to what the index was created on. This means that answer D is not correct.
If we were to create and index on column ‘val’
CREATE INDEX ON table 1 (val);
Then a second index would be created (on the ‘val’ column of table1).
In this case both queries (1) and (2) would utilize indexes, and it is possible to predict which index is used by which statement due to the column specified in the WHERE clause. Therefore answer C is incorrect.
The effectiveness of the index depends on the distribution of the data. In the above example, assuming that there are 10000 rows of data in table 1, when 5000 rows are distributed with an id equal to 1 and the remaining 5000 rows have an id equal to 2; A SELECT statement like that in (1) requires half of the data file to be read. In this case it is faster to read the data from the file sequentially rather than randomly because random access is far slower. Therefore the existence of indexes don’t guarantee an improvement of performance for all situations. Therefore answer A is incorrect.
Next, consider the situation of updating data. If you have an index, you must also update the index when updating the table, so updates require slightly more time, therefore you should not create indexes that are not required.
In the example of table 1 above,
Let's look at the following UPDATE statement.
UPDATE table1 SET val = 'xyz' WHERE id = 1;
In this case, we look for data with an id equal to 1. If there is no index on the id column, all data files will require reading. But if there is an index on the id column, it is often sufficient to read only the index file and part of the data file.
Once the data to be updated has been located, updating of the relevant part of the data file and updating of the index file is performed. If the number of index files is large, updating will take additional time. Since the overall performance requires consideration together including the first step of locating the data, updating will be normally be faster if there is an index on the id column. Therefore answer B in incorrect.
Therefore, the correct answer is E.