Database 1
The Database Design Process
There are three major stages in database design, each producing a progressively lower level description:
Requirements analysis
First, we determine and write down what exactly the database is needed for, what data will be stored, and how the data items relate to each other. In practice, this might involve detailed study of the application requirements and talking to people in various roles that will interact with the database and application.
Conceptual design
Once we know what the database requirements are, we distill them into a formal description of the database design.
Logical design
Finally, we map the database design onto an actual database management system and database tables.
The Entity Relationship Model
At a basic level, databases store information about distinct objects, or entities, and the associations, or relationships, between these entities. For example, a university database might store information about students, courses, and enrollment. A student and a course are entities, while an enrollment is a relationship between a student and a course. Similarly, an inventory and sales database might store information about products, customers, and sales. A product and a customer are entities, while a sale is a relationship between a customer and a product.
A popular approach to conceptual design uses the Entity Relationship (ER) model, which helps transform the requirements into a formal description of the entities and relationships that appear in the database.
Representing Entities
We typically use the database to store certain characteristics, or attributes, of the entities. In a sales database, we could store the name, email address, postal address, and telephone number for each customer. In a more elaborate customer relationship management (CRM) application, we could also store the names of the customer’s spouse and children, the languages the customer speaks, the customer’s history of interaction with our company, and so on.
Attributes describe the entity they belong to. An attribute may be formed from smaller parts; for example, a postal address is composed of a street number, city, ZIP code, and country. We classify attributes as composite if they’re composed of smaller parts in this way, and as simple otherwise. Some attributes can have multiple values for a given entity. For example, a customer could provide several telephone numbers, so the telephone number attribute is multivalued. Attributes help distinguish one entity from other entities of the same type. We could use the name attribute to distinguish between customers, but this could be an inadequate solution because several customers could have identical names. To be able to tell them apart, we need an attribute (or a minimal combination of attributes) guaranteed to be unique to each individual customer. The identifying attribute or attributes form a key.
In our example, we can assume that no two customers have the same email address, so the email address can be the key. However, we need to think carefully about the implications of our choices. For example, if we decide to identify customers by their email address, it would be hard to allow a customer to have multiple email addresses. Any applications we build to use this database might treat each email address as a separate person, and it might be hard to adapt everything to allow people to have multiple email addresses. Using the email address as the key also means that every customer must have an email address; otherwise, we wouldn’t be able to distinguish between customers who don’t have one.
Looking at the other attributes for one that can serve as an alternative key, we see that while it’s possible that two customers would have the same telephone number (and so we cannot use the telephone number as a key), it’s likely that people who have the same telephone number never have the same name, so we can use the combination of the telephone number and the name as a composite key.
Clearly, there may be several possible keys that could be used to identify an entity; we choose one of the alternative, or candidate, keys to be our main, or primary, key. You usually make this choice based on how confident you are that the attribute will be nonempty and unique for each individual entity, and on how small the key is (shorter keys are faster to maintain and use).
Attribute values are chosen from a domain of legal values; for example, we could specify that a customer’s given names and surname attributes can each be a string of up to 100 characters, while a telephone number can be a string of up to 40 characters. Similarly, a product price could be a positive rational number.
Attributes can be empty; for example, some customers may not provide their telephone numbers. The primary key of an entity (including thecomponents of a multi attribute primary key) must never be unknown (technically, it must be NOT NULL); for example, if it’s possible for a customer to not provide an email address, we cannot use the email address as the key. You should think carefully when classifying an attribute as multivalued: are all the values equivalent, or do they in fact represent different things? For example, when listing multiple telephone numbers for a customer, would they be more usefully labeled separately as the customer’s business phone number, home phone number, cell phone number, and so on?
Let’s look at another example. The sales database requirements may specify that a product has a name and a price. We can see that the product is an entity because it’s a distinct object. However, the product’s name and price aren’t distinct objects; they’re attributes that describe the product entity. Note that if we want to have different prices for different markets, then the price is no longer just related to the product entity, and
we need to model it differently. For some applications, no combination of attributes can uniquely identify an entity (or it would be too unwieldy to use a large composite key), so we create an artificial attribute that’s defined to be unique and can therefore be used as a key: student numbers, driver’s license numbers, and library card numbers are examples of unique attributes created for various applications
Representing Relationships
Entities can participate in relationships with other entities. For example, a customer can buy a product, a student can take a course,an artist can record an album, and so on.
Like entities, relationships can have attributes: we can define a sale to be a relationship between a customer entity (identified by the unique email address) and a given number of the product entity (identified by the unique product ID) that exists at a particular date and time (the timestamp).
Our database could then record each sale and tell us, for example, that at 3:13 p.m. on Wednesday, March 22, Ali Thomson bought one “Four-port USB 2.0 Hub,” one “300 GB 16 MB Cache 7200 rpm SATA Serial ATA133 HDD Hard Disk,” and two sets of “2000 Watt 5.1 Channel Sub-Woofer Speakers.”
Different numbers of entities can appear on each side of a relationship. For example, each customer can buy any number of products, and each product can be bought by any number of customers. This is known as a many-to-many relationship. We can also have one-to-many relationships. For example, one person can have several credit cards, but each credit card belongs to just one person. Looking at it the other way, a one-to-many relationship becomes a many-to-one relationship; for example, many credit cards belong to a single person. Finally, the serial number on a car engine is an example of a one-to-one relationship; each engine has just one serialnumber, and each serial number belongs to just one engine. We often use the shorthand terms 1:1, 1: N, and M:N for one-to-one, one-to-many, and many-to-many relationships, respectively. The number of entities on either side of a relationship (the cardinality of the relationship) define the key constraints of the relationship. It’s important to think about the cardinality of relationships carefully. There are many relationships that may at first seem to be one-to-one, but turn out to be more complex. For example, people sometimes change their names; in some applications, such as police databases, this is of particular interest, and so it may be necessary to model a many-to-many relationship between a person entity and a name entity. Redesigning a database can be time-consuming if you assume a relationship is simpler than it really is.
Partial and Total Participation
Relationships between entities can be optional orcompulsory. In our example, we could decide that a person is considered to be a customer only if they have bought a product. On the other hand, we could say that a customer is a person whom we know about and whom we hope might buy something—that is, we can have people listed as customers in our database who never buy a product. In the first case, the customer entity has total participation in the bought relationship (all customers have bought a product, and we can’t have a customer who hasn’t bought a product), while in the second case it has partial participation (a customer can buy a product). These are referred to as the participation constraints of the relationship.
Entity or Attribute?
From time to time, we encounter cases where we wonder whether an item should be an attribute or an entity on its own. For example, an email address could be modeled as an entity in its own right. When in doubt, consider these rules of thumb:
Is the item of direct interest to the database?
Objects of direct interest should be entities, and information that describes them should be stored in attributes. Our inventory and sales database is really interested in customers, and not their email addresses, so the email address would be best modeled as an attribute of the customer entity.
Does the item have components of its own?
If so, we must find a way of representing these components; a separate entity might be the best solution. In the student grades for example, we stored the course name, year, and semester for each course that a student takes. It would be more compact to treat the course as a separate entity and to create a class ID number to identify each time a course is offered to students (the “offering”).
Can the object have multiple instances?
If so, we must find a way to store data on each instance. The cleanest way to do this is to represent the object as a separate entity. In our sales example, we must ask whether customers are allowed to have more than one email address; if they are, we should model the email address as a separate entity.
Is the object often nonexistent or unknown?
If so, it is effectively an attribute of only some of the entities, and it would be better to model it as a separate entity rather than as an attribute that is often empty.Consider a simple example: to store student grades for different courses, we could have an attribute for the student’s grade in every possible course because most students will have grades for only a few of these courses, it’s better to represent the grades as a separate entity set.
Entity or Relationship?
An easy way to decide whether an object should be an entity or a relationship is to map nouns in the requirements to entities, and to map the verbs to relations. For example, in the statement, “A degree program is made up of one or more courses,” we can identify the entities “program” and “course,” and the relationship “is made up of.” Similarly, in the statement, “A student enrolls in one program,” we can identify the entities “student” and “program,” and the relationship “enrolls in.” Of course, we can choose different terms for entities and relationships than those that appear in the relationships, but it’s a good idea not to deviate too far from the naming conventions used in the requirements so that the design can be checked against the requirements. All else being equal, try to keep the design simple, and avoid introducing trivial entities where possible; i.e., there’s no need to have a separate entity for the student’s enrollment when we can model it as a relationship between the existing student and program entities.
Intermediate Entities
It is often possible to conceptually simplify many-to-many relationships by replacing the many-to-many relationship with a new intermediate entity (sometimes called an associate entity) and connecting the original entities through a many-to-one and a one-to-many relationship.
Consider the statement: “A passenger can book a seat on a flight.” This is a many-to-many relationship between the entities “passenger” and “flight.” However, let’s look at this from both sides of the relationship:
• Any given flight can have many passengers with a booking.
• Any given passenger can have bookings on many flights.
Hence, we can consider the many-to-many relationship to be in fact two one-to-many relationships, one each way. This points us to the existence of a hidden intermediate entity, the booking, between the flight and the passenger entities. The requirement could be better worded as: “A passenger can make a booking for a seat on a flight.” Each passenger can be involved in multiple bookings, but each booking belongs to a single passenger, so the cardinality of this relationship is 1:N. Similarly, there can be many bookings for a given flight, but each booking is for a single flight, so this relationship also has cardinality 1:N. Since each booking must be associated with a particular passenger and flight, the booking entity participates totally in the relationships with these entities.
Weak and Strong Entities
Context is very important in our daily interactions; if we know the context, we can work with a much smaller amount of information. For example, we generally call family members by only their first name or nickname. Where ambiguity exists, we add further information such as the surname to clarify our intent. In database design, we can omit some key information for entities that are dependent on other entities. For example, if we wanted to store the names of our customers’ children, we could create a child entity and store only enough key information to identify it in the context of its parent. We could simply list a child’s first name on the assumption that a customer will never have several children with the same first name. Here, the child entity is a weak entity, and its relationship with the customer entity is called an identifying relationship. Weak entities participate totally in the identifying relationship, since they can’t exist in the database independently of their owning entity. A weak entity is uniquely identified in the context of its regular (or strong) entity, and so the full key for a weak entity is the combination of its own (partial) key with the key of its owning entity. To uniquely identify a child in our example, we need the first name of the child and the email address of the child’s parent.
Using the university database as an example to explain further the relationships that has been explained
The university database captures the details of students, courses, and grades for the university.
The database stores details about university students, courses, the semester a student took a particular course (and his mark and grade if he completed it), and what degree program each student is enrolled in. The database is a long way from one that will be suitable for a large tertiary institution, but it does illustrate relationships and it’s easy to relate to when you’re learning SQL.
Consider the following requirements list:
• The university offers one or more programs.
• A program is made up of one or more courses.
• A student must enroll in a program.
• A student takes the courses that are part of her program.
• A program has a name, a program identifier, the total credit points required to graduate, and the year it commenced.
• A course has a name, a course identifier, a credit point value, and the year it commenced.
• Students have one or more given names, a surname, a student identifier, a date of birth, and the year they first enrolled. We can treat all given names as a single object for example, “John Paul.”
• When a student takes a course, the year and semester he attempted it are recorded. When he finishes the course, a grade (such as A) and a mark (such as 60 percent) are recorded.
• Each course in a program is sequenced into a year (for example, year 1) and a semester (for example, 1stsemester).
• Student is a strong entity, with an identifier, student_id, created to be the primary key used to distinguish between students (remember, we could have several students with the same name).
• Program is a strong entity, with the identifier program_id as the primary key used to distinguish between programs.
• Each student must be enrolled in a program, so the Student entity participates totally in the many-to-one Enrolls In relationship with Program. A program can exist without having any enrolled students, so itparticipates partially in this relationship.
• A Course has meaning only in the context of a Program, so it’s a weak entity, with course_id as a weak key. This means that a Course is uniquely identified using its course_id and the program_id of its owning program.
• As a weak entity, Course participates totally in the many-to-one identifying relationship with its owning Program. This relationship has Year and Semester attributes that identify its sequence position.
• Student and Course are related through the many-to-many Attempts relationships; a course can exist without a student, and a student can be enrolled without attempting any courses, so the participation is not total.
• When a student attempts a course, there are attributes to capture the Year and Semester, and the Mark and Grade.
The database is rather simple, but this is because the requirements are simple. For a real university, many more aspects would need to be captured by the database. For example, the requirements don’t mention anything about campus, study mode, course prerequisites, lecturers, timetabling details, address history, financials, or assessment details. The database also doesn’t allow a student to be in more than one degree program, nor does it allow a course to appear as part of different programs.
Mapping Entities and Relationships to Database Tables
For each strong entity, create a table comprising its attributes and designate the primary key. The parts of any composite attributes are also included here.
For each weak entity, create a table comprising its attributes and including the primary key of its owning entity. The primary key of the owning entity is known as a foreign key here, because it’s a key not of this table, but of another table. The primary key of the table for the weak entity is the combination of the foreign key and the partial key of the weak entity. If the relationship with the owning entity has any attributes, add them to this table.
For each multivalued attribute of an entity, create a table comprising the entity’s primary key and the attribute.
For each one-to-one relationship between two entities, include the primary key of one entity as a foreign key in the table belonging to the other. If one entity participates totally in the relationship, place the foreign key in its table. If both participate totallyin the relationship, consider merging them into a single table.
For each non identifying one-to-many relationship between two entities, include the primary key of the entity on the “1” side as a foreign key in the table for the entity on the “N” side. Add any attributes of the relationship in the table alongside the foreign key. Note that identifying one-to-many relationships(between a weak entity and its owning entity) are captured as part of the entity-mapping stage.
For each many-to-many relationship between two entities, create a new table containing the primary key of each entity as the primary key, and add any attributes of the relationship. This step helps to identify intermediate entities.
For each relationship involving more than two entities, create a table with the primary keys of all the participating entities, and add any attributes of the relationship.
Keys and Indexes
A primary key uniquely identifies each row in a table. When you declare one to MySQL, it creates a new file on disk that stores information about where the data from each row in the table is stored. This information is called an index, and its purpose is to speed up searches that use the primary key. For example, when you declare PRIMARY KEY (artist_id) in the artist table in the music database, MySQL creates a structure that allows it to find rows that match a specific artist_id (or a range of identifiers) extremely quickly. This is very useful to match artists to albums, tracks, and playlist information.
You can display the indexes available on a table using the SHOW INDEX command:
mysql> SHOW INDEX FROM artist;
+--------+------------+----------+--------------+-------------+-----------+...
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation |...
+--------+------------+----------+--------------+-------------+-----------+...
| artist | 0 | PRIMARY | 1 | artist_id | A |...
+--------+------------+----------+--------------+-------------+-----------+...
... +-------------+----------+--------+------+------------+---------+
... | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
... +-------------+----------+--------+------+------------+---------+
... | 6 | NULL | | | BTREE | |
... +-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)
The cardinality is the number of unique values in the index; for an index on a primary key, this is the same
as the number of rows in the table.
Note that all columns that are part of a primary key must be declared as NOT NULL, since they must have a value for the row to be valid. Without the index, the only way to find rows in the table is to read each one from disk and check whether it matches the artist_id you’re searching for. For tables with many rows, this exhaustive, sequential searching is extremely slow.
You can create other indexes on the data in a table. You do this so that other searches on other columns or combinations of columns are extremely fast and in order to avoid sequential scans. For example, suppose you often want to search by artist_name. You can drop the table and modify the CREATE TABLE definition to add an extra index:
mysql> DROP TABLE artist;
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE artist (
-> artist_id SMALLINT(5) NOT NULL DEFAULT 0,
-> artist_name CHAR(128) DEFAULT NULL,
-> PRIMARY KEY (artist_id),
-> KEY artist_name (artist_name)
-> );
Query OK, 0 rows affected (0.06 sec)
You can see we’ve used the keyword KEY to tell MySQL that we want an extra index; you can use the word INDEX in place of KEY. Following this, we’ve named the index— in this example, we’ve named it after the column name—and then we’ve included the column to index in parentheses. You can also add indexes after tables are created—infact, you can pretty much change anything about a table after its creation.
You can build an index on more than one column. For example, consider the following customer table:
mysql> CREATE TABLE customer (
-> cust_id INT(4) NOT NULL DEFAULT 0,
-> firstname CHAR(50),
-> secondname CHAR(50),
-> surname CHAR(50),
-> PRIMARY KEY (cust_id),
-> KEY names (firstname, secondname, surname));
Query OK, 0 rows affected (0.01 sec)
You can see that we’ve added a primary key index on the cust_id identifier column, and we’ve also added another index—called names—that includes the firstname,
secondname, and surname columns in this order. Let’s now consider how you can use that extra index.
You can use the names index for fast searching bycombinations of the three name columns. For example, it’s useful in the following query:
mysql> SELECT * FROM customer WHERE
-> firstname = "Rose" AND
-> secondname = "Elizabeth" AND
-> surname = "Williams";
We know it helps the search, because all columns listed in the index are used in the query. You can use the EXPLAIN statement to check whether what you think should happen is in fact happening:
mysql> EXPLAIN SELECT * FROM customer WHERE
-> firstname = "Rose" AND
-> secondname = "Elizabeth" AND
-> surname = "Williams";
+----+-------------+----------+------+---------------+...
| id | select_type | table | type | possible_keys |...
+----+-------------+----------+------+---------------+...
| 1 | SIMPLE | customer | ref | names |...
+----+-------------+----------+------+---------------+...
...+-------+---------+-------------------+------+-------------+
...| key | key_len | ref | rows | Extra |
...+-------+---------+-------------------+------+-------------+
...| names | 153 | const,const,const | 1 | Using where |
...+-------+---------+-------------------+------+-------------+
1 row in set (0.00 sec)
You can see that MySQL reports that the possible_keys are names (meaning that the index could be used for this query) and that the key that it’s decided to use is names. So, what you expect and what is happening are the same.
The index we’ve created is also useful for queries on only the firstname column. For example, it can be used by the following query:
mysql> SELECT * FROM customer WHERE
-> firstname = "Rose";
You can use EXPLAIN to check whether the index is being used. The reason it can be used is because the firstname column is the first listed in the index. In practice, this means that the index clusters, or stores together, information about rows for all people with the same first name, and so the index can be used to find anyone with a matching first name.
The index can also be used for searches involving combinations of first name and second name, for exactly the same reasons we’ve just discussed. The index clusters together people with the same first name, and within that it clusters people with identical first names ordered by second name. So, it can be used for this query:
mysql> SELECT * FROM customer WHERE
-> firstname = "Rose" AND
-> secondname = "Elizabeth";
However, the index can’t be used for this query because the leftmost column in the index, firstname, does not appear in the query:
mysql> SELECT * FROM customer WHERE
-> surname = "Williams" AND
-> secondname = "Elizabeth";
The index should help narrow down the set of rows to a smaller set of possible answers.
For MySQL to be able to use an index, the query needs to meet both the following conditions:
1. The leftmost column listed in the KEY (or PRIMARY KEY) clause must be in the query.
2. The query must contain no OR clauses for columns that aren’t indexed.
Again, you can always use the EXPLAIN statement to check whether an index can be used for a particular query.
How to choose and design indexes.
When you’re considering adding an index, think about the following:
• Indexes cost space on disk, and they need to beupdated whenever data changes. If your data changes frequently, or lots of data changes when you do make a change, indexes will slow the process down. However, in practice, since SELECT statements (data reads) are usually much more common than other statements (data modifications), indexes are usually beneficial.
• Only add an index that’ll be used frequently. Don’tbother indexing columns before you see what queries your users and your applications need. You can always add indexes afterward.
• If all columns in an index are used in all queries, list the column with the highest number of duplicates at the left of the KEY clause. This minimizes index size.
• The smaller the index, the faster it’ll be. If you index large columns, you’ll get a larger index. This is a good reason to ensure your columns are as small as possible when you design your tables.
• For long columns, you can use only a prefix of thevalues from a column to create the index. You can do this by adding a value in parentheses after the column definition, such as KEY names (firstname(3), secondname(2), surname(10)). This means that only the first three characters of firstname are indexed, then the first two characters of secondname, and then 10 characters from surname. This is a significant saving over indexing 50 characters from each of the 3 columns! When you do this, your index will be less able to uniquely identify rows, but it’ll be much smaller and still reasonably good at finding matching rows.
Altering Structures
Adding, Removing, and Changing Columns
You can use the ALTER TABLE statement to add new columns to a table, remove existing columns, and change column names, types, and lengths.
Let’s consider how you modify existing columns. Consider an example in which we rename a table column. The played table has a column also called played that contains the time the track was played. To change the name of this column to last_played, you would write:
mysql> ALTER TABLE played CHANGE played last_played TIMESTAMP;
Query OK, 12 rows affected (0.03 sec)
Records: 12 Duplicates: 0 Warnings: 0
You can see that MySQL processes and alters each row. What actually happens behind the scenes is that MySQL creates a new table with the new structure, copies the data into that table, removes the original played table, and renames the table to played. You can check the result with the SHOW COLUMNS statement:
mysql> SHOW COLUMNS FROM played;
+----------------+-------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+-------------------+-------+
| artist_id | smallint(5) | | PRI | 0 | |
| album_id | smallint(4) | | PRI | 0 | |
| track_id | smallint(3) | | PRI | 0 | |
| last_played | timestamp | YES | PRI | CURRENT_TIMESTAMP | |
+----------------+-------------+------+-----+-------------------+-------+
4 rows in set (0.01 sec)
If you want to rename the artist_name column to artist-name, you would write:
ALTER TABLE artist CHANGE artist_name artist-name CHAR(128) DEFAULT NULL;
If you want to modify the type and clauses of a column, but not its name, you can use
the MODIFY keyword:
mysql> ALTER TABLE artist MODIFY artist_name CHAR(64) DEFAULT "Unknown";
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
You can also do this with the CHANGE keyword, but by specifying the same column name
twice:
mysql> ALTER TABLE artist CHANGE artist_name artist_name CHAR(64) DEFAULT "Unknown";
Query OK, 6 rows affected (0.03 sec)
Records: 6 Duplicates: 0 Warnings: 0
Be careful when you’re modifying types:
• Don’t use incompatible types, since you’re relying on MySQL to successfully convert data from one format to another (for example, converting an INT column to a DATETIME column isn’t likely to do what you hoped).
• Don’t truncate the data unless that’s what you want. If you reduce the size of a type, the values will be edited to match the new width, and you can lose data.
Suppose you want to add an extra column to an existing table. Here’s how to do it with the ALTER TABLE statement:
mysql> ALTER TABLE artist ADD formed YEAR;
Query OK, 6 rows affected (0.02 sec)
Records: 6 Duplicates: 0 Warnings: 0
You must supply the ADD keyword, the new column name, and the column type and clauses. This example adds the new column, formed, as the last column in the table, as shown with the SHOW COLUMNS statement:
mysql> SHOW COLUMNS FROM artist;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| artist_id | smallint(5) | | PRI | 0 | |
| artist_name | char(64) | YES | | Unknown | |
| formed | year(4) | YES | | NULL | |
+-------------+-------------+------+-----+---------+-------+
3 rows in set (0.02 sec)
If you want it to instead be the first column, use the FIRST keyword as follows:
mysql> ALTER TABLE artist ADD formed YEAR FIRST;
Query OK, 6 rows affected (0.04 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> SHOW COLUMNS FROM artist;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| formed | year(4) | YES | | NULL | |
| artist_id | smallint(5) | | PRI | 0 | |
| artist_name | char(64) | YES | | Unknown | |
+-------------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
If you want it added in a specific position, use the AFTER keyword:
mysql> ALTER TABLE artist ADD formed YEAR AFTER artist_id;
Query OK, 6 rows affected (0.03 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> SHOW COLUMNS FROM artist;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| artist_id | smallint(5) | | PRI | 0 | |
| formed | year(4) | YES | | NULL | |
| artist_name | char(64) | YES | | Unknown | |
+-------------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
To remove a column, use the DROP keyword followed by the column name. Here’s how to get rid of the newly added formed column:
mysql> ALTER TABLE artist DROP formed;
Query OK, 6 rows affected (0.02 sec)
Records: 6 Duplicates: 0 Warnings: 0
This removes both the column structure and any data contained in that column. It also removes the column from any index it was in; if it’s the only column in the index, the index is dropped, too. You can’t remove a column if it’s the only one in a table; to do this, you drop the table instead. Be careful when dropping columns; you discard both the data and the structure of your table.
When the structure of a table changes, you will generally have to modify any INSERT statements that you use to insert values in a particular order.
MySQL allows you to specify multiple alterations in a single ALTER TABLE statement by separating them with commas. Here’s an example that adds a new column and adjusts another:
mysql> ALTER TABLE artist ADD formed YEAR, MODIFY artist_name char(256);
Query OK, 6 rows affected, 1 warning (0.08 sec)
Records: 6 Duplicates: 0 Warnings: 0
It’s very efficient to join multiple modifications in a single operation, as it potentially saves the cost of creating a new table, copying data from the old table to the new table, dropping the old table, and renaming the new table with the name of the old table for each modification individually.
Adding, Removing, and Changing Indexes
It is often hard to know what indexes are useful before the application you’re building is used. You might find that a particular feature of the application is much more popular than you expected, causing you to evaluate how to improve performance for the associated queries. You’ll therefore find it useful to be able to add, alter, and remove indexes after your application is deployed. Modifying indexes does not affect the data stored in the table.
We’ll start with adding a new index. Imagine that the artist table is frequently queried using a WHERE clause that specifies an artist_name. To speed this query, you’ve decided to add a new index, which you’ve named by_name. Here’s how you add it after the table is created:
mysql> ALTER TABLE artist ADD INDEX by_name (artist_name);
Query OK, 6 rows affected (0.02 sec)
Records: 6 Duplicates: 0 Warnings: 0
Again, you can use the terms KEY and INDEX interchangeably. You can check the results
with the SHOW CREATE TABLE statement:
mysql> SHOW CREATE TABLE artist;
+--------+-----------------------------------------------+
| Table | Create Table |
+--------+-----------------------------------------------+
| artist | CREATE TABLE `artist` ( |
| | `artist_id` smallint(5) NOT NULL default '0', |
| | `artist_name` char(128) default NULL, |
| | PRIMARY KEY (`artist_id`), |
| | KEY `by_name` (`artist_name`) |
| | ) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+--------+-----------------------------------------------+
As expected, the new index forms part of the table structure. You can also specify a primary key for a table after it’s created:
mysql> ALTER TABLE artist ADD PRIMARY KEY (artist_id);
Now let’s consider how to remove an index. To remove a non-primary-key index, you do the following:
mysql> ALTER TABLE artist DROP INDEX by_name;
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
You can drop a primary-key index as follows:
mysql> ALTER TABLE artist DROP PRIMARY KEY;
MySQL won’t allow you to have multiple primary keys in a table. If you want to change the primary key, you’ll have to remove the existing index before adding the new one.
Consider this example:
mysql> CREATE TABLE staff (staff_id INT, name CHAR(40));
Query OK, 0 rows affected (0.01 sec)
mysql> ALTER TABLE staff ADD PRIMARY KEY (staff_id);
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE staff ADD PRIMARY KEY (name);
ERROR 1068 (42000): Multiple primary key defined
mysql> ALTER TABLE staff DROP PRIMARY KEY;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
Mysql> ALTER TABLE staff ADD PRIMARY KEY (name);
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
MySQL complains when we try to add the second primary key on name; we have to drop the existing primary key on staff_id, and then add one on name.
You can’t modify an index once it’s been created. However, sometimes you’ll want to; for example, you might want to reduce the number of characters indexed from a column or add another column to the index. The best method to do this is to drop the index and then create it again with the new specification. For example, suppose you decide that you want the by_name index to include only the first 10 characters of the
artist_name. Simply do the following:
mysql> ALTER TABLE artist DROP INDEX by_name;
Query OK, 6 rows affected (0.02 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE artist ADD INDEX by_name (artist_name(10));
Query OK, 6 rows affected (0.03 sec)
Records: 6 Duplicates: 0 Warnings: 0
Renaming Tables and Altering Other Structures
It’s easy to rename a table. Suppose you want to rename played to playlist. Use the following command:
mysql> ALTER TABLE played RENAME TO playlist;
Query OK, 0 rows affected (0.00 sec)
The TO keyword is optional.
There are several other things you can do with ALTER statements:
• Change the default character set and collation order for a database, a table, or a column.
• Change the order of the rows in a table. This is useful only if you know you want to access the rows in a particular order and you want to help get the data into or near that order.
• Manage and change constraints. For example, you can add and remove foreign keys.
Deleting Structures
Dropping Databases
Removing, or dropping, a database is straightforward. Here’s how you drop the music database:
mysql> DROP DATABASE music;
Query OK, 4 rows affected (0.01 sec)
The number of rows returned in the response is the number of tables removed. You should take care when dropping a database, since all its tables, indexes, and columns are deleted, as are all the associated disk-based files and directories that MySQL uses to maintain them.
If a database doesn’t exist, trying to drop it causes MySQL to report an error. Let’s try dropping the music database again:
mysql> DROP DATABASE music;
ERROR 1008 (HY000): Can't drop database 'music'; database doesn't exist
You can avoid the error, which is useful when including the statement in a script, by using the IF EXISTS phrase:
mysql> DROP DATABASE IF EXISTS music;
Query OK, 0 rows affected, 1 warning (0.00 sec)
You can see that a warning is reported, since the music database has already been dropped.
You can always check what the warning was with the SHOW WARNINGS statement, which has been available since MySQL 4.1.0:
mysql> SHOW WARNINGS;
+-------+------+-----------------------------------------------------+
| Level | Code | Message |
+-------+------+-----------------------------------------------------+
| Note | 1008 | Can't drop database 'music'; database doesn't exist |
+-------+------+-----------------------------------------------------+
1 row in set (0.00 sec)
The warning is also generated with the error if you leave out the IF EXISTS clause.
Removing Tables
Removing tables is as easy as removing a database. Let’s create and remove a table from the music database:
mysql> CREATE TABLE temp (temp INT(3), PRIMARY KEY (temp));
Query OK, 0 rows affected (0.00 sec)
mysql> DROP TABLE temp;
Query OK, 0 rows affected (0.00 sec)
Don’t worry: the 0 rows affected message is misleading. You’ll find the table is definitely gone.
You can use the IF EXISTS phrase to prevent errors. Let’s try dropping the temp table again:
mysql> DROP TABLE IF EXISTS temp;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Again, you can investigate the warning indicates with the SHOW WARNINGS statement:
mysql> SHOW WARNINGS;
+-------+------+----------------------+
| Level | Code | Message |
+-------+------+----------------------+
| Note | 1051 | Unknown table 'temp' |
+-------+------+----------------------+
1 row in set (0.00 sec)
You can drop more than one table in a single statement by separating table names with
commas:
mysql> DROP TABLE IF EXISTS temp, temp1, temp2;
Query OK, 0 rows affected, 3 warnings (0.00 sec)
Comments
Post a Comment