Home Home | About Us | Sitemap | Contact  
  • Info For
  • Professionals
  • Students
  • Educators
  • Media
  • Search
    Powered By Google

The Leading Edge: Databases and I-O

R. Jason Weiss
Development Dimensions International

Jeffrey Worst
Private I-O and IT Consultant

This column focuses on budding technologies and their implications for I-O psychology. A review of the technologies discussed so far and some that are on the board for future columns reveals an interesting commonality: Underlying each technology is at least one database playing a key role. Closer to home, databases are at the heart of much of the software that I-O practitioners design and use, such as Human Resource Information Systems (HRIS), applicant screening applications, 360 software, and many others. Still closer to home, most versions of Microsoft Office install an advanced database application, Microsoft Access, by default.

As we thought about it, we found some good arguments for devoting space in Leading Edge to a discussion of databases. As suggested above, databases underlie many of the emerging technologies highlighted in this column; a review of database concepts should help those with less technical backgrounds better evaluate the promises of these technologies. Second, we feel that relational database management systems (RDBMS) have a great deal to offer I-O psychologists of all stripes but have been overlooked in favor of more familiar tools that can emulate some simple database functionality. We hope that a look at some of the advantages of RDBMSs might save someone some difficulty down the road when faced with a data management task that would, say, stretch a spreadsheet beyond its limits.

As we will see, databases range in complexity, capability, and in the types of software used to interact with, and manage them. We start our discussion with simpler databases, known as flat or flat-file databases. As we will note, many of us already create and use flat-file databases due to their simplicity and flexibility. We then discuss relational databases, which have significant advantages over flat-file databases in generating flexible, powerful solutions. We finish with a discussion of resources for those interested in learning more or diving into the world of databases.

 

Keeping It Simple: Flat-File Databases

The first order of business is to define the term database. Many definitions exist, but for our purposes we will define a database as an organized collection of information related to a given topic. The information in a database is organized into files or records in a given file. A typical simple database is the flat-file database. A flat-file database is a single table of information. Each column, or field, in the table represents a variable, or category of information to be stored. Each row, or record, in the table describes a unique person, object, or concept that is described by the variables. The term flat is used to communicate the idea that the database is two-dimensionalall data are contained in the row-column structure of a single table.

Figure 1 is a very simple example of a very small flat-file database. This should look familiar to most readers who use spreadsheets or statistical analysis packages. Lets assume that this is the very beginning of a simple applicant-tracking database for a rural company that primarily hires high school students from the surrounding area. Note how often the data related to school information is repeated (i.e., school name, school district, and address). Since most of this companys applicants come from the surrounding area high schools, a significant amount of the data in this database will be the same.

 

 

Last Name

First Name

Address

School Name

School District

School Address

Cowher

Jill

12 Main St.

Bell High

Fairfax

23 School Place

Lafleur

Gil

121 Oak St.

Putnam High

Springfield

83 Spring Street

Kazor

Jane

341 Pine St

Bell High

Fairfax

23 School Place

Smith

Henry

89 Pond St.

Bell High

Fairfax

23 School Place

Bronte

Charles

71 Lake St.

Putnam High

Springfield

83 Spring Street

Suzuki

Kenji

90 Oak St.

Putnam High

Springfield

83 Spring Street

Figure 1. Sample flat-file database.

 

The Good News

There is a lot to recommend flat-file databases. They are easy to conceptualize and do not require specialized background knowledge. They can be created and managed in almost any modern word processor or spreadsheet. Simple text editors such as the Windows Notepad may even be suitable for some flat-file databases. These are very powerful advantagespeople often create and use flat-file databases without necessarily knowing anything about databases in the abstract. Consider the following representative examples that might be found on a given I-O psychologists computer:

  • A spreadsheet file used to store and perform basic analyses on assessment data, where each record represents a different candidate, and each field a different competency;
  • A collection of names and data stored in a word-processor table, used to generate reports as part of a mail-merge;
  • A text file output by a scan-sheet reader, with names and data from a run of research participants, with responses to each question occupying a given column across all participants.

It seems as though flat-file databases are everywhere when you look for them. They are good, simple ways of working with data. 

The Bad News

The simplicity of using well-known productivity applications like word processors and spreadsheets is attractive and perfectly suitable for a wide range of applications. However, these solutions can also be extremely confining. For example, each software application makes assumptions about the data that it will accommodatea spreadsheet cell can contain text and numerical data, but not binary files, so including graphic or sound files in the database is not possible. A second example of the confining nature of these types of databases is that management of each database is completely in the hands of the person who uses or maintains the file. Database management comprises a variety of functions including adding, updating, and deleting records, and controlling others access to the data, among others. This, too, is not a problem, provided that the database in question is small and easily managed. However, with frequent changes to records or a need to share different data with different people, it can grow very tedious.

Bigger problems arise when requirements increase and a simple solution gets stretched in directions it cannot easily support. Using the spreadsheet-based assessment data spreadsheet described above, lets say that assessments are being run concurrently in many different geographical locations, and the data need to be maintained in a central location. Since a given spreadsheet file can only be edited by one user at a time, updating the file would require either that representatives from each location take turns entering their data or that everyone forward their data to a single person who compiles it all. Either solution is messy and fraught with the danger of data loss or corruption, though potentially manageable. With a little more complexity, the whole situation becomes completely untenable. For example, what if there is a second spreadsheet with resume dataand a third with test score data? Thats when its time to look for a better solution.

Software issues aside, the flat-file database format is not ideal. Consider, for example, the issue of redundancy. The sample database above contains a significant amount of redundant information as data related to high schools are repeated across applicants. Naturally, this amounts to a great deal of wasted space across a large database, bloating the size of the database file. Worse, errors may slip into the odd entry such that Bell High occasionally comes out as Belle High or Bel High. If the recruiter is interested in looking at summary analyses by high school, these types of anomalies will prove to be a source of considerable frustration. Many more issues have been documented but would be beyond the scope of this article. Readers interested in a more complete discussion are directed to Roman (1999).

As we prepare to turn our attention to more sophisticated relational databases, it is necessary that we emphasize that the use of flat-file databases with spreadsheet or word processing software is not a straw man solution. There are many situations in which such approaches make good sense, such as when the database is small and manageable, or if its use will be relatively temporary. It is when database management requirements grow more complex that these tools are found wanting. In such cases, significant power can be harnessed with dedicated database-management software. With this in mind, lets take a look at RDBMSs and the functionality they offer. 

Feel the Power: Relational Database Management Systems

Our discussion of RDBMSs is divided into two parts. We begin with a review of the basic concepts underlying relational database design and the advantages offered by the relational database approach. Next, we describe basic functionality included in common off-the-shelf RDBMS systems. 

The Relational Database Design

Similar to a flat-file database, a relational database uses tables as containers for data. However, a relational database includes multiple tables of information, rather than the single table of a flat-file database. Typically, each table contains a set of fields focused on a specific subject. Lets consider this by taking our flat-file database example above and turning it into two tables. In Figure 2, the first table is the applicant table, showing information about the applicants. The second table is the school table.

 

Student_ID

LastName

FirstName

Address

School_ID

1

Cowher

Jill

12 Main St.

1

2

Lafleur

Gil

121 Oak St.

2

3

Kazor

Jane

341 Pine St

1

4

Smith

Henry

89 Pond St.

1

5

Bronte

Charles

71 Lake St.

2

6

Suzuki

Kenji

90 Oak St.

2

 

School_ID

SchoolName

SchoolDistrict

SchoolAddress

1

Bell High

Fairfax

23 School Place

2

Putnam High

Springfield

83 Spring Street


Figure 2.
Simple Relational Database.

Note that we have added some new fields to help work with the tables. We added a Student ID field to the student table. This ID becomes what is known as the primary key for that table since it uniquely identifies each record in the table. For the school table, the School ID field is the primary key. A School ID field has also been added to the student table so we can determine which school a student has attended. It is called a foreign key in this context because it refers to a primary key in a different table. 

Advantages of Relational Database Design

Relational databases achieve a number of powerful advantages by using additional tables to hold information that would otherwise be repeated across records. Lets just explore a few of these advantages to illustrate the utility of the relational database. First, having a single record for each high school means that all references to that high school refer to the same information. As a result, we would avoid the problem described above where data-entry errors result in minor variations in the high school name and frustrate efforts to generate meaningful summaries across high schools. A related advantage is that if we have to make changes to the high school informationsay Bell High changes school districtswe only have to make the change in one place and can rest assured that the change has been implemented universally. A final advantage we will mentionthough there are many moreis that the universe of information in the database is not limited to what is covered in the individual records. For example, with a relational database, we can represent all high schools in the region, rather than just those which existing applicants have attended. As a result, we can consider not only which schools are producing applicants but which schools arent.

Relational database design is only one side of RDBMS software. The other side is the functionality that supports working with the data. In the following section, we describe typical functions universal to RDBMS packages along with some advanced functionality unique to high-end packages. 

RDBMS Functionality

Queries

If youve had a conversation with someone who works with databases, its likely that you have heard the word query at some point. One way to think about a query is as a business question that can be answered through an analysis of the data in the RDBMS. You ask the database for a subset of its data that relate to the question. Queries are composed using a standard language called Structured Query Language (SQL). The acronym SQL is pronounced as if you were saying the word sequel. Currently, all major RDBMSs (e.g., Oracle, MS SQLServer, MS Access, etc.) use SQL as the language for writing queries and processing data. It is actually an easy language to learn and is very intuitive.

For example, suppose we are asked, Who did we interview from Putnam High? To find the answer, we would write a query that joins together the two tables in our database. Our query might look like the following SQL statement: 

SELECT applicant.lastname, applicant.firstname, school.schoolname
FROM applicant INNER JOIN school
ON applicant.school_id = school.school_id
WHERE ((school.schoolname)="Putnam High"); 

In plain English, it says that we are interested in seeing the applicants last name, first name, and school name (line 1). These are to be found in the applicant and school tables (line 2). We will join these two tables to get the appropriate information by matching records according to the school ID code found in the records of both tables (line 3). Remember, each applicant in the applicant table has a School_ID code representing the high school he or she attended, and this points back to the record associated with the matching School_ID in the school table. Finally, we are interested only in the records where the name of the school is Putnam High (line 4). The results of this query would return in a form similar to Table 1. 

Table 1.
Sample Query Results


Of course, we dont really need to show the school name, since we wrote the query to select only interviewees from Putnam High. We just included it in this example as a way of illustrating the query joining the two tables.

You dont have to be a programmer to write a query. With a minimal amount of instruction, one can begin writing his or her own queries very easily. For those who would like some assistance in the process, RDBMSs such as MS Access have a wizard that guides the user in query construction.

Once a query is written, it can be saved and run againyou can use the query to answer that question whenever necessary. We have found that once people learn the power of queries they start generating all sorts of questions that otherwise might have seemed impossible to consider when the data were confined to a filing cabinet. This is often called mining the data or extracting as much value as possible from the database. 

Reports

Though queries are an exciting idea on their own, the results of a query are typically not much to look at. To help make them presentable, RDBMSs also have the ability to create formatted reports that have the query results embedded within them. Hence, once you have set up your report and linked the applicable queries to it, you can run the report minutes before a meeting and present hot off the press results in a professionally formatted document. You can develop as many reports and queries as necessary to meet your business information reporting needs. 

Database Applications

Finally, a database can serve as what is sometimes called the back end of a software application. For example, when you are browsing through Amazon.com, what you see on your computer screen (i.e., the various Web pages) forms the front end of the Web site. All of the data on books, customers, and orders is contained in a database at the back end. For example, you use the front end of Amazons Web site to enter the name of a book you might like to buy and press a button to execute a search for the book. The front end takes this request, uses it to query the back-end database where the book titles are stored and then returns the result back to you via the front end (i.e., a new Web page). Purchasing a book and entering delivery information might take a number of such front-end/back-end interactions. A similar front-end/back-end approach is taken by nearly all database applications. 

Moving On

We hope that our discussion of databases has accomplished two goals. The first was to communicate how databases have a lot to offer as a way of storing and working with dataas it is, we have barely scratched the surface. The second goal was to generate excitement and encourage those readers who have not done so to consider creating databases to address their own data management needs. We would like to suggest the following resources for those readers interested in creating their own databases or learning more about database technology: 

Database Software

Microsoft Access

As noted above, most versions of Microsoft Office include Microsoft Access, which is a reasonably powerful desktop RDBMS. If you installed Office with all of the defaults, look for an icon featuring a maroon key. Thats Access! The online help offers good information on getting started. Additionally, we have other getting started resources below. 

Freeware Databases

MySQL and PostgreSQL are available on the Web as free downloads. You can find MySQL at http://www.mysql.com, and PostgreSQL at http://www.postgresql.org. Both are available for Windows and Linux, as well as a number of other operating systems. 

Resources on the Web

User Communities

Google Groups. There are several good sources of user communities on the Web. One is the Usenet newsgroups accessible through Google Groups (formerly deja.com), located at http://groups.google.com. The search terms mailing.database yielded a sizeable number of discussion boards.

Microsoft Communities. Microsoft has a set of boards discussing its database offerings (Access and SQL Server). These can be found by starting at http://communities2.microsoft.com/home/default.aspx.

Yahoo Groups. Yahoo has a number of groups associated with a variety of database topics. To locate them, start at http://groups.yahoo.com and search for SQL, Microsoft Access, or other database-related terms. 

Web Sites

A vast number of Web sites are devoted to database-related topics. Here are a few good starting points:

  • http://www.sqlcourse.com A free online tutorial on writing SQL queries. Best of all, it includes a live SQL interpreter, so you can practice what you learn. Very impressive.
  • http://www.mvps.org/access/ The home of the Microsoft Access MVPs, many of whom answer user questions on the Microsoft discussion boards described above. Lots to learn on this site, spanning everything from how-to articles to a compendium of bugs in the software.
  • http://www.microsoft.com/office/using/column06.asp A nice frequently-asked questions guide to databases and Access.

 

Reference

Roman, S. (1999). Access database design and programming. Sebastopol, CA: OReilly.

 

October 2002 Table of Contents | TIP Home | SIOP Home