Is a collection of information that is stored on a computer in a highly structured way?

Data vs. Information - Differences in Meaning

"The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning." —Statistician Nate Silver in the book The Signal and the Noise

Data are simply facts or figures — bits of information, but not information itself. When data are processed, interpreted, organized, structured or presented so as to make them meaningful or useful, they are called information. Information provides context for data.

For example, a list of dates (data) is meaningless without the information that makes the dates relevant, such as the fact that they are the dates of holidays.

"Data" and "information" are intricately tied together, whether one is recognizing them as two separate words or using them interchangeably, as is common today. Whether they are used interchangeably depends somewhat on the usage of "data" — its context and grammar.

Examples of Data and Information

  • The history of temperature readings all over the world for the past 100 years is data. If this data is organized and analyzed to find that global temperature is rising, then that is information.
  • The number of visitors to a website by country is an example of data. Finding out that traffic from the U.S. is increasing while that from Australia is decreasing is meaningful information.
  • Often data is required to back up a claim or conclusion (information) derived or deduced from it. For example, before a drug is approved by the FDA, the manufacturer must conduct clinical trials and present a lot of data to demonstrate that the drug is safe.
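The website-traffic example above can be sketched in a few lines of Python. This is only an illustration: the visit counts and the `trend` helper are hypothetical and were invented for this sketch.

```python
# Hypothetical monthly visitor counts per country (raw data).
visits = {
    "US": [1200, 1350, 1500],
    "Australia": [400, 380, 310],
}

def trend(counts):
    """Describe the change between the first and last reading."""
    if counts[-1] > counts[0]:
        return "increasing"
    if counts[-1] < counts[0]:
        return "decreasing"
    return "flat"

# Interpreting the data yields information.
information = {country: trend(counts) for country, counts in visits.items()}
print(information)
```

The dictionary of counts is the data; the statement that traffic from one country is increasing while another is decreasing is the information derived from it.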

"Misleading" Data

Because data needs to be interpreted and analyzed, it is quite possible — indeed, very probable — that it will be interpreted incorrectly. When this leads to erroneous conclusions, it is said that the data are misleading. Often this is the result of incomplete data or a lack of context. For example, your investment in a mutual fund may be up by 5% and you may conclude that the fund managers are doing a great job. However, this could be misleading if the major stock market indices are up by 12%. In this case, the fund has underperformed the market significantly.
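The arithmetic behind the mutual fund example can be worked through as a short sketch. The 5% and 12% figures come from the example above; the starting investment of 1,000 is an assumption added for illustration.

```python
fund_return = 0.05    # the fund is up 5% (from the example above)
market_return = 0.12  # the major indices are up 12%

# Difference in percentage points: negative means underperformance.
excess_return = fund_return - market_return

# Growth of a hypothetical 1,000 investment under each return.
start = 1000
fund_value = start * (1 + fund_return)
market_value = start * (1 + market_return)

print(f"Excess return: {excess_return:+.0%}")
print(f"Fund: {fund_value:.2f}, market: {market_value:.2f}")
```

Seen alone, the 5% gain looks good; compared with the market benchmark, the same figure shows the fund trailing by about seven percentage points.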

Etymology

"Data" comes from the Latin word datum, which originally meant "something given"; "data" is its plural form. Its early usage in English dates back to the 1600s.

"Information" is an older word that dates back to the 1300s and has Old French and Middle English origins. It has always referred to "the act of informing," usually in regard to education, instruction, or other knowledge communication.

Grammar and Usage

While "information" is a mass or uncountable noun that takes a singular verb, "data" is technically a plural noun that deserves a plural verb (e.g., The data are ready.). The singular form of "data" is datum — meaning "one fact" — a word which has mostly fallen out of common use but is still widely recognized by many style guides (e.g., The datum proves her point.).

In common usage that is less likely to recognize datum, "data" has become a mass noun in many cases and takes on a singular verb (e.g., The data is ready.). When this happens, it is very easy for "data" and "information" to be used interchangeably (e.g., The information is ready.).

Main Body

Adrienne Watt

The way in which computers manage data has come a long way over the last few decades. Today’s users take for granted the many benefits found in a database system. However, it wasn’t that long ago that computers relied on a much less elegant and more costly approach to data management called the file-based system.

File-based System

One way to keep information on a computer is to store it in permanent files. A company system has a number of application programs; each of them is designed to manipulate data files. These application programs have been written at the request of the users in the organization. New applications are added to the system as the need arises. The system just described is called the file-based system.

Consider a traditional banking system that uses the file-based system to manage the organization’s data, as shown in Figure 1.1. As we can see, there are different departments in the bank. Each has its own applications that manage and manipulate different data files. For banking systems, the programs may be used to debit or credit an account, find the balance of an account, add a new mortgage loan and generate monthly statements.

Figure 1.1. Example of a file-based system used by banks to manage data.

Disadvantages of the file-based approach

Using the file-based system to keep organizational information has a number of disadvantages. Listed below are five examples.

Data redundancy

Often, within an organization, files and applications are created by different programmers from various departments over long periods of time. This can lead to data redundancy, a situation that occurs in a database when a field needs to be updated in more than one table. This practice can lead to several problems such as:

  • Inconsistency in data format
  • The same information being kept in several different places (files)
  • Data inconsistency, a situation where various copies of the same data are conflicting
  • Wasted storage space and duplicated effort
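As a minimal sketch of how redundancy leads to inconsistency, the Python snippet below models two departmental files as dictionaries. The file names, customer ID, addresses and the `find_inconsistencies` helper are all hypothetical.

```python
# Two departments keep their own copy of the same customer record.
# All names and values are hypothetical.
loans_file = {"C001": {"name": "A. Smith", "address": "12 Oak St"}}
savings_file = {"C001": {"name": "A. Smith", "address": "12 Oak St"}}

# The customer moves, but only the loans department is told.
loans_file["C001"]["address"] = "98 Pine Ave"

def find_inconsistencies(file_a, file_b):
    """Return the IDs whose records differ between the two files."""
    shared_ids = file_a.keys() & file_b.keys()
    return sorted(cid for cid in shared_ids if file_a[cid] != file_b[cid])

print(find_inconsistencies(loans_file, savings_file))
```

Because the address is stored twice, one update is not enough, and the two copies now disagree about where the customer lives.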

Data isolation

Data isolation is a property that determines when and how changes made by one operation become visible to other concurrent users and systems. This issue arises when data is accessed concurrently. It is a problem because:

  • It is difficult for new applications to retrieve the appropriate data, which might be stored in various files.

Integrity problems

Problems with data integrity are another disadvantage of using a file-based system. Data integrity refers to the maintenance and assurance that the data in a database are correct and consistent. Factors to consider when addressing this issue are:

  • Data values must satisfy certain consistency constraints that are specified in the application programs.
  • It is difficult to make changes to the application programs in order to enforce new constraints.
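For contrast, a database can hold a consistency constraint alongside the data, so that it does not have to be repeated in every application program. The sketch below uses Python's built-in `sqlite3` module as a stand-in for any database system; the `account` table, the rule that a balance may not be negative, and the `try_withdraw` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "balance may not be negative" is declared once, with the data.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100.0)")
conn.commit()

def try_withdraw(amount):
    """Attempt a withdrawal; return True if the database accepted it."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the change, so undo it.
        conn.rollback()
        return False

accepted = try_withdraw(30)    # leaves a balance of 70
rejected = try_withdraw(500)   # would make the balance negative
print(accepted, rejected)
```

In a file-based system, each program that touches the account file would need its own copy of this check, which is why adding or changing a constraint is difficult.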

Security problems

Security can be a problem with a file-based approach because:

  • There are constraints regarding access privileges.
  • Application requirements are added to the system in an ad-hoc manner so it is difficult to enforce constraints.

Concurrent access

Concurrency is the ability of the database to allow multiple users access to the same record without adversely affecting transaction processing. In a file-based system, concurrency must be managed, or prevented, by the application programs. Typically, in a file-based system, when an application opens a file, that file is locked. This means that no one else has access to the file at the same time.

In database systems, concurrency is managed, allowing multiple users to access the same record. This is an important difference between database and file-based systems.
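To see why unmanaged concurrent access is risky, consider the simplified sketch below. It runs in a single process and only simulates two users whose reads and writes interleave, so it is an illustration of the "lost update" problem rather than a real concurrency test. The account name, amounts and table layout are hypothetical, and Python's `sqlite3` module stands in for a database system.

```python
import sqlite3

# Unmanaged access: both users read the balance before either writes back.
balance_file = {"acct": 100}
seen_by_user_a = balance_file["acct"]
seen_by_user_b = balance_file["acct"]
balance_file["acct"] = seen_by_user_a + 50   # user A deposits 50
balance_file["acct"] = seen_by_user_b + 25   # user B overwrites A's deposit

# Managed access: each deposit is a single UPDATE applied by the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('acct', 100)")
for deposit in (50, 25):
    conn.execute(
        "UPDATE account SET balance = balance + ? WHERE id = 'acct'", (deposit,)
    )
conn.commit()
db_balance = conn.execute("SELECT balance FROM account").fetchone()[0]

print(balance_file["acct"], db_balance)
```

In the first half, user A's deposit disappears because user B wrote back a value based on a stale read. Locking the whole file prevents this but also blocks everyone else, which is the trade-off described above.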

Database Approach

The difficulties that arise from using the file-based system have prompted the development of a new approach to managing large amounts of organizational information, called the database approach.

Databases and database technology play an important role in most areas where computers are used, including business, education and medicine. To understand the fundamentals of database systems, we will start by introducing some basic concepts in this area.

Role of databases in business

Everybody uses a database in some way, even if it is just to store information about their friends and family. That data might be written down or stored in a computer by using a word-processing program or it could be saved in a spreadsheet. However, the best way to store data is by using database management software. This is a powerful software tool that allows you to store, manipulate and retrieve data in a variety of different ways.

Most companies keep track of customer information by storing it in a database. This data may include customers, employees, products, orders or anything else that assists the business with its operations.
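The three operations mentioned above (store, manipulate and retrieve) can be sketched with Python's built-in `sqlite3` module, used here as one example of database management software. The `customer` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Store: create a table and insert rows (hypothetical customers).
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Ana", "Vancouver"), ("Ben", "Toronto"), ("Cy", "Vancouver")],
)

# Manipulate: change an existing row.
conn.execute("UPDATE customer SET city = ? WHERE name = ?", ("Calgary", "Ben"))
conn.commit()

# Retrieve: ask a question of the data.
rows = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY name", ("Vancouver",)
).fetchall()
names = [name for (name,) in rows]
print(names)
```

The same question asked of a word-processing document or a pile of papers would mean reading every entry by hand.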

The meaning of data

Data are factual information such as measurements or statistics about objects and concepts. We use data for discussions or as part of a calculation. Data can describe a person, a place, an event, an action or any one of a number of things. A single fact is an element of data, or a data element.

If data are information and information is what we are in the business of working with, you can start to see where you might be storing it. Data can be stored in:

  • Filing cabinets
  • Spreadsheets
  • Folders
  • Ledgers
  • Lists
  • Piles of papers on your desk

All of these items store information, and so too does a database. Because of the mechanical nature of databases, they have terrific power to manage and process the information they hold. This can make the information they house much more useful for your work.

With this understanding of data, we can start to see how a tool with the capacity to store and organize a collection of data, search it rapidly, and retrieve and process it might make a difference to how we can use data. This book and the chapters that follow are all about managing information.

Key Terms

concurrency: the ability of the database to allow multiple users access to the same record without adversely affecting transaction processing

data element: a single fact or piece of information

data inconsistency: a situation where various copies of the same data are conflicting

data isolation: a property that determines when and how changes made by one operation become visible to other concurrent users and systems

data integrity: refers to the maintenance and assurance that the data in a database are correct and consistent

data redundancy: a situation that occurs in a database when a field needs to be updated in more than one table

database approach: allows the management of large amounts of organizational information

database management software: a powerful software tool that allows you to store, manipulate and retrieve data in a variety of ways

file-based system: an application program designed to manipulate data files

Exercises

  1. Discuss each of the following terms:
    1. data
    2. field
    3. record
    4. file
  2. What is data redundancy?
  3. Discuss the disadvantages of file-based systems.
  4. Explain the difference between data and information.
  5. Use Figure 1.2 (below) to answer the following questions.
    1. In the table, how many records does the file contain?
    2. How many fields are there per record?
    3. What problem would you encounter if you wanted to produce a listing by city?
    4. How would you solve this problem by altering the file structure?
Figure 1.2. Table for exercise #5, by A. Watt.

Attribution

This chapter of Database Design (including its images, unless otherwise noted) is a derivative copy of Database System Concepts by Nguyen Kim Anh, licensed under the Creative Commons Attribution 3.0 License.

The following material was written by Adrienne Watt:

  1. Introduction
  2. Key Terms
  3. Exercises

What is a database and information system?

Databases are organized collections of data, typically made up of schemas, tables, queries, reports and views. They are usually organized so that data can be processed and information retrieved quickly.
You can describe a database as an organised collection of related information (or data) that is stored in a computer system.

What is a database used for?

Databases are used for storing, maintaining and accessing any sort of data. They collect information on people, places or things. That information is gathered in one place so that it can be observed and analyzed. Databases can be thought of as an organized collection of information.

What is data in a database?

Data are observations or measurements (unprocessed or processed) represented as text, numbers, or multimedia. A dataset is a structured collection of data generally associated with a unique body of work. A database is an organized collection of data stored as multiple datasets.