What is the correct term for files copied to a secondary location for preservation purposes?

Applications in Data-Intensive Computing

Anuj R. Shah, ... Nino Zuljevic, in Advances in Computers, 2010

2.3.1.1 Data Warehouses

Commercial enterprises are voracious users of data warehousing technologies. Mainstream database technology vendors supply these technologies to provide archival storage of business transactions for business-analysis purposes. As enterprises capture and store more data, data warehouses have grown to petabyte size. Best known is Wal-Mart's data warehouse, which over the span of a decade has grown to store more than a petabyte of data [58], fueled by daily data from 800 million transactions generated by its 30 million customers.

The data warehousing approach also finds traction in science. The Sloan Digital Sky Survey (SDSS) (http://cas.sdss.org/dr6/en/) SkyServer stores the results of processing raw astronomical data from the SDSS telescope in a data warehouse for subsequent data mining by astronomers. While the SkyServer data warehouse currently stores only terabytes of data, it has been suggested that its fundamental design principles can be leveraged in the design [59] of the data warehouse for the Large Synoptic Survey Telescope (www.lsst.org), which is to commence data production in 2012. The telescope will produce 6 petabytes of raw data each year, requiring the data warehouse to grow at an expected rate of 300 TB/year.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/S006524581079001X

Computer Data Processing Hardware Architecture

Paul J. Fortier, Howard E. Michel, in Computer Systems Performance Evaluation and Prediction, 2003

2.5.3 Archival storage devices

Even with all of the disk and tape technology available, not all of the data required by a computer system can be kept on line. To keep data that are only occasionally needed, we require archival storage devices. Archival storage devices typically have removable media. If you have used a multimedia system, a personal computer, or a workstation, you have interacted with a form of archival device: the removable disk, compact disk, or tape cartridge. These represent the most visible form of archival storage device. Data are loaded into the system as needed and removed when no longer required. The most recently developed archival storage device, the CD read/write drive, has begun to blur the distinction between archival and on-line storage. Many systems use CD drives as enhanced storage for long-term application memory, and some even use them as the primary on-line storage.

Other, more elaborate, archival systems have been developed that use a combination of mechanical and electrical systems to port media on line and off line. These are similar to compact disk magazines and resemble jukeboxes. When a particular data item is needed, its physical storage location is found, and the medium is brought on line into the active storage hierarchy, where the archived data can then be accessed. Again, this is a useful feature when we are dealing with a very large database.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9781555582609500023

Optical Information Processing

Mir Mojtaba Mirsalehi, in Encyclopedia of Physical Science and Technology (Third Edition), 2003

VI.C.3.a Optical disks

Today, magnetic hard disks and floppy disks are widely used in electronic computers. A relatively new medium for data storage is the optical disk, on which information is recorded and read by a laser beam. The main advantage of optical disks is their high storage capacity. A small 3.5- or 5.5-in. optical disk is capable of storing 30 to 200 Mbytes of information.

Optical disks are of two types: read-only disks and read–write (erasable) disks. The first type is useful for archival storage and for storing data or instructions that do not need to be changed. In the second type, the recorded data can be erased or changed; this kind of memory is needed for temporary data storage, such as in digital computing. Some of the materials used for nonerasable disks are tellurium, silver halide, photoresists, and photopolymers. Among the candidate materials for erasable disks, three groups are the most promising: magneto-optic materials, phase-change materials, and thermoplastic materials.

Optical disks are now used in some models of personal computers, and they are expected to become more common. Optical disks have also been used for archival storage. Two such systems were developed and installed by RCA for NASA and the Rome Air Development Center in 1985. These are optical disk “jukebox” mass storage systems that provide direct access to any part of 10¹³ bits of stored data within 6 sec. Each system has a cartridge storage module that contains 125 optical disks, each with a storage capacity of 7.8 × 10¹⁰ bits. This storage size is beyond the capacities currently available with other technologies.
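As a rough check (my arithmetic, not from the source), the quoted module capacity follows directly from the number of disks and the per-disk capacity:

```python
# Back-of-the-envelope check of the jukebox capacity quoted above.
disks_per_module = 125        # optical disks in one cartridge storage module
bits_per_disk = 7.8e10        # storage capacity of a single disk, in bits

total_bits = disks_per_module * bits_per_disk
print(f"{total_bits:.2e} bits")            # ~9.8e12, i.e., on the order of 10^13 bits
print(f"{total_bits / 8 / 1e12:.2f} TB")   # ~1.2 TB in present-day units
```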

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B0122274105005251

Operating Systems

Thomas Sterling, ... Maciej Brodowicz, in High Performance Computing, 2018

11.2.3 Memory Management

By some measures, including cost, an HPC system is mostly data storage. Program data must reside within the memory system, which from the architecture perspective is a multilevel hierarchy: registers, buffers, and three layers of cache; main memory, which may be distributed among all nodes; secondary storage, which is still primarily hard disks but increasingly includes nonvolatile semiconductor storage technology; and tertiary storage, which employs tape cassettes and drives for archival storage. The tradeoff across these layers is between speed of access and cost of capacity, with reliability and energy also being important. The OS is responsible for allocating data to memory resources and migrating it between levels. Memory management is also responsible for address translation between the virtual address blocks of program data, called pages, and blocks of physical storage, called frames. The OS manages the page table that maps page numbers to frame numbers. If a particular page is not in memory, that is, a page fault occurs, the OS has to bring the page in from secondary storage into a main-memory frame and update the page table accordingly before the related process can continue.
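A minimal sketch of this translation path, with hypothetical structures and a toy backing store standing in for secondary storage (an illustration of the mechanism, not how any particular OS implements it):

```python
PAGE_SIZE = 4096

page_table = {0: 7, 1: None, 2: 3}           # virtual page number -> frame number (None = not resident)
backing_store = {1: b"\x00" * PAGE_SIZE}     # toy stand-in for the page's copy on secondary storage
free_frames = [12, 13]
memory = {}                                  # frame number -> page contents

def translate(virtual_address: int) -> int:
    """Translate a virtual address to a physical one, handling a page fault if needed."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:                        # page fault
        frame = free_frames.pop()            # (a real OS might first have to evict a resident page)
        memory[frame] = backing_store[page]  # bring the page in from secondary storage
        page_table[page] = frame             # update the page table
    return frame * PAGE_SIZE + offset

print(hex(translate(1 * PAGE_SIZE + 0x10)))  # faults on page 1, then resolves to a physical address
```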

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780124201583000113

Cloud-Based Smart-Facilities Management

S. Majumdar, in Internet of Things, 2016

17.3 A cloud-based architecture for smart-facility management

Fig. 17.1 displays a cloud-based smart-facilities management system. The smart facility in the diagram, a bridge, is only an example and can be replaced by other smart facilities.


Figure 17.1. Cloud-Based Smart-Facility Management

Although Fig. 17.1 shows multiple bridges equipped with wireless sensors, a facility-management system can be dedicated to a single facility as well. As opposed to a data-center cloud, which typically handles compute and storage resources, this heterogeneous cloud unifies a diverse set of resources that may include compute and storage servers, software tools for data analysis, archival storage systems, and databases holding various information about the facility, including its maintenance history and data repositories. The system administrator, the bridge engineer, and the bridge operator are the personnel involved in overseeing the management of the smart facility. As shown in Fig. 17.1, multiple levels of networks may be used. A backbone network typically spans multiple geographic regions, such as provinces within a country. Resources or personnel, for example, may be connected to the backbone network through their respective local access networks.

A layered architecture for the system used for managing a smart facility is presented in Fig. 17.2. The network layer provides the necessary support for communication among the various system resources, which include both hardware resources, such as computing and storage devices, and data-analysis software. Messages supported by the messaging layer are typically used for intercommunication among components that include these resources as well as the middleware components discussed in the following section. The messaging layer uses the underlying security layer to ensure that communication is performed in a secure manner. A broad range of security mechanisms may be provided, from virtual private networks to data encryption. The middleware layer and the interface set provide various services for the proper functioning of the smart facility; these are discussed next.


Figure 17.2. System Architecture
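The layering can be made concrete with a small sketch; the function names below are hypothetical, since the chapter describes the layers only at the architectural level, and the security layer is reduced to an integrity tag where a real deployment would use a VPN or encryption as noted above:

```python
import hashlib
import json

def security_layer(payload: bytes) -> bytes:
    """Stand-in for the security layer: attach an integrity tag to the payload."""
    tag = hashlib.sha256(payload).hexdigest().encode()
    return tag + b"|" + payload

def messaging_layer(message: dict) -> bytes:
    """Serialize a component message and pass it down to the security layer."""
    return security_layer(json.dumps(message).encode())

def network_layer(frame: bytes) -> None:
    """Stand-in for the network layer: a real system would route this over the
    backbone and local access networks; here we only report the frame size."""
    print(f"sending {len(frame)} bytes")

# A middleware component publishing a sensor reading from one of the bridges:
reading = {"facility": "bridge-1", "sensor": "strain-07", "value": 0.83}
network_layer(messaging_layer(reading))
```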

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780128053959000174

Software Detection and Recovery

In Architecture Design for Soft Errors, 2008

8.6.2 Log-Based Backward Error Recovery in Database Systems

Unlike the somewhat generic forward error recovery technique described in the previous section, the error recovery technique in this section is customized to a database program. A database is an application that stores information or data in a systematic way. It allows queries that can search, sort, read, write, update, or perform other operations on the data stored in the database. Databases form a very important class of application across the globe today and are used by almost every major corporation. Companies store information, such as payroll, finances, employee information, in such databases. Consequently, databases often become mission-critical applications for many corporations.

To avoid data loss, databases have traditionally used their own error recovery schemes. Many companies, such as Hewlett-Packard's Tandem division, sold fault-tolerant computers with their own custom databases to enhance the level of reliability seen by a customer. Databases can get corrupted due to both a hardware fault and a software malfunction. The error recovery schemes for databases are constructed in such a way that they can withstand failures in almost any part of the computer system, including disks. This includes recovering from transient faults in processor chips, chipsets, disk controllers, or any other silicon in the system itself.

To guard against data corruption, commercially available commodity databases typically implement their own error recovery scheme in software through the use of a “log.” Database logs typically contain the history of all online sessions, tables, and contexts used in the database. These are captured as a sequence of log records that can be used to restart the database from a consistent state and recreate the appropriate state the database should be in. The log is typically duplicated to protect it against faults.

The rest of this subsection briefly describes how a database log is structured and managed. For more details on databases and database logs, readers are referred to Gray and Reuter's book on transaction processing [3]. There are three key components to consider for a log: sequential files that implement the log, the log anchor, and the log manager. Logs are analogous to hardware implementations of history buffers (see Incremental Checkpointing Using a History Buffer, p. 278, Chapter 7), but the differences between the two are interesting to note.

Log Files

A log consists of multiple sequential files that contain log records (Figure 8.12). Each log file is usually duplexed, possibly on different disks, to avoid a single point of failure. The most recent sequential files that contain the log are kept online; the rest are moved to archival storage. The duplicate copies of each physical file in the log are allocated together, and as the log starts to fill up, two more physical files are allocated to continue it. Because the log consists of several duplicated files, its name space must be managed with care.


FIGURE 8.12. Structure of a database log.

Log Anchor

The log anchor encapsulates the name space of the log files. The redundant log files use standard file names ending with specific patterns, such as LOGA00000000 and LOGB00000000, which allow easy generation and tracking of the file names. The log anchor contains the prefixes of these filenames (e.g., LOGA and LOGB) and the index of the current log file (to indicate the sequence number of the log file among the successive log files that are created).

The log anchor typically has other fields as well, such as the log sequence numbers (LSNs) of various log records and a log lock. An LSN is the unique index of a log record within the log; it usually consists of the record's file number and the record's relative byte offset within that file. An LSN can be cast as an integer that increases monotonically. This monotonicity property is important because it ensures that the log preserves the relative timeline of the records as they are created. The log anchor typically maintains the LSN of the most recently written record, the LSN of the next record, and so on.
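A minimal sketch of one way an LSN could be packed into a single monotonically increasing integer from a file number and byte offset (an illustration of the idea, not the layout used by any particular database):

```python
OFFSET_BITS = 32  # assumed width reserved for the byte offset within one log file

def make_lsn(file_number: int, byte_offset: int) -> int:
    """Pack (file number, byte offset) into one integer LSN. Files are numbered
    sequentially and offsets only grow within a file, so LSNs built this way
    increase monotonically, preserving the timeline of the records."""
    return (file_number << OFFSET_BITS) | byte_offset

def split_lsn(lsn: int) -> tuple:
    """Recover the file number and relative byte offset from an LSN."""
    return lsn >> OFFSET_BITS, lsn & ((1 << OFFSET_BITS) - 1)

assert make_lsn(3, 128) > make_lsn(2, 4_000_000_000)   # a record in a later file always sorts later
assert split_lsn(make_lsn(3, 128)) == (3, 128)
```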

The log anchor also controls concurrent accesses to the end of the log using an exclusive semaphore called the log lock. Updates to sequential files happen at the end of the file. Hence, accesses to the end of the log by multiple processes must be controlled with a lock or a semaphore (or a similar synchronization mechanism). Fortunately, updates happen only to the end of the log since intermediate records are never updated once they are written. Access to this log lock could become a performance bottleneck. Hence, the log lock must be implemented carefully.

Log Manager

The log manager is a process or daemon that manages the log files and provides access to the log anchor and log records. In the absence of any error, the log manager simply writes log records. However, when an application, a system, or a database transaction reports an error, the log is used to clean up the state. To return each logged object to its most recent consistent state, a database transaction manager process would typically read the log records via the log manager, “undo” the effect of any fault by traversing backward in time, and then “redo” the operations to bring the database back to its most recent consistent state.
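A minimal sketch of the backward-undo/forward-redo idea, using a toy in-memory log with a hypothetical record format (real recovery protocols are considerably more involved):

```python
# Each toy log record notes the object touched, its old value, and its new value.
log = [
    {"lsn": 1, "obj": "A", "old": 10, "new": 15},
    {"lsn": 2, "obj": "B", "old": 7,  "new": 9},
    {"lsn": 3, "obj": "A", "old": 15, "new": 20},
]
state = {"A": 999, "B": 9}       # state observed after a fault; object A is corrupted

# Undo: traverse the log backward in time, restoring the old values.
for rec in reversed(log):
    state[rec["obj"]] = rec["old"]

# Redo: replay the logged operations forward to reach the most recent consistent state.
for rec in log:
    state[rec["obj"]] = rec["new"]

print(state)                     # {'A': 20, 'B': 9}
```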

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780123695291500100

Digitization of text and still images

Iris Xie PhD, Krystyna K. Matusiak PhD, in Discover Digital Libraries, 2016

General digitization guidelines

The purpose of guidelines is to ensure the creation of high-quality, sustainable digital objects that support current and intended use and are interoperable and consistent across collections and institutions. The guidelines that have emerged in the cultural heritage community, especially in the United States, are advisory rather than prescriptive in nature. They offer a range of general and technical recommendations but do not constitute a set of formal standards. The Framework of Guidance for Building Good Digital Collections is described as a recommended “best practice” (NISO Framework Working Group, 2007). The most recent guidelines issued by a division of the American Library Association stress that “at this point there is no official standard for digitization, but institutions are discussing how they can collaborate and share digitized content” (ALCTS, 2013, p. 2). This approach offers individual institutions some flexibility but has also resulted in a plethora of published guides and tutorials. Conway (2008) examines 17 guides to best practices in digitizing visual resources and concludes that the lack of standardization has implications for the quality and integrity of digitized objects and may be a hindrance to wider adoption of the guidelines by small and midsize cultural institutions.

The development of best practice guides was spurred by the early adopters of digital technology, such as Cornell University Libraries, the Library of Congress, and the US National Archives and Records Administration (NARA), and by organizations such as the Digital Library Federation (DLF), the International Federation of Library Associations and Institutions (IFLA), and the Research Libraries Group (RLG). Conway (2008) also recognizes the seminal work of imaging specialists and pioneers of digitization, including Michael Ester (1996), Anne Kenney and Steve Chapman (1996), Franziska Frey and James Reilly (1999), Steve Puglia (2000), and Steve Puglia et al. (2004). Their work on imaging concepts and specifications provided the necessary theoretical and technical foundations for developing guides to best practices. The tutorial Moving Theory into Practice developed at the Cornell University Libraries has contributed significantly to the training of librarians and archivists in the concepts and procedures of digitization (Kenney et al., 2000). In addition to the guidelines developed by the Library of Congress and NARA, major collaborative digitization initiatives, such as the California Digital Library (2011) and the Colorado Digitization Program (BCR, 2008), issued their own sets of recommendations. Those guides to imaging best practices have in turn influenced the development of guidelines at the state and institutional levels (see Appendix A for an annotated bibliography of selected guides).

The majority of published tutorials and guides to best practice focus on static textual and visual resources, but the underlying principles can also be applied to time-based media. The guidelines emphasize digitization at the highest quality to capture informational content and attributes of analog source materials in order to create accurate and authentic digital representations. Recently released guidelines build upon foundational concepts but offer higher technical specifications that reflect the current digital environment. The approach that has emerged is to offer minimum capture recommendations for a variety of static and time-based media with an understanding that unique characteristics of source materials may require variations in the specifications. A set of accepted minimums is, however, recommended to create sustainable digitized content (ALCTS, 2013).

The following list provides a summary of the general digitization principles presented in a number of currently available guides (ALCTS, 2013; BCR, 2008; FADGI, 2010; Yale University, 2010):

Digitize at the highest resolution appropriate to the nature of the source material

Use standard targets for measuring and adjusting the capture metric of a scanner or digital camera. Grayscale or color targets provide an internal reference within the image for linear scale and color information.

Create and preserve master files that can be used to produce derivative files and serve a variety of current and future use needs

Create digital objects that are accessible and interoperable across collections and institutions

Ensure a consistent and high-level quality of digitized objects

Digitize at an appropriate level of quality to avoid recapture and rehandling of the source materials

Digitize an original or first generation of the source material

Create meaningful metadata for digitized objects

Provide archival storage and address digital preservation of digitized objects

The general guidelines assume a use-neutral approach that has been strongly recommended since the early days of digitization projects (Besser, 2003; Ester, 1996; Kenney, 2000). It implies that a source item is digitized once and at the highest level of quality affordable to meet the needs not only of an immediate project but also of a variety of future uses. The goal of this approach is to create high-quality digital representations and to avoid redigitizing in the future. The use-neutral approach is an important component of digitization best practices, as it addresses not only the current needs but also, as Besser (2003) emphasizes, “all potential future purposes” (p. 43). It includes the notion of digital master files (sometimes referred to as archival masters) and derivatives. Ester (1996), who introduced the concepts of digital archival and derivative images, notes “an archival image has a very straightforward purpose: safeguarding the long-term value of images and the investment in acquiring them” (p. 11). In addition to the difference in purpose and use, digital masters and derivatives also differ in regard to file attributes such as size, compression, dimensions, and format.

Digital masters are created as a direct result of the digital capture process and should represent the essential attributes and information of the original material. Digital masters are supposed to be “the highest quality files available” (Besser, 2003, p. 3). They should not be edited or processed for any specific output. Because the process of creating digital masters usually results in large file sizes, digital masters are not used for online display. In fact, many archival formats such as TIFF are not supported by major web browsers. Their primary function is to serve as a long-term archival file and as a source for derivative files. Digital masters are stored in digital repositories for long-term preservation. General recommendations for digital master file creation include:

Digitize at the highest quality affordable

Save as an uncompressed file

Use standard, nonproprietary file formats, such as TIFF for static media (text or still images) or WAV for audio

Do not save any enhancements in an archival copy

Use an established file-naming convention

Derivatives are created from digital master files for specific uses, including presentation in digital collections, print reproductions, and multimedia presentations. General recommendations for derivative files include the following (a brief sketch follows the list):

Reduce the file size so it can load quickly and be transferred over networks

Use standard formats with lossy compression such as JPEG

Use standard formats supported by major web browsers
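A minimal sketch of producing an access derivative from a TIFF master along these lines, assuming the Pillow imaging library is available (the guidelines do not prescribe any particular tool, and the paths and sizes are illustrative):

```python
from PIL import Image  # Pillow

def make_derivative(master_path: str, derivative_path: str, max_px: int = 1200) -> None:
    """Create a smaller JPEG access derivative from an uncompressed TIFF master;
    the master file itself is never modified."""
    with Image.open(master_path) as img:
        img = img.convert("RGB")                       # JPEG does not support an alpha channel
        img.thumbnail((max_px, max_px))                # reduce dimensions for fast loading
        img.save(derivative_path, "JPEG", quality=85)  # lossy compression for web delivery

make_derivative("aa000001.tif", "aa000001.jpg")
```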

Table 3.1 provides a summary of formats recommended for digital masters and derivatives based on analog source type. File format is an essential component, as it provides an internal structure and a “container” for digitized content. Unlike physical objects, digital files do not exist in an independent material form. Digital data is stored in file formats and requires hardware and software to be rendered. The Sustainability of Digital Formats site at the Library of Congress provides a working definition of formats as “packages of information that can be stored as data files or sent via network as data streams (also known as bitstreams, byte streams)” (Library of Congress, 2013).

Table 3.1. Recommended File Formats for Digital Masters and Derivative Files

Analog Material                                  | Digital Masters | Derivatives
Text                                             | TIFF            | JPEG, PDF
Photographic images (prints, negatives, slides)  | TIFF            | JPEG, JPEG 2000
Audio recordings                                 | WAV/BWF         | MP3
Moving image (video, film)                       | JPEG 2000/MXF   | MPEG-4 (MP4)

File formats vary in their functionality and attributes. The master file format needs to be platform independent and have a number of attributes, such as openness, robustness, and extendibility, to support the rich data captured during the conversion and to ensure its persistence over time as technology changes (Frey, 2000a). The selection of an appropriate format has implications for access across platforms and transfer over networks as well as storage and long-term preservation. The Framework of Guidance for Building Good Digital Collections states as one of its principles: “a good object exists in a format that supports its intended current and future use” (NISO Framework Working Group, 2007, p. 26). The section of this chapter on technical factors provides an overview of the recommended formats for static media, including TIFF, JPEG, JPEG 2000, PDF, and PNG. Audio and moving image formats are discussed in more detail in Chapter 4.

General guidelines also include recommendations for establishing a file-naming convention. File names for digital masters and derivatives need to be determined before the digital capture process begins and should preferably follow a convention adopted by the parent institution or department. Digital files should be well organized and named consistently to ensure easy identification and access. Systematic file naming helps not only to manage the project but also to ensure system compatibility and interoperability. File names can be either nondescriptive or meaningful. Both approaches are valid, but each has its pros and cons (Frey, 2000a; Zhang and Gourley, 2009). Selecting a file-naming convention for digitization requires long-term thinking and a good understanding of the scope of the project and/or the size of the original collection. File-naming recommendations include the following (a small example follows the list):

Assign unique and consistent names

Use alphanumeric characters—lowercase letters and numbers 0 through 9

Avoid special characters, spaces, and tabs

Include institutional IDs (if available)

Number files sequentially using leading zeros

Use a valid file extension, such as .tif, .jpg, or .pdf

Limit file names to 31 characters, including the three-character extension; or if possible, use 8.3 convention (8 characters plus three-character extension)—for example, aa000001.tif
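A minimal sketch of a naming helper that follows the recommendations above (the prefix and the six-digit counter are illustrative choices, not requirements of any guideline):

```python
import re

def make_filename(prefix: str, number: int, extension: str = "tif") -> str:
    """Build a name such as 'aa000001.tif': lowercase alphanumeric prefix,
    sequential number with leading zeros, and a valid three-character extension."""
    if not re.fullmatch(r"[a-z0-9]+", prefix):
        raise ValueError("use only lowercase letters and digits; no spaces or special characters")
    name = f"{prefix}{number:06d}.{extension}"
    if len(name) > 31:                                 # stay within the 31-character limit
        raise ValueError("file name exceeds 31 characters")
    return name

print(make_filename("aa", 1))                          # aa000001.tif (also fits the 8.3 convention)
```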

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B978012417112100003X

Digitization of audio and moving image collections

Iris Xie PhD, Krystyna K. Matusiak PhD, in Discover Digital Libraries, 2016

Audio digitization process

The process of converting analog sound recordings into a digital format and creating sustainable digital assets consists of multiple phases, including planning and selection, digital capture, processing, metadata creation, ingesting into a digital library management system, and digital preservation. Similar to the digitization of other materials, whether static or time based, the actual conversion is one of the many steps in the cycle of preservation reformatting. The general digitization steps and principles described in Chapter 3 apply to the conversion of sound recordings. Audio digitization also makes a distinction between master files and derivatives. Master files created as a direct result of audio capture serve as preservation copies and a source of smaller derivatives for online access. Audio obviously requires different conversion equipment than static media and raises unique challenges related to its time-based nature and preservation concerns.

Each digitization project comes with its own set of unique requirements and demands individualized planning with regard to technological requirements, selection and restoration of source items, staffing, cost, and archival storage (Mariner, 2014). Time is an important factor that needs to be taken into consideration during the planning phase. Unlike a relatively fast scanning of documents, digitization of time-based media involves playing an analog recording in real time. A 60-min cassette tape actually requires 60 min to convert to a digitized copy. The condition of the analog source items needs to be assessed during the selection process to identify the best copy and/or to address the conservation needs of degraded or damaged materials. The preparation of materials for reformatting requires restorative procedures, and depending on the level of degradation, may include cleaning, flattening discs, straightening twisted tapes, or rehousing them into new shells (Graves, 2014; IASA, 2009).

Digital capture represents the most critical part of the conversion process. As IASA guidelines emphasize, “optimal signal extraction from original carriers is the indispensable starting point of each digitization process” (IASA, 2009, Section 1.4). During the capture or, using IASA terminology, extraction process, an analog source recording is played using an appropriate playback device, such as a tape or record player. An analog sound wave is sampled through an analog-to-digital converter and the digital signal is recorded, processed in audio editing software, and stored, preferably in a file-based repository system. The files created as a result of the extraction process should represent high-quality masters and should be saved uncompressed in the standard preservation format. Audio digitization guidelines recommend creating high-quality master files for preservation purposes and derivatives for access (CARLI, 2013a; IASA, 2009). The IASA guidelines cite two major reasons for digitization at the highest quality possible: “firstly, the original carrier may deteriorate, and future replay may not achieve the same quality, or may in fact become impossible, and secondly, signal extraction is such a time-consuming effort that financial considerations call for an optimization at the first attempt” (IASA, 2009, Section 5.1.1). The converted files usually require some processing in order to adjust audio quality and remove signal distortion. The enhancements are limited by the quality of the original sound recording. As Weig et al. (2007) note, “regrettably, little can be done to correct analog recordings that are, for whatever reason, marred by distortion from the beginning” (p. 5).

Weig et al. (2007) describe the workflow of the audio conversion project conducted by the Louie B. Nunn Center for Oral History and University of Kentucky Libraries. The selection of oral history interviews on audiotapes and preparation of tapes were followed by analog-to-digital conversion and master file generation, quality enhancement, and the production of derivative files in the mp3 format. Master files and edited service files were archived, while derivatives with associated metadata and transcripts were uploaded to the server for online access. Metadata creation occurred at several points in the workflow.

Detailed metadata is essential for resource discovery, access, and retrieval in digital collections but is especially important in the case of sound recordings because audio content can’t be browsed visually or searched by keyword. Metadata records provide the only access points to the rich content of sound recordings. Access to oral history narratives and other voice recordings can be enhanced by adding transcripts. This approach, although time-consuming if transcripts have to be generated as part of a digitization project, provides an option of presenting a textual version of the recording alongside the playable audio. Transcripts can provide full-text searchability and often include time stamps to enable the user to select parts of a recording or to follow it alongside the text. As described by Weig et al. (2007), transcript and metadata creation represented an independent step in the digitization of oral histories at the Louie B. Nunn Center for Oral History, but metadata was also recorded at other steps in the conversion cycle. Fig. 4.5 demonstrates an example of a transcript presented along with an oral history recording from the Robert Penn Warren Civil Rights Oral History Project created at the Louie B. Nunn Center for Oral History. The excerpt comes from an interview with Martin Luther King, Jr conducted by Robert Penn Warren on Mar. 18, 1964. The interview is available at http://nyx.uky.edu/oh/render.php?cachefile=02OH108RPWCR03_King.xml.


Figure 4.5. Oral History Recording with a Transcript

Robert Penn Warren Civil Rights Oral History Project.

Access files with associated metadata are ingested into a digital library management system (DLMS) for online presentation. Online delivery of audio recordings also requires a streaming service. Many open source and proprietary DLMS, including Omeka, Collective Access, and CONTENTdm, include audio players and support standard access formats, such as mp3. Ingesting digitized audio files with associated metadata into a standard-compatible DLMS ensures interoperability and allows for integrating sound recordings with other digitized objects in digital library systems. Hosting options are available to cultural heritage institutions with limited digital library infrastructure and/or no access to streaming servers. Internet Archive provides a free platform to educational institutions and individuals and offers support for hosting and preserving audio and video files (Internet Archive, 2015). Audio and video objects represent a significant portion of the Internet Archives’ collections. The Avalon Media System is a new, open source system for managing and providing access to large collections of digital audio and video. It was developed by Indiana University Bloomington and Northwestern University with support from the National Leadership Grant from the Institute of Museum and Library Services. The Avalon Media System is freely available to libraries and archives and provides online access to their audiovisual collections for teaching, learning and research, and preservation and long-term archiving (Avalon Media System, 2015).

Digital preservation involves depositing master files into a trusted institutional or shared repository, the ongoing management of deposited audio files, and long-term preservation planning. As emphasized in the IASA guidelines, “preservation planning is the process of knowing the technical issues in the repository, identifying the future preservation direction (pathways), and determining when a preservation action, such as format migration, will need to be made” (IASA, 2009, Section 6.4.1.3). Archival storage of audio master files is a major concern because of the large size of individual files. For example, 1 h of audio digitized at 96 kHz and 24 bit with 2 channels produces a file of 1.93 GB (CDP, 2006). Digital repositories have to not only provide sufficient storage space for audio digitization but also supply capabilities for efficient transfer, management, and long-term preservation. Digital preservation is discussed in more detail in Chapter 9.
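The file size quoted above follows directly from the sampling parameters; a quick check (GB read as GiB, which matches the cited figure):

```python
sample_rate = 96_000      # samples per second, per channel
bit_depth = 24            # bits per sample
channels = 2
seconds = 3600            # 1 hour of audio

size_bytes = sample_rate * (bit_depth // 8) * channels * seconds
print(f"{size_bytes / 2**30:.2f} GiB")   # ~1.93, matching the CDP (2006) figure
```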

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780124171121000041

Tomography

Z.H. Cho, in Encyclopedia of Physical Science and Technology (Third Edition), 2003

II.B.3 Computer

The need for multidimensional image processing with more rapid reconstruction and higher image quality expedited the evolution of computer technology designed for imaging in general and for CT in particular. The amount of data memory and computation necessary for reconstruction is large, and the trend is toward even larger amounts with higher-resolution X-ray CT systems and with NMR CT. For example, to obtain a single slice image of 512 × 512 pixels in X-ray CT, the required number of projections from different view angles is ∼800, each with more than 512 sample points. The use of a large number of sampling points is designed primarily to reduce interpolation errors. For 12-bit resolution, a memory of more than 5 megabits is needed to hold the measurement data. Reconstruction, in which convolution operations are often performed with the fast Fourier transform, requires more than 40 million multiplications and 250 million additions. As an example, the number of required computations is summarized in Table II for both FB reconstruction and direct Fourier imaging.

TABLE II. Number of Computations Required for Reconstructing a Cross Section of a 512 × 512 Image by Convolution Backprojection and Direct Fourier Imaging

Convolution backprojection
  1024-point FFT^a           20,000 multiplications and additions
  Kernel multiply             2,000 multiplications and additions
  1024-point IFFT^b          20,000 multiplications and additions
  Interpolation/view         10,000 multiplications and additions
  Backprojection/view       262,000 additions
  Total                      42 × 10⁶ multiplications; 251 × 10⁶ additions

Direct Fourier imaging
  Phase correction/line       2,000 multiplications and additions
  512-point FFT^a            10,000 multiplications and additions
  Total                      11 × 10⁶ multiplications and additions

^a FFT, fast Fourier transform. ^b IFFT, inverse fast Fourier transform.
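The totals in Table II follow from the per-view counts and the ∼800 views mentioned above; a quick check (my arithmetic, not from the source), which also agrees with the "few seconds" figure quoted below for a ∼100-MFLOP processor:

```python
views = 800                                        # projections per slice, from the text

mults_per_view = 20_000 + 2_000 + 20_000 + 10_000  # FFT + kernel multiply + IFFT + interpolation
adds_per_view = mults_per_view + 262_000           # the same steps plus backprojection additions

total_mults = views * mults_per_view               # ~41.6e6, i.e., the 42 x 10^6 in Table II
total_adds = views * adds_per_view                 # ~251.2e6, i.e., the 251 x 10^6 in Table II

# Measurement memory: 800 views x 512 samples x 12 bits ~ 4.9e6 bits; with "more
# than 512 sample points" per view this exceeds the 5 megabits cited in the text.
memory_bits = views * 512 * 12

# At ~100 MFLOP, the full set of operations takes on the order of seconds.
seconds = (total_mults + total_adds) / 100e6
print(total_mults, total_adds, memory_bits, round(seconds, 1))
```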

The general structure of a CT computing system is illustrated in Fig. 5. In this figure, the measured data are transmitted directly to a computer processing system or to archival storage on magnetic disk, magnetic tape, or refresh memory. The measured data are transferred to a processor, either on-line or after completion of scanning, depending on the measurement speed and computer processing capability. In some CT systems, simple operations are carried out during data acquisition, and partially processed data are stored in memory or on disk. The processor can be either a general-purpose computer or a special processor. The computational speed can sometimes be increased by as much as 10 to 100 times by combining special processors, such as array processors or backprojectors. These can be used most efficiently for structured data formats such as arrays or vectors.


FIGURE 5. Block diagram of the general CT computational system.

The internal structure of the array processor comprises four functional units interconnected by internal buses (Fig. 6). The functional units are a host interface (which is system specific and provides communication with the host bus), a control processor (which controls the overall subsystem), a data memory (which acts as a data and table storage area), and a pipelined arithmetic unit (which performs high-speed computation). In addition, an input–output interface can be used to store measured data and to transfer the reconstructed image directly to the display system without using the host computer. The speed advantage of the array processor comes from the parallel processing of a large number of data read from the processor's own memory over its own bus, and from the use of a pipeline structure in the arithmetic unit. Distributive processing techniques can also be used to divide the processing load between the host and the array processor, maximizing the efficiency of both systems.


FIGURE 6. Block diagram of the internal structure of an array processor.

The backprojector performs high-speed back-projection. For example, many CT systems, especially X-ray CT, are currently hindered by the backprojection operation, and therefore additional hardware computing devices like backprojectors are usually added to allow the whole image reconstruction process to be carried out almost instantaneously.

The computational speed of the special computing processor is often measured in units of million (mega) floating point operations per second (MFLOP). Through the use of array processors, more than 100 MFLOP can be easily attained. The entire processing of the data for the reconstruction depicted in Table II, for example, can be completed in only a few seconds.

Finally, reconstructed images are stored in the refresh memory, magnetic disk, or magnetic tape, depending on the amount of storage, transfer rate, access time, cost, and other factors.

Besides reconstruction, the computer provides machine control, pulse sequence control, display, and data handling. The dedicated microcomputer controller elements are therefore becoming increasingly important. They allow the implementation of highly efficient input–output operations as well as computational structures. The host computer therefore becomes an interface with the operator and data acquisition elements in the CT scanner and related data bank.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B0122274105009650

Measuring and evaluating

Bob Boiko, Daniel M. Russell, in Keeping Found Things Found, 2008

8.6 Can self-study of PIM practices contribute to the larger study of PIM?

The study of PIM—whether to understand current practices or to evaluate a proposed change, a new tool or strategy—is not easy. Let's consider again some of the challenges:

A person's practice of PIM is unique and a “work in progress.” It reflects the person's personality, cognitive abilities, experience, training, various roles at work and elsewhere, available tools, available spaces, and so on. Even people who have a great deal in common with respect to profession, education, and computing platform nevertheless show great variation in their practices of PIM.27

PIM happens broadly across many tools, applications, and information forms. Moreover, people freely convert information from one form to another to suit their needs: emailing a document, for example, or printing out a web page. Studies and evaluations that focus on a specific form of information and supporting applications—email, for example—run the risk of optimizing for that form of information but at the expense of a person's ability to manage other forms of information.

PIM happens over time. Personal information has a life cycle—moving, for example, from a “hot” pile to a “warm” project folder and then, sometimes, into “cold” archival storage.28

Because PIM activities also combine over time, point-in-time evaluations can be very misleading. For example, a tool may make it very easy to create web bookmarks—a keeping activity—but then provide little support for remembering to use these bookmarks later—a finding activity. Support for both activities must be considered. However, for a given information item, acts of finding and keeping may be separated from each other by months or even years.

Getting people to participate is not easy. Busy people who manage lots of information are of special interest in the study of PIM. But these are the very people least likely to have time to participate. College students are sometimes willing to participate for money or for college credit. Colleagues or friends may participate as a favor or out of curiosity. Every participant, even fellow researchers in an academic or corporate research lab, has something to contribute to the study of PIM.

But different groups and different individuals have very different slants on PIM. This was made clear for me as I taught a class on PIM to undergraduates—all in their early to mid twenties—at the University of Washington. I used a discussion on email management similar to one I had presented several times before to older audiences. But reactions were not the same. Instead of nods of agreement, I got blank stares. As I probed, I came to understand that these students had not yet experienced email management problems of the kind I was describing. In particular, they did not need to manage multiple conversational threads relating to multiple projects extending over a period of weeks or months. Their use of email, intermixed with instant messaging and text messaging on their phones, was much more immediate (e.g., “What are you doing tonight?”). Conversations were short-lived, and the students had little need to manage multiple conversations over extended periods of time. On the other hand, students described their use of elaborate systems to synchronize collections of music and video between different computers and playing devices.

My research colleagues, by contrast, produce—well—research, in the form of “papers” published in proceedings and journals. They also teach courses. In their informational worlds, issues relating to the management of text and graphics are of critical importance. Such issues include version control and the appropriate reuse of components.

The challenges faced by students and researchers, in turn, are each very different from those of the proverbial soccer mom who may be shifting rapidly between several different activities in a given day as she balances between the demands of job, family, and perhaps school or community volunteer work. If our soccer mom also manages other people at her job, this adds further challenges of information management. And if some day she or a family member is diagnosed with a serious illness, she will face yet another kind of information management challenge.

Participant availability acts as a kind of lamppost. Certainly we'll look under the lamppost, and some things we see there are true for the dark spaces beyond. But … are there ways to light up the rest of the street as well? One approach is to provide new ways for a greater variety of people to contribute to the study of PIM. People are, or can be with some self-observation, experts on their own practices of PIM. How can this expertise be shared? How can our collective experiences with our practices of PIM inform the study of PIM?

8.6.1 A shared study of PIM

One approach is to engage people as active collaborators rather than merely the “subjects” of a study. Certainly, those of us who have done studies in PIM see a familiar pattern repeat: people initially reluctant to participate in a study may then, voluntarily, continue enthusiastically discussing their own practice of PIM well beyond the scheduled period of an interview or observation. Why? Obviously, self-interest may be involved. If interviews encourage study participants to reflect on their own practices of PIM, they may arrive at useful insights. Partly also, people like talking about themselves, and many participants are justly proud of their own creative, “home-grown” solutions to PIM. People do like to share their experiences for other reasons as well.

A testament to such sharing is a diversity of problem-specific bulletin boards to which people contribute. In the Keeping Found Things Found (KFTF) project at the University of Washington, we have developed a bulletin board called “Tales of PIM” (http://talesofpim.org) in an effort to provide people with a forum for sharing their PIM-related experiences—good and bad, successes and failures.

Are there other venues that might work as well? We might imagine highly motivated people forming collectives for the purpose of exchanging problems and insights concerning their practices of PIM. It is already a common sight to see two or more people showing each other the features of their smart phones or digital cameras and exchanging tips on how to use these gadgets. A similar exchange can happen as one person looks over the shoulder of another while a computer is being used. Tips may be exchanged concerning how to use an email application, the desktop, or a web browser. However, the context of a PIM cooperative might encourage a wider exchange of PIM-related information.

People might be motivated between meetings to keep a diary or to subscribe to a variation of experience sampling where they are periodically prompted (perhaps by their mobile phone) to relate PIM events that have happened recently. Both the diary and experience sampling are methods of data collection used in the study of PIM. These methods might also be used in the practice of PIM. And with some incentive—the recognition of a published report, for example—cooperatives might be motivated to share their results with others to further the study of PIM.

What Now for You and Me?

Methods of measurement and data collection described in this chapter—CIT, ESM, and the use of a PSI confidant—may work for some of us some of the time. In other cases, our evaluating will be guided by our memories and data already gathered or easily collected. Regardless, we should take care in the way we frame and situate our evaluations.

Make the choice real. Be sure to consider real choices rather than “Do you like it, yes or no” questions. When considering adoption of a new tool, scheme of organization, or strategy, the status quo—the current version of same—is always an option to be included in considerations.

Situate. Consider choices with respect to and, preferably in, real situations in your practice of PIM. For example, if you're considering the purchase of a new laptop and routinely do some of your work at a local coffee shop, then see if you can borrow a friend's laptop of the same make and model. Try it out at the coffee shop. You may be surprised by the results. That wonderful big screen is nice, but the increased size of the new laptop makes it impossible to actually use in your lap while sitting in a chair and balancing your coffee cup with one hand.

Sample over two or more days if time permits and under different situations of information management and use.

There is no such thing as not deciding. Recognize that any decision you make—even a decision to postpone or not decide—is itself a decision. Saying yes to a bad change means costs of a false positive. Saying no (if only for now) to a good change means the costs of a miss (“miss” as in missed opportunities, for example).

Frame your choices in terms of cost as well as benefit. If your preferences change, then look more closely at the alternatives and at the values you are assigning to their relevant costs, benefits, and the variability of these. Consider your choices again on successive days or over some predetermined period of time.

For any choice, consider its many impacts—direct and indirect, intended (e.g., by tool designers) and not. Consider the impact with respect to standard measures of usability as discussed in Section 8.3. More important, consider its potential impacts on your four precious resources (money, energy, attention, time), the six senses in which information can be personal to you and, especially, the seven kinds of PIM activity (as listed in Table 8.1).

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780123708663500102

What is the term for files copied to a secondary location for preservation purposes?

Backup: files copied to a secondary location for preservation purposes.
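A minimal illustration of the idea, assuming a local folder stands in for the secondary location (the paths are placeholders):

```python
import shutil
from pathlib import Path

source = Path("documents/report.docx")          # original file (placeholder path)
backup_dir = Path("backup/documents")           # secondary location (placeholder path)

backup_dir.mkdir(parents=True, exist_ok=True)
shutil.copy2(source, backup_dir / source.name)  # copy2 also preserves timestamps and metadata
print(f"Backed up {source} to {backup_dir / source.name}")
```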
