This interdisciplinary glossary for research data management was developed to facilitate communication between the diverse stakeholders engaged in this area. The initial compilation, drawn from a variety of online sources, was reviewed and refined by a wide range of vocabulary and domain-specific experts and then further refined through community sourcing.
The need for such a glossary emerged as a growing number of new terms came into use to refer to new concepts, and as terms borrowed from other disciplines were given new meanings. The aim is a stable glossary of community-accepted definitions, kept relevant by maintaining it as a 'living resource' that is updated when necessary.
IRiDiuM was originally developed and maintained by the Research Data Canada (RDC) Standards & Interoperability Committee (SINC) in partnership with the international Consortia Advancing Standards in Research Administration Information (CASRAI).
Version 0.1.0 (alpha)
Short Definition: The continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed essential for the purposes for which the digital material was created and/or acquired. Users who have access can retrieve, manipulate, copy, and store copies on a wide range of hard drives and external devices.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; http://www.techopedia.com/definition/26929/data-access
Short Definition: A list of permissions that is matched against a user's credentials to determine the access granted.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Peter Wittenburg, Tobias Weigel and Tim DiLauro
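A minimal sketch of how an access control list can be matched against credentials. The principal naming ("user:<name>", "group:<name>"), the sample entries, and the function names are hypothetical illustrations, not part of any particular repository system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Credentials:
    """Hypothetical credentials: a user name plus the groups the user belongs to."""
    user: str
    groups: frozenset = frozenset()

# Hypothetical ACL: each entry pairs a principal ("user:<name>" or
# "group:<name>") with the set of permissions granted to that principal.
ACL = [
    ("user:alice", {"read", "write"}),
    ("group:analysts", {"read"}),
]

def is_permitted(acl, creds, permission):
    """Return True if any ACL entry that matches the credentials grants `permission`."""
    principals = {f"user:{creds.user}"} | {f"group:{g}" for g in creds.groups}
    return any(principal in principals and permission in perms
               for principal, perms in acl)

bob = Credentials("bob", frozenset({"analysts"}))
print(is_permitted(ACL, bob, "read"))   # True: granted via group:analysts
print(is_permitted(ACL, bob, "write"))  # False: no matching entry grants write
```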
Short Definition: Given a data object name, access controls define access relationships between the following metadata: data object name, a user name (or user group, or user role), and access permission. This information can be stored as metadata associated with each data object, or it can be generated dynamically by applying the access controls of the collection that organizes the data objects (if a collection sticky bit is turned on).
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Kuhn et al. (2013), "Broadening the Scope of Nanopublications?" http://www.tkuhn.ch/pub/kuhn2013eswc.pdf ; P. Groth, A. Gibson, and J. Velterop, "The anatomy of a nano-publication", Information Services and Use, 30(1):51-56, 2010 ; http://nanopub.org/guidelines/working_draft/ ; "A Nanopublication Framework for Biological Networks using Cytoscape.js" http://ceur-ws.org/Vol-1327/icbo2014_paper_57.pdf
Synonym: Sticky bit
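The definition above can be sketched as a function that yields (data object name, user, permission) triples, taking them either from metadata stored with the object or, when the collection's sticky bit is on, from the collection's controls. The class names, fields, and the exact precedence rule are assumptions made for illustration; real systems differ in how inheritance is applied.

```python
from dataclasses import dataclass, field

@dataclass
class Collection:
    name: str
    sticky_bit: bool = False
    # (user, permission) pairs defined once for the whole collection
    acl: set = field(default_factory=set)

@dataclass
class DataObject:
    name: str
    collection: Collection
    # (user, permission) pairs stored as metadata with this object
    acl: set = field(default_factory=set)

def access_relationships(obj):
    """Return (data object name, user, permission) triples for `obj`.

    Assumed rule: if the collection's sticky bit is on, the object's access
    controls are generated from the collection; otherwise the controls stored
    with the object are used.
    """
    source = obj.collection.acl if obj.collection.sticky_bit else obj.acl
    return {(obj.name, user, permission) for user, permission in source}

survey = Collection("survey-2024", sticky_bit=True, acl={("alice", "read")})
obj = DataObject("responses.csv", survey, acl={("bob", "write")})
print(access_relationships(obj))  # {('responses.csv', 'alice', 'read')}
survey.sticky_bit = False
print(access_relationships(obj))  # {('responses.csv', 'bob', 'write')}
```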
Short Definition: A URL of a resource that gives access to a distribution of the dataset (e.g., a landing page, feed, or SPARQL endpoint). The access URL should be used for the URL of a service or location that can provide access to this distribution, typically through a Web form, query or API call.
Reference: Data Catalog Vocabulary (DCAT) version 2 (w3.org)
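As an illustration, a DCAT distribution whose access URL points at a SPARQL endpoint might be described in JSON-LD roughly as follows. The title and the example.org URL are placeholders; only the `dcat:` and `dct:` namespaces and the `dcat:Distribution` and `dcat:accessURL` terms come from DCAT itself.

```python
import json

# Hypothetical distribution description using DCAT terms in JSON-LD.
distribution = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Distribution",
    "dct:title": "Survey results via SPARQL endpoint",
    # accessURL: where the distribution can be reached (service, form, endpoint)
    "dcat:accessURL": {"@id": "https://example.org/sparql"},
}

print(json.dumps(distribution, indent=2))
```

When the URL leads directly to a downloadable file rather than a service or landing page, DCAT provides the separate `dcat:downloadURL` property.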
Short Definition: A type of access entity that contains the services and functions which make the data object holdings and their information content and related services visible to data consumers.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
Short Definition: Numbers used by the National Center for Biotechnology Information (NCBI) that are unique and citable.
Reference: MIT data management and publishing
Short Definition: A storage location for a collection of data that is too valuable to discard, but is only accessed occasionally.
Short Definition: A "data object" that is generated dynamically by executable code but can nonetheless be referred to and thus cited.
Synonym: Active collection
Short Definition: Something that happens or is done only when the situation makes it necessary or desirable, rather than being arranged in advance or being part of a general plan.
Short Definition: Testing carried out using no recognised test case design technique. A test executed without prior planning, especially if the expected test outcome is not predicted beforehand. An undocumented test.
Synonym: Ad lib testing
Short Definition: Information collected primarily for administrative rather than research purposes. It includes profiles and curriculum vitae of researchers, the scope and impact of research projects, funding, citations, and research outcomes. This type of data is collected by government departments and other organisations for the purposes of registration, transaction and record keeping, usually during the delivery of a service. These data are also recognized as having research value.
Reference: U.K. Administrative Data Research Network ; DAMA Dictionary of Data Management. For alternative definitions: Digital Curation Centre (DCC)/TC3+; TBS Standard on Geospatial Data (Government On-line Metadata Standard); TBS Standard for Electronic Documents and Records Management Solutions; IOC Oceanographic Data Exchange Policy; UNESCO Open Access Policy Guidelines; Environment Canada data stewardship handbook (draft).
Short Definition: Metadata used to manage administrative aspects of digital objects, such as intellectual property rights and acquisition. Administrative metadata also documents information concerning the creation, alteration, and version control of the metadata itself; this is sometimes known as meta-metadata.
Reference: DCC/TC3+
Synonym: Meta-metadata
Short Definition: Data that are expressed in a summary form (e.g., summary statistics).
Short Definition: The bringing together of elements. Types of aggregation differ in the nature of the processes by which elements are brought together and in the reasons for aggregating them or keeping them together as a unit. Aggregations also differ in the nature of the relations between their member parts.
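A small sketch of how raw observations are reduced to aggregated data with Python's standard `statistics` module. The readings and the particular set of statistics chosen are hypothetical.

```python
import statistics

def summarize(values):
    """Reduce raw observations to summary statistics (aggregated data)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation; needs >= 2 values
        "min": min(values),
        "max": max(values),
    }

# Hypothetical raw readings, e.g. daily temperatures in degrees Celsius
readings = [12.5, 14.0, 13.2, 15.8, 11.9]
print(summarize(readings))
```

Once only the summary is shared, the individual readings can no longer be recovered from it.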
Short Definition: A computable set of steps to achieve a desired result.
Reference: NIST Dictionary of Algorithms and Data Structures
Short Definition: 1. In a legal context: the transfer of ownership of property. 2. In the context of records management: the transfer of ownership of data.
Short Definition: Data in the form of analogue materials.
Synonym: Analogue materials
Short Definition: Non-digital materials that have a physical presence (e.g., written and printed material).
Short Definition: Continuous electronic signals.
Short Definition: All those processes and procedures designed to ensure that the results of laboratory analysis are consistent, comparable, accurate and within specified limits of precision.
Short Definition: The discovery of meaningful multidimensional patterns in data.
Short Definition: 1. A rule, practice, or observation that is different from what is normal or usual.
Short Definition: A form of privacy that is not usually needed or wanted. There are occasions, however, when a user may want anonymity (for example, to report a crime). The need is sometimes met through the use of a site, called a remailer, that re-posts a message from its own address, thus disguising the originator of the message. Unfortunately, many spam distributors also take advantage of remailers.
Short Definition: An XML definition for the exchange of information relating to security vulnerabilities of applications exposed to networks.
Short Definition: The application of existing scientific and professional knowledge to develop practical applications in a scientific field (e.g., actuarial science, agriculture, biology, chemistry, forestry, meteorology, physics, planetary and earth sciences), scientific regulation, or patent.
Long Definition: Activities include the: (a) Evaluation of actuarial liabilities and the determination of premiums and contributions in respect of insurance, annuity and pension plans; (b) Promotion, development and regulation of the agricultural industry and trade including the planning or conduct of quality control, regulatory and production programs, the analysis of agricultural markets and production trends, the regulation of market practices or the administration of financial incentives or subsidies; (c) Analysis, identification, interpretation, classification, measurement, survey and management of biological resources, organisms or systems; and the analysis and interpretation of biological data; (d) Analysis, interpretation, classification, measurement and survey of the chemical composition, properties and behaviour of matter; the development of analytical or survey methods, instruments or standards; the planning, conduct and evaluation of studies or projects; the integration of scientific information from different specialized areas; and the writing, reviewing or evaluation of papers, reports, contracts or agreements; (e) Promotion and development of forest resources; the planning, design, development and maintenance of forest surveys, inventories and databases; the management of forests; the administration of cooperative arrangements with government and non-governmental organizations providing assistance in the various fields of forestry; the provision of advice on the development of forest policy and liaison with the various communities of interest in the forest industry; and the coordination and transfer of technology on a regional, national and international level; (f) Analysis and forecasting of weather and climatic phenomena; the development of instruments, methods and standards for observing and recording atmospheric phenomena; and the development, application and provision of data, information and advice in the application of meteorology to economic and environmental problems in a country; (g) Analysis, identification, interpretation, classification, measurement, survey and management of earth resources and the behaviour of earth and space and related systems; and the provision of data, information and advice in the application of geosciences and physical sciences to economic and environmental issues; (h) Study of the physical properties of medical devices or radiation emitting devices for the purpose of evaluating their safety or efficacy; the development of analytical or survey methods, instruments or standards; the planning, conduct and evaluation of studies or projects; and the integration of scientific information from different specialized areas; (i) Inspection or evaluation of techniques and technical processes and products to ascertain conformity with prescribed standards; the regulation of the distribution and control of drugs liable to abuse; the appraisal of submissions with respect to drugs; the inspection of the manufacture, storage, disposal, transportation and handling of dangerous commodities; and the regulation of environmental hazards; (j) Inspection of the manufacture, processing, distribution, labelling or advertising of foods, drugs, cosmetics or medical devices for the purpose of protecting the public from health hazards or fraudulent or misleading advertising or labelling; and the provision of regulatory advice for the determination of the status of a particular product as to its identity as a drug, food, cosmetic or medical device; (k) Inspection of the manufacture, storage, disposal, transportation and handling of dangerous commodities such as flammable, explosive, poisonous, corrosive and radioactive materials; (l) Inspection for the assurance of the quality of goods and services purchased under contract; (m) Development of regulations and policies dealing with regulated products, foods, cosmetics, explosives and other consumer products and the evaluation of proposed regulatory actions resulting from inspection; (n) Planning and conduct of studies, the evaluation and interpretation of information and scientific research papers, reports, contracts or agreements, and the provision of advice in the above programs; (o) Planning, coordination and management of technology transfer in any of the above activities; and, (p) Leadership of any of the above activities.
Short Definition: Fundamental organization of a system embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution. The term is not always used in normative or prescriptive ways. In some cases, the architecture may need to be flexible and thus be more of an open framework than a fixed set of components and services that is the same for everyone.
Short Definition: The ongoing usefulness or significance of records, based on the administrative, legal, fiscal, evidential, or historical information they contain, justifying their continued preservation.
Long Definition: In general, records with archival value are estimated to make up only a small percentage of an organization's records. In most organizations, the determination of which records are considered to have archival value is made by archivists. Sometimes, archivists distinguish between the concepts of historical value and archival value. In such cases, historical value is defined narrowly as the value of a record to support research in the history of people and the world, and archival value is defined broadly to encompass value that supports any type of research using permanent records, including psychological, sociological, and other types of scientific research.
Reference: Society of American Archivists dictionary
Synonym: Historical value
Short Definition: A place or collection containing static records, documents, or other materials for long-term preservation.
Reference: ACTI-DM Working Group/Educause
Short Definition: 1. A curation activity that ensures that data are properly selected and stored, can be accessed, and have their logical and physical integrity, including security and authenticity, maintained over time.
Reference: JISC/TC3+
Short Definition: Artificial intelligence (AI) is the simulation of human intelligence processes by computer systems. This includes learning, reasoning, problem-solving, perception, human language understanding, and autonomous decision-making. AI methods include rule-based expert systems, machine learning, neural networks, deep learning, computer vision, and natural language processing. Generative AI refers to a category of artificial intelligence that is designed to generate new content, such as text, images, audio, video, computer code, or other types of data.
Short Definition: Data that are at risk of being lost. At-risk data include data that are not easily accessible, have been dispersed, have been separated from the research output object, or are stored on a medium that is obsolete or at risk of deterioration; data that were not recorded in digital form; and digital data that are available but not usable because they have been detached from the supporting data, metadata, and information needed to use and interpret them intelligently.
Reference: http://www.iedro.org ; http://www.wmo.int/pages/prog/hwrp/datarescue.php
Short Definition: An independent evaluation of an organization, system, process, project or product.
Synonym: Investigation
Short Definition: 1. The process of confirming the identity of a principal. Since computer identification cannot be absolute (e.g., passwords can be stolen), authentication relies on a related concept of level of trust, in which an institution relies on good identity management practice (so that the institution believes it has correctly identified an individual) and secure mechanisms for sharing identity. This is sometimes referred to as AuthN (authentication), in contrast to AuthZ (authorization). 2. A mechanism which attempts to establish the authenticity of digital materials at a particular point in time. For example, digital signatures.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Internet 2/Educause; Digital preservation coalition
Short Definition: A type of metadata that conveys information needed to link a data object to its original source.
Short Definition: The use of technology to perform tasks, typically repetitive or rule-based processes, with minimal human intervention. Automation executes predefined tasks efficiently and includes robotics, process automation, and algorithmic tasks that do not require learning or adaptation.
Short Definition: Observable and measurable knowledge, skills, abilities or personal characteristics needed to achieve the required performance outputs or outcomes.
Short Definition: A technique or methodology that, through experience and research, has proven to reliably lead to a desired result. A commitment to using the best practices in any field is a commitment to using all the knowledge and technology at one's disposal to ensure success. The term is used frequently in the fields of health care, government administration, the education system, project management, hardware and software product development, analytical chemistry, and elsewhere.
Synonym: Professional standard
Short Definition: 1. An evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that have the potential to be mined for information. 2. Data that would take too much time and cost too much money to load into relational databases for analysis (typically petabytes and exabytes of data). 3. Extensive datasets/collections/linked data primarily characterized by big volume, extensive variety, high velocity (creation and use), and/or variability that together require a scalable architecture for efficient data storage, manipulation, and analysis. In general, the size is beyond the ability of typical database software tools to capture, store, manage and analyze. It is assumed that as technology advances over time, the size of datasets that qualify as big data will increase. Also the definition can vary by sector, depending on what kind of software tools are commonly available and what sizes of datasets are common in a particular industry. With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; McKinsey Global Institute - Big data: the next frontier for innovation, competition and productivity as quoted by the TC3+ in their October 2013 consultation document: Capitalizing on Big Data: Towards a Policy Framework for Advancing Digital Scholarship in Canada. http://bigdatawg.nist.gov/_uploadfiles/M0392_v1_3022325181.pdf ; http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data
Short Definition: A representation of digital content as an assembly of bits, the fundamental unit of digital information.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Weigel et al., 2013. "A Framework for Extended Persistent Identification of Scientific Assets". http://dx.doi.org/10.2481/dsj.12-036
Short Definition: An unstructured sequence of bits that is identified as a unit (e.g., bits in a communication transmission). It may be stored as a unit or may exist as a pattern and be generated. A digital object may be represented as a bit stream of finite length that encodes its informational content.
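To make the idea concrete, the sketch below renders a short piece of content as a finite bit stream and decodes it back. The helper names and the most-significant-bit-first convention are choices made for illustration.

```python
def to_bit_stream(data: bytes) -> str:
    """Return the bits of `data` as a string of '0'/'1', most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)

def from_bit_stream(bits: str) -> bytes:
    """Reassemble bytes from a bit string whose length is a multiple of 8."""
    if len(bits) % 8:
        raise ValueError("bit stream length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

content = "Hi".encode("utf-8")
bits = to_bit_stream(content)
print(bits)  # 0100100001101001
assert from_bit_stream(bits) == content
```

The bit string carries no structure of its own; the byte boundaries and the UTF-8 interpretation are knowledge applied from outside it.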
Short Definition: Any device whose workings are not understood by or accessible to its user.
Short Definition: A design for a framework that can be re-used and re-purposed by applying minor changes that do not require changing the underlying design principles.
Short Definition: Digital materials which are not intended to have an analogue equivalent, either as the originating source or as a result of conversion to analogue form. This term is used to differentiate them from 1) digital materials which have been created as a result of converting analogue originals; and 2) digital materials that may have originated from a digital source but have been printed to paper (e.g., some electronic records).
Reference: Digital preservation coalition
Short Definition: A data value that corresponds to a minimum or maximum input or output value specified for a system or component.
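Boundary values are commonly exercised in testing by checking each limit together with its immediate neighbours. The sketch below assumes an integer input range; the `accepts_age` component and its 0 to 120 range are hypothetical.

```python
def boundary_values(minimum: int, maximum: int) -> list:
    """Boundary values of an integer range [minimum, maximum] and their immediate neighbours."""
    return [minimum - 1, minimum, minimum + 1, maximum - 1, maximum, maximum + 1]

def accepts_age(age: int) -> bool:
    # Hypothetical component: valid input range is 0..120 inclusive
    return 0 <= age <= 120

for value in boundary_values(0, 120):
    print(value, accepts_age(value))
```

The values just outside the range (-1 and 121 here) should be rejected and the limits themselves accepted; off-by-one errors tend to show up exactly at these points.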
Short Definition: A coding error in a computer program which causes the program to perform in an unintended or unanticipated manner.
Short Definition: A data collection that has been normalized by some established criteria to allow for effective data management. Examples include: data files that belong to a certain experiment, all files that are created by one specific simulation, all files that belong to a specific observation (same day, same place, etc.).
Short Definition: 1. A type of collection that describes, and points to features of, another collection. 2. In the context of a data catalogue: a curated collection of metadata about resources (e.g., datasets and data services).
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Rajasekar, A., M. Wan, R. Moore, W. Schroeder, S.-Y. Chen, L. Gilbert, C.-Y. Hou, C. Lee, R. Marciano, P. Tooby, A. de Torcy, B. Zhu, "iRODS Primer: Integrated Rule-Oriented Data System", Morgan & Claypool, 2010. ; https://www.w3.org/TR/vocab-dcat-3/#Class:Data_Service
Short Definition: An intellectual process of describing objects in accordance with accepted library principles, particularly those of subject and classification order.
Short Definition: The capacity of one variable to influence another. The first variable may bring the second into existence or may cause the incidence of the second variable to fluctuate.
Related Term: Correlation
Short Definition: A product that has been inspected, evaluated, tested, or otherwise determined to be in conformance or compliance with applicable or specified provisions of referenced standards, codes, or other requirements and certified by an authority which is recognized or has the legal power to grant such certification. Certified products imply a guarantee or warranty of product conformance and that the product is under the test and surveillance procedures of a specified certification system.
Reference: American National Standards Institute ANSI "Standards Management: A Handbook for Profit"
Short Definition: A log that tracks the progress of each change from submission through review, approval, implementation and closure. The log can be managed manually using a document or spreadsheet, or automatically with a software or Web-based tool.
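A change log like the one described can be modelled as records that move through the listed stages, with each transition timestamped. This sketch assumes a strictly linear workflow with no rejection or rework paths; the record fields and the change identifier are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Stages named in the definition, in order (a simplified, linear workflow).
STAGES = ["submission", "review", "approval", "implementation", "closure"]

@dataclass
class ChangeRecord:
    change_id: str
    description: str
    # (stage, UTC timestamp) pairs, oldest first
    history: list = field(default_factory=list)

    @property
    def stage(self) -> Optional[str]:
        return self.history[-1][0] if self.history else None

    def advance(self) -> str:
        """Move the change to its next stage and timestamp the transition."""
        next_index = 0 if self.stage is None else STAGES.index(self.stage) + 1
        if next_index >= len(STAGES):
            raise ValueError(f"{self.change_id} is already closed")
        self.history.append((STAGES[next_index], datetime.now(timezone.utc)))
        return STAGES[next_index]

change_log = {}
record = ChangeRecord("CR-1", "Rename column 'temp' to 'temperature_c'")
change_log[record.change_id] = record
for _ in STAGES:
    record.advance()
print([stage for stage, _ in record.history])
```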
Short Definition: A systematic approach to dealing with change, both from the perspective of an organization and on the individual level.
Short Definition: A value used to test whether a file has changed over time. A checksum is a type of metadata and an important property of a data object that allows its identity and integrity to be verified. Also called a hash, a checksum is a piece of data computed from the content of a digital object and used to verify the fixity or stability of that object. It is most commonly used to detect whether some representation of a digital object has changed over time: the checksum is recomputed and compared with a previously recorded value. Checksums are associated with PIDs but can be found and tested independently of PID systems.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; https://wiki.duraspace.org/display/DPNC/Glossary
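A minimal fixity check: record a checksum when a file is ingested, then recompute and compare it later. SHA-256 is an assumed choice here (other algorithms such as MD5 are also used in practice), and the file name and contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_checksum(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fixity_ok(path, recorded_checksum):
    """True if the file's current checksum matches the one recorded earlier."""
    return sha256_checksum(path) == recorded_checksum

with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.csv"
    data_file.write_text("id,value\n1,42\n")
    recorded = sha256_checksum(data_file)   # store alongside the object's metadata
    print(fixity_ok(data_file, recorded))   # True: unchanged
    data_file.write_text("id,value\n1,43\n")
    print(fixity_ok(data_file, recorded))   # False: content has changed
```

Because the digest depends only on the content, the same file always yields the same checksum, and even a one-character edit produces a different one.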
Short Definition: A corporate officer responsible for enterprise-wide governance and utilization of information as an asset, via data processing, analysis, data mining, information trading and other means. Chief Data Officers have various reporting lines including to the Chief Technology Officer, Chief Information Officer, Chief Executive Officer, Chief Marketing Officer, or the Chief Strategy Officer.
Related Term: Chief Technology Officer; Chief Information Officer; Chief Executive Officer; Chief Marketing Officer; Chief Strategy Officer
Short Definition: A person who helps an organisation drive growth by converting traditional "analog" activities to digital ones, and oversees operations in the rapidly changing digital sectors such as mobile applications, social media and related applications, virtual goods, as well as "wild" web-based information management and marketing.
Short Definition: The most senior executive in an enterprise responsible for the information technology and computer systems that support enterprise goals. Generally, the Chief Information Officer reports to the Chief Executive Officer, Chief Operating Officer, or Chief Financial Officer.
Related Term: Chief Executive Officer; Chief Operating Officer; Chief Financial Officer
Short Definition: Typically at the same level as, or reporting directly to, the Chief Information Officer, the Chief Technology Officer is primarily concerned with long-term and "big picture" issues (while still having deep technical knowledge of the relevant field).
Short Definition: A type of referable data that has undergone quality assessment and can be cited in publications and included as part of research objects.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Peter Wittenburg
Short Definition: Individuals, units or organizations using a service or product.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Short Definition: A security classification designating the level of protection against access that data or information requires when its unauthorized disclosure could reasonably be expected to cause injury to the national interest (the defence and maintenance of the social, political and economic stability of the nation). Classification levels: Unclassified, Restricted (disadvantageous to the interests of the nation), Confidential (injury to the national interest), Secret (serious injury to the national interest), Top Secret (exceptionally grave injury to the national interest).
Reference: termiumplus.gc.ca ; https://www.tpsgc-pwgsc.gc.ca/esc-src/documents/levels-of-security.pdf
Synonym: Security classification
Related Term: Protected information
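Because the classification levels listed above are ordered, they map naturally onto an ordered enumeration. The access rule in this sketch (a clearance must be at least the document's level) is a simplified assumption for illustration; real security regimes also require need-to-know and other controls.

```python
from enum import IntEnum

class Classification(IntEnum):
    # Levels listed in the definition, from least to most sensitive.
    UNCLASSIFIED = 0
    RESTRICTED = 1
    CONFIDENTIAL = 2
    SECRET = 3
    TOP_SECRET = 4

def may_access(clearance: Classification, document_level: Classification) -> bool:
    """Simplified dominance rule: the clearance must be at least the document's level."""
    return clearance >= document_level

print(may_access(Classification.SECRET, Classification.CONFIDENTIAL))      # True
print(may_access(Classification.CONFIDENTIAL, Classification.TOP_SECRET))  # False
```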
Short Definition: Internal access only.
Reference: https://theodi.org/insights/tools/the-data-spectrum/ ; https://www.alerc.org.uk/uploads/7/6/3/3/7633190/an_introduction_to_open_shared_and_closed_data.pdf
Related Term: Open access; Shared access
Short Definition: A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms and services are delivered on demand to external customers over the Internet. Key elements: it is a specialized distributed computing paradigm; it is massively scalable; it can be encapsulated as an abstract entity that delivers different levels of services to customers outside the Cloud; it is driven by economies of scale; and, the services can be dynamically configured (via virtualization or other approaches) and delivered on demand.
Reference: GRDI 2020/TC3+
Short Definition: An ecosystem that includes not only traditional elements of cloud computing such as software and infrastructure, but also consultants, integrators, partners, third parties and anything in their environments that has a bearing on the other components.
Short Definition: A type of data provenance that adds metadata to identify data collections. The organization doing the collection management is stated in the metadata along with the provenance of collection management events such as source of data acquisition, conservation, and movement.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; http://www.merriam-webster.com/dictionary/proposition, meaning 2a; Sowa, 2000, p. 501; usage of abstract and entity per Sowa.
Short Definition: A file that contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a comma from the next column's value and each row starts a new line.
Synonym: CSV.
Related Term: Character separated values
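Python's standard `csv` module writes and reads this format. The sketch also shows a detail the short definition leaves out: a value that itself contains a comma is enclosed in double quotes so that it is not split into two columns. The station rows are hypothetical sample data.

```python
import csv
import io

rows = [
    ["station", "date", "note"],
    ["ALERT", "2024-01-15", "cold, clear"],
    ["EUREKA", "2024-01-15", "windy"],
]

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)
# station,date,note
# ALERT,2024-01-15,"cold, clear"
# EUREKA,2024-01-15,windy

# Reading the text back recovers the original rows, including the quoted comma.
parsed = list(csv.reader(io.StringIO(text)))
assert parsed == rows
```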
Short Definition: The final step in the successful completion of a previously started database change as part of handling a transaction in a computing system.
Short Definition: Adherence to a law or regulation.
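Using Python's built-in `sqlite3` module with an in-memory database, the sketch contrasts a committed change with one that is rolled back. The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL)")
conn.commit()

# First change: the insert opens a transaction; commit() makes it permanent.
conn.execute("INSERT INTO samples (value) VALUES (?)", (1.5,))
conn.commit()

# Second change: the transaction is abandoned with rollback() instead of committed.
conn.execute("INSERT INTO samples (value) VALUES (?)", (2.5,))
conn.rollback()

count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(count)  # 1: only the committed row remains
```

Until `commit()` is called, the change can still be undone, which is why a commit is treated as the final step of a transaction.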
Short Definition: An entity with a discrete structure within a system considered at a particular level of analysis (e.g., an assembly or software module). Components may be characterized by the services they offer and the internal structures that are required to offer those services.
Short Definition: Any computer application that requires extensive computation, such as meteorology programs and other scientific applications.
Short Definition: 1. Computer code, or source code: A series of computer instructions written in some human readable computer language, usually stored in a text file. Computer code should include explanatory comments. 2. Machine code: Source code is 'compiled' or 'interpreted' to produce computer executable code. 3. A code is a collection of mandatory standards, which has been codified by a governmental authority and thus become part of the law for the jurisdiction represented by that authority. Examples include the National Building Code and the National Electrical Code.
Reference: American National Standards Institute ANSI "Standards Management: A Handbook for Profit"
Synonym: Code; Source code; Script
Short Definition: Any computing application that requires the resources of many computers, such as grid computing.
Short Definition: The application of computer systems knowledge to the planning, development, installation and maintenance of information technology processing systems to manage, administer or support programs and activities.
Long Definition: Activities include the: (a) Conduct of analyses and design and programming activities for the development, implementation and maintenance of administrative, scientific and technological information processing systems; and the customization and maintenance of generalized application software and system software packages; (b) Conduct and control of emergency repairs to application and system software; (c) Analysis and design of business systems and supporting infrastructures and the construction and maintenance of the related software; (d) Design, implementation, installation and servicing of databases and database software, the control of the integrity, security and modification of the databases and the provision of database recovery/backup facilities; (e) Capacity management, configuration, performance measurement and optimization of hardware, software and network systems; (f) Development, application or enforcement of standards and procedures, and quality assurance pertaining to information technology processing systems and activities; (g) Development and conduct or determination of the: (i) technical evaluation of information technology processing systems; (ii) technical specifications for the evaluation, testing, acquisition, installation and acceptance of information technology processing goods and services, such as computer system and related hardware, computer or computer network hardware and software; and, (iii) associated support services; (h) Provision of advice and consultation on information technology processing systems, facilities and applications including the evaluation of the technical security of these systems; (i) Conduct of planning and research into existing and future information technology processing systems capacity, capability, applications and requirements; (j) Development and delivery of training programs in the above activities; and, (k) Leadership of any of the above activities.
Synonym: CS.
Related Term: Science, Technology, Applied science, Health Science, Physical science, Engineering and scientific support
Short Definition: 1. Anything that can be described in writing. 2. The smallest, unambiguous unit of thought that is uniquely identifiable.
Short Definition: A security classification concerning documents, information and material the unauthorized disclosure of which would be prejudicial to the interests or prestige of the nation, would cause damage to an individual, or would be of advantage to a foreign power.
Reference: termiumplus.gc.ca
Related Term: Personal information; Personal data; Personally identifiable information
Short Definition: 1. The duties and practices of people and organizations to ensure that individuals' personal information only flows from one entity to another according to legislated or otherwise broadly accepted norms and policies. 2. In the context of health data: confidentiality is breached whenever personal information is communicated in a way that is not authorized by legislation, professional obligations, or contractual duties.
Reference: Council of Canadian Academies (2015). Accessing Health and Health-Related Data in Canada. The Expert Panel on Timely Access to Health and Social Data for Health Research and Health System Innovation.
Short Definition: The state of having satisfied the requirements of some specific standard(s) and/or specification(s). Conformance is used with respect to voluntary standards and specifications, whereas compliance is used with respect to mandatory standards and regulations.
Reference: American National Standards Institute (ANSI). "Standards Management: A Handbook for Profit"
Short Definition: Developed through the cooperation of all parties who have an interest in participating in the development and/or use of the standards. Consensus requires that all views and objections be considered, including feasibility, and that an effort be made toward their resolution. Consensus implies more than a simple majority, but not necessarily unanimity. Consensus standards should be viewed as minimum acceptable standards, not as an ideal or maximum target objective.
Reference: American National Standards Institute (ANSI). "Standards Management: A Handbook for Profit"
Short Definition: The information trail customers leave behind as a result of their Internet use. This data, which sometimes comprises personal information, comes from such sources and channels as social media networks, marketing campaigns, customer service requests, call centre communications, online browsing data, mobile applications, purchasing history and preferences, and more.
Short Definition: 1. Specifies the party which can provide information. 2. In the context of Data Management Plans (DMP), this refers to the party who can provide any information on the DMP. This is not necessarily the DMP creator, and it can be a person or an organization.
Reference: RDA maDMP common standard
Short Definition: Something able to hold objects. A data repository can hold data objects and collections. In this case the data repository may be considered a type of data container.
Short Definition: The set of information that is the original target object that has been registered and is under preservation.
Short Definition: A type of digital migration in which there is no change to the Packaging Information, the Content Information, or the Preservation Description Information (PDI). The bits used to represent these Information Objects are preserved in the transfer to the same or a new media instance.
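One common way to confirm that the bits really were preserved in such a transfer is a fixity check: compute a checksum of the content before and after the transfer and compare the two. The sketch below assumes SHA-256 as the checksum algorithm; that is an illustrative choice, not something the definition prescribes, and the function names are hypothetical.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def refresh_is_bit_exact(original: bytes, copy: bytes) -> bool:
    """True if the copy on the new media instance is bit-for-bit identical.

    Comparing digests mirrors how a repository compares a stored fixity
    value with one recomputed after the transfer.
    """
    return sha256_digest(original) == sha256_digest(copy)

payload = b"example information package"
print(refresh_is_bit_exact(payload, bytes(payload)))   # True
print(refresh_is_bit_exact(payload, payload + b"\n"))  # False
```

In practice the original digest would be stored as preservation metadata, so the comparison does not require the source media to remain available.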
Short Definition: 1. A person involved in the process of creating a work whose contribution does not rise to the level of authorship. 2. A party involved in the process of data management described by a Data Management Plan (DMP), or a party involved in the creation and management of the DMP itself.
Reference: RDA maDMP common standard.
Short Definition: A list of standardized terminology, words, or phrases, used for indexing or content analysis and information retrieval, usually in a defined information domain.
Reference: DAMA Dictionary of Data Management; TBS Standard On Metadata
Short Definition: A set of documents that has a scientific meaning. A corpus can be produced by an individual researcher's activity (including its archival materials), or from laboratory research, a field campaign, a science and culture heritage project, a survey, etc.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; OAI-OR
Short Definition: A statistical measure that indicates the extent to which two or more variables fluctuate together. Correlation does not imply causation. There may be, for example, an unknown factor that influences both variables similarly.
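A common measure of this kind is the Pearson correlation coefficient r, which ranges from -1 to 1. The minimal sketch below computes it from scratch; the data are hypothetical and were chosen so that both series depend on a third factor (temperature), illustrating how a strong correlation can arise without either variable causing the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Unnormalized covariance and variances; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: both series are driven by temperature, not by each other.
temperature = [10, 15, 20, 25, 30]
ice_cream_sales = [2 * t + 1 for t in temperature]
sunburn_cases = [t // 5 for t in temperature]
print(pearson_r(ice_cream_sales, sunburn_cases))  # 1.0
```

A perfect correlation here says nothing about ice cream causing sunburn; the shared driver is the unknown factor the definition warns about.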
Short Definition: Deterioration of computer data as a result of some external agent such as viruses, hardware or software incompatibility, flaws, or failures, power outages, dust, water, extreme temperatures, etc.
Short Definition: The ability to take an innovative approach to research by creating new concepts, theories, approaches and/or solutions, or by modifying current ones.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Short Definition: Explains aspects of one discipline in terms of another (e.g., the physics of music; the politics of literature)
Short Definition: The activity of managing and promoting the use of data from their point of creation to ensure that they are fit for contemporary purpose and available for discovery and reuse. For dynamic datasets this may mean continuous enrichment or updating to keep them fit for purpose. Higher levels of curation will also involve links with annotation and with other published materials.
Reference: JISC e-Science Curation Report/TC3+
Short Definition: A type of workflow that includes active steps to curate data as an aid to on-going management of data through its lifecycle.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Weigel et al., 2013. "A Framework for Extended Persistent Identification of Scientific Assets". http://dx.doi.org/10.2481/dsj.12-036
Short Definition: 1. Operational data that are not being used, such as information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes. Such data are seen as an economic opportunity for companies if they can take advantage of them to drive new revenues or reduce internal costs. Examples include server log files that can give clues to website visitor behavior; client call detail records that can indicate consumer sentiment; and mobile geolocation data that can reveal traffic patterns to aid in business planning. 2. Data that can no longer be accessed because they have been stored on devices that have become obsolete.
Synonym: At-risk data
Short Definition: A document creation and management specification that builds content reuse into the authoring process.
Short Definition: Facts, measurements, recordings, records, or observations about the world collected by scientists and others, with a minimum of contextual interpretation. Data may be in any format or medium taking the form of writings, notes, numbers, symbols, text, images, films, video, sound recordings, pictorial reproductions, drawings, designs or other graphical representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing algorithms/code/scripts, or statistical records.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Landry et al. (1970); Carol Tenopir (2007); Michael Buckland (2007); See Zin et al. (2007) for an analysis of 130 definitions of data, information and knowledge provided by an expert panel of 45 leading scholars in information science, and the development of 5 models for defining data, information, and knowledge.
Short Definition: A system that allows outsiders to be granted access to databases without overloading the system.
Reference: Open Data 101 (Government of Canada)
Short Definition: The process of acquiring data from some source. For example, data may be acquired by download from a repository, transfer from a data logger, data capture, etc.
Synonym: Data reception; Data download.
Related Term: Data capture
Short Definition: A data lifecycle stage that involves the techniques that produce synthesized knowledge from organized information. A process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Wikipedia/Educause
Short Definition: An archival service providing the long-term permanent care and accessibility for digital objects with research value. The standard for such repositories is the Open Archival Information System reference model.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Mapping the Data Landscape 2011 Summit. ISO 14721:2003
Synonym: Repository; Trusted Digital Repository
Short Definition: The state when data are in the place needed by the user, at the time the user needs them, and in the form needed by the user.
Short Definition: The process or means of obtaining and storing external data, particularly images or sounds, for use at a later time. In biometric security systems, for example, capture is the acquisition of, or the process of acquiring an identifying characteristic such as a finger image, palm image, facial image, iris print, or voice print. In order to capture the data, a transducer is employed that converts the actual image or sound into a digital file. The file is then stored. At a later time, it can be analyzed by a computer, or compared with other files in a database to verify identity or to provide authorization to enter a secured system. Screen capture is the acquisition and storage of an image on a monitor or display exactly as it appears at a specific time. This can sometimes (but not always) be done by hitting the "print screen" key, in which case the image appears as a bitmap file in the clipboard. It can also be done by photographing the display screen with a digital camera external to the computer. Electronic signals from scientific instruments, dataloggers, sensors, etc., can also be captured, converted to data, and stored for use at a later time.
Synonym: Data acquisition
Short Definition: A curated collection of metadata about datasets and their data elements.
Short Definition: A facility providing IT services, such as servers, massive storage, and network connectivity.
Synonym: Research data centre.
Related Term: Digital Infrastructure for related concepts
Short Definition: Offers proper recognition to authors as well as permanent identification through the use of global persistent identifiers in place of URLs, which can change frequently. Use of universal numerical fingerprints (UNFs) guarantees to the scholarly community that future researchers will be able to verify that data retrieved are identical to those used in a publication decades earlier, even if the data have changed storage media, operating systems, hardware, and statistical program format. Data citation is provided in the same way that researchers routinely include bibliographic references to traditionally published resources. Data citation should include the following elements: (a) Name of Principal Investigator/Author/Data Creator; (c) Release Date/Year of Publication - year of release, for a completed dataset; (d) Title of Data Source - formal title of the dataset; (e) Version/Edition Number - the version of the dataset used in the study; (f) Format of the Data - physical format of the data; (g) 3rd Party Data Producer - refers to data accessed from a 3rd party repository; (h) Archive and/or Distributor - the location that holds the dataset; (i) Locator or Identifier - includes Digital Object Identifiers (DOI), Handles, Archival Resource Key (ARK), etc.; (j) Access Date and Time - when data is accessed online; (k) Subset of Data Used - description based on organization of the larger dataset; (l) Editor or Contributor - reference to a person who compiled data, or performed value-added functions; (m) Publication Place - city, state, and country of the distributor of the data; and, (n) Data within a Larger Work - refers to the use of data in a compilation or a data supplement (such as published in a peer-reviewed paper).
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; thedata.org/Educause ; http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines
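To show how a few of the elements listed above combine into one citation string, here is a minimal sketch. It covers only a subset of the elements (creator, release year, title, version, distributor, identifier, access date); the field names, ordering, and punctuation are assumptions for illustration and do not reproduce any particular citation style from the cited guidelines. The DOI is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataCitation:
    """Hypothetical subset of the data citation elements."""
    creator: str                 # (a) principal investigator / author / data creator
    year: int                    # (c) release date / year of publication
    title: str                   # (d) title of data source
    distributor: str             # (h) archive and/or distributor
    identifier: str              # (i) locator or identifier, e.g. a DOI URL
    version: Optional[str] = None      # (e) version / edition number
    access_date: Optional[str] = None  # (j) access date

    def format(self) -> str:
        parts = [f"{self.creator} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.distributor}.")
        parts.append(f"{self.identifier}.")
        if self.access_date:
            parts.append(f"Accessed {self.access_date}.")
        return " ".join(parts)

citation = DataCitation(
    creator="Doe, J.", year=2020, title="Example Survey Dataset",
    distributor="Example Data Archive",
    identifier="https://doi.org/10.1234/example",  # placeholder DOI
    version="2.1", access_date="2024-01-15",
)
print(citation.format())
```

Optional elements are simply omitted when absent, which matches the idea that some elements (such as access date) apply only in certain situations.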
Short Definition: Data cleaning is a continuous process that requires corrective actions throughout the data lifecycle. Data cleaning is the process of detecting and correcting corrupt or inaccurate records in a dataset. Data cleaning involves identifying, replacing, modifying, or deleting incomplete, incorrect, inaccurate, inconsistent, irrelevant, and improperly formatted data. Typically, the process involves updating, correcting, standardizing, and de-duplicating records to create a single view of the data, even if they are stored in multiple disparate systems.
Synonym: Data cleansing; Data scrubbing
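The standardizing, dropping, and de-duplicating steps described above can be sketched on a list of records. This is a minimal illustration under a hypothetical schema: each record is a dict with "name" and "email" fields, and the normalized email is assumed to identify a person uniquely.

```python
def clean_records(records):
    """Standardize, drop incomplete, and de-duplicate a list of dict records.

    Hypothetical schema: "name" and "email" fields; the normalized email is
    treated as the unique key for a person.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Standardize: trim whitespace and normalize case.
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        if not name or not email:
            continue  # incomplete record; a real workflow might flag it instead
        if email in seen:
            continue  # duplicate of a record already kept
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

raw = [
    {"name": " ada lovelace ", "email": "ADA@Example.org"},
    {"name": "Ada Lovelace", "email": "ada@example.org "},
    {"name": "", "email": "anon@example.org"},
]
print(clean_records(raw))  # [{'name': 'Ada Lovelace', 'email': 'ada@example.org'}]
```

Each rule is a design choice with side effects: `str.title()`, for instance, turns "McDonald" into "Mcdonald", so cleaning rules should themselves be documented and reviewed.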
Short Definition: The degree to which all required measures are known. Values may be designated as "missing" in order not to have empty cells, or missing values may be replaced with default or interpolated values. In the case of default or interpolated values, these must be flagged as such to distinguish them from actual measurements or observations. Missing, default, or interpolated values do not imply that the dataset has been made complete.
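A minimal sketch of the two points above: completeness is measured as the share of cells holding actual values, and any default value filled in is flagged so it is never mistaken for a measurement. It assumes `None` marks a missing cell; the function names are hypothetical.

```python
def completeness(values):
    """Share of cells holding an actual value (None marks a missing cell)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)

def fill_with_flag(values, default):
    """Replace missing cells with a default, returning (value, is_imputed) pairs."""
    return [(default, True) if v is None else (v, False) for v in values]

readings = [12.1, None, 11.8, None]
print(completeness(readings))  # 0.5
filled = fill_with_flag(readings, default=0.0)
print(filled)  # [(12.1, False), (0.0, True), (11.8, False), (0.0, True)]
# Completeness is still computed from the flags, not from the filled values.
print(sum(not imputed for _, imputed in filled) / len(filled))  # 0.5
```

Because the flag travels with each value, filling the gaps does not make the dataset appear more complete than it is.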
Short Definition: Data compliance consists of the ongoing processes to ensure adherence of data to both enterprise business rules (government department, university, industry, or agency), and to legal, regulatory and accreditation requirements. Data compliance includes five areas: controls, audit, legal compliance, regulatory compliance, and accreditation conformance.
Short Definition: A software stack that chunks digital objects at the physical layer. Typical containers are file systems, database management systems, content management systems, clouds, etc. The software stack implies some form of encapsulation of the digital object.
Short Definition: A managed process, throughout the data lifecycle, by which data/data collections are cleansed, documented, standardized, formatted and inter-related. This includes versioning data, or forming a new collection from several data sources, annotating with metadata, adding codes to raw data (e.g., classifying a galaxy image with a galaxy type such as "spiral"). Higher levels of curation involve maintaining links with annotation and with other published materials. Thus a dataset may include a citation link to a publication whose analysis was based on the data. The goal of curation is to manage and promote the use of data from its point of creation to ensure it is fit for contemporary purpose and available for discovery and re-use. For dynamic datasets this may mean continuous enrichment or updating to keep it fit for purpose. Special forms of curation may be available in data repositories. The data curation process itself must be documented as part of curation. Thus curation and provenance are highly related.
Short Definition: A data custodian is an IT individual or organization responsible for the IT infrastructure providing and protecting data in conformance with the policies and practices prescribed by data governance.
Reference: DAMA Dictionary of Data Management
Synonym: Technical data steward; Data manager.
Related Term: Data stewardship
Short Definition: Removing noise from data
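One simple de-noising technique is a centered moving average, which replaces each value with the mean of its neighbours. The sketch below is one illustrative method among many (median filters, Fourier or wavelet filtering, etc.), not a general-purpose denoiser; the window size is a hypothetical parameter.

```python
def moving_average(signal, window=3):
    """Centered moving average; near the edges, only available neighbours are used."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        chunk = signal[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out

print(moving_average([0, 3, 0]))  # [1.5, 1.0, 1.5]
```

Smoothing also flattens genuine sharp features, so the window should be chosen with the expected signal in mind.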
Short Definition: The process of destroying data stored on tapes, hard disks and other forms of electronic media so that it is completely unreadable and cannot be accessed or used
Short Definition: 1. A data dictionary is a crucial part of a relational database as it provides additional information about the relationships between multiple tables in a database. The data dictionary in DBMS [database management system] helps the user to arrange data in a neat and well-organized way, thus preventing data redundancy. 2. A collection of descriptions of the data objects or items in a data model. A first step in analyzing a system of objects with which users interact is to identify each object and its relationship to other objects. This process is called data modeling and results in a picture of object relationships. After each data object or item is given a descriptive name, its relationship is described (or it becomes part of some structure that implicitly describes relationship), the type of data (such as text or image or binary value) is described, possible predefined values are listed, and a brief textual description is provided. This collection can be organized for reference into an eBook called a data dictionary.
Reference: Termiumplus.gc.ca; Casrai
Short Definition: A data mining practice in which large volumes of data are analyzed seeking any possible relationships between data. The traditional scientific method, in contrast, begins with a hypothesis and follows with an examination of the data. Data dredging often circumvents traditional data mining techniques and may lead to premature conclusions. Uncovered patterns may be presented as statistically significant without any specific hypothesis as to the underlying causality. Data dredging is sometimes described as "seeking more information from a dataset than it actually contains."
Reference: https://en.wikipedia.org/wiki/Data_dredging
Synonym: Data fishing
Short Definition: An approach to governance that values decisions that can be backed up with data that can be verified. The success of the data-driven approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation. Errors can creep into data analytics processes at any stage of the endeavor and serious issues can result when they do.
Synonym: DDDM
Short Definition: A serious problem caused by one or more ineffective data analysis processes. In addition to the financial burden, problems with data quality and analysis can have a serious impact on security, compliance, project management and human resource management, among others. Errors can creep into data analytics at any stage. The data quality may be inadequate in the first place. The data could be incomplete, inaccurate, not current, or may not be a reliable indicator of what they are intended to represent. Data analysis and interpretation are prone to a similar number of pitfalls. There can be confounding factors and the mathematical method can be flawed or inappropriate. Correlation can be erroneously considered to suggest causation. Statistical significance may be mistakenly attributed when the data do not support it. Even if the data and analytic processes are valid, data may be deliberately presented in a misleading manner to support an agenda. Problems arise when insufficient resources are applied to data processes and too much confidence is placed in their validity. To prevent data-driven disasters, it's crucial to continually examine data quality and analytic processes, and to pay attention to common sense and even intuition. When data seem to be indicating something that does not make logical sense or just seems wrong, it is time to re-examine the source data and the methods of analysis.
Short Definition: A unit of data for which the definition, identification, representation (term used to represent it), and permissible values are specified by means of a set of attributes. In a database, an example of a data element is a data field; a data element is also described as an attribute of a data entity. For example, the data element "age of a person" may have values consisting of all combinations of 3 decimal digits. A personnel record may include the data elements "name" and "address". In the context of the personnel record, "name" and "address" each function as an indivisible unit that can be stored and retrieved as such. However, in a different context, "address" itself may be considered a record that contains its own data elements "street address", "city", "postal code", and "country".
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Brown, A. (2008). White paper: Representation information registries. Retrieved June 19, 2009, from http://www.planets-project.eu/docs/reports/Planets_PC3-D7_RepInformationRegistries.pdf
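The set of attributes that specifies a data element can be modelled directly. This minimal sketch encodes the "age of a person" example with its permissible values (exactly 3 decimal digits) as a regular expression. The class, the identifier "DE-0001", and the wording of the definition are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    identifier: str           # hypothetical local identifier
    name: str
    definition: str
    representation: str
    permissible_pattern: str  # regular expression for the permissible values

    def is_permissible(self, value: str) -> bool:
        """True if the whole value matches the permissible-value pattern."""
        return re.fullmatch(self.permissible_pattern, value) is not None

AGE_OF_PERSON = DataElement(
    identifier="DE-0001",
    name="age of a person",
    definition="Age of a person in completed years",  # hypothetical wording
    representation="3 decimal digits",
    permissible_pattern=r"[0-9]{3}",  # ASCII digits only
)

print(AGE_OF_PERSON.is_permissible("042"))  # True
print(AGE_OF_PERSON.is_permissible("42"))   # False: only 2 digits
```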
Short Definition: An object, event, or phenomenon about which data are stored in a database and which has intermediate representation in a Data Model.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Bechhofer, Sean, et al. "Why linked data is not enough for scientists." Future Generation Computer Systems 29.2 (2013): 599-611.
Short Definition: Data exploration involves summarizing the main characteristics of a dataset using visualization and should be the first step in data analysis.
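The summarizing half of data exploration can be sketched with Python's standard `statistics` module. Visualization (histograms, scatter plots) would normally follow and usually needs a plotting library, which this minimal sketch leaves out; the function name is hypothetical.

```python
import statistics

def summarize(values):
    """Basic descriptive statistics for one numeric variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

print(summarize([1, 2, 3, 4]))
```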
Short Definition: The layout of a file in terms of how the data within the file are organized. A program that uses the data in a file must be able to recognize and possibly access data within the file. A particular file format is often indicated as part of a file's name by a filename extension (suffix). Conventionally, the extension is separated by a period from the name and contains three or four letters that identify the format. There are as many different file formats as there are different programs to process the files. Examples include: Word documents (.doc), Web text pages (.htm or .html), Web page images (.gif and .jpg), Adobe PostScript files (.ps), Adobe Acrobat files (.pdf), Executable programs (.exe), Multimedia files (.mp3 and others). Preferred formats are those designated by a data repository as the formats in which it maintains digital content. If a data file is not in a preferred format, a data curator will often convert the file into a preferred format, thus ensuring that the digital content remains readable and usable. Usually, preferred formats are the de facto standard employed by a particular community.
Reference: Policy-making for Research Data in Repositories: A Guide 2009/TC3+
Synonym: Data structure
Short Definition: The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets. It refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise. A sound data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures.
Reference: DAMA Dictionary of Data Management
Short Definition: In the context of epidemiology: Making data from different sources comparable. The processes involved in producing inferentially equivalent data.
Synonym: Data integration
Short Definition: The collective processes conducted to ensure the cleanliness of data. Data are considered clean when they are relatively error-free.
Synonym: Dirty data
Short Definition: An identifier that uniquely distinguishes one set of data from all others. Examples include: Archival Resource Key (ARK); Digital Object Identifiers (DOI); Extensible Resource Identifier (XRI); HANDLE; Life Science ID (LSID); Object Identifiers (OID); Persistent Uniform Resource Locators (PURL); URI/URN/URL; UUID.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; http://www.dcc.ac.uk/how-discover-requirements
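Of the identifier schemes listed above, a UUID can be minted locally with Python's standard `uuid` module. The sketch shows a random (version 4) UUID and a name-based (version 5) UUID derived from a hypothetical dataset URL. Unlike a DOI or Handle, a UUID is not resolvable by itself; its persistence depends on whoever maintains the mapping to the data.

```python
import uuid

def mint_random_id():
    """Random (version 4) UUID, returned as a URN string."""
    return uuid.uuid4().urn

def mint_name_based_id(dataset_url):
    """Name-based (version 5) UUID: the same URL always yields the same identifier."""
    return uuid.uuid5(uuid.NAMESPACE_URL, dataset_url).urn

print(mint_random_id())  # e.g. urn:uuid:... (different on every call)
url = "https://example.org/datasets/42"  # hypothetical dataset URL
print(mint_name_based_id(url) == mint_name_based_id(url))  # True
```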
Short Definition: The process of obtaining, importing, and processing data for later use or storage in a database. This process often involves altering individual files by editing their content and/or formatting them to fit into a larger document. An effective data ingestion methodology begins by validating the individual files, then prioritizes the sources for optimum processing, and finally validates the results. When numerous data sources exist in diverse formats (the sources may number in the hundreds and the formats in the dozens), maintaining reasonable speed and efficiency can become a major challenge. To that end, several vendors offer programs tailored to the task of data ingestion in specific applications or environments.
Synonym: Ingestion
Short Definition: Combining diverse datasets from disparate sources into one unified dataset or database. Data are accessed and extracted, moved, validated, cleaned, transformed and loaded.
Reference: DAMA Dictionary of Data Management; Other
Synonym: Data fusion; Data pooling; Data compilation.
Related Term: Data linkage; Privacy-preserving data linkage
Short Definition: Data fusion; Data pooling; Data compilation
Short Definition: A type of data element that expresses a proposition that binds one or more property values to some data entity.
Short Definition: 1. Data experts who have a librarian background. Data librarians often carry out curation and metadata-related work. There is much overlap between data librarians, data managers, and data stewards. 2. Librarians who focus on providing research support services such as finding, using, retrieving, and collecting data for use in secondary analysis.
Short Definition: Refers to all the stages in the existence of digital information from creation to destruction. A lifecycle view is used to enable active management of the data objects and resource over time, thus maintaining accessibility and usability.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Klein et al (2013) A Technical Framework for Resource Synchronization
Short Definition: The process of bringing together data from two or more different sources that relate to the same individual, family, place, or event. Example: linkage may be used to bring together information about an individual's health status, prescription drug use, and social media habits.
Reference: Holman, C. D. J., Bass, A. J., Rosman, D. L., Smith, M. B., Semmens, J. B., Glasson, E. J.,... Stanley, F. J. (2008) A decade of data linkage in Western Australia: Strategic design, applications and benefits of the WA data linkage system. Australian Health Review, 32(4), 766-777.
Synonym: Linkage.
Related Term: Data integration
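The simplest form of linkage is deterministic: records from two sources are joined when they share an exact identifier. The sketch below assumes both sources carry a common key (here a hypothetical "person_id"). Real linkage systems often use probabilistic matching on names, dates, and addresses instead, and privacy-preserving linkage would typically work on encrypted or hashed identifiers, neither of which is shown.

```python
def link_records(source_a, source_b, key):
    """Deterministic linkage: join two lists of dicts on an exact shared key."""
    index = {}
    for rec in source_b:
        index.setdefault(rec[key], []).append(rec)
    linked = []
    for rec in source_a:
        for match in index.get(rec[key], []):
            linked.append({**rec, **match})  # combined view of one entity
    return linked

# Hypothetical sources describing the same individuals.
health = [{"person_id": 1, "diagnosis": "asthma"},
          {"person_id": 2, "diagnosis": "none"}]
prescriptions = [{"person_id": 1, "drug": "salbutamol"}]
print(link_records(health, prescriptions, "person_id"))
# [{'person_id': 1, 'diagnosis': 'asthma', 'drug': 'salbutamol'}]
```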
Short Definition: 1. Research data management (RDM) refers to the processes applied through the lifecycle of a research project to guide the collection, documentation, storage, sharing and preservation of research data. 2. The activities of data policies, data planning, data element standardization, information management control, data synchronization, data sharing, and database development, including practices and projects that acquire, control, protect, deliver and enhance the value of data and information.
Reference: Canada Tri-Agency; Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Mapping the Data Landscape 2011 Summit; TBS Information Management Glossary (BC Government Information Resource Management); DAMA Dictionary of Data Management
Short Definition: An infrastructure used to provide data management and enforce data management policies. A data management infrastructure would include resources such as a data repository and an information catalogue.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Klein et al, 2013 A Technical Framework for Resource Synchronization
Short Definition: A Data Management Plan (DMP) provides information required by various stakeholders (e.g., funders, managers, data stewards, researchers, scientists, librarians, IT support, etc.) about a specific dataset or about a project and its data for the purpose of costing, project management, data management, and curation.
Long Definition: See machine-actionable Data Management Plan (maDMP).
Short Definition: A written document backed by management describing policy and providing guidance to ensure that appropriate standards, consistent guidelines, and common strategies are used, providing linkages to and consistency with other similar systems, and fostering a true network across an organization producing data.
Short Definition: A repository of data designed to serve a particular community of knowledge workers. The goal of a data mart is to meet the particular demands of a specific group of users. Because data marts are optimized to look at data in a unique way, the design process tends to start with an analysis of user needs. Generally, an organization's data marts are subsets of the organization's data warehouse. A data mart tends to be tactical and aimed at meeting an immediate need. Data virtualization software can be used to create virtual data marts, pulling data from disparate sources and combining it with other data as necessary to meet the needs of specific business users. A virtual data mart provides knowledge workers with access to the data they need while preventing data silos and giving the organization's data management team a level of control over the organization's data throughout its lifecycle. The difference between a data warehouse and a data mart can be confusing because the two terms are sometimes used incorrectly as synonyms.
Related Term: Data warehouse
Short Definition: The process of transferring data between storage types, formats, information technologies, or computer systems. A data migration project is usually undertaken to replace or upgrade servers or storage equipment, for a website consolidation, to conduct server maintenance or to relocate a data center.
Reference: Wikipedia/Educause
Synonym: Data acquisition; Data sharing
Short Definition: The process of analyzing multivariate datasets using pattern recognition or other knowledge discovery techniques to identify previously unknown and potentially meaningful data content, relationships, classifications, or trends. Data mining parameters include: Association (looking for patterns where one event is connected to another event); Sequence or path analysis (looking for patterns where one event leads to another later event); Classification (looking for new patterns); Clustering (finding and visually documenting groups of facts not previously known); and Forecasting, or predictive analytics (discovering patterns in data that can lead to reasonable predictions about the future).
Reference: UNESCO Open Access Policy Guidelines; TBS Information Management Glossary (BC Information Resource Management); DAMA Dictionary of Data Management.
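Association mining is commonly scored with two standard measures that the definition does not name: support (how often an itemset occurs) and confidence (how often the consequent occurs when the antecedent does). A minimal sketch over hypothetical transactions:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical shopping-basket transactions.
tx = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support(tx, {"bread", "milk"}))        # 0.5
print(confidence(tx, {"bread"}, {"milk"}))   # about 0.667
```

Full algorithms such as Apriori search many candidate itemsets efficiently; this sketch only scores a single given rule.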
Short Definition: A model that specifies the structure or schema of a dataset. The model provides a documented description of the data and thus is an instance of metadata. It is a logical, relational data model showing an organized dataset as a collection of tables with entity, attributes and relations.
Synonym: Data modeling
Short Definition: Data modeling formalizes and documents existing processes and events. It captures and translates complex system designs into easily understood representations of the data flows and processes, creating a blueprint for construction and/or re-engineering. A data model can be thought of as a diagram or flowchart that illustrates the relationships between data. Although capturing all the possible relationships in a data model can be very time-intensive, it's an important step and should not be rushed. Well-documented models allow stakeholders to identify errors and make changes before any programming code has been written. Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships and data flows have been identified. There are several different approaches to data modeling, including: Conceptual Data Modeling (identifies the highest-level relationships between different entities); Enterprise Data Modeling (similar to conceptual data modeling, but addresses the unique requirements of a specific organization); Logical Data Modeling (illustrates the specific entities, attributes and relationships involved in a business function. Serves as the basis for the creation of the physical data model); Physical Data Modeling (represents an application and database-specific implementation of a logical data model).
Short Definition: A series of potentially destructive or irrevocable changes to a piece of data or a file. Common munging operations include removing punctuation or HTML tags, data parsing, filtering, and transformation.
Short Definition: Scales all numeric variables to the range [0, 1]. Also known as "0-1 scaling." Each variable in the dataset is recalculated as (V - min V)/(max V - min V), where V represents the value of the variable in the original dataset. This method allows variables to have differing means and standard deviations but equal ranges. In this case, each variable has at least one observed value at each of the 0 and 1 endpoints.
Related Term: Data standardization
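The (V - min V)/(max V - min V) formula above translates directly into code. This is a minimal sketch; the function name `min_max_normalize` is hypothetical, and raising an error when every value is identical (a zero range) is an assumption, since the definition does not say how that case is handled.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using (V - min V) / (max V - min V)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The formula divides by the range, which is zero here.
        raise ValueError("all values are equal; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_normalize([10, 15, 20, 30]))  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, which is why each normalized variable has observations at both endpoints.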
Short Definition: Denotes the complexity of measures that are used by a repository to form aggregations of data objects (including collections and metadata) to describe the properties of data objects, to register PIDs, to build the PID records, to link between all components, and to set up the containers (software stack) that are used to store all components.
Short Definition: An organization's stated data/information management processes designed to assist and protect the organization's research data assets. It is a set of high-level principles that establish a guiding framework for data management. A data policy can be used to address strategic aspects such as data access, relevant legal matters, data stewardship issues and custodial duties, data acquisition and other issues.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Mapping the Data Landscape 2011 Summit
Short Definition: Any type of processing performed on raw data to prepare it for another processing procedure. Preprocessing may include: data sampling, data transformation, de-noising, data normalization, data standardization, or feature extraction.
Short Definition: In the context of batch priority-based scheduling, used to declare relative priorities and to determine the processing order of jobs and business processes. Values for scheduling priority: Low, Normal (default value), High, Critical, and Reserved capacity (highest priority).
Reference: Microsoft
Short Definition: A generic concept referring to all kinds of procedures being executed on data at any point in the data life cycle.
Short Definition: Includes all activities involved in the planning, collecting, processing, analysis and maintenance of data in the original research project. Among these activities are selecting a study design, constructing instruments for data collection, conducting data collection/creation, performing data editing/verification/validation, analyzing data, backing up data versions and preparing and tagging metadata.
Short Definition: The statistical analysis and assessment of the quality of data values within a dataset for consistency, uniqueness and logic. The data profiling process cannot identify inaccurate data; it can only identify business rule violations and anomalies. The insight gained by data profiling can be used to determine how difficult it will be to use existing data for other purposes. It can also be used to provide metrics to assess data quality and help determine whether or not metadata accurately describes the source data. Profiling tools evaluate the actual content, structure and quality of the data by exploring relationships that exist between value collections both within and across datasets. For example, by examining the frequency distribution of different values for each column in a table, an analyst can gain insight into the type and use of each column. Cross-column analysis can be used to expose embedded value dependencies and inter-table analysis allows the analyst to discover overlapping value sets that represent foreign key relationships between entities.
Synonym: Data archeology
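The per-column frequency analysis described above can be sketched in a few lines. This is a minimal illustration, not a profiling tool: the function name `profile_columns`, the row-as-dictionary layout, and treating `None` as a missing value are all assumptions.

```python
from collections import Counter


def profile_columns(rows):
    """Summarize each column: value frequencies, distinct count, and missing count."""
    columns = sorted({key for row in rows for key in row})
    profile = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None]
        profile[column] = {
            "frequencies": Counter(present),
            "distinct": len(set(present)),
            "missing": len(values) - len(present),
        }
    return profile


rows = [
    {"country": "CA", "status": "active"},
    {"country": "CA", "status": "inactive"},
    {"country": "ca", "status": None},  # inconsistent casing and a missing value
    {"country": "US", "status": "active"},
]
for column, summary in profile_columns(rows).items():
    print(column, dict(summary["frequencies"]), "missing:", summary["missing"])
```

The lowercase "ca" shows up as a separate value of `country`. Profiling flags this as an anomaly, but, as the definition notes, it cannot decide on its own whether the value is inaccurate.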
Short Definition: The release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way. Data publishing occurs via dedicated data repositories and/or (data) journals which ensure that the published research objects are well documented, curated, archived for the long term, interoperable, citable, quality assured and discoverable - all aspects of data publishing that are important for future reuse of data by third party end-users.
Reference: Bloom T, Dallmeier-Tiessen* S, Murphy* F, Austin CC, Whyte A, Tedds J, Nurnberger A, Raymond L, Stockhause M, Vardigan M (2015 Preprint). Workflows for Research Data Publishing: Models and Key Components. International Journal on Digital Libraries, Research Data Publishing Special Issue. 27 pages, June 30, 2015. Bechhofer, Sean, et al. "Why linked data is not enough for scientists." Future Generation Computer Systems 29.2 (2013): 599-611.
Short Definition: The reliability and application efficiency of data. It is a perception or an assessment of a dataset's fitness to serve its purpose in a given context. Aspects of data quality include: Accuracy, Completeness, Update status, Relevance, Consistency across data sources, Reliability, Appropriate presentation, Accessibility. Within an organization, acceptable data quality is crucial to operational and transactional processes and to the reliability of analytics, business intelligence, and reporting. Data quality is affected by the way data are entered, stored and managed. Maintaining data quality requires going through the data periodically and scrubbing it. Typically this involves updating, standardizing, and de-duplicating records to create a single view of the data, even if it is stored in multiple disparate systems. Data quality assurance (DQA) is the process of verifying the reliability and effectiveness of data.
Synonym: Data cleaning
Short Definition: The process of restoring data that have been lost, accidentally deleted, corrupted or made inaccessible for any reason. The data recovery process may vary, depending on the circumstances of the data loss, the data recovery software used to create backups, and backup target media. In some cases, end users may be able to restore lost files themselves. Restoration of a corrupted database from a tape backup is a more complicated process that requires specialized intervention. Data that were not backed up and were accidentally deleted from a computer's file system may sometimes be recovered from file fragments that remain on the disk. An organization's disaster recovery plan should make known who in the organization is responsible for recovering data, provide a strategy for how data will be recovered and document acceptable recovery point and recovery time objectives.
Synonym: Data restoration
Short Definition: The process of reducing the amount or size of stored data. This may be achieved by eliminating redundant copies of data files, deduplicating data files by removing redundant records, or by compressing the data files.
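Two of the reduction methods named above, removing redundant records and compressing the result, can be combined in a short sketch. The function name `reduce_records` and the sample records are hypothetical; `zlib` is used here only as an example of lossless compression, so the original unique records can be recovered exactly.

```python
import zlib


def reduce_records(records):
    """Drop duplicate records (keeping the first occurrence), then compress them."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    payload = "\n".join(unique).encode("utf-8")
    return unique, zlib.compress(payload)


records = ["id=1,temp=20.5", "id=2,temp=21.0", "id=1,temp=20.5"] * 100
unique, compressed = reduce_records(records)
raw_size = len("\n".join(records).encode("utf-8"))
print(len(records), "->", len(unique), "records;", raw_size, "->", len(compressed), "bytes")
```

Deduplication changes what is stored (redundant rows are gone), whereas lossless compression only changes how it is stored; `zlib.decompress` returns the deduplicated payload byte for byte.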
Short Definition: A framework whose primary purpose is to enable information sharing and reuse across the federal government via the standard description and discovery of common data and the promotion of uniform data management practices.
Short Definition: A curation process on a data object by which it receives a persistent object identifier (PID) from a trusted registration authority. Registration must be accompanied by the step(s) to upload the data object to a persistent repository.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; NISO (2004) Understanding Metadata. Bethesda, MD: NISO Press, p.1 http://marciazeng.slis.kent.edu/metadatabasics/types.htm
Synonym: Repository; Persistent identifier
Short Definition: A type of data management using repositories. It is the set of policies that govern the organization, control, and properties of the repository such as: required file formats, access control restrictions, integrity, replication, retention, disposition, etc.
Short Definition: An object describing the context of the data, including provenance, description, structural, and administrative information.
Short Definition: Recovery and/or transformation and digitization of dark data and at-risk data so that they can be preserved, accessed, shared, and used. Data rescue also involves the addition of rich metadata to make the content understandable and more easily re-usable.
Reference: http://www.wmo.int/pages/prog/hwrp/datarescue.php ; http://iedro.org/data-rescue-process/
Short Definition: The physical or geographic location of an organization's data or information. Data residency also refers to the legal or regulatory requirements imposed on data based on the country or region in which it resides. Cloud computing, which allows organizations to deliver hosted services over the Internet, can create data residency concerns. Users need to know where their cloud provider's data centres are located across the globe as well as the different data residency policies for each respective location. To protect data from unwanted access regardless of its geographic location, organizations can use cloud data encryption tools that transform user data into ciphertext before it is stored in the cloud (e.g., CipherCloud; Vormetric; Perspecsys).
Short Definition: An organization's established protocol for retaining information for operational, legal, or regulatory compliance needs. The objectives of a data retention policy are to keep important information for future use or reference, to organize information so it can be searched and accessed at a later date, and to dispose of information that is no longer needed. A data retention policy must consider both the value of data over time, and regulations to which the data may be subject.
Short Definition: An activity through which the correctness conditions of the data are verified. It also includes the specification of the type of the error or condition not met, and the qualification of the data and its division into the "error-free" and "erroneous" data. Data review consists of both error detection and data analysis, and can be carried out in manual or automated mode.
Reference: https://stats.oecd.org/glossary/detail.asp?ID=3400 ; http://www.unece.org/fileadmin/DAM/stats/publications/editingglossary.pdf
Synonym: Data validation; High quality data; Dirty data; Data cleaning; Data processing; Data integrity
Short Definition: Selection of a statistically representative subset from a large population of data.
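A simple random sample, in which every item has the same chance of selection, is one common way to obtain such a subset. The sketch below assumes that technique; the function name `simple_random_sample` and the seed value are hypothetical. Other designs, such as stratified sampling, may be needed when small subgroups must be represented.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement; each item has an equal chance of selection."""
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    return rng.sample(population, k)


population = list(range(1, 10_001))
sample = simple_random_sample(population, 100, seed=42)
print(len(sample), sample[:5])
```

Recording the seed alongside the sample lets others regenerate exactly the same subset from the same population.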
Short Definition: Techniques used to deal with parameters having different units and scales.
Synonym: Data rescaling.
Related Term: Data standardization
Short Definition: A process that creates a new dataset from an original source. Examples include: creating a subset of the data, querying a database.
Short Definition: The practice of making data available for reuse. This may be done, for example, by depositing the data in a repository or through data publication.
Synonym: Data dissemination; Data posting.
Related Term: Repository; Data publication
Short Definition: An approach to protecting sensitive data from unauthorized access by encrypting the data and storing different portions of a file on different servers. An unauthorized person would need to know the locations of the servers containing the parts, be able to get access to each server, know what data to combine, and how to decrypt it. Data splitting can be made even more effective by periodically retrieving and recombining the parts, and then splitting the data in a different way among different servers, and using a different encryption key.
Short Definition: In the context of data analysis and data mining: transformation of data to have zero mean and unit variance. Where "V" represents the value of a variable in the original dataset, techniques used include: (a) Data normalization; (b) z-score scaling; (c) Dividing each value by the range, which recalculates each variable as V / (max V - min V). In this case, every variable then spans an interval of width 1, but the means, variances, and positions of those intervals still differ; and, (d) Dividing each value by the standard deviation. This method produces a set of transformed variables with variances of 1, but different means and ranges.
Synonym: Data normalization; Data z-score normalization
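Techniques (c) and (d) above differ only in the divisor. The sketch below shows both; the function names and the sample heights are hypothetical, and using the sample standard deviation (n - 1 denominator, `statistics.stdev`) rather than the population one is an assumption, since the definition does not specify which.

```python
from statistics import mean, stdev


def divide_by_range(values):
    """Technique (c): recalculate each value as V / (max V - min V)."""
    spread = max(values) - min(values)
    return [v / spread for v in values]


def divide_by_std(values):
    """Technique (d): recalculate each value as V / s, using the sample standard deviation."""
    s = stdev(values)
    return [v / s for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0]
by_range = divide_by_range(heights_cm)  # the values now span an interval of width 1
by_std = divide_by_std(heights_cm)      # the standard deviation is now 1
print(max(by_range) - min(by_range), stdev(by_std), mean(by_std))
```

Neither technique subtracts the mean, so the transformed values keep a nonzero mean; only z-score scaling centres the data at zero.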
Short Definition: Data stewardship is a shared responsibility between Principal Investigators and data stewards. Principal Investigators are responsible for, and data stewards provide support for: (a) Data collection, data integration, or reuse of existing data; (b) Review of data quality; (c) Description of scientific workflow/process; (d) Provision of standards-compliant metadata; and, (e) Submission of data and data products. Data stewards are responsible for, and Principal Investigators are consulted and informed on: (a) Preservation of data and data products; and, (b) Provision of formats (e.g., web services, NetCDF, etc.) for data discovery and integration. In addition, Principal Investigators are also responsible for data citation, as appropriate, when preparing documentation, reports, or references.
Long Definition: A data steward manages and oversees an organization's data assets to provide data users with high quality data that are easily accessible in a consistent manner. While data governance generally focuses on high-level policies and procedures, data stewardship focuses on tactical coordination and implementation. A data steward is responsible for carrying out data usage and security policies as determined through enterprise data governance initiatives, acting as a liaison between the IT department and the business side of an organization. An effective data steward maintains agreed-upon data definitions and formats, identifies data quality issues and ensures that business users adhere to specified standards. An organization may use a data stewardship program as part of its overall data lifecycle management effort and/or to help with data quality improvement projects. A data steward collaborates with Principal Investigators, data custodians, data architects, business intelligence developers, extract, transform and load (ETL) designers, business data owners, and others to uphold data consistency and data quality metrics. Data quality tools, including data profiling software, are key technology components of many data stewardship programs.
Reference: http://searchdatamanagement.techtarget.com/definition/data-stewardship
Synonym: Principal Investigator; Data custodianship; Data citation
Short Definition: A repository for persistently storing collections of data, such as a database, a file system or a directory. The data stored can be of any type that can be rendered in digital format and placed in electronic media. Examples include text, image, video and audio files.
Short Definition: A sequence of digitally encoded, coherent signals used to send or receive a representation of information content as transmitted.
Short Definition: A specialized format for organizing and storing data. General data structure types include the array, the file, the record, the table, the tree, and so on. Any data structure is designed to organize data to suit a specific purpose so that it can be accessed and worked with in appropriate ways. In computer programming, a data structure may be selected or designed to store data for the purpose of working on it with various algorithms.
Reference: National Institute of Standards and Technology (NIST) Dictionary of Algorithms and Data Structures
Short Definition: The continuum of data structure that includes unstructured data, semi-structured data, and structured data.
Short Definition: 1. A field or column in a database table. It is an abbreviation for 'physical data attribute' which is a single data element related to a data object, such as a table in a database. The database schema associates one or more attributes with each database entity (i.e. table). 2. A term for a logical or conceptual attribute such as in an entity-attribute-relationship (EAR) data model.
Short Definition: Human tension and/or stress related to the sharing or release of data resulting from concerns about: (a) unknowns about users, uses, and what users will learn from the data before the data producers themselves learn it; (b) what users will learn from the data; (c) data quality; (d) data traceability (or lack thereof); (e) potential requests for additional documentation and metadata; (f) potential questions concerning methodology used to produce the data; (g) lack of resources to support data sharing; (h) governance; (i) social or political interests and impact; (j) data ownership; (k) the desire to hold back data to give researchers the time to publish articles based on those data; and/or (l) perceived risk of data misuse or misinterpretation.
Short Definition: Data traceability follows the lifecycle of data to track all access and changes to the data. It helps demonstrate transparency, compliance and adherence to regulations. Data traceability, along with data compliance, can be considered part of a data audit process. Data traceability is fundamental to reproducible research.
Short Definition: Manipulation of raw data to produce a single output.
Synonym: Data selection; Data processing; Data pre-processing
Short Definition: 1. A registry that links data types of all sorts with the executable data processing functions that can be useful for working with a specific data type. Examples include: complex file types in biology (diagnosis), registering categories that appear in PID records to describe data properties. Data types range from complex digital objects to simple categories that occur in digital objects. 2. A type of registry for data types supporting their standardization, uniqueness and discoverability.
Short Definition: A collection of interrelated data often with controlled redundancy, organized according to a scheme to serve one or more applications; the data are stored so that they can be used by several programs without concern for data structures or organization.
Short Definition: Provides well-defined guarantees for fitness, accuracy, and consistency for any of various kinds of user input into an application or automated system. Data validation checks that data are valid, sensible, reasonable, clean, usable, and secure before they are processed. Failures or omissions in data validation can lead to data corruption or security vulnerabilities. Improperly validated data can cause computer code processing the data to crash, generate error messages, behave in an unanticipated manner, or generate incorrect results that may be difficult or impossible to detect.
Synonym: High quality data; Dirty data; Data cleaning; Data processing; Data integrity
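The checks described above (is the value present, of the right type, and sensible?) can be expressed as a small function that runs before any processing. This is a minimal sketch: the record layout, the field names `site_id` and `temperature_c`, and the plausible temperature range are all hypothetical.

```python
def validate_observation(record):
    """Return a list of validation errors; an empty list means the record passed.

    The field names and the plausible temperature range are hypothetical.
    """
    errors = []

    site_id = record.get("site_id")
    if not isinstance(site_id, str) or not site_id.strip():
        errors.append("site_id must be a non-empty string")

    temperature = record.get("temperature_c")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
        errors.append("temperature_c must be a number")
    elif not -90.0 <= temperature <= 60.0:
        errors.append("temperature_c is outside the plausible range [-90, 60]")

    return errors


print(validate_observation({"site_id": "A1", "temperature_c": 21.5}))  # []
print(validate_observation({"site_id": "", "temperature_c": "warm"}))
```

Returning every error, rather than stopping at the first, lets the caller report all problems with a record at once before deciding whether to reject it.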
Short Definition: A central repository for all or significant parts of the data that an organization's various business systems collect. A data warehouse tends to be a strategic but somewhat unfinished concept. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access, but does not generally start from the point-of-view of the end user who may need access to specialized data marts. There are two approaches to data warehousing: The top down approach spins off data marts for specific groups of users after the complete data warehouse has been created. The bottom up approach builds the data marts first and then combines them into a single, all-encompassing data warehouse.
Reference: DAMA Dictionary of Data Management
Synonym: Data mart
Short Definition: The process of manually or semi-automatically converting or mapping data from one format into another that allows for more convenient consumption of the data. It involves gathering and organizing disparate data from different sources, often collected by many different investigators. Activities include developing and supporting search tools that use standardized metadata, harmonizing the coding of data for specific variables, and engineering new methods of combining data with the help of semi-automated tools. The result of data wrangling is repurposed data.
Synonym: Repurposed data
Short Definition: Variables are recalculated as (V - mean of V)/s, where "V" represents the value of the variable in the original dataset, and "s" is the standard deviation. As a result, all variables in the dataset have equal means (0) and standard deviations (1) but different ranges. Also known as z-score scaling.
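The (V - mean of V)/s recalculation above can be sketched with the standard library. The function name `z_scores` and the sample values are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption; some tools use the population standard deviation instead, which changes the scores slightly.

```python
from statistics import mean, stdev


def z_scores(values):
    """Recalculate each value as (V - mean of V) / s."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(values)
print(round(mean(z), 12), round(stdev(z), 12))
```

Whatever the original units, the transformed variable has mean 0 and standard deviation 1, while its range still depends on how spread out the original values were.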
Short Definition: A collection of data that is organised according to a conceptual structure/model describing the characteristics of these data and the relationships among their corresponding entities, supporting one or more application areas. A database allows its contents to be easily accessed, managed and updated. The type of database used depends on the requirements of the study. A common type is the relational database, where data are related to each other in a systematic manner so that they can be reorganised and accessed in a number of different ways. A database may house one or many datasets.
Short Definition: The function of managing the physical aspects of data resources, including database design and integrity, backup and recovery, performance and tuning.
Reference: DAMA Dictionary of Data Management
Short Definition: Any organized collection of data in a computational format, defined by a theme or category that reflects what is being measured/observed/monitored. The presentation of the data in the application is enabled through metadata.
Long Definition: A dataset can be understood as a logical entity depicting data, e.g., raw data. It provides high-level information about the data. The granularity of a dataset depends on the specific setting: at one extreme it can be a single file, at the other a collection of files in different formats.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Mapping the Data Landscape 2011 Summit; TBS Standard on Geospatial Data (ISO 19115:2003); Environment Canada data stewardship handbook (draft).
Short Definition: A collection of datasets sharing the same product specification. A dataset series is a type of aggregation or collection with some "logical grouping" such as by a topic (specification) with the (product) unit being a dataset series. Example: A series of earth observations. Each year, month or week (depending on the volume) might be a dataset and the series could run from a specified year to the present.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Government of Canada TBS Standard on Geospatial Data (ISO 19115:2003) ; W3C DCAT specification
Short Definition: A standard way to express a numeric calendar date that eliminates ambiguity, acceptable formats being defined by ISO 8601. ISO 8601 is applicable whenever representation of dates in the Gregorian calendar, times in the 24-hour timekeeping system, time intervals and recurring time intervals or of the formats of these representations are included in information interchange. It includes calendar dates expressed in terms of calendar year, calendar month and calendar day of the month; ordinal dates expressed in terms of calendar year and calendar day of the year; week dates expressed in terms of calendar year, calendar week number and calendar day of the week; local time based upon the 24-hour timekeeping system; Coordinated Universal Time of day; local time and the difference from Coordinated Universal Time; combination of date and time of day; time intervals; recurring time intervals.
Synonym: ISO date format; ISO time format
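Several of the ISO 8601 representations listed above can be produced with Python's standard `datetime` module. This sketch uses an arbitrary example date; building the ordinal and week forms with f-strings is one convenient approach, not the only one.

```python
from datetime import date, datetime, timedelta, timezone

d = date(2015, 6, 30)

# Calendar date: YYYY-MM-DD
calendar_date = d.isoformat()                            # 2015-06-30

# Ordinal date (calendar year and day of the year): YYYY-DDD
ordinal_date = f"{d.year}-{d.timetuple().tm_yday:03d}"   # 2015-181

# Week date (ISO week-numbering year, week number, day of week 1-7): YYYY-Www-D
iso_year, iso_week, iso_weekday = d.isocalendar()
week_date = f"{iso_year}-W{iso_week:02d}-{iso_weekday}"  # 2015-W27-2

# Combined date and time of day, local time with its offset from UTC
local = datetime(2015, 6, 30, 14, 5, 0, tzinfo=timezone(timedelta(hours=-4)))
local_stamp = local.isoformat()                          # 2015-06-30T14:05:00-04:00
utc_stamp = local.astimezone(timezone.utc).isoformat()   # 2015-06-30T18:05:00+00:00

# The combined form parses back to the same instant.
assert datetime.fromisoformat(local_stamp) == local
print(calendar_date, ordinal_date, week_date, local_stamp, utc_stamp)
```

The week date uses `iso_year` from `isocalendar()` rather than `d.year` because, for a few days around 1 January, the ISO week-numbering year differs from the calendar year.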
Short Definition: A standard that is widely accepted and used, but lacks formal approval by a recognized standards developing organization (e.g., the QWERTY keyboard).
Short Definition: De-anonymization is a reverse engineering process in which de-identified data are cross-referenced with other data sources to re-identify the personally identifiable information. This could occur if a de-identification process had not been successfully performed, or had not been undertaken in the first place.
Short Definition: 1. The act of minimally perturbing individual-level data to decrease the probability of discovering an individual's identity. It involves masking direct identifiers (e.g., name, phone number, address) as well as transforming indirect identifiers that could be used alone or in combination to identify an individual (e.g., birth dates, geographic details, dates of key events). If done correctly, de-identification is a defensible, repeatable, and auditable process that consistently provides assurance, based on generally accepted and repeatable statistical methodologies, that there is a very small risk of re-identification of any data that are released. 2. The use of one or more techniques designed to make it impossible (or at least more difficult) to identify a particular individual from stored data related to them. The purpose of data anonymization is to protect the privacy of the individual and to make it legal for governments and businesses to share their data without obtaining permission. Such data have proven to be very valuable for researchers, particularly in health care. Data anonymization methods include removing personally identifiable information (e.g., names, addresses, social insurance numbers, Medicare numbers), or using obfuscation methods such as encryption, hashing, generalization, pseudonymization, and perturbation. As governments move forward with open government initiatives, more data are becoming publicly available over the Internet. Much of these data have been scrubbed to create "limited datasets".
Reference: Council of Canadian Academies (2015). ACCESSING HEALTH AND HEALTH-RELATED DATA IN CANADA: The Expert Panel on Timely Access to Health and Social Data for Health Research and Health System Innovation. ISBN 978-1-926522-05-0 (pdf). http://www.scienceadvice.ca/en/assessments/completed/health-data.aspx ; El Emam et al., 2011 ; Internet 2/Educause; Open Data 101 (Government of Canada)
Synonym: Anonymization
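Three of the techniques named above can be sketched together: removing direct identifiers, pseudonymizing a key identifier, and generalizing an indirect identifier. The record fields are hypothetical, and using a keyed hash (HMAC-SHA256) for pseudonymization is one choice among several, made here because an unkeyed hash of a name can be reversed by hashing a list of candidate names. The sketch alone does not establish a low re-identification risk; that still requires assessing the released data as a whole.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_birth_date(iso_date: str) -> str:
    """Generalize an indirect identifier: keep only the year of a YYYY-MM-DD date."""
    return iso_date[:4]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Build a release record; direct identifiers not copied here (phone, address) are dropped."""
    return {
        "pseudonym": pseudonymize(record["name"], secret_key),
        "birth_year": generalize_birth_date(record["birth_date"]),
        "diagnosis": record["diagnosis"],
    }


record = {
    "name": "Jane Doe",
    "phone": "555-0100",
    "address": "1 Example Street",
    "birth_date": "1980-04-17",
    "diagnosis": "J45",
}
key = b"keep-this-key-separate-from-the-released-data"  # hypothetical secret
print(deidentify(record, key))
```

The same name and key always yield the same pseudonym, so records about one person can still be linked within the released dataset, while anyone without the key cannot recompute the mapping.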
Short Definition: A storage location for data that will probably not be accessed again, but must be kept in case of a compliance audit or some other reason.
Short Definition: Non-conformance to requirements.
Short Definition: In the context of computer networks: A physical or logical sub-network that separates an internal local area network (LAN) from other untrusted networks, usually the Internet. External-facing servers, resources and services are located in the DMZ so they are accessible from the Internet but the rest of the internal LAN remains unreachable. This provides an additional layer of security to the LAN as it restricts the ability of hackers to directly access internal servers and data via the Internet. Abbreviated as DMZ.
Synonym: Mud room; DMZ
Short Definition: In a relational database, denormalization is an approach to speeding up read performance (data retrieval) in which the administrator selectively adds back specific instances of redundant data after the data structure has been normalized. A denormalized database should not be confused with a database that has never been normalized. After data has been duplicated, the database designer must take into account how multiple instances of the data will be maintained. One way to denormalize a database is to allow the database management system (DBMS) to store redundant information on disk. This has the added benefit of ensuring the consistency of redundant copies. Another approach is to denormalize the actual logical data design, but this can quickly lead to inconsistent data. Rules called constraints can be used to specify how redundant copies of information are synchronized, but they increase the complexity of the database design and also run the risk of impacting write performance.
Short Definition: The results of applying a procedure to transform a data object in order to obtain a desired data product that is stored in a repository along with the provenance and descriptive metadata.
Synonym: Data transformation
Short Definition: 1. Information that establishes the conceptual content, access rights and physical attributes of [data] to support their discovery. 2. Enables identification, location, and retrieval of information resources by users, often including the use of controlled vocabularies for classification and indexing and links to related resources.
Reference: Library and Archives Canada ; DCC/TC3+
Short Definition: A record created digitally in the day-to-day business of the organisation and assigned formal status by the organisation. Examples include: word processing documents, emails, databases, or intranet web pages.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Digital preservation coalition; http://archives.govt.nz/advice/continuum-resource-kit/glossary/definitions-full-list#Record
Synonym: Electronic record
Short Definition: 1. In the context of library and archiving communities: digital archiving is often used interchangeably with digital preservation. 2. In the context of computing: digital archiving is the process of backup and ongoing maintenance, as opposed to strategies for long-term digital preservation.
Reference: Digital preservation coalition
Synonym: Archiving
Short Definition: Data in the form of digital materials.
Synonym: Digital materials
Short Definition: Those layers that sit between base technology (a computer science concern) and discipline-specific science. The focus is on value-added systems and services that can be widely shared across scientific domains, both supporting and enabling large increases in multi- and interdisciplinary science while reducing duplication of effort and resources (e.g., including hardware, software, personnel, services and organizations). In Canada, the preferred term has become Digital Infrastructure to refer to what is also known as Cyber-Infrastructure or e-Research Infrastructure.
Synonym: Cyber-infrastructure
Short Definition: A broad term encompassing: (a) digital surrogates created as a result of converting analogue materials to digital form (digitisation); (b) "born digital" materials, for which there has never been and is never intended to be an analogue equivalent; and, (c) digital records.
Reference: Digital preservation coalition
Synonym: Born digital; Digital objects; Digital records; Digital data; Electronic records
Short Definition: A digital object is editable, interactive, accessible and modifiable by means of digital objects other than the one governing its behaviour, and is distributed over information infrastructures. It is a machine-independent data structure consisting of one or more elements in digital form that can be parsed by different information systems; the structure helps to enable interoperability among diverse information systems in the Internet. A digital object is composed of a structured sequence of bits/bytes. As an object it is named. The bit sequence realizing the object can be identified and accessed by a unique and persistent identifier or by use of referencing attributes describing its properties.
Synonym: Digital entity
Short Definition: A name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks. A DOI is a type of Persistent Identifier (PID) issued by the International DOI Foundation. This permanent identifier is associated with a digital object that permits it to be referenced reliably even if its location and metadata undergo change over time.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; MIT data management and publishing
Synonym: DOI
Short Definition: The series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Digital preservation is defined very broadly and refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change. Those materials may be records created during the day-to-day business of an organisation; "born-digital" materials created for a specific purpose (e.g. teaching resources); or the products of digitisation projects. This definition specifically excludes the potential use of digital technology to preserve the original artefacts through digitisation.
Reference: Digital preservation coalition
Synonym: Digitisation; Preservation
Short Definition: Research data that are in digital form. They may have been created in digital form originally, or converted from paper or another form to a digital representation.
Short Definition: Incorporates: building a digital collection of information for further study and analysis; creating appropriate tools for collection-building; creating appropriate tools for the analysis and study of collections; using digital collections and analytical tools to generate new intellectual products; and, creating authoring tools for these new intellectual products, either in traditional forms or in digital form.
Reference: Our Cultural Commonwealth
Short Definition: Non-continuous electronic signals.
Short Definition: The process of creating digital files by scanning or otherwise converting analogue materials. The resulting digital copy, or digital surrogate, is then classed as digital material and is subject to the same broad challenges of preserving access as "born-digital" materials.
Reference: Digital preservation coalition
Short Definition: Data that contain errors. Dirty data can result from a number of problems, including inaccurate, incomplete or erroneous data such as spelling or punctuation errors, incorrect data or an incorrect data type associated with a field, outdated data, duplicate data, inconsistent data, incorrectly ordered data, and improper parsing of fields from disparate systems. Errors can be introduced at any stage as data are entered, stored and managed. Using a dirty dataset can lead to spurious associations, false conclusions and misdirected investments.
Synonym: Dirty dataset
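Three of the problems listed above can be checked mechanically: missing values, values of the wrong type, and duplicate records. The Python sketch below shows such a check. It assumes that records are plain dictionaries. The field names and expected types in `EXPECTED_TYPES` are hypothetical and serve only as an illustration. This is not a complete data-cleaning method.

```python
# Hypothetical schema: field names and expected types are assumptions for illustration.
EXPECTED_TYPES = {"id": int, "name": str, "age": int}

def find_dirty_records(records):
    """Return (index, problem) pairs for missing values, wrong types and duplicates."""
    problems = []
    seen = set()
    for i, rec in enumerate(records):
        for field, expected in EXPECTED_TYPES.items():
            value = rec.get(field)
            if value is None or value == "":
                problems.append((i, f"missing {field}"))
            elif not isinstance(value, expected):
                problems.append((i, f"wrong type for {field}"))
        # A record is a duplicate if an identical record appeared earlier.
        key = tuple(sorted(rec.items()))
        if key in seen:
            problems.append((i, "duplicate record"))
        seen.add(key)
    return problems

records = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "", "age": 41},          # missing name
    {"id": 3, "name": "Alan", "age": "forty"}, # wrong type
    {"id": 1, "name": "Ada", "age": 36},       # duplicate of the first record
]
print(find_dirty_records(records))
# [(1, 'missing name'), (2, 'wrong type for age'), (3, 'duplicate record')]
```

Other problems, such as outdated or inconsistent values, usually need domain rules. Simple type and completeness checks like these cannot detect them.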
Short Definition: In the context of records management: Destruction, alienation, or transfer to archives.
Long Definition: 1. Disposition refers to the process which enables governments or institutions to dispose of records [data] which no longer have operational value, either by permitting their destruction, by requiring their transfer to archives, or by agreeing to their alienation from the control of the government or institution. 2. Disposition is the action taken in regard to the disposal of records, which can involve physical destruction by means of burning, pulping, shredding or recycling; secure deletion of electronic records, or physical destruction of electronic storage media; transfer to archival storage for selective or full retention; or special disposal through sale, grant or other formal act of alienation from the custody of the government or institution that owns the data.
Reference: termiumplus.gc.ca ; https://universitycounsel.ubc.ca/files/2022/05/Records-Management-Policy_GA4.pdf
Short Definition: The act of interpreting an author's intended use of a word that has multiple meanings or spellings.
Short Definition: A distribution is a specific instance of a dataset. A dataset can have several distributions.
Long Definition: A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above). Examples of distributions include a CSV file, a netCDF file, a data-cube, files made accessible according to different profiles such as XML or JSON schemas or ShEx or SHACL expressions, anonymized/de-identified files, etc.
Reference: RDA DMP Common Standard; https://www.w3.org/TR/vocab-dcat-2/#Class:Distribution ; https://www.w3.org/TR/vocab-dcat-3/
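The sketch below shows how one dataset can carry several distributions. It is a minimal, hypothetical JSON-LD-style description that uses the DCAT terms `dcat:Dataset`, `dcat:distribution`, `dcat:Distribution`, `dcat:mediaType` and `dcat:downloadURL`. The title and the example.org URLs are placeholders. Media types are given as plain strings for simplicity. The helper `download_url_for` is an illustrative function, not part of any DCAT library.

```python
import json

# Hypothetical DCAT-style description; the title and example.org URLs are placeholders.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Hourly river gauge readings",
    # One dataset, two distributions (representations) of the same data.
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:mediaType": "text/csv",
            "dcat:downloadURL": "https://example.org/gauge.csv",
        },
        {
            "@type": "dcat:Distribution",
            "dcat:mediaType": "application/json",
            "dcat:downloadURL": "https://example.org/gauge.json",
        },
    ],
}

def download_url_for(ds, media_type):
    """Return the download URL of the first distribution with this media type, or None."""
    for dist in ds.get("dcat:distribution", []):
        if dist.get("dcat:mediaType") == media_type:
            return dist.get("dcat:downloadURL")
    return None

print(download_url_for(dataset, "text/csv"))  # https://example.org/gauge.csv
print(json.dumps(dataset, indent=2))
```

A client chooses among distributions by their properties, here the media type. It does not need to know in advance where each file is stored.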
Short Definition: The building blocks of an XML document.
Short Definition: Data that are delivered with all associated metadata, data dictionary, description of methods and instruments used to collect and process the data, and other supporting data (e.g., duplicate sample results, replicate analyses, percent recovery, etc.).
Short Definition: The URL of the downloadable file in a given format, e.g., a CSV file or an RDF file. It should be used for the URL at which this distribution is available directly, typically through an HTTP GET request.
Reference: Data Catalog Vocabulary (DCAT) version 2 (w3.org)
Short Definition: An initiative to create a digital "library card catalog" for the Web. Dublin Core is made up of 15 metadata elements that offer expanded cataloging information and improved document indexing for search engine programs. The 15 metadata elements used by Dublin Core are: title (the name given the resource), creator (the person or organization responsible for the content), subject (the topic covered), description (a textual outline of the content), publisher (those responsible for making the resource available), contributor (those who added to the content), date (when the resource was made available), type (a category for the content), format (the file format, physical medium or dimensions of the resource), identifier (an unambiguous reference to the resource, such as a URL), source (the resource from which the content is derived), language (the language in which the content is written), relation (how the content relates to other resources, for instance, if it is a chapter in a book), coverage (the spatial or temporal topic of the resource, or the jurisdiction to which it applies), and rights (information about rights held in and over the resource, such as a link to a copyright notice).
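The sketch below shows how the elements listed above can be serialized as XML with Python's standard `xml.etree.ElementTree`. It uses the Dublin Core elements namespace `http://purl.org/dc/elements/1.1/`. The wrapper element `record` and all field values are hypothetical. Real exchange formats, such as OAI-PMH, define their own container elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the "dc:" prefix

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dublin_core_record(fields):
    """Build an XML record from a dict mapping Dublin Core element names to values."""
    root = ET.Element("record")  # hypothetical wrapper, not part of Dublin Core itself
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root

# Hypothetical values for illustration only.
record = dublin_core_record({
    "title": "Hourly river gauge readings",
    "creator": "Example Hydrology Lab",
    "date": "2020-05-01",
    "format": "text/csv",
    "identifier": "https://example.org/gauge.csv",
    "language": "en",
})
print(ET.tostring(record, encoding="unicode"))
```

Names outside the 15 elements are rejected. This keeps the record within the element set rather than accepting arbitrary tags.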
Short Definition: Data whose content changes frequently and at asynchronous (unpredictable) moments. Examples include: data streams generated by sensors where it is unpredictable when data segments will appear (i.e., the data streams have gaps); and data streams generated by humans in crowdsourcing scenarios where it is not known when a given cell in a database will be filled.
Short Definition: Computationally intensive, large-scale, networked and collaborative forms of research and scholarship across all disciplines, including all of the natural and physical sciences, related applied and technological disciplines, biomedicine, social science and the digital humanities.
Short Definition: Comprises the ICT assets, facilities and services that support research within institutions and across national innovation systems, and that enable researchers to undertake excellent research and deliver innovation outcomes.
Short Definition: Science supported to a significant degree by digital information-processing and/or computational technologies, or wholly based on these. Note that such a definition is functional, not some intrinsic property of the science. Data-based science, that is science which is based wholly or in part on exploiting existing information, is included within this definition. E-Science includes a very broad class of activities, as nearly all information gathering is computer based, or uses information technologies for measuring, recording, reporting, analysing. E-Science often involves intensive use of such technologies: advanced in technique, collaborative or on a large scale (over various possible measures: volumes of information, computational intensity, extent of distribution, variety of information types handled). E-Science can be conducted equally by individuals and small units - in other words, e-Science is equally relevant to small science, and indeed e-Science brings big science within the grasp of the less well-equipped - all you need is a computer.
Reference: Digital preservation coalition
Short Definition: The complex of a community of organisms and its environment functioning as an ecological unit.
Reference: Merriam-Webster dictionary
Short Definition: 1. A compilation of core electronic health data submitted by various healthcare providers and organizations, accessible by numerous authorized parties from a number of points of care, possibly even from different jurisdictions. 2. An official health record for an individual that is shared among multiple facilities and agencies. 3. Electronic health records typically include: contact information; information about visits to health care professionals; allergies; insurance information; family history; immunization status; information about any conditions or diseases; a list of medications; records of hospitalization; and information about any surgeries or procedures performed. Digitized health information systems are expected to improve efficiency and quality of care and, ultimately, reduce costs. The benefits of electronic health records include: the ability to automatically share and update information among different offices and organizations; more efficient storage and retrieval; the ability to share multimedia information, such as medical imaging results, among locations; the ability to link records to sources of relevant and current research; easier standardization of services and patient care; provision of decision support systems (DSS) for healthcare professionals; less redundancy of effort; and lower cost to the medical system once implementation is complete. The governments of many countries are working to ensure that all citizens have standardized electronic health records and that all records include the same types of information. The major barrier to the adoption of electronic health records is cost.
Reference: Canadian Medical Protective Association (2014) Electronic Records Handbook
Synonym: Digital medical record.
Related Term: Electronic medical record
Short Definition: An electronic version of the paper record that doctors have traditionally maintained for their patients and which is typically only accessible within the facility or office that controls it.
Reference: Canadian Medical Protective Association (2014) Electronic Records Handbook
Synonym: Electronic health record
Short Definition: Machine-processable specifications that define the structure and syntax of metadata specifications in a formal schema language.
Reference: Rhys Francis/TC3+
Short Definition: A technical service involved in the performance, inspection and leadership of skilled technical activities.
Long Definition: Examples include the: (a) Planning, design and making of maps, charts, drawings, illustrations and art work; (b) Design of three-dimensional exhibits or displays within a predetermined budget and pre-selected theme; (c) Conduct of analytical, experimental or investigative activities in the natural, physical and applied sciences; the preparation, inspection, measurement and analysis of biological, chemical and physical substances and materials; the design, construction, modification and assessment of technical systems and equipment or the calibration, maintenance and operation of instruments and apparatus used for these purposes; and the observation, calculation, recording and the interpretation, presentation and reporting of results of tests or analyses, including the: (i) performance of activities involving the application of the principles, methods, and techniques of engineering technology and a practical knowledge of the construction, application, properties, operation and limitations of engineering or surveying systems, processes, structures, buildings or materials, and machines or devices; (ii) planning of approaches, the development or selection and application of methods and techniques, including computer software, to conduct analytical, experimental or investigative activities; the evaluation and interpretation of results; and the preparation of technical reports; (iii) observation and recording of events and the analysis of information relating to such fields as meteorology, hydrography, or oceanography and the presentation of the results of such studies; and the provision of data and information relating to meteorology; (iv) monitoring and investigating of environmental hazards or the provision of advice on those issues impacting upon compliance with public health legislation; and, (v) design, development or application of tests, procedures and techniques in support of the diagnosis, treatment and prevention of human and animal diseases and physical conditions; (d) Application of statutes, regulations and standards affecting agricultural, fishery and forestry products; (e) Capture and development of images involving the operation and use of cameras, accessories and photographic processing and reproduction equipment; (f) Operation of television cameras and video recording systems and equipment; (g) Inspection and evaluation of quality assurance systems, processes, equipment, products, materials and associated components including electronic equipment used in trade measurement; the development, recommendation or enforcement of statutes, regulations, standards, specifications or quality assurance policies, procedures and techniques; and the investigation of accidents, defects and/or disputes; (h) Construction and repair of prostheses and orthoses; (i) Writing of standards, specifications, procedures or manuals related to the above activities; (j) Performance of other technical functions not included above; and, (k) Planning, development and conduct of training in, or the leadership of, any of the above activities.
Short Definition: A noteworthy improvement to a product as part of a new version of it.
Short Definition: 1. The difference between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition. 2. An incorrect step, process, or data definition. 3. An incorrect result. 4. A human action that produces an incorrect result.
Short Definition: The process of intentionally adding known faults to those already in a computer program for the purpose of monitoring the rate of detection and removal, and estimating the number of faults remaining in the program.
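One simple way to turn error seeding into an estimate of remaining faults is a proportional, capture-recapture style estimator. It assumes that seeded faults are exactly as hard to detect as the program's own (indigenous) faults. If testing finds `s` of `S` seeded faults and `n` indigenous faults, the estimated total number of indigenous faults is `n * S / s`. The estimate of faults remaining is that total minus `n`. The sketch below implements this estimator. The function name and the numbers in the example are hypothetical.

```python
def estimate_remaining_faults(seeded_total, seeded_found, indigenous_found):
    """Estimate indigenous faults still in the program after a round of testing.

    Assumes seeded faults are detected at the same rate as indigenous ones.
    """
    if seeded_found == 0:
        raise ValueError("no seeded faults found; the estimate is undefined")
    detection_rate = seeded_found / seeded_total            # observed on seeded faults
    estimated_total = indigenous_found / detection_rate     # i.e. n * S / s
    return estimated_total - indigenous_found

# Hypothetical test round: 20 faults seeded, 15 of them found, 30 indigenous faults found.
# Detection rate 15/20 = 0.75, so about 30 / 0.75 = 40 indigenous faults in total.
print(estimate_remaining_faults(20, 15, 30))  # 10.0
```

The estimate is only as good as the assumption that seeded faults are representative of real ones. Seeded faults that are easy to spot make the program look cleaner than it is.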
Short Definition: Ethical data means that: (a) Data are collected and managed in compliance with relevant government and professional codes of conduct, values and ethics, scientific integrity and responsible conduct of research; (b) Restricted, confidential, and sensitive data are handled appropriately, for example by implementing user authentication and controlled access to the data and/or data anonymization and de-identification; (c) A statement is made as to whether or not Indigenous considerations exist and where applicable, Indigenous data sovereignty is respected and data are managed in accordance with CARE, OCAP and UNDRIP principles; (d) Data assets are managed in a manner such that data used as input to Big Data or Artificial Intelligence applications can be confirmed to be relevant, accurate, and up-to-date, and can be tested for unintended biases (GC 2019); (e) Authors and contributors are identified and contact person information is provided.
Long Definition:
Reference: CARE; OCAP; UNDRIP
Short Definition: Evaluation is a decision about significance, value, or quality of something, based on careful study of its good and bad features.
Short Definition: A position located no more than three hierarchical levels below the highest level in an organization that has significant executive managerial or executive policy roles and responsibilities or other significant influence on the direction of the organization. Executives are responsible and accountable for exercising executive managerial authority or providing recommendations and advice on the exercise of that authority.
Short Definition: Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.
Synonym: XML
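The sketch below illustrates XML's structure with Python's standard `xml.etree.ElementTree`. A document is a tree of nested elements, and each element can carry attributes and text. The small document and its element names (`dataset`, `title`, `variable`) are invented for illustration and follow no particular schema. The helper `variable_units` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up XML document: nested elements with attributes and text content.
DOC = """<dataset id="ds-1">
  <title>Hourly river gauge readings</title>
  <variable unit="m">water_level</variable>
  <variable unit="degC">water_temperature</variable>
</dataset>"""

root = ET.fromstring(DOC)

def variable_units(root):
    """Map the text of each <variable> child element to its 'unit' attribute."""
    return {v.text: v.get("unit") for v in root.findall("variable")}

print(root.tag, root.get("id"))   # dataset ds-1
print(root.find("title").text)    # Hourly river gauge readings
print(variable_units(root))       # {'water_level': 'm', 'water_temperature': 'degC'}
```

Because the format is plain text with explicit structure, any system with an XML parser can read the same document. This underpins XML's role in data exchange.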
Short Definition: A defining scheme used for identification of resources (including people and organizations) and the sharing of data across domains, enterprises, and applications. XRI TC will define a Uniform Resource Identifier (URI) scheme and a corresponding Uniform Resource Name (URN) namespace.
Short Definition: ETL involves the following steps: (a) Extract data from homogeneous or heterogeneous data sources, which are often managed by different people. An intrinsic part of the extraction involves data validation to confirm whether the data pulled from the sources have the correct/expected values; (b) Transform the data into the proper format or structure for querying and analysis purposes. An important function of transformation is the cleaning of data; and, (c) Load the data into the final target (database, operational data store, data mart, or data warehouse). ETL processes can involve considerable complexity, and significant operational problems can occur.
Synonym: ETL
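The sketch below maps the three ETL steps onto small functions built from Python's standard `csv`, `io` and `sqlite3` modules. Extraction reads rows and validates them. Transformation cleans them. Loading writes them to a database table. The CSV content, the column names and the `readings` table are hypothetical. An in-memory SQLite database stands in for a real target such as a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source export: inconsistent station codes and one invalid row.
RAW_CSV = """station,temp_c
 A1 ,21.5
a2,19.0
A3,not-a-number
"""

def extract(text):
    """Extract: read source rows, keeping only those that pass validation."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            float(row["temp_c"])
        except ValueError:
            continue  # reject rows whose temperature is not numeric
        rows.append(row)
    return rows

def transform(rows):
    """Transform: clean station codes and convert temperatures to numbers."""
    return [(r["station"].strip().upper(), float(r["temp_c"])) for r in rows]

def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (station TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real data store
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM readings ORDER BY station").fetchall())
# [('A1', 21.5), ('A2', 19.0)]
```

Keeping the three steps as separate functions makes each one testable on its own. Production pipelines typically also log rejected rows rather than silently dropping them.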
Short Definition: The inability of a system or component to perform its required functions within specified performance requirements.
Short Definition: A mnemonic acronym, FAIR = Findable + Accessible + Interoperable + Reusable
Reference: FAIR Principles - GO FAIR (go-fair.org)
Short Definition: A mnemonic acronym, FAIRER = FAIR + Ethical + Reproducible
Long Definition: FAIR (Findable, Accessible, Interoperable, and Reusable) principles emphasize machine-actionability and have become a global norm across all data domains. However, data that are FAIR in all respects are not necessarily transparent; ethical and reproducible principles are also required across all domains.
Reference: CINECA 2023
Short Definition: A legal concept that allows the reproduction of copyrighted material for certain purposes without obtaining permission and without paying a fee or royalty. Purposes permitting the application of fair use generally include review, news reporting, teaching, or scholarly research. When in doubt, the quickest and simplest course may be to request permission from the copyright owner.
Short Definition: Selecting specific data that are significant in some particular context.
Short Definition: A data table column name.
Synonym: Column.
Related Term: Attribute
Short Definition: A series of characters used to identify a computer file in a system.
Reference: Library and Archives Canada
Synonym: File name
Short Definition: An emergency allocation of resources, required to deal with an unforeseen problem.
Short Definition: Data that are not, under normal circumstances, subject to change. Examples of fixed data include results from concluded research, medical records, and historical data.
Synonym: Reference data; Archival data; Fixed-content data; Permanent data
Short Definition: Property of a computer file that indicates it has remained unaltered at the bit level between two points in time.
Reference: Library and Archives Canada
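Fixity is usually checked by comparing checksums taken at two points in time, for example at ingest and again during a later audit. The sketch below computes SHA-256 checksums with Python's standard `hashlib`. The function names are hypothetical. SHA-256 is one common choice of checksum algorithm, but the definition above does not prescribe any particular algorithm.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=65536):
    """Compute a file's SHA-256 checksum, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fixity_holds(path, recorded_checksum):
    """True if the file's current checksum matches the one recorded earlier."""
    return sha256_of(path) == recorded_checksum

# Demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "wb") as f:
        f.write(b"station,temp_c\nA1,21.5\n")
    baseline = sha256_of(path)                # time 1: checksum recorded at ingest
    unchanged = fixity_holds(path, baseline)  # time 2: file untouched
    with open(path, "ab") as f:
        f.write(b"A2,19.0\n")                 # any bit-level change...
    changed = fixity_holds(path, baseline)    # ...breaks fixity

print(unchanged, changed)  # True False
```

In practice the recorded checksum is stored separately from the file, for example in preservation metadata, so that a damaged file cannot also corrupt its own reference value.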
Short Definition: Foundational interoperability allows data sent from one information technology system to be received by another, without requiring the receiving system to be able to interpret the data.
Reference: Healthcare information management and systems society
Short Definition: A real or conceptual structure intended to serve as a support or guide for the building of something that expands the structure into something useful. The ability to make refinements may require that the design is fully known, and this is not necessarily known at the outset. "Framework" is thus sometimes used as a 'fuzzy' term. In computer systems, a framework is often a layered structure indicating what kind of programs can or should be built and how they would interrelate. Some computer system frameworks also include actual programs, specify programming interfaces, or offer programming tools for using the frameworks. A framework may be for a set of functions within a system and how they interrelate; the layers of an operating system; the layers of an application subsystem; how communication should be standardized at some level of a network; and so forth. A framework is generally more comprehensive than a protocol and more prescriptive than a structure. Examples of frameworks that are currently used or offered by standards bodies or companies include: Resource Description Framework (a set of rules from the World Wide Web Consortium for how to describe any Internet resource such as a Web site and its content); Internet Business Framework (a group of programs that form the technological basis for the mySAP product from SAP, the German company that markets an enterprise resource management line of products); Sender Policy Framework (a defined approach and programming for making email more secure); Zachman framework (a logical structure intended to provide a comprehensive representation of an information technology enterprise that is independent of the tools and methods used in any particular IT business).
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page
Short Definition: 1. A single, well-defined version of all the data entities in an organizational ecosystem. In this context, a golden record is sometimes called the "single version of the truth," where "truth" is understood to mean the reference to which data users can turn when they want to ensure that they have the correct version of a piece of information. The golden record encompasses all the data in every system of record within a particular organization. A system of record is an information storage and retrieval system that serves as the authoritative source for a particular data element in a system containing multiple sources of the same element. To ensure data integrity, a single system of record must always exist for each and every data element. A well-maintained, current golden record should be a fundamental element of the Master Data Management policy for every enterprise. 2. The word "golden" is sometimes used in information technology to express the importance of some type of source. In the context of virtualization, for example, a golden image is a template for a virtual machine, virtual desktop, servers, or hard disk drive.
Short Definition: 1. Exercising authority to provide direction and to undertake, coordinate, and regulate activities in support of achieving this direction and desired outcomes. 2. Governance can be thought of as the role of an organization's board of directors or its equivalent that is focused on defining that organization's purpose and the development of the strategies, objectives, values, and policies that frame how that purpose will be pursued. It includes the development of such things as mission statements, statements of organizational objectives and values, logic models, organizational performance metrics, risk management frameworks, policies and guidelines for financial and operational matters, stakeholder relations, etc.
Short Definition: Provides the relationship and process context for working together to ensure outcomes are achieved.
Short Definition: The size in which data fields are sub-divided. [Lengthy definition describes coarse, fine and even finer granularity.]
Reference: DAMA Dictionary of Data Management; Wikipedia
Short Definition: An imaginary creature that causes trouble in devices and systems of all kinds. During the Second World War, the term was used by British airmen to refer to ongoing trouble with aircraft in spite of mechanics' best efforts. Gremlins sometimes appear today in computer systems and networks. Although gremlins never do their dirty work in plain sight, a gremlin is usually portrayed as a small troll-like creature. No one has ever seen one and caught it in the act of sabotaging an aircraft, radio transmitter, computer, robot, or other system. The instant any human looks for a gremlin, it vanishes, although evidence of its mischief may remain. Gremlins are particularly adept at causing intermittent malfunctions, which have been the bane of technicians and engineers for centuries.
Synonym: Moof monster; Murphy's Law
Short Definition: Any distributed infrastructure that is federated to combine resources from multiple organizations managed by different administrative domains. The Grid aims to coordinate the sharing of resources in a dynamic and multi-institutional setting to provide additional functionality beyond its constituent parts: brokering, workflow coordination, integration of computing and storage. In order for this to happen, interoperability and standards need to be defined at various levels: for resource access, for coordination and business logic, for data storage and management, for network access and so forth.
Reference: European Commission, Advancing Technologies and Federating Communities
Short Definition: 1. The transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. 2. Used in many cryptographic algorithms.
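Both senses of hashing appear in the sketch below. First, strings of any length map to a fixed-length value. Second, that value selects a bucket in a small index, so that a lookup scans one short bucket rather than every item. The class `HashIndex` is a hypothetical teaching example that uses separate chaining. SHA-256 from Python's standard `hashlib` is chosen here only because its output is stable across runs; Python's built-in `hash()` is randomized for strings. Real database indexes typically use faster non-cryptographic hash functions.

```python
import hashlib

def hash_key(value: str) -> str:
    """Return a fixed-length (64 hex characters) SHA-256 digest of a string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

class HashIndex:
    """A tiny hash index: each key is placed in one of n buckets chosen by its hash."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Interpret the hex digest as an integer and reduce it to a bucket number.
        return self.buckets[int(hash_key(key), 16) % len(self.buckets)]

    def put(self, key, item):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, item)  # replace an existing entry
                return
        bucket.append((key, item))

    def get(self, key):
        # Only the key's own bucket is searched, not the whole index.
        for k, item in self._bucket(key):
            if k == key:
                return item
        raise KeyError(key)

index = HashIndex()
index.put("dataset-001", "Hourly river gauge readings")
index.put("dataset-002", "Soil moisture survey")
print(index.get("dataset-002"))                     # Soil moisture survey
print(len(hash_key("x")), len(hash_key("x" * 1000)))  # 64 64
```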
Short Definition: The application of a comprehensive knowledge of professional specialties in the fields of dentistry, medicine, nursing, nutrition and dietetics, occupational and physical therapy, pharmacy, psychology and social work to the safety and physical and mental well-being of people; and, in the field of veterinary medicine, to the prevention, diagnosis and treatment of animal diseases and the determination of the human safety of veterinary drugs. Activities include the: (a) Prevention, diagnosis and treatment of dental disease and abnormal dental conditions, and the management of dental health programs; (b) Conduct and management of programs to promote public and individual health and the reduction of disease; (c) Prevention, diagnosis and treatment of disease, disability and abnormal physical and mental conditions; (d) Assessment of the incidence and prevalence of diseases; the assessment of the fitness for work; the medical assessment of applicants for immigration; and the assessment of the medical fitness of specific occupations; (e) Assessment of medical fitness for the determination of disability, special equipment and services; (f) Appraisal of drugs and medical devices for safety and efficacy under the conditions of their intended use; (g) Care of patients and the treatment and management of illness in co-operation with medical doctors, and the provision of specialized nursing services; (h) Evaluation of nursing policies, procedures, standards and practices and the conduct of related research and education; (i) Development of standards and guides in the field of nutrition and dietetics; the assessment of nutritional requirements and provision of nutrition and dietetic services; the provision of nutritional education and information; the management of nutritional programs; and the management of food services; (j) Assessment and treatment of clients for whom occupational or physical therapy services are required for the improvement or maintenance of their well-being; (k) Planning and management of client treatment or health education programs delivered by health care providers; (l) Compounding and dispensing of drugs; and the maintenance and control or the audit of drug stocks; (m) Conduct of research in human behaviour, the assessment of human motives, abilities, skills, decisions and acts, and the treatment of human behaviour; (n) Promotion of individual, group and community well-being through the identification and assessment of social needs; and the planning, development and delivery and management of social programs and social work services with the objective of lessening, removing or preventing the physical, emotional and material problems of individuals, families or groups; (o) Prevention, diagnosis and treatment of animal diseases; the examination of animals, organs and tissues to determine whether they are diseased or potentially harmful to people or animals; and the evaluation of veterinary drugs to determine their human safety; (p) Provision of advice in the above fields; and, (q) Leadership of any of the above activities.
Short Definition: A two-dimensional representation of data in which values are represented by colors. Heat maps communicate relationships between data values that would be much more difficult to understand if presented numerically in a spreadsheet.
Short Definition: High-quality data are complete, timely, accurate, consistent, relevant, reliable, traceable, cleaned, validated, and well documented.
Short Definition: 1. A system where data are stored; 2. A data repository (e.g. a Core Trust Seal certified repository located in Europe that uses DOIs); 3. A system where data are stored and processed during research (e.g., a high performance computer that uses fast storage with two daily backups).
Reference: RDA maDMP common standard
Short Definition: Data and code that are commented so that humans can understand what they represent, their design, and their purpose.
Reference: Wilson G, Aruliah DA, Brown CT, Hong NPC, Davis M, Guy RT, Haddock SHD, Huff K, Mitchell IM, Plumbley MD, Waugh B, White EP, Wilson P (2012). Best practices for scientific computing, arXiv, 29 November, 1-6.
Short Definition: Hypermedia As The Engine Of application State.
Synonym: HATEOAS
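Under HATEOAS, a client learns which actions are available from the links embedded in each response. It does not rely on hard-coded URLs. The sketch below is hypothetical: the order resource, its paths and the link relation names (`pay`, `cancel`, `receipt`) are invented for illustration. The `_links` key borrows the convention of the HAL (Hypertext Application Language) format. The point is that the set of links changes with the resource's state, and the client reads the next actions from the response itself.

```python
def order_representation(order_id, status):
    """Build a hypothetical order resource whose links depend on its current state."""
    base = f"/orders/{order_id}"
    links = {"self": {"href": base}}
    if status == "open":
        links["pay"] = {"href": f"{base}/payment"}
        links["cancel"] = {"href": f"{base}/cancellation"}
    elif status == "paid":
        links["receipt"] = {"href": f"{base}/receipt"}
    return {"id": order_id, "status": status, "_links": links}

def available_actions(representation):
    """The client discovers what it may do next from the links, not from built-in URLs."""
    return sorted(rel for rel in representation["_links"] if rel != "self")

print(available_actions(order_representation(42, "open")))  # ['cancel', 'pay']
print(available_actions(order_representation(42, "paid")))  # ['receipt']
```

Because the server controls which links appear, it can change URL layouts or workflow rules without breaking clients that follow links by relation name.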
Short Definition: More formally known as the National Strategy for Trusted Identities in Cyberspace, the identity ecosystem is a proposal from the United States federal government to improve identity authentication on the Internet and make online transactions safer. The proposal has four goals: to develop a comprehensive Identity Ecosystem framework; to build and implement an interoperable identity infrastructure aligned with the framework; to enhance confidence and willingness to participate in the Identity Ecosystem; and to ensure the long-term success of the Identity Ecosystem. The proposal invites citizens to suggest ideas for creating the ecosystem. "We seek a future where individuals can voluntarily choose to obtain a secure, interoperable, and privacy-enhancing credential from a variety of service providers - both public and private - to authenticate themselves online for different types of transactions." Although privacy advocates have already shown some resistance to the scheme, to the end user an Identity Ecosystem is likely to be familiar and to work in much the same way as the financial ecosystem that allows people to withdraw cash from an ATM, even when the machine belongs to another bank in another city. The Identity Ecosystem will add another layer of security, aimed at reducing identity theft and simplifying the user experience for various other types of electronic transactions. Once the Identity Ecosystem standards are defined, the government hopes to spark adoption by making the federal government an early adopter and offering businesses financial incentives for developing, training and implementing the framework.
Short Definition: In the context of a researcher's activities, impact is the consequence of the research and new knowledge on the advancement of the specialty. Science-based policies, regulations, services and technology transfers are some examples of ways target results can be achieved and impact demonstrated. Impact is one of four valued outcomes. In a 5-level incumbent-based process, demonstrated valued outcomes of impact in research, development and analysis (RDA) include: (1) contributions to the area of specialty; (2) Influenced research project results in the area of specialty; (3) Influenced the achievements or results in the area of specialty relevant to a program (policies, regulations, services, technology transfers or partnerships), and within the science community; (4) Influenced changes research priorities and directions; Influenced the setting or achievement of intermediate term results; Influenced decision-making at the program level; and, (5) Multi-year contributions in multiple/broad areas or in depth that are relevant to strategic results or public good; Contributions leading the discussions at national and international bodies to establish standards, regulations, agreements, programs, policies, services, etc.; Led changes within area of specialty at the national or international level. Demonstrated valued outcomes of impact in managing research include: (1) Not applicable; (2) Influenced the delivery of projects; (3) Influenced program planning and resources allocation affecting delivery of projects; (4) Influenced program planning, policies and resources allocation affecting program delivery; and, (5) Influenced the development of plans, policies and operations related to strategic outcomes. 
Demonstrated valued outcomes of impact in representation and client services include: (1) Not applicable; (2) Influenced the delivery of clients' research projects in the area of specialty; (3) Influenced the directions of clients' research projects in the area of specialty; Influenced the directions of local, regional, national or inter-organizational bodies related to the area of specialty; (4) Influenced the directions of clients and of national or international bodies related to areas of specialty; and, (5) Influenced the directions of clients and of national and international bodies related to strategic outcomes.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Valued outcome; Incumbent-based; Research, development and analysis; Managing research; Representation and client services
Short Definition: To use an application (software) to open a file that is in a format different from the format the application creates on its own. Assuming the application knows how to import (reformat) the file, it does so and then opens it for the user to work on. After working with the opened file, the user can usually simply "Save" or "Close" it, leaving it in the current format, or export it to another format.
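As a minimal sketch, assume a hypothetical application whose native format is a list of Python dictionaries. Importing a CSV file then means reformatting it into that internal representation, and exporting means writing the same records out in another format (JSON here). The function names are illustrative only.

```python
import csv
import io
import json

def import_csv(text: str) -> list[dict]:
    """Reformat CSV text into the hypothetical application's native records."""
    return list(csv.DictReader(io.StringIO(text)))

def export_json(records: list[dict]) -> str:
    """Export the records to a different format (JSON)."""
    return json.dumps(records, indent=2)

csv_text = "id,name\n1,alpha\n2,beta\n"
records = import_csv(csv_text)   # the user now works on the imported data
records[0]["name"] = "ALPHA"     # an edit made inside the application
print(export_json(records))
```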
Short Definition: In the context of a researcher's job, an incumbent-based position means that the researcher's achievements in the research contexts determine his/her level for initial appointment and promotion. Incumbents are promoted by appointment to a higher level in their own positions based upon the incumbent's qualifications. Only valued outcomes are used to assess a researcher's level. Valued outcomes are identified as those expected at the respective level, for entry into that level, for each of the three research contexts. At each level, the competencies, responsibilities and valued outcomes of the previous level are assumed to be included. Evidence at each level may be similar but is expected to be more extensive at higher levels. Other job types are governed by a position-based approach (e.g., Research manager).
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Researcher; Principal Investigator; Research manager; Project manager; Laboratory manager; Manager; Researcher level, Incumbent-based; Research context; Valued outcome
Short Definition: Employment of no fixed duration, whether part-time, full-time or seasonal.
Short Definition: 1. Knowledge captured in any format, such as facts, events, things, processes, or ideas, that can be structured or unstructured, including concepts that within a certain context have particular meaning. Information includes data. 2. The aggregation of data to make coherent observations about the world, meaningful data, or data arranged or interpreted in a way to provide meaning.
Reference: https://www.tbs-sct.canada.ca/pol/doc-eng.aspx?id=32603 ; Carol Tenopir (2007); William Hersh (2007). See, Zins (2007)
Short Definition: How organizations manage the way information and data are handled within the health and social care system. It covers the collection, use, access and decommissioning of information, as well as the requirements and standards that organizations and their suppliers need to achieve to fulfill the obligation that information is handled legally, securely, efficiently, effectively and in a manner that maintains public trust.
Reference: U.K. Government, 2013
Related Term: Privacy governance; Governance and accountability model
Short Definition: A discipline that directs and supports effective and efficient management of information and data in an organization, from planning and systems development to disposal or long-term preservation.
Reference: https://www.tbs-sct.canada.ca/pol/doc-eng.aspx?id=32603
Short Definition: A person having a broad knowledge of information management disciplines and who provides guidance and support to program and staff functions on all aspects of managing the information resource.
Short Definition: A person who is expert in one or more of the information management disciplines that support the effective and efficient management of information.
Short Definition: Heterogeneous data sources.
Short Definition: Any equipment or system that is used in the acquisition, storage, manipulation, management, movement, control, display, switching, interchange, transmission, or reception of information or data. It includes all matters concerned with the design, development, installation and implementation of information systems and applications.
Reference: GC Policy on Service and Digital - Canada.ca
Short Definition: Information systems and technology infrastructure manager, expert, or technician.
Short Definition: In the context of a researcher's activities, innovation is the development of modified or novel approaches, theories, concepts, ideas or solutions. Innovation is one of four valued outcomes.
Long Definition: In a 5-level incumbent-based process, demonstrated valued outcomes of innovation in research, development and analysis (RDA) include: (1) Contributions to novel improvements in theories and/or techniques that have helped produce results; Contributions to novel research, development or analysis proposals/initiatives that have been accepted and implemented within a project; (2) Improvements of theories and/or techniques that have been successfully adopted and applied to resolve issues related to project results; Preparation of novel research, development or analysis proposals/initiatives that have been accepted and implemented within a program; (3) Conceptualization of improvements of theories and/or techniques, and development of their application, that enabled contributions to the achievement of results; Preparation of in-depth novel research, development or analysis proposals/initiatives that have been accepted and implemented across programs or by stakeholders; (4) Development of novel theories and/or techniques that represent an advance in a specific area of research, development or analysis, or application of existing theories and/or techniques to new areas where such application had not been obvious before, enabling contributions to achieving results; Proposal of novel RDA projects that have been accepted and implemented by some stakeholders; and, (5) Integration of areas of inquiry, development of new theories and techniques representing an advance, and application of these to achieve breakthroughs in scientific advancement and the achievement of results; Proposal of novel initiatives that have received acceptance, recognition and support and have been implemented by a significant number of stakeholders. Innovation is not applicable in the other two research contexts: managing research, and representation and client services.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Valued outcome; Incumbent-based; Research, development and analysis; Managing research; Representation and client services
Short Definition: A variable (whether stored within a component or outside it) that is read by the component.
Short Definition: 1. A tool or device that is used to do a particular task. 2. A device that is used for making measurements of something.
Short Definition: Raw electronic data generated by an instrument, analyzer, or data logger before any human action on the data and before any processing of the data by automated or semi-automated 3rd-party software or algorithms.
Synonym: Raw data
Short Definition: A combination of business processes, policies and technologies that allows organizations to provide secure access to confidential data. Integrated access management software is used by enterprises to control the flow of sensitive data in and out of the network.
Short Definition: 1. The act of bringing together smaller components into a single system that functions as one. 2. In the context of information technology: The end result of a process that aims to stitch together different, often disparate, subsystems so that the data contained in each becomes part of a larger, more comprehensive system that, ideally, quickly and easily shares data when needed. This often requires that organizations build a customized architecture or structure of applications to combine new or existing hardware, software and other communications.
Short Definition: In the context of data and network security: The assurance that information can only be accessed or modified by those authorized to do so. Measures taken to ensure integrity include controlling the physical environment of networked terminals and servers, restricting access to data, and maintaining rigorous authentication practices. Data integrity can also be threatened by environmental hazards, such as heat, dust, and electrical surges.
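One common way to detect whether stored data has been modified, whether by an unauthorized user or by an environmental fault, is to record a cryptographic checksum when the data are deposited and re-compute it later. The sketch below uses SHA-256 from Python's standard library; the sample data are hypothetical, and note that a checksum only detects modification: it does not by itself restrict who can access the data.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the data."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected: str) -> bool:
    """True if the data still match the checksum recorded earlier."""
    return sha256_of(data) == expected

original = b"temperature,12.4\n"
recorded = sha256_of(original)   # stored alongside the data at deposit time

print(verify(original, recorded))               # True
print(verify(b"temperature,99.9\n", recorded))  # False: content was altered
```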
Short Definition: The capacity to influence stakeholders and the direction of research activities; the ability to shape others' understanding in ways that capture interest, inform and gain support; and, the capacity to influence the actions and opinions of others.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Short Definition: A study undertaken by scholars from two or more distinct scientific disciplines. The research is based upon a conceptual model that links or integrates theoretical frameworks from those disciplines, uses study design and methodology that is not limited to any one field, and requires the use of perspectives and skills of the involved disciplines throughout multiple phases of the research process.
Reference: Aboelela, S. W., Larson, E., Bakken, S., Carrasquillo, O., Formicola, A., Glied, S. A., Haas, J. and Gebbie, K. M. (2007), Defining Interdisciplinary Research: Conclusions from a Critical Review of the Literature. Health Services Research, 42: 329-346. doi: 10.1111/j.1475-6773.2006.00621.x
Short Definition: Testing conducted to evaluate whether systems or components pass data and control correctly to each other.
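A minimal sketch of an interface test, assuming two hypothetical components: component A emits instrument readings and component B converts them. The tests check that data pass from A to B in the form B expects, and that B signals control (an error) back when it receives input it cannot handle. The `test_` function names follow a convention that runners such as pytest would also collect.

```python
# Hypothetical component A: wraps an instrument and emits reading records.
def read_sensor() -> dict:
    return {"sensor_id": "S-01", "value": 21.5, "unit": "degC"}

# Hypothetical component B: consumes records emitted by component A.
def to_fahrenheit(record: dict) -> float:
    if record.get("unit") != "degC":
        raise ValueError("component B expects degrees Celsius")
    return record["value"] * 9 / 5 + 32

def test_data_passes_from_a_to_b():
    record = read_sensor()
    # The fields A produces include the ones B relies on.
    assert set(record) >= {"value", "unit"}
    assert abs(to_fahrenheit(record) - 70.7) < 1e-9

def test_control_error_is_signalled():
    # B reports an incompatible input instead of silently converting it.
    try:
        to_fahrenheit({"sensor_id": "S-01", "value": 1.0, "unit": "K"})
    except ValueError:
        return
    raise AssertionError("expected ValueError")

if __name__ == "__main__":
    test_data_passes_from_a_to_b()
    test_control_error_is_signalled()
    print("interface tests passed")
```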
Short Definition: In the context of chemistry: The International Chemical Identifier (InChI) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations. It was developed by the International Union of Pure and Applied Chemistry (IUPAC).
Reference: MIT data management and publishing
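An InChI is a plain text string made of "/"-separated layers, which is part of what makes it easy to store and link across data sources. The sketch below only splits such a string into its parts; it does not generate or validate InChIs (that requires IUPAC's InChI software). The example string is the standard InChI for ethanol, where "1S" denotes a standard version 1 InChI, "c" the connections layer and "h" the hydrogen layer.

```python
def parse_inchi(inchi: str) -> tuple[str, str, list[tuple[str, str]]]:
    """Split an InChI string into version, formula and (prefix, content) layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    # Each further layer starts with a one-letter prefix, e.g. 'c' (connections)
    # or 'h' (hydrogen atoms); a list keeps order and tolerates repeated prefixes.
    layers = [(layer[0], layer[1:]) for layer in rest if layer]
    return version, formula, layers

version, formula, layers = parse_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")  # ethanol
print(version, formula)  # 1S C2H6O
print(layers)            # [('c', '1-2-3'), ('h', '3H,2H2,1H3')]
```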
Short Definition: A standard that is used in multiple nations and whose development process is open to representatives from all countries. Some international standards are promulgated by multinational treaty organizations (e.g., the International Telecommunication Union (ITU); the United Nations Food and Agriculture Organization (FAO)). Some international standards are promulgated by multinational non-treaty organizations (e.g., the International Organization for Standardization (ISO); the International Electrotechnical Commission (IEC)). Some international standards are promulgated by organizations that originated as national industry associations, professional societies, or standards developers, but over time evolved into a global presence with multinational participation (e.g., ASTM International, SAE International, and NFPA International). Annex 4 of the World Trade Organization (WTO) Committee on Technical Barriers to Trade Report 2000 contains a good discussion of what constitutes an international standard. In short, the WTO suggests that a standard may be considered international if the processes and procedures used to develop it are transparent, open, impartial, and provide meaningful opportunities for WTO members, as a minimum, to contribute to the development of the standard so that the standard does not favor any particular suppliers, countries, or regions. Equally important, the standard must have global relevance and use.
Short Definition: A worldwide federation of national standards bodies from 143 countries. ISO is a non-governmental organization that promotes the development of standardization and related activities to facilitate the international exchange of goods and services, and to develop cooperation in intellectual, scientific, technological, and economic activity. The results of ISO technical work are published as international standards.
Synonym: ISO
Short Definition: The capability to communicate, execute programs, or transfer data among various functional units in a useful and meaningful manner that requires the user to have little or no knowledge of the unique characteristics of those units. Foundational, syntactic, and semantic interoperability are the three necessary aspects of interoperability.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; The Open Group TOGAF Documentation. TBS Standard On Metadata (Dublin Core Metadata Initiative); DAMA Dictionary of Data Management; ISO/IEC 2382-01, Information Technology Vocabulary, Fundamental Terms
Short Definition: 1. In the context of research, development and analysis: The quest to find the answer to a question using the scientific method. 2. In a legal or administrative context: An inquiry into concerns or allegations related to wrongdoing or illegal activity.
Synonym: Audit
Short Definition: Specifies the general requirements for the competence to carry out tests and/or calibrations, including sampling. It covers testing and calibration performed using standard methods, non-standard methods, and laboratory-developed methods. Originally known as Guide 25, ISO 17025 was first issued in 1999. The 2005 revision of ISO 17025 introduced greater emphasis on the responsibilities of senior management, the identification of internal communication needs, explicit requirements for continual improvement of the management system itself and, in particular, communication with the client.
Short Definition: A metadata profile that specifies the elements and syntax to be used when implementing the international geospatial standard (ISO 19115:2003) in North America.
Reference: Government of Canada, Environment Canada data stewardship handbook (draft)
Synonym: North American Profile for ISO 19115; NAP.
Short Definition: ISO/TS 8000-1:2011 contains a statement of the scope as a whole, principles of data quality, the high-level data architecture of ISO 8000, a description of the structure of ISO 8000, and a summary of the content of the other parts of the general data quality series of parts of ISO 8000. It also describes the relationship between ISO 8000 and other standards. ISO/TS 8000-100:2009 addresses the quality of master data, specifies the scope of the master data quality series of parts of ISO 8000, introduces master data, describes the data architecture for master data, and gives an overview of the remaining parts of the series. ISO 8000-2:2012 establishes the vocabulary for ISO 8000.
Short Definition: A family of standards and guidelines related to quality management systems, terminology, and tools, such as auditing. It states requirements for what an organization must do to manage processes influencing quality. While ISO 9000 is primarily concerned with processes and not products, the way an organization manages its processes affects the final product and helps ensure that its products conform to the clients' requirements.
Short Definition: JavaScript Object Notation is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.
Reference: https://www.json.org/json-en.html
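A short illustration of JSON as an interchange format, using Python's standard `json` module on a hypothetical dataset record: the record is serialized to JSON text and parsed back without loss. Python's `True` and `None` become JSON's `true` and `null`.

```python
import json

record = {
    "title": "Lake temperature survey",
    "year": 2021,
    "keywords": ["limnology", "temperature"],
    "open_access": True,
    "doi": None,
}

text = json.dumps(record, indent=2)   # serialize to JSON text
print(text)
restored = json.loads(text)           # parse it back into Python objects
assert restored == record
```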
Short Definition: A subset of Stakeholders who, if their support were to be withdrawn, would cause the project to fail.
Reference: Cornell Project Management Methodology
Short Definition: The rules and organizing principles gleaned from aggregated data. The internalized or understood information that can be used to make decisions.
Reference: William Hersh (2007); Carol Tenopir (2007). See, Zins (2007)
Short Definition: The person who is responsible for the overall administration and the scientific and technical operation of a laboratory including the supervision of tests and the reporting of results of tests. The laboratory manager is responsible for assuring that the laboratory complies with all laws and regulations and is in conformance with all applicable standards. General responsibilities include personnel, facilities and safety, test procedures, quality management, consultation and education, communication, and operational management. Although it takes many people in a variety of roles to make a laboratory run well, the ultimate accountability for success or failure rests with the laboratory manager. It is imperative that the laboratory manager have adequate authority to discharge this accountability.
Reference: http://www.springer.com/cda/content/document/cda_downloaddocument/9780387368382-c1.pdf?SGWID=0-0-45-359001-p173670241
Synonym: Laboratory director; Laboratory Head; Unit Head
Short Definition: A person who under the general supervision of a laboratory manager supervises laboratory personnel and who may perform tests requiring special scientific skills.
Synonym: Laboratory technical director
Short Definition: A person who under direct supervision performs laboratory tests which require limited technical skill and responsibilities.
Short Definition: A person who under general supervision performs tests which require the exercise of independent judgment.
Short Definition: An area defined by elements and their interaction (interfaces, protocols) where many specifications are unclear, but where it is nonetheless possible to indicate some essential functions even though not all elements (components, services) are known. A landscape may contain multiple frameworks at different stages of development and sophistication.
Short Definition: Data that fall into the category of dark data or at-risk data.
Synonym: Dark data; At-risk data
Short Definition: Data in which the relationships (links) between data items are made explicit, allowing easy data access. A typical example of a large linked dataset is DBpedia (http://dbpedia.org/), which essentially makes the content of Wikipedia available in RDF. Such collections of interrelated datasets are stored on the Web and made available in a common format, RDF.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; http://dbpedia.org/ ; http://www.w3.org/standards/semanticweb/data#summary
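Linked data are expressed as RDF triples (subject, predicate, object), where subjects and predicates are URIs, so following a link means looking up triples whose subject is a given URI. The sketch below keeps a few illustrative triples in memory and writes them in N-Triples syntax. The URIs use DBpedia-style namespaces, but the triples are assumptions for illustration, not data copied from DBpedia; a real application would typically use an RDF library and a SPARQL endpoint.

```python
from dataclasses import dataclass

DBR = "http://dbpedia.org/resource/"    # DBpedia-style resource namespace
DBO = "http://dbpedia.org/ontology/"    # DBpedia-style ontology namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

@dataclass(frozen=True)
class Literal:
    value: str
    lang: str = ""

# Illustrative (subject, predicate, object) triples.
TRIPLES = [
    (DBR + "Ottawa", DBO + "country", DBR + "Canada"),
    (DBR + "Canada", DBO + "capital", DBR + "Ottawa"),
    (DBR + "Ottawa", RDFS_LABEL, Literal("Ottawa", "en")),
]

def follow(subject: str, predicate: str) -> list:
    """Return every object linked from subject via predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def term(t) -> str:
    """Render a term in N-Triples syntax (special characters are not escaped)."""
    if isinstance(t, Literal):
        return f'"{t.value}"' + (f"@{t.lang}" if t.lang else "")
    return f"<{t}>"

def to_ntriples() -> str:
    return "\n".join(sorted(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in TRIPLES))

# Follow links: Ottawa -> its country -> that country's capital.
country = follow(DBR + "Ottawa", DBO + "country")[0]
print(country)
print(follow(country, DBO + "capital"))
print(to_ntriples())
```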
Short Definition: Long-term preservation - Continued access to digital materials, or at least to the information contained in them, indefinitely.
Reference: Digital preservation coalition
Short Definition: A machine-actionable Data Management Plan (maDMP), updated throughout the data lifecycle, provides information about a project and its data in a discipline-agnostic, standardized manner that is readable and reusable by both humans and automated systems. maDMPs facilitate collaboration, reporting, compliance, and integration with automated systems.
Long Definition: Machine-actionable Data Management Plans (maDMPs) are an enterprise solution that operationalizes FAIRER (Findable, Accessible, Interoperable, Reusable, Ethical, and Reproducible) data management principles and enables an organization to plan more easily, document costing and funding, track inputs and outputs, provide customized reports, and ensure transparency throughout the data lifecycle. They provide information about contributors, partner agreements, distributions and licensing, storage, technical resources and computing needs, processing workflows, associated code and software, security and privacy, data quality, ethical issues, Indigenous considerations, retention and disposition, approvals, and more. maDMPs are a means for rapidly building reliable, lightweight, scalable, and easily customized automated systems with appropriate access controls while maximizing interoperability. (See, also, the definition “FAIRER”)
Reference: https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard ; https://fairerdata.github.io/maDMP-Standard/
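To show what "machine-actionable" means in practice, the sketch below builds a small maDMP as JSON-compatible data and lets a program, rather than a person, find the datasets that need ethics or privacy review. The field names are loosely modelled on the RDA DMP Common Standard referenced above, but they are an assumption here and should be checked against the published schema; all titles, identifiers and addresses are hypothetical placeholders.

```python
import json

# Hypothetical maDMP; field names loosely follow the RDA DMP Common Standard.
madmp = {
    "dmp": {
        "title": "DMP for a hypothetical lake survey project",
        "created": "2024-01-15T09:00:00",
        "modified": "2024-06-01T14:30:00",
        "language": "eng",
        "ethical_issues_exist": "yes",
        "dmp_id": {"identifier": "https://example.org/dmp/0001", "type": "url"},
        "contact": {
            "name": "Jane Doe",
            "mbox": "jane.doe@example.org",
            "contact_id": {"identifier": "https://example.org/people/jdoe", "type": "other"},
        },
        "dataset": [
            {
                "title": "Water temperature time series",
                "personal_data": "no",
                "sensitive_data": "no",
                "dataset_id": {"identifier": "https://example.org/data/0001", "type": "url"},
            },
            {
                "title": "Interview transcripts",
                "personal_data": "yes",
                "sensitive_data": "yes",
                "dataset_id": {"identifier": "https://example.org/data/0002", "type": "url"},
            },
        ],
    }
}

def datasets_needing_review(plan: dict) -> list[str]:
    """Return titles of datasets flagged as containing personal or sensitive data."""
    return [
        ds["title"]
        for ds in plan["dmp"]["dataset"]
        if "yes" in (ds["personal_data"], ds["sensitive_data"])
    ]

print(datasets_needing_review(madmp))  # ['Interview transcripts']

serialized = json.dumps(madmp)         # the same plan, ready for exchange between systems
assert json.loads(serialized) == madmp
```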
Short Definition: In a form that can be used and understood by a computer.
Short Definition: A broad term encompassing: (a) digital surrogates created as a result of converting analogue materials to digital form (digitisation); (b) "born digital" materials, for which there has never been, and is never intended to be, an analogue equivalent; and, (c) digital records.
Reference: Open Data 101 (Government of Canada); Other
Synonym: Born digital; Digital objects; Digital records; Digital data; Electronic records
Short Definition: Implement the policies that govern the arrangement, naming, descriptive metadata, provenance metadata, representation metadata, administrative metadata, access controls, retention, disposition, integrity, and replication of digital objects.
Short Definition: Implement the policies that govern the choice of metadata schema, reserved vocabularies, metadata organization in tables, and metadata properties (creation date, access control, ownership, etc.).
Short Definition: Program delivery managers and support function managers at all levels in an institution who are accountable for the direct delivery and support of programs and services within their domain of business responsibility.
Synonym: Chief.
Related Term: Research manager; Project manager; Principal Investigator
Short Definition: In the context of a researcher's activities, managing research comprises the processes related to planning, organizing, setting objectives, controlling and evaluating RDA activities and their associated human and financial resources. It includes the provision of leadership to, and assessment of, other scientists, engineers, technologists, and/or other staff. Managing research is one of the three research contexts in which a researcher is expected to conduct his/her activities. Managing research is distinct from the position-based role of a research manager.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Research context; Research manager; Program manager; Laboratory manager; Manager
Short Definition: Requires compliance because of a government statute or regulation, an organization internal policy, or contractual requirement. Failure to comply with a mandatory standard usually carries a sanction, such as civil or criminal penalties, or loss of employment.
Reference: American National Standards Institute (ANSI), "Standards Management: A Handbook for Profit"
Short Definition: Lightweight composite applications that source all of their content from existing systems and data sources; they have no native data store or content repository. To access the resources that they leverage, mashups employ the technologies of the Web, including representational state transfer (REST) APIs, RSS and Atom feeds, and widgets.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; http://www.gartner.com/it-glossary/mashups
Short Definition: The application of a set of data transformation techniques to de-identify data without any concern for the analytical utility of the data. This approach is suited to fields that do not need to be analyzed. Masking is applied to direct identifiers such as name and phone number. Masking techniques include, among others, the removal of direct identifiers or their replacement with pseudonyms.
Reference: El Emam, K. (2013). Privacy Analytics White Paper: Overview of Re-identification Risk Assessment and Anonymization Process. Ottawa (ON): Privacy Analytics, Inc.
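A minimal sketch of the two masking techniques named above, applied to a hypothetical record: one direct identifier (phone) is removed and another (name) is replaced with a pseudonym. The pseudonym is a keyed hash (HMAC-SHA-256), so the same name always maps to the same pseudonym while the key is kept secret and separate from the data. Which fields count as direct identifiers is an assumption of this example; indirect (quasi-)identifiers such as age are deliberately left untouched and typically need separate treatment.

```python
import hashlib
import hmac
import secrets

REMOVE = {"phone"}        # direct identifiers dropped entirely (assumed field names)
PSEUDONYMIZE = {"name"}   # direct identifiers replaced by pseudonyms

def pseudonym(value: str, key: bytes) -> str:
    """Keyed hash: the same input and key always give the same pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]

def mask(record: dict, key: bytes) -> dict:
    masked = {}
    for field, value in record.items():
        if field in REMOVE:
            continue
        if field in PSEUDONYMIZE:
            masked[field] = pseudonym(value, key)
        else:
            masked[field] = value  # left as-is, including quasi-identifiers such as age
    return masked

key = secrets.token_bytes(32)  # must be stored separately from the masked data
patient = {"name": "Ann Lee", "phone": "555-0100", "age": 41}
print(mask(patient, key))
```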
Short Definition: In the context of health information technology (HIT): defines minimum government standards for using electronic health records (EHR) and for exchanging patient clinical data between healthcare providers, between healthcare providers and insurers, and between healthcare providers and patients.
Short Definition: Medium-term preservation - Continued access to digital materials beyond changes in technology for a defined period of time but not indefinitely.
Reference: Digital preservation coalition
Short Definition: A program, system, dataset, or product either meets the predefined requirements for a stated purpose or does not meet them. This description is preferable to words such as perfect, outstanding, excellent, extremely good, very good, reasonable, acceptable, fine, adequate, all right, satisfactory, ok, or tolerable, which are ambiguous unless defined in terms of specific, predefined requirements.
Synonym: Requirements analysis
Short Definition: In an open network such as the Internet, message privacy, particularly for e-commerce transactions, requires encryption. The most common approach on the Web is through a public key infrastructure (PKI). For e-mail, many people use Pretty Good Privacy (PGP), which lets an individual encrypt a message or simply send a digital signature that can be used to verify that the message was not tampered with en route.
Synonym: Pretty Good Privacy ID; Pretty Good Privacy fingerprint
Short Definition: Literally, "data about data"; data that defines and describes the characteristics of other data, used to improve both business and technical understanding of data and data-related processes. Business metadata includes the names and business definitions of subject areas, entities and attributes, attribute data types and other attribute properties, range descriptions, valid domain values and their definitions. Technical metadata includes physical database table and column names, column properties, and the properties of other database objects, including how data is stored. Process metadata is data that defines and describes the characteristics of other system elements (processes, business rules, programs, jobs, tools, etc.). Data stewardship metadata is data about data stewards, stewardship processes and responsibility assignments.
Short Definition: A document that specifies which metadata elements are to be used, how they are to be used, and under what constraints, often by combining terms from multiple metadata standards. It includes guidelines for implementation and ensures interoperability and consistency. A metadata application profile is a mechanism for semantic interoperability in FAIRER and open data publishing.
Long Definition: A metadata application profile (MAP) is a set of recorded decisions about a shared data target for a given community. MAPs declare what models are employed (what types of entities will be described and how they relate to each other), what controlled vocabularies are used, the cardinality of fields/properties (what fields are required and which fields have a cap on the number of times they can be used), data types for string values, and guiding text/scope notes for consistent use of fields/properties. A MAP may be a multipart specification, with human-readable and machine-readable aspects, sometimes in a single file, sometimes in multiple files (e.g., a human-readable file that may include input rules, a machine-readable vocabulary, and a validation schema). EXAMPLE: A MAP for a digital library might include elements such as "title," "author," "publication date," and "subject," with guidelines on how to enter data for each element and constraints such as mandatory fields or specific formats. Although MAPs do not necessarily have to be machine-actionable (e.g., https://pro.dp.la/hubs/metadata-application-profile), there are benefits to generating these decisions in a machine-parseable way: MAPs are more explicit when machine-actionable, leaving less room for interpretation, so that no matter the platform used to create the data, the output should be interoperable. If the syntax of the MAP (e.g., SHACL, ShEx, YAML, JSON Schema, etc.) is supported by tooling, the MAP may be reused in web form configurations or as a validation target. Machine-actionable MAPs can be used to generate human-readable documentation for catalogers, but human-readable MAPs are unlikely to be easily converted into machine-actionable files.
Reference: https://www.loc.gov/aba/pcc/taskgroup/Metadata-Application-Profiles.html ; https://www.sciencedirect.com/science/article/pii/S2543925124000044
Related Term: metadata schema
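The digital-library example above can be sketched as a machine-actionable MAP. Here the profile is a plain Python dictionary recording, for each element, its cardinality (minimum and maximum number of values), its datatype and, optionally, a controlled vocabulary; a small function then checks records against it. This dictionary layout and the subject vocabulary are hypothetical stand-ins for a real MAP syntax such as SHACL, ShEx or JSON Schema.

```python
from datetime import date

# Hypothetical machine-readable MAP: cardinality, datatype, optional vocabulary.
MAP = {
    "title": {"min": 1, "max": 1, "type": "string"},
    "author": {"min": 1, "max": None, "type": "string"},
    "publication_date": {"min": 0, "max": 1, "type": "date"},
    "subject": {"min": 0, "max": 3, "type": "string",
                "vocabulary": {"Biology", "Chemistry", "Physics"}},
}

def valid_type(value, datatype: str) -> bool:
    """Check a single value against the datatype declared in the profile."""
    if datatype == "date":
        try:
            date.fromisoformat(value)
            return True
        except (TypeError, ValueError):
            return False
    return isinstance(value, str)

def validate_record(record: dict, profile: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = [f"{f}: not defined in the profile" for f in record if f not in profile]
    for field, rule in profile.items():
        values = record.get(field, [])
        if len(values) < rule["min"]:
            errors.append(f"{field}: at least {rule['min']} value(s) required")
        if rule["max"] is not None and len(values) > rule["max"]:
            errors.append(f"{field}: at most {rule['max']} value(s) allowed")
        for v in values:
            if not valid_type(v, rule["type"]):
                errors.append(f"{field}: {v!r} is not a valid {rule['type']}")
            elif "vocabulary" in rule and v not in rule["vocabulary"]:
                errors.append(f"{field}: {v!r} is not in the controlled vocabulary")
    return errors

record = {
    "title": ["Ocean salinity notes"],
    "author": ["Doe, Jane"],
    "publication_date": ["2019-13-01"],   # month 13 is invalid
    "subject": ["Oceanography"],          # not in the vocabulary
}
for problem in validate_record(record, MAP):
    print(problem)
```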
Short Definition: A catalogue containing metadata records in XML-encoded (machine-readable and human-readable) format that enables services to find data and services.
Short Definition: A collection of data defined by a theme or category that reflects what is being measured, observed or monitored at the various sites. The metadata record is an information resource of business value.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; DCC/TC3+
Short Definition: A formal, machine-readable specification that defines the structure, content, and constraints of a set of metadata elements, often using formats like JSON Schema, XML Schema (XSD), or RDF Schema.
Long Definition: A metadata schema is a formal, machine-readable specification that defines the structure, constraints, and data types of a set of metadata elements. It is used to validate and enforce the structure of metadata instances against the defined rules. COMPONENTS: Structure (Defines the hierarchical organization of elements); Data Types (Specifies the types of data that each element can hold (e.g., string, date, URI)); Validation Rules (Includes constraints such as required elements, allowed values, and format specifications); and, Interoperability (Ensures that metadata adheres to a consistent structure for processing and exchange). EXAMPLE: A JSON Schema or XML Schema Definition (XSD) for metadata might define that the "title" element is a string, the "publication date" is in a date format, and the "author" element is an object containing "first name" and "last name" fields.
Related Term: metadata application profile
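The example above can be written as a JSON Schema document, shown here as a Python dictionary (the underscore property names are an assumption). To keep the sketch self-contained, the validator implements only a small subset of JSON Schema (`type`, `required`, `properties` and `format: date`); in practice a full validator, for example the third-party `jsonschema` package for Python, would be used instead.

```python
from datetime import date

# JSON Schema for the example metadata: title, publication date, author object.
SCHEMA = {
    "type": "object",
    "required": ["title", "author"],
    "properties": {
        "title": {"type": "string"},
        "publication_date": {"type": "string", "format": "date"},
        "author": {
            "type": "object",
            "required": ["first_name", "last_name"],
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string"},
            },
        },
    },
}

PY_TYPES = {"object": dict, "string": str}

def validate_instance(instance, schema: dict, path: str = "$") -> list[str]:
    """Check a small subset of JSON Schema: type, required, properties, format=date."""
    if not isinstance(instance, PY_TYPES[schema["type"]]):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if schema.get("format") == "date":
        try:
            date.fromisoformat(instance)
        except ValueError:
            errors.append(f"{path}: not a valid date")
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"{path}: missing required property '{name}'")
    for name, subschema in schema.get("properties", {}).items():
        if name in instance:
            errors.extend(validate_instance(instance[name], subschema, f"{path}.{name}"))
    return errors

record = {"title": "Glacier survey", "publication_date": "2020-02-30",
          "author": {"first_name": "Jane"}}
for problem in validate_instance(record, SCHEMA):
    print(problem)
```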
Short Definition: Computer software that provides services to software applications beyond those available from the operating system. It can be described as "software glue". Middleware makes it easier for software developers to perform communication and input/output, so they can focus on the specific purpose of their application.
Reference: Research Data Canada Infrastructure Committee; Wikipedia
Short Definition: A means of overcoming technological obsolescence by transferring digital resources from one hardware/software generation to the next. The purpose of migration is to preserve the intellectual content of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. Migration differs from the refreshing of storage media in that it is not always possible to make an exact digital copy or replicate original features and appearance and still maintain the compatibility of the resource with the new generation of technology.
Reference: Digital preservation coalition
Short Definition: A description with very little curation that includes at least a name and PID of a data object. Minimal metadata is only marginally targeted at discovery, since other infrastructure is much better suited to that purpose.
Short Definition: A vague and indefinable source of trouble for users of information technology. The term is used especially by people who frequent Internet Relay Chat (IRC) channels. If you are suddenly disconnected from your channel, it can be attributed to the moof monster. You are said to have been "moofed." The term seems reminiscent of the gremlin of the 1940s. It is apparently unrelated to "moof!", the sound made by the dogcow on the Apple Macintosh.
Synonym: Gremlin; Murphy's Law
Short Definition: The original Murphy's Law was "If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it."
Long Definition: The law's author was Edward A. Murphy, Jr., a U.S. Air Force engineer, who, in 1947, was involved in a rocket-sled experiment in which all 16 accelerator instruments were installed the wrong way, resulting in Murphy's observation. Murphy's Law is sometimes expressed as "Anything that can go wrong, will -- at the worst possible moment." In that form, the Law was popularized by science-fiction writer Larry Niven as "Finagle's Law of Dynamic Negatives" (sometimes known as "Finagle's corollary to Murphy's Law"). Extrapolating from the original, we arrive at Murphy's Laws of Information Technology, a set of principles that may seem to be jokes but which events sometimes prove to be fundamental truths. Examples of Murphy's Laws relative to hardware: Law of Inconvenient Malfunction (A device will fail at the least opportune possible moment); Law of Cable Compatibility (If you choose a cable and a connector at random, the probability that they are compatible is equal to zero); Law of Hardware Compatibility (The probability of a given peripheral being compatible with a PC is inversely proportional to the immediate need for that peripheral); Law of Bad Sectors (The probability that an untested diskette will have bad sectors is directly proportional to the importance of the data written onto the diskette); First Law of Selective Gravitation (When an object is dropped, it will fall in such a way as to cause the greatest possible damage to itself and/or other objects on which it lands); Second Law of Selective Gravitation (The tendency for an object to be dropped is directly proportional to its value); Law of Reality Change (Unalterable hardware specifications will change as necessary to maximize frustration for personnel affected by said specifications); Law of Noise (Noise bursts occur so as to cause the most, and/or most serious, errors in data communications, regardless of the actual amount of noise present); Law of Expectation (Consumer expectations always outpace advances in hardware technology); and, Law of the Titanic (If a device cannot malfunction, it will).
Examples of Murphy's Laws as they relate to programming: Law of Debugging (The difficulty of debugging software is directly proportional to the number of people who will ultimately use it); Law of Neurosis (The chances of software being neurotic (developing bugs spontaneously without apparent reason) are directly proportional to the confusion such neurosis can cause); Law of Available Space (If there are n bytes in a crucial software program, the available space for its convenient storage or loading is equal to n-1 bytes); First Law of Bad Sectors (The probability of software being mutilated by bad sectors is directly proportional to the value and/or importance of the programs); Second Law of Bad Sectors (When a program is mutilated by bad sectors, the damage will occur at the point(s) that result in the most frequent and/or severe errors when the program is run); Law of Noise (When a downloaded program is corrupted by noise, the corruption will occur at the point(s) that result in the most frequent and/or severe errors when the program is run); Law of Software Compatibility (If two programs are chosen at random, the probability that they are compatible is equal to zero); Law of Option Preferences (When two people share a computer, their software option preferences will differ in every possible way); Law of Expectation (Consumer expectations always outpace advances in software technology); and, Law of the Titanic (Bug-free software isn't).
Synonym: Gremlin; Moof monster
Short Definition: Uniquely identifies a set of names so that there is no ambiguity when objects having different origins but the same names are mixed together. Using the Extensible Markup Language (XML), an XML namespace is a collection of element type and attribute names. These element types and attribute names are uniquely identified by the name of the unique XML namespace of which they are a part. In an XML document, any element type or attribute name can thus have a two-part name consisting of the name of its namespace and then its local (functional) name.
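A minimal sketch of the two-part naming described above, using Python's standard `xml.etree.ElementTree`. Two vocabularies each define a `title` element; the Dublin Core namespace URI is real, while the `lab` namespace URI and the document content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define "title"; namespaces keep them apart.
# The "lab" namespace URI is a hypothetical example.
DOC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:lab="https://example.org/ns/lab">
  <dc:title>Lake temperature readings</dc:title>
  <lab:title>Senior Technician</lab:title>
</record>"""

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "lab": "https://example.org/ns/lab",
}

root = ET.fromstring(DOC)

# ElementTree stores each element name as "{namespace URI}local-name".
for child in root:
    print(child.tag, "->", child.text)

# Prefixes in queries are resolved through NS, not through the document's own prefixes.
dataset_title = root.find("dc:title", NS).text
job_title = root.find("lab:title", NS).text
print(dataset_title, "|", job_title)
```

Because matching is done on the namespace URI, a document that binds Dublin Core to a different prefix would still be queried correctly with the same `NS` mapping.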
Short Definition: From an "official" perspective, a national standard is adopted by a national standards body (e.g., Standards Council of Canada, American National Standards Institute, British Standards Institution) and made available to the public. Practically speaking, however, a national standard is any standard that is widely used and recognized within a country. In this context, even government standards, such as those issued by the Occupational Safety and Health Administration (OSHA), can be considered national standards.
Short Definition: Testing aimed at demonstrating that something does not work.
Synonym: Dirty testing
Short Definition: Meaningless data, including any data that cannot be understood and interpreted correctly by machines, such as unstructured text, and any data that has been received, stored, or changed in such a manner that it cannot be read or used by the program that originally created it.
Synonym: Corrupt data
Short Definition: Data that could not lead to the identification of a specific individual, to distinguishing one person from another, or to personally identifiable information. These may be data that have been de-identified, or that could not lead to personally identifiable information in the first place.
Synonym: Non personally identifiable information
Short Definition: Data that could not lead to the identification of a specific individual, to distinguishing one person from another, or to personally identifiable information. These may be data that have been de-identified, or that could not lead to personally identifiable information in the first place.
Short Definition: Data for which no injury would result from their compromise.
Reference: Canada Directive on Security Management, Standard on Security Categorization
Short Definition: The process of organizing data into tables in such a way that the results of using the database are always unambiguous and as intended. Normalization is typically a refinement process after the initial exercise of identifying the data objects that should be in the database, identifying their relationships, and defining the tables required and the columns within each table. First normal form (1NF) is the "basic" level of normalization: Data and information are contained in two-dimensional tables with rows and columns. Each column corresponds to a sub-object or an attribute of the object represented by the entire table. Each row represents a unique instance of that sub-object or attribute and must be different in some way from any other row (that is, no duplicate rows are possible). All entries in any column must be of the same kind. For example, in the column labeled "Date," only dates are permitted. In second normal form (2NF), the tables are in first normal form and, in addition, every column that is not part of a key depends on the whole key rather than on only part of it. At the second normal form, modification anomalies are still possible because a change to one row in a table may affect data that refers to this information from another table. In third normal form (3NF), the tables are in second normal form and, in addition, there is no transitive functional dependency: no non-key column depends on the key only by way of another non-key column. For example, if A is functionally dependent on B, and B is functionally dependent on C, then A is transitively dependent on C via B. In domain/key normal form (DKNF), a key uniquely identifies each row in a table. A domain is the set of permissible values for an attribute. By enforcing key and domain restrictions, the database is assured of being freed from modification anomalies. DKNF is the strictest of these levels; in practice, many designs aim for 3NF.
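The sketch below illustrates removing a transitive dependency (the 3NF step) on a small, hypothetical table in which `emp_id` determines `dept_id` and `dept_id` determines `dept_name`. The helper `fd_holds` is a hypothetical name; note that checking sample rows can only refute a functional dependency, never prove it, since real dependencies come from the meaning of the data.

```python
# Hypothetical table: emp_id -> dept_id -> dept_name (a transitive dependency).
ROWS = [
    {"emp_id": 1, "dept_id": "D1", "dept_name": "Hydrology"},
    {"emp_id": 2, "dept_id": "D1", "dept_name": "Hydrology"},
    {"emp_id": 3, "dept_id": "D2", "dept_name": "Geology"},
]


def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal values in the lhs columns imply equal values in the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, columns):
    """Project rows onto the given columns, dropping duplicate rows as a relation would."""
    unique = {tuple(row[c] for c in columns) for row in rows}
    return [dict(zip(columns, values)) for values in sorted(unique)]


# dept_name depends on emp_id only via dept_id, so split the table in two (3NF).
employees = project(ROWS, ["emp_id", "dept_id"])
departments = project(ROWS, ["dept_id", "dept_name"])
print(employees)
print(departments)
```

After the split, each department name is stored once, so renaming a department touches a single row instead of every employee row; joining the two tables on `dept_id` recovers the original data.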
Short Definition: A type of repository with a network-accessible server that can process the six OAI-PMH requests in the manner described in the OAI Implementation Guide.
Short Definition: A model of the logical attributes or properties associated with a particular object. For a data object, these are its associated properties.
Short Definition: A collection of descriptions of classes or interfaces, together with their member data, member functions, and class-static operations.
Short Definition: The characteristics of any digital object can be described by a number of properties which are typically stored in metadata and/or PID records.
Short Definition: Available to anyone based on an open licence (e.g., Creative Commons Licence CC0 or CC-BY; Open Government Licence).
Reference: https://theodi.org/insights/tools/the-data-spectrum/ ; https://www.alerc.org.uk/uploads/7/6/3/3/7633190/an_introduction_to_open_shared_and_closed_data.pdf ; https://creativecommons.org/share-your-work/cclicenses/
Related Term: Closed access; Shared access
Short Definition: A low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.
Reference: oai.org
Synonym: OAI-PMH
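As a sketch of how a Service Provider might issue the HTTP requests described above, the code below builds OAI-PMH request URLs for the six verbs and follows `resumptionToken` paging in `ListIdentifiers` responses. The base URL and the function names are hypothetical, and the `fetch` argument is an assumption that keeps the example independent of any network library (any callable returning the response body as text would do).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# The six OAI-PMH verbs.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url, verb, **args):
    """Build an OAI-PMH GET request URL for one of the six verbs."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    return base_url + "?" + urlencode({"verb": verb, **args})


def parse_list_identifiers(xml_text):
    """Return (identifiers, resumption_token) from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of a list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest_identifiers(fetch, base_url, metadata_prefix="oai_dc"):
    """Yield every identifier, following resumption tokens page by page."""
    url = build_request(base_url, "ListIdentifiers", metadataPrefix=metadata_prefix)
    while True:
        ids, token = parse_list_identifiers(fetch(url))
        yield from ids
        if token is None:
            break
        # resumptionToken is an exclusive argument: it is sent alone with the verb.
        url = build_request(base_url, "ListIdentifiers", resumptionToken=token)


print(build_request("https://repository.example.org/oai", "ListRecords", metadataPrefix="oai_dc"))
```

In use, `fetch` could wrap `urllib.request.urlopen`; in a test it can be a dictionary lookup over canned responses.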
Short Definition: Structured data that are accessible, machine-readable, usable, intelligible, and freely shared. Open data can be freely used, re-used, built on, and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Science as an Open Enterprise (SOE) as quoted by TC3+; Open Data 101 (Government of Canada); UNESCO Open Access Policy Guidelines ; Government of Canada http://open.canada.ca/en/open-data-principles#toc94
Short Definition: A governing culture that holds that the public has the right to access the documents and proceedings of government to allow for greater openness, accountability, and engagement.
Short Definition: Ongoing organizational activities associated with supporting functional elements, as opposed to project elements. Operational management also includes support of products that the organization has created through project activity.
Reference: Project Management Institute (2006) The Standard for Program Management.
Short Definition: Organizational leadership is: (a) The ability to attract, assess, mobilize and focus energies and talent to work towards a shared purpose aligned with the mandate of the organization; (b) The ability to change culture, processes and priorities within the organization; and, (c) The ability to mentor.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Short Definition: A type of repository where the original copy of the data is stored and where a data identifier was probably registered.
Short Definition: A process by which a scholarly work (such as a paper or a research proposal) is checked by a group of experts in the same field to make sure it meets the necessary standards before it is published or accepted.
Reference: Merriam-Webster dictionary
Short Definition: 1. In the context of a perpetual software license, it is a type of software license that authorizes an individual to use a program indefinitely; 2. In the context of records management, perpetual use means that the record must be retained in the active records area indefinitely.
Short Definition: A persistent identifier is a long-lasting reference to a digital object that gives information about that object regardless of what happens to it. Developed to address "link rot," a persistent identifier can be resolved to provide an appropriate representation of an object whether that object changes its online location or goes offline.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Australian National Data Service
Synonym: PID
Short Definition: This is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client.
Reference: MIT data management and publishing
Synonym: PURL
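The indirection described above can be sketched as a lookup table consulted by the resolution service, which answers with an HTTP redirect (for example, 302) whose Location header carries the current URL. The table contents, host names, and the `resolve` function are hypothetical; a real PURL server also handles other redirect types and administration of the table.

```python
from urllib.parse import urlsplit

# Hypothetical resolution table maintained by a PURL service: PURL path -> current URL.
RESOLUTION_TABLE = {
    "/net/example/dataset-42": "https://storage.example.org/archive/2021/dataset-42.zip",
}


def resolve(purl, table=RESOLUTION_TABLE):
    """Return the (HTTP status, Location header) a PURL server might send for this PURL."""
    path = urlsplit(purl).path
    target = table.get(path)
    if target is None:
        return 404, None
    return 302, target


print(resolve("https://purl.example.org/net/example/dataset-42"))

# The resource moves: only the table entry changes; the PURL that people cite stays the same.
RESOLUTION_TABLE["/net/example/dataset-42"] = "https://new-host.example.org/dataset-42.zip"
print(resolve("https://purl.example.org/net/example/dataset-42"))
```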
Short Definition: The World Wide Web Consortium's Platform for Privacy Preferences Project (P3P) offers specific recommendations for practices that will let users define and share personal information with Web sites that they agree to share it with. The P3P incorporates a number of industry proposals, including the Open Profiling Standard (OPS). Using software that adheres to the P3P recommendations, users will be able to create a personal profile, all or parts of which can be made accessible to a Web site as the user directs. One tool to help users decide whether to trust a given Web site with personal information is a statement of privacy policy that the site can post.
Short Definition: Information [data] about an identifiable individual that is recorded in any form.
Long Definition: 1. Data which relate to a living individual who can be identified (a) from those data, or (b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller, and includes any expression of opinion about the individual and any indication of the intentions of the data controller or any other person in respect of the individual. 2. Any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered personally identifiable data. 3. Data are identifiable if the information contains the name of an individual, or other identifying items such as birth date, address or geocoding. Data will be identifiable if the information contains a unique personal identifier and the holder of the information also has the master list linking the identifiers to individuals. Data may also be identifiable because of the number of different pieces of information known about a particular individual. It may also be possible to ascertain the identity of individuals from aggregated data where there are very few individuals in a particular category. Identifiability is dependent on the amount of information held and also on the skills and technology of the holder.
Reference: European Commission. (1999). Opinion of the European Group on Ethics in Science and New Technologies Ethical Issues of Healthcare in the Information Society, No. 13, 30 July 1999; OECD (The Organisation for Economic Co-operation and Development). (2013). Strengthening Health Information Infrastructure for Health Care Quality Governance: Good Practices, New Opportunities and Data Privacy Protection Challenges. Paris, France: OECD Publishing.
Synonym: Personal information; Personal data
Related Term: Confidential
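The long definition notes that individuals may be identified from aggregated data when very few people fall in a category. One common safeguard is a minimum cell size check before release; the sketch below flags combinations of quasi-identifiers shared by too few records. The threshold of 5, the column names, and the records are assumptions for illustration; actual thresholds are set by data custodians and vary.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # assumed threshold; actual policies vary by data custodian


def small_cells(records, quasi_identifiers, min_size=MIN_CELL_SIZE):
    """Return value combinations of the quasi-identifiers shared by fewer than min_size records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < min_size}


# Hypothetical records: six people aged 30-39 and one aged 90+ in the same area.
records = (
    [{"age_group": "30-39", "postal_prefix": "K1A"}] * 6
    + [{"age_group": "90+", "postal_prefix": "K1A"}]
)
print(small_cells(records, ["age_group", "postal_prefix"]))
```

A flagged cell would typically be suppressed or merged with a neighbouring category before the aggregate is published.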
Short Definition: The application of comprehensive scientific and professional knowledge to the following applied sciences: physics, planetary, and earth sciences.
Short Definition: A single data element related to a PID and part of its record content.
Short Definition: For a single identifier, the class of entity it refers to. For a PID system, the typical class of entities it is intended to be used for. Examples include: digital objects, physical objects, bodies, actors.
Short Definition: A type of record (and organization) that stores an instance of an executable/understandable PID. The content of a PID record distinguishes a registered digital or data object from other DOs. A PID record is a type of record that includes property information that characterizes the digital object it is identifying. Important parts of a PID record are location and checksum. However, usage varies widely. In some data models the PID is simply used as a unique label with an empty record. A PID record has a lifecycle including creation, publication, curation, and destruction.
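A minimal sketch of a PID record as a list of typed properties, with the two parts named above: a location and a checksum used to confirm that retrieved content is the registered object. The property type names `URL` and `CHECKSUM`, the `sha256:<hex>` value format, and the identifier are assumptions; real PID systems define their own type registries.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PIDRecord:
    """A PID and its record: a list of (type, value) properties describing the object."""
    pid: str
    properties: list = field(default_factory=list)

    def get(self, prop_type):
        """Return the first value of the given property type, or None."""
        for ptype, value in self.properties:
            if ptype == prop_type:
                return value
        return None

    def verify(self, content: bytes) -> bool:
        """Check retrieved bytes against the CHECKSUM property (assumed format 'sha256:<hex>')."""
        checksum = self.get("CHECKSUM")
        if checksum is None:
            return False  # a record used as a bare label cannot verify anything
        algorithm, _, expected = checksum.partition(":")
        return hashlib.new(algorithm, content).hexdigest() == expected


data = b"station,temp\nA,4.2\n"
record = PIDRecord(
    pid="21.T99999/abc123",  # hypothetical identifier
    properties=[
        ("URL", "https://repo.example.org/objects/abc123"),
        ("CHECKSUM", "sha256:" + hashlib.sha256(data).hexdigest()),
    ],
)
print(record.get("URL"), record.verify(data), record.verify(b"tampered"))
```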
Short Definition: The process of resolving a PID to a useful state of information about a digital object by using a globally available system.
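For Handle-based PIDs, one globally available resolution route is the Handle.Net proxy's JSON interface at `https://hdl.handle.net/api/handles/<handle>`, which returns a `responseCode` and a `values` list whose entries carry a `type` and a `data.value`. The sketch below assumes that response shape; the sample response and handle are hypothetical and are not fetched live, and `fetch_record` needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PROXY_API = "https://hdl.handle.net/api/handles/"  # Handle.Net proxy REST interface


def fetch_record(handle):
    """Fetch a handle's record as JSON from the global proxy (requires network access)."""
    with urlopen(PROXY_API + quote(handle, safe="/")) as response:
        return json.load(response)


def resolve_location(record):
    """Return the URL-typed value from a proxy JSON record, or None if absent."""
    if record.get("responseCode") != 1:  # 1 indicates the handle was found
        return None
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


# Illustrative response in the proxy's JSON shape (hypothetical handle and URL).
SAMPLE = {
    "responseCode": 1,
    "handle": "20.500.12345/example",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string", "value": "https://repo.example.org/item/7"}},
    ],
}
print(resolve_location(SAMPLE))
```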
Short Definition: A service that provides a connection between a PID and its target object.
Short Definition: Consists of at least one PID resolver, a name schema and a defined mechanism for issuing PIDs that conform to the name schema. Examples include: DOI, Handle System, URN, ARK, PURL, etc.
Short Definition: A file that contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a pipe (|) character.
Short Definition: Preliminary version of an article that has not undergone review but that may be shared for comment. Preprints may be considered grey literature.
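Python's standard `csv` module reads and writes pipe-delimited files by setting `delimiter="|"`, as sketched below with hypothetical content. The quoting of a value that itself contains a pipe follows the `csv` module's default (minimal quoting); this is an assumption about the file's conventions, since not every pipe-delimited file quotes values.

```python
import csv
import io

# Hypothetical pipe-delimited content; a value containing "|" is quoted.
TEXT = 'site|depth_m|note\nA|1.5|clear\nB|2.0|"silt|clay mix"\n'

rows = list(csv.reader(io.StringIO(TEXT), delimiter="|"))
print(rows)

# Writing the rows back out with the same delimiter reproduces the original text.
out = io.StringIO()
csv.writer(out, delimiter="|", lineterminator="\n").writerows(rows)
print(out.getvalue() == TEXT)
```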
Synonym: Working paper
Short Definition: An activity within archiving in which specific items of data are maintained over time so that they can still be accessed and understood through changes in technology.
Reference: JISC/TC3+; TBS Information Management Glossary (National Archives of Canada Preservation Policy)
Synonym: Conservation
Short Definition: Documents actions that have been undertaken to preserve a digital resource, such as migrations and checksum calculations. Example: Metadata Encoding and Transmission Standard (METS).
Reference: DCC/TC3+
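The sketch below records preservation actions as a simple event log: a checksum is calculated at ingest and a later fixity check compares a fresh digest against it. The event field names are loosely modelled on PREMIS events and simplified; they, the object identifier, and the `record_event` helper are assumptions for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest used as the object's fixity value."""
    return hashlib.sha256(data).hexdigest()


def record_event(log, event_type, object_id, outcome, detail=""):
    """Append one preservation event (loosely modelled on PREMIS events) to a log list."""
    log.append({
        "eventType": event_type,
        "objectIdentifier": object_id,
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventOutcome": outcome,
        "eventDetail": detail,
    })


log = []
original = b"id,value\n1,3.14\n"
stored_checksum = sha256_of(original)
record_event(log, "message digest calculation", "obj-001", "success", stored_checksum)

# Later, a fixity check compares a fresh digest with the stored one.
current = original  # in practice, re-read from storage
outcome = "success" if sha256_of(current) == stored_checksum else "failure"
record_event(log, "fixity check", "obj-001", outcome)

print(json.dumps(log, indent=2))
```

A migration would be logged the same way, with the checksum of the new format recorded as a separate event.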
Short Definition: The Principal Investigator (P.I.) is a researcher who has a research leadership role and is the point of contact for a project or partnership that applies the scientific method, historical method, or other research methodology for the advancement of knowledge resulting in independent, objective, high quality, traceable, and reproducible results. The P.I. has primary responsibility for the intellectual direction and integrity of the research or research-related activity, including data production, findings and results, and ensures ethical conduct in all aspects of the research process including but not limited to the treatment of human and animal subjects, conflicts of interest, data acquisition, sharing and ownership, publication practices, responsible authorship, and collaborative research and reporting. While various tasks may be delegated to team members, some of whom may have greater expertise in specific areas, the P.I. is familiar with the various technical and scientific aspects of a project and how they fit together, is able to identify and remediate gaps, and ensures communication within the team and with users of the research data and results. The project may be very small, involving only a few people (or even only one person, the P.I.), or extremely large, involving many groups and multiple P.I.s and/or co-P.I.s. Depending on the type of organization (e.g., university, industry, institute, laboratory, government program, etc.), the role of the P.I., how that role fits into the organizational structure, and how it relates to roles within and outside of the organization can vary.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; NARSTO ; U.S. Government
Synonym: PI.
Related Term: Researcher; Research manager; Project manager; Program manager; Laboratory manager; Manager
Short Definition: 1. In the context of the Internet: Most Internet users want the personal information they share not to be shared with anyone else without their permission. Privacy can be divided into the following concerns: (a) What personal information can be shared with whom; (b) Whether messages can be exchanged without anyone else seeing them; (c) Whether and how one can send messages anonymously; and, (d) Personal Information Privacy. 2. In a legal context: A broad concept that (in Canadian law) encompasses personal privacy (protection of one's physical self), territorial privacy (protection of one's private physical space), and informational privacy (protection of information about oneself and one's activities).
Reference: Supreme Court of Canada (2004) R. v. Tessling, [2004] 3 S.C.R. 432, 2004 SCC 67, Ottawa ON.
Short Definition: Monitoring the risk to privacy posed by data requests from researchers, and the practices of data custodians in providing data (information governance) to ensure that confidentiality is protected. Such governance requires specialized knowledge of technology, law, and statistical methods.
Synonym: Information governance; Governance and accountability mode
Short Definition: Data linkage where the resulting product has been de-identified.
Synonym: De-identification
Short Definition: A set of interrelated actions and activities performed to achieve a specified set of products, results, or services.
Reference: Project Management Institute (2006) The Standard for Program Management.
Short Definition: In the context of a researcher's activities, productivity is the generation of outputs (also called contributions) by a researcher at a rate consistent with the specialty or type of work. Productivity is one of four valued outcomes. Outputs may include, for example: peer-reviewed publications, scientific products, science advice, research proposals, internal scientific reports, datasets, patents, technology transfers, reviews, books and chapters, expert panels; involvement in advisory committees, policy development, collaborative research and development projects, public outreach, peer-reviewed journals. Outputs may be individual or team contributions. Productivity is evidence of innovation, impact and recognition.
Long Definition: In a 5-level incumbent-based process, demonstrated valued outcomes of productivity in research, development & analysis (RDA) include: (1) Peer-reviewed research contributions; (2) Contributions to a research and development (R&D) project within the area of specialty; (3) Multi-year contributions of breadth and/or depth in the area of specialty that are relevant to results at program level; Contributions to the development of results which feed the discussions at national bodies leading to the establishment of standards, regulations, agreements, policies, services, etc.; (4) Multi-year contributions in broad, multi-disciplinary areas or increasing depth within the areas of specialty that are relevant to results at a program level; Contributions to the development of results which feed the discussions at national or international bodies leading to the establishment of standards, regulations, agreements, programs, policies, services, etc.; and, (5) Multi-year contributions in multiple/broad areas or in depth that are relevant to strategic results or public good; Contributions leading the discussions at national and international bodies to establish standards, regulations, agreements, programs, policies, services, etc. Demonstrated valued outcomes of productivity in managing research include: (1) not applicable; (2) Contributions to leading a team project; (3) Contributions relevant to supporting the management of a program (e.g., managing/leading teams; increased number or scope of projects led; increased resources; partnerships or collaborative work); (4) Contributions relevant to supporting the management of national or international programs (e.g., built national or international research teams; collaborated across programs, with other national or international institutes); and, (5) National and international contributions relevant to supporting the management of programs. 
Demonstrated valued outcomes of productivity in representation and client services include: (1) not applicable; (2) Contributions to respond to clients requesting technical input or information on research project in area of specialty; (3) Contributions to respond to clients who sought their support to solve problems in area of specialty; Contributions to regional or national standard, advisory or research bodies related to area of specialty; (4) Contributions to respond to clients who sought their support to solve problems in broad areas of specialty; Contributions to national and/or international standard, advisory or research bodies related to broad areas of specialties; and, (5) Same as level 4.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Valued outcome; Incumbent-based; Research, development and analysis; Managing research; Representation and client services; Outputs
Short Definition: Ethical or legal duty of a professional to exercise the level of care, diligence, and skill prescribed in the code of practice of his or her profession, or as other professionals in the same discipline would in the same or similar circumstances.
Synonym: Professional standard of care
Short Definition: A group of related projects managed in a coordinated way to obtain benefits and control not available from managing them individually. Programs may include elements of related work (e.g., ongoing operations) outside the scope of the discrete projects in a program.
Reference: Project Management Institute (2006) The Standard for Program Management; Cornell Project Management Methodology
Short Definition: The process of developing, communicating, implementing, monitoring, and assuring the policies, procedures, organizational structures, and practices associated with a given program.
Reference: Project Management Institute (2006) The Standard for Program Management.
Short Definition: The person responsible for creating the organizational environment and culture by providing clear direction and circumstances that allow people to be successful. The program manager is judged on the elements of time, cost, and scope, cumulatively for all the projects and operations within the program. Program management decisions are both tactical and strategic in nature. The strategic aspects of these decisions must consider multidimensional impacts beyond the near-term delivery dates of the project. In addition to delivery and execution, the program manager also has to be concerned with the overall health and effectiveness of the program over the long term. The program manager may have to accept calculated risk when he or she is unable to obtain clarity from the organization and then define clarity in his or her own terms. Accepting chaos, allowing chaos to exist, or passing down chaos all signal a lack of integrity and do not create a culture conducive to successful projects.
Reference: Brown (2008) Handbook of Program Management
Synonym: Program leader
Related Term: Manager; Research manager
Short Definition: A set of activities required to produce certain defined outputs, or to accomplish specific goals or objectives within a defined schedule and resource budget. A project exists only for the duration of time required to complete its stated objectives.
Reference: Termium Plus
Short Definition: Describes the processes and tasks that must be completed to produce a product or service. Different project lifecycles exist for specific products and services. (For example, the lifecycle followed to build a house is very different from the lifecycle followed to develop a software package.)
Reference: Cornell Project Management Methodology
Short Definition: Defines how to manage a project. It will always be the same, regardless of the project lifecycle being employed.
Reference: Cornell Project Management Methodology
Short Definition: The person who is tasked with delivering a project within the boundaries and framework established by the program manager. The project manager is, and should be, focused on delivery and execution and is judged on the elements of time, cost, and scope of the project. The Project Manager is responsible for ensuring that the Project Team completes the project, develops the Project Plan with the team, and manages the team's performance of project tasks. It is also the responsibility of the Project Manager to secure acceptance and approval of deliverables from the Project Sponsor and Stakeholders. The Project Manager is responsible for communication, including status reporting, risk management, escalation of issues that cannot be resolved in the team, and, in general, making sure the project is delivered in budget, on schedule, and within scope.
Reference: Project Management Institute (2006) The Standard for Program Management.
Synonym: Project leader
Short Definition: The Principal Investigator and the project team work together to inspect the accomplished work to ensure its alignment with the project scope, data fitness for use, and data end-user needs.
Short Definition: Responsible for executing tasks and producing deliverables as outlined in the Project Plan and directed by the Project Manager, at whatever level of effort or participation has been defined for them.
Reference: Cornell Project Management Methodology
Short Definition: Keeping the procedural mechanisms that researchers and data custodians must follow when engaged in data sharing and linkage proportional to the degree of risks associated with such practices. Proportionate governance operates in situations that are too variable to be regulated by hard laws (e.g., custom data access requests). It requires that analytical judgments be performed to ensure that the governance mechanisms deployed for a given research proposal correspond to the level of risk it entails. Proportionality is an important cross-cutting consideration across all types of governance that are put in place.
Reference: Sethi & Laurie (2013)
Short Definition: Refers to information (or other property) that is owned by an individual or organization and for which the use is restricted by that individual or organization.
Short Definition: Protected information is not classified. Information is "protected" when unauthorized disclosure could reasonably be expected to cause injury to a non-national interest (i.e., an individual interest such as a person or an organization). Protected information is any sensitive information that does not relate to national security and cannot be disclosed under access and privacy legislation because of the potential injury to particular public or private interests. Protection levels: Protected A (Injury to an individual, organization or government), Protected B (Serious injury to an individual, organization or government), Protected C (Extremely grave injury to an individual, organization or government).
Reference: https://www.tpsgc-pwgsc.gc.ca/esc-src/documents/levels-of-security.pdf
Related Term: Classification level
Short Definition: The special set of rules that regulate how components within a system interact. Protocols are crucial parts of interface specifications. They specify not only message content but also procedural aspects.
Short Definition: A type of historical information or metadata about the origin, location or the source of something, or the history of the ownership or location of an object or resource including digital objects. For example, information about the Principal Investigator who recorded the data, and the information concerning its storage, handling, and migration.
Short Definition: Information concerning the creation, attribution, or version history of managed data. Provenance metadata indicates the relationship between two versions of data objects and is generated whenever a new version of a dataset is created. Examples include: (i) the name of the program that generated the new version, (ii) the commit ID of the program in a version control system such as Git (e.g., as hosted on GitHub), (iii) the identifiers of any other datasets or data objects that may have been used in creating the new version. Provenance information is gathered along the data lifecycle as part of curation processes. A finer level of provenance metadata would be concerned only with data flowing between various stores such as curated databases and managed repositories. Provenance metadata is designed to allow queries over the relationship between versions, and may include fine-grained provenance data, coarse-grained provenance data, or both. Different applications may store different provenance data.
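A minimal sketch of version-level provenance metadata, assuming each new dataset version is logged with the three kinds of information listed above. The class, field names, identifiers such as "ds-v2", and the commit hashes are all hypothetical; the `lineage` helper shows the kind of query over version relationships that such metadata is meant to support.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry linking a new dataset version to its parent."""
    new_version_id: str       # identifier of the newly created version
    parent_version_id: str    # identifier of the version it was derived from
    program_name: str         # (i) program that generated the new version
    commit_id: str            # (ii) commit ID of that program in version control
    input_ids: tuple = ()     # (iii) other datasets or data objects used as inputs
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def lineage(records, version_id):
    """Follow parent links backwards; return version ids, newest first."""
    by_new = {r.new_version_id: r for r in records}
    chain = [version_id]
    while chain[-1] in by_new:
        chain.append(by_new[chain[-1]].parent_version_id)
    return chain


log = [
    ProvenanceRecord("ds-v2", "ds-v1", "clean.py", "a1b2c3d", ("lookup-table-v5",)),
    ProvenanceRecord("ds-v3", "ds-v2", "aggregate.py", "e4f5a6b"),
]
print(lineage(log, "ds-v3"))  # ['ds-v3', 'ds-v2', 'ds-v1']
```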
Short Definition: The process or set of processes used to measure and assure the quality of a product.
Synonym: QA
Short Definition: The process of ensuring that products and services meet consumer expectations.
Synonym: QC
Short Definition: Data that have not been processed for meaningful use. Although raw data have the potential to become "information," they require selective extraction, organization, and sometimes analysis and formatting for presentation. As a result of processing, raw data sometimes end up in a database, which enables the data to become accessible for further processing and analysis in a number of different ways.
Synonym: Atomic data; Source data.
Related Term: Instrument output data
Short Definition: Use of content outside of its original intention.
Short Definition: Data that are being received, processed and stored at the time of their occurrence with only small delays. Examples include: stock quotes, manufacturing statistics, Web server loads, data warehouse activity and sensor feeds to data collectors. Real-time data are often used for navigation or tracking. Real-time data are data streams that are typically generated by sensors and received via direct networking connections.
Short Definition: In the context of a researcher's activities, recognition is a measure of credibility and stature of a researcher within the scientific community, and with clients and stakeholders, in accordance with the specialty or type of work. Recognition is one of four valued outcomes.
Long Definition: In a 5-level incumbent-based process, demonstrated valued outcomes of recognition in research, development and analysis (RDA) include: (1) Recognized by colleagues in area of specialty; consulted by team members; (2) Recognized locally or regionally in area of specialty and for contributions to results at project level (e.g., consulted by team members and by local or regional stakeholders in a restricted specialty related to research findings; internal reviewer of publications; membership in scientific societies; requested to review internal research proposals); (3) Recognized nationally as expert in area of specialty and for contributions to results at a program level (e.g., consulted nationally on implications of research findings in area of specialty; collaboration across programs; consulted on policy development in area of specialty; invited to give academic lectures or courses, or supervise graduate students; invited to present papers at a national-level event; held office in scientific societies; external reviewer of journal publications; requested to review external research proposals); (4) Recognized nationally and/or internationally as an authority in area of specialty and for contributions to results (e.g., recipient of national or international award; consulted widely by stakeholders to help solve problems and make decisions; collaboration across programs; sought as research mentor; invited to give academic lectures or courses, or supervise graduate students; invited to present papers at national and/or international lectures, reviews or conferences; external reviewer of publications in prestigious journals; held office in national scientific societies; reviews research proposals on behalf of external funding agencies; invited to chair national scientific committees and/or lead national research endeavours; multi-year contributions in broad, multi-disciplinary areas or increasing depth within the areas of specialty that are relevant to sectoral or departmental results at program level); and, (5) Recognized nationally and internationally as an authority in a broad area and for contributions to strategic outcomes (e.g., recipient of national and international awards; consulted widely by stakeholders, executive level decision makers and research community on a broad area; sought as research mentor; invited to give academic lectures or courses, or supervise graduate students; invited as keynote speaker at international events; invited to present papers at national and international lectures, reviews, or events; held editorial board or similar positions; held executive office in national or international scientific societies; reviews research proposals and participates on expert and site-visit committees on behalf of external funding agencies; invited to chair national and international scientific committees and/or lead international research endeavours).
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Valued outcome; Incumbent-based; Research, development and analysis; Managing research; Representation and client services.
Short Definition: 1. A collection of data items arranged for processing by a program. Multiple records are contained in a file or dataset. Records may be of fixed length or of variable length, with the length information contained within the record. 2. A record (sometimes called a row) is a group of fields (sometimes called columns) within a table that are relevant to a specific entity. For example, in a table called client contact information, a row would likely contain fields such as: ID number, name, street address, city, telephone number, etc. 3. Any documentary material other than a publication, regardless of medium or form.
Short Definition: Information for a data object that includes: * the person who deposited the data object in the repository, * the source of the data object, * the date when the object was deposited, and * authenticity information needed to link the data object to its original source.
Short Definition: A process in which records are first parsed (their contents assigned to the appropriate fields) and then translated to a common format. For example, if an original record had the client's name and address as "Bob Jones, VP Acme Co., 15 S. Main St, Brooklyn", the standardized record might read "Bob Jones, Vice President, Acme Corporation, 15 South Main Street, Brooklyn, New York". Data often lack consistency simply because there are many ways of saying the same thing. Standardizing the record ensures that when a query is run on a particular field, all matching records are returned.
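The parse-then-translate sequence can be sketched with plain dictionaries. This is a minimal illustration, assuming the input always follows the order "name, title organization, street, city"; the abbreviation table and the city-to-region lookup are hypothetical stand-ins for the curated reference lists a real standardization process would use.

```python
# Hypothetical lookup tables; real projects maintain curated abbreviation and gazetteer lists
ABBREVIATIONS = {"VP": "Vice President", "Co.": "Corporation", "S.": "South", "St": "Street"}
CITY_REGION = {"Brooklyn": "New York"}


def parse_record(raw):
    """Parse step: split a contact string into named fields.
    ASSUMES the order 'name, title organization, street, city'."""
    name, title_org, street, city = (part.strip() for part in raw.split(","))
    title, organization = title_org.split(" ", 1)
    return {"name": name, "title": title, "organization": organization,
            "street": street, "city": city}


def expand(text):
    """Translate step: replace each whitespace-separated token found in the table."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in text.split())


def standardize(raw):
    record = parse_record(raw)
    for key in ("title", "organization", "street"):
        record[key] = expand(record[key])
    record["region"] = CITY_REGION.get(record["city"], "")
    return record


def to_line(record):
    fields = ("name", "title", "organization", "street", "city", "region")
    return ", ".join(record[f] for f in fields if record[f])


raw = "Bob Jones, VP Acme Co., 15 S. Main St, Brooklyn"
print(to_line(standardize(raw)))
# Bob Jones, Vice President, Acme Corporation, 15 South Main Street, Brooklyn, New York
```

Because every variant spelling is mapped to one canonical form, a query on `street` for "Main Street" would match records originally entered as "Main St".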
Short Definition: 1. A system design in which a component is duplicated so that, if it fails, there will be a backup. 2. Duplication that is unnecessary or that results from poor planning.
Short Definition: A type of data (digital or not) that is persistently stored and referenced by a persistent identifier. Digital data may be accessed through the identifier. Some data object references may access a service on the object.
Short Definition: A design covering a class of frameworks with the following characteristics: (1) it can be used to generate more specific models that still belong to the class and (2) it can be used to compare a concrete framework design to identify whether it belongs to the same class.
Short Definition: The process of resolving a reference to useful information by using a globally available system.
Reference: Digital preservation coalition
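A minimal sketch of reference resolution, assuming a simple lookup in which an identifier (optionally written with an "hdl:" prefix) maps to a record of useful information. The in-memory dictionary is a hypothetical stand-in for a globally available resolution system, and the identifier and URL are invented for illustration.

```python
# Hypothetical in-memory registry standing in for a globally available resolution system
REGISTRY = {
    "12345/obj-0001": {"url": "https://repository.example.org/objects/0001",
                       "type": "dataset"},
}


def resolve(reference, registry=REGISTRY):
    """Resolve a reference (optionally prefixed 'hdl:') to its registered information."""
    key = reference.strip()
    if key.lower().startswith("hdl:"):
        key = key[len("hdl:"):]
    if key not in registry:
        raise LookupError(f"unresolvable reference: {reference!r}")
    return registry[key]


print(resolve("hdl:12345/obj-0001")["url"])
```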
Short Definition: Copying information content from one storage medium to a different storage medium (media reformatting) or converting from one file format to a different file format (file reformatting).
Reference: Digital preservation coalition
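File reformatting can be illustrated with a conversion between two formats that should preserve the same information content. This sketch assumes CSV as the source format and JSON as the target (chosen only for illustration) and checks that a round trip back to CSV reproduces the original text exactly.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """File reformatting: convert CSV content to a JSON array of row objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


def json_to_csv(json_text):
    """Reverse conversion, used here to check that no content was lost."""
    rows = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


source = "site,temp_c\nA,12.5\nB,13.1\n"
converted = csv_to_json(source)
print(json_to_csv(converted) == source)  # True: information content preserved
```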
Short Definition: Copying information content from one storage medium to another storage medium of the same type.
Short Definition: A standard developed or adopted and promulgated by a regional organization (e.g., European Committee for Standardization (CEN); Pan American Standards Commission (COPANT)). Regional standards are generally voluntary in nature, representing the joint action of the national standards bodies of a regional group of nations.
Reference: American National Standards Institute (ANSI) "Standards Management: A Handbook for Profit"
Short Definition: Data that have gone through a registration process and have been assigned an identifier and metadata to aid in their search and retrieval.
Short Definition: A database containing information about trusted repositories, supplied by the repository managers and useful to both human and machine users. It is a registry information system on which a register is maintained. Such registries do not contain all metadata descriptions of digital objects, nor do they list the PIDs of all stored digital objects. Instead, they offer information, based on standardized types, on how to retrieve such information (e.g., the port under which OAI-PMH can be accessed to obtain metadata). A register is a set of files containing identifiers assigned to items, together with descriptions of the associated items; registration is the assignment of a permanent, unique and unambiguous identifier to an item.
Reference: Government of Canada "Annual Science and Technology Data Publication"
Short Definition: A collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports. In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend. After the original database creation, a new data category can be added without requiring that all existing applications be modified. A relational database is a set of tables containing data fitted into predefined categories. Each table (which is sometimes called a relation) contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. For example, a typical business order entry database would include a table that described a customer with columns for name, address, phone number, and so forth. Another table would describe an order: product, customer, date, sales price, and so forth. A user of the database could obtain a view of the database that fitted the user's needs. For example, a branch office manager might like a view or report on all customers that had bought products after a certain date. A financial services manager in the same company could, from the same tables, obtain a report on accounts that needed to be paid. When creating a relational database, the domain of possible values in a data column can be defined as well as further constraints that may apply to that data value. For example, a domain of possible customers could allow up to ten possible customer names but be constrained in one table to allowing only three of these customer names to be specifiable.
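The customer and order example above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming SQLite as the database engine; the table names, sample rows, and the cut-off date are hypothetical. It shows two related tables, a column constraint, a view tailored to one user's needs, and a new data category added without changing the existing query.

```python
import sqlite3


def build_db():
    """Create a small in-memory relational database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript("""
        CREATE TABLE customer (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            address TEXT,
            phone   TEXT
        );
        CREATE TABLE orders (
            id         INTEGER PRIMARY KEY,
            product    TEXT NOT NULL,
            customer   INTEGER NOT NULL REFERENCES customer(id),
            order_date TEXT NOT NULL,          -- ISO 8601 text sorts chronologically
            price      REAL CHECK (price >= 0) -- constraint on the column's values
        );
        -- A view: one user's tailored window onto the same tables
        CREATE VIEW recent_customers AS
            SELECT DISTINCT c.name
            FROM customer AS c JOIN orders AS o ON o.customer = c.id
            WHERE o.order_date > '2015-01-01';
    """)
    conn.executemany(
        "INSERT INTO customer (id, name, address, phone) VALUES (?, ?, ?, ?)",
        [(1, "Ada", "1 Main St", "555-0100"),
         (2, "Ben", "2 High St", "555-0101"),
         (3, "Cy", "3 Low St", "555-0102")],
    )
    conn.executemany(
        "INSERT INTO orders (product, customer, order_date, price) VALUES (?, ?, ?, ?)",
        [("widget", 1, "2014-11-02", 9.5),
         ("gadget", 2, "2015-03-14", 20.0),
         ("widget", 3, "2015-06-01", 9.5)],
    )
    return conn


conn = build_db()
print(conn.execute("SELECT name FROM recent_customers ORDER BY name").fetchall())
# Extending the schema: a new data category is added; the existing view still works
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
print(conn.execute("SELECT name FROM recent_customers ORDER BY name").fetchall())
```

Both queries return `[('Ben',), ('Cy',)]`; a negative price or an order for an unknown customer would be rejected with `sqlite3.IntegrityError`.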
Short Definition: Indicates how the different components within a system are "linked" to fulfill their tasks. "Relations" are thus defined by the services the components make use of and by the interface specifications.
Short Definition: The probability of a given system performing its mission adequately for a specified period of time under the expected operating conditions.
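As a worked illustration of reliability as a probability, the sketch below assumes the common constant-failure-rate (exponential) model, R(t) = exp(-λt), and independent component failures; neither assumption comes from the definition itself. The `parallel` helper shows how redundancy raises the probability of completing the mission; the failure rate and mission length are invented example values.

```python
import math


def reliability(failure_rate, hours):
    """Probability of running `hours` without failure, ASSUMING a constant
    failure rate (exponential model): R(t) = exp(-failure_rate * t)."""
    if failure_rate < 0 or hours < 0:
        raise ValueError("failure rate and time must be non-negative")
    return math.exp(-failure_rate * hours)


def series(*r):
    """All components must work: R = R1 * R2 * ... (independent failures assumed)."""
    return math.prod(r)


def parallel(*r):
    """At least one redundant component must work: R = 1 - (1 - R1)(1 - R2)..."""
    return 1 - math.prod(1 - x for x in r)


r_one = reliability(1e-4, 1000)            # one unit, 1e-4 failures/hour, 1000-hour mission
print(round(r_one, 3))                     # 0.905
print(round(parallel(r_one, r_one), 3))    # 0.991 with a redundant duplicate
```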
Short Definition: The ability to get access to a computer or a network from a remote location. Access may be through an Internet service provider (ISP) or through a dedicated line between a computer or a remote local area network and the "central" or main corporate local area network. A dedicated line is more expensive and less flexible but offers faster data rates.
Short Definition: The ability to access and download data from a repository.
Synonym: Remote access
Short Definition: In the context of metadata application profiles, indicates whether a metadata element can be applied only once or more than once when describing a single resource.
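A minimal sketch of checking repeatability against an application profile, assuming the profile is reduced to a mapping from element name to a repeatable/non-repeatable flag and a record is a list of (element, value) pairs. The element names and the profile contents are hypothetical.

```python
from collections import Counter

# Hypothetical profile: element name -> True if the element may be repeated
PROFILE = {"title": False, "creator": True, "subject": True, "date_issued": False}


def repeatability_errors(pairs, profile=PROFILE):
    """Return non-repeatable elements that occur more than once in one record."""
    counts = Counter(element for element, _ in pairs)
    return sorted(e for e, n in counts.items()
                  if n > 1 and e in profile and not profile[e])


record = [
    ("title", "Lake temperature survey"),
    ("title", "Lake survey (draft)"),   # violates the profile: title is non-repeatable
    ("creator", "A. Researcher"),
    ("creator", "B. Researcher"),       # allowed: creator is repeatable
]
print(repeatability_errors(record))  # ['title']
```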
Short Definition: A set of actions that allow for a more efficient use of limited resources and reduce unwanted variation during the development and implementation of various projects. Repeatable processes allow a project team to make efficient use of project components that have proved to be successful in the past and reduce unnecessary variations that can tie up time, effort and budget. Key steps in developing repeatable processes include: (a) Understanding the organization's goals, objectives and stakeholders; (b) Documenting the current business processes; (c) Discovering significant process gaps and variations; (d) Applying insight toward the development of repeatable processes that align with the organization's goals and objectives.
Short Definition: A type of metadata used as part of a replication process or access.
Short Definition: 1. In a data management context: The generation of a copy of a data object that is referenced by the same name, but with a different replica number. When changes are made to the data object, the replica can be updated to track the changes. Also known as duplication. As part of replication, data may be given a PID for a repository. Enhanced metadata may be stored in a repository as part of replication. A PID should allow replicas held by different communities to be identified as such. Related terms: replica number, PID, repository. 2. In the context of research: The evaluation of scientific claims by independent investigators using independent methods, data, equipment, and protocols (Peng, 2011). 3. In the context of measurement: Repeated measurement of the same object.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; ACTI-DM Working Group/Educause; TBS Standard for Electronic Documents and Records Management Solutions. Vocabulary for the Registration and Description of Research Data Repositories http://gfzpublic.gfz-potsdam.de/pubman/item/escidoc:76875/component/escidoc:76874/re3data_vocabulary_v2-0.pdf; Peng (2011).
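Sense 1 above (one name, several numbered replicas that track changes) can be sketched as follows. The class and its methods are hypothetical, and the SHA-256 checksum comparison is an assumed way of confirming that replicas still hold identical content; it is not part of the definition.

```python
import hashlib


class ReplicatedObject:
    """Hypothetical store: one logical name, several numbered replicas of the same content."""

    def __init__(self, name, content: bytes):
        self.name = name
        self.replicas = {0: content}   # replica number -> stored bytes

    def replicate(self):
        """Create a new replica under the same name with the next replica number."""
        number = max(self.replicas) + 1
        self.replicas[number] = self.replicas[0]
        return number

    def update(self, content: bytes):
        """Track a change to the data object by refreshing every replica."""
        for number in self.replicas:
            self.replicas[number] = content

    def consistent(self):
        """True if all replicas have the same SHA-256 checksum."""
        digests = {hashlib.sha256(c).hexdigest() for c in self.replicas.values()}
        return len(digests) == 1


obj = ReplicatedObject("survey-2015.csv", b"site,temp_c\nA,12.5\n")
obj.replicate()
obj.update(b"site,temp_c\nA,12.6\n")
print(obj.name, sorted(obj.replicas), obj.consistent())  # survey-2015.csv [0, 1] True
```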
Short Definition: Repositories preserve, manage, and provide access to many types of digital materials in a variety of formats. Materials in online repositories are curated to enable search, discovery, and reuse. There must be sufficient control for the digital material to be authentic, reliable, accessible and usable on a continuing basis.
Short Definition: In accessing a repository one uses a client (application) to discover relevant digital objects within a repository, and then retrieve a copy of a desired digital object.
Short Definition: A resource that conveys either the content of a resource (if it is a digital object instance), or provides a digital object that conveys the intention of the resource in a form useful to a user (machine or human).
Short Definition: In the context of a researcher's activities, Representation is the process of representing and speaking at local, national, and international fora. Client service is the process of interaction for facilitation of the knowledge/information transfer to clients. Areas where the representation and client services context is quite significant are: technology transfer and industrial liaison; scientific liaison with government organizations, academia, clients and stakeholders; scientific staff and advisory positions; contracting out. Within this context, collaboration, partnering and RDA support to business development activities also have significance. Representation and client services is one of the three research contexts in which a researcher is expected to conduct his/her activities.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Research context
Short Definition: Provides some context for a data object. It contains provenance and descriptive information (e.g., format, encoding scheme, algorithm), as well as structural and administrative information about the object. This is a form of metadata.
Reference: Peng (2011)
Short Definition: Reproducible data and code means that the final data and code are computationally reproducible within some tolerance interval or defined limits of precision and accuracy, i.e., a third party will be able to verify the data lineage and processing, reanalyze the data, and obtain consistent computational results using the same input raw data, computational steps, methods, software and code, and conditions of analysis in order to determine whether the same result emerges from the reprocessing and reanalysis. "Same result" can mean different things in different contexts: identical measures in a fully deterministic context, the same numeric results differing only in some irrelevant detail, statistically similar results in a non-deterministic context, or validation of a hypothesis. All data and code are made available for third-party verification of reproducibility. Note that reproducibility is a different concept from replicability. In the latter case, the final published data are linked to sufficiently detailed methods and information for a third party to be able to verify the results based on the independent collection of new raw data, using similar or different methods but leading to comparable results.
Reference: NASEM (2019) Reproducibility and Replicability in Science; Buckheit and Donohue 1995; Donohue 2010; Peng 2011; Gandrud 2013; George 2015.
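Checking that a reanalysis reproduces published results "within some tolerance interval" can be sketched with `math.isclose`. This assumes results are reported as a dictionary of named numeric values; the result names and numbers are invented. With both tolerances at zero the check demands identical values (the fully deterministic case), while a stated relative tolerance accepts differences in an irrelevant digit.

```python
import math


def same_result(original, rerun, rel_tol=0.0, abs_tol=0.0):
    """Compare two dicts of numeric results key by key within a stated tolerance.
    With both tolerances at 0.0 only identical values count as the same."""
    if original.keys() != rerun.keys():
        return False
    return all(math.isclose(original[k], rerun[k], rel_tol=rel_tol, abs_tol=abs_tol)
               for k in original)


published = {"mean_temp_c": 14.2831, "n_sites": 120}
reanalysis = {"mean_temp_c": 14.2830999, "n_sites": 120}

print(same_result(published, reanalysis))                # False: not bit-identical
print(same_result(published, reanalysis, rel_tol=1e-6)) # True: within stated tolerance
```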
Short Definition: New datasets obtained by combining data appropriately from a variety of existing files, generating new data products that did not previously exist. Repurposed data result from data wrangling.
Synonym: Data wrangling
Short Definition: Features of a program, system, dataset, or product that are quantifiable, detailed, and relevant to the specified end use.
Synonym: Features
Short Definition: The process of determining user expectations for a program, system, dataset, or product. Requirements analysis is a team effort that must take into account hardware, software, end use, and human factors engineering expertise. Requirements analysis also requires skills in dealing with people. Requirements analysis involves frequent communication with end users to determine specific feature expectations, resolution of conflict or ambiguity in requirements as demanded by the various users or groups of users, avoidance of feature creep and documentation of all aspects of the project development process from start to finish. Energy should be directed towards ensuring that the final system or product conforms to client needs rather than attempting to mold user expectations to fit the requirements.
Synonym: Requirements engineering
Short Definition: A tendency for requirements to increase during development beyond those originally foreseen. Requirements creep may be driven by a deeper understanding of the system as the project progresses leading to a re-evaluation of the requirements analysis.
Synonym: Feature creep; Scope creep
Short Definition: A metric used to organize, control, and track changes to the originally specified requirements for a new system, project or product.
Short Definition: A systematic investigation to establish facts, including the input data, the code, and the full software environment that produced the research results.
Reference: Government of Canada "Annual Science and Technology Data Publication"
Short Definition: Creative work undertaken on a systematic basis to increase the stock of knowledge, including knowledge of humankind, culture and society, and the use of this stock of knowledge to devise new applications.
Synonym: R&D.
Related Term: Related scientific activities; Research, development and analysis
Short Definition: There are three contexts of research work in which a researcher is expected to conduct his/her activities: (1) Research, development and analysis (RDA); (2) Managing research; and, (3) Representation and client services. A researcher's primary area of work is RDA.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Incumbent-based; Research, development and analysis; Managing research; Representation and client services
Short Definition: Data that are used as primary sources to support technical or scientific enquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. Any other digital or non-digital content has the potential to become research data. Research data may be experimental data, observational data, operational data, third party data, public sector data, monitoring data, processed data, or repurposed data.
Reference: American National Standards Institute (ANSI) "Standards Management: A Handbook for Profit"
Short Definition: Acceptable formats for transmitting and sharing different types of research data include: (a) Quantitative tabular data with minimal metadata, i.e., a dataset with or without attribute labels but no other metadata in addition to the data matrix; (b) Quantitative tabular data with extensive metadata, i.e., a dataset with attribute labels, code labels, defined missing values, and attribute definitions in addition to the data matrix: structured data from a database such as MySQL or PostgreSQL; (c) Geospatial data, i.e., vector and raster data: Shapefile (.shp), GeoTIFF (.tif), Keyhole Markup Language (.kml); (d) Qualitative data, i.e., textual data: ASCII plain text (.txt); (e) Gridded data: Network Common Data Form (NetCDF; .nc, .cdf); (f) Digital image data: Joint Photographic Experts Group (.jpg, .jpeg), TIFF (.tif), Adobe Portable Document Format (.pdf); (g) Digital audio data: MPEG-1 Audio Layer 3 (.mp3), Waveform Audio File Format (.wav); (h) Digital video data: Moving Picture Experts Group, standard definition (.mp4, AVC/H.264), high definition (AVCHD/H.264); and, (i) Documentation, publications, and scripts: ASCII plain text (.txt).
Short Definition: Data management refers to the storage, access and preservation of data produced from a given investigation. Data management practices cover the entire lifecycle of the data, from planning the investigation to conducting it, and from backing up data as it is created and used to long-term preservation of data deliverables after the research investigation has concluded. Specific activities and issues that fall within the category of data management include: file naming (the proper way to name computer files); data quality control and quality assurance; data access; data documentation (including levels of uncertainty); metadata creation and controlled vocabularies; data storage; data archiving and preservation; data sharing and reuse; data integrity; data security; data privacy; data rights; notebook protocols (lab or field).
Reference: Chuck Humphrey Blog/TC3+ ; US Geological Survey (2015) National climate change & wildlife science center & climate science centers data management manual.
Synonym: Data stewardship
Short Definition: The configuration of staff, services and tools assembled to support data management across the research lifecycle and more specifically to provide comprehensive coverage of the stages making up the data lifecycle. It can be organized locally and/or globally to support research data activities across the research lifecycle.
Short Definition: Activities and processes in a digital environment that lead to the publication of research data, associated metadata and accompanying documentation and software code on the Web. In contrast to interim or final published products, workflows are the means to curate, document, and review, and thus ensure and enhance the value of the published product. Workflows can involve both humans and machines and often humans are supported by technology as they perform steps in the workflow. Similar workflows may vary in the details depending on the research discipline, data publishing product and/or the host institution of the workflow (e.g., individual publisher/journal, institutional repository, discipline-specific repository).
Reference: Bloom T, Dallmeier-Tiessen* S, Murphy* F, Austin CC, Whyte A, Tedds J, Nurnberger A, Raymond L, Stockhause M, Vardigan M (2015 Preprint). Workflows for Research Data Publishing: Models and Key Components. International Journal on Digital Libraries, Research Data Publishing Special Issue. 27 pages, June 30, 2015.
Short Definition: Ensures that the benefits to society of research outweigh any risks, from both an ethical and legal perspective.
Short Definition: The person who manages or coordinates resources, personnel, facilities, and operating fund allocations in an organization conducting research, development and analysis (RDA) in the natural and physical sciences. A research manager determines the nature and priority of objectives and the resources committed to their achievement within and across the organizations, and evaluates program outputs in relation to organizational objectives and policies. A research manager provides scientific advice on the direction, conduct and management of these programs. A research manager does not personally conduct research, development and analysis (RDA), control and coordinate projects, or control and coordinate contracted RDA.
Synonym: REM.
Related Term: Project manager; Principal Investigator
Short Definition: Acceptable formats for transmitting and sharing research metadata include: ISO 19115-2:2009
Short Definition: Research results are the journal articles, reports, books, slideshows, or websites that announce the project's findings and try to convince us that the results are correct.
Short Definition: A scientist who conducts activities in: (1) Research, development and analysis (RDA); (2) Managing research; and, (3) Representation and client services.
Synonym: RES.
Related Term: Scientist; Research context
Short Definition: In the context of a researcher's activities, "Research, development and analysis (RDA)" is the systematic investigative process of inquiry, including development, testing and analysis in order to discover, interpret or analyze facts, events, or behaviours, to develop and revise theories, or to make practical applications with the help of such facts, laws or theories designed to develop or contribute to knowledge. RDA is one of the three research contexts in which a researcher is expected to conduct his/her activities.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: RDA
Related Term: Research context; Scientific method; Research and development
Short Definition: A researcher's level is incumbent-based. It may be described numerically (e.g., Level 1, Level 2, ..., Level 5) or with a descriptive title (e.g., Lecturer or Adjunct Professor/Professeur associé; Clinical or Research Professor/Professeur clinique; Assistant Professor/Professeur adjoint; Associate Professor/Professeur agrégé; Full Professor/Professeur titulaire). The levels can generally be described as: (1) A researcher, usually a recent Ph.D. graduate with little or no experience, who has made some expert-reviewed contributions and has sufficient experience to contribute to valued outcomes; (2) A researcher who is recognized by peers as knowledgeable in an area of specialty and who has either worked in and led a small project team of scientific/technical personnel or carried out individual in-depth inquiries to support the delivery of valued outcomes; (3) A researcher who is recognized by peers as a national expert in an area of specialty, and has led a team of scientific and technical personnel or carried out in-depth inquiries to successfully deliver on the immediate, or contribute to intermediate and long-term, research goals in a specialty; (4) A researcher who is recognized as an authority in broad areas of specialty and who has strategically conceptualized the course of research activity leading to the achievement of the intermediate, and contributing to the long-term, research goals in a specialty; and, (5) A researcher who is recognized as an authority and visionary in broad areas of specialty who has strategically integrated leading-edge scientific and technical objectives into programs, having long-term impact on the future directions of research in a specialty.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Incumbent-based
Short Definition: The documentation submitted by a researcher when applying for promotion to a higher level, or for tenure at a university. Depending on the institution, the documentation may or may not be accompanied by a portfolio of complete, full-length research outputs.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Career advancement documentation; Promotion documentation.
Related Term: Tenure portfolio; Tenure dossier; Researcher level; Research context; Research, development and analysis; Research and development; Managing research; Representation and client services; Valued outcome; Innovation; Productivity; Impact; Recognition; Principal Investigator
Short Definition: Resistance management, a component of change management, aims to minimize or eliminate resistance to change. People in an organization may resist change for a number of reasons, including: (a) People comfortable in their current situation may be reluctant to risk that security; (b) People may concentrate on perceived negative outcomes of the change; (c) People may overestimate the value of what they have and underestimate the value of what they may gain by giving up what they have; (d) People may fear losing power, status, control, money, work, or work groups.
Short Definition: A source or supply that can be drawn on to support or fulfill a specific need or to handle a situation. Example: Information is a resource that supports and enables delivery, fulfills inquiry requests, and adds value to other products and services. Information is a strategic resource when it is recognized and managed as a valuable asset, independent of organizational boundaries, to address immediate needs, exploit opportunities to leverage it for business advantage, and enhance its value through knowledge creation and preservation.
Short Definition: Something that one is required to do as part of a job or role, or as a professional or legal obligation.
Short Definition: The impact or effect of something (e.g., a program).
Short Definition: In the context of records management: a policy that specifies how long data items must be kept and provides disposal guidelines for these data items.
Long Definition: A retention schedule is a policy that defines how long paper and electronic content must be kept and provides disposal guidelines for how those items should be discarded (e.g., destruction, transfer to archives, or alienation). Retention schedules are determined by the record type and the business as well as legal and compliance requirements associated with the data. Retention schedules establish guidelines regarding how long important information must remain accessible for future use or reference as well as when and how the data can be destroyed when it is no longer needed. Retention schedules are established based on data type and ownership as well as aspects such as the data's business value and regulatory compliance mandates. The schedules outline the business reason for retaining specific records and designate what should be done with the data when it's eligible for disposal.
Reference: Techtarget
Synonym: Disposal schedule, records retention schedule, data retention schedule, records schedule, transfer schedule.
Short Definition: In the context of records management, retention period defines the date when retention of the data object should be evaluated. The retention period must have an associated disposition policy for deciding what to do when the retention period expires.
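The retention schedule and retention period entries above can be sketched as a lookup from record type to a retention period and a disposition action. In this minimal sketch the record types, periods and actions are hypothetical (they are not taken from any real schedule); the code computes the date on which retention of a data object should be evaluated and what to do once it has passed.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule: record type -> (retention period in years, disposition action).
# Values are illustrative only.
RETENTION_SCHEDULE = {
    "grant_application": (7, "destroy"),
    "survey_microdata": (10, "transfer_to_archives"),
}

@dataclass
class DataObject:
    name: str
    record_type: str
    created: date

def evaluation_date(obj: DataObject) -> date:
    """Return the date on which retention of obj should be evaluated."""
    years, _ = RETENTION_SCHEDULE[obj.record_type]
    try:
        return obj.created.replace(year=obj.created.year + years)
    except ValueError:
        # An object created on 29 February rolls forward to 1 March in a non-leap year.
        return date(obj.created.year + years, 3, 1)

def disposition(obj: DataObject, today: date) -> str:
    """Return the disposition action once the retention period has expired, else 'retain'."""
    _, action = RETENTION_SCHEDULE[obj.record_type]
    return action if today >= evaluation_date(obj) else "retain"

obj = DataObject("survey-2015.csv", "survey_microdata", date(2015, 6, 1))
print(evaluation_date(obj))                # 2025-06-01
print(disposition(obj, date(2024, 1, 1)))  # retain
print(disposition(obj, date(2025, 6, 1)))  # transfer_to_archives
```

Keeping the disposition action next to the period reflects the requirement that every retention period has an associated disposition policy.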
Short Definition: A software implementation of revision control that automates the storing, retrieval, logging, identification, and merging of revisions (e.g., Git, SVN).
Short Definition: The degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions.
Short Definition: A role is the function of a resource or agent with respect to another resource, in the context of resource attribution or resource relationships.
Reference: https://www.w3.org/TR/vocab-dcat-3/#Class:Role
Short Definition: 1. The organization or structure for a database. The activity of data modeling leads to a schema. (The plural form is schemata.) The term is used in discussing both relational databases and object-oriented databases. The term sometimes seems to refer to a visualization of a structure and sometimes to a formal text-oriented description. Two common types of database schemata are the star schema and the snowflake schema. 2. A formal expression of an inference rule for artificial intelligence (AI) computing. The expression is a generalized axiom in which specific values or cases are substituted for each symbol in the axiom to derive a specific inference.
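To make the first sense concrete, the sketch below uses Python's built-in sqlite3 module to define a minimal star schema: one fact table whose foreign keys point at two dimension tables. The table and column names are hypothetical.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the context of each measurement.
CREATE TABLE dim_site (site_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, iso_date TEXT NOT NULL);
-- The central fact table references every dimension, giving the "star" shape.
CREATE TABLE fact_measurement (
    site_id INTEGER NOT NULL REFERENCES dim_site(site_id),
    date_id INTEGER NOT NULL REFERENCES dim_date(date_id),
    value   REAL NOT NULL
);
""")

# The schema itself can be read back from SQLite's catalogue table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['dim_date', 'dim_site', 'fact_measurement']
```

In a snowflake schema, the dimension tables would themselves be normalized into further linked tables.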
Short Definition: 1. The intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment. 2. A systematically organized body of knowledge on a particular subject. 3. Science is defined broadly to include the natural, health, and social sciences, mathematics, engineering, and technology.
Reference: Oxford dictionary; Science Advice for Emergency Management in Canada.
Related Term: Scientist
Short Definition: The process, structures and institutions through which governments and decision makers receive and consider science and technology inputs to public policy development.
Reference: Science Advice for Emergency Management in Canada
Short Definition: Qualitative or quantitative attributes of a variable or set of variables. Data refers to representations of physical, biological or chemical facts, typically the results of measurements/observations. It also includes related socio-economic and cultural representations. Data are normally in a structured, tabular, numeric, character, geo-referenced, and/or computer-readable format.
Reference: Government of Canada, Environment Canada data stewardship handbook (draft)
Synonym: Scientific data; Technological data.
Short Definition: What is required to enable researchers to create, store and share the data resulting from their experiments, and to find, access and process the data they need.
Reference: European Commission, Advancing Technologies and Federating Communities/TC3+
Synonym: Science and technology data; Scientific data services
Short Definition: Assist organizations in the capture, storage, curation, long-term preservation, discovery, access, retrieval, aggregation, analysis, and/or visualization of scientific data, as well as in the associated legal frameworks, to support disciplinary and multidisciplinary scientific research.
Synonym: Scientific data infrastructure
Short Definition: Ask the research question, review the relevant scientific literature, design the study, collect the data, analyze and interpret the data, communicate the results.
Short Definition: A set of chained operations. The simplest computerized scientific workflows are scripts that can involve several ingredients such as data, programs, models and other inputs such as human or sensor observations. Workflows produce outputs that may include, for example, visualizations and analytical results. Preserved workflows are important for reproducible research. They simplify complex sequences of activities and enable researchers to automate and track the provenance of the work in workflow execution. Workflow scripts are digital objects.
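A minimal sketch of a scripted workflow, assuming each step is a plain Python function: the steps are chained so that each output feeds the next, and a provenance record (step name, input and output checksums, timestamp) is kept for every execution. The step functions are hypothetical, and dedicated workflow systems offer much richer provenance models.

```python
import hashlib
import json
from datetime import datetime, timezone

def checksum(obj) -> str:
    """SHA-256 of the JSON serialization of a step's input or output."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def run_workflow(steps, data):
    """Run steps in order, feeding each output into the next step,
    and record a provenance entry for every step."""
    provenance = []
    for step in steps:
        result = step(data)
        provenance.append({
            "step": step.__name__,
            "input_sha256": checksum(data),
            "output_sha256": checksum(result),
            "finished_at": datetime.now(timezone.utc).isoformat(),
        })
        data = result
    return data, provenance

# Hypothetical steps: clean raw readings, then summarize them.
def drop_missing(values):
    return [v for v in values if v is not None]

def mean(values):
    return sum(values) / len(values)

result, prov = run_workflow([drop_missing, mean], [1.0, None, 3.0])
print(result)                     # 2.0
print([p["step"] for p in prov])  # ['drop_missing', 'mean']
```

Because each record stores the checksum of its input, the chain can be verified later: the output checksum of one step must equal the input checksum of the next.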
Short Definition: A person who is studying or has expert knowledge of one or more of the natural or physical sciences.
Reference: Oxford dictionary
Synonym: Science
Short Definition: Data that are tagged with particular metadata that can be used to derive relationships between data.
Reference: SOE/ TC3+
Short Definition: The ability of computer systems to transmit data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data federation between information systems. Semantic interoperability is achieved when the information transferred has, in its communicated form, all of the meaning required for the receiving system to interpret it correctly, even when the algorithms used by the receiving system are unknown to the sending system. Syntactic interoperability is a prerequisite to semantic interoperability.
Reference: Wikipedia
Short Definition: Data that have not been organized into a specialized repository, such as a database, but that nevertheless have associated information, such as metadata, that makes them more amenable to processing than raw data. Semi-structured data lie somewhere between structured and unstructured data. They are not organized in the complex manner that makes sophisticated access and analysis possible. However, they may have information associated with them, such as metadata tagging, that allows the elements they contain to be addressed. Example: A Word document is generally considered to be unstructured data. However, metadata tags could be added in the form of keywords and other metadata that represent the document content and make it easier for that document to be found when people search for those terms; the data are now semi-structured. Nevertheless, the document still lacks the complex organization of a database, so it falls short of being fully structured data.
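JSON is a common semi-structured format. In the sketch below, the records are hypothetical; it shows metadata tags making elements addressable even though the records do not share a fixed set of fields.

```python
import json

# Hypothetical records: each carries self-describing keys, but the fields vary.
raw = """
[
  {"id": 1, "title": "Soil survey", "keywords": ["soil", "nitrogen"]},
  {"id": 2, "title": "Interview notes"},
  {"id": 3, "title": "Lake sampling", "keywords": ["water"], "site": "L-7"}
]
"""
records = json.loads(raw)

def find_by_keyword(records, term):
    """Use the keyword tags to find matching records; untagged records are skipped."""
    return [r["id"] for r in records if term in r.get("keywords", [])]

print(find_by_keyword(records, "water"))            # [3]
# Unlike rows in a database table, the records do not share one set of fields.
print(sorted({key for r in records for key in r}))  # ['id', 'keywords', 'site', 'title']
```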
Short Definition: Data for which injury could reasonably be expected to result from a loss of confidentiality, loss of integrity, or loss of availability.
Long Definition: Data that could reasonably be expected to cause injury as a result of a loss of confidentiality (resulting from unauthorized disclosure), loss of integrity (resulting from unauthorized modification or destruction), or loss of availability (resulting from unauthorized removal or other disruption). Any personal information can be sensitive depending on the context, and information can become sensitive when combined with other information. Information that will generally be considered sensitive and require a higher degree of protection includes health and financial data, ethnic and racial origins, political opinions, genetic and biometric data, an individual’s sex life or sexual orientation, and religious or philosophical beliefs.
Reference: Treasury Board of Canada (Policy on government security) ; https://www.priv.gc.ca/en/privacy-topics/privacy-laws-in-canada/the-personal-information-protection-and-electronic-documents-act-pipeda/pipeda-compliance-help/pipeda-interpretation-bulletins/interpretations_10_sensible/
Synonym: Sensitive information
Short Definition: In the context of reproducible research, a service object is a type of digital object containing executable code, considered as a unit.
Short Definition: Public commitment to a measurable level of performance that clients can expect under normal circumstances.
Reference: GC Policy on Service and Digital - canada.ca
Short Definition: 1. Provision of a specific final output that addresses one or more needs of an intended recipient and contributes to the achievement of an outcome. 2. A function that is being executed on request that delivers certain expected results.
Reference: GC Policy on Service and Digital - canada.ca
Short Definition: Short-term preservation. Access to digital materials either for a defined period of time while use is predicted but which does not extend beyond the foreseeable future and/or until it becomes inaccessible because of changes in technology.
Reference: Digital preservation coalition
Short Definition: A methodology, practice, or prescription that promises miraculous results if followed (e.g., structured programming will rid you of all bugs, as will human sacrifices to the Atlantean god Fugawe). The term is named either after the Lone Ranger, whose silver bullets always brought justice, or, alternatively, after the silver bullet as the only known antidote to werewolves.
Short Definition: A mnemonic acronym (Specific, Measurable, Attainable, Relevant, Trackable/Time-bound) used in project management, teaching, and performance management that provides criteria to guide the setting of objectives.
Short Definition: Refers to specialization, discipline, field, etc.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Short Definition: Individuals, groups or organizations that have an interest or share in an undertaking or relationship and its outcome - they may be affected by it, impact or influence it, and in some way be accountable for it.
Short Definition: A document, such as a code, specification, recommended practice, classification, test method, or guide, that has been prepared by a standards developing organization or group and published in accordance with established procedures. The term applies collectively to all of these document types.
Short Definition: Detailed, written instructions to achieve uniformity of the performance of a specific function.
Reference: International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use
Synonym: SOP
Short Definition: Written methods, instructions, and tools that, when applied in different data collection contexts, produce data that are ready to be harmonized or integrated without further manipulation.
Synonym: Data harmonization; Data integration
Short Definition: The process of establishing by common agreement the criteria, terms, principles, practices, materials, items, processes, equipment, parts, sub-assemblies, and assemblies appropriate to achieve the greatest practicable uniformity of products and practices, to ensure the minimum feasible variety of such items and practices, and to effect optimum interchangeability or interoperability of equipment, parts, and components.
Reference: American National Standards Institute ANSI "Standards Management: A Handbook for Profit"
Short Definition: The application of a set of data transformation techniques to de-identify data in such a manner that the resulting transformed fields retain a very high analytic value.
Reference: El Emam, K. (2013). Privacy Analytics White Paper: Overview of Re-identification Risk Assessment and Anonymization Process. Ottawa (ON): Privacy Analytics, Inc.
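The sketch below illustrates a few common de-identification transformations on a single record: dropping a direct identifier, generalizing a date of birth to a year, and truncating a Canadian postal code to its first three characters (the forward sortation area). The field names and the choice of transformations are assumptions for illustration only. They are not the procedure described in the cited white paper, and real de-identification also requires a re-identification risk assessment.

```python
from datetime import date

def deidentify(record: dict) -> dict:
    """Return a transformed copy of record (field names are hypothetical)."""
    out = dict(record)
    out.pop("name", None)                                      # remove a direct identifier
    out["birth_year"] = out.pop("date_of_birth").year          # generalize full date to year
    out["postal_prefix"] = out.pop("postal_code")[:3].upper()  # keep only the first 3 characters
    return out

patient = {"name": "A. Smith", "date_of_birth": date(1984, 5, 17),
           "postal_code": "k1a 0b1", "diagnosis": "J45"}
print(deidentify(patient))
# {'diagnosis': 'J45', 'birth_year': 1984, 'postal_prefix': 'K1A'}
```

The analytic fields (here, the diagnosis code) pass through unchanged, while the quasi-identifiers are coarsened rather than deleted, which preserves some of their analytic value.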
Short Definition: The group responsible for ensuring program goals are achieved and providing support to address program risks and issues.
Reference: Project Management Institute (2006) The Standard for Program Management.
Synonym: Governance Board; Program Board
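Short Definition: A user ownership access-right flag that can be assigned to digital objects such as directories. When the sticky bit flag is set, files added to the directory inherit the access permissions associated with the directory. Note that this usage differs from the Unix sticky bit, which, when set on a directory, restricts deleting or renaming a file in that directory to the file's owner, the directory's owner, or the superuser.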
Short Definition: A physical storage location where a data object will be stored upon ingestion into a data repository. This requires identifying the IP address and the physical path name within the storage location where a data object will be stored. The sequence of these chained activities is conceptualized as a workflow object. For retrieval, the data object location is specified by the storage location and the physical path name.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; Rajasekar, R., M. Wan, R. Moore, W. Schroeder, S.-Y. Chen, L. Gilbert, C.-Y. Hou, C. Lee, R. Marciano, P. Tooby, A. de Torcy, B. Zhu, "iRODS Primer: Integrated Rule-Oriented Data System", Morgan & Claypool, 2010.
Short Definition: A high level plan of action or policy designed for a long-range or major aim.
Short Definition: 1. A type of metadata that indicates how compound objects are put together (e.g., how pages are ordered to form chapters; how data are organized in a table; how datasets are organized in a collection). 2. The underlying structural metadata of digital objects that tells computers how to assemble them.
Short Definition: Data whose elements have been organized into a consistent format and data structure within a defined data model such that the elements can be easily addressed, organized and accessed in various combinations to make better use of the information, such as in a relational database.
Reference: Kitchin, R. (2014) The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. Los Angeles (CA): SAGE Publications
Synonym: Structured information
Short Definition: A type of service that provides technical support and assistance to help solve problems related to technical products, including data access, data discovery, data integration and other data management support. Various support services are provided at different phases of the data lifecycle to help manage data and other resources used in research.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; http://en.wikipedia.org/wiki/Data_center_services#Support_services
Short Definition: A mnemonic acronym (Strengths, Weaknesses, Opportunities and Threats) used in structured planning.
Short Definition: Syntactic interoperability is the ability of systems to exchange data using a common structure or format. It is achieved through standards such as XML or SQL.
Reference: Wikipedia; HIMSS (Healthcare information management and systems society)
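The difference between syntactic and semantic interoperability can be shown in a few lines. Both sides of the hypothetical exchange below parse the same JSON message without error, which is syntactic interoperability. The receiver can only interpret the number correctly because the message also carries an agreed unit code, which is a small step toward semantic interoperability. The field names, and the use of the UCUM unit codes `Cel` and `[degF]`, are assumptions for illustration.

```python
import json

# Sender serializes a reading; JSON gives both sides a shared syntax.
message = json.dumps({"temperature": 98.6, "unit": "[degF]"})

def to_celsius(msg: str) -> float:
    """Receiver: parsing succeeds for any well-formed JSON (syntax),
    but the value is only meaningful once the unit code is interpreted (semantics)."""
    reading = json.loads(msg)
    value, unit = reading["temperature"], reading["unit"]
    if unit == "Cel":
        return value
    if unit == "[degF]":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown unit code: {unit!r}")

print(round(to_celsius(message), 1))  # 37.0
```

Without the shared unit vocabulary, the receiver would still parse 98.6 successfully but could silently misread it.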
Short Definition: Synthetic data is information that's been generated on a computer to augment or replace real data to improve AI models, protect sensitive data, and mitigate bias.
Long Definition: Synthetic data is information that's artificially manufactured rather than generated by real-world events. It's created algorithmically and is used as a stand-in for test data sets of production or operational data, to validate mathematical models and to train machine learning (ML) models. While gathering high-quality data from the real world is difficult, expensive and time-consuming, synthetic data technology enables users to quickly, easily and digitally generate the data in whatever amount they desire, customized to their specific needs. The largest application of synthetic data is in the training of neural networks and ML models, as the developers of these models need carefully labeled data sets that could range from a few thousand to tens of millions of items. Synthetic data can be artificially generated to mimic real data sets, enabling creation of a diverse and large amount of training data without spending a lot of money and time.
Reference: https://research.ibm.com/blog/what-is-synthetic-data ; https://www.techtarget.com/searchcio/definition/synthetic-data
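One very simple way to generate synthetic data is to estimate summary statistics from a real variable and sample new values from a fitted distribution. The sketch below assumes that the variable is roughly normally distributed and uses hypothetical input values. Practical generators, for example those used to train ML models, model joint distributions across many variables, and their output still has to be checked for disclosure risk.

```python
import random
import statistics

def synthesize(real_values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to real_values.
    The normality assumption is for illustration only."""
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]

real = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5]  # hypothetical measurements
fake = synthesize(real, n=1000)
print(round(statistics.mean(fake), 1), round(statistics.stdev(fake), 1))
```

The synthetic sample can be much larger than the real one, which is the property the definition highlights, but it can only reproduce the structure that the chosen model captures.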
Short Definition: A combination of interacting elements organized to achieve one or more stated purposes. The system is the aspect that the scientific researcher will interact with. It must be well defined and directly relevant to the research needs, just as is the case for any other scientific instrument. "Systems" may undergo continuous extensions and system elements (components, services) may be the subject of innovation.
Short Definition: Digital entity properties that are generated by the data management system (e.g., creation time; owner; storage location; data retention period, i.e., the length of time a digital entity will be retained).
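A file system already maintains several such properties automatically. The sketch below reads a few of them for a temporary file with Python's `os.stat`. On POSIX systems `st_mtime` is the last modification time (a true creation time is platform-dependent), and the owner is reported as a numeric user ID (`st_uid`). A data management system would add its own properties, such as a retention period, on top of these.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as f:
    f.write(b"site,value\nA,1.5\n")
    path = f.name

st = os.stat(path)
system_metadata = {
    "size_bytes": st.st_size,  # maintained by the file system, not entered by the user
    "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    "owner_uid": st.st_uid,    # numeric owner ID on POSIX systems
    "storage_location": os.path.abspath(path),
}
print(system_metadata)
os.remove(path)
```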
Short Definition: A file that contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a TAB from the next column's value and each row starts a new line.
Synonym: TSV
Short Definition: 1. An organized grouping of columns (i.e., fields). 2. In a relational database, a table (sometimes called a file) organizes the information about a single topic into rows and columns. The process of normalization determines how data will be most effectively organized into tables. 3. A decision table (often called a truth table) contains a list of decisions and the criteria on which they are based. All possible situations for decisions should be listed, and the action to take in each situation should be specified. A decision table can be inserted into a computer program to direct its processing according to decisions made in different situations. Changes to the decision table are reflected in the program. 4. An HTML table is used to organize Web page elements spatially or to create a structure for data that is best displayed in tabular form, such as lists or specifications.
Short Definition: Describes the technical processes used to produce, or required to use, a digital object.
Reference: DCC/TC3+
Short Definition: In the context of Data Management Plans (DMPs): equipment needed or used to create or process data (e.g., a microscope).
Reference: RDA maDMP common standard
Short Definition: A defined systematic procedure employed by a human resource to perform an activity to produce a product or result or deliver a service, and that may employ one or more tools.
Reference: Project Management Institute (2006) The Standard for Program Management.
Short Definition: 1. The application of scientific knowledge for practical purposes. 2. The branch of knowledge dealing with engineering or applied science.
Short Definition: A copy of a data object, such as a file, made during the course of routine operations.
Short Definition: A file that contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a character (e.g., pipe).
Synonym: Comma separated values; Character separated values; Pipe separated values; TXT
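Python's standard `csv` module reads and writes both the tab-separated and the character-separated formats described above; only the delimiter changes. The sketch below writes the same hypothetical table as tab-separated and pipe-separated text and reads it back.

```python
import csv
import io

rows = [["site", "depth_m", "ph"], ["A", "0.5", "6.8"], ["B", "1.0", "7.1"]]

def to_text(rows, delimiter):
    """Serialize rows to delimiter-separated text, one line per row."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()

def from_text(text, delimiter):
    """Parse delimiter-separated text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

tsv = to_text(rows, "\t")
psv = to_text(rows, "|")
print(psv.splitlines()[1])  # A|0.5|6.8
```

Using the `csv` module rather than splitting each line on the delimiter matters when a value itself contains the delimiter: the writer quotes such fields and the reader unquotes them.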
Short Definition: Something tangible, such as a template or software program, used in performing an activity to produce a product or result.
Reference: Project Management Institute (2006) The Standard for Program Management.
Short Definition: Describes the topic or "aboutness" of an information/data object: what the data are about. To make sense to an agent or system, this description may draw on a variety of vocabularies for describing subjects, topics, categories, etc.
Short Definition: A comprehensive and structured approach to organizational management that seeks to improve the quality of products and services through ongoing refinements in response to continuous feedback.
Synonym: TQM
Short Definition: Research efforts conducted by investigators from different disciplines working jointly to create new conceptual, theoretical, methodological, and translational innovations that integrate and move beyond discipline-specific approaches to address a common problem. Transdisciplinary research transcends interdisciplinary research.
Short Definition: The change in custody and ownership of government records from a government institution to national archives. Transfer occurs via the application of a valid disposition authority or transfer agreement by the transferring institution.
Reference: Library and Archives Canada
Short Definition: An infrastructure component that provides reliable, long-term access to managed digital resources. It stores, manages, and curates digital objects and returns their bit streams when a request is issued. Trusted repositories undergo regular assessments according to a set of rules such as those defined by the Data Seal of Approval (DSA) or TRAC (ISO 16363). It is well understood that such an assessment has the potential to increase trust among depositors and users, but it will not be the only criterion for users. Repositories can be at different stages of assessment. However, it is evident that certain quality criteria need to be met to distinguish trusted repositories from other kinds of entities that store data, such as notebooks or lab servers.
Reference: Research Libraries Group/Educause
Synonym: TDR
Short Definition: In the context of artificial intelligence (AI), there are nine principles characteristic of trustworthy AI systems: 'valid & reliable', 'safe', 'secure & resilient', 'explainable & interpretable', 'privacy-enhanced', 'fair & managed bias', 'accountable & transparent', 'human-centred values', and 'inclusive growth, sustainable development, and well-being'. Altogether, 150 properties of trustworthiness have been identified across the first seven principles.
Reference: Newman J (2023). A Taxonomy of Trustworthiness for Artificial Intelligence. University of California, Berkeley. Center for long-term cybersecurity. https://cltc.berkeley.edu/wp-content/uploads/2023/01/Taxonomy_of_AI_Trustworthiness.pdf
Short Definition: A centralized computing system for collecting, integrating and managing large sets of structured and unstructured data from disparate sources.
Short Definition: A string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols.
Reference: MIT data management and publishing
Synonym: URI
Short Definition: A name for an Internet resource that, unlike a URL, has persistent significance; that is, the owner of the URN can expect that someone else (or a program) will always be able to find the resource. A frequent problem in using the Web is that Web content is sometimes moved to a new site or to a new page on the same site. Since links are made using Uniform Resource Locators (URLs), they no longer work when content is moved.
Synonym: URN
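The difference between locating a resource (a URL) and naming it (a URN) can be seen by parsing both kinds of URI with Python's `urllib.parse`. The example URL and the ISBN-based URN are illustrative only.

```python
from urllib.parse import urlparse

# A URL locates a resource; its components can be addressed separately.
url = urlparse("https://example.org/data/survey.csv?version=2#row-10")
print(url.scheme, url.netloc, url.path, url.query, url.fragment)
# https example.org /data/survey.csv version=2 row-10

# A URN names a resource without saying where it is stored: there is no host part.
urn = urlparse("urn:isbn:0451450523")
print(urn.scheme, urn.netloc == "", urn.path)  # urn True isbn:0451450523
```

If the file behind the URL moves, the URL breaks, whereas the URN still identifies the same resource; a separate resolution service is needed to find a current location for it.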
Short Definition: A unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within, and a cryptographic hash of that normalized (or canonicalized) representation is then computed. The signature is thus independent of the storage format: for example, the same data object stored in, say, SPSS and Stata formats will have the same UNF.
Synonym: UNF
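The key idea, normalizing values before hashing so that the fingerprint depends on the data rather than on the file format, can be sketched as follows. This is a deliberately simplified illustration and not the published UNF algorithm, which specifies its own rounding, encoding, and truncation rules. Here each value is rounded to 7 significant digits and written in a fixed exponential notation before SHA-256 is applied.

```python
import hashlib

def normalize(value) -> str:
    """Canonical text form: 7 significant digits in exponential notation,
    so 12, 12.0 and 12.000000001 all map to '1.200000e+01'."""
    return format(float(value), ".6e")

def fingerprint(values) -> str:
    """Hash the normalized representation, not the stored bytes."""
    canonical = "\n".join(normalize(v) for v in values) + "\n"
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()

# The same column as it might be read from two differently typed sources.
as_integers = [12, 3, 7]
as_doubles = [12.0, 3.0, 7.0]
print(fingerprint(as_integers) == fingerprint(as_doubles))  # True

# A plain checksum of the serialized bytes differs between the two.
print(hashlib.sha256(repr(as_integers).encode()).hexdigest()
      == hashlib.sha256(repr(as_doubles).encode()).hexdigest())  # False
```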
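Short Definition: A 128-bit number used to identify different objects uniquely on the Internet over time, with a negligible (though non-zero) probability that two independently generated values are identical. UUIDs are also used, for example, to identify file system partitions.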
Synonym: UUID
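Python's standard `uuid` module can generate the two kinds of UUID most often used for research objects: random (version 4) and name-based (version 5). The resource URL below is hypothetical.

```python
import uuid

# Version 4: random; a collision is possible in principle but vanishingly unlikely.
a, b = uuid.uuid4(), uuid.uuid4()
print(a, a.version)  # a random UUID, followed by 4

# Version 5: derived from a namespace and a name, so the same input
# always yields the same UUID (useful for reproducible identifiers).
resource_url = "https://doi.org/10.1234/example"  # hypothetical resource URL
u1 = uuid.uuid5(uuid.NAMESPACE_URL, resource_url)
u2 = uuid.uuid5(uuid.NAMESPACE_URL, resource_url)
print(u1 == u2, len(u1.bytes) * 8)  # True 128
```

Random UUIDs suit newly created objects; name-based UUIDs suit cases where the same identifier must be regenerated from existing information.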
Short Definition: The application of a comprehensive knowledge of a discipline or disciplines to the development of expertise and the generation of new knowledge through research, and the planning and presentation of courses of study for undergraduates and graduates in universities.
Synonym: Researcher
Short Definition: Data that have not been organized into a format and identifiable data structure that makes them easy to access and process. These data can often be searched as long as they are digital, but they are difficult to use for computer analyses.
Reference: Kitchin, R. (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. Los Angeles (CA): SAGE Publications
Synonym: Unstructured information
Short Definition: Data that can be understood and used without additional information. Usable data are delivered in a form that meets the needs of different end-user audiences, is ready for the tasks that the end-user needs to accomplish, and that has been adapted to the end-user's needs (not the other way around). Usable data have been cleaned, structured, are in machine readable format, fully documented, and ready for analysis and interpretation.
Short Definition: A methodology used in system analysis to identify, clarify, and organize system requirements. A use case is made up of a set of possible sequences of interactions between systems and users in a particular environment and related to a particular goal. It consists of a group of elements (e.g., classes and interfaces) that can be used together in a way that has an effect larger than the sum of the separate elements combined. The use case should contain all system activities that have significance to the users. A use case can be thought of as a collection of possible scenarios related to a particular goal; indeed, the use case and the goal are sometimes considered synonymous.
Short Definition: Manages user access, user tracking, and multi-versioning information.
Short Definition: A phase of development where the product is tested in the "real world" by the intended audience. The experiences of the early users are forwarded back to the developers who make final changes before releasing the product.
Synonym: UAT
Short Definition: In the context of a researcher's activities, there are four types of valued outcomes: innovation, productivity, impact and recognition. These are the driving forces in a researcher's career progression. Although there are four types of valued outcomes, they are closely linked. For example, the evidence of a scientific researcher's innovation, impact and recognition lies in her/his productivity, and recognition may result from impact and/or innovation.
Reference: Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation ; Government of Canada (2006) Career progression management framework for federal researchers ; Government of Canada (2006) NRCan Model Guide for the Preparation of Researchers Career Progression Evaluation Documentation (Dossier)
Synonym: Innovation; Productivity; Impact; Recognition
Short Definition: Generate a unique reduced representation for a data object by applying a procedure and compare the result to the original reduced representation that has been stored as provenance information. Examples include: a checksum, a hash, a digital signature.
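A minimal sketch of this kind of validation with a SHA-256 hash from Python's `hashlib`: the digest recorded at ingest stands in for the stored provenance information, and it is recomputed later and compared. The file content and the way the reference digest is kept are assumptions.

```python
import hashlib
import os
import tempfile

def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def validate(path: str, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one stored as provenance."""
    return sha256_of(path) == recorded_digest

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"station,temp\nX,4.2\n")
    path = f.name

recorded = sha256_of(path)          # digest stored at ingest
before = validate(path, recorded)
with open(path, "ab") as f:         # simulate silent corruption
    f.write(b"Y,9.9\n")
after = validate(path, recorded)
os.remove(path)
print(before, after)                # True False
```

Reading in chunks keeps memory use constant, so the same function works for very large data files.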
Short Definition: The management of changes over time to data, computer code, software, and documents, which allows a previous revision to be restored and is critical for data traceability, tracking edits, and correcting mistakes. Version control generates a (changed) copy of a data object that is uniquely labeled with a version number. The intent is to track changes to a data object by making versioned copies. Note that a version is different from a backup copy, which is typically a copy made at a specific point in time, and from a replica.
Reference: Research Data Alliance http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page ; http://www.alliancepermanentaccess.org/index.php/knowledge-base/dpglossary/#B
Synonym: Source control; Revision control; Versioning.
Related Term: Universal numeric fingerprint; Data citation
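The core idea of version control, uniquely numbered copies of a data object with the ability to revert, can be sketched with an in-memory store. The `VersionedObject` class is hypothetical and far simpler than Git or SVN: it has no branching, merging, or persistence.

```python
from dataclasses import dataclass, field

@dataclass
class VersionedObject:
    """Keeps every saved state of a data object, each labeled with a version number."""
    versions: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def commit(self, content: str, message: str) -> int:
        self.versions.append(content)
        version = len(self.versions)  # versions are numbered 1, 2, 3, ...
        self.log.append((version, message))
        return version

    def get(self, version: int) -> str:
        return self.versions[version - 1]

    def revert(self, version: int) -> int:
        """Restore an earlier state by committing it as a new version,
        so the history of edits is never lost."""
        return self.commit(self.get(version), f"revert to v{version}")

doc = VersionedObject()
doc.commit("a,b\n1,2\n", "initial data")
doc.commit("a,b\n1,20\n", "edit row 1")  # suppose this edit was a mistake
v3 = doc.revert(1)
print(v3, doc.get(v3) == doc.get(1))  # 3 True
print(doc.log[-1])                    # (3, 'revert to v1')
```

Reverting by adding a new version, rather than deleting later ones, keeps the full edit trail that the definition identifies as critical for traceability.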
Short Definition: A way of portraying information from a database. This can be done by arranging the data items in a specific order, by highlighting certain items, or by showing only certain items. For any database, there are a number of possible views that may be specified. Often thought of as a virtual table, the view doesn't actually store information itself; the view just pulls information out of one or more existing tables. Although impermanent, a view may be accessed repeatedly by storing its criteria in a query.
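A short sketch with Python's built-in sqlite3 module illustrates that a view stores only its query, not a copy of the data. The table, view and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (id INTEGER PRIMARY KEY, site TEXT, ph REAL);
INSERT INTO sample (site, ph) VALUES ('A', 6.8), ('B', 7.4), ('A', 7.1);
-- The view stores only its selection criteria; no rows are copied.
CREATE VIEW site_a AS SELECT id, ph FROM sample WHERE site = 'A';
""")

print(conn.execute("SELECT * FROM site_a ORDER BY ph").fetchall())  # [(1, 6.8), (3, 7.1)]

# Changes to the underlying table are visible through the view immediately.
conn.execute("INSERT INTO sample (site, ph) VALUES ('A', 5.9)")
print(conn.execute("SELECT ph FROM site_a ORDER BY ph").fetchall())  # [(5.9,), (6.8,), (7.1,)]
```

The ordering is applied in the query against the view, not inside the view definition, because SQL does not guarantee that a view preserves row order.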
Short Definition: A standard that is generally established by private-sector bodies and made available for use by any person or organization, private or government. The term includes what are commonly referred to as "industry standards" as well as "consensus standards." A voluntary standard may become mandatory as a result of its use, reference, or adoption by a regulatory authority, or when invoked in contracts, purchase orders, or other commercial instruments.
Reference: American National Standards Institute ANSI "Standards Management: A Handbook for Profit"
Short Definition: 1. Addressable units of information that are addressed through Uniform Resource Identifiers (URIs). 2. The early notion of static addressable documents or files has evolved into a more generic and abstract definition: any 'thing' or entity that can be identified, named, addressed or handled in any way whatsoever on the Web at large or in any networked information system. Examples include an electronic document or data stored on the Web, an image, a service (e.g., a weather report), and a collection of other resources. Each resource must have a URI.