Data Platforms & Data Commons

Harmonize data to allow scientists to ask more complex questions.

When scientists can combine diverse data that span data sets, trials and projects, they can piece together the information needed for integrative analysis that can lead to breakthrough discoveries. To make data open and easily accessible to the community, data commons software platforms co-locate data, cloud-based computing infrastructure and commonly used software applications, tools and services. BioTeam builds data commons that bring consistency to data collection and make data available to research teams across your scientific data ecosystem.

Data dictionaries document and share data across the ecosystem

Data dictionaries make data findable and accessible by documenting and sharing data structures and other information across the scientific ecosystem. A data dictionary is a collection of the names, definitions and attributes of the data elements and models used or captured in a database, information system or research project. BioTeam works with you to develop data dictionaries that provide consistency by defining the conventions and describing the meanings and purposes of data elements within the context of a project. We also provide guidance on conventions, interpretation, accepted meanings and representation.

Metadata provide valuable context to the data

A data dictionary also provides metadata to assist in defining the scope and characteristics of data elements used as part of a database, research project or information system. Using our best-practices experience, BioTeam can help you define the metadata important in your scientific data ecosystem. Such metadata can include attribute name and type, entity relationship, reference data, rules for validation, schema or data quality, detailed properties of data elements and information about storage.
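As a minimal sketch of how such metadata might be represented, the example below models a data dictionary as a set of entries carrying an attribute name, type, description, reference data and a simple validation rule. The `DataElement` class, the `SAMPLE_DICTIONARY` contents and the `validate` helper are all hypothetical names chosen for illustration, not part of any BioTeam product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """One entry in a data dictionary (field names are illustrative)."""
    name: str                                   # attribute name
    dtype: type                                 # attribute type
    description: str                            # accepted meaning of the element
    required: bool = True                       # validation rule: must be present
    allowed_values: Optional[frozenset] = None  # reference data, if any
    unit: Optional[str] = None                  # representation detail


# Hypothetical dictionary describing a tissue-sample table.
SAMPLE_DICTIONARY = {
    element.name: element
    for element in [
        DataElement("sample_id", str, "Unique identifier of the tissue sample"),
        DataElement("tissue_type", str, "Anatomical source of the sample",
                    allowed_values=frozenset({"liver", "lung", "kidney"})),
        DataElement("storage_temp_c", float, "Storage temperature",
                    required=False, unit="degC"),
    ]
}


def validate(record: dict, dictionary: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, element in dictionary.items():
        if record.get(name) is None:
            if element.required:
                problems.append(f"missing required element: {name}")
            continue
        value = record[name]
        if not isinstance(value, element.dtype):
            problems.append(
                f"{name}: expected {element.dtype.__name__}, got {type(value).__name__}"
            )
        elif element.allowed_values is not None and value not in element.allowed_values:
            problems.append(f"{name}: {value!r} is not an allowed value")
    # Elements that the dictionary does not define are reported as well.
    for name in record:
        if name not in dictionary:
            problems.append(f"unknown element: {name}")
    return problems
```

Calling `validate({"sample_id": "S1", "tissue_type": "liver"}, SAMPLE_DICTIONARY)` returns an empty list, while a record with an unlisted tissue type or an undefined field returns one message per problem. Keeping the rules in the dictionary rather than in each analysis script means every project checks records against the same definitions.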

Contextualization influences the usefulness of data

Once scientists capture the context, or metadata, in which the data were created, others can understand the boundaries of how the data can be used and which questions can be asked. For example, when clinical protocols are documented, scientists can understand how tissue samples were collected and treated. Without a data dictionary, there’s a higher risk of data inconsistencies across a project, which could result in the loss of crucial information in translation and transition.

Data hygiene: an essential part of data dictionaries

Data hygiene is critical, because data at the point of creation must be clean, authoritative, standardized and harmonized. Dirty data disrupts scientific research because it introduces outdated, incomplete, duplicated or incorrect information without anyone noticing. BioTeam works with scientific teams to introduce data-hygiene processes that make the most of data.
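A basic hygiene check can be sketched in a few lines. The example below flags duplicated identifiers and records with missing required fields. The function name `hygiene_report` and the field names are assumptions made for illustration; detecting outdated or incorrect values would need project-specific rules that are not shown here.

```python
from collections import Counter


def hygiene_report(records, key, required):
    """Flag duplicated key values and rows with missing required fields.

    records:  list of dicts, one per row
    key:      name of the field that should be unique
    required: names of fields that must be present and non-empty
    """
    counts = Counter(row.get(key) for row in records)
    duplicate_keys = sorted(
        value for value, n in counts.items() if value is not None and n > 1
    )
    incomplete_rows = [
        index
        for index, row in enumerate(records)
        if any(row.get(field) in (None, "") for field in required)
    ]
    return {"duplicate_keys": duplicate_keys, "incomplete_rows": incomplete_rows}


# Hypothetical sample records used only to demonstrate the check.
records = [
    {"sample_id": "S1", "tissue_type": "liver"},
    {"sample_id": "S1", "tissue_type": "liver"},   # duplicated identifier
    {"sample_id": "S2", "tissue_type": ""},        # empty required field
    {"sample_id": "S3"},                           # missing required field
]
report = hygiene_report(records, key="sample_id", required=["sample_id", "tissue_type"])
print(report)
```

Running such a check at the point of data creation, rather than at analysis time, keeps problems from spreading into downstream data sets.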

Enforce the use of data standards

Data standards are rules that govern the collection, recording and representation of data, providing a commonly understood reference for the interpretation and use of data sets. With standards in place, researchers in the same discipline know that their data are collected and described the same way across different projects. A shared dictionary enforces the use of data standards for quality, meaning and relevance for all data elements and all team members, making data recognizable and usable beyond the immediate research team.
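One way to picture this is a harmonization step that maps each project's local field names and units onto the shared standard. The sketch below is illustrative only: the project names, the `FIELD_ALIASES` and `CONVERTERS` tables and the `harmonize` function are hypothetical, and a real mapping would be derived from the project's data dictionary.

```python
# Hypothetical mappings from each project's local field names to the shared names.
FIELD_ALIASES = {
    "project_a": {"SampleID": "sample_id", "Tissue": "tissue_type", "temp_f": "storage_temp_c"},
    "project_b": {"id": "sample_id", "tissue": "tissue_type", "temp_c": "storage_temp_c"},
}

# Unit conversions keyed by (project, local field name).
# Here project_a records temperature in Fahrenheit; the standard is Celsius.
CONVERTERS = {
    ("project_a", "temp_f"): lambda f: round((f - 32) * 5 / 9, 2),
}


def harmonize(project, record):
    """Rename fields and convert units so the record follows the shared standard."""
    aliases = FIELD_ALIASES[project]
    harmonized = {}
    for local_name, value in record.items():
        standard_name = aliases.get(local_name)
        if standard_name is None:
            # Refuse unmapped fields rather than silently passing them through.
            raise KeyError(f"{project}: no mapping for field {local_name!r}")
        convert = CONVERTERS.get((project, local_name), lambda v: v)
        harmonized[standard_name] = convert(value)
    return harmonized


a = harmonize("project_a", {"SampleID": "S1", "Tissue": "liver", "temp_f": -112.0})
b = harmonize("project_b", {"id": "S2", "tissue": "lung", "temp_c": -80.0})
print(a)
print(b)
```

After harmonization, records from both projects share the same field names and units, so they can be combined and queried together.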

Support data warehouses and data lakes

Data lakes and data warehouses are used for storing big data. A data lake is a vast pool of raw data whose purpose is not yet defined, while a data warehouse is a repository for structured, filtered data that has already been processed for a specific purpose. Which one fits better depends on the organization and its goals. In either case, data dictionaries formalize and control the naming of entities, attributes and their relationships within databases. BioTeam has extensive experience implementing the right solution to meet the needs of scientific teams.

Partner With Us

Partner with BioTeam to increase data sharing and reuse.

BioTeam works closely with scientific teams to increase data sharing and reuse across your ecosystem, regardless of the current stage of your digital transformation journey. We look closely at the data-sharing goals of your organization, because those goals can have significant implications for the data generation, preservation and publishing phases and affect the type of infrastructure solution best suited to your needs. BioTeam often works with clients to implement data commons and platforms that support data sharing and reuse within a specific organization, or more publicly in the context of open, public data-sharing projects.

