Overview
Researchers at a leading NIH institute were generating increasingly complex scientific datasets, making a modern, optimized data infrastructure a key priority for supporting their cutting-edge research. Scientific data was distributed across various Network Attached Storage (NAS) systems, external hard drives, and instrument control computers without network access. There was no centralized storage platform, limited ability to share large datasets with collaborators, and no consistent strategy for moving data from instruments to analysis environments.
BioTeam partnered with both scientific and IT stakeholders to design and implement an end-to-end scientific data infrastructure. This included centralized high-performance storage, Science DMZ networking, instrument connectivity, HPC integration, and secure external data sharing capabilities. The new environment enables research data to move efficiently from generation to analysis, allowing scientists to focus on research rather than data logistics.
Impact Summary
- Core instruments now transfer data directly to centralized scientific storage over the network, eliminating reliance on portable drives.
- Researchers can securely exchange large datasets with external collaborators directly from the storage platform.
- High-throughput network connectivity established for major instruments and analysis workstations.
- Monthly storage dashboards provide visibility into data usage across laboratories and projects.
- Reduction in lab-managed storage vendors and removable media purchases, simplifying IT operations and reducing security risk.
- Centralized backup and archiving capabilities relieve researchers from managing their own data protection workflows.
Challenge
The institute lacked a centralized, scalable storage environment designed for scientific data. Valuable datasets accumulated across fragmented storage locations, including segregated NAS systems, portable drives, optical media, and instrument-local storage. Leadership had limited visibility into data growth, location, and protection status.
Instrument connectivity constraints required researchers to physically move data between systems, creating delays and increasing the risk of data loss or security exposure. Collaboration with external partners was often slow and difficult due to the lack of secure, high-performance data-sharing mechanisms. Access to high-performance computing resources was similarly constrained by inefficient data transfer solutions.
Prior infrastructure proposals focused primarily on enterprise storage for office documents rather than on large-capacity, high-performance storage necessary for research workflows. The institute required a comprehensive scientific data strategy aligned with how researchers generate, analyze, and share data.
Approach
BioTeam began with structured discovery sessions involving individual research groups to understand existing data workflows, bottlenecks, and unmet needs. Parallel engagement with IT leadership clarified the current technology landscape, operational constraints, and long-term strategic priorities. This dual engagement model enabled alignment between scientific requirements and institutional IT and security standards.
Based on these findings, BioTeam designed and validated a comprehensive solution before implementation. Key elements included:
- Centralized Scientific Data Storage
BioTeam conducted a proof-of-concept evaluation of multiple storage platforms, incorporating feedback from both researchers and IT stakeholders. The selected solution provides high-performance all-flash storage with multi-protocol access (SMB, NFS, and HTTPS) and supports future scaling as data volumes grow.
- Network and Instrument Connectivity
High-bandwidth connections were established between core instruments, analysis workstations, and the centralized storage environment. BioTeam worked closely with institutional networking teams to implement a Science DMZ architecture supporting efficient, high-throughput data movement.
- HPC Integration
Resilient data transfer pathways were created between scientific storage and the institutional HPC environment. Researchers can now move datasets into computational workflows without manual intervention or workflow disruption (a transfer sketch follows this list).
- External Collaboration and Data Sharing
Secure data-sharing capabilities were implemented using Globus, enabling the compliant exchange of large datasets with external collaborators (see the sharing sketch after this list). Legacy data lake resources were migrated to the new environment to maintain continuity.
- Data Management Supporting Services
BioTeam deployed centralized backup and archiving infrastructure, including simple user-driven workflows that move data to cloud archiving storage tiers. Permissions management, usage dashboards, and automated monthly reporting were implemented to improve operational visibility (archiving and reporting sketches follow this list).
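To make the HPC pathway concrete, here is a minimal sketch of scripting such a transfer with the Globus Python SDK. The case study does not describe the institute's actual automation, and every client ID, collection UUID, and path below is a placeholder rather than a real value.

```python
# Minimal sketch of moving a dataset from scientific storage to HPC scratch
# with the Globus Python SDK. All IDs and paths are placeholders.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"  # placeholder: register your own app

# Interactive Globus Auth native-app login.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(requested_scopes=globus_sdk.scopes.TransferScopes.all)
print("Log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

SOURCE = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"  # scientific storage collection (placeholder)
DEST = "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"    # HPC collection (placeholder)

# A recursive, checksummed transfer; Globus retries failed files, which is
# what makes the pathway resilient in practice.
tdata = globus_sdk.TransferData(
    tc, SOURCE, DEST, label="instrument run 123", verify_checksum=True
)
tdata.add_item("/instruments/microscope1/run123/", "/scratch/labA/run123/", recursive=True)
task = tc.submit_transfer(tdata)
print("Submitted Globus transfer task:", task["task_id"])
```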
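The external-sharing workflow can be approximated with the same SDK. Below is a sketch of granting a collaborator read-only access to one project folder on a Globus guest collection; the collection UUID, identity UUID, and path are hypothetical, and the access token is assumed to come from the auth flow in the previous sketch.

```python
import globus_sdk

# Assumes a Globus transfer access token obtained as in the previous sketch.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("TRANSFER-ACCESS-TOKEN")
)

GUEST_COLLECTION = "cccccccc-cccc-cccc-cccc-cccccccccccc"  # placeholder UUID

# Grant one collaborator identity read-only access to a single project folder.
rule = {
    "DATA_TYPE": "access",
    "principal_type": "identity",
    "principal": "dddddddd-dddd-dddd-dddd-dddddddddddd",  # collaborator identity (placeholder)
    "path": "/shared/projectX/",
    "permissions": "r",  # read-only; "rw" would also allow uploads
}
result = tc.add_endpoint_acl_rule(GUEST_COLLECTION, rule)
print("Created access rule:", result["access_id"])
```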
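The case study does not name the archive platform or tier, so the user-driven archiving step is sketched below against a hypothetical AWS S3 bucket with an archival storage class; the bucket, key, and file names are placeholders.

```python
# Hypothetical example: push a finished dataset into a cold storage tier.
import boto3

s3 = boto3.client("s3")

BUCKET = "institute-archive"             # placeholder bucket name
LOCAL = "/data/labA/run123/results.tar"  # placeholder local file
KEY = "labA/run123/results.tar"

# Uploading straight into an archive class keeps cold data off the
# high-performance flash tier.
s3.upload_file(LOCAL, BUCKET, KEY, ExtraArgs={"StorageClass": "DEEP_ARCHIVE"})
print(f"Archived {LOCAL} to s3://{BUCKET}/{KEY}")
```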
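Finally, the monthly usage reporting could be as simple as aggregating bytes per top-level lab directory on the central share. This sketch assumes a POSIX mount point (the path is a placeholder) rather than any particular dashboard product.

```python
# Sketch: total usage per lab directory, suitable for a monthly report.
from pathlib import Path

MOUNT = Path("/mnt/scientific-storage")  # placeholder mount point

def tree_bytes(root: Path) -> int:
    """Sum regular-file sizes under root, skipping entries we cannot stat."""
    total = 0
    for p in root.rglob("*"):
        try:
            if p.is_file() and not p.is_symlink():
                total += p.stat().st_size
        except OSError:
            continue  # permission errors, files deleted mid-scan, etc.
    return total

for lab in sorted(d for d in MOUNT.iterdir() if d.is_dir()):
    print(f"{lab.name:30s} {tree_bytes(lab) / 1e12:8.2f} TB")
```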
Outcomes
Research data now flows directly from instruments into a centralized scientific storage environment accessible across laboratories and analysis platforms. Dependence on removable media and fragmented storage solutions has been significantly reduced, simplifying IT management and strengthening data governance.
Archiving workflows allow researchers to manage older datasets independently while maintaining rapid restore capabilities. Automated reporting provides both researchers and leadership with consistent insight into storage utilization and growth trends.
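As an illustration of the restore path, and continuing the hypothetical S3 assumption from the archiving sketch above, a staged restore from an archive tier could look like the following; the bucket and key are placeholders, and the request is asynchronous (S3 stages a temporary copy that can then be downloaded normally).

```python
# Hypothetical example: request a temporary restored copy of an archived object.
import boto3

s3 = boto3.client("s3")

s3.restore_object(
    Bucket="institute-archive",     # placeholder
    Key="labA/run123/results.tar",  # placeholder
    RestoreRequest={
        "Days": 7,  # keep the restored copy available for one week
        "GlacierJobParameters": {"Tier": "Standard"},  # retrieval speed/cost tradeoff
    },
)
```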
Secure, self-service data sharing has streamlined collaboration with external partners. Most importantly, the time required to move data from acquisition to analysis has decreased substantially. By removing infrastructure friction from daily workflows, the institute has enabled scientists to focus more fully on research productivity and scientific discovery.