IBM Spectrum Scale Best Practices for Genomics Medicine Workloads

IBM Spectrum Scale Best Practices for Genomics Medicine Workloads PDF

Author: Joanna Wong

Publisher: IBM Redbooks

Published: 2018-04-25

Total Pages: 78

ISBN-13: 0738456756

DOWNLOAD EBOOK →

Advancing the science of medicine by targeting a disease more precisely with treatment specific to each patient relies on access to that patient's genomics information and the ability to process massive amounts of genomics data quickly. Although genomics data is becoming a critical source for precision medicine, it is expected to create an expanding data ecosystem. Therefore, hospitals, genome centers, medical research centers, and other clinical institutes need to explore new methods of storing, accessing, securing, managing, sharing, and analyzing significant amounts of data. Healthcare and life sciences organizations that are running data-intensive genomics workloads on an IT infrastructure that lacks scalability, flexibility, performance, management, and cognitive capabilities also need to modernize and transform their infrastructure to support current and future requirements. IBM® offers an integrated solution for genomics that is based on composable infrastructure. This solution enables administrators to build an IT environment in a way that disaggregates the underlying compute, storage, and network resources. Such a composable building block based solution for genomics addresses the most complex data management aspect and allows organizations to store, access, manage, and share huge volumes of genome sequencing data. IBM SpectrumTM Scale is software-defined storage that is used to manage storage and provide massive scale, a global namespace, and high-performance data access with many enterprise features. IBM Spectrum ScaleTM is used in clustered environments, provides unified access to data via file protocols (POSIX, NFS, and SMB) and object protocols (Swift and S3), and supports analytic workloads via HDFS connectors. Deploying IBM Spectrum Scale and IBM Elastic StorageTM Server (IBM ESS) as a composable storage building block in a Genomics Next Generation Sequencing deployment offers key benefits of performance, scalability, analytics, and collaboration via multiple protocols. This IBM RedpaperTM publication describes a composable solution with detailed architecture definitions for storage, compute, and networking services for genomics next generation sequencing that enable solution architects to benefit from tried-and-tested deployments, to quickly plan and design an end-to-end infrastructure deployment. The preferred practices and fully tested recommendations described in this paper are derived from running GATK Best Practices work flow from the Broad Institute. The scenarios provide all that is required, including ready-to-use configuration and tuning templates for the different building blocks (compute, network, and storage), that can enable simpler deployment and that can enlarge the level of assurance over the performance for genomics workloads. The solution is designed to be elastic in nature, and the disaggregation of the building blocks allows IT administrators to easily and optimally configure the solution with maximum flexibility. The intended audience for this paper is technical decision makers, IT architects, deployment engineers, and administrators who are working in the healthcare domain and who are working on genomics-based workloads.

IBM Spectrum Scale Best Practices for Genomics Medicine Workloads

IBM Spectrum Scale Best Practices for Genomics Medicine Workloads PDF

Author: Joanna Wong

Publisher:

Published: 2018

Total Pages: 64

ISBN-13:

DOWNLOAD EBOOK →

Advancing the science of medicine by targeting a disease more precisely with treatment specific to each patient relies on access to that patient's genomics information and the ability to process massive amounts of genomics data quickly. Although genomics data is becoming a critical source for precision medicine, it is expected to create an expanding data ecosystem. Therefore, hospitals, genome centers, medical research centers, and other clinical institutes need to explore new methods of storing, accessing, securing, managing, sharing, and analyzing significant amounts of data. Healthcare and life sciences organizations that are running data-intensive genomics workloads on an IT infrastructure that lacks scalability, flexibility, performance, management, and cognitive capabilities also need to modernize and transform their infrastructure to support current and future requirements. IBM® offers an integrated solution for genomics that is based on composable infrastructure. This solution enables administrators to build an IT environment in a way that disaggregates the underlying compute, storage, and network resources. Such a composable building block based solution for genomics addresses the most complex data management aspect and allows organizations to store, access, manage, and share huge volumes of genome sequencing data. IBM Spectrum"!Scale is software-defined storage that is used to manage storage and provide massive scale, a global namespace, and high-performance data access with many enterprise features. IBM Spectrum Scale"!is used in clustered environments, provides unified access to data via file protocols (POSIX, NFS, and SMB) and object protocols (Swift and S3), and supports analytic workloads via HDFS connectors. Deploying IBM Spectrum Scale and IBM Elastic Storage"!Server (IBM ESS) as a composable storage building block in a Genomics Next Generation Sequencing deployment offers key benefits of performance, scalability, analytics, and collaboration via multiple protocols. This IBM Redpaper"!publication describes a composable solution with detailed architecture definitions for storage, compute, and networking services for genomics next generation sequencing that enable solution architects to benefit from tried-and-tested deployments, to quickly plan and design an end-to-end infrastructure deployment. The preferred practices and fully tested recommendations described in this paper are derived from running GATK Best Practices work flow ...

HIPAA Compliance for Healthcare Workloads on IBM Spectrum Scale

HIPAA Compliance for Healthcare Workloads on IBM Spectrum Scale PDF

Author: Sandeep R. Patil

Publisher: IBM Redbooks

Published: 2020-03-16

Total Pages: 18

ISBN-13: 0738458600

DOWNLOAD EBOOK →

When technology workloads process healthcare data, it is important to understand Health Insurance Portability and Accountability Act (HIPAA) compliance and what it means for the technology infrastructure in general and storage in particular. HIPAA is US legislation that was signed into law in 1996. HIPAA was enacted to protect health insurance coverage, but was later extended to ensure protection and privacy of electronic health records and transactions. In simple terms, it was instituted to modernize the exchange of healthcare information and how the Personally Identifiable Information (PII) that is maintained by the healthcare and healthcare-related industries are safeguarded. From a technology perspective, one of the core requirements of HIPAA is the protection of Electronic Protected Health Information (ePHIPer through physical, technical, and administrative defenses. From a non-compliance perspective, the Health Information Technology for Economic and Clinical Health Act (HITECH) added protections to HIPAA and increased penalties $100 USD - $50,000 USD per violation. Today, HIPAA-compliant solutions are a norm in the healthcare industry worldwide. This IBM® Redpaper publication describes HIPPA compliance requirements for storage and how security enhanced software-defined storage is designed to help meet those requirements. We correlate how Software Defined IBM Spectrum® Scale security features address the safeguards that are specified by the HIPAA Security Rule.

IBM Reference Architecture for High Performance Data and AI in Healthcare and Life Sciences

IBM Reference Architecture for High Performance Data and AI in Healthcare and Life Sciences PDF

Author: Dino Quintero

Publisher: IBM Redbooks

Published: 2019-09-08

Total Pages: 88

ISBN-13: 073845690X

DOWNLOAD EBOOK →

This IBM® Redpaper publication provides an update to the original description of IBM Reference Architecture for Genomics. This paper expands the reference architecture to cover all of the major vertical areas of healthcare and life sciences industries, such as genomics, imaging, and clinical and translational research. The architecture was renamed IBM Reference Architecture for High Performance Data and AI in Healthcare and Life Sciences to reflect the fact that it incorporates key building blocks for high-performance computing (HPC) and software-defined storage, and that it supports an expanding infrastructure of leading industry partners, platforms, and frameworks. The reference architecture defines a highly flexible, scalable, and cost-effective platform for accessing, managing, storing, sharing, integrating, and analyzing big data, which can be deployed on-premises, in the cloud, or as a hybrid of the two. IT organizations can use the reference architecture as a high-level guide for overcoming data management challenges and processing bottlenecks that are frequently encountered in personalized healthcare initiatives, and in compute-intensive and data-intensive biomedical workloads. This reference architecture also provides a framework and context for modern healthcare and life sciences institutions to adopt cutting-edge technologies, such as cognitive life sciences solutions, machine learning and deep learning, Spark for analytics, and cloud computing. To illustrate these points, this paper includes case studies describing how clients and IBM Business Partners alike used the reference architecture in the deployments of demanding infrastructures for precision medicine. This publication targets technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing life sciences solutions and support.

IBM Reference Architecture for Genomics, Power Systems Edition

IBM Reference Architecture for Genomics, Power Systems Edition PDF

Author: Dino Quintero

Publisher: IBM Redbooks

Published: 2016-04-05

Total Pages: 140

ISBN-13: 0738441635

DOWNLOAD EBOOK →

This IBM® Redbooks® publication introduces the IBM Reference Architecture for Genomics, IBM Power SystemsTM edition on IBM POWER8®. It addresses topics such as why you would implement Life Sciences workloads on IBM POWER8, and shows how to use such solution to run Life Sciences workloads using IBM PlatformTM Computing software to help set up the workloads. It also provides technical content to introduce the IBM POWER8 clustered solution for Life Sciences workloads. This book customizes and tests Life Sciences workloads with a combination of an IBM Platform Computing software solution stack, Open Stack, and third party applications. All of these applications use IBM POWER8, and IBM Spectrum ScaleTM for a high performance file system. This book helps strengthen IBM Life Sciences solutions on IBM POWER8 with a well-defined and documented deployment model within an IBM Platform Computing and an IBM POWER8 clustered environment. This system provides clients in need of a modular, cost-effective, and robust solution with a planned foundation for future growth. This book highlights IBM POWER8 as a flexible infrastructure for clients looking to deploy life sciences workloads, and at the same time reduce capital expenditures, operational expenditures, and optimization of resources. This book helps answer clients' workload challenges in particular with Life Sciences applications, and provides expert-level documentation and how-to-skills to worldwide teams that provide Life Sciences solutions and support to give a broad understanding of a new architecture.

IBM Elastic Storage Server Implementation Guide for Version 5.3

IBM Elastic Storage Server Implementation Guide for Version 5.3 PDF

Author: Luis Bolinches

Publisher: IBM Redbooks

Published: 2019-02-05

Total Pages: 102

ISBN-13: 0738457418

DOWNLOAD EBOOK →

This IBM® RedpaperTM publication introduces and describes the IBM Elastic StorageTM Server as a scalable, high-performance data and file management solution. The solution is built on proven IBM SpectrumTM Scale technology, formerly IBM General Parallel File System (GPFSTM). IBM Elastic Storage Servers can be implemented for a range of diverse requirements, providing reliability, performance, and scalability. This publication helps you to understand the solution and its architecture and helps you to plan the installation and integration of the environment. The following combination of physical and logical components are required: Hardware Operating system Storage Network Applications This paper provides guidelines for several usage and integration scenarios. Typical scenarios include Cluster Export Services (CES) integration, disaster recovery, and multicluster integration. This paper addresses the needs of technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who must deliver cost-effective cloud services and big data solutions.

Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and Sharing Files Globally

Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and Sharing Files Globally PDF

Author: Nils Haustein

Publisher: IBM Redbooks

Published: 2019-03-29

Total Pages: 78

ISBN-13: 0738457493

DOWNLOAD EBOOK →

Economic globalization requires data to be available globally. With most data stored in file systems, solutions to make this data globally available become more important. Files that are in file systems can be protected or shared by replicating these files to another file system that is in a remote location. The remote location might be just around the corner or in a different country. Therefore, the techniques that are used to protect and share files must account for long distances and slow and unreliable wide area network (WAN) connections. IBM® Spectrum Scale is a scalable clustered file system that can be used to store all kinds of unstructured data. It provides open data access by way of Network File System (NFS); Server Message Block (SMB); POSIX Object Storage APIs, such as S3 and OpenStack Swift; and the Hadoop Distributed File System (HDFS) for accessing and sharing data. The IBM Aspera® file transfer solution (IBM Aspera Sync) provides predictable and reliable data transfer across large distance for small and large files. The combination of both can be used for global sharing and protection of data. This IBM RedpaperTM publication describes how IBM Aspera Sync can be used to protect and share data that is stored in IBM SpectrumTM Scale file systems across large distances of several hundred to thousands of miles. We also explain the integration of IBM Aspera Sync with IBM Spectrum ScaleTM and differentiate it from solutions that are built into IBM Spectrum Scale for protection and sharing. We also describe different use cases for IBM Aspera Sync with IBM Spectrum Scale.

IBM Software-Defined Storage Guide

IBM Software-Defined Storage Guide PDF

Author: Larry Coyne

Publisher: IBM Redbooks

Published: 2018-07-21

Total Pages: 158

ISBN-13: 0738457051

DOWNLOAD EBOOK →

Today, new business models in the marketplace coexist with traditional ones and their well-established IT architectures. They generate new business needs and new IT requirements that can only be satisfied by new service models and new technological approaches. These changes are reshaping traditional IT concepts. Cloud in its three main variants (Public, Hybrid, and Private) represents the major and most viable answer to those IT requirements, and software-defined infrastructure (SDI) is its major technological enabler. IBM® technology, with its rich and complete set of storage hardware and software products, supports SDI both in an open standard framework and in other vendors' environments. IBM services are able to deliver solutions to the customers with their extensive knowledge of the topic and the experiences gained in partnership with clients. This IBM RedpaperTM publication focuses on software-defined storage (SDS) and IBM Storage Systems product offerings for software-defined environments (SDEs). It also provides use case examples across various industries that cover different client needs, proposed solutions, and results. This paper can help you to understand current organizational capabilities and challenges, and to identify specific business objectives to be achieved by implementing an SDS solution in your enterprise.

IBM Spectrum Archive Enterprise Edition V1.3.2.2: Installation and Configuration Guide

IBM Spectrum Archive Enterprise Edition V1.3.2.2: Installation and Configuration Guide PDF

Author: Hiroyuki Miyoshi

Publisher: IBM Redbooks

Published: 2022-03-10

Total Pages: 360

ISBN-13: 0738460427

DOWNLOAD EBOOK →

This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum® Archive Enterprise Edition (EE) Version 1.3.2.2 for the IBM TS4500, IBM TS3500, IBM TS4300, and IBM TS3310 tape libraries. IBM Spectrum Archive Enterprise Edition enables the use of the LTFS for the policy management of tape as a storage tier in an IBM Spectrum Scale based environment. It also helps encourage the use of tape as a critical tier in the storage environment. This edition of this publication is the tenth edition of IBM Spectrum Archive Installation and Configuration Guide. IBM Spectrum Archive EE can run any application that is designed for disk files on a physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 9, 8, 7, 6, and 5 tape drives. and the IBM TS1160, TS1155, TS1150, and TS1140 tape drives. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. The use of IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM customers, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

IBM Spectrum Discover: Metadata Management for Deep Insight of Unstructured Storage

IBM Spectrum Discover: Metadata Management for Deep Insight of Unstructured Storage PDF

Author: Joseph Dain

Publisher: IBM Redbooks

Published: 2019-10-01

Total Pages: 152

ISBN-13: 0738457868

DOWNLOAD EBOOK →

This IBM® Redpaper publication provides a comprehensive overview of the IBM Spectrum® Discover metadata management software platform. We give a detailed explanation of how the product creates, collects, and analyzes metadata. Several in-depth use cases are used that show examples of analytics, governance, and optimization. We also provide step-by-step information to install and set up the IBM Spectrum Discover trial environment. More than 80% of all data that is collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, and so on. Many organizations face significant challenges to manage this deluge of unstructured data such as: Pinpointing and activating relevant data for large-scale analytics Lacking the fine-grained visibility that is needed to map data to business priorities Removing redundant, obsolete, and trivial (ROT) data Identifying and classifying sensitive data IBM Spectrum Discover is a modern metadata management software that provides data insight for petabyte-scale file and Object Storage, storage on premises, and in the cloud. This software enables organizations to make better business decisions and gain and maintain a competitive advantage. IBM Spectrum Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research.