Shmulevich Lab
  at ISB

Informatics Projects

Specifications | Frameworks | Access Control | Web Service Development | Registries | Asynchronous Processing | Project and Build Management | Schemas

The following are tools that we have found valuable in building an enterprise informatics infrastructure.

Specifications

The Java Content Repository (JCR) is a common, standardized API for a hierarchical content store. It can be used to persist and access all sorts of structured and unstructured content (documents, data, domain objects). The JCR is not tied to any storage architecture, so its back-end can be a database, a WebDAV server, or even the file system. It normalizes the functionality for access control (security), full text search, transactions, observation (events) and versioning of the content repository. The domain model for the JCR also allows users to impose constraints on the structure and metadata.
The Life Science Identifier (LSID) is a way to name and locate pieces of information on the web. Essentially, an LSID is a unique identifier for some data (expressed as a Uniform Resource Name (URN)), and the LSID protocol specifies a standard way to locate the data (as well as a standard way of describing that data). Every LSID consists of up to five parts, each separated by a colon:
  1. Network Identifier (NID) - "urn:lsid:" label
  2. Authority - usually the root DNS name of the issuing authority
  3. Namespace - describes the location of content within the authority's domain
  4. Namespace Unique Object Identification
  5. Optional Version Id

Also see the LSID information in the tutorial and software sections.

Sample LSID:
urn:lsid:systemsbiology.org:microarray.experiment.Jcr:be1bd68f-ada3-4648-ab92-8e2d992ebb7d:1.0
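
The five-part structure listed above can be illustrated with a minimal parsing sketch. It assumes the simple colon-separated layout described here and does not implement the full LSID specification; the class name `Lsid` and its fields are hypothetical and not part of any LSID toolkit. Because the "urn:lsid:" label itself contains colons, splitting on every colon yields five or six fields rather than four or five.

```java
import java.util.Optional;

/** Hypothetical value class for illustration; not part of any LSID toolkit. */
final class Lsid {
    final String authority;
    final String namespace;
    final String objectId;
    final Optional<String> version;

    private Lsid(String authority, String namespace, String objectId, Optional<String> version) {
        this.authority = authority;
        this.namespace = namespace;
        this.objectId = objectId;
        this.version = version;
    }

    /** Parses "urn:lsid:authority:namespace:object[:version]" or throws IllegalArgumentException. */
    static Lsid parse(String text) {
        // Keep empty trailing fields (limit -1) so that malformed input is detected.
        String[] parts = text.trim().split(":", -1);
        if (parts.length < 5 || parts.length > 6) {
            throw new IllegalArgumentException("Expected 5 or 6 colon-separated fields: " + text);
        }
        // The "urn:lsid" label is treated as case-insensitive in this sketch.
        if (!parts[0].equalsIgnoreCase("urn") || !parts[1].equalsIgnoreCase("lsid")) {
            throw new IllegalArgumentException("Missing urn:lsid: label: " + text);
        }
        for (int i = 2; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                throw new IllegalArgumentException("Empty field at position " + i + ": " + text);
            }
        }
        Optional<String> version =
            parts.length == 6 ? Optional.of(parts[5]) : Optional.<String>empty();
        return new Lsid(parts[2], parts[3], parts[4], version);
    }
}
```

Calling `Lsid.parse` on the sample identifier, written without embedded whitespace, would give `systemsbiology.org` as the authority and `1.0` as the optional version.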

Universal Description, Discovery and Integration (UDDI) is a platform-independent, XML-based service registry. UDDI is an open industry initiative to publish and discover service listings and define how services and/or software applications interact over the web. UDDI was integrated into the Web Services Interoperability (WS-I) standard as a central pillar of web services infrastructure.

Also see the jUDDI updater in the software section.

Application Frameworks

The Spring Framework is a layered application framework built around a lightweight dependency injection (Inversion of Control, IoC) container that provides centralized, automated configuration of application objects. It seeks to improve application testability and scalability by allowing components to be developed and tested in isolation, then scaled up for deployment in any environment (J2SE or J2EE).
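
The dependency injection idea described above can be sketched in plain Java. This is a hand-wired illustration of the principle only and does not use any Spring API; all of the type names (`ExperimentStore`, `InMemoryExperimentStore`, `ExperimentService`) are hypothetical. In a container-managed application the wiring done by hand here would instead come from configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical storage abstraction; the names here are illustrative only. */
interface ExperimentStore {
    void save(String experimentId);
    int count();
}

/** Simple in-memory implementation, convenient for testing a service in isolation. */
final class InMemoryExperimentStore implements ExperimentStore {
    private final List<String> ids = new ArrayList<String>();

    public void save(String experimentId) {
        ids.add(experimentId);
    }

    public int count() {
        return ids.size();
    }
}

/** The service never constructs its own store; the dependency is handed in. */
final class ExperimentService {
    private final ExperimentStore store;

    ExperimentService(ExperimentStore store) {
        this.store = store;
    }

    /** Saves the experiment and returns the number of experiments stored so far. */
    int register(String experimentId) {
        if (experimentId == null || experimentId.trim().isEmpty()) {
            throw new IllegalArgumentException("experimentId must not be blank");
        }
        store.save(experimentId);
        return store.count();
    }
}
```

Because `ExperimentService` depends only on the interface, a test can pass in `new InMemoryExperimentStore()` while a deployment could supply a database-backed implementation without changing the service.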

It provides common abstraction layers for transaction management, JDBC support (with a meaningful exception hierarchy), and integration with popular Object/Relational Mapping (ORM) tools. It also supports Aspect Oriented Programming (AOP) and makes use of newer Java language features such as annotations, generics and enumerations.

One of the core philosophies of Spring is software integration. The framework facilitates the integration of third-party components and lets our development leverage the large variety of solutions already available in the open source community.

Spring already provides modules to integrate with JCRs, OSGi, rich clients, Web services, remoting, and more.

For J2EE applications, Spring can be used in conjunction with the JBoss application server to provide enterprise-level features such as Enterprise JavaBeans (EJBs), distributed caching and distributed transactions. In many cases these features are not required, so Tomcat is preferred as a lighter-weight container.

Apache Jackrabbit is a fully conforming implementation of the JCR. It provides components for Search Text Extractors (search inside files), Remote Method Invocation (RMI), Web Applications, Enterprise Applications (JCA), Service Provider Interface (SPI), and WebDAV (interact with the repository using WebDAV clients, such as Windows Explorer).

Access Control

Acegi Security is a powerful, flexible security solution for enterprise software. It provides comprehensive authentication, authorization, instance-based access control, channel security and human user detection capabilities (captcha). Acegi features include:
  • Enterprise-wide single sign on
  • Domain object instance security
  • Non-intrusive setup, leverages AOP, and easily integrates into Servlet or EJB container without requiring changes to business logic
  • Secures HTTP requests (not necessary to rely on web.xml security constraints)
  • Channel security (can automatically redirect requests to HTTPS)
  • LDAP Support
  • Event support (enables you to implement account lockout and audit log systems)
  • Run-as replacement (supports running asynchronous processes with security credentials)
  • Configuration via Spring XML, Commons Attributes, or JDK 5 Annotations
  • Built by Maven

Web-based Distributed Authoring and Versioning (WebDAV) is a set of extensions to HTTP that allows users to collaboratively edit and manage files on remote servers. It provides functionality to create, change and move documents. Important features include locking, file properties, namespace management (copy and move) and collections. Most modern operating systems provide built-in support for WebDAV.
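
As an illustration of how WebDAV extends HTTP, the sketch below builds (but does not send) a PROPFIND request, the WebDAV method for reading the properties of a collection and its members. It assumes Java 11 or later for `java.net.http`; the helper class `WebDavRequests` and any URL passed to it are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper for illustration; builds WebDAV requests without sending them. */
final class WebDavRequests {
    private WebDavRequests() {}

    /** Builds a PROPFIND request asking for all properties of a collection and its direct members. */
    static HttpRequest listCollection(String collectionUrl) {
        // Request body defined by WebDAV: ask the server for all properties.
        String body =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<D:propfind xmlns:D=\"DAV:\"><D:allprop/></D:propfind>";
        return HttpRequest.newBuilder(URI.create(collectionUrl))
            // PROPFIND is not a standard HTTP verb, so it is set through the generic method() call.
            .method("PROPFIND", HttpRequest.BodyPublishers.ofString(body))
            // Depth: 1 limits the listing to the collection and its immediate children.
            .header("Depth", "1")
            .header("Content-Type", "application/xml; charset=utf-8")
            .build();
    }
}
```

The resulting request could be passed to an `HttpClient` pointed at any WebDAV-capable server; locking and moving documents use further WebDAV methods (LOCK, MOVE) built in the same way.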

Web Service Development

Apache Tomcat is a web container (application server) developed at the Apache Software Foundation. Tomcat implements the Java Servlet and JavaServer Pages (JSP) specifications from Sun Microsystems.

Axis2 is an open source, XML-based Web service framework. It provides a SOAP server implementation and various utilities and APIs for generating and deploying Web service applications. It allows developers to quickly create interoperable, distributed computing applications, and automatically generates WSDL (Web Services Description Language) files. Axis2 can be used in standard application servers, or it can function as a standalone server application. It supports SOAP 1.1 and SOAP 1.2, and it also has integrated support for RESTful Web services. Modules are also available for related Web service specifications: WS-ReliableMessaging, WS-Coordination, WS-AtomicTransaction, WS-Security and WS-Addressing.

Registries

jUDDI (pronounced "Judy") is an open source, platform-independent Java implementation of the UDDI specification for Web Services (compliant with UDDI v2.0). It can be used with any SQL database (MySQL, DB2, Sybase, JDataStore, HSQLDB, etc.). It is deployable on any Java application server that supports the Servlet 2.3 specification (Tomcat, JOnAS, WebSphere, WebLogic, etc.). The jUDDI registry supports a clustered deployment configuration and easy integration with existing authentication systems.

Provides a Java API for client and server components to facilitate LSID integration. It provides a set of servlets and a simplified interface for quickly creating LSID authorities, as well as fully featured LSID resolution services. Supports publishing metadata about an LSID using RDF format (Resource Description Framework - http://www.w3.org/TR/rdf-schema).

Asynchronous Processing

GenePattern is a tool developed by the Broad Institute to support multidisciplinary genomic research programs by providing an interface to combine modules in many different languages for creating analysis workflows.

Mule is a messaging platform based on ideas from Enterprise Service Bus (ESB) architectures. An ESB acts as a sort of transit system for carrying data between applications within or outside your intranet. The ESB defines a series of stops, or "endpoints", through which applications can send data onto the system or receive data from it.
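
The endpoint idea can be sketched with a toy in-memory bus. This is not Mule's API and omits everything a real ESB provides (transports, routing, transformation, error handling); the class `MiniBus` and the endpoint names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy in-memory bus illustrating named endpoints; this is not Mule's API. */
final class MiniBus {
    private final Map<String, List<Consumer<String>>> listeners =
        new HashMap<String, List<Consumer<String>>>();

    /** Registers an application callback that receives messages arriving at the endpoint. */
    void subscribe(String endpoint, Consumer<String> listener) {
        listeners.computeIfAbsent(endpoint, key -> new ArrayList<Consumer<String>>())
                 .add(listener);
    }

    /** Delivers a message to every listener on the endpoint and returns how many received it. */
    int send(String endpoint, String message) {
        List<Consumer<String>> targets =
            listeners.getOrDefault(endpoint, new ArrayList<Consumer<String>>());
        for (Consumer<String> target : targets) {
            target.accept(message);
        }
        return targets.size();
    }
}
```

The sender only needs to know the endpoint name, not which applications are listening, which is the decoupling that the transit-system analogy describes.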

The Business Process Execution Language (BPEL) provides a language for the formal specification of business processes and business interaction protocols. By doing so, it extends the Web Services interaction model and enables it to support business transactions.

Project and Build Management

Confluence is an enterprise wiki that provides a platform for team collaboration in the form of project notes, blogs, and integration with projects that are managed in JIRA.

JIRA is a bug tracking, issue tracking, and project management application that integrates well with Confluence. JIRA is organized around tasks, and we have found it to be a flexible solution for both project and issue management.

Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.

Schemas

Controlled vocabularies, taxonomies, and ontologies developed by the Informatics Core.

Functional genomics experiments present many challenges in data archiving, sharing and querying. As the size and complexity of data generated from such experiments grows, so does the requirement for standard data formats. To address these needs, the Functional Genomics Experiment [Object Model / Markup-Language] (FuGE-OM, FuGE-ML) has been created to facilitate the development of data standards.

FuGE is a model of the shared components in different functional genomics domains. FuGE facilitates the development of data standards in functional genomics in two ways:

  1. FuGE provides a model of common components in functional genomics investigations, such as materials, data, protocols, equipment and software. These models can be extended to develop modular data formats with consistent structure.
  2. FuGE provides a framework for capturing complete laboratory workflows, enabling the integration of pre-existing data formats. In this context, FuGE allows the capture of additional metadata that gives formats a context within the complete workflow.

The Ontology for Biomedical Investigations (OBI) project is developing an integrated ontology for the description of biological and medical experiments and investigations. This includes a set of 'universal' terms that are applicable across various biological and technological domains, and domain-specific terms relevant only to a given domain.

This ontology will support the consistent annotation of biomedical investigations, regardless of the particular field of study. The ontology will model the design of an investigation, the protocols and instrumentation used, the material used, the data generated and the type of analysis performed on it. This project was formerly called the Functional Genomics Investigation Ontology (FuGO) project.

The goals of the OBI project are to:
  • Develop an Ontology for Biomedical Investigations in collaboration with groups representing different biological and technological domains involved in biomedical investigations
  • Make OBI compatible with other bio-ontologies
  • Develop OBI using an open source approach
  • Create a valuable resource for the biomedical communities to provide a source of terms for consistent annotation of investigations

Microarray and Gene Expression (MIAME/MAGE/MO)
As a high-level guideline, we use the MIAME (Minimum Information About Microarray Experiments) standard. MIAME describes the context of an experiment and the information that should, at a minimum, be captured. MAGE aims to provide a standard for the representation of microarray expression data that would facilitate the exchange of microarray information between different data systems. MAGE is composed of three parts: MAGE-OM is the object model for implementing MIAME; MAGE-ML is an XML-based markup language for exchanging microarray data; and MAGEstk (the MAGE Software Toolkit) is a collection of packages that act as converters between MAGE-OM and MAGE-ML.

The MGED Ontology (MO) provides concepts, definitions, terms, and resources for the standardized description of a microarray experiment in support of MAGE. The MGED Ontology is divided into the MGED Core ontology, which is intended to be stable and in sync with MAGE v.1, and the MGED Extended ontology, which adds further associations and classes not found in MAGE v.1.

A mapping between MIAME, MAGE-OM, and MO is available here. We base our internal systems on MAGE-TAB, a simple spreadsheet-based, MIAME-supportive format for microarray data that follows the structure of the MAGE-OM, along with references to external vocabularies and vocabularies within the MO.

Flow Informatics and Computational Cytometry Society (FICCS) is an organization that connects people sharing an interest in new software tools, methods, and standards for flow cytometry. FICCS outlines a number of different standards that are currently being used in the flow cytometry field:
  • MIFlowCyt: Minimum Information for a Flow Cytometry Experiment is a MIAME-like (see above) standard for expressing experimental structure.
  • FlowRDF provides guidelines for encoding annotations of flow cytometry experimental descriptions in RDF. FlowRDF is associated with three XML schemas.
  • FuGEFlow is a flow cytometry experimental workflow description based on the FuGE model.

The International Society for Analytical Cytology (ISAC) is an international scientific and educational organization whose purpose is to promote: development of analytical cytology; transfer of methodologies; and exchange of scientific and technical information. Analytical cytology is broadly defined as the characterization and measurement of cells and cellular constituents for biological, diagnostic and therapeutic purposes.

Open Microscopy Environment (OME)
OME produces open tools to support data management for biological light microscopy. We use a subset of OME's schema.

The Cell Ontology is designed as a structured controlled vocabulary for cell types. This ontology was constructed for use by the model organism and other bioinformatics databases, where there is a need for a controlled vocabulary of cell types. This ontology is not organism specific; indeed it includes cell types from prokaryotes to mammals, including plants and fungi. A full description of the Cell Ontology can be found in An Ontology for Cell Types (Bard, Rhee and Ashburner, 2005).

© 2008, Institute for Systems Biology, All Rights Reserved