The ubiquitous applicability of data science in today’s science- and process-driven industries has given rise to a plethora of tools, libraries and projects that all fall under the moniker “data science.” As a result, the role of “data scientist” has evolved; many organizations are now looking to create teams of specialists to tackle a wider range of problems more efficiently. This is especially true for enterprise science and engineering, which often require a layer of discipline-specific technical expertise in addition to the statistical and computational acumen of the average data scientist.
However, this increase in the number of data scientists working on a single project can put the scalability of their solutions at risk, simply because of the number and complexity of the tools available to data science teams. Different libraries, environments and languages may not be immediately compatible with one another, limiting the applicability of a solution; two teams may attack the same problem differently, producing conflicting recommendations; or senior team members may depart the organization, taking valuable knowledge about key methods with them. Any of these issues could significantly undermine the usefulness of a data science initiative or slow its development and impact for an organization. Standardizing data analysis processes, methodology development and implementation, and solution deployment offers a means to mitigate these challenges. To maximize the effectiveness of standardization, however, all of this must happen in a common development environment.
To this end, BIOVIA Pipeline Pilot focuses on unifying the multiple environments data scientists work with into a common location to drive the standardization of data science initiatives across the enterprise. The first major step in pursuit of this goal is the ability to develop in Jupyter Notebook natively in Pipeline Pilot. This allows Python specialists to wrap key Python 3 functionality within a single Pipeline Pilot component while working in the familiar JupyterLab interface. This functionality also supports many commonly used Python libraries, such as NumPy, SciPy, Matplotlib, scikit-learn, TensorFlow, pandas and more. For Pipeline Pilot users, this means they can more effectively integrate tools, models and solutions developed by their colleagues in Python directly into their Pipeline Pilot protocols, which can then be easily shared and deployed to developers and end users alike at the enterprise scale.
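As an illustrative sketch of the kind of notebook code a Python specialist might develop and later wrap for colleagues to reuse, consider a small NumPy-based analysis cell. The `fit_trend` helper and the synthetic data here are hypothetical examples, not part of any Pipeline Pilot or Jupyter API; the point is simply that ordinary library-based Python like this is what gets encapsulated in a component.

```python
# A hypothetical notebook cell: fit a linear trend to experimental-style
# data using NumPy. Nothing here is Pipeline Pilot-specific; it is the
# kind of reusable Python logic a specialist might wrap in a component.
import numpy as np

def fit_trend(x, y):
    """Return (slope, intercept) of a least-squares line through (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# Noiseless synthetic data following y = 2x + 1, for demonstration only.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

slope, intercept = fit_trend(x, y)
```

Once logic like this is wrapped in a Pipeline Pilot component, downstream protocol authors can reuse it without needing to read, or even see, the underlying Python.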