This is a crosspost from Jonathan Dursi, R&D computing at scale. See the original post here.
Computers are everywhere now, but computing is still hard. Canada should build on its competitive advantage by strengthening existing efforts to provide expertise, skills and training to researchers and scholars across the country, and let others provide the increasingly commodity hardware. The result will be a generation of trainees with deep research and cloud experience, and a critical mass of talent at centres focussed on building enabling technologies.
As R&D becomes increasingly intertwined with computational techniques, the need for advanced R&D computing support to power research and scholarship has grown enormously. What that support looks like, however, and the kinds of services that researchers most need, have changed radically over the past decades.
In the 1990s and 2000s, the overwhelming need was simply access to computers. With no other providers for computing or storage, it fell to individual research groups to supply their own. But a natural economy of scale starts to play out with computational resources. Purchasing and operating hardware becomes more cost-effective in bulk; and what was even then the most scarce and valuable resource - the expertise to operate and make effective use of the hardware - actually grows, rather than diminishes, by being involved in different research problems. So individual researchers’ “clusters in a closet” quickly gave way to departmental, then institutional, and finally regional or national platforms for computational research and data science support. In Canada, the vast majority of such support is offered through Compute Canada.
As we enter 2019, this landscape looks quite different than it did in the 90s. Computing resources adequate for research are thick on the ground. Indeed, as the range of problems researchers tackle with computing and data broadens, many extremely active areas of compute- and data-powered research require nothing more than a powerful desktop.
And for larger needs, the unavoidable logic of economies of scale for computers and storage has now entered the marketplace. A competitive range of commercial vendors provides access to computing resources that can meet the vast majority of other researchers’ needs. While it’s true that those commercial cloud providers charge a premium (50%-100%, slowly declining over time) over what it costs to provide the resources in academic research environments, that premium pays for enormous benefits in improved uptime, flexibility, and currency of the hardware, all of which have real value for researchers. Increasingly, even niche technologies like FPGAs, RDMA-enabled networking, and ARM processors are readily available on commercial cloud providers, leaving fewer and fewer use cases where in-house provision of computer resources remains a necessity. Those use cases are important - they include multi-rack HPC users, and the stewardship and analysis of data with the strictest regulatory on-premises requirements - but they represent a minority of computational science needs.
But even while computers for research become ever more accessible, research computing for cutting edge research remains a barrier to too many. Scientists and scholars are trained to be experts in their field, not necessarily experts in computer science or the latest computer hardware. Even keeping track of the latest computational methods, which frequently come from neighbouring fields if not different disciplines entirely, can be a challenge. Researchers greatly need assistance from and collaborations with experts in research computation itself. It is the skills, not the infrastructure, that are scarcest.
The good news is that the Compute Canada federation has a network of roughly 200 computational experts, many at the Ph.D. level, available to directly enable science projects. The bad news is that the priorities of the organization, and thus most of its effort and energies, are focussed on procuring and operating on-premises commodity computing and storage hardware - to the extent that many of those experts spend most of their time answering basic help-desk questions or performing routine operational duties for those systems.
With academic institutions now being just one player amongst many for computing and storage resources, there are a few possible futures for Canada’s computing centres – centres that have grown up primarily focused on purchasing, operating, and providing access to hardware for researchers. They could downsize, shrinking to focus on those sorts of hardware not well covered by other providers. Alternatively, they could double down on the “discount provider” model, emphasizing low cost, ‘no frills’ access to compute and storage, competing on price.
Either of these approaches represents a scandalous squandering of opportunity, wasting invaluable and nearly irreplaceable expertise and experience in applying computational techniques to open research problems. Instead, we should do something different. We should pursue our competitive advantage by taking the existing network of computational science advisors that we already have and making those higher level expert services the primary offering, letting other providers focus on the lower level procurement and operating of most computing and storage hardware.
The goal of a research computing support platform is to enable research, and to help develop the next generation of research talent. Knowledge transfer and skills development are by far the most valuable work that a computing team can do to meet those goals - because skills have the longest-lasting impact, because they address real needs in Canada’s R&D ecosystem, and simply because no one else can do it at scale.
First, deep training with research methods pays long-lasting dividends. Even in rapidly changing fields like data and computational science, skills and experience don’t depreciate the way computing hardware does. New methods come, but old methods don’t really go; and fluency in the previous generation of methods makes learning – or even creating – those newer methods easier.
And it’s actually even better than that, because not only do the skills that come from that research experience and training remain useful in their field for long periods of time, they transfer to other disciplines extremely well. Methods for solving equations, or pulling information out of data, have strong relationships with each other and can often be applied with modest modifications to problems well outside the fields in which they were first developed. These broad areas of effort - Data Science, Informatics, Simulation Science, and the Data Engineering or cloud computing tools needed for them - are enabling research technologies which can empower research in many fields.
And there lies the second reason for the importance of skills development: these research-enabling technologies are areas in which Canada currently lags. A recent report on the State of Science and Technology and Industrial R&D specifically calls out “enabling technologies” as a current area of weakness for Canada which is holding high impact research in other areas back. Focussing on such highly transferable skills and talent development in our research computing platform would help build a critical mass of such expertise both in the research computing centres themselves and in the community as a whole.
Finally, there just aren’t other options for providing high-level data and computational science collaboration and training to Canada’s scholars and researchers consistently and across disciplines. We in the research community know that availability of a collaborator with complementary interests and skills can make the difference between a research project happening or not. Unlike access to commodity computing hardware, the skills involved in making sure researchers have access to the best methods for their research, and in training emerging research talent in the computational side of their discipline, are very much not commodity skills, and cannot be purchased or rented from somewhere else.
The benefits of further efforts in skills development and training are fairly clear, and this alone would justify redirecting some effort from hardware to research services, and using commercial cloud providers to fill the gap. But having substantial commercial cloud resources available for researchers is worthwhile on its own merits.
Firstly, cloud provides more flexibility for rapidly changing research. The resource mix can be much broader and change much more rapidly than traditional procurement cycles would allow; what’s more, those changes can be in response to demonstrated researcher needs, rather than making predictions and assumptions about the next five years based on existing research users. Like owning systems, dynamically taking advantage of this flexibility requires top operational staff. And the uptime and hardware currency of these resources will generally be significantly better than what can be provided in-house.
Secondly, trainees and staff benefit from gaining extremely relevant commercial cloud expertise. This goes back to skills development a bit, but in this case it’s the system tools – the experience working with commercial cloud services and building data systems solutions using them – that are valuable in and of themselves, and will be attractive skills to have in whatever career they move on to.
Finally, commercial engagement can proceed much more smoothly, and be more attractive from the point of view of the commercial partner, when the collaboration happens in the commercial cloud. The success of efforts like Uber Cloud provides some validation of this. Most companies that would participate in such engagement either already have or are planning commercial cloud projects, and are likely more comfortable with such offerings than with using academic systems.
Making significant changes to priorities and indeed how we provision basic services can seem daunting. It may not seem clear how to get there from here, but there are some basic approaches and guidelines that can help.
The goal of a research computing support platform - any research support resource, really - is to enable research, and to help develop the next generation of research talent. With that primary mission in mind, the reasons for focussing the time and effort of computational science experts on collaboration and skills development rather than operating commodity hardware could not be clearer:
There are costs to this approach; it will cost somewhat more to have someone else run much of that hardware. But even those costs have upsides:
The prospect of moving to such a different service model may seem daunting, but it needn’t be:
These changes will not be easy; they will require participation from funders, staff, researchers, and all stakeholders. But the research computing world of today is not that of the 1990s, and how we support computational research should take advantage of that.
Images courtesy of Shutterstock and Pixabay, used under license.