This is a crosspost from Jonathan Dursi, R&D computing at scale. See the original post here.
(Note: This post is adapted from #102 of the Research Computing Teams Newsletter)
If you followed HPC twitter at all in late 2021, you will have seen a heartfelt thread by a well-known research software developer, a key contributor to the Singularity project among others, lamenting the frankly appalling state of developer productivity in HPC - both in the tools that exist and in the support for them (and for developer tooling generally) at academic centres. A lot of people chimed in on the discussion - one of the lead developers of the PETSc project, embedded software developers, key people at big computing centres - all agreeing that there was a problem, but each typically zooming in on one particular technical or procedural issue or another, and not coming to any conclusion.
I think the issue is a lot bigger than HPC software development workflows - it comes up in too many contexts to be about specific technical issues of running CI/CD pipelines on fixed infrastructure. The only people to identify the correct underlying issue, in my opinion, were those with experience in both academia and the private sector, such as Brendan Bouffler at AWS:
Too much reliance on “free” labour - postgrads and post docs who, invariably, decide that burning their time being mechanical turks for their “superiors” just sucks, so they come and work for us. And since we pay $$, we’re not gonna waste them on things that software can do.
— Brendan Bouffler☁️ 🏳️🌈 (@boofla) November 20, 2021
The same argument was made by R&D staff in the private sector. Their time actually has value; as a result, it gets valued.
In academic research computing, partly because of low salaries - especially for the endless stream of trainees - but also because we typically provide research computing systems for free, we tend to put zero value on people's time. Our "lowest-cost" approach thus never accounts for the cost of researcher or trainee effort. If researchers have to jump through absurd hoops to get or renew their accounts, or have to distort their workflows to fit one-size-fits-all clusters and queueing systems, or postdocs have to spend hours of work by hand every month because tools that would automate some of that work cost $500 (even though, at any plausible hourly rate, a year of that hand-work costs far more than the tool), well, what do they expect, right?
It’s not that this is an indefensible position to take, but one can’t take it and then act surprised when researchers who can afford to are seriously investigating moving their projects into the commercial cloud, even though it costs 2x as much. It turns out that people’s time is worth quite a lot to them, and is certainly worth some money. If we were to let researchers spend their research computing and data money wherever they pleased, I think we’d find that significantly less than 100% of researchers would use “lowest price possible” as their sole criterion for choosing providers. Core facilities like animal facilities, sequencing centres, and microscopy centres compete on dimensions other than being the cheapest option available.
To be sure, there are process issues in academia which exacerbate the tendency to see people’s time as valueless - rules about capital vs operating costs, for instance - but those rules aren’t laws of nature. If we were paying people in academia what the tech industry pays, administration would suddenly discover some additional flexibility in the thresholds and criteria for considering something a capital expense if it meant we could be a bit more parsimonious with people’s time.
Until then, one can’t be too surprised when the most talented and ambitious staff get routinely poached by the private sector, and when research groups start considering service providers that cost more but respect their time.