
Historically, US research software has predominantly been utilized within the country by domestic researchers. However, recent years have seen a surge in international collaboration, with research software playing a pivotal role. International users can represent a substantial user base for some research software, and foreign engineers have huge potential to contribute significantly to the US software community. As a result, it is crucial for RSEs and researchers to recognize the importance of software internationalization and localization, and to acquire the methodologies necessary for their effective implementation. This talk will offer guidance on designing, developing, and testing internationalized research software, ensuring that it meets the needs of a global audience in the future.

This talk will be structured from broad concepts to specific skills (i.e., Internationalization → Localization → Translation) to present software design principles that can prepare research software for global impact.

The first section (~3 mins) will cover globalization/internationalization, focusing on the product design perspective. This part will cover the concept of internationalization, some regulations for product owners to keep in mind, product design principles, and potential costs, with examples for demonstration. The audience will learn the importance of including internationalization considerations at the proposal drafting stage, rather than leaving it as a task during the development stage.

The second section (~4 mins) will transition to localization. The presenter will discuss the meaning of localization and how the lack of localization can hinder the global promotion of US research software. Examples will illustrate the steps of designing, developing, and testing software localization. At the end of this section, the presenter will provide a checklist for RSEs and researchers to refer to in future localization processes.

The third section (~4 mins) will focus on translation, a crucial component of localization. The presenter will introduce software design and development principles for adding translation capabilities, followed by a discussion of common translation tools that RSEs can use for popular frontend and backend frameworks. The section will conclude with a focus on utilizing AI tools to enhance translation quality.
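As a minimal sketch of what "adding translation capabilities" can look like in code, the example below uses Python's standard `gettext` module. The in-memory `DictTranslations` class, the `CATALOGS` table, and the sample message are hypothetical stand-ins for compiled translation catalogs; they are not taken from the talk. The idea illustrated is that user-facing strings are wrapped in a lookup call, use named placeholders so translators can reorder them, and fall back to the source language when no translation exists.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """In-memory catalog standing in for a compiled .mo file (illustration only)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Fall back to the untranslated source string when no entry exists.
        return self._messages.get(message, super().gettext(message))


# Hypothetical catalogs; a real project would load these from locale files.
CATALOGS = {
    "es": DictTranslations(
        {"Loaded {count} records from {source}.":
         "Se cargaron {count} registros de {source}."}
    ),
}


def translator(lang):
    """Return a lookup function for `lang`, or a pass-through if unknown."""
    return CATALOGS.get(lang, gettext.NullTranslations()).gettext


def report(lang, count, source):
    _ = translator(lang)
    # Named placeholders keep the message translatable as a whole sentence.
    return _("Loaded {count} records from {source}.").format(
        count=count, source=source
    )


spanish = report("es", 3, "survey.csv")
fallback = report("fr", 3, "survey.csv")
```

Keeping whole sentences as catalog keys, rather than concatenating fragments, is one common design choice because word order differs between languages.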

Overall, this talk aims to inspire the research software community to rethink software from an international perspective and empower them with the knowledge to promote U.S. research worldwide in the future.

Many academics feel comfortable wrangling and analyzing data in R, but have little to no experience working on the command line and may find job scheduling systems like SLURM intimidating. This can be a significant barrier to using high performance computing, which generally requires creating Bash scripts and submitting jobs via the command line. The {targets} R package provides many benefits to researchers, one of which is running steps of an analysis automatically as job requests on an HPC system, all from the comfort of R.

The {targets} package allows for workflow management of analysis pipelines in R where dependencies among steps are automatically detected. When a {targets} pipeline is modified and re-run, any steps (called “targets”) that do not need to be rerun are automatically skipped, saving compute time. By default, {targets} pipelines are launched in a “clean” R session, which enforces reproducibility (a blessing to some and a curse to others). It is relatively trivial to parallelize a {targets} workflow so that independent targets are run on parallel workers either locally as multiple R sessions, or using HPC or cloud computing resources. Users can define a controller that runs their pipeline on the HPC using multiple workers running as separate SLURM jobs (or PBS, SGE, etc.). It is also possible to define multiple controllers with different resources for different targets so that tasks with heavier computational needs are run with more CPUs, for example. All of this happens from the comfort of R without users needing to manually create multiple R scripts and/or multiple SLURM submission scripts for each task.
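The skipping behaviour described above can be sketched in a few lines. The example below is written in Python purely for illustration; it is not the {targets} API, and the names `fingerprint`, `run_pipeline`, and `make_pipeline` are hypothetical. It assumes a simplified rule: a target is rebuilt only when its own definition or the fingerprint of something upstream has changed.

```python
import hashlib
import json


def fingerprint(code, upstream):
    """Hash a target's definition together with its upstream fingerprints."""
    payload = json.dumps({"code": code, "upstream": upstream})
    return hashlib.sha256(payload.encode()).hexdigest()


def run_pipeline(targets, store):
    """Build targets in order, skipping any whose fingerprint is unchanged.

    targets: dict of name -> (definition string, dependency names, function),
             listed in dependency order.
    store:   dict acting as the cache between runs.
    Returns the names of the targets that were actually rebuilt.
    """
    built = []
    prints = {}
    for name, (code, deps, func) in targets.items():
        fp = fingerprint(code, [prints[d] for d in deps])
        prints[name] = fp
        cached = store.get(name)
        if cached is not None and cached["fingerprint"] == fp:
            continue  # up to date: nothing upstream changed
        value = func(*[store[d]["value"] for d in deps])
        store[name] = {"fingerprint": fp, "value": value}
        built.append(name)
    return built


def make_pipeline(scale):
    # The definition strings stand in for the source code of each step.
    return {
        "raw": ("load-v1", [], lambda: [1, 2, 3]),
        "scaled": (f"scale-by-{scale}", ["raw"],
                   lambda raw: [x * scale for x in raw]),
        "total": ("sum-v1", ["scaled"], lambda scaled: sum(scaled)),
    }


store = {}
first = run_pipeline(make_pipeline(2), store)   # builds everything
second = run_pipeline(make_pipeline(2), store)  # nothing to rebuild
third = run_pipeline(make_pipeline(3), store)   # only the changed step and its dependents
```

In this sketch, changing the scaling step leaves `raw` cached while `scaled` and `total` are rebuilt, which is the compute-saving behaviour the paragraph describes.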

RSEs and HPC professionals can help enable this powerful combination of technologies in a few ways. At University of Arizona, our group has created a template GitHub repository for a {targets} project that can run on the UA HPC either through the command line where targets are run as SLURM requests or with Open OnDemand where targets are run on multiple R sessions. It includes code for a controller function that works with the UA HPC and documentation about how to modify the template and get it onto the HPC with git clone. We have previously run {targets} workshops that help researchers re-factor their analysis scripts into {targets} pipelines. We hope to work with HPC professionals to increase awareness of {targets} as an option for R users to harness the power of cluster computing.

Building on discussions first started at the German RSE conference in 2023 (de-RSE23), a recent pre-print, Foundational Competencies and Responsibilities of a Research Software Engineer, identifies a set of core competencies required for RSEng and describes possible pathways of development and specialisation for RSEs. It is the first output of a group with broad interests in the teaching and learning of RSEng skills.

With continuing growth in RSE communities around the world, and sustained global demand for RSEng skills, US-RSE24 presents an opportunity to align international efforts towards

* training the next generation of RSEs
* providing high-quality professional development opportunities to those already following the career path
* empowering RSE Leaders to further advocate for the Research Software Engineering needs and expertise in their teams, institutions, and communities.

Therefore, we want to give an overview of what the group has been working on so far, discuss the aims of our future work, and invite members of the international RSE community to contribute and provide feedback. We particularly encourage members of regional groups focused on RSEng training and skills to attend and share their perspectives.

The University Corporation for Atmospheric Research (UCAR) Software Engineering Assembly (SEA) was formed in 2005 to provide an informal meeting space, instructional content including tutorial series and seminars, and an evolving compilation of best practices for those staff and collaborators at the organization interested in software engineering. Over time, the SEA membership grew, events were regularly conducted, and in 2012 a yearly conference was established with the focus being scientific software engineering.

Communities of practice like the SEA benefit from motivated members actively cultivating the organization and adding some formal structure and legitimacy. Unfortunately, staff turnover and budget (and thus time) constraints led to a gradual atrophying of SEA activity. While our yearly conference, eventually titled the Improving Scientific Software Conference, remained a robust fixture throughout, other offerings tapered to a nadir during the COVID pandemic. Soon after, the longtime chair of the SEA left the organization, and it appeared that the SEA might sunset entirely.

As the SEA was shrinking, the US-RSE became a growing presence at National Labs. When a new committee did eventually take over SEA governance, this presented an opportunity to align our Assembly with the principles and best practices being developed by the research software engineering community.

This talk will describe our Assembly in its current state, the changes that have been made to modernize it thus far, and our goals for the future. Much of the focus has been and will be on building a community of practice through events like open discussions on best practices, but some of the more mundane challenges will also be described, such as revitalizing our web presence and ensuring collaboration instead of competition with peer groups within and outside of our organization. We will also give a brief overview of our Improving Scientific Software Conference, our efforts to modernize it (e.g., using Jupyter Notebooks for proceedings), and how we use the Conference to drive interest in the SEA and vice versa. Finally, we will discuss some lessons learned about sustaining a long-running interest group, and mention some of the things we wish we had known at the start of this revitalization effort.

Honeycomb is a template repository that standardizes best practices for building jsPsych-based tasks. It offers continuous deployment for use in research settings, at home, and on the web. The project's main aim is to improve the ability of psychiatry researchers to build, deploy, maintain, reproduce, and share their own psychophysiological tasks (“behavioral experiments”).

Behavioral experiments are a useful tool for studying human behavior driven by mental processes such as cognitive control, reward evaluation, and learning. Neural mechanisms during behavioral tasks are often studied in the lab via simultaneous electrophysiological recordings. Uniquely registered participants may be asked to concurrently complete the task at home where connecting such specialized equipment is not feasible. Furthermore, online platforms such as Amazon Mechanical Turk (MTurk) and Prolific enable deployment of tasks to large populations simultaneously and at repeated intervals. Online distribution methods enable far more participation than what labs can handle in a reasonable amount of time.

Honeycomb addresses the key challenge of using a single code base to deliver a task in each of these environments. The benefits of Honeycomb were first seen in an ongoing study of deep brain stimulation for obsessive compulsive disorder. Subsequent projects have included research on decision making processes for people with obsessive compulsive disorder as well as gameplay style differences between control, obsessive compulsive disorder, and attention-deficit/hyperactivity disorder patients. The CCV additionally maintains a curated public library, termed BeeHive, of ready-to-use tasks.

The project is open-source and directly supported by the Center for Computation and Visualization at Brown University. It has been in active development since August of 2019 (currently version 3.4) with version 4 and 5 releases roadmapped. An ultimate goal of the project is to publish it as its own library to the node package manager (npm) registry.

Reading computer program code and documentation written by others is, we are told, one of the best ways to learn the art of writing readable, intelligible and maintainable code and documentation. This talk introduces the concept of software resurrection as a tool for learning from program code and documentation that are remote in time (e.g. 20 years old) and space (e.g. unfamiliar algorithms and tools). The software resurrection exercise requires a motivated learner to compile and test a historical software release version of a well maintained and widely adopted open source software on a modern hardware and software platform. The learner develops fixes for the issues encountered during compilation and testing of the software on a modern platform that could not have been foreseen at the time of its release. The exercise concludes by writing a critique which provides an opportunity to critically reflect on the experience of maintaining the historical software. An illustrative example of this exercise pursued on a version of the SQLite database engine released 20 years ago shows that software engineering principles (or, programming pearls) emerge during the reflective learning cycle of the software resurrection exercise.

The concept of software resurrection is similar to the "Learning by doing" methodology which is based on the experiential learning theory. Engaging with program code and documentation that are remote in time or space helps learners actively explore the experience of software maintenance. These experiences reveal the factors that contribute to readability, intelligibility and maintainability of program code and documentation.

Prerequisites: This talk is aimed at students, researchers and professionals who develop, support and maintain computer software. The talk includes an illustrative example based on software written in the C programming language, so a basic understanding of C will be useful. Since the concept of software resurrection applies in general to the field of software engineering, attendees will still be able to understand the key ideas even if they do not have a background in C.

Expected Outcomes: The attendees will learn about a novel method for teaching and learning software engineering principles by engaging with existing software code and documentation. The concepts described in this talk will allow the attendees to view the impact of existing software development and documentation practices from the perspective of a software maintainer.

Creating population estimates for the entire globe using machine learning is a challenging task. One challenge is gathering and combining vast amounts of global GIS data at high resolutions. Another challenge is processing the amount of complex GIS data required to make population estimation possible in a reasonable amount of time. Speed is important in research because of the need to iterate and evaluate the data for validity and accuracy. In this work, we present the challenges of taking research code from a Jupyter notebook and creating a cloud optimized solution using infrastructure as code (IaC) to deploy a cluster in OpenStack. We show the code modifications for speed performance improvements, comparisons of running machine learning on multithreaded CPUs versus GPUs, and the architecture design for running a global dataset on a Kubernetes cluster.

As both hardware and software become more prevalent in research computing, the user base of these systems has broadened considerably. Novice users from many backgrounds and at many stages of their careers are looking to make effective use of these resources, while retaining focus on their domain work.

Formal curricula for students of those domains may not have room for computational training. Non-student researchers face challenges making time to acquire the relevant expertise. Moreover, the parallelism of HPC systems adds complexity to an already demanding software development task. Consequently, large subsets of researchers have access to HPC resources without the technical skills to use them effectively.

These challenges are familiar to the Research Software Engineering community.

HPC Carpentry provides training solutions complementary to the research software engineering role, supporting effective use of novel, shared computing resources.

HPC Carpentry workshops are modeled after those of The Carpentries and take place over one or two days, providing a hands-on mode of instruction, where learners type along with instructors to acquire the basic skills necessary to get started on HPC systems. Learners are not expected to come away as experts, but instead with the "muscle memory" of how basic operations work on HPC systems, with a mental model of the shared HPC system and its resources, and with enough vocabulary to make self-directed training more accessible and effective.

This talk will describe the current state of the HPC Carpentry project, our strategic development plan for the workshops, current challenges, and the lesson content that we develop, teach, host, and cite.

Science gateways have emerged as a popular and powerful interface to computational resources for researchers. Most if not all of these science gateways now rely on container technology to improve portability and scalability while simplifying maintenance. However, this can lead to problems where the container image size can grow as more domain-specific packages and libraries are needed for the tools deployed on these containers. This is particularly relevant in the case of JupyterHub-based gateways, where the Python virtual environments or Conda environments underlying the Jupyter kernels can often grow in size and number.

For example, a JupyterHub gateway that I work on as part of the NSF-funded I-GUIDE institute required the installation of a large number of geospatial libraries, leading to the Jupyter notebook container images approaching several gigabytes in size. To combat this, our team decided to integrate Kubernetes with the CernVM File System (also known as CVMFS), a software distribution service that can provide software packages to the containers from a separate server.

As a first step in this integration, we had to deploy our own CVMFS server on a separate virtual machine and load it with the packages that were needed for distribution. CVMFS itself has two main servers, which are the stratum 0 and the stratum 1. The stratum 0 is the main server for configuration and packages, while the stratum 1 acts as a mirror of the stratum 0. Following the deployment of the stratum 0 server, we were able to install the necessary Conda environments and modules. The I-GUIDE JupyterHub platform is deployed on a Kubernetes cluster using the Zero2JupyterHub recipes. In order to integrate the CVMFS server with this Kubernetes cluster, we installed a CSI (container storage interface) driver provided by the developers of CVMFS to connect the stratum server to the Kubernetes cluster. This then enabled us to create the necessary storage class and persistent volumes in Kubernetes that could then be mounted into the Jupyter notebook containers to serve the necessary Conda environments. Ultimately, this resulted in containers having their sizes reduced significantly, from multiple gigabytes to only half of a single gigabyte!

In conclusion, science gateways are incredibly impactful for researchers, but can take more effort to maintain than most realize. Containers and virtual machines make this easier, yet can contribute to their own issues by becoming bloated over time. These size issues can be resolved with CVMFS, making the container sizes around three to four times smaller compared to before.

In this talk I will be presenting our deployment design as well as our experience through this deployment process and lessons learned.

To tackle the problem of sustainably training and developing a workforce, SDSC has experimented over the past decade with various strategies to shape a seasonal internship program that has met and exceeded its original goal of research software developer workforce training. Using modern agile frameworks, a novel summer training program, and minimal resources, SDSC has supported over 200 interns over the past four years who have learned about and supported research software development. Come hear the internship program founders, Ryan Nakashima and Jenny Nguyen, share both unsuccessful and successful strategies used to build the SDSC software development internship program and connect with them for follow-up discussions.

Reproducibility of research that is dependent on software and data is a persistent and ongoing problem. In this talk, I invite the emerging research software engineering community to leverage the half century of specific knowledge offered by cybersecurity professionals. I demonstrate that the practical needs of cybersecurity engineering and research software engineering overlap significantly. In addition to enabling reliable reproducibility of research, I illustrate how well-understood cybersecurity tools enable independent verification of research integrity and increase the public trust in open science.

Johns Hopkins Applied Physics Laboratory (APL) is the U.S.’s largest university-affiliated research center, home to over 9000 staff dedicated to making “Critical Contributions to Critical Challenges” for our various federal agencies. APL’s Space Exploration Sector alone has designed, built, and operated over 70 spacecraft missions; developed hundreds of specialized instruments for yet more missions; and collectively visited every planet in the solar system. Within the sector’s Space Science Research branch resides one of the largest RSE organizations we are currently aware of: our very own Space Analysis & Applications group, a team of 60+ research software engineers that directly support our missions and the scientific research enabled by them. Our talk will explore the history, functions, and operation of this group as a means to examine a mature RSE organization and to share our insights and experience with those US RSE colleagues developing and managing their own. Individual topics covered will include organizational structure, team composition, funding sources, work discovery, intake, and some brief visuals or demonstrations of the group’s software products and the research we have enabled.

Student opportunities are important for diversifying RSE and getting students hooked on research software engineering. Engaging with undergraduate and graduate students interested in scientific computing is both rewarding and beneficial to the RSE community, given there is not yet a clear academic curriculum nor career path to becoming a research software engineer. This talk will cover the macro aspects of RSE internships through the lens of SIParCS, a successful, long running internship program, and deep-dive into the micro aspects of working with student RSEs by sharing experiences at DART, an open source project at the intersection of science and software.

We’ll give a lightning overview of the 17 years of SIParCS, the Summer Internships in Parallel Computational Science at the NSF National Center for Atmospheric Research, including background, history and motivations. SIParCS provides opportunities for undergraduate and graduate students to gain hands-on experience in computational science, particularly focusing on high-performance computing, scientific computing, and data analysis. The program’s goal is to develop and diversify the next generation of computational scientists and engineers by offering holistic mentorship, professional development, and the chance to work on cutting-edge projects alongside experienced researchers.

DART, the Data Assimilation Research Testbed, has been fortunate to have various interns through SIParCS as well as part-time student RSE employees working year-round. We'll share our specific experiences, challenges, and triumphs working with student RSEs: what worked, what didn’t, and how summer internship RSEs differ from year-round part-time student RSEs. Everyone is different; what motivates and incentivizes people varies from person to person and can change over time. People’s time has value, and we want people spending that time on the most interesting and impactful thing they can be working on. Working with students requires a balance between getting quality work from them and ensuring the students benefit from this work and progress in their careers. We'll conclude with thoughts on future student interactions and possible community collaborations.

AutoRA – the Automated Research Assistant – is a growing collection of Python packages for running fully automated psychological experiments online. It allows the user to automate the specification of experimental conditions, data collection, and theory derivation, cycling back to specifying new experimental conditions.

One primary goal of the PI was to allow unaffiliated developers to contribute new methods for generating experimental conditions, and new regression techniques for theory derivation. But taking the naïve monorepo approach would be too costly. It would require either 1) testing all of the contributions for every change to the code, which is expensive because some of the contributions train neural networks as part of their execution and take hours to run; or 2) configuring the CI so that only relevant parts of the code are tested for each pull request, which would mean a high maintenance burden. Furthermore, since this work is about applying ML to experiments run on people, it’s vital that every submission be ethically vetted before it can be part of the official release.

Thus, one primary goal of our work was to allow for decentralized extensibility, so that contributors unaffiliated with the core team could easily innovate and share new functionality without incurring high centralized maintenance and testing costs. Another was to ensure that contributions could be vetted and included easily.

We’ll present how RSEs helped to establish a common interface based around a simple functional paradigm, with namespace packages spread across multiple repositories so that each contributor could be owner of and responsible for their own work, and how their contributions are integrated into the main package. We’ll also look at how we enable contributors to start their work quickly using templates.

Ecological Momentary Assessments (EMA) are often used in the field of Psychology to deliver multiple data collection instruments to study participants at several time points in the day. As these are often used to study participants’ current (at the time of receiving the notification) mood, activity, or immediate company, it is important that participants cannot anticipate notifications arriving at fixed times throughout the day. However, most traditional electronic data capture (EDC) systems require participant notification schedules to be pre-determined, with little to no room for sending random events individualized to a participant’s environment (wake-up time, etc.). Given this problem, our team of RSEs has developed a cloud-first random EMA notification system that can serve thousands of participants multiple random EMA push notifications throughout the day. The system can track user wake and sleep times, adapt to weekend or weekday modes, and be configured to work with different randomization logic and anchors (points around which to randomize). During development, the team prioritized the use of proven architectural building blocks to maximize uptime, reduce cost, and speed up development. More importantly, the system was built to evolve hand in hand with the changing requirements from research stakeholders. In this talk we will go over how we built this system using Amazon Web Services Event Schedulers, low-cost serverless components, lessons learned from testing across various time zones, and compliance monitoring. We will look at the initial design choices, their limitations, and how they had to be adapted. Finally, we will go over how the solution integrates with existing commercial EDC products such as CareEvolution’s MyDataHelps offering. The solution is currently open sourced and can be adapted by RSE teams for their own stakeholders’ studies.
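To make the randomization idea concrete, here is a minimal sketch of one possible scheme: the waking period is split into equal windows, each window's start acts as an anchor, and one notification time is drawn at random after each anchor. The `PROFILES` table, the 30-minute minimum gap, and all function names are assumptions made for illustration; this is not the team's implementation and does not use any AWS service.

```python
import random
from datetime import date, datetime, time, timedelta

# Hypothetical wake/sleep profiles; a real system would track these per participant.
PROFILES = {
    "weekday": (time(7, 0), time(22, 0)),
    "weekend": (time(9, 0), time(23, 0)),
}


def day_mode(day):
    """Return 'weekend' for Saturday/Sunday, otherwise 'weekday'."""
    return "weekend" if day.weekday() >= 5 else "weekday"


def schedule_notifications(day, count, min_gap=timedelta(minutes=30), rng=None):
    """Draw one random notification time per equal window of the waking period.

    Each draw is kept at least `min_gap` before the end of its window, so
    consecutive notifications are always at least `min_gap` apart.
    """
    if rng is None:
        rng = random.Random()
    wake_t, sleep_t = PROFILES[day_mode(day)]
    wake = datetime.combine(day, wake_t)
    sleep = datetime.combine(day, sleep_t)
    window = (sleep - wake) / count
    if window <= min_gap:
        raise ValueError("too many notifications for the waking period")
    times = []
    for i in range(count):
        anchor = wake + i * window  # point around which to randomize
        offset = rng.uniform(0, (window - min_gap).total_seconds())
        times.append(anchor + timedelta(seconds=offset))
    return times


saturday = date(2024, 10, 19)
plan = schedule_notifications(saturday, count=4, rng=random.Random(42))
```

Passing a seeded `random.Random` makes a schedule reproducible for testing, while production use would leave `rng` unset so participants cannot predict arrival times.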

Fortran still occupies a significant fraction of the workloads at scientific computing centers, and many projects are still under active development by research software engineers (RSEs). In this talk I will describe how the National Energy Research Scientific Computing Center (NERSC) provides a holistic support structure for our users, and especially RSEs, who take advantage of Fortran.

Computational notebooks provide a dynamic and interactive approach to scientific communication and significantly enhance the reproducibility of scientific findings by seamlessly integrating code, data, images, and narrative texts. While notebooks are increasing in popularity among researchers, the traditional academic publishing paradigm often requires authors to extract elements from their notebooks into another format, losing the interactive and integrative benefits of the notebook format.

In response to this evolving landscape, the Software Engineering Assembly (SEA) Improving Scientific Software Conference (ISS) has built a framework for paper submissions that accommodates Jupyter Notebooks and Markdown files. This approach is designed to enhance the transparency and accessibility of research, enabling authors to submit and share their work in a more dynamic and interactive format.

In this presentation, we will talk about this deployed framework and how it can be easily adopted for future conferences and journals. This framework is built on top of open-source tools such as Jupyter Notebooks, JupyterBook, and Binder. In this framework, we utilized GitHub Workflows for the automated build and deployment of submissions into a cohesive JupyterBook format. The presentation will cover the challenges and solutions encountered in implementing this framework, aiming for its application in future conferences. Additionally, we will share insights and experiences from developing and deploying this ecosystem, emphasizing how it can fundamentally change the way research is published, shared, and assessed within the open science and reproducibility paradigm.

Research software engineers and research data curators face similar challenges in their efforts to support truly reproducible science. This talk anticipates a future in which the research software engineering and research data curation communities identify ways to align their respective efforts in promoting best practices. We present a project to develop specialized training for curating research software as one such opportunity.

There is great interest on the part of both scientific communities and funding agencies to see that science research is reproducible, and that the products of research (both data and code) are “Findable, Accessible, Interoperable, and Reusable” (FAIR) now and into the future. To enhance reproducibility and FAIRness, funding agencies typically require that grant applicants file so-called Data Management Plans (DMPs); journals increasingly require that authors deposit code and data in a certified repository and link those artifacts to their publications; and community and institutional repositories work to ensure quality of deposits by employing curators to examine, approve, and sometimes validate and even improve deposited code and data. The curation step is critical in maintaining viable research data lifecycles, and requires that curation workflows be implemented and that curation staff be funded and trained.

The Data Curation Network (DCN) is a membership organization of institutional and non-profit data repositories whose vision is to advance open research by making data more ethical, reusable, and understandable. Its mission is to empower researchers to publish high quality data in an ethical and FAIR way, collaboratively advance the art and science of data curation by creating, adopting, and openly sharing best practices, and support thoughtful, innovative, and inclusive data curation training and professional development opportunities.

Last year the DCN, with project leadership from Duke University, obtained funding from the Institute of Museum and Library Services (IMLS) to create course curricula to train new curators in addressing curation of data types that require specialized knowledge and often warrant specific types of treatment and analysis (IMLS Award no. RE-252343-OLS-22). These specialized data types include geospatial data, scientific images, simulations and models, and, last but not least, code. The presenters are members of the cohort that developed specialized training for curating code. In our talk, we will give an overview of the topics covered in the pilot workshop, which include introductory topics such as dependency management, licensing, and documentation, as well as more advanced topics such as containerization and nondeterminism. We will also describe how these topics apply to research software developers wishing to create more reproducible and sustainable code.

At the National Center for Supercomputing Applications (NCSA), our team of research software UIX (User Interface and User Experience) designers is dedicated to enhancing academic research applications through innovative design thinking and user-centric methodologies. Since our expansion in 2021, we have successfully collaborated on over 30 applications across diverse scientific domains, underscoring the growing demand for design as part of the research software development process.

Our presentation will delve into the principles of design and design thinking, highlighting the distinction between UI and UX design (and why both are important), and describe our role as user advocates. We will outline our comprehensive design process, which includes discovery, ideation, and implementation phases, and the highly iterative and user-engaged form this process takes. We will give an overview of design workflow tasks such as user research, wireframing, rapid prototyping, high-fidelity design production and usability testing, all facilitated by tools like Figma for collaboration with stakeholders, streamlined handoff and communication with developers.

We will also showcase an example project to illustrate our project lifecycle, from initial requirements gathering to design audits and continuous process improvement. Our collaborative, cross-functional teams, which integrate designers, developers, and research scientists, are pivotal in producing high-quality, sustainable software. By prioritizing user experience, we ensure that our applications not only meet the technical needs of scientists and researchers but also provide delightful and efficient user interactions, fostering faster onboarding and greater adoption. Join us to explore how thoughtful design and interdisciplinary collaboration can lead to more effective and impactful research software solutions.

In comparison to traditional software engineers, research software engineers (RSEs) often come to software engineering from scientific domains and may lack formal training. As the field continues to develop, direct educational pathways and formal training are likely to expand. Questions about best practices for training students and early career RSEs must be answered to ensure new RSEs are able to contribute high quality code. How should we train students with minimal experience to work on real-world projects? How do we bridge the gap between classroom learning and the expectations of writing reproducible code? What assumptions can be made about what students can (and should) learn themselves, and what do they need to be explicitly taught? The University of Chicago’s Data Science Institute has been able to wrestle with these questions over the past three years by engaging with students via its experiential, project-based, Data Science Clinic course.

The Data Science Clinic is a useful setting for asking these questions and testing related hypotheses. The clinic works with 3 cohorts of students each year and typically has more than 50 students per cohort. This provides a great environment for iterating on best practices. Preparing students who are interested in research software engineering careers is similar to training early-career RSEs who are coming from backgrounds with limited computer science education. Students in the clinic come from diverse backgrounds, with both master’s and undergraduate students well-represented, but most have taken only one university computer science course. This level of formal computer science background is similar to that of many RSEs coming from non-computer science backgrounds. Additionally, code reproducibility and code quality are often lower priorities on both student projects and RSE projects.

The Data Science Clinic has led to the important conclusion that relying on assumptions about student background knowledge leads to negative outcomes. When using experiential learning or project-based classes, it's easy to have a biased view of student understanding since only the most confident and engaged students are likely to volunteer to participate. These advanced students can shoulder most of the load of a project and make quite a bit of progress, while allowing the students who would gain the most from direct instruction to coast undetected. In reality, many students lack robust mental models of computer operation, familiarity with essential terms and concepts, and an appreciation for software engineering best practices. Overcoming these challenges requires significant investment from experienced mentors.

The purpose of this talk is to share these lessons and conclusions, discuss why these problems are so difficult, and consider next steps.

Researchers support reproducibility and rigorous science by sharing and reviewing research artifacts: the documentation and code necessary to replicate a computational study (Hermann et al., 2020; Papadopoulos et al., 2021). Creating and sharing quality research artifacts and conducting reviews for conferences and journals are considered to be time-consuming and poorly rewarded activities (Balenson et al., 2021; Collberg & Proebsting, 2016; Levin & Leonelli, 2017). To simplify these scholarly tasks, we studied the work of artifact evaluation for a recent ACM conference. We analyzed artifact READMEs and reviewers’ comments, reviews, and responses to surveys. Through this work, we identified common issues reviewers faced and the features of high quality artifacts. To lessen the time and difficulty of artifact creation and evaluation, we suggest ways to improve artifact documentation and identify opportunities for research infrastructure to better meet authors' and reviewers' needs. By applying the knowledge gleaned through our study, we hope to improve the usability of research infrastructure and, consequently, the reproducibility of research artifacts.

Most research projects today involve the development of some research software. Therefore, it has become more important than ever to make research software reusable to enhance transparency, prevent duplicate efforts, and ultimately increase the pace of discoveries. The Findable, Accessible, Interoperable, and Reusable (FAIR) Principles for Research Software (or FAIR4RS Principles) provide a general framework for achieving that [1]. Just like the original FAIR Principles [2], the FAIR4RS Principles are designed to be aspirational and do not provide actionable instructions. To make their implementation easy, we established the FAIR Biomedical Research Software (FAIR-BioRS) guidelines, which are minimal, actionable, and step-by-step guidelines that biomedical researchers can follow to make their research software compliant with the FAIR4RS Principles [3,4]. While they are designed to be easy to follow, we learned that the FAIR-BioRS guidelines can still be time-consuming to implement, especially for researchers without formal software development training. They are also prone to user errors, as they require several actions with each new release of the software.

To address this challenge, we are developing codefair, a free and open-source GitHub app that acts as a personal assistant for making research software FAIR in line with the FAIR-BioRS guidelines [5,6]. The objective of codefair is to minimize developers’ time and effort in making their software FAIR so they can focus on the primary goals of their software. To use codefair, developers only need to install it from the GitHub marketplace. By leveraging tools such as Probot and the GitHub API, codefair monitors activities on the software repository and communicates with the developers via a GitHub issue “dashboard” that lists issues related to FAIR compliance (updated with each new commit). For each issue, there is a link that takes the developer to the codefair user interface (built with Nuxt, Naive UI and Tailwind) where they can better understand the issue, address it through an intuitive interface, and automatically submit a pull request with the changes needed to address it. Currently, codefair is in the early stages of development and helps with including essential metadata elements such as a license file, a CITATION.cff metadata file, and a codemeta.json metadata file. Additional features are being added to provide support for complying with language-specific standards and best coding practices, archiving on Zenodo and Software Heritage, registering on, and much more to cover all the requirements for making software FAIR.
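
As a rough illustration of the kind of check described above, the following minimal Python sketch reports which of the three metadata elements named in the text (a license file, CITATION.cff, and codemeta.json) are absent from a local checkout. It is not codefair's implementation: the function name `missing_metadata` and the accepted license file names are assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Metadata elements named in the text. The accepted LICENSE file names are an
# assumption for illustration, not codefair's actual rules.
REQUIRED_METADATA = {
    "license": ("LICENSE", "LICENSE.md", "LICENSE.txt"),
    "citation": ("CITATION.cff",),
    "codemeta": ("codemeta.json",),
}


def missing_metadata(repo_root):
    """Return the sorted names of metadata elements with no matching file in repo_root."""
    root = Path(repo_root)
    missing = []
    for element, candidates in REQUIRED_METADATA.items():
        # An element counts as present if any of its candidate files exists.
        if not any((root / name).is_file() for name in candidates):
            missing.append(element)
    return sorted(missing)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory that only contains a license file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "LICENSE").write_text("MIT\n", encoding="utf-8")
        print(missing_metadata(tmp))
```

A real assistant would also need to validate the contents of these files; this sketch only checks that they exist.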

In this talk, we will highlight the current features of codefair, discuss upcoming features, and explain how the community can benefit from it and contribute to it. We believe codefair is an essential and impactful tool for enabling software curation at scale and turning FAIR software into reality. The application of codefair is not limited to making biomedical research software FAIR, as it can be extended to other fields and can also support software management aspects outside of the FAIR Principles, such as software quality and security. We believe this work is fully aligned with the US-RSE’24 topic of “Software engineering approaches supporting research”. The conference participants will benefit from this talk as they will learn about a tool that can enhance their software development practices. We will similarly benefit, as we are looking for community awareness and contribution in the development of codefair, which is not currently supported through any funding but is the result of the authors' aim to reduce the burden of making software FAIR on fellow developers.

Small to medium-sized research projects require increasingly sophisticated software stacks as demand continues to grow for high performance computing (HPC) resources and Kubernetes clusters for web-based applications. Frequently these smaller projects do not have funding for dedicated DevOps engineers and require their RSEs to take on that role. The effort required to manually provision each layer of this stack, from cluster node operating system configuration to application deployment, will become infeasible without force-multiplying innovations, especially given the scarcity of RSEs. Often these tasks are done early in a project and need to be re-learned for the next one. Additionally, work that would normally draw on a DevOps engineer's wealth of knowledge, such as securing these systems and upgrading them during the project, falls on the RSE, further reducing the already scarce time available to develop the application.

We present the approach developed at NCSA to address this problem: a GitOps-based method of bootstrapping virtual computing resources and Kubernetes clusters for composable deployment of collaborative tools and services. Leveraging industry-standard software solutions, we provide a free and open source foundation upon which open science can flourish, with an emphasis on decentralized applications and protocols where possible. On top of this infrastructure, called DecentCI, we can add new layers that allow an RSE to quickly get a complex system up and running, with shared access to data, forums for sharing ideas, private messaging, websites, etc.

Building on the knowledge gained from many projects, we have created a set of recipes that allow a new project to be up and running in under 30 minutes. For example, in the case of Kubernetes, nodes will be created and configured, and clusters will be initialized with ingress controllers, secret management, storage classes, etc. (all of this is configurable on a per-cluster basis). The deployed clusters can easily be upgraded by applying newer, centrally managed modules. Functionality added centrally can be rolled out to the clusters over time.

During this talk we will discuss which tools are centrally managed and which tools are installed in each cluster. We will describe how an RSE can add their applications to the system, use well-understood Git workflows to deploy new applications, and work with other RSEs on the project. The end goal is a decentralized system that empowers the RSE to get new applications to scientists faster and more securely to help with their research.

Domain research, particularly in the life sciences, has become increasingly complex due to the growing diversity and volume of data, along with the associated analytical methods and software. At the same time, researchers must treat the trustworthiness of the software tools they use as a matter of the highest importance. As with any new physical laboratory technique, researchers should test and assess any software they use in the context of their planned research objectives.

As examples, bioinformatics software developers and contributors to community platforms that host a variety of domain-specific tools, such as KBase (the DOE Systems Biology Knowledgebase) and Galaxy, should design their tools with consideration for how users can assess and validate the correctness of their applications before opening their applications up to the community.

More attention should be placed on ensuring that computational tools offer robust platforms for comparing experimental results and data across diverse studies. Many domain tools suffer from inadequate documentation, limited extensibility, and varying degrees of accuracy in data representation. This lack of standardization in biological research, in particular, diminishes the potential for groundbreaking insights and discoveries while also complicating domain scientists' ability to experiment, compare findings, and confidently trust results across different studies.

Through several examples of tools in the biology domain, we demonstrate the issues that can arise in these types of community-built domain-specific applications. Despite their open-source nature, we note issues related to transparency and accessibility resulting in unexpected behaviors that required direct engagement with developers to resolve. This experience underscores the importance of deeper openness and clarity in scientific software to ensure robustness and reliability in computational analyses.

Finally, we share several lessons learned that extend to research software in general and discuss suggestions for the community.

Citing software in research is crucially important for many reasons, from reproducibility, to bolstering the career of the research software engineers who worked on the code, to understanding the provenance of ideas. With the inclusion of DOIs on Zenodo, CRAN, and through integrations with GitHub, it is easier than ever to cite software as a first order research object. However, there are no standards on what software should be cited in a paper, and authors often fail to cite software, or only cite well-known, charismatic, user-facing packages. There are few attempts at citing dependencies, in particular.

Here, we took citing software as an ethical research goal to its logical but unfeasible conclusion, citing all dependencies for software used in a research package, not only the top-level package itself. We present the open source tool we used for finding DOIs and citation.cff files within dependencies, and discuss the implications of large numbers of citations within a paper that uses research software. In particular, we encourage the adoption of software bills of materials (SBOMs) for citing software, especially research software.
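
To make the idea of finding citation metadata within dependencies concrete, here is a minimal Python sketch. It is not the authors' tool: it assumes that each dependency's source tree sits in its own subdirectory of a common folder, and it pulls the first `doi:` entry out of each CITATION.cff with a simple line-based heuristic instead of a full YAML parser. The function name `find_citation_dois` and the DOI in the demo are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Heuristic: match a line such as `doi: 10.1234/abcd` or `doi: "10.1234/abcd"`.
# A full YAML parser would be more robust; this is only a sketch.
DOI_LINE = re.compile(r'''^\s*doi:\s*["']?([^"'\s#]+)''', re.IGNORECASE)


def find_citation_dois(deps_root):
    """Map each dependency directory that has a CITATION.cff to its first DOI (or None)."""
    results = {}
    for cff in sorted(Path(deps_root).glob("*/CITATION.cff")):
        doi = None
        for line in cff.read_text(encoding="utf-8").splitlines():
            match = DOI_LINE.match(line)
            if match:
                doi = match.group(1)
                break
        results[cff.parent.name] = doi
    return results


if __name__ == "__main__":
    # Hypothetical layout: one folder per dependency, with a made-up DOI.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "examplepkg"
        pkg.mkdir()
        (pkg / "CITATION.cff").write_text(
            'cff-version: 1.2.0\ndoi: "10.0000/example.1"\n', encoding="utf-8"
        )
        print(find_citation_dois(tmp))
```

Dependencies without a CITATION.cff are simply absent from the result, which makes it easy to see how many packages in a dependency tree offer no citation metadata at all.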

The Department of Energy Systems Biology Knowledgebase (KBase) is a community-driven research platform designed for discovery, knowledge creation, and sharing. KBase is a free, open access platform that integrates data and analysis tools into reproducible Narratives, provides access to scalable DOE computing infrastructure, and enables collaborative research and publishing of findable, accessible, interoperable, and reusable (FAIR) analyses.

The KBase Narrative - the primary user interface for the KBase platform - is built on top of the Jupyter Notebook application. This interface enables platform users to access an array of wrapped tools (apps) that are used throughout the computational biology community, many of which produce their own data visualizations, which are also made available in Narratives. Within a Narrative, users can perform analyses, display interactive results, and record interpretations. In contrast to analysis workflows commonly used in bioinformatics, where researchers will run individual tools (potentially hosted in different locations) sequentially, KBase allows users to build custom analytical pathways with notebooks, where all the tools and data are contained in a single place to enable reproducibility of analysis and provide data provenance.

The platform is built around creating a welcoming user experience for users with a broad range of biology, bioinformatics, and computational expertise. To accomplish this, the KBase Narrative uses a GUI to generate code that runs analysis tools on DOE computing infrastructure. The output of these apps can be made more comprehensible for users in the form of interactive data visualizations, reports, or a simple list of data objects created by a tool. At the same time, the Jupyter Notebook allows more advanced users to supplement their app runs with custom Python code.

Reproducibility is one of the major concerns of the system: Narratives and apps are versioned, and all data in the system receives a persistent unique ID, so analyses can be rerun to ensure that the same results are achieved. There is also an emphasis on tracking data through KBase and recording the transformations it undergoes through the provenance system; every data object has an immutable record of how it was produced, and this provenance chain can be followed forwards or backwards to view the original inputs or see the eventual output of a set of analyses.
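
The forward and backward traversal of a provenance chain described above can be sketched as a walk over a directed graph. The following Python example is only an illustration under assumed names (`ProvenanceGraph`, `record`, `ancestors`, `descendants`); it does not reflect how KBase stores or queries provenance, and it does not enforce immutability of the records.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance store: records which objects each data object was derived from."""

    def __init__(self):
        self._inputs = defaultdict(set)   # object ID -> IDs it was produced from
        self._outputs = defaultdict(set)  # object ID -> IDs produced from it

    def record(self, output_id, input_ids):
        """Record that output_id was produced from the given input IDs."""
        for input_id in input_ids:
            self._inputs[output_id].add(input_id)
            self._outputs[input_id].add(output_id)

    def _walk(self, start, edges):
        # Depth-first traversal that collects every node reachable from start.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for neighbor in edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen

    def ancestors(self, object_id):
        """Follow the chain backwards to every original and intermediate input."""
        return self._walk(object_id, self._inputs)

    def descendants(self, object_id):
        """Follow the chain forwards to every eventual output."""
        return self._walk(object_id, self._outputs)


if __name__ == "__main__":
    graph = ProvenanceGraph()
    graph.record("assembly", ["reads"])
    graph.record("annotation", ["assembly"])
    print(sorted(graph.ancestors("annotation")))
    print(sorted(graph.descendants("reads")))
```

Keeping both edge directions in the store makes backward queries (what was this made from?) and forward queries (what was made from this?) equally cheap.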

KBase also strives to provide FAIR data access. Recent work has focused on assigning credit to users who do analyses and publish data generated through KBase. In an early step toward creating a publishable Narrative, these documents can be exported to a “static” format providing a frozen snapshot detailing the analysis steps and data. Furthermore, DOIs (digital object identifiers) can be assigned to the static Narrative. These features can be used to support reproducibility in a publication and to ensure that users are credited for their work. Markdown cells provide a mechanism for users to extend the documentation automatically created by data provenance and add context that explains the background of the data imported into KBase.

Together, these features make the KBase Narrative an application where analyses, results, and interpretations can be viewed and shared in a single interactive document. There are still many challenges ahead for KBase as it steps toward making publishable Narratives. These include updating the user experience as the platform expands and caters to a growing user-base, and challenges with adapting to recent advances in large language models and their utility in biological data science. We welcome and encourage community feedback and discussion.

The Research Engineering Group at The Alan Turing Institute started our RSE journey 8 years ago as a new team at a new institute. Founded in 2016 and without the usual constraints and advantages of RSE groups based in universities, the team has had to find its own path in a rapidly evolving Institute and field.

Over this time the Turing has grown and evolved considerably as a national institute, adding AI to its initial data science remit and refining its science and innovation agenda from a broad programme-based approach to a more focussed challenge-led one. As the Institute has grown and evolved over the years, so has the Research Engineering Group, growing from 4 to 45 and going through several iterations of how we structure ourselves and operate as a team in order to support the Institute's research.

As the team has evolved, we've expanded our range of research engineering roles to include those more focussed on data and computing, and we've built a sustainable career pathway for these roles within the Institute. Over the years we have refined our approach to recruitment, professional development and career progression to attract a diverse range of talent and support them in their professional journey, with a clear pathway from our Junior level training role to our Principal level team leadership role.

This journey has been guided by our principles: transparency in leadership and decisions; diversity of talent, people and experience; supporting individuals in their career journey; and focus on our role as expert collaborators. As we've progressed along this journey ourselves, we've also looked to support others in doing so - both within the Turing as it has established other teams of related research infrastructure professionals, and across the wider RSE community as other organisations have looked to establish or scale their own similar teams.

In this talk, we will share our journey and the lessons we've learned along the way. We hope that our story will be of interest both to those in leadership roles looking to establish or grow RSE teams at their own organisations and to team members within existing teams thinking about how they organise themselves, their work and their culture as they grow and evolve as a team.

We present RainFlow as a case study in how collaborations between bench scientists and software developers can deliver impactful solutions to the reproducibility crisis in scientific research.

RainFlow is a macOS desktop application developed for Reproducible Analysis and Integration of Flow-cytometry data. Flow cytometers are routinely used to collect rich biological data in clinical settings and research laboratories alike. Modern flow cytometers are extremely sensitive instruments that can measure the expression of 25-40 different proteins in millions of single cells in a matter of minutes. However, deriving actionable insights from this high-dimensional, high-volume data is hindered by the lack of reproducible analysis techniques.

Lack of reproducibility affects two aspects of the research data pipeline. First, technical noise in the sensitive instrumentation can confound accurate protein signal measurement during data collection. Second, variations in the analytical choices made by individual researchers can confound reproducibility during data analysis.

We developed RainFlow in an effort to automate the process of data cleaning and analysis for flow cytometry experiments. First, to decouple technical noise from true biological variation, we developed custom machine learning pipelines that reproducibly correct technical noise in the signal and produce a quality score for each sample. The quality score can then be used to select high-quality samples before integrating several batches of data for downstream analysis. Second, to aid researchers in making good analytical choices and recording every analytical choice, we packaged the algorithms into a user-friendly macOS desktop application called RainFlow.

RainFlow was specifically designed to be accessible to researchers with little to no coding ability, i.e., researchers who have “bench skills” for experimental data collection but not necessarily computational data analysis skills. RainFlow guides the researcher step by step through transforming raw flow-cytometry data into cleaned, batch-normalized, quality-controlled data ready for integration. In addition, it automatically records every data decision taken by the researcher during the analysis process and exports the parameters for easy sharing. At every step, the researcher is able to visualize the effect of the machine learning corrections on the data distribution. Informational guides explain what each individual step or algorithm does and how best to select the required analytical parameters. Additionally, we sought to automate parameter selection as much as possible, so that fewer decisions rely on manual expertise.

This talk will focus on the lessons learnt during the development of RainFlow, which we hope will be more broadly applicable for the research software engineering community. RainFlow was released in the Apple App Store in May 2024 and is available for free download.

Research Software Engineering (RSE) covers a wide spectrum of people who fall somewhere in between domain research science and software engineering. While this makes the community highly inclusive, some people find it difficult to know whether they qualify as an RSE and therefore hesitate to engage. In this talk I will share my personal journey from research in software engineering (SER) to RSE.

As someone who was never formally a software engineer in the classic sense but a researcher using software engineering methods in domain science, I never felt like I had any particular identity. Upon first hearing the term “RSE”, I immediately identified with it. However, over the next two years of slowly engaging with the community, including attending US-RSE’23, I was still hesitant to see myself as one, as my journey and position looked different from those of most of the people I was seeing. It wasn’t until I attended a recent Dagstuhl seminar that brought together SERs and RSEs that I was able to debate my insecurities first-hand and settle into my identity.

Throughout my experiences I have met a wide array of different types of RSEs, each with their own background, skill set, job title, daily practices, team composition, career priorities, and challenges. Many of these types I have yet to see well-represented or understood in the community. In my talk I will not only share my personal experiences, but also highlight several examples of diverse types of people who identify as RSEs, in order to provide a broader representation to the community and to reassure anyone on the edges, as I once needed to be reassured, that they do belong.

In the spirit of this year's theme, we will present the past, present, and possible future of RSEs at the National Center for Supercomputing Applications (NCSA), which was founded in 1986 as one of the original five centers in NSF’s Supercomputing Centers program. While High Performance Computing (HPC) was the center's initial emphasis, software was also a key part of NCSA's work from the start, ebbing and flowing over time with a number of broad-reaching applications, early insights into areas such as applied AI, and the need to support UIX within research software. This led to the growth of RSEs at NCSA to a body of 50 or so RSEs today, supporting scores of projects across every scientific domain, identifying common needs, and through that building larger, more sustainable software frameworks.

During the early years of the Center, the Software Development Group was formed, and it quickly began to produce a number of globally impactful software packages for the community, such as NCSA Telnet, Iperf, HDF (Hierarchical Data Format), Habanero, and other tools. This work continued, and in 1993 NCSA released NCSA Mosaic, the first widespread graphical web browser, which directly led to Netscape, Internet Explorer, and Spyglass, and NCSA httpd, which led to Apache httpd, which in turn drove 90% of web servers at its peak. Though all were built around enabling the use of supercomputers during the growth of the internet, they also had an enormous broader impact on the general public. During this period, NSF funded NCSA (and likewise our sibling centers as part of the Supercomputing Centers Program) through a "block grant" model that supported the majority of activity at NCSA; funding was ~$35M per year. The funding model was key to this success, since it allowed NCSA staff to explore ideas more freely, which resulted in the significant contributions NCSA made. In 1997, that changed as NSF shifted from the block grant model to funding efforts through a set of independently awarded grants for specific work. As a result, software developers were scattered: some joined smaller groups that supported less traditional users who did not need HPC, some left the Center, and others supported other people's software on HPC resources as the Center took on a much stronger HPC support and hardware focus across a chain of large NSF efforts such as TeraGrid, XSEDE, and Blue Waters.

The subsequent evolution of RSEs at NCSA had very grassroots beginnings, when a handful of these small groups developing software decided to join forces: rather than competing with each other for collaborations, grants, and staff, they instead worked together, jointly pursued funding, shared resources, and gave everyone greater security through a larger portfolio of collaborators and projects. The initial coalition was founded on a charter that prioritized trust for improved efficiency. It emphasized respecting the PI’s role on projects and refraining from interfering in another group’s project unless invited. The coalition also committed to supporting each other if one group experienced a shortfall in projects. Over time other groups joined, and through that, software gained a larger voice, enabling it to push for changes such as more efficient hiring practices, support for green cards, standing up more flexible on-prem cloud-based resources to support interactive web services and data sharing, adoption of the RSE title as an official campus title, and a recognized career path. Today software exists as a top-level directorate within the NCSA organization. This talk will walk through the key changes during the evolution of NCSA's RSE role in a manner that can be leveraged by other RSEs starting new groups.

Have you ever asked, “Did this output use the right version of the inputs and code?”, “What software does this program require to execute again?”, “How can I convert this pile of scripts to a containerized workflow?”, “How does the ancestry of these two outputs differ?”, “Who is using my software library?”, or a similar question? These are all examples of questions that computational provenance can help answer. Computational provenance is the process by which a certain computational artifact was generated, including its inputs (e.g., data, libraries, executables) and the computational provenance of those inputs, as illustrated, for example, in the figure on the right.

How can computational provenance be collected? We could ask application developers to emit this kind of data, but that approach requires a herculean effort to get all applications to comply. We could require the user to use workflow systems that explicitly declare the inputs and outputs of every node, but this approach shifts the compliance burden onto the user: if the user misspecifies the workflow, it may still execute, but the provenance would be wrong. The “holy grail” would be to collect provenance data at the system level without modifying any application code, requiring superuser privileges, or harming performance.

Almost all prior attempts at unprivileged system-level provenance collection used the ptrace system call, which asks the Linux kernel to switch to the tracing process every time the tracee process executes a system call (this is how the strace binary works). Ptrace-based tracers meet most of the technical requirements but are prohibitively slow. Our recently accepted work (Grayson et al. 2024) observed a geometric mean traced runtime of 1.5x for CARE (Janin et al. 2014), 2.5x for RR (O'Callahan et al. 2017), and 3x for ReproZip (Chirigati et al. 2016) relative to the untraced runtime. Ptrace involves a context switch from the tracee process to the tracer process and back on every system call, of which there could be thousands per second. Each context switch causes scheduler overhead and clears caches (especially the translation lookaside buffer).
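
For readers unfamiliar with how a slowdown such as "1.5x" is summarized across benchmarks, the following minimal Python sketch computes the geometric mean of per-benchmark runtime ratios. The function name `geomean_slowdown` and the timings in the demo are hypothetical and are not measurements from the cited work.

```python
from statistics import geometric_mean


def geomean_slowdown(traced_seconds, untraced_seconds):
    """Geometric mean of traced/untraced runtime ratios, one pair per benchmark."""
    if len(traced_seconds) != len(untraced_seconds):
        raise ValueError("expected one traced and one untraced runtime per benchmark")
    ratios = [traced / untraced for traced, untraced in zip(traced_seconds, untraced_seconds)]
    return geometric_mean(ratios)


if __name__ == "__main__":
    # Hypothetical timings for two benchmarks: slowdowns of 2x and 4x.
    print(geomean_slowdown([2.0, 8.0], [1.0, 2.0]))
```

The geometric mean is the usual choice for averaging ratios because a 2x slowdown on one benchmark and a 0.5x "slowdown" on another cancel out to 1x, which an arithmetic mean would not report.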

We propose PROBE (Provenance for Replay OBservation Engine), a tool that collects system-level provenance using library interpositioning (Curry 1994), also known as the LD_PRELOAD trick. Library interpositioning happens entirely within the same virtual address space and therefore does not involve any additional context switching. We have a working research prototype operating this way.

Another possible reason system-level provenance hasn't caught on is the lack of downstream tooling. We are developing several consumers of provenance, including a graphical viewer, an automatic containerization tool, an environment "diff", and a software citation generator. Furthermore, we export our provenance to the Process Run RO Crate format (Leo et al. 2023), so that it is interoperable with other provenance consumers.

Life sciences research increasingly requires researchers to do more difficult tasks, as datasets become larger and more complex and new statistical methods are advanced. Researchers consequently need to create and use more research software tools to manage data and analyses. The majority of researchers in life sciences lag behind on the computational skills that they need to stay on the cutting edge of modern research. Most are self-taught, as computational skills are mostly still not taught in the curriculum. As these are challenging skills to teach oneself, the process is difficult and leads to gaps or inaccuracies in knowledge. Additionally, most researchers do not have the time to become software engineering experts while doing all of their other necessary tasks.

Our group at the University of Arizona is addressing this problem by helping life sciences researchers increase their computational skills through training and collaboration. We are a small group of data scientists and research software engineers embedded in a division that includes departments for agriculture, plant and animal sciences, and environmental science. We devote a substantial amount of our time to teaching researchers in the division through a variety of offerings. We develop curriculum and hold workshops, workshop series, learning groups, and lab-level trainings on a variety of intermediate topics in good software practices, programming libraries, and version control. We also teach a lot of people one-on-one. We approach teaching in an inclusive and accessible way, and hold the philosophy that almost everyone is capable of learning these skills. We build community among our researchers, connecting folks who are isolated in their labs or departments. We have also discussed with many people what paths there are to move into research software engineering as a career.

The second part of our group's approach is to have devoted practitioners collaborate with life sciences researchers. We have advanced skillsets that researchers cannot have themselves because we focus on learning skills and new tools to develop software, advance data management, and improve reproducibility. By teaching researchers when possible and doing the necessary work when they cannot, we enable research to be done that could not be done otherwise. We are able to do this because we only help with the research of others and do not have our own research program. Everyone in our group also has domain expertise in life sciences fields, and so is more familiar with those fields' challenges, data types, and language. We also have the strong communication skills that are needed for excellent collaborative work. Lastly, our collaboration success comes from being a small and flexible group embedded in the domain unit.

There are some challenges to how our group is helping improve scientific software use and creation in life sciences. Our approach is slow and not very scalable because we are working with individuals or small groups. It can also be difficult for us to track our impact, even when gathering a diverse set of data and information about how we are helping others. Nevertheless, we are making a substantial difference in the research that our institution's life sciences researchers are able to do.

In 2022 the Princeton Research Software Engineering group, in collaboration with Human Resources, established a multi-level career path job family for Research Software Engineers (RSEs) at Princeton University. Expanding on the existing "Research Software Engineer" and "Senior Research Software Engineer" roles, the new job family creates a structured career ladder that includes six individual contributor roles (Associate RSE, RSE I, RSE II, Senior RSE, Lead RSE, Principal RSE), two working manager roles (Lead RSE, Principal RSE), and three leadership positions (Associate Director, Director, and Senior Director). This formally establishes guidelines for defining and differentiating between RSE roles, enabling equitable hiring of RSEs at substantially different experience levels, and establishing promotional pathways for RSEs employed at Princeton University.

In this talk we will describe how the vision for the career path originated, the process behind defining the roles and grades within the career ladder, and how we bridged gaps in technical understanding with administrative partners who were unversed in the role of Research Software Engineers. By minimizing technical jargon to "standardize" job descriptions, we were able to define roles with essential requirements that allowed for proper compensation review and enabled the model to be effective for broader use across campus departments. Finally, we will discuss important lessons learned from the model creation process through its implementation and use as we have successfully hired, reviewed, promoted, and retained RSEs at Princeton.

This talk will look at some containers that are actively used in research computing and examine how easy they are to dissect and understand as software engineering artifacts. The talk aims to provoke everyone to think about what good practices and guidance the RSE community might put forward around the use and role of containers.

In many ways containers provide an elegant solution to ensuring reproducibility and portability of codes. Each layer in a container has a unique hash that ensures the full stack of a container is defined unambiguously. Containers can carry with them a deep set of software dependencies that help simplify the challenge of making a portable code. Container repositories and publication services make containers findable and easily cloned for shared use. These are all unambiguously valuable features. However, it is not uncommon to come across containers in active use that contain incredibly expansive layers, so that they in effect encompass entire operating system distributions. Often in such containers it is not clear what is key to an application and what is more of an expedient, included to enhance short-term productivity.
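How per-layer hashes can pin down a whole stack can be sketched in a few lines. The chaining rule below is modeled loosely on the chain-identifier idea in the OCI image specification and is a simplification: real images hash layer tarballs, whereas the byte strings here are hypothetical stand-ins, as are the function names.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest of a blob, written in the common 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(layers: list[bytes]) -> list[str]:
    """Return one identifier per prefix of the layer stack.

    The first identifier is the digest of the first layer; each later one
    hashes the previous identifier together with the next layer's digest,
    so it depends on every layer beneath it.
    """
    ids: list[str] = []
    for blob in layers:
        layer_digest = digest(blob)
        if not ids:
            ids.append(layer_digest)
        else:
            ids.append(digest(f"{ids[-1]} {layer_digest}".encode()))
    return ids


# Hypothetical stacks that differ only in the middle layer.
stack_a = chain_ids([b"base os", b"libraries v1", b"application"])
stack_b = chain_ids([b"base os", b"libraries v2", b"application"])

print(stack_a[0] == stack_b[0])  # True: identical base layer
print(stack_a[1] == stack_b[1])  # False: the changed layer
print(stack_a[2] == stack_b[2])  # False: everything above it changes too
```

The final identifier therefore names the full stack unambiguously, but it says nothing about which of the files inside a large layer the application actually needs.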

In this talk I will dissect a few large containers and examine what structure is or isn't present and how their formulation sits with regard to traditional software engineering practices. In particular, the talk will look through a lens that channels Edsger Dijkstra's motivation for promoting structured programming. Dijkstra argued persuasively that programs should not only be functional but should also strive to be comprehensible and digestible to "a slow-witted human being with a very small head". It is interesting to look at some containerized software through this lens. For the RSE community especially, some very effective and expedient practices in container publication and software distribution can appear to be in tension with other software engineering design ideas around modularity and composability.

The sustainability of scientific software is crucial for advancing research. In the complex world of scientific software development, it is essential to understand the diverse factors that influence sustainability. From the health of the software community to the robustness of engineering practices, each element plays a pivotal role in the long-term viability of a project. This talk, presented by the Center for Open-Source Research Software Stewardship and Advancement (CORSA), focuses on the diverse definitions of sustainability within the scientific software community, its attributes, and the metrics used to measure and enhance it.

The Center for Open-Source Research Software Stewardship and Advancement (CORSA), a new community of practice, aims to address the long-term sustainability of scientific and research software by fostering collaboration among stakeholders, facilitating partnerships with open-source foundations, and educating the community regarding approaches to the stewardship and advancement of open-source software. CORSA is part of a larger initiative funded by the U.S. Department of Energy (DOE) called the Next Generation Scientific Software Technologies (NGSST) project, which includes stakeholders from a broad cross-section of the scientific computing and research software community.

In this talk, we will provide a brief history of the NGSST project and the objectives of DOE to create sustainability pathways for open-source scientific software. We will then discuss the key issues that CORSA plans to address to facilitate scientific software's long-term stewardship. These include the development of metrics and metric models that help projects assess and understand their position in the landscape of sustainability efforts. The talk will draw on information gathered from previous CORSA workshops and existing literature and research into this topic, including the types of sustainability metrics identified as crucial by the community. In particular, we will explore:

● Definitions of Sustainability: Understand the various ways the community defines sustainability in the context of scientific software.
● Attributes of Sustainability: Identify the key attributes that the community values, such as community health, engineering practices, and funding stability.
● Metrics for Measuring Sustainability: Discuss the different metrics and models that help projects assess their sustainability, including how these metrics are developed and applied.
● Capturing and Using Metrics: Explore methods for capturing these metrics and practical strategies for using them to improve sustainability.

Our goal is to create a community of practice where we can collaborate to curate, share, and disseminate information and guidance that will strengthen and sustain the research and scientific software community in the long term.

Combining prose, code, and outputs in a single artifact, computational notebooks are exceptionally valuable instruments in any context where 'research' and 'software' intersect. However, the same features that make notebooks such effective tools also result in unique issues that need to be addressed to ensure they can fulfill their full potential for the wider community of software and research practitioners. One of the biggest challenges with computational notebooks is ensuring that a notebook can be run by people other than its author(s), on other computational environments, and/or at different times in the future after its creation, an ability often known as computational reproducibility. While this is a general problem affecting any context where notebooks (or indeed, any computational artifact) are used, these concerns also represented a concrete issue for the computational notebooks submission track at the US-RSE conference, affecting authors and reviewers alike.

If reviewers are not able to run notebooks for the submissions they're reviewing, they'll likely be unable to evaluate the submission based on its full intended functionality; or, they might try to fix the issues preventing the notebook from being run (missing dependencies, incompatible versions, etc.), which results in extra work, frustration, and/or less consistency across multiple reviewers. Even when authors try their best to provide resources for reproducing a valid computational environment in which their submission can be run (such as documentation, packaging/environment metadata, etc.), the lack of an automated way to test and of a documented standard for the computational environment that will be used limits their ability to validate their resources (and, therefore, to estimate how likely it is that their notebooks will run as expected during review) before finalizing their submission. As the program subcommittee responsible for notebooks at US-RSE’24, a vital part of our role has been to streamline the submission and review process to enable both authors and reviewers to concentrate on their respective duties. Additionally, given the added technical complications unique to notebooks, any solution that required unsustainable amounts of extra work on our side would not be feasible to adopt. This talk will provide an overview of the workflow we developed for US-RSE’24 to help deal with these issues, as well as lessons learned on what worked well and what didn’t. Built using open-source and/or freely available tools such as repo2docker, GitHub Actions, and Binder, the infrastructure provides a set of automated checks that authors can enable to test the repository before submission, based on the same standardized tools, specifications, and computational environment available to reviewers.

Beyond the specific context of this year’s conference, we structured this talk to be relevant and appealing for a broad audience of RSEs, especially (but not limited to) those interested in computational notebooks, Continuous Integration and Continuous Delivery (CI/CD), and the challenges and tradeoffs associated with designing workflows to be usable at all levels of prior experience.

Leading a collaborative data science or research software engineering (RSE) team in an academic environment can have many challenges including institutional infrastructure, funding, and technical expertise. Even in the most challenging environment, however, leading such a team with inclusive practices can be rewarding for the leader, the team members, and collaborators. We describe nine leadership and management practices that are especially relevant to the dynamics of such teams and an academic environment: ensuring people get credit, making tacit knowledge explicit, establishing clear performance review processes, championing career development, empowering team members to work autonomously, learning from diverse experiences, supporting team members in navigating power dynamics, having difficult conversations, and developing foundational management skills. Active engagement in these areas will help those who lead data science or RSE groups – whether faculty or staff, regardless of title – create and support inclusive teams.

Research software is critical for scientific advancement and, like all software, is susceptible to both attack by malicious actors and misuse, meaning that security is an important quality of research software. Implementing and evaluating security is a complex and ever-evolving process. However, poor research software security could result in the sabotage of data, hardware, or research findings. Proper security implementation requires security knowledge and expertise that many research software stakeholders do not have, resulting in more burden placed on the limited bandwidth of security resources and personnel. Therefore, it is important to identify methods of improving research software security without increasing demand for limited security resources.

To improve the security of research software, we propose introducing security concepts, such as threat modeling, to RSEs so they can be involved in ongoing security efforts. At its root, threat modeling is the process of creating a model of a system or piece of software that is used to theorize both potential attacks and countermeasures to prevent them. Threat modeling is a low-cost, effective way to supplement security efforts, improve security posture, and create cleaner software architecture. While difficult to automate, threat modeling has a host of tools available to make it easier to perform with less required security expertise compared to other security activities. RSEs are prime candidates for threat modeling because of their expertise in both the research domain and in software engineering.

To establish a baseline for how RSEs view security, we replicated a security culture survey originally focused on open-source software. This survey contains questions along six dimensions: Attitude, Behavior, Competency, Governance, Subjective Norms, and Communication. In aggregate, these six dimensions describe the security culture of the RSE community. In addition to measuring the current security culture, we exposed participants to three vignettes depicting security events. In the summary of these vignettes, we explained how threat modeling could have been used to prevent or diagnose malicious or accidental damage before it occurred.

We recruited 96 US and German RSEs for the survey. Our initial results show a generally positive security culture in the RSE community. Respondents perceived all cultural dimensions positively, except for Governance, which represents security expertise, policies, and implementation. The respondents also responded positively to threat modeling. They saw the value of threat modeling and thought it would fit nicely into their existing development processes. Respondents also indicated they would need additional training to effectively threat model and were interested in receiving this training.

We are using the data from the survey and vignettes to create resources that educate RSEs on threat modeling practices that can be incorporated into their existing development processes. We will use this talk to 1) present our findings to the US-RSE community, 2) gather feedback on the security resources we are developing for RSEs, and 3) promote dialogue about involving RSEs in security efforts.

Since the term research software engineer (RSE) was coined over a decade ago, the field has enjoyed rapid growth with the establishment of RSE groups at labs and universities, professional societies, and conferences and workshops. Today, RSEs worldwide make impactful contributions to science and engineering through excellence in software, but we believe the best is yet to come. RSEs represent an emerging profession, one that continues to develop its identity, values, and practices (Sims 2022). There is a growing body of literature around who RSEs are and the future of the field, with many works written by RSEs themselves. Concurrently, it is also important to consider how RSEs relate to other professions within research organizations. RSEs regularly interact with staff from a diverse range of backgrounds, including domain researchers and engineers, computing facility and IT professionals, data scientists, technical librarians, and managers and HR specialists. When we examine this organizational context, we are led to ask many important questions. How can non-RSE allies best support RSEs? How can we create a supportive ecosystem in which RSEs will thrive? How do we integrate RSEng with allied professions to achieve mutual success? In this talk, we consider the case of RSEs and software engineering researchers (SERs). Both SE academics and practitioners have a common interest in improving the quality of software and its production (Stol and Fitzgerald 2018). While CSE software development has historically received little attention from mainstream software engineering, RSEs have been successful in building bridges between the two worlds. We believe the SE research community should work more closely with RSEs and serve their needs.
Based on our experiences, which include three of the authors participating in a recent Dagstuhl workshop on this topic, we discuss (1) SE-related needs that RSEs report having, (2) what SE researchers can do to address those needs, and (3) how to foster productive relationships between RSEs and SERs.

At Sandia National Laboratories, computational modeling and simulation are ubiquitous across the labs’ diverse missions. Computational models (that is, digital representations of physical systems and/or phenomena and their behaviors) are regularly developed and provide empirical justification for critical mission decisions; this spans workflows, scripts, and notebooks that drive simulations as well as the complex software stacks underneath them. As the number and variety of models continue to grow, however, our limited ability to maintain and govern them becomes a bottleneck to further productivity improvements. Models are created in a highly manual process, may be created in duplicate, may be lost because of personnel changes, or may deteriorate over time due to ever-changing computing environments.

Researchers and engineering analysts often lack the time, resources, and/or skills to build sustainable models and to make them discoverable. RSEs and allied professionals can play an important role in encouraging the adoption of better practices, but to effect enduring change, we must go even further: to realize a culture of sharing, collaboration, and reusability around modeling, we need software and organizational infrastructures that can support that culture.

For these reasons, we are building the Engineering Common Model Framework (ECMF), a platform for computational model sustainment at Sandia. ECMF will enable the automated evaluation of models over time and ensure that models created at Sandia are discoverable and ready to be revisited, extended, and reused. We have demonstrated the capability of virtually air-gapped automated execution in a containerized environment and have a prototype of a user-friendly frontend where users can submit models, schedule model executions, and monitor model status. In our talk, we will discuss our current and planned capabilities, review our lessons learned, and discuss the role of RSEs in the present and future of software and data stewardship.

In the field of research software engineering, Large Language Models (LLMs) have emerged as powerful tools for enhancing coding practices. This presentation, "Leveraging LLMs for Effective Coding," delves into the practical applications of LLMs in automating and improving various aspects of the development process. By providing some empirical evidence from our firsthand experience using LLMs for software development, we explore how LLMs can significantly augment a developer's toolkit, making what are often time-consuming tasks more efficient and reliable.

Automated test generation, for instance, not only speeds up the testing process but also ensures more comprehensive coverage, leading to robust software products. Similarly, leveraging LLMs for code review can preemptively identify potential issues, optimizing code quality before it reaches human reviewers. Furthermore, the ability of LLMs to generate and update documentation in tandem with code changes addresses one of the most common challenges in software development: maintaining accurate and helpful documentation. Beyond these key areas, the presentation also touches upon additional use cases where LLMs can make a significant impact, including debugging assistance, code implementation, and code refactoring, among others. We discuss effective ways of integrating LLMs into the development workflow, emphasizing the importance of clear communication, context provision, and iterative refinement. Ethical considerations, particularly in addressing potential biases and ensuring responsible use, are also explored.

Join us as we navigate the practicalities of incorporating LLMs into coding practices, aiming to inspire developers to harness these AI tools for more efficient, high-quality software development.

Science gateways provide an easy-to-use computational platform for research and educational purposes, abstracting underlying infrastructure complexities while offering an intuitive interface. In the last 15 years, quite a few mature science gateway frameworks and Application Programming Interfaces (APIs) have been developed, each fostering a distinct community and offering strengths that meet a diverse set of needs. Examples such as HUBzero, Tapis, Galaxy, and OneSciencePlace are well-sustained science gateway frameworks that create production-quality gateways that facilitate collaborative workspaces. These gateways enhance the research process by democratizing access to computational resources and supporting users in their exploration of research. Researchers benefit from streamlined access to various resources, such as high-performance computing (HPC) systems, data repositories, and specialized software tools. The shared workspaces enable collaborative projects, facilitating communication and cooperation across different disciplines. Interdisciplinary collaboration is crucial to addressing many grand scientific challenges such as climate modeling, genomics, or materials sciences. The standardized environments these gateways provide promote data sharing and set the stage for the reproducibility of computational experiments, a cornerstone of science. For research software engineers, engagement with science gateways offers numerous advantages. These frameworks provide standardized interfaces and mechanisms to interact with software libraries and tools, streamlining the development process and ensuring compatibility. This reduces development time and complexity, allowing engineers to focus on each community's unique requirements without dealing with low-level technical details. Automated deployment features supported by many gateways further ease the process.
Beyond the technical benefits above, engaging within a science gateway framework also means engaging with a larger community of developers and users. This collaborative environment leads to shared knowledge, rapid issue resolution, and the opportunity to participate in joint development efforts. Ongoing feedback from researchers using the tools allows continuous improvement, ensuring the software evolves to meet changing user needs. From a professional development perspective, active participation in science gateway frameworks exposes engineers to cutting-edge computational methodologies, cloud computing principles, and big data techniques. This both enhances their skills and keeps them up to date with the latest technological advancements. Furthermore, experience with science gateways and the relevant tech stacks can open up career opportunities in academia and industry, given the growing demand for expertise in these areas. In summary, science gateway frameworks play a pivotal role in accelerating scientific research, providing enhanced accessibility and collaboration. For research software engineers, these frameworks offer a rich environment for skill development, collaboration, and career advancement. As scientific research increasingly relies on collaborative, data-intensive approaches, the role of science gateways will continue to expand in the research ecosystem.

In the rapidly evolving field of Artificial Intelligence and Machine Learning (AI/ML), the journey from innovative research to scalable, robust deployment is fraught with challenges. This presentation delves into the critical lessons learned from our experiences in navigating this complex transition, offering insights that are vital for researchers and practitioners alike. The initial phase of any AI/ML project is marked by excitement and potential. However, we quickly learned the importance of grounding this enthusiasm with practical considerations, particularly the early and thorough definition of metrics and benchmarks. This foundational step, often overlooked, became the cornerstone of our project's success, enabling us to evaluate research solutions effectively and pivot our strategies as needed.

Another significant hurdle we encountered was the transition from the exploratory and often chaotic environment of Jupyter notebooks to the structured realm of development-ready code. The ability of our research engineers to write modular and reproducible Python code was instrumental in bridging the gap between research findings and development, highlighting the necessity of coding best practices in the research phase.

Deployment presented its own set of challenges, notably the infamous "It works on my computer" syndrome. Our solution was a strategic embrace of Docker images, which not only streamlined our deployment process but also ensured consistency and reliability across different environments. This approach, coupled with a focus on cache optimization, significantly reduced deployment headaches.

Perhaps the most profound lesson was the value of rapid prototyping. Moving swiftly from concept to an end-to-end solution, even if imperfect, provided multiple benefits. It accelerated our learning about the problem space, facilitated iterations based on real-world feedback, and improved stakeholder engagement by providing a tangible product for demonstration. This approach also forced us to make critical decisions about technology investments and workflow design, laying a foundation for continuous improvement.

This talk aims to share these insights and more, exploring strategies that can help bridge the often daunting gap between AI/ML research and impactful deployment. Join us to learn how to navigate this transition effectively, ensuring that your projects are not only innovative but also ready for the real world.