Birds of a Feather

Alternative Routes to Becoming a Research Software Engineer: The Role of Doctorate and Essential Skills

Session leader: Joanne Sheppard

Research Software Engineering (RSE) is a field that requires a diverse range of skills, including software development, data management, and research methods. However, there is no single route to becoming an RSE, and the value of obtaining a doctorate in this field is often debated. This panel brings together experts in the field of RSE to discuss alternative routes to becoming an RSE, the role of obtaining a doctorate, and the essential skills that are necessary for success in this field.

The panel will explore the benefits and drawbacks of different paths to becoming an RSE, including formal education programs, on-the-job training, and self-taught skills. We will also examine the value of obtaining a doctorate in this field, including the advantages and disadvantages of a PhD versus a master’s degree or other formal qualifications.

In addition, the panel will discuss the skills that are most important for an RSE, such as programming languages, software development practices, project management, and communication. We will consider how these skills can be developed and measured, and how they can be used to evaluate and recruit RSEs.

Overall, this panel aims to provide a comprehensive and insightful discussion of the different routes to becoming an RSE, the role of obtaining a doctorate in this field, and the essential skills that are necessary for success. Attendees will gain a deeper understanding of the field of RSE and the skills required to excel in this exciting and rapidly evolving field.

Code Review for Research Software

Session leaders: Cole Crawford, Jeffrey Tharsen

Many academic research projects require custom software development. These developers are often a consortium of researchers, academic staff and professional software developers. As the field has matured, the question of testing and reviewing software to confirm it works properly has become an important topic, as nearly all software will have errors (some small, some potentially significant) in the code. A rough estimate suggests that there are between 15 to 50 errors in 1000 lines of code written by professional software developers (Soergel 2015). While most errors do not have any substantive effect on research findings, there are plenty of cases where this has happened (see for example Letzner et al. 2020 or Miller 2006). In addition, uncaught “edge cases” can drastically affect future researchers’ results.

Code Review is a common technique used in industry to improve software and reduce the number of flaws. In a code review process, a software engineer other than the original code author(s) reviews the source code. They ask questions and make suggestions for improving the software. In addition to identifying and eliminating errors, code review can improve overall quality by making the source code more readable and the overall product more sustainable. Furthermore, code reviews improve not just the skills of the reviewee but also those of the reviewer. If a code author and reviewer work in the same team or on the same or related projects, code reviews can also support team cohesion and facilitate information-sharing within a team.

Unfortunately, code review is not widely used in many research disciplines. The reasons for this vary widely. Many research software projects have only one core developer who does all the coding with no colleagues to review their code. Some projects avoid reviews by developers external to the project out of concern that their code could be plagiarized or their license ignored. In other cases, the person developing the code is concerned that they or their work will be attacked for being of poor quality or producing erroneous results. We suggest a BoF featuring a panel of three experts who will present best practices, tutorials and guidelines for code review from the perspectives of different disciplines. Following the panel we will host an open discussion and Q&A with the audience; the following questions are among those we propose to address:

  • Are certain challenges associated mainly with specific disciplines, or are there commonalities across disciplines?
  • Is code review across disciplines useful?

A pilot study conducted by Petre and Wilson (2013) suggests that there is value in code reviews even if the reviewers do not possess any domain expertise for the code they are reviewing; in their study, professional software engineers from Mozilla reviewed code written by scientists. The goal of this BoF is to help attendees to better understand the code review processes of different RSE teams across disciplines, and explore the value of code review undertaken by researchers for researchers.

Get Involved with pyOpenSci! Building diverse community around scientific Python open source software through peer review, training and mentorship

Session leaders: Leah Wasser, David Nicholson

pyOpenSci is building a diverse community around the Python open source software that drives open science. Attend this BoF session to learn about submitting a Python package to pyOpenSci’s open peer review process, and how you as a research software engineer will benefit from our supportive review process. You’ll also learn about best practices for pure Python packaging that are documented in our guide that we’ve recently updated with significant input from maintainers of Python packaging tools, core scientific packages and the broader community.

Founded in 2018, pyOpenSci’s core program is open peer review of scientific software with the goal of improving the quality, usability, discoverability and long-term maintenance of scientific packages. We also focus on diversifying the community through both community partnerships, and support for communities that have not been traditionally a part of the Python open space but who have interest in participating. Through our partnership with JOSS (Journal of Open Source Software), we also offer a publication win-win to maintainers who are interested in both Python modern best practices, long term community support and also formal publication of their software.

Community is core to our organization. As such understanding the needs of esearch software engineers working in the scientific Python space is critical as we develop our organization’s structure, programs and goals.

In this BoF session, we will discuss and ask for feedback on four things:

  1. We will review our open peer review process and how you can submit a package to us for review.
  2. We will talk about our community governance and how we are actively engaging with a variety of communities with the core goal of diversifying the Python open source space.
  3. We will answer questions and get feedback on our process to ensure that we are supporting research software engineers that are working on Python tools.
  4. We will talk about the community work that we are doing to create useful Python packaging and contributor resources that support modern approaches to Python packaging

This BoF will be relevant to anyone interested in US-RSEcon 2023’s theme enabling software, because pyOpenSci is empowering a diverse community to drive open science forward by building Python open source tools. The format of this session will be a mixture of short “how to” overviews followed by discussion sessions that solicit feedback.

Notebooks as a Possible Future for Scientific Publication, Better Practices for Sharing Software, and RSE Authorship

Session leaders: Kenton McHenry, Nicole Brewer, Daniel S. Katz

Software is now a key part of just about all scientific research and as such the need to support this need within science has gotten a lot of attention. One area of particular interest by funding agencies, such as NSF who invest in scientific software development, is that of sustainability and reusability so as to maximize the gains from these investments from research funds. One approach to scientific software sharing and reusability is that of utilizing computational notebooks as the paper itself, with it as the peer reviewed scholarly object. Such an effort was leveraged by the NSF EarthCube program n the last 3 years as a means of not only making software investments within geoscience more discoverable and reusable, but to also motivate the community itself to go through the needed effort of documenting and packaging their software in this manner through the peer review process which carries weight within academic reward systems. Given its success here the AGU put in and was awarded a grant from the Sloan Foundation to carry this further, bring together publishers and other stakeholders to work out the details as to how publications can actually be transition such that notebook submissions might actually replace paper submissions where appropriate and still be indexed, discovered, and accessed with existing publication systems and processes.

Computational notebooks are also increasingly important tools for research and science communication as they can be used to teach, showcase tools, communicate research results, and provide friendly interfaces to interact with data and software. To this end, US-RSE’23 Conference accepted submissions for reproducible notebooks (that were rendered to HTML documents), and interactive notebooks (that were hosted on Binder). Drafting a rubric for better practices was exceptionally difficult for interactive notebooks which are, by definition, not fully-reproducible due to widgets or other expected user-inputs. In addition, all software inevitably becomes unrunnable without maintenance. How then, should we ensure the reproducibility of our notebooks? And should this be a lasting or ephemeral goal? These are open questions the notebook subcommittee had to reckon with as they formulated better practices for peer review and we feel that they are worth opening up to the RSE community at large.

We propose to bring together a Birds of a Feather at US-RSE’23 to introduce these efforts as an area of interest for RSEs as well as explore the state of things right now and the areas that still need to be worked out. The format of the BoF will be in the form of a panel followed by a menti driven discussion with the audience. The panel will consist of representatives from notebook and notebook tool (e.g. binder) providers, AGU, and members of the US-RSE’23 notebooks subcommittee to discuss where each is going in terms of supporting/using notebooks.

Polyglot Software Development and Language Interoperation

Session leader: Mark Kittisopikul

This Birds of a Feather session will discuss the use of multiple programming languages and interoperation between them. Software development in the research environment requires the integration of existing and new tools employing a variety of programming languages. New programming languages are arising to fill unmet needs and new niches such as Rust, Zig, Go, and Julia. Yet, the use of these languages in ongoing research requires that they integrate with traditional programming languages such as Python, R, Java, and C++. I will draw upon my personal experience with working on JavaCPP projects, JavaCall.jl for Julia-Java interop, and building language bindings around the HDF5 C library (Malone 2014). I also hope to invite collaborators working at the interface of Java and Python in the context of FIJI (ImageJ) (Schindelin 2012, Rueden 2022). The session will incorporate both traditional approaches such via foreign function interfaces and the SWIG interface compiler and new approaches made possible by coding assistants such as a Github Copilot. The BoF is meant to gather project managers and software development engineers interested in sharing best practices for integrating more than one programming language into a single project and sharing data across language barriers.

The proposed format is to offer several, three or four, 10 minute talks about projects that facilitate interoperation between languages or a presentation of a project that uses multiple programming languages. An interactive roundtable discussion will then follow with attendees sharing their experience with attempting to use multiple programming languages together. An alternative format is to have the speakers lead a panel based discussion while encouraging audience participation. The choice of format will depend on the number of participants.

The overall goal of the BoF is to lower the barriers of including multiple languages in a project so that the best languages can be used to accomplish particular tasks. By collecting project managers and developers from diverse programming backgrounds, the BoF hopes to foster relationships across language boundaries.

What does it mean to be an RSE at a National Lab?

Session leaders: Keith Beattie, Rinku Gupta, William Godoy, Daniel Howard, Anshu Dubey, David E. Bernholdt, Daniel Gunter

Although the research software engineering (RSE) community originated in universities, RSEs also find a place in National Laboratories confronting comparable but distinct challenges as those at universities and other research institutions. This BoF seeks to provide a forum for those who identify as RSEs and those supportive or curious about the community, to deliberate upon challenges faced at national labs in order to explore approaches or solutions that might enhance their professional and scientific pursuits. Additionally, this forum can be used to discuss avenues for RSEs across labs to exchange information as well as define meaningful relationships beyond National Labs with university and industry affiliated RSEs.

The structure of the BoF proposed is to start with an overall introduction, two short presentations from RSEs at labs that currently have an established RSE framework, then break out into smaller groups to discuss a predefined set of topics and questions. The BoF will conclude with a panel discussion, led by representatives from various national labs, to find the common or unique issues faced by RSEs at national labs and develop solutions for improving the RSE experience within national lab ecosystems.

The topics covered will be chosen from, but not limited to:

  • How does being an RSE vary from lab to lab or as compared to universities or industry?
  • What are the RSE career paths at labs and how are they supported? What opportunities or support is missing?
  • How are RSEs funded at labs and how can this be improved?
  • What can be done to improve recruitment & retention strategies, specifically for DEI?
  • How do labs currently organize RSEs or groups of RSEs? What has worked well across different labs?
  • What training or opportunities should be made available for RSEs

We welcome a broad audience both from national labs and otherwise, to provide contrast and context to the discussions, expecting somewhere between 10-20 people in attendance.