Alex Gaynor

‘We Were Operating in the Wild’: Alex Gaynor on Building — and Fixing — Software at the Department of Veterans Affairs

Alex Gaynor was a Software Engineer and Site Reliability Engineer at the U.S. Digital Service from 2015 to 2017. Before that, Alex worked as a software engineer in San Francisco.

Alex Gaynor spent the first year of his U.S. Digital Service tenure embedded at the Department of Veterans Affairs working to improve the disability claims process.

During his time as a software and site reliability engineer, he learned the importance of securing leadership support for projects. He also witnessed how quickly government is able to move when it perceives something as a real emergency. Below, Alex discusses challenges, achievements, and hacking together a software prototype during an initial meeting about that very same project.


June 13, 2019

Emily Tavoulareas:

Tell us about your journey to the U.S. Digital Service. How did you end up there?

Alex Gaynor:

A friend of mine sent me the Washington Post article about USDS’s founding. I knew Paul Smith from open-source Python Chicagoland, and I mentioned to a coworker at the time: “Oh, I bet I know somebody who knows these jokers.” Neither of us was super happy with our jobs, so I reached out. Paul introduced me to Mikey Dickerson. Mikey introduced me to Jennifer Anastasoff, and we spoke on the phone. In fall 2014, Todd Park was in the Bay Area. Jennifer invited me to one of their round tables. I submitted an application that night. At some point in this, I also heard Jen Pahlka speak.

Emily:

Was there any point that you felt like the light bulb went off? Where you were like, “Oh yeah, I’m definitely going to apply to this?”

Alex:

It was hearing Todd speak. He’s very persuasive. I submitted my application that night.

Emily:

When did you start?

Alex:

January 11, 2015. I remember it with such clarity. It was a Sunday, which is when the Department of Veterans Affairs (VA) says you start. Monday I went to the VA hospital in San Francisco and the VA regional office in Oakland. I was doing paperwork and getting my laptop, and then took a red eye to D.C. for January 13 where I met all of you guys.

Emily:

There were lots of hugs at Jackson Place.

Alex:

Yeah. And no working heating.

Emily:

Tell us about your first day at Jackson Place.

Alex:

The whole first week is kind of a blur. I got to Jackson Place just before nine and I’m pretty sure nobody was there. Thankfully after a few minutes of waiting, Erie Meyer showed up and let me in. Erie was the first person I met.

We were supposed to start at 10, all six of us and Matt Weaver. Weaver gave us the plan and the schedule for our first two weeks, which was lots of meetings and presentations. I don’t remember in detail what we went over on the first day, but I’m sure it was the three missions for the VA: a double-digit increase in claims processing capacity, a single unified veterans experience, and a unified electronic health records system.

In the afternoon, we had a meeting with people from Indiana who were doing novel work. I don’t think I ever met with them again. I also met Robert Sosinski, who was a Presidential Innovation Fellow (PIF) at the VA, and we were told the first project we’d be working on was integrating the work he’d been doing on Caseflow with the Veterans Benefits Management System (VBMS). 

Emily:

How would you describe USDS to someone who doesn’t know anything about it?

Alex:

It was a group of people with deep expertise in their subject matter — engineering or design or product management — focused on how government functions. They were both supremely humble and supremely arrogant, and grouped up semi-arbitrarily and pointed at the most pressing problems of service delivery we had in government.

Emily:

I like that description. You mentioned the three things that the original team at the VA was pointed at. Could you say a little bit about what you were brought on to do and how that evolved?

Alex:

I had no idea what I was brought in to do. At the time I joined, I told people I was probably going to be consolidating login systems for VA websites. I have no idea where I got this impression. It’s not a thing I spent a single day of my time working on. Really I was brought on as an engineer to work on whatever engineering thing there was.

As we divided ourselves up (and this was before Vets.gov was off the ground), I was placed on the claims-processing side. We had this high-level goal of a double-digit increase in claims processing throughput, which at the time was understood to mean original claims. That is, claims of first instance that are being processed in VBMS. These are disability benefit claims, and we spent months trying to figure out the angle to tackle that. Are we going to work on UX problems in VBMS? Are we going to work on missing functionality? Are we going to work on performance issues? It would be at least April or May before we really settled on a project.

Emily:

What do you remember about how the team operated at that point? And when I say “the team,” I mean both at headquarters — the larger USDS entity — but also at the VA.

Alex:

At a high level, the VA team was operating relatively independently from the headquarters team. We were the first team that had people hired into an agency. The other major projects at the time were HealthCare.gov and immigration, both of which were staffed out of the headquarters team. And the HealthCare.gov team was out in Columbia, Maryland. So we were operating relatively independently. We spent the first two weeks at Jackson Place, but after that we moved to 1800 G. Most of the contact with the headquarters team was at the weekly staff meeting.

In terms of how it was organized, all six of us unofficially reported to Weaver, but officially reported to a rotating cast of Deputy CIOs. I don’t know who Weaver reported to on paper, but de facto he reported to Marina Nitze, the VA CTO and senior advisor to the secretary.

We were relatively self-contained; we did not have a lot of interaction with people above us on the food chain. We were operating, at least on the claims side, in the wild. We had this high-level goal, were given intros to a handful of people, and told to dig out.

Emily:

Having lived it with you, I know that it took some time to gain clarity around the priorities. Do you think there was a turning point or a moment of clarity?

Alex:

There was no singular turning point, but I can point out a handful of impactful moments. The first was a deep dive we had with Mikey. This would’ve been circa March or April, and it was originally scheduled for 90 minutes. We spent two-and-a-half hours talking about how to approach the claims side. We were debating between the UX issues VBMS had and the performance issues and where to put our energy. That was the first real moment of, “We need to make a decision about what we’re doing. It’s not enough to just be looking for our project anymore.”

The next important moment, although I don’t think we recognized it at the time, was in one of our weekly meetings with the SES (Senior Executive Service) in charge of VBMS. She tried to pitch us on this lawsuit coming out of the Yale Law Clinic with veterans submitting Privacy Act Requests for their data. And we needed a tool to download all this, because otherwise we’ll spend months downloading files one-by-one in VBMS. I don’t remember why, but we jumped at it and we built something. That was a pretty important moment — it was the first concrete thing we had started building, and that work became even more important as the history of the team developed.

Kathy:

What was it about that project?

Alex:

Veterans potentially have a lot of documents stored in the VBMS eFolder, which is a record of every internal form the VA filled out, every piece of evidence the veteran submitted, everything their doctor submitted. And it was one click per file. So to complete a Privacy Act request for somebody who had 100 files in their folder meant hundreds of clicks and many, many minutes to fulfill. The lawsuit from Yale was representing around 40,000 veterans. It was plainly going to cost millions of dollars in staff time to complete all those requests.

I’ve never actually attempted to verify that the lawsuit was real. Maybe you’ll Google this and find out it was just a goofball project invented to make us go away. 

It turns out this task, the downloading of eFolders, was being performed all over the place. I want to say this work was being fulfilled by contractors in Florida. And we would later discover that there was also a workflow at the Board of Veterans Appeals, preparing a case for a judge, that had lawyers doing the exact same work of printing out everything in an eFolder.

Emily:

Were there any other turning points?

Alex:

Sometime in summer 2015, I was asked to work with the Office of Personnel Management (OPM) to get e-QIP back up and running after the security breach. It was the first time I saw the way a different agency functioned, and very particularly, I saw how quickly things could move when there was a true emergency. That was something I brought back with me: trying to force the urgency and have more engagement with senior leadership on making things happen.

Another major milestone in summer 2015 was when Mikey made a request for every project to come up with a drop-dead date: the point at which, if you had not accomplished a goal, it simply did not make sense to staff resources to the project. Not a lot of people liked this. But I found it important because at that point we had not shipped anything of value on the claims side. We had not hit any milestones, particularly in terms of improved service delivery or outcomes for a veteran.

This was really clarifying in terms of, “How much progress are we making? How do we speed up, or does this not make sense?” That culminates in Mikey’s meeting with the President where Mikey asked to shift USDS staff from the VA to other projects. The answer was no. That was not a turning point, but for me personally, it was significant in that it was a very clear communication that the measure of potential value was extremely high and that the slow progress had not gone unnoticed, but was judged as being worth it.

Emily:

What made you think they were viewing it as worth it, even in the face of that long timeline?

Alex:

Because the question had been asked explicitly. It was not implicit. The question, “What is the correct allocation of resources?” had been posed to the person whose decision it ultimately was. And his answer was that it was worth having us there, even if it was slower going and we could deliver things faster to other agencies. The project was sufficiently important.

Another important milestone — though I don’t think of it with the same significance now — was the goofball lockdown at MITRE. This is summer 2015, when we assumed a greater degree of ownership over the appeals project. Similarly, we had a VA offsite at Marina’s apartment in fall 2015 where Ellen left USDS entirely. Weaver also shared he was going to be leaving the VA at that offsite. It was not a high point. We ended several hours early because everybody was exhausted. There’s no doubt that that was a significant point. 

Kathy:

Can you talk more about the OPM urgency? 

Alex:

There were two sources of urgency. The OPM suffered a pair of major breaches, culminating in the loss of every single SF86 and SF85 that had been filed in the previous decade-plus. Probably one of the most tremendous counterintelligence coups for a foreign adversary ever. That was not why we were sent over. We were requested because in the midst of this, OPM had shut down e-QIP, the application where you complete your background investigation form. Immediately, federal hiring stopped. The federal government hires something on the order of 5,000 to 10,000 people a week, and the Army couldn’t onboard new people. The VA couldn’t hire, nobody could hire. They’d started to move back to paper. Paper. Filling out and reading a 100-page form on paper is no good. So we were sent over to restart federal hiring.

So we were not sent over on a security mission per se. e-QIP had been shut down because somebody had deemed it to be unsafe. We were sent over to assess whether that was true and, if it was, to make it safe.

It was a measure of the delta between how traditional government cyber people would approach that problem, and how USDS did. The team was me, Weaver, Margaret McKenna, and Nacin. It’s fair to say we looked nothing like the team the government would have composed, and that was an interesting proxy for how USDS approached this and how our urgency manifested itself.

Emily:

You mentioned your experience with e-QIP highlighted how quickly things could move when there was a true emergency. What is it about a true emergency that you think is helpful?

Alex:

A lot of the normal people (the character class of people who are very uninterested in change, very preservationist) clear out. They don’t want their fingerprints on it if the whole thing gets worse. And so a lot of the typical sources of objections are not present. The other factor was that the whole organization was oriented around the management hierarchy condensing. There was an expectation that during an emergency, senior leadership would be more directly involved. We saw how that looked: We spent our week working out of the Deputy CIO’s office. We huddled around a conference table in his office. When we needed something, it could get translated from, “We need this from the contractor” to “The CIO or Deputy CIO says it’s okay” very quickly. We were operating with the agency leadership’s full authority at all times.

Ultimately, there were not that many true emergencies USDS worked on. The overwhelming majority of USDS staff probably never worked on an emergency-type problem. There was a conflation of emergencies with rescue engagements. The example I would use is immigration: The work on the Electronic Immigration System (ELIS) was not responding to an emergency in any sense. However, it was rescuing an unsuccessful project engagement. HealthCare.gov, e-QIP, and the IRS breach were the emergency response projects.

Emily:

I’ve never heard anyone articulate it that way: emergencies versus rescues. So it might be a problem, it might be a crisis, but it’s not an emergency.

Alex:

This distinction is really important when you’re thinking about the VA. If I read you every New York Times or Washington Post headline about the VA from 2014, and then I asked you what type of engagement we had at the VA at the start of 2015, you would for sure say “emergency.” There was no way an agency whose secretary had resigned in disgrace and that was plagued by lost records and dead veterans in Phoenix would not be thought of as an emergency. Yet I don’t think any fair assessment of what we did in 2015 would say we were responding to an emergency. It should have been, but it was very clearly not.

Emily:

It was more of an institutional crisis or a political crisis or a service delivery crisis, but not an actual emergency.

Alex:

Right. You could have imagined that same set of facts producing an emergency, but that’s not how the VA responds.

Kathy:

So you’re saying we should have responded to those headline emergencies, but the projects you worked on were different?

Alex:

No, I mean we were trying to work on claims processing, which was one of the huge issues. It was taking months and months and the secretary had set a goal of all claims processed in 125 days. There was a serious emergency. We were trying to work on those problems, and that’s just not how the VA approached its response. A question nobody can answer is, “If we tried to increase the urgency, would that have been successful, or would it have just bounced off the agency?” I don’t think anyone can tell you with confidence which way that would’ve gone.

Kathy:

You talked about a couple things: The complete inability to ship anything. And how important it is to tell the story that we were overly conservative in using our executive firepower, and maybe scared of looking at the bridge the wrong way, much less burning it down. And you also talked about contextualizing the experience with OPM and U.S. Citizenship and Immigration Services (USCIS), because you saw how executive support could be used more effectively. Can you talk more about being too conservative around executive support and how that played out for the team?

Alex:

I can give you an example that is very crisp. One of the first things we wanted was access to the VBMS source code: to have some sense of what is running, and to be able to trace things we see in the UI and complaints we hear back to the source code. It seemed like a basic starting point. But it was probably two months before we got access to that source code. And in that time we did not ever engage any of our executive stakeholders to speed that process along.

By comparison, at OPM, we got there on our first day. The contractor was in Tennessee, and somebody was having trouble getting the contractor on the phone. As we were waiting, we started thumbing through our phones to figure out how to fly to Tennessee. Maybe that was how this project was going to go. But we got lunch, we came back, and somebody who was functioning as a chief of staff-type at OPM was like, “Good news. I finally got them on the phone. I said, ‘These crazy White House people are going to show up in Tennessee if you don’t get that source code up here right away.’”

The fact that half of us did not work for the White House was in no way relevant, and we had the source code by the end of the day. One of the things that I took away was that as soon as we had the source code, we were able to start conversing with this contractor engineer-to-engineer. Whatever anger it might’ve provoked to go this threatening route was immediately superseded by the fact that we were able to have substantive conversations. They were able to see that we were just there to help out. Seeing that contrast of the months versus a day was very eye opening for me in terms of how we had underused our executive buy-in at the VA.

Kathy:

That’s a great framing around how being able to speak the language removed some of the political barriers.

Alex:

Yeah. Because as long as we’re fighting over meta stuff, we are unproductive interlopers. When we are able to actually do work, our value becomes very clear.

In terms of the over-conservatism, ultimately it was something like 10 or 11 months on the claims/appeals project before we shipped anything that would help a veteran. We had a similar struggle again with the VBMS team. It’s a recurring theme for my time at VA. We had a struggle getting credentials to access their production API, so we could deploy Caseflow and eFolder Express and run our script to find previously misfiled cases. We were engaging the deputy secretary on a semi-regular basis on that, but we weren’t as crisp with him as we could have been. It was within his power to order them to give us what we needed. But heaven knows how long that would’ve taken if we’d not engaged him at all.

Kathy:

You mentioned the VBMS performance and usability issues and how long it took to get the credentials for VBMS’s API for eFolder Express and Caseflow. VBMS is probably a billion-dollar IT project. Is there anything around that that you want to elaborate on?

Alex:

Look at the scale of what VBMS does — serving all original disability claims at the VA — and how dysfunctional it was. Anyone who spent a day working with Veterans Service Representatives (VSRs) or a Regional Office (RO) could just see the UI problems, see the missing functionality, see the performance issues. And yet we did not ultimately choose to work on that. I don’t think there’s another example of a USDS team deliberately choosing not to work on a system that was so central to the agency’s operations and in such need of improvement. This system was to the VA what HealthCare.gov was to CMS or CCD was to State, both of which USDS worked on. 

That is very important, because it would be fair to describe that as a failure on our original goal. We never engaged on the double-digit increase in claims processing capacity because we were not able to find a way to engage with that team. 

Kathy:

We didn’t increase at all?

Alex:

We talked to them a whole bunch. We tried to work on this performance stuff, but we really bounced off hard. Part of that was their team being based in Charleston, South Carolina. So it wasn’t like those other projects that had really significant staffing in the D.C. area. Part of that was their leadership being very resistant, partly as fallout from the discovery sprint report from 2014. But we never tried to force that issue.

Kathy:

So you did try and figure out how to get in?

Alex:

Yeah, trying to find the source code. We spent a bunch of time at ROs trying to figure out what the problems were. It was April or May that eFolder Express began, and I would say that is the turning point of working on that and Caseflow as our primary projects and giving up on the dream of VBMS. Somewhere around the lockdown, our team at USDS agreed to write a report on how appeals should be done.

Emily:

Lockdown was the kickoff. That was the light bulb when we were like, “Oh, we should write this.”

Alex:

Yeah. That’s when we took ownership. You can FOIA my calendar to figure out what that date was, but that’s the point where we really turned away from VBMS and decided that Caseflow and eFolder Express Appeals was going to be our full-time, primary project. That was where we were going to have our impact.

Emily:

Alex, am I remembering this correctly: You wrote a prototype of some sort literally during the lockdown and then showed it to the whole meeting and everyone lost their minds?

Alex:

Yeah, there were a couple things. So, we engineered the meeting not to just be VBA IT. We invited the actual users, a real fucking novelty. Kavi Harshawat and Giuseppe Morgana presented usability research they’d done. I presented either a demo of Caseflow or eFolder Express. And one of the problems somebody mentioned in the meeting was creating templated letters, so I wrote a demo of a product that does exactly that in the meeting. My point was: This is not a multimillion dollar project.

Emily:

This is absolutely one of my favorite USDS stories — Alex casually pulling this thing together and showing them something in a less than 24-hour window. It was really powerful.

Alex:

That was the story of how we tried to treat the lockdown across the board. It certainly wasn’t just me that was presenting the usability research and going through the demos. It was us trying not to make an abstract decision about how to spend $19 million, but a really concrete decision about the product we need to build for the board. Only when we know what we’re building can we talk about the money.

Kathy:

What are things you’ve done that you think have really stuck around? And what are some things you’re really proud of?

Alex:

Certainly the veterans and the refugees. Those are the largest projects I worked on that had a direct impact. We found thousands of claims that had been lost in VA internal systems and that might still not have been processed to this day, four years later, if it were not for us. We have 30,000 additional refugees admitted to the U.S. That is a lasting impact. And laying the technical foundations of Caseflow and Connect VBMS: as far as I know, those have not been completely ripped out by the team. They have stood the test of a few years, at least.

Kathy:

Are there ways in which that kind of foundation bled into how the rest of engineering worked, or other teams at USDS?

Alex:

I don’t know. The VA engagement at the time was very distinct. It was the only USDS engagement where we were not working with a significant legacy code base with a significant number of contractors who we were not involved in bringing on. Anything that happened after January 19, 2017, I feel less able to comment on. 

Again, things that I’m most proud of are just working on the refugee program. That’s something I care quite a lot about. And in terms of the impact on the technical side, getting Connect VBMS into something that could be used by lots of different projects. That was a particularly good alignment between something we really needed to build and my technical skills.

Kathy:

So we didn’t actually work on core VBMS stuff, but we got VBMS to connect to other projects?

Alex:

Yeah, Connect VBMS was the SDK we built to consume the VBMS API. They never said this to my face, but it has been communicated to me many times that the VBMS team was saying, “We have an API. Why are none of these teams able to consume it? Why can nobody figure this out? Is everybody else here dumb or something?” This is very paraphrased, and it is as it was communicated to me. Getting that working was the basis of Caseflow. That was the basis for eFolder Express. I’m told VA.gov now uses it.

Kathy:

So they did have an API.

Alex:

Yeah. It was not really documented, but if you were a motivated person with a particular technical skillset, you could make it work.

Kathy:

That’s great. With initiative, we could figure out how to use the API to do other things, which is really interesting. Beyond the VA specific work, is there other work you contributed to within USDS as a whole? 

Alex:

I spent two weeks at OPM and I spent two weeks at USCIS in December of 2015. I was on a discovery sprint at the Department of Education Federal Student Aid. I spent most of 2016 at the State Department working on visas, passports, and refugees. I spent my last couple months at the IRS, and one day with the FAA. In terms of non-project work, I was very involved in engineering hiring: both resume review and interviewing, as well as developing the interview questions and competencies.

Emily:

I want to echo the non-project stuff that Alex did. If you asked me about Alex’s legacy, I would definitely say a voice in how USDS approaches hiring. 

Alex:

Thank you.

Kathy:

Is there anything else you want to highlight in the non-project work? 

Alex:

I was involved a little bit in organizing the 2015 USDS offsite. That was not a world record for the most successful contribution.

Kathy:

All of our offsite organizing was always a bit hectic, regardless of the state of USDS.

Alex:

That was pretty much true across the board. I said to every new person who joined the team: “I’m really sorry. Everything’s a mess. It’s super hectic.” That was just the nature of the work.

Another non-project thing was working on the priorities.md linter. I also administered our GitHub.

Kathy:

You made priorities.md fun. To this day, the engineering hiring repository is one of the easiest ones to reference because of the GitHub organization you created. You tied the competencies to the questions for interviewing, something that seemed minor but was really helpful. 
