Back in January 2015, the Postal Service proposed a major change in the way it measures on-time service performance for single-piece First-Class Mail. Instead of contracting out to a third party as it had done for many years, the Postal Service wanted to take advantage of the increased use of barcoding and do the measuring itself.
The Postal Service began testing the new internal measurement system during the first quarter of fiscal year 2016, and last week it shared the results for the second and third quarters with the Postal Regulatory Commission.
According to this PRC order, the Commission will host an off-the-record technical conference to discuss the data represented in the quarterly service performance reports on Friday, August 26.
At this point, it appears that the Postal Service may have some explaining to do. In many instances, the on-time performance data produced by the two measurement systems seem significantly different, and it’s not clear what the discrepancies mean.
The two measurement systems: EXFC and SPM
For years now, the Postal Service has been using what it calls the External First-Class Measurement (EXFC) system to measure the end-to-end (mail deposit to delivery) length of time it takes for mail to be delivered. The system is “external” because it’s conducted by an independent third-party contractor, i.e., IBM.
In the EXFC system, the contractor employs “droppers” who deposit letters and flats addressed to “reporters.” The participants record when mail was sent and received, and based on the transit times for those mailpieces the system generates service performance data showing the percent of mail that was delivered on time, i.e., within the service standard.
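To make the calculation concrete, here is a minimal sketch of how an on-time percentage could be derived from a set of test mailpieces. The sample records and the function name are hypothetical, and the sketch counts plain calendar days; the actual EXFC rules for counting transit days may differ.

```python
from datetime import date

# Hypothetical test pieces: (date mailed, date delivered, service standard in days).
# These records are invented for illustration; they are not EXFC data.
pieces = [
    (date(2016, 3, 1), date(2016, 3, 3), 2),  # 2 days against a 2-day standard
    (date(2016, 3, 1), date(2016, 3, 4), 2),  # 3 days against a 2-day standard
    (date(2016, 3, 2), date(2016, 3, 5), 3),  # 3 days against a 3-day standard
    (date(2016, 3, 2), date(2016, 3, 8), 5),  # 6 days against a 5-day standard
]


def on_time_percent(pieces):
    """Percent of pieces whose transit time is within the service standard.

    Simplification: transit time is counted in calendar days.
    """
    on_time = sum(
        1
        for sent, received, standard in pieces
        if (received - sent).days <= standard
    )
    return 100.0 * on_time / len(pieces)


print(on_time_percent(pieces))  # 50.0: two of the four pieces met their standard
```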
Because it’s expensive to track individual pieces of mail this way, the EXFC system uses a comparatively small sample of test mailpieces, and then extrapolates to make estimates for mail volumes as a whole. The test mail is statistically analyzed based on sample volume, mail characteristics, and the location where the mail was entered and delivered.
The new internal system is called Service Performance Measurement (SPM). It takes advantage of the fact that much of the mail is now barcoded and can be tracked on automation equipment by the Postal Service itself, which will save the expense of contracting out.
Unlike the EXFC system, the new SPM system doesn’t track particular pieces of mail, end to end. Instead, it uses a statistical composite that averages, weights, and combines measurements of live mailpieces from three different stages of the delivery process: First Mile (collection), Processing Operation, and Last Mile (delivery). The data are combined to measure overall transit time for what amounts to “virtual” mailpieces, as opposed to the actual mailpieces tracked by EXFC.
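One way to picture a “virtual” mailpiece is to combine separate per-stage timing distributions into a single end-to-end estimate. The sketch below is only an illustration of that idea, not the Postal Service’s actual statistical design: the stage distributions are invented, and it assumes the three stages are independent of one another, which the real system may not.

```python
from itertools import product

# Hypothetical share of pieces taking a given number of days in each stage.
# All figures are invented for illustration.
first_mile = {0: 0.90, 1: 0.10}
processing = {1: 0.70, 2: 0.30}
last_mile = {0: 0.95, 1: 0.05}


def composite_on_time(stages, standard_days):
    """Estimated share of 'virtual' pieces delivered within the standard.

    Assumption: the stages are statistically independent, so the share of
    any combination of stage times is the product of the stage shares.
    """
    total = 0.0
    for combo in product(*(stage.items() for stage in stages)):
        days = sum(d for d, _ in combo)
        share = 1.0
        for _, p in combo:
            share *= p
        if days <= standard_days:
            total += share
    return total


estimate = composite_on_time([first_mile, processing, last_mile], standard_days=2)
print(f"{100 * estimate:.1f} percent on time")
```

No single piece is followed from deposit to delivery here; the end-to-end figure exists only as a combination of the stage measurements, which is why the quality of the statistical design matters so much.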
Thanks to barcoding and automation equipment, the SPM system can track a significantly larger volume of mail than EXFC. In fact, since the SPM tracks such a large portion of the mail flow, the Postal Service considers it a “census” rather than “sampling” methodology (just as the U.S. Census tries to include every household, not just a representative sample). As stated in the USPS report describing the new system, “the proposed measurement system changes will increase the number of pieces in measurement – moving from a seed/sample system to more of a census-measurement system.”
Since it’s not tracking actual pieces, the reliability of SPM is more dependent on the quality of the statistical design and the accuracy of sampling. The reliability also depends on whether the categories of mail that are measured (i.e., barcoded mail) are also indicative of the categories of mail that are not measured (i.e., non-barcoded mail).
Running the two systems in parallel
As mandated by the Postal Accountability and Enhancement Act (PAEA) and as indicated in the PRC order creating the docket, a change in measurement systems requires the Commission’s approval, and interested parties have been reviewing the proposal for over a year and a half. The participants’ comments, library references, and other materials can be found in Docket PI2015-1.
While the Postal Service’s proposal involves some very significant issues in how the timeliness of the mail is measured, the docket did not get a lot of attention, and only eight parties filed comments, including the National Newspaper Association, the American Postal Workers Union, the Greeting Card Association, PostCom, and postal watchdogs Douglas Carlson and David Popkin.
Popkin and PostCom suggested that one way of testing the reliability of the new system would be to run both systems at the same time to see how the results compared. The Commission had previously indicated that it would like to see this as well. So the Postal Service agreed to do exactly that.
As explained in its Reply Comments (May 18, 2015), the Postal Service said it would run “the proposed SPM plan in parallel with the existing measurement systems to ensure that the proposed measurement system generates service performance data of similar quality as the current system.”
The Postal Service began implementing SPM during the first quarter of fiscal year 2016, and the second quarter was the first time SPM was run for a complete quarter. Last week, the Postal Service gave the PRC the results of the SPM system for the second and third quarters. These reports can now be compared to the previously published performance reports produced using the EXFC system.
Comparing the results
The following table shows the percent of single-piece First-Class Mail that was delivered on time (i.e., within the service standard) for the two measurement systems, on an area-by-area basis, during the second quarter of FY 2016. (A more detailed district-by-district table is here.)
As the table shows, the two systems produced roughly similar results for mail with a three-to-five-day service standard (the nationwide averages are just 1.4 percentage points apart, 80.2 percent vs. 78.8 percent), but the results for two-day mail were much different. The national average using EXFC was 93.9 percent, as opposed to 86.8 percent using the SPM system, a difference of 7.1 percentage points. In some areas, there was a much larger discrepancy. For example, in the Western area, there was a difference of almost 10 percentage points.
In its Preface to the library reference with the SPM data, the Postal Service offers this explanation: “Both operational and technical challenges were experienced in implementing a program that requires participation from hundreds of thousands of employees involved in acceptance, delivery, and collection of mail nationwide.”
The Postal Service proceeded to make several technical and operational changes to the SPM system, and “over time,” it observes, “significant progress has been made in reducing those challenges.” (The Postal Service offers more details about the issues and changes it made to the system in the narrative accompanying the reports.)
As a result of this progress, the results for the third quarter of FY 2016 were a bit better, that is to say, the discrepancies in the data produced using the two systems were reduced. Here’s a table showing the area-by-area results.
Nationwide, for mail with a two-day service standard, the difference between SPM and EXFC was reduced from 7.1 percentage points in Q2 to 3.1 percentage points in Q3. For mail with a three-to-five-day service standard, the difference increased from 1.4 percentage points in Q2 to 1.7 percentage points in Q3.
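The arithmetic behind these comparisons can be checked with a short sketch. It uses only the nationwide figures quoted above; the variable names are ours, and for three-to-five-day mail it assumes that 80.2 percent is the EXFC score and 78.8 percent the SPM score, since SPM scored lower than EXFC in the other comparisons. The Q3 entries are the gaps as reported, because the underlying Q3 scores are not quoted here.

```python
# Nationwide Q2 FY 2016 on-time scores as (EXFC, SPM), in percent.
# Assumption: for three-to-five-day mail, 80.2 is EXFC and 78.8 is SPM.
q2_scores = {
    "two-day": (93.9, 86.8),
    "three-to-five-day": (80.2, 78.8),
}

# Nationwide Q3 FY 2016 gaps as reported, in percentage points.
q3_gaps = {"two-day": 3.1, "three-to-five-day": 1.7}


def gap_in_points(exfc, spm):
    """Difference between two percentages, in percentage points."""
    return round(exfc - spm, 1)


for standard, (exfc, spm) in q2_scores.items():
    q2_gap = gap_in_points(exfc, spm)
    change = round(q3_gaps[standard] - q2_gap, 1)
    print(f"{standard}: Q2 gap {q2_gap}, Q3 gap {q3_gaps[standard]}, change {change}")
```

The gaps are differences between two percentages, so they are percentage points rather than percent. Measured relative to the EXFC score, the 7.1-point gap for two-day mail in Q2 works out to roughly 7.6 percent.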
A few questions
The Postal Service says that it anticipates further improvement in SPM down the road, but at this point it’s hard to know what to make of these discrepancies.
The fact that such discrepancies exist is in itself not necessarily a problem. As the Postal Service points out in the Preface to the library reference, the two different systems inevitably produce different results:
“The measurement methodology differences will likely result in differences in performance estimates between the two systems. Neither system is measuring every piece of mail from collection to delivery; thus, estimates from both are subject to error, some of which is due to sampling and some due to non-sampling.”
Differences are to be expected, but the main point of using the two systems in parallel for a couple of quarters was to see if SPM “generates service performance data of similar quality as the current system.” Now the question is, has SPM in fact produced results that are as good as EXFC generates? Or to put it another way, are the discrepancies significant enough to cast doubt on the validity of the SPM system?
The test results raise other questions as well. What specifically in the sampling, measuring, and statistical design may account for the differences in the quarterly reports generated by the two systems? Why did the SPM system consistently produce a lower rate of on-time performance compared to the EXFC rate?
Given that SPM tracks billions more mailpieces, is it possible that its reports are more accurate than those produced by EXFC? If comparing the data from the two systems is like comparing apples and oranges, will SPM reports not be useful as points of comparison to earlier EXFC reports? Is the Postal Service essentially starting over, so that SPM reports can only be compared to previous SPM reports, not EXFC reports?
And some other issues
Aside from the issues raised by the side-by-side performance results, many other issues have been examined during the PRC’s proceedings that may still need to be addressed by the Commission.
For example, while the new SPM system encompasses vastly more mailpieces than the EXFC system can track, it does not measure mailpieces without a barcode, i.e., stamped mail typically sent by average customers. Just over a third of single-piece First Class letters contain an Information-Based Indicia (IBI) barcode, while more than half of single-piece mail contains a stamp. As more than one commenter noted, the SPM system is thus more likely to measure business mail than personal mail.
In response to this concern, the Postal Service noted that the stamped mail category includes a lot of courtesy reply envelopes, which are also barcoded, so it is not correct to say that all stamped mail is excluded from the new system.
Another problem with the new system is that a lot of mail is dropped in collection boxes, much of it stamped and not barcoded, so it cannot be scanned for purposes of First Mile data collection by the SPM system. In response to this concern, the Postal Service has pointed out that while SPM doesn’t track this mail during the First Mile stage, it is included in the second and third stages. Mail processing equipment is designed to barcode such mail at the earliest opportunity, so it should “experience virtually identical processing and delivery times” as barcoded mail.
A related issue raised by commenters is that the SPM system is not designed to sample outgoing mail left in residential mail receptacles for carrier pickup. As the PRC’s Public Representative observed, “Mail that carriers accept from customers (38 percent of Single-Piece First-Class Mail) is currently excluded [by the EXFC system] and will continue to be excluded from service performance measurement [by the SPM system].”
In reply, the Postal Service pointed to the “significant challenges to the feasibility of expanding the new measurement system” to include such mail. In any case, notes the Postal Service, “the new measurement system significantly expands the universe of measured mail by including data from pieces accepted at postal retail counters.”
Commenters on the docket have identified many other issues with the new system, which are reviewed and addressed in the Postal Service’s reply comments. Some of these may be examined further in the technical conference on August 26.
Sources for the tables: The SPM data for Q2 and Q3 of 2016 can be found in USPS Library Reference PI2015-1/2. We’ve uploaded the second quarter SPM report to Google Drive here, and third quarter here. (More details are in a related library reference, PI2015-1/3.) The EXFC results for the second quarter of 2016 are on the USPS website here; and for the third quarter, here. We’ve combined the SPM and EXFC results for these two quarters in a spreadsheet available on Google Drive here. Note the tabs at the bottom of this spreadsheet for the second quarter and third quarter data.
There’s more about the two systems in this previous post. See also Mark Jamison’s commentary, “Measure to Measure: Data, management, and the reliability of the mail.”