The Analytics Team at Mal Warwick | Donordigital is pleased to re-release the Response Significance Calculator, a quick and handy way to test whether the difference in response rate between two test panels in a direct response marketing project is statistically significant. When using the Calculator, make sure that the two test panels were randomly split from the same population of donors and that there were no differences between the two panels except for the element being tested. For example, if you were testing an email subject line, both emails should have launched at the same time with the same content. To use the calculator, enter the number of contacts and the number of gifts for each of the two test panels, and then choose the calculate button. Click here to use the Response Significance Calculator.

Peter Schoewe is a Vice President and Director of Analytics at Mal Warwick | Donordigital, a full-service, integrated fundraising and advocacy agency serving leading charitable organizations.
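For readers curious about the statistics behind a calculator like this, the standard way to compare two response rates is a two-proportion z-test. The article doesn't describe the calculator's internals, so the following is a minimal sketch of that standard test, not the calculator's actual code:

```python
import math

def response_significance(contacts_a, gifts_a, contacts_b, gifts_b):
    """Two-proportion z-test on response rates (a standard approach;
    the calculator's actual method is not published in the article)."""
    rate_a = gifts_a / contacts_a
    rate_b = gifts_b / contacts_b
    # Pooled response rate under the null hypothesis of no difference
    pooled = (gifts_a + gifts_b) / (contacts_a + contacts_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / contacts_a + 1 / contacts_b))
    z = (rate_a - rate_b) / se
    # Two-tailed p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical panels: 120 gifts from 10,000 contacts vs. 160 from 10,000
z, p_value = response_significance(10000, 120, 10000, 160)
print(f"z = {z:.2f}, p = {p_value:.4f}")  # p < 0.05: significant at the usual threshold
```

A p-value below 0.05 is the conventional cutoff for calling a difference "highly unlikely to be due to chance," as the articles below put it.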
One of the great joys of direct response fundraising, at least to strange people like me (and maybe you), is the ability to test your assumptions and strengthen the performance of a program one step at a time. But testing can also be one of the most frustrating aspects of fundraising, especially when you go to the trouble to set up a test, track the results, and end up with no actionable conclusions once all the returns are in. I've experienced both the joy and the agony of direct response testing, and I'd like to share examples of both with you.

Of course, it's important to remember that no test result can be universalized; results vary by organization, donor segment, and time of year. A classic example is the test of Courier versus Times New Roman fonts on a letter. I've seen this test performed on many occasions. Usually there is no statistically significant difference at all, and on those occasions when the test does yield significant results, half the time Courier wins and half the time Times New Roman does. In the end, the only value of this particular test may be showing that donors like something different. Or, even more likely, that it doesn't matter at all.

One of the tests I've been most excited about recently expanded a premium offer from a one-panel buckslip to a full 8-1/2 x 11" sheet. The control package contained a standard buckslip offering prospective donors a free plush toy with their initial membership gift. Expanding the buckslip to a full sheet allowed the premium to be shown at life size, and the response rate to the package increased by 27%. At the same time, however, the more prominent premium offer lowered the average gift by $1. Both of these results were statistically significant, meaning it's highly unlikely the observed differences in response between the two packages were due to chance.
So in many ways this test did exactly what we wanted it to do. We modified the package in a way that caused a marked difference in donor behavior: we inspired more donors to send a gift, but in doing so we caused the value of the gifts they sent to decline. Because this is an acquisition package, we cannot yet declare a winner. We need to monitor the long-term results to determine whether more donors giving a lower gift amount are more valuable than fewer donors giving a greater amount. But because of this test, we now have a clearer idea of what influences response to the package, and we'll be able to track the impact of that difference long term.

A test I loved, but one that failed, used faux stamps on an outer envelope. This is a package treatment I've seen used on many occasions, so it must have tested well for somebody. But my test was a complete flop. I tested placing two bright and colorful stickers on the outside of a nonprofit envelope, as close to the indicia as the post office would allow. The idea was that the stamps would catch the donor's eye and help overcome the junk mail stigma that can attach to letters mailed at the nonprofit rate. Other than the faux stamps, I didn't modify the control envelope at all. It was a plain, cream outer with no teasers or images, other than a name, logo, and address in the upper left-hand corner.

When I reviewed the returns, however, my affection for the fake stamps began to cool. There was less than two-tenths of a percentage point difference in response rate between the versions with and without the stamps, and no statistically significant difference between the average gifts. In all likelihood, the faux stamps didn't change a single individual's mind one way or the other about sending a gift. And, of course, the bright and colorful stamps added an extra measure of expense to the package that wasn't counteracted by any increase in giving.
It's okay to have a losing test every now and then, because it's important to test your assumptions about what will work and what won't. What's most critical is that you create tests with a well-thought-out rationale behind them and that the hypothesized result of the test aligns with your overall goals. For example, it doesn't make sense to test a reduced ask amount to increase response if your goal is to acquire higher-value donors who will upgrade quickly. In other words, each test you perform (and you should be testing as much as you can) should have the goal of moving your program to the next level. Just remember, it won't be quick and easy, and you may receive as many muddy results as you do clear winners.
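The acquisition trade-off described earlier, more donors at a lower average gift versus fewer donors at a higher one, ultimately comes down to projected long-term value. Here is a back-of-the-envelope sketch; the mailing size, response rates, gift amounts, retention rate, and projection horizon are all hypothetical, chosen only to illustrate the comparison:

```python
def cohort_value(contacts, response_rate, avg_gift, retention, years):
    """Projected revenue from an acquired cohort: initial gifts plus a
    simple retained-donor projection. Purely illustrative arithmetic."""
    donors = contacts * response_rate
    value = donors * avg_gift  # revenue from the acquisition gifts themselves
    retained = donors
    for _ in range(years):
        retained *= retention          # fraction of donors who give again each year
        value += retained * avg_gift   # assume they repeat their average gift
    return value

# Hypothetical: control converts 1.0% at a $26 average gift; the full-sheet
# premium test converts 27% more donors (1.27%) at $25.
control = cohort_value(100000, 0.0100, 26.0, retention=0.45, years=3)
test = cohort_value(100000, 0.0127, 25.0, retention=0.45, years=3)
print(f"control: ${control:,.0f}  test: ${test:,.0f}")
```

Under these made-up assumptions the larger cohort wins despite the lower gift, but if premium-motivated donors also retain at a lower rate, the answer can flip, which is exactly why the long-term results must be monitored before declaring a winner.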
Peter Schoewe is a Vice President at Mal Warwick | Donordigital.
The maxims of multichannel fundraising are becoming clichés. We hear about the importance of building a 360-degree view of a donor's involvement with your organization, and that we should have a conversation with our constituents extending across channels rather than settling for a siloed series of messages dictated by an organizational chart. But the simplicity of these statements is deceptive. To create a truly integrated fundraising program, you must embrace complexity. In a multichannel environment, success relies on much greater sophistication in how we create, track, and analyze donor interactions.

This new reality was brought home to our agency when my colleagues and I tried to analyze the results of a straightforward test we performed several years ago with People for the Ethical Treatment of Animals (Norfolk, VA) in its direct mail renewal series. For those donors with prior online giving history, we modified the renewal notice to put a strong drive-to-Web push on the outer envelope. Our hypothesis was that we could inspire multichannel donors to renew more quickly by highlighting the online renewal option before they even opened the envelope. The risk of the test was that, by taking emphasis away from the response device in the mail piece, we could depress overall response, even after counting any online membership renewal gifts.

And on initial review of the results, it appeared that this was the case. We tracked all gifts received through the mail and all gifts received through the special URLs included on both the control and test versions of the notice. The control version, without the strong drive-to-Web push on the carrier envelope, had a slightly higher response rate counting both online and offline gifts, although the difference was not statistically significant. However, as we looked at these results and scratched our heads, we noticed something funny.
For both the control and the test, there were only a handful of gifts recorded to the unique URLs we included in the mail piece. Because all of the donors in the control and test had a history of online giving, that seemed wrong. We decided to go an extra step and match back any online renewal gifts received within a month of the in-home date of the mail renewal notice. The result was breathtaking. For both the control and test segments, the overall response nearly doubled. And now, the test segment—those donors who received the same renewal messages both online and through the mail, but who had the online giving option emphasized on the outside of their mail notice—had a response rate 11% higher than the control, a statistically significant difference. When we looked at the overall renewal rates for both cohorts at the end of the renewal series, we saw the same effect—overall, more donors renewed who had the drive-to-Web push on their mail notice. We learned a number of things from this test:
- First, and perhaps most noteworthy, we discovered our donors were not behaving in the neat and easily trackable manner we envisioned when we set up the test. Rather than type in the special URL we gave them, the great majority of the donors chose to give online in their own way. And who can blame them? It's a lot easier to type a couple of words into a search box than to painstakingly transcribe a URL.
- Second, we proved that offline messaging can influence online behavior in a head-to-head test. The only difference in our messaging between the two donor pools was the stronger drive-to-Web push we tested—and we showed with statistical confidence that more donors renewed their membership when presented with a cross-channel option.
- And, finally, we discovered—especially with a program as sophisticated as PETA's—that it's crucial to analyze the full range of donor interactions when determining strategy, even for a channel as established as the mail.

Of course, a simple test like this is just the tip of the iceberg. The old reports and donor pathways are no longer sufficient—you must build a way to see, track, and analyze all the different ways your donors are experiencing and interacting with your organization in order to build and refine a true multichannel fundraising and cultivation strategy.
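The matchback step described above (crediting an online gift to the mail effort when it arrives within a set window of the in-home date) amounts to a join on donor ID and gift date. This sketch uses illustrative field names and a 30-day window, not the actual file layout or rules used in the PETA analysis:

```python
from datetime import date, timedelta

def matchback(mail_file, online_gifts, window_days=30):
    """Credit online gifts to a mail effort when the gift arrives within
    `window_days` of the mail piece's in-home date."""
    # Index online gifts by donor for quick lookup
    gifts_by_donor = {}
    for gift in online_gifts:
        gifts_by_donor.setdefault(gift["donor_id"], []).append(gift)

    matched = []
    for record in mail_file:
        in_home = record["in_home_date"]
        for gift in gifts_by_donor.get(record["donor_id"], []):
            if in_home <= gift["gift_date"] <= in_home + timedelta(days=window_days):
                # Tag the gift with the panel that mailed this donor
                matched.append({**gift, "panel": record["panel"]})
    return matched

# Illustrative usage: one donor gives online ten days after the in-home date
mail_file = [{"donor_id": 101, "in_home_date": date(2015, 3, 2), "panel": "test"}]
online_gifts = [{"donor_id": 101, "gift_date": date(2015, 3, 12), "amount": 35.0}]
print(matchback(mail_file, online_gifts))
```

Matched gifts can then be added to each panel's directly tracked response before rerunning the significance test, which is how the hidden online renewals in the analysis above came to light.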