Data is transforming our world, including sustainable development, and its demand shows no sign of slowing. In this blog series, Data Digest, our data experts dive into data—from beginner to advanced topics. The series draws from ACDI/VOCA’s Learning, Evaluation, and Analysis Platform (LEAP), a streamlined system that integrates raw data, visualizations, and more from each of our projects. We’ll share our code throughout this series on the development platform GitHub. Each month, you can expect a closer look at data management.

Your project is taking off; you registered a lot of participants in your project database and are implementing multiple different activities! You successfully aligned the participant calculations with a customized reporting year (as in part one of this blog series) and eliminated double counting for each specific activity (as in part two).

Now for your next challenge. Your first project report is due soon, and you will have to report precise participant numbers across all project activities. You are getting nervous because you know that some project participants have taken part in multiple activities (i.e., you cannot just add together the total sums from all the activities without double counting). Sorting and reconciling this manually seems overwhelming, possibly requiring some late nights and weekends.

Luckily, with the date reference table built into your project database, there’s one clean way to automate all of this!

For this example, we will look at a hypothetical project that needs to report a distinct count of participants across two activities: technical and financial assistance. Some participants took part in both, while others took part in only one.

Technical Assistance                                                                                                         

 

 

 

 

 

 

Financial Assistance

 

 

 

 

 

 

We can start with technical assistance by building an SQL command, like the one demonstrated in part two of this blog series, which will output a list of all participants and binary values to indicate whether they participated in technical assistance at least once during each fiscal year (FY) and for the life of the project (LoP). (A “1” means they participated at least once, and a “0” means they did not participate in the activity.)

In addition to these two activity participation tracking tables, we will also need our date reference table.

SQL for Distinct Participants Taking Part in Technical Assistance

SELECT [Participant Assisted], MAX([Date Reference].FY17) AS FY17, MAX([Date Reference].FY18) AS FY18, MAX([Date Reference].FY19) AS FY19, MAX([Date Reference].FY20) AS FY20, MAX([Date Reference].LoP) AS LoP FROM [Technical Assistance] INNER JOIN [Date Reference] ON [Technical Assistance].[Date of Assistance]=[Date Reference].[Date Check] GROUP BY [Participant Assisted];

 

 

 

 

 

 

 

 

 

 

 

SQL for Distinct Participants Taking Part in Financial Assistance

SELECT [Participant Assisted], MAX([Date Reference].FY17) AS FY17, MAX([Date Reference].FY18) AS FY18, MAX([Date Reference].FY19) AS FY19, MAX([Date Reference].FY20) AS FY20, MAX([Date Reference].LoP) AS LoP FROM [Financial Assistance] INNER JOIN [Date Reference] ON [Financial Assistance].[Disbursement Date]=[Date Reference].[Date Check] GROUP BY [Participant Assisted];

 

 

 

 

 

 

 

 

 

Now that we have the precise participation list across all reporting periods for the two activities, the next step is to combine the two SQL commands above to create a consolidated list. To achieve this, we need a new SQL query that includes the commands for both technical and financial assistance, with these two blocks of code seperated by the “UNION” command. For demonstration purposes, I will also add a new column to clarify whether each line represents participation in technical or financial assistance.

SQL to Merge Participant Lists

SELECT [Participant Assisted], ‘Technical Assistance’ AS Activity, MAX([Date Reference].FY17) AS FY17, MAX([Date Reference].FY18) AS FY18, MAX([Date Reference].FY19) AS FY19, MAX([Date Reference].FY20) AS FY20, MAX([Date Reference].LoP) AS LoP FROM [Technical Assistance] INNER JOIN [Date Reference] ON [Technical Assistance].[Date of Assistance]=[Date Reference].[Date Check] GROUP BY [Participant Assisted] UNION SELECT [Participant Assisted], ‘Financial Assistance’ AS Activity, MAX([Date Reference].FY17) AS FY17, MAX([Date Reference].FY18) AS FY18, MAX([Date Reference].FY19) AS FY19, MAX([Date Reference].FY20) AS FY20, MAX([Date Reference].LoP) AS LoP FROM [Financial Assistance] INNER JOIN [Date Reference] ON [Financial Assistance].[Disbursement Date]=[Date Reference].[Date Check] GROUP BY [ParticipantAssisted];

Looking at the output table, we see that many participants took part in either one activity or the other. However, a few individuals,such as the last one in the table below, are listed twice — once for financial assistance and once for technical assistance. These are the individuals who would be double counted if we simply added together the participation totals for the two activities.

(Red = Participated in financial assistance only; Blue = Participated in technical assistance only; Purple = Participated in both activities) 

To eliminate double counting, we can simply expand the SQL to take the “MAX” of each column that represents a reporting period (FYs and LoPs). With the “MAX” function, individuals who participated in both activities will still return a value of “1.”

SQL to Eliminate Double Counting from the Consolidated Participant List

SELECT [Participant ID], MAX(FY17) AS FY17, MAX(FY18) AS FY18, MAX(FY19) AS FY19, MAX(FY20) AS FY20, MAX(LoP) AS LoP FROM (SELECT [Participant Assisted], ‘Technical Assistance’ AS Activity, MAX([Date Reference].FY17) AS FY17, MAX([Date Reference].FY18) AS FY18, MAX([Date Reference].FY19) AS FY19, MAX([Date Reference].FY20) AS FY20, MAX([Date Reference].LoP) AS LoP FROM [Technical Assistance] INNER JOIN [Date Reference] ON [Technical Assistance].[Date of Assistance]=[Date Reference].[Date Check] GROUP BY [Participant Assisted] UNION SELECT [Participant Assisted], ‘Financial Assistance’ AS Activity, MAX([Date Reference].FY17) AS FY17, MAX([Date Reference].FY18) AS FY18, MAX([Date Reference].FY19) AS FY19, MAX([Date Reference].FY20) AS FY20, MAX([Date Reference].LoP) AS LoP FROM [Financial Assistance] INNER JOIN [Date Reference] ON [Financial Assistance].[Disbursement Date]=[Date Reference].[Date Check] GROUP BY [Participant Assisted]) AS Combined_List GROUP BY [Participant ID];

We now see in the above table that individuals listed twice in the first merged query are now only listed once. (A “1” means they participated in either financial or technical assistance.)

With this reconciled list, we can now expand the query to simply do a “SUM” of each of the FY and LoP columns to come up with the distinct number of participants.

SQL for Final Distinct Count of Participants

SELECT SUM(FY17) AS FY17, SUM(FY18) AS FY18, SUM(FY19) AS FY19, SUM(FY20) AS FY20, SUM(LoP) AS LoP FROM (SELECT [Participant ID], MAX(FY17) AS FY17, MAX(FY18) AS FY18, MAX(FY19) AS FY19, MAX(FY20) AS FY20, MAX(LoP) AS LoP FROM (SELECT [Participant Assisted], ‘Technical Assistance’ AS Activity, MAX([Date Reference].FY17) AS FY17, MAX([Date Reference].FY18) AS FY18, MAX([Date Reference].FY19) AS FY19, MAX([Date Reference].FY20) AS FY20, MAX([Date Reference].LoP) AS LoP FROM [Technical Assistance] INNER JOIN [Date Reference] ON [Technical Assistance].[Date of Assistance]=[Date Reference].[Date Check] GROUP BY [Participant Assisted] UNION SELECT [Participant Assisted], ‘Financial Assistance’ AS Activity, MAX([Date Reference].FY17) AS FY17, MAX([Date Reference].FY18) AS FY18, MAX([Date Reference].FY19) AS FY19, MAX([Date Reference].FY20) AS FY20, MAX([Date Reference].LoP) AS LoP FROM [Financial Assistance] INNER JOIN [Date Reference] ON [Financial Assistance].[Disbursement Date]=[Date Reference].[Date Check] GROUP BY [Participant Assisted]) AS Combined_List GROUP BY [Participant ID]) AS Unique_List;

 

 

You now have reliable, precise counts that will output automatically  and in real time for the entire duration of your project. As long as you can ensure data entry is done in a timely manner, the generation of reliable numbers for your reports will be seamless and precise!

View all blogs in the Data Digest series