I have created 2 different datasets by importing 2 excel spreadsheets. In dataset A I have active participants and in dataset B I have completed participants. Both completed and active variables are coded as 1 for true, 0 for false. I would like to create a table that will give me the count of active participants by program, count of completed participants by program, then a total of active and complete by program. Both dataset have most of the same variable names (column headings) but not all. What is the best way to go about doing this? I have attempted to merge datasets and sort by program but received a proc tabulate error of "The type of variable is unknown".
Is there a way to just add the one variable (or the one variable along with the program variable) to the data set?
or
Is there a way to create one variable from variables in 2 different datasets?
I have sas9.4.
Hope this is enough info.
Thanks for your help!
Looks like you have common variable PROGRAM in both datasets. If no other common variables to involve there, you could sort both datasets by PROGRAM and merge, then process data BY GROUP using first/last.variables to calculate group totals.
Eg:
data active;
input program:$1 Active:2;
cards;
a 0
a 1
a 0
b 1
b 0
b 1
c 0
c 1
c 0
;
proc print;run; 
data complete;
input program:$1 Completed:2;
cards;
a 1
a 0
a 1
b 0
b 1
b 0
c 1
c 0
c 1
;
proc print;run; 
data want;
	merge active complete;
	by program;
	retain active_total completed_total;
	if first.program then do;
		active_total=active;
		completed_total=completed;
	end; 
	else do;
		active_total+active;
		completed_total+completed;
	end; 
	if last.program;
	Program_total= sum(active_total,completed_total); 
	drop active completed; 
proc print;run; Both dataset have most of the same variable names (column headings) but not all. What is the best way to go about doing this?
Without you showing us a portion of both data sets, I don't think anyone can answer the question. Data should be provided according to these examples and instructions. Do not provide data in other forms. Do not ignore this paragraph.
Also, many of us will not download attachments. Please copy your SAS code as text and paste it into the window that appears when you click on the "little running man" icon.
PROC IMPORT OUT= WORK.active 
            DATAFILE= "C:\Users\Documents\report1" 
            DBMS=EXCEL REPLACE;
     SHEET=Active; 
RUN;
PROC IMPORT OUT= WORK.closed 
            DATAFILE= "C:\Users\Documents\report2" 
            DBMS=EXCEL REPLACE;
     SHEET=complete; 
     
RUN;
Data report1;
set WORK.active;
If Admission_Date > '31MAY2023'd  then active=0;
If Approved_End_Date = ' ' then active=1;
If Approved_End_Date < '31MAY2023'd  then active=0;
If Admission_Date < '31MAY2023'd  AND Approved_End_Date >=  '31MAY2023'd  then active=1;
Else active=0;
run;
Data report2;
set WORK.closed;
If Discharge_Date > '31MAY2023'd  then complete=1;
else complete=0;
Run; 
The 2 variables I want counts of are in different datasets so I think the datasets need to be merged unless there is another way to do this. I would like to have a count of active (from report1) per program and a count of completed (from report2) per program (the program variable is in both datasets). Both variables are coded 1/0, 1 for true, 0 for false. Then I would like to have the total for each program.
The table would look something like this.
Program Active Complete Total
A 2 0 2
B 1 1 2
C 1 1 2
I'm sorry but I cannot work with data in screen captures, or SAS code in screen captures. I was very specific about how to provide this information, but you did not follow the instructions. Please follow the instructions provided.
From your original description it sounded like you just wanted to know how many observations where in each dataset.
But perhaps there is more to it than that?
If you do not just want the total count then what is it you want a count of?
Is there any need to combine the two datasets? For what purpose?
Does either dataset have a variable that uniquely identifies the PARTICIPANT? or some combination of variables?
Is it possible that the same PARTICIPANT could appear in more than one of the datasets? If so what does that mean in terms of the counts you want to create?
Looks like you have common variable PROGRAM in both datasets. If no other common variables to involve there, you could sort both datasets by PROGRAM and merge, then process data BY GROUP using first/last.variables to calculate group totals.
Eg:
data active;
input program:$1 Active:2;
cards;
a 0
a 1
a 0
b 1
b 0
b 1
c 0
c 1
c 0
;
proc print;run; 
data complete;
input program:$1 Completed:2;
cards;
a 1
a 0
a 1
b 0
b 1
b 0
c 1
c 0
c 1
;
proc print;run; 
data want;
	merge active complete;
	by program;
	retain active_total completed_total;
	if first.program then do;
		active_total=active;
		completed_total=completed;
	end; 
	else do;
		active_total+active;
		completed_total+completed;
	end; 
	if last.program;
	Program_total= sum(active_total,completed_total); 
	drop active completed; 
proc print;run; It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
