BookmarkSubscribeRSS Feed
CollinIsmael
Calcite | Level 5

I am working on outputting a dataset that has been a cumbersome aggregation of individual data points and proc frequency outputs from more than a dozen datasets. I have almost a final table and I am stuck on one last piece to code and I can't seem to get it right.

My current data set looks like this (example data below). This table is a bit complicated as each cell value is a data point created by concatenating other variables.

*Note: the total number of subjects is 20 because we carry over the same subjects from year 1 to 2 so we only want to report unique subjects.

The cell listed below as "Currently Empty" is where I am having an issue. I have a separate dataset with only one value in it: "19 (21) 95%". I want that value to appear in the cell.

Any suggestions on how I can get this done without having to go back and recreate a lot of my other datasets that feed into this table?

 

Thanks!!

 

  Year 1 Number of Subjects (Number of Occurences per subject) Year 2 Number of Subjects (Number of Occurences per subject) total

Categoryn= 20n= 10Overall Total = 20 (unique subjects)
group 110 (11) 50%5 (5) 50%15 (16) 75%
group 22 (2) 10%2 (3) 20%4 (5) 20%
Total12 (13) 60%7 (8) 70%CURRENTLY EMPTY
1 REPLY 1
ed_sas_member
Meteorite | Level 14

Hi @CollinIsmael 

 

Here is an approach using proc sql and an UPDATE statement:

 

data have;
	infile datalines dlm="09"x dsd;
	input v1:$20.;
	datalines;	
"19 (21) 95%"
;
run;

data wanted;
	infile datalines dlm="09"x dsd;
	input Category:$50.	N1:$50.	N2:$50.	Overall:$50.;
	datalines;	
group 1	10 (11) 50%	5 (5) 50%	15 (16) 75%
group 2	2 (2) 10%	2 (3) 20%	4 (5) 20%
Total	12 (13) 60%	7 (8) 70%	
;
run;

proc print data=wanted;

proc sql;
	update wanted 
	set Overall = (select v1 from have)
	where Category="Total";
run;

proc print data=wanted;

Capture d’écran 2020-06-12 à 12.15.27.png

 

 

Another approach could be to use an UPDATE statement in a DATA step:

data have;
	infile datalines dlm="09"x dsd;
	input v1:$20.;
	datalines;	
"19 (21) 95%"
;
run;

data wanted;
	infile datalines dlm="09"x dsd;
	input Category:$50.	N1:$50.	N2:$50.	Overall:$50.;
	datalines;	
group 1	10 (11) 50%	5 (5) 50%	15 (16) 75%
group 2	2 (2) 10%	2 (3) 20%	4 (5) 20%
Total	12 (13) 60%	7 (8) 70%	
;
run;

data have2;
	set have;
	Category = 'Total';
	rename V1=Overall;
run;

proc sort data=wanted;
	by Category;
run;

data wanted;
	update wanted have2;
	by Category;
run;

Best,