DATA Step, Macro, Functions and more

Add missing rows for several subgroups

Reply
Occasional Contributor
Posts: 6

Add missing rows for several subgroups

Hello communitySmiley Happy

 

Some background Information for my question:

There a five diagnosis: A,B,C,D,E and four age groups: 0-4, 5-9, 10-15, 16-20 and there is a numerical variable for the number of patients in each group

 

Now i have the following problem. My table looks like this:

diagnosisage_groupnumber_patients
A0-43
A10-152
A5-915
A16-208
B16-201
C5-91
D0-41
D16-206
D5-92
E0-41

 

and i want my table look like this (including all Information even if its missing and in the right order regarding the age age groups):

 

diagnosisage_groupnumber_patients
A0-43
A5-915
A10-152
A16-208
B0-4.
B5-9.
B10-15.
B16-201
C0-4.
C5-91
C10-15.
C16-20.
D0-41
D5-92
D10-15.
D16-206
E0-41
E5-9.
E10-15.
E16-20

.

 

i hope someone could help me or link me to an answer!

 

Thank you for your effort!

Dynamike

 

 

Super User
Posts: 23,293

Re: Add missing rows for several subgroups

1. Use the SPARSE option within PROC FREQ to help fill out your tables

2. Add another variable that can control the order of the age groups. It's a character variable so it sorts alphabetically, or see the sort options in PROC SORT that allow it to not sort alphabetically.

 

proc freq data=have noprint;
table diagnosis*age_group/ out=filled_sparse sparse list;
weight number_patients; run; proc print data=filled_sparse; run;

It essentailly does a cross join between the diagnosis and age_group so any entry will be included. If you're missing an age group for all levels this method will not work. 

 

 

Occasional Contributor
Posts: 10

Re: Add missing rows for several subgroups

[ Edited ]
DATA HAVE;
input diagnosis $ age_group $ number_patients;
DATALINES;
A 0-4 3
A 10-15 2
A 5-9 15
A 16-20 8
B 16-20 1
C 5-9 1
D 0-4 1
D 16-20 6
D 5-9 2
E 0-4 1
;
RUN;


data Categories;

FORMAT  Diagnosis $ 1.
		Age_group $ 8.;


DO Diagnosis = 'A','B','C','D','E';
	DO Age_Group = '0-4','5-9','10-15','16-20';
	OUTPUT;
	end;
end;

RUN;

Proc sql;

CREATE Table want AS
	SELECT cat.Diagnosis, cat.Age_group, h.number_patients
	FROM work.Categories AS Cat
	LEFT JOIN work.have AS h ON h.Diagnosis = cat.diagnosis
	AND h.Age_group = Cat.Age_group
	
;
QUIT;

I am assuming you already have the result and that it is just a case of adding the missing values and having it all sorted. This achieves that although there are most likely better answers out there.

Valued Guide
Posts: 559

Re: Add missing rows for several subgroups

Something like this would work:

data have;
infile datalines dlm='09'x missover;
input diagnosis :$	age_group :$	number_patients;
datalines;
A	0-4	3
A	10-15	2
A	5-9	15
A	16-20	8
B	16-20	1
C	5-9	1
D	0-4	1
D	16-20	6
D	5-9	2
E	0-4	1
;
run;

proc sql;
select a.*,b.number_patients
from (select * from (select distinct diagnosis from have),
	 (select distinct age_group from have)) a
left join	have b on a.diagnosis=b.diagnosis and a.age_group=b.age_group

;

quit;
Thanks,
Suryakiran
Super User
Posts: 13,321

Re: Add missing rows for several subgroups

Do you actually want to add records to a data set or just display the existing data in the desired format?

PROC Star
Posts: 1,580

Re: Add missing rows for several subgroups

data have;
input diagnosis	$ age_group	$ number_patients;
cards;
A	0-4	3
A	10-15	2
A	5-9	15
A	16-20	8
B	16-20	1
C	5-9	1
D	0-4	1
D	16-20	6
D	5-9	2
E	0-4	1
;

data want;
if _n_=1 then do;
if 0 then set have;
  dcl hash H (dataset:'have') ;
   h.definekey  ("age_group") ;
   h.definedata ("age_group") ;
   h.definedone () ;
   declare hiter iter('h');
 dcl hash H1 () ;
   h1.definekey  ("age_group") ;
   h1.definedone () ;
 end;
h1.clear();
do until(last.diagnosis);
	set have;
	by diagnosis;
	if h.check()=0 and h1.num_items ne h.num_items then do; h1.add();output;end;
	if last.diagnosis then do;
	rc = iter.first();
	do while (rc = 0);
 	 if h1.check() ne 0 then do;call missing(number_patients); output;end;
  	rc = iter.next();
	end;
	end;
end;
drop rc;
run;
Ask a Question
Discussion stats
  • 5 replies
  • 159 views
  • 1 like
  • 6 in conversation