Hello community:)
Some background Information for my question:
There a five diagnosis: A,B,C,D,E and four age groups: 0-4, 5-9, 10-15, 16-20 and there is a numerical variable for the number of patients in each group
Now i have the following problem. My table looks like this:
diagnosis | age_group | number_patients |
A | 0-4 | 3 |
A | 10-15 | 2 |
A | 5-9 | 15 |
A | 16-20 | 8 |
B | 16-20 | 1 |
C | 5-9 | 1 |
D | 0-4 | 1 |
D | 16-20 | 6 |
D | 5-9 | 2 |
E | 0-4 | 1 |
and i want my table look like this (including all Information even if its missing and in the right order regarding the age age groups):
diagnosis | age_group | number_patients |
A | 0-4 | 3 |
A | 5-9 | 15 |
A | 10-15 | 2 |
A | 16-20 | 8 |
B | 0-4 | . |
B | 5-9 | . |
B | 10-15 | . |
B | 16-20 | 1 |
C | 0-4 | . |
C | 5-9 | 1 |
C | 10-15 | . |
C | 16-20 | . |
D | 0-4 | 1 |
D | 5-9 | 2 |
D | 10-15 | . |
D | 16-20 | 6 |
E | 0-4 | 1 |
E | 5-9 | . |
E | 10-15 | . |
E | 16-20 | . |
i hope someone could help me or link me to an answer!
Thank you for your effort!
Dynamike
1. Use the SPARSE option within PROC FREQ to help fill out your tables
2. Add another variable that can control the order of the age groups. It's a character variable so it sorts alphabetically, or see the sort options in PROC SORT that allow it to not sort alphabetically.
proc freq data=have noprint;
table diagnosis*age_group/ out=filled_sparse sparse list;
weight number_patients;
run;
proc print data=filled_sparse;
run;
It essentailly does a cross join between the diagnosis and age_group so any entry will be included. If you're missing an age group for all levels this method will not work.
DATA HAVE;
input diagnosis $ age_group $ number_patients;
DATALINES;
A 0-4 3
A 10-15 2
A 5-9 15
A 16-20 8
B 16-20 1
C 5-9 1
D 0-4 1
D 16-20 6
D 5-9 2
E 0-4 1
;
RUN;
data Categories;
FORMAT Diagnosis $ 1.
Age_group $ 8.;
DO Diagnosis = 'A','B','C','D','E';
DO Age_Group = '0-4','5-9','10-15','16-20';
OUTPUT;
end;
end;
RUN;
Proc sql;
CREATE Table want AS
SELECT cat.Diagnosis, cat.Age_group, h.number_patients
FROM work.Categories AS Cat
LEFT JOIN work.have AS h ON h.Diagnosis = cat.diagnosis
AND h.Age_group = Cat.Age_group
;
QUIT;
I am assuming you already have the result and that it is just a case of adding the missing values and having it all sorted. This achieves that although there are most likely better answers out there.
Something like this would work:
data have;
infile datalines dlm='09'x missover;
input diagnosis :$ age_group :$ number_patients;
datalines;
A 0-4 3
A 10-15 2
A 5-9 15
A 16-20 8
B 16-20 1
C 5-9 1
D 0-4 1
D 16-20 6
D 5-9 2
E 0-4 1
;
run;
proc sql;
select a.*,b.number_patients
from (select * from (select distinct diagnosis from have),
(select distinct age_group from have)) a
left join have b on a.diagnosis=b.diagnosis and a.age_group=b.age_group
;
quit;
Do you actually want to add records to a data set or just display the existing data in the desired format?
data have;
input diagnosis $ age_group $ number_patients;
cards;
A 0-4 3
A 10-15 2
A 5-9 15
A 16-20 8
B 16-20 1
C 5-9 1
D 0-4 1
D 16-20 6
D 5-9 2
E 0-4 1
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have') ;
h.definekey ("age_group") ;
h.definedata ("age_group") ;
h.definedone () ;
declare hiter iter('h');
dcl hash H1 () ;
h1.definekey ("age_group") ;
h1.definedone () ;
end;
h1.clear();
do until(last.diagnosis);
set have;
by diagnosis;
if h.check()=0 and h1.num_items ne h.num_items then do; h1.add();output;end;
if last.diagnosis then do;
rc = iter.first();
do while (rc = 0);
if h1.check() ne 0 then do;call missing(number_patients); output;end;
rc = iter.next();
end;
end;
end;
drop rc;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.