Release is 9.4. I'm using PROC FORMAT to map a continuous variable into groups. The format has 4 levels while the input data values fall into 2 of the 4 levels. I want PROC SQL to display all 4 rows including rows where all values for the aggregation functions are missing. Instead I get just 2 rows. How do I get PROC SQL to display the rows where all values for the row are missing?
data df;
infile cards dlm = ',';
length district $ 3
sector $ 5
name $ 8;
input district $
sector $
name $
pre
post
age;
list;
datalines;
I, North, Patton, 17, 27, 22
I, South, Joyner, 13, 22, 19
I, East, Williams, 111, 121, 29
I, West, Jurat, 51, 55, 22
II, North, Aden, 71, 70, 17
II, South, Tanner, 113, 122, 32
II, East, Jenkins, 99, 99, 24
II, West, Milner, 15, 65, 22
III, North, Chang, 69, 101, 21
III, South, Gupta, 11, 22, 21
III, East, Haskins, 45, 41, 19
III, West, LeMay, 35, 69, 20
III, West, LeMay, 35, 69, 20
;;;;
proc format;
value age_fmt
0 - 25 = '0-25'
26 - 50 = '26-50'
51 - 75 = '51-75'
76 - high = 'Over 75';
run;
proc sql;
select unique put(age, age_fmt.) label = 'Group'
, count(age) label = 'Count'
, mean(age) label = 'Mean'
, max(age) label = 'Max'
, mean(age) label = 'Mean'
from df
group by put(age, age_fmt.);
quit;
Do you want to accomplish this in one pass of sql when there are no records that fall to the other age category with the existing set up?
I am thinking , how about cntlin your proc format and apply as you did and create cntlout of the proc format full set and join the two in the same query ? Sorry if this doesn't make sense.
Proc SQL will not create data where none exists.
You may want to look at proc SUMMARY, and more specifically at its CLASSDATA option.
I think @novinosrin describes something like this :
data df;
infile cards dlm = ',';
length district $ 3
sector $ 5
name $ 8;
input district $
sector $
name $
pre
post
age;
list;
datalines;
I, North, Patton, 17, 27, 22
I, South, Joyner, 13, 22, 19
I, East, Williams, 111, 121, 29
I, West, Jurat, 51, 55, 22
II, North, Aden, 71, 70, 17
II, South, Tanner, 113, 122, 32
II, East, Jenkins, 99, 99, 24
II, West, Milner, 15, 65, 22
III, North, Chang, 69, 101, 21
III, South, Gupta, 11, 22, 21
III, East, Haskins, 45, 41, 19
III, West, LeMay, 35, 69, 20
III, West, LeMay, 35, 69, 20
;;;;
proc format cntlout=groups;
value age_fmt
0 - 25 = '0-25'
26 - 50 = '26-50'
51 - 75 = '51-75'
76 - high = 'Over 75';
run;
proc sql;
select
a.label label = 'Group',
count(b.age) label = 'Count',
mean(b.age) label = 'Mean',
max(b.age) label = 'Max',
mean(b.age) label = 'Mean'
from
groups as a left join
df as b on a.label = put(b.age, age_fmt.)
group by a.label;
quit;
Yes boss @PGStats indeed sir. 10.30 pm at chicago , little too many pints with my mates to demo, but who better than Prodigygeniusstats aka PGstats. Thanks again . Good night!
Cheers! Good night!
Spot on! Thank you both.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.