Release is 9.4. I'm using PROC FORMAT to map a continuous variable into groups. The format has 4 levels while the input data values fall into 2 of the 4 levels. I want PROC SQL to display all 4 rows including rows where all values for the aggregation functions are missing. Instead I get just 2 rows. How do I get PROC SQL to display the rows where all values for the row are missing?
data df;
infile cards dlm = ',';
length district $ 3
sector $ 5
name $ 8;
input district $
sector $
name $
pre
post
age;
list;
datalines;
I, North, Patton, 17, 27, 22
I, South, Joyner, 13, 22, 19
I, East, Williams, 111, 121, 29
I, West, Jurat, 51, 55, 22
II, North, Aden, 71, 70, 17
II, South, Tanner, 113, 122, 32
II, East, Jenkins, 99, 99, 24
II, West, Milner, 15, 65, 22
III, North, Chang, 69, 101, 21
III, South, Gupta, 11, 22, 21
III, East, Haskins, 45, 41, 19
III, West, LeMay, 35, 69, 20
III, West, LeMay, 35, 69, 20
;;;;
proc format;
value age_fmt
0 - 25 = '0-25'
26 - 50 = '26-50'
51 - 75 = '51-75'
76 - high = 'Over 75';
run;
proc sql;
select unique put(age, age_fmt.) label = 'Group'
, count(age) label = 'Count'
, mean(age) label = 'Mean'
, max(age) label = 'Max'
, mean(age) label = 'Mean'
from df
group by put(age, age_fmt.);
quit;
Do you want to accomplish this in one pass of sql when there are no records that fall to the other age category with the existing set up?
I am thinking , how about cntlin your proc format and apply as you did and create cntlout of the proc format full set and join the two in the same query ? Sorry if this doesn't make sense.
Proc SQL will not create data where none exists.
You may want to look at proc SUMMARY, and more specifically at its CLASSDATA option.
I think @novinosrin describes something like this :
data df;
infile cards dlm = ',';
length district $ 3
sector $ 5
name $ 8;
input district $
sector $
name $
pre
post
age;
list;
datalines;
I, North, Patton, 17, 27, 22
I, South, Joyner, 13, 22, 19
I, East, Williams, 111, 121, 29
I, West, Jurat, 51, 55, 22
II, North, Aden, 71, 70, 17
II, South, Tanner, 113, 122, 32
II, East, Jenkins, 99, 99, 24
II, West, Milner, 15, 65, 22
III, North, Chang, 69, 101, 21
III, South, Gupta, 11, 22, 21
III, East, Haskins, 45, 41, 19
III, West, LeMay, 35, 69, 20
III, West, LeMay, 35, 69, 20
;;;;
proc format cntlout=groups;
value age_fmt
0 - 25 = '0-25'
26 - 50 = '26-50'
51 - 75 = '51-75'
76 - high = 'Over 75';
run;
proc sql;
select
a.label label = 'Group',
count(b.age) label = 'Count',
mean(b.age) label = 'Mean',
max(b.age) label = 'Max',
mean(b.age) label = 'Mean'
from
groups as a left join
df as b on a.label = put(b.age, age_fmt.)
group by a.label;
quit;
Yes boss @PGStats indeed sir. 10.30 pm at chicago , little too many pints with my mates to demo, but who better than Prodigygeniusstats aka PGstats. Thanks again . Good night!
Cheers! Good night!
Spot on! Thank you both.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.