Release is 9.4. I'm using PROC FORMAT to map a continuous variable into groups. The format has 4 levels while the input data values fall into 2 of the 4 levels. I want PROC SQL to display all 4 rows including rows where all values for the aggregation functions are missing. Instead I get just 2 rows. How do I get PROC SQL to display the rows where all values for the row are missing?
data df;
infile cards dlm = ',';
length district $ 3
sector $ 5
name $ 8;
input district $
sector $
name $
pre
post
age;
list;
datalines;
I, North, Patton, 17, 27, 22
I, South, Joyner, 13, 22, 19
I, East, Williams, 111, 121, 29
I, West, Jurat, 51, 55, 22
II, North, Aden, 71, 70, 17
II, South, Tanner, 113, 122, 32
II, East, Jenkins, 99, 99, 24
II, West, Milner, 15, 65, 22
III, North, Chang, 69, 101, 21
III, South, Gupta, 11, 22, 21
III, East, Haskins, 45, 41, 19
III, West, LeMay, 35, 69, 20
III, West, LeMay, 35, 69, 20
;;;;
proc format;
value age_fmt
0 - 25 = '0-25'
26 - 50 = '26-50'
51 - 75 = '51-75'
76 - high = 'Over 75';
run;
proc sql;
select unique put(age, age_fmt.) label = 'Group'
, count(age) label = 'Count'
, mean(age) label = 'Mean'
, max(age) label = 'Max'
, mean(age) label = 'Mean'
from df
group by put(age, age_fmt.);
quit;
Do you want to accomplish this in one pass of sql when there are no records that fall to the other age category with the existing set up?
I am thinking , how about cntlin your proc format and apply as you did and create cntlout of the proc format full set and join the two in the same query ? Sorry if this doesn't make sense.
Proc SQL will not create data where none exists.
You may want to look at proc SUMMARY, and more specifically at its CLASSDATA option.
I think @novinosrin describes something like this :
data df;
infile cards dlm = ',';
length district $ 3
sector $ 5
name $ 8;
input district $
sector $
name $
pre
post
age;
list;
datalines;
I, North, Patton, 17, 27, 22
I, South, Joyner, 13, 22, 19
I, East, Williams, 111, 121, 29
I, West, Jurat, 51, 55, 22
II, North, Aden, 71, 70, 17
II, South, Tanner, 113, 122, 32
II, East, Jenkins, 99, 99, 24
II, West, Milner, 15, 65, 22
III, North, Chang, 69, 101, 21
III, South, Gupta, 11, 22, 21
III, East, Haskins, 45, 41, 19
III, West, LeMay, 35, 69, 20
III, West, LeMay, 35, 69, 20
;;;;
proc format cntlout=groups;
value age_fmt
0 - 25 = '0-25'
26 - 50 = '26-50'
51 - 75 = '51-75'
76 - high = 'Over 75';
run;
proc sql;
select
a.label label = 'Group',
count(b.age) label = 'Count',
mean(b.age) label = 'Mean',
max(b.age) label = 'Max',
mean(b.age) label = 'Mean'
from
groups as a left join
df as b on a.label = put(b.age, age_fmt.)
group by a.label;
quit;
Yes boss @PGStats indeed sir. 10.30 pm at chicago , little too many pints with my mates to demo, but who better than Prodigygeniusstats aka PGstats. Thanks again . Good night!
Cheers! Good night!
Spot on! Thank you both.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.