Solved: Re: PROC SQL not displaying rows where all values are missing

rbetancourt · Posted 09-18-2018 09:57 PM

Release is 9.4. I'm using PROC FORMAT to map a continuous variable into groups. The format has 4 levels while the input data values fall into 2 of the 4 levels. I want PROC SQL to display all 4 rows including rows where all values for the aggregation functions are missing. Instead I get just 2 rows. How do I get PROC SQL to display the rows where all values for the row are missing?

data df;
    infile datalines dlm = ',';
    length district $ 3
           sector   $ 5
           name     $ 8;
    input district $
          sector   $
          name     $
          pre
          post
          age;
    list;  /* echo each input record to the log */
    datalines;
I,   North, Patton,   17,  27,  22
I,   South, Joyner,   13,  22,  19
I,   East,  Williams, 111, 121, 29
I,   West,  Jurat,    51,  55,  22
II,  North, Aden,     71,  70,  17
II,  South, Tanner,   113, 122, 32
II,  East,  Jenkins,  99,  99,  24
II,  West,  Milner,   15,  65,  22
III, North, Chang,    69,  101, 21
III, South, Gupta,    11,  22,  21
III, East,  Haskins,  45,  41,  19
III, West,  LeMay,    35,  69,  20
III, West,  LeMay,    35,  69,  20
;;;;

proc format;
    value age_fmt
        0  - 25   = '0-25'
        26 - 50   = '26-50'
        51 - 75   = '51-75'
        76 - high = 'Over 75';
run;

proc sql;
    /* Only groups that actually occur in df come back (2 of the 4 here) */
    select put(age, age_fmt.) label = 'Group'
         , count(age)         label = 'Count'
         , mean(age)          label = 'Mean'
         , max(age)           label = 'Max'
    from df
    group by put(age, age_fmt.);
quit;

novinosrin · Posted 09-18-2018 10:45 PM

Do you want to accomplish this in a single SQL pass with the existing setup, even though no records fall into the other age categories?

One idea: build the format with PROC FORMAT (CNTLIN= if it comes from a data set) and apply it as you did, then write the full set of format levels out with CNTLOUT= and join the two in the same query. Sorry if this doesn't make sense.

rbetancourt · Posted 09-18-2018 11:05 PM

Thank you, novinosrin. I was hoping for a single SQL query with the present
approach. Your idea does make sense and I may need to consider it.

Best,

Randy

ChrisNZ · Posted 09-18-2018 10:54 PM

Proc SQL will not create data where none exists.

You may want to look at proc SUMMARY, and more specifically at its CLASSDATA option.
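
A minimal sketch of the PROC SUMMARY route, assuming the df data set and the age_fmt format from the question already exist. Instead of building a CLASSDATA= data set, it swaps in the related PRELOADFMT option on the CLASS statement together with COMPLETETYPES, which asks for every level defined in the format to appear in the output even when no rows fall into it. The names df_grp, age_grp, and age_summary are hypothetical and used only for illustration; age is copied to age_grp so the grouping variable and the analysis variable stay separate.

```sas
/* Copy age so one variable is grouped and the other is analyzed.
   df_grp and age_grp are hypothetical names for this sketch. */
data df_grp;
    set df;
    age_grp = age;
run;

proc summary data=df_grp nway completetypes;
    class age_grp / preloadfmt;   /* preload all levels of the format */
    format age_grp age_fmt.;      /* group by formatted value         */
    var age;
    output out=age_summary(drop=_type_ _freq_)
           n=Count mean=Mean max=Max;
run;

proc print data=age_summary noobs label;
    label age_grp = 'Group';
run;
```

For the two levels with no matching rows, Count should come back as 0 and the other statistics as missing.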

PGStats · Posted 09-18-2018 11:29 PM

I think @novinosrin is describing something like this:

data df;
    infile datalines dlm = ',';
    length district $ 3
           sector   $ 5
           name     $ 8;
    input district $
          sector   $
          name     $
          pre
          post
          age;
    list;  /* echo each input record to the log */
    datalines;
I,   North, Patton,   17,  27,  22
I,   South, Joyner,   13,  22,  19
I,   East,  Williams, 111, 121, 29
I,   West,  Jurat,    51,  55,  22
II,  North, Aden,     71,  70,  17
II,  South, Tanner,   113, 122, 32
II,  East,  Jenkins,  99,  99,  24
II,  West,  Milner,   15,  65,  22
III, North, Chang,    69,  101, 21
III, South, Gupta,    11,  22,  21
III, East,  Haskins,  45,  41,  19
III, West,  LeMay,    35,  69,  20
III, West,  LeMay,    35,  69,  20
;;;;

/* CNTLOUT= writes the format definition to a data set, one row per range */
proc format cntlout=groups;
    value age_fmt
        0  - 25   = '0-25'
        26 - 50   = '26-50'
        51 - 75   = '51-75'
        76 - high = 'Over 75';
run;

proc sql;
    /* Left join from the format labels so every group is kept;
       count() gives 0 and mean()/max() give missing for empty groups */
    select
        a.label      label = 'Group',
        count(b.age) label = 'Count',
        mean(b.age)  label = 'Mean',
        max(b.age)   label = 'Max'
    from
        groups as a left join
        df as b on a.label = put(b.age, age_fmt.)
    group by a.label;
quit;

PG

novinosrin · Posted 09-18-2018 11:33 PM

Yes boss @PGStats, indeed sir. It's 10:30 pm in Chicago and I've had a few too many pints with my mates to demo it, but who better than Prodigygeniusstats aka PGStats. Thanks again. Good night!

PGStats · Posted 09-18-2018 11:43 PM

Cheers! Good night!

PG

rbetancourt · Posted 09-19-2018 12:06 AM

Spot on! Thank you both.
