BookmarkSubscribeRSS Feed
purvaj
Calcite | Level 5

I am trying to get the dataset on the right to have each age category from the left dataset (4-18) displayed within each mssa_id. So each mssa_id would have agecat 4-18 although not all agecat within each mssa_id in the new merged dataset will have a totcases populated along with other variables from the right dataset. The reason being is that the strata (agecat) need to be identical in order to run the proc stdrate procedure. I have tried doing different join procedures, and no matter what the end result is a dataset with only the rows from the left dataset.

 

One thing to note is that the left dataset is 18 rows, but the right dataset has data on different agecats (again not each agecat exists yet for each mssa_id in this dataset) for each mssa_id and is much longer.

 

Screenshot 2018-07-11 21.58.58.png

 

 

proc sql;

createtable num2 as

select a.*, b.*

from stdpop a left join num b

on a.agecat = b.agecat

order by mssa_id, agecat;

quit;

 

Screenshot 2018-07-11 21.56.40.png

10 REPLIES 10
Kurt_Bremser
Super User

Please post example data in a usable form (data steps with datalines, see my footnotes for help in converting datasets to those). That way we get something usable for testing by doing a simple copy/paste and submit. Don't expect we have the time to tediously type data off pictures.

proc sql;
create table num2 as
select a.*, b.*
from stdpop a left join num b
on a.agecat = b.agecat
order by mssa_id, agecat;
quit;

Using the asterisks in this way is sloppy programming. Since at least agecat is present in both datasets, you'll get a WARNING for that. Use an explicit list of variables, and identify their source by using the proper alias for each. The same has to happen in the order by clause.

purvaj
Calcite | Level 5

 

data work.STDPOP;
  infile datalines dsd truncover;
  input age:$17. agecat:BEST. stdpop:BEST. stdpop_py:32.;
  format agecat BEST. stdpop BEST.;
  label age="age" agecat="agecat" stdpop="stdpop";
datalines4;
15 to 19 years,4,10367000,145138000
20 to 24 years,5,10470000,146580000
25 to 29 years,6,10525000,147350000
30 to 34 years,7,9871000,138194000
35 to 39 years,8,9986000,139804000
40 to 44 years,9,10387000,145418000
45 to 49 years,10,11435000,160090000
50 to 54 years,11,11169000,156366000
55 to 59 years,12,9854000,137956000
60 to 64 years,13,8556000,119784000
65 to 69 years,14,6349000,88886000
70 to 74 years,15,4873000,68222000
75 to 79 years,16,4161000,58254000
80 to 84 years,17,3375000,47250000
85 years and over,18,3062000,42868000
;;;;

data work.NUM;
  infile datalines dsd truncover;
  input MSSA_ID:$10. AGECAT:4. totcases:32. pop:32. pop_py:32.;
  format AGECAT 4.;
  label AGECAT="Age category at diagnosis";
datalines4;
1.1,8,3,1830,25620
1.1,9,3,2088,29232
1.1,10,2,2439,34146
1.1,11,4,2547,35658
1.1,12,1,1480,20720
1.1,14,1,1113,15582
1.1,16,1,596,8344
1.2,7,1,1132,15848
1.2,9,3,1502,21028
1.2,10,2,2000,28000
1.2,11,1,1766,24724
1.2,12,2,1298,18172
1.2,14,1,604,8456
1.2,16,1,457,6398
10,5,4,1690,23660
10,6,2,1427,19978
10,7,14,1239,17346
10,8,9,1159,16226
10,9,15,1188,16632
10,10,10,1591,22274
10,11,14,1800,25200
10,12,15,1353,18942
10,13,9,1632,22848
10,14,6,1274,17836
10,15,5,1183,16562
10,16,2,809,11326
10,17,2,594,8316
10,18,3,758,10612
100,6,1,86,1204
100,7,2,69,966
100,8,3,51,714
100,9,8,29,406
100,10,1,50,700
100,11,6,41,574
100,12,1,100,1400
100,13,6,87,1218
100,15,2,5,70
102,6,1,205,2870
102,7,1,184,2576
102,8,3,141,1974
102,9,5,181,2534
102,10,7,190,2660
102,11,4,284,3976
102,12,1,270,3780
102,13,2,206,2884
102,14,1,213,2982
102,15,1,159,2226
102,17,1,61,854
103,6,1,330,4620
103,7,2,235,3290
;;;;

 

I attached the stdpop dataset (first) and the num dataset (second) using the macro. The num dataset is the one in which I need each agecat displayed 4-18 within each mssa_id. The totcases, pop and pop_py would appear as . for some rows after merging with stdpop.

 

Kurt_Bremser
Super User
proc sql;
create table num2 as
select
  a.*,
  b.mssa_id,
  b.totcases,
  b.pop,
  b.pop_py
from stdpop a left join num b
on a.agecat = b.agecat
order by b.mssa_id, a.agecat;
quit;

Since you have no entry for agecat 4 in dataset num, the values for agecat 4 will be missing.

purvaj
Calcite | Level 5

This did not work for me, my data comes out as follows. I had deleted the existing num2 dataset and run the code you provided.

data work.NUM2;
  infile datalines dsd truncover;
  input age:$17. agecat:BEST. stdpop:BEST. stdpop_py:32. MSSA_ID:$10. totcases:32. pop:32. pop_py:32.;
  format agecat BEST. stdpop BEST.;
  label age="age" agecat="agecat" stdpop="stdpop";
datalines4;
35 to 39 years,8,9986000,139804000,1.1,3,1830,25620
40 to 44 years,9,10387000,145418000,1.1,3,2088,29232
45 to 49 years,10,11435000,160090000,1.1,2,2439,34146
50 to 54 years,11,11169000,156366000,1.1,4,2547,35658
55 to 59 years,12,9854000,137956000,1.1,1,1480,20720
65 to 69 years,14,6349000,88886000,1.1,1,1113,15582
75 to 79 years,16,4161000,58254000,1.1,1,596,8344
30 to 34 years,7,9871000,138194000,1.2,1,1132,15848
40 to 44 years,9,10387000,145418000,1.2,3,1502,21028
45 to 49 years,10,11435000,160090000,1.2,2,2000,28000
50 to 54 years,11,11169000,156366000,1.2,1,1766,24724
55 to 59 years,12,9854000,137956000,1.2,2,1298,18172
65 to 69 years,14,6349000,88886000,1.2,1,604,8456
75 to 79 years,16,4161000,58254000,1.2,1,457,6398
20 to 24 years,5,10470000,146580000,10,4,1690,23660
25 to 29 years,6,10525000,147350000,10,2,1427,19978
30 to 34 years,7,9871000,138194000,10,14,1239,17346
35 to 39 years,8,9986000,139804000,10,9,1159,16226
40 to 44 years,9,10387000,145418000,10,15,1188,16632
45 to 49 years,10,11435000,160090000,10,10,1591,22274
50 to 54 years,11,11169000,156366000,10,14,1800,25200
55 to 59 years,12,9854000,137956000,10,15,1353,18942
60 to 64 years,13,8556000,119784000,10,9,1632,22848
65 to 69 years,14,6349000,88886000,10,6,1274,17836
70 to 74 years,15,4873000,68222000,10,5,1183,16562
75 to 79 years,16,4161000,58254000,10,2,809,11326
80 to 84 years,17,3375000,47250000,10,2,594,8316
85 years and over,18,3062000,42868000,10,3,758,10612
25 to 29 years,6,10525000,147350000,100,1,86,1204
30 to 34 years,7,9871000,138194000,100,2,69,966
35 to 39 years,8,9986000,139804000,100,3,51,714
40 to 44 years,9,10387000,145418000,100,8,29,406
45 to 49 years,10,11435000,160090000,100,1,50,700
50 to 54 years,11,11169000,156366000,100,6,41,574
55 to 59 years,12,9854000,137956000,100,1,100,1400
60 to 64 years,13,8556000,119784000,100,6,87,1218
70 to 74 years,15,4873000,68222000,100,2,5,70
25 to 29 years,6,10525000,147350000,102,1,205,2870
30 to 34 years,7,9871000,138194000,102,1,184,2576
35 to 39 years,8,9986000,139804000,102,3,141,1974
40 to 44 years,9,10387000,145418000,102,5,181,2534
45 to 49 years,10,11435000,160090000,102,7,190,2660
50 to 54 years,11,11169000,156366000,102,4,284,3976
55 to 59 years,12,9854000,137956000,102,1,270,3780
60 to 64 years,13,8556000,119784000,102,2,206,2884
65 to 69 years,14,6349000,88886000,102,1,213,2982
70 to 74 years,15,4873000,68222000,102,1,159,2226
80 to 84 years,17,3375000,47250000,102,1,61,854
25 to 29 years,6,10525000,147350000,103,1,330,4620
30 to 34 years,7,9871000,138194000,103,2,235,3290
;;;;
Kurt_Bremser
Super User

Keep in mind that "did not work" on its own is one of the most useless statements one can make. Please state IN DETAIL what you expected from your data and code, and where the result does not meet your expectations.

purvaj
Calcite | Level 5

Well the goal is still the same as my original post above: "I am trying to get the dataset on the right to have each age category from the left dataset (4-18) displayed within each mssa_id."

Kurt_Bremser
Super User

And I already gave you the answer to that:

 

"Since you have no entry for agecat 4 in dataset num, the values for agecat 4 will be missing."

 

Ex:

data work.NUM;
  infile datalines dsd truncover;
  input MSSA_ID:$10. AGECAT:4. totcases:32. pop:32. pop_py:32.;
  format AGECAT 4.;
  label AGECAT="Age category at diagnosis";
datalines4;
1.1,8,3,1830,25620
1.1,9,3,2088,29232
1.1,10,2,2439,34146
1.1,11,4,2547,35658
1.1,12,1,1480,20720
1.1,14,1,1113,15582
1.1,16,1,596,8344
1.2,7,1,1132,15848
1.2,9,3,1502,21028
1.2,10,2,2000,28000
1.2,11,1,1766,24724
1.2,12,2,1298,18172
1.2,14,1,604,8456
1.2,16,1,457,6398
10,5,4,1690,23660
10,6,2,1427,19978
10,7,14,1239,17346
10,8,9,1159,16226
10,9,15,1188,16632
10,10,10,1591,22274
10,11,14,1800,25200
10,12,15,1353,18942
10,13,9,1632,22848
10,14,6,1274,17836
10,15,5,1183,16562
10,16,2,809,11326
10,17,2,594,8316
10,18,3,758,10612
100,6,1,86,1204
100,7,2,69,966
100,8,3,51,714
100,9,8,29,406
100,10,1,50,700
100,11,6,41,574
100,12,1,100,1400
100,13,6,87,1218
100,15,2,5,70
102,6,1,205,2870
102,7,1,184,2576
102,8,3,141,1974
102,9,5,181,2534
102,10,7,190,2660
102,11,4,284,3976
102,12,1,270,3780
102,13,2,206,2884
102,14,1,213,2982
102,15,1,159,2226
102,17,1,61,854
103,6,1,330,4620
103,7,2,235,3290
;;;;
run;

proc freq data=num;
tables agecat;
run;

Result:

                     The FREQ Procedure

                 Age category at diagnosis

                                   Cumulative    Cumulative
AGECAT    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
     5           1        2.00             1         2.00  
     6           4        8.00             5        10.00  
     7           5       10.00            10        20.00  
     8           4        8.00            14        28.00  
     9           5       10.00            19        38.00  
    10           5       10.00            24        48.00  
    11           5       10.00            29        58.00  
    12           5       10.00            34        68.00  
    13           3        6.00            37        74.00  
    14           4        8.00            41        82.00  
    15           3        6.00            44        88.00  
    16           3        6.00            47        94.00  
    17           2        4.00            49        98.00  
    18           1        2.00            50       100.00  

No agecat 4

 

ballardw
Super User

I might suggest looking at the option PRELOADFMT with proc report after combining your data to display things that aren't actually in you data. This would require creating an appropriate format for the desired variable(s).

purvaj
Calcite | Level 5
Thank you I will try this out! The end goal is to get proc stdrate functioning. The procedure wants datasets have the same exact strata regardless of whether the age strata within the dataset that is getting standardized has cases in each stratum. When I ran my code and datasets on another computer it worked fine. SAS knew to skip those strata in the standardization process that did not exist.
Kurt_Bremser
Super User

@purvaj wrote:
Thank you I will try this out! The end goal is to get proc stdrate functioning. The procedure wants datasets have the same exact strata regardless of whether the age strata within the dataset that is getting standardized has cases in each stratum. When I ran my code and datasets on another computer it worked fine. SAS knew to skip those strata in the standardization process that did not exist.

So your problem is not with the SQL left join (which, as proven, works as it should), but with the ability of proc stdrate to deal with missing values.

You can deal with that in the SQL by taking care of the situation yourself:

proc sql;
create table num2 as
select
  a.*,
  case
    when b.agecat is missing
    then '1.1' /* set a default value that works for your analysis "/
    else b.mssa_id
  end as mssa_id,
  /* do similar for the rest of the columns you want */
from stdpop a left join num b
on a.agecat = b.agecat
order by b.mssa_id, a.agecat;
quit;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3952 views
  • 0 likes
  • 3 in conversation