I am trying to collapse my data using proc sql. However, i noticed that when I tried to collapse my data I lost a bunch of variables that I wanted to keep. I am trying to collapse my data based on the variable MRN. The other variables I want to keep are CITY and SITE and these are constant for each unique MRN, so collapsing them should be fine.
Here is the code I am using
proc sql; create table collapsed_data as select distinct mrn, sum(msk_tx_yes) as msk_tx_yes, sum(msk_cancel_tx_yes) as msk_cancel_tx_yes, sum(msk_ca_yes) as msk_ca_yes, sum(msk_cancel_ca_yes) as msk_cancel_ca_yes, sum(msk_dc_yes) as msk_dc_yes, sum(conc_psych_tx_yes) as conc_psych_tx_yes, sum(conc_psych_ca_yes) as conc_psych_ca_yes, sum (conc_psych_dc_yes) as conc_psych_dc_yes, sum (conc_yes) as conc_yes, sum (psych_yes) as psych_yes, sum (foot_prog) as foot_prog, sum (hand_prog) as hand_prog, sum (surg_prog) as surg_prog, sum (sx_yes) as sx_yes from temp_collapsed_data group by mrn; quit;
I'm not sure how to use the SELECT and DISTINCT functions together.
I thought maybe I could add the variables CITY and STATE after SELECT, while keeping DISTINCT but it doens't sem to work.
Anyone know how I can achieve this?
Okay, time for Plan B
It's odd that you don't have access to the SASHELP datasets; you should ask your SAS administrator to set that up. They are very convenient for things like this.
However, we can just create one on the fly, and use it to test your code.
When I run:
data class;
input Name $ Sex $ Age Height Weight;
cards;
Alfred M 14 69 112.5
Alice F 13 56.5 84
Barbara F 13 65.3 98
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
James M 12 57.3 83
Jane F 12 59.8 84.5
Janet F 15 62.5 112.5
Jeffrey M 13 62.5 84
John M 12 59 99.5
Joyce F 11 51.3 50.5
Judy F 14 64.3 90
Louise F 12 56.3 77
Mary F 15 66.5 112
Philip M 16 72 150
Robert M 12 64.8 128
Ronald M 15 67 133
Thomas M 11 57.5 85
William M 15 66.5 112
run;
proc sql;
create table collapsed_data as
select Sex,
max(Name) as Name,
sum(Age) as msk_tx_yes
from class
group by Sex;
quit;
the result dataset looks like
Sex | Name | msk_tx_yes |
F | Mary | 119 |
M | William | 134 |
with just one line per unique value of Sex.
What happens with you?
Tom
Looks really good.
First, get rid of the "distinct" on mrn. The fact that you're grouping it will result in distinct values.
Next, as long as you're SURE that CITY and SITE are constant, you can just do a min or a max function on them, along with all of your SUM functions.
Tom
but I need the distinct b/c I want to collapse my data on DISTINCT/UNIQUE MRN values while still keeping my other variables that i am not summing for.
Also, I can't sum CITY and SITE variables b/c they are character values.
Is there any way to keep CITY and SITE while still using DISTINCT and collapsing on unique MRNs?
I don't really want to make dummy codes for CITY and SITE if I don't have to.
Try playing around with this code to see if it's going where you want:
proc sql;
create table collapsed_data as
select Subsidiary,
max(Region) as Region,
sum(Stores) as msk_tx_yes,
sum(Sales) as msk_cancel_tx_yes
from sashelp.Shoes
group by Subsidiary;
quit;
1. If you use MAX or MIN on a character variable, not SUM, it will work.
2. Doing GROUP BY MRN will summarize by MRN, and keep only one line per MRN.
Tom
Thanks for replying Tom
I unfortunately don't have the sashelp.shoes database for some reason. but I tried it on my own dataset substituting the values I wanted.
Groupin by MRN DOES NOT seem to keep unique values (i.e., one unique MRN per line). Does this work for you???
Nothing really seems to happen in my dataset when I use the code you provided. I also tried with another dataset but grouping does not create distinct MRNs....
Okay, time for Plan B
It's odd that you don't have access to the SASHELP datasets; you should ask your SAS administrator to set that up. They are very convenient for things like this.
However, we can just create one on the fly, and use it to test your code.
When I run:
data class;
input Name $ Sex $ Age Height Weight;
cards;
Alfred M 14 69 112.5
Alice F 13 56.5 84
Barbara F 13 65.3 98
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
James M 12 57.3 83
Jane F 12 59.8 84.5
Janet F 15 62.5 112.5
Jeffrey M 13 62.5 84
John M 12 59 99.5
Joyce F 11 51.3 50.5
Judy F 14 64.3 90
Louise F 12 56.3 77
Mary F 15 66.5 112
Philip M 16 72 150
Robert M 12 64.8 128
Ronald M 15 67 133
Thomas M 11 57.5 85
William M 15 66.5 112
run;
proc sql;
create table collapsed_data as
select Sex,
max(Name) as Name,
sum(Age) as msk_tx_yes
from class
group by Sex;
quit;
the result dataset looks like
Sex | Name | msk_tx_yes |
F | Mary | 119 |
M | William | 134 |
with just one line per unique value of Sex.
What happens with you?
Tom
Tom THANK YOU SO MUCH! it worked...
I'm not sure what I was doing wrong initially...took a lunch break and when I came back everthing works as you said!
Must have been coding for too long.
thank you for taking the time to create that dataset for me and helping me out. Really apprecaite it.
Glad you're back in business!
Tom
Hello christinagting0,
I'm glad you found some useful information. If one of the replies was the exact solution to your problem, can you please "Accept it as a solution"? Or if one was particularly helpful, feel free to "Like" it. This will help other community members who may run into the same issue know what worked.
Thanks,
Laura
Thanks Laura, just did it!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.