SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Collapsing dataset while keeping other variables

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 88
Accepted Solution

Collapsing dataset while keeping other variables

I am trying to collapse my data using proc sql. However, i noticed that when I tried to collapse my data I lost a bunch of variables that I wanted to keep. I am trying to collapse my data based on the variable MRN. The other variables I want to keep are CITY and SITE and these are constant for each unique MRN, so collapsing them should be fine.

 

Here is the code I am using

proc sql;
    create table collapsed_data  as 
		select distinct mrn,
			sum(msk_tx_yes) as msk_tx_yes,
			sum(msk_cancel_tx_yes) as msk_cancel_tx_yes,
        	sum(msk_ca_yes) as msk_ca_yes, 
			sum(msk_cancel_ca_yes) as msk_cancel_ca_yes, 
			sum(msk_dc_yes) as msk_dc_yes,
			sum(conc_psych_tx_yes) as conc_psych_tx_yes,
			sum(conc_psych_ca_yes) as conc_psych_ca_yes,
			sum (conc_psych_dc_yes) as conc_psych_dc_yes,
			sum (conc_yes) as conc_yes,
			sum (psych_yes) as psych_yes,
			sum (foot_prog) as foot_prog,
			sum (hand_prog) as hand_prog,
			sum (surg_prog) as surg_prog,
			sum (sx_yes) as sx_yes
        from temp_collapsed_data
        group by mrn;
quit;

I'm not sure how to use the SELECT and DISTINCT functions together. 

 

I thought maybe I could add the variables CITY and STATE after SELECT, while keeping DISTINCT but it doens't sem to work.

 

Anyone know how I can achieve this?

 

 

 

 

 

 

 

 

 

 

 

 


Accepted Solutions
Solution
‎07-05-2016 01:57 PM
PROC Star
Posts: 1,167

Re: Collapsing dataset while keeping other variables

Posted in reply to christinagting0

Okay, time for Plan B

 

It's odd that you don't have access to the SASHELP datasets; you should ask your SAS administrator to set that up. They are very convenient for things like this.

 

However, we can just create one on the fly, and use it to test your code.

 

When I run:

 

data class;

input Name $ Sex $ Age Height Weight;

cards;

Alfred M 14 69 112.5

Alice F 13 56.5 84

Barbara F 13 65.3 98

Carol F 14 62.8 102.5

Henry M 14 63.5 102.5

James M 12 57.3 83

Jane F 12 59.8 84.5

Janet F 15 62.5 112.5

Jeffrey M 13 62.5 84

John M 12 59 99.5

Joyce F 11 51.3 50.5

Judy F 14 64.3 90

Louise F 12 56.3 77

Mary F 15 66.5 112

Philip M 16 72 150

Robert M 12 64.8 128

Ronald M 15 67 133

Thomas M 11 57.5 85

William M 15 66.5 112

run;

proc sql;

create table collapsed_data as

select Sex,

max(Name) as Name,

sum(Age) as msk_tx_yes

from class

group by Sex;

quit;

 

the result dataset looks like

 

Sex Name msk_tx_yes
F Mary 119
M William 134

 

with just one line per unique value of Sex.

 

What happens with you?

 

Tom

View solution in original post


All Replies
PROC Star
Posts: 1,167

Re: Collapsing dataset while keeping other variables

Posted in reply to christinagting0

Looks really good.

 

First, get rid of the "distinct" on mrn. The fact that you're grouping it will result in distinct values.

 

Next, as long as you're SURE that CITY and SITE are constant, you can just do a min or a max function on them, along with all of your SUM functions.

 

Tom

Frequent Contributor
Posts: 88

Re: Collapsing dataset while keeping other variables

[ Edited ]

but I need the distinct b/c I want to collapse my data on DISTINCT/UNIQUE MRN values while still keeping my other variables that i am not summing for.

 

Also, I can't sum CITY and SITE variables b/c they are character values.

 

Is there any way to keep CITY and SITE while still using DISTINCT and collapsing on unique MRNs?

 

I don't really want to make dummy codes for CITY and SITE if I don't have to.

 

 

 

PROC Star
Posts: 1,167

Re: Collapsing dataset while keeping other variables

Posted in reply to christinagting0

Try playing around with this code to see if it's going where you want:

 

proc sql;

create table collapsed_data as

select Subsidiary,

max(Region) as Region,

sum(Stores) as msk_tx_yes,

sum(Sales) as msk_cancel_tx_yes

from sashelp.Shoes

group by Subsidiary;

quit;

 

1. If you use MAX or MIN on a character variable, not SUM, it will work.

2. Doing GROUP BY MRN will summarize by MRN, and keep only one line per MRN.

 

Tom

 

 

 

Frequent Contributor
Posts: 88

Re: Collapsing dataset while keeping other variables

Thanks for replying Tom

 

I unfortunately don't have the sashelp.shoes database for some reason. but I tried it on my own dataset substituting the values I wanted. 

 

Groupin by MRN DOES NOT seem to keep unique values (i.e., one unique MRN per line). Does this work for you???

 

Nothing really seems to happen in my dataset when I use the code you provided. I also tried with another dataset but grouping does not create distinct MRNs....

Solution
‎07-05-2016 01:57 PM
PROC Star
Posts: 1,167

Re: Collapsing dataset while keeping other variables

Posted in reply to christinagting0

Okay, time for Plan B

 

It's odd that you don't have access to the SASHELP datasets; you should ask your SAS administrator to set that up. They are very convenient for things like this.

 

However, we can just create one on the fly, and use it to test your code.

 

When I run:

 

data class;

input Name $ Sex $ Age Height Weight;

cards;

Alfred M 14 69 112.5

Alice F 13 56.5 84

Barbara F 13 65.3 98

Carol F 14 62.8 102.5

Henry M 14 63.5 102.5

James M 12 57.3 83

Jane F 12 59.8 84.5

Janet F 15 62.5 112.5

Jeffrey M 13 62.5 84

John M 12 59 99.5

Joyce F 11 51.3 50.5

Judy F 14 64.3 90

Louise F 12 56.3 77

Mary F 15 66.5 112

Philip M 16 72 150

Robert M 12 64.8 128

Ronald M 15 67 133

Thomas M 11 57.5 85

William M 15 66.5 112

run;

proc sql;

create table collapsed_data as

select Sex,

max(Name) as Name,

sum(Age) as msk_tx_yes

from class

group by Sex;

quit;

 

the result dataset looks like

 

Sex Name msk_tx_yes
F Mary 119
M William 134

 

with just one line per unique value of Sex.

 

What happens with you?

 

Tom

Frequent Contributor
Posts: 88

Re: Collapsing dataset while keeping other variables

Tom THANK YOU SO MUCH! it worked...

 

I'm not sure what I was doing wrong initially...took a lunch break and when I came back everthing works as you said!

 

Must have been coding for too long.

 

thank you for taking the time to create that dataset for me and helping me out. Really apprecaite it.

PROC Star
Posts: 1,167

Re: Collapsing dataset while keeping other variables

Posted in reply to christinagting0

Glad you're back in business!

 

Tom

Community Manager
Posts: 23

Re: Collapsing dataset while keeping other variables

Posted in reply to christinagting0

Hello christinagting0,

 

I'm glad you found some useful information. If one of the replies was the exact solution to your problem, can you please "Accept it as a solution"? Or if one was particularly helpful, feel free to "Like" it. This will help other community members who may run into the same issue know what worked.

 

Thanks,

Laura

Frequent Contributor
Posts: 88

Re: Collapsing dataset while keeping other variables

Posted in reply to lauralawton

Thanks Laura, just did it!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 708 views
  • 7 likes
  • 3 in conversation