BookmarkSubscribeRSS Feed
d0816
Quartz | Level 8

The sample dataset Have has multiple rows for an ID with the same or different session name.

Dataset: Have 
IDSessionNameDaysInSession
1AAA63
2BBB40
3CCC21
3BBB37
3BBB91
3BBB71
4AAA25
4AAA23
4BBB20
4BBB10
5BBB5
5CCC80
5AAA61
6CCC58
7AAA159
7AAA3
8BBB26
8CCC101

 

How to create two datasets where,

In dataset Want_1, I need three columns where operation I need to do is to add "DaysInSession" for the SAME "SessionName" for an ID. Operation Needed is shown in the 4th column.

 

In dataset Want_2, I need two columns where operation I need to do is add " "DaysInSession" for the ANY "SessionName" for an ID. Operation Needed is shown in the 3rd column.

Dataset: Want_1  
IDSessionNameDaysInSessionOperations Needed
1AAA6363
2BBB4040
3CCC2121
3BBB19937+91+71=199
4AAA4825+23=48
4BBB3020+10=30
5BBB55
5CCC8080
5AAA6161
6CCC5858
7AAA162159+3=162
8BBB2626
8CCC101

101

 

Dataset: Want_2 
IDDaysInSessionOperations Needed
16363
24040
322021+37+91+71=220
47825+23+20+10=78
51465+80+61=146
65858
7162159+3=162
812726+101=127

 

Thank you for the help.

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

May I ask why you want to do this? The required column is bound to be a character variable.

d0816
Quartz | Level 8
The "Operation Needed" Column is just to show the operation that needs to happen. I do not need that column in my dataset Want_1 and Want_2.



Thanks.


ballardw
Super User

And a different approach from @novinosrin

data have;
input ID $ SessionName $ DaysInSession ;
datalines;
1 AAA 63 
2 BBB 40 
3 CCC 21 
3 BBB 37 
3 BBB 91 
3 BBB 71 
4 AAA 25 
4 AAA 23 
4 BBB 20 
4 BBB 10 
5 BBB 5 
5 CCC 80 
5 AAA 61 
6 CCC 58 
7 AAA 159 
7 AAA 3 
8 BBB 26 
8 CCC 101 
;
run;

proc summary data=have;
   class id SessionName;
   var DaysInSession;
   output out=want sum =;
run;

Take a close look at the want data set variable _type_ and the results of the sum. You can select combinations of the CLASS variables for the desired sum using the _type_ variable. If you really must then you can separate the data but I likely wouldn't but would use the _type_ variable as a BY for processing and in a WHERE to select which ones I wanted later.

 

novinosrin
Tourmaline | Level 20

data have;
input ID	SessionName $	DaysInSession;
cards;
1	AAA	63
2	BBB	40
3	CCC	21
3	BBB	37
3	BBB	91
3	BBB	71
4	AAA	25
4	AAA	23
4	BBB	20
4	BBB	10
5	BBB	5
5	CCC	80
5	AAA	61
6	CCC	58
7	AAA	159
7	AAA	3
8	BBB	26
8	CCC	101
;

proc sql;
create table want1 as
select ID,	SessionName,sum(DaysInSession) as DaysInSession,sum(DaysInSession) as operations_need
from have
group by id,SessionName
order by id,sessionname;
quit;

proc sql;
create table want2 as
select ID,	sum(DaysInSession) as DaysInSession,sum(DaysInSession) as operations_need
from have
group by id;
quit;


d0816
Quartz | Level 8

Hi, I modified your code as following. Basically I selected all variables because I needed other variables in the original dataset which I did not show in the sample dataset.

proc sql;
create table want1 as
select  *, sum(DaysInSession) as DaysInSession
from have
group by id,SessionName
order by id,sessionname;
quit;

proc sql;
create table want2 as
select *, sum(DaysInSession) as DaysInSession
from have
group by id;
quit;

 

As a result, I got the following sample dataset from the first set of code. How to subset this dataset to keep only the rows with the " Keep" for an ID?  So keep all rows with different session for an ID and  If an ID have multiple rows with same session name then, keep the row with the earliest session begin date for an ID.

 

For e.g for ID #3,5,7 having multiple rows with same session name, keep the row with the earliest begin date. For ID #6, with different session name, keep both the rows. Any suggestion?

DateIDSessionSessionBeginDateTotalDaysinSessionKeep or Delete
2018-081MMM2018-08-1363Keep
2018-082OOO2018-08-2240Keep
2017-113OOO2017-11-30199Keep
2018-033OOO2018-03-02199 
2017-103OOO2017-10-23199 
2018-034MMM2018-03-0741Keep
2018-085MMM2018-08-0748Keep
2018-095MMM2018-09-0848 
2018-076MMM2018-07-0261Keep
2017-036OOO2017-03-085Keep
2017-107OOO2017-10-16194Keep
2017-127OOO2017-12-04194 
2017-118OOO2017-11-20244Keep
2017-058OOO2017-05-12244 
novinosrin
Tourmaline | Level 20

If you select all vars, proc sql would remerge and produce every record in the output dataset. So this will warrant change in logic/code. Can you post a representative sample of what you have and the required output plz

d0816
Quartz | Level 8

HI,

I have created another sample dataset have.  

 

In dataset Want_1, I would like to calculate the "Total Days in Class" for each "Class" type for an ID, put that operation output in a row with the earliest "ClassBeginDate" for a "Class" type for an ID and keep those rows.

 

In dataset Want_2, I would like to calculate the "Total Days in Class" of all " Class" type for an ID, put that operation output in a row with the earliest "ClassBeginDate" for all "Class" type for an ID and keep that row for an ID.

Dataset: Have     
IDClassClassStartDateDaysinClassMonthsInCollegeVariable1Variable2
1AAA8/13/20186320  
2BBB8/22/2018401  
3CCC10/2/2017210  
3BBB10/23/2017370  
3BBB11/30/2017911  
3BBB3/2/2018714  
4BBB7/1/2018206  
4BBB7/23/2018107  
4AAA8/7/2018258  
4AAA9/8/2018239  
5BBB3/8/201751  
5CCC3/13/2017801  
5AAA7/2/2018618  
6CCC3/15/2017580  
7AAA3/26/201815972  
7AAA9/3/2018378  
8BBB6/5/2017261  
8CCC10/23/20171011  

 

Dataset: Want_1       
IDClassClassStartDateDaysinClassMonthsInCollegeVariable1Variable2Operations Needed (TotalDaysInClass)
1AAA8/13/20186320  63
2BBB8/22/2018401  40
3CCC10/2/2017210  21
3BBB10/23/20171990  37+91+71=199
4BBB7/1/2018306  20+10=30
4AAA8/7/2018488  25+23=48
5BBB3/8/201751  5
5CCC3/13/2017801  80
5AAA7/2/2018618  61
6CCC3/15/2017580  58
7AAA3/26/201816272  159+3=162
8BBB6/5/2017261  26
8CCC10/23/20171011  101

 

Dataset: Want_2       
IDClassClassStartDateDaysinClassMonthsInCollegeVariable1Variable2Operations Needed (TotalDaysInClass)
1AAA8/13/20186320  63
2BBB8/22/2018401  40
3CCC10/2/20172200  21+37+91+71=220
4BBB7/1/2018486  25+23+20+10=78
5BBB3/8/20171461  5+80+61=146
6CCC3/15/2017580  58
7AAA3/26/201816272  159+3=162
8BBB6/5/201712726  26+101=127

 

Thank you.

mkeintz
PROC Star

The proc summary approach offered by @ballardw not only does SUMs.  It also does other stastistics, including MINs.  You can ask it to provide minimal data values for "variables other than"  (edited correction)  the class variables of interest.

 

proc summary is a widely used and powerful sas procedure.  It's worth earnest study.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20
data have;
infile cards truncover;
input ID	Class $	ClassStartDate :mmddyy10.	DaysinClass	MonthsInCollege	Variable1	Variable2;
format ClassStartDate mmddyy10.;
cards;
1	AAA	8/13/2018	63	20	 	 
2	BBB	8/22/2018	40	1	 	 
3	CCC	10/2/2017	21	0	 	 
3	BBB	10/23/2017	37	0	 	 
3	BBB	11/30/2017	91	1	 	 
3	BBB	3/2/2018	71	4	 	 
4	BBB	7/1/2018	20	6	 	 
4	BBB	7/23/2018	10	7	 	 
4	AAA	8/7/2018	25	8	 	 
4	AAA	9/8/2018	23	9	 	 
5	BBB	3/8/2017	5	1	 	 
5	CCC	3/13/2017	80	1	 	 
5	AAA	7/2/2018	61	8	 	 
6	CCC	3/15/2017	58	0	 	 
7	AAA	3/26/2018	159	72	 	 
7	AAA	9/3/2018	3	78	 	 
8	BBB	6/5/2017	26	1	 	 
8	CCC	10/23/2017	101	1	 
; 


proc sql;
create table want1 as
select *,sum(DaysinClass) as operations_need
from have
group by id,Class
having DaysinClass=min(DaysinClass)
order by id,Class;
quit;

proc sql;
create table want2 as
select *,sum(DaysinClass) as operations_need
from have
group by id
having DaysinClass=min(DaysinClass)
order by id;
quit;
d0816
Quartz | Level 8

I still got the same note:

The query requires remerging summary statistics back with the original data.

 

Dataset Want_2 should be smaller but I got same no. of observation in Want_1 and Want_2.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 844 views
  • 0 likes
  • 5 in conversation