Re: How to add numbers of a variable conditionally

d0816 · Posted 10-16-2018 12:56 PM

The sample dataset Have has multiple rows for an ID with the same or different session name.

Dataset: Have
ID	SessionName	DaysInSession
1	AAA	63
2	BBB	40
3	CCC	21
3	BBB	37
3	BBB	91
3	BBB	71
4	AAA	25
4	AAA	23
4	BBB	20
4	BBB	10
5	BBB	5
5	CCC	80
5	AAA	61
6	CCC	58
7	AAA	159
7	AAA	3
8	BBB	26
8	CCC	101

How to create two datasets where,

In dataset Want_1, I need three columns where operation I need to do is to add "DaysInSession" for the SAME "SessionName" for an ID. Operation Needed is shown in the 4th column.

In dataset Want_2, I need two columns where operation I need to do is add " "DaysInSession" for the ANY "SessionName" for an ID. Operation Needed is shown in the 3rd column.

Dataset: Want_1
ID	SessionName	DaysInSession	Operations Needed
1	AAA	63	63
2	BBB	40	40
3	CCC	21	21
3	BBB	199	37+91+71=199
4	AAA	48	25+23=48
4	BBB	30	20+10=30
5	BBB	5	5
5	CCC	80	80
5	AAA	61	61
6	CCC	58	58
7	AAA	162	159+3=162
8	BBB	26	26
8	CCC	101	101

Dataset: Want_2
ID	DaysInSession	Operations Needed
1	63	63
2	40	40
3	220	21+37+91+71=220
4	78	25+23+20+10=78
5	146	5+80+61=146
6	58	58
7	162	159+3=162
8	127	26+101=127

Thank you for the help.

PeterClemmensen · Posted 10-16-2018 12:59 PM

May I ask why you want to do this? The required column is bound to be a character variable.

The DATA to DATA Step Macro
Blog: SASnrd

d0816 · Posted 10-16-2018 01:02 PM

The "Operation Needed" Column is just to show the operation that needs to happen. I do not need that column in my dataset Want_1 and Want_2.

Thanks.

ballardw · Posted 10-16-2018 03:54 PM

And a different approach from @novinosrin

data have;
input ID $ SessionName $ DaysInSession ;
datalines;
1 AAA 63 
2 BBB 40 
3 CCC 21 
3 BBB 37 
3 BBB 91 
3 BBB 71 
4 AAA 25 
4 AAA 23 
4 BBB 20 
4 BBB 10 
5 BBB 5 
5 CCC 80 
5 AAA 61 
6 CCC 58 
7 AAA 159 
7 AAA 3 
8 BBB 26 
8 CCC 101 
;
run;

proc summary data=have;
   class id SessionName;
   var DaysInSession;
   output out=want sum =;
run;

Take a close look at the want data set variable _type_ and the results of the sum. You can select combinations of the CLASS variables for the desired sum using the _type_ variable. If you really must then you can separate the data but I likely wouldn't but would use the _type_ variable as a BY for processing and in a WHERE to select which ones I wanted later.

novinosrin · Posted 10-16-2018 01:03 PM


data have;
input ID	SessionName $	DaysInSession;
cards;
1	AAA	63
2	BBB	40
3	CCC	21
3	BBB	37
3	BBB	91
3	BBB	71
4	AAA	25
4	AAA	23
4	BBB	20
4	BBB	10
5	BBB	5
5	CCC	80
5	AAA	61
6	CCC	58
7	AAA	159
7	AAA	3
8	BBB	26
8	CCC	101
;

proc sql;
create table want1 as
select ID,	SessionName,sum(DaysInSession) as DaysInSession,sum(DaysInSession) as operations_need
from have
group by id,SessionName
order by id,sessionname;
quit;

proc sql;
create table want2 as
select ID,	sum(DaysInSession) as DaysInSession,sum(DaysInSession) as operations_need
from have
group by id;
quit;

d0816 · Posted 10-16-2018 04:42 PM

Hi, I modified your code as following. Basically I selected all variables because I needed other variables in the original dataset which I did not show in the sample dataset.

proc sql;
create table want1 as
select *, sum(DaysInSession) as DaysInSession
from have
group by id,SessionName
order by id,sessionname;
quit;

proc sql;
create table want2 as
select *, sum(DaysInSession) as DaysInSession
from have
group by id;
quit;

As a result, I got the following sample dataset from the first set of code. How to subset this dataset to keep only the rows with the " Keep" for an ID? So keep all rows with different session for an ID and If an ID have multiple rows with same session name then, keep the row with the earliest session begin date for an ID.

For e.g for ID #3,5,7 having multiple rows with same session name, keep the row with the earliest begin date. For ID #6, with different session name, keep both the rows. Any suggestion?

Date	ID	Session	SessionBeginDate	TotalDaysinSession	Keep or Delete
2018-08	1	MMM	2018-08-13	63	Keep
2018-08	2	OOO	2018-08-22	40	Keep
2017-11	3	OOO	2017-11-30	199	Keep
2018-03	3	OOO	2018-03-02	199
2017-10	3	OOO	2017-10-23	199
2018-03	4	MMM	2018-03-07	41	Keep
2018-08	5	MMM	2018-08-07	48	Keep
2018-09	5	MMM	2018-09-08	48
2018-07	6	MMM	2018-07-02	61	Keep
2017-03	6	OOO	2017-03-08	5	Keep
2017-10	7	OOO	2017-10-16	194	Keep
2017-12	7	OOO	2017-12-04	194
2017-11	8	OOO	2017-11-20	244	Keep
2017-05	8	OOO	2017-05-12	244

novinosrin · Posted 10-16-2018 04:44 PM

If you select all vars, proc sql would remerge and produce every record in the output dataset. So this will warrant change in logic/code. Can you post a representative sample of what you have and the required output plz

d0816 · Posted 10-17-2018 09:55 AM

HI,

I have created another sample dataset have.

In dataset Want_1, I would like to calculate the "Total Days in Class" for each "Class" type for an ID, put that operation output in a row with the earliest "ClassBeginDate" for a "Class" type for an ID and keep those rows.

In dataset Want_2, I would like to calculate the "Total Days in Class" of all " Class" type for an ID, put that operation output in a row with the earliest "ClassBeginDate" for all "Class" type for an ID and keep that row for an ID.

Dataset: Have
ID	Class	ClassStartDate	DaysinClass	MonthsInCollege	Variable1	Variable2
1	AAA	8/13/2018	63	20
2	BBB	8/22/2018	40	1
3	CCC	10/2/2017	21	0
3	BBB	10/23/2017	37	0
3	BBB	11/30/2017	91	1
3	BBB	3/2/2018	71	4
4	BBB	7/1/2018	20	6
4	BBB	7/23/2018	10	7
4	AAA	8/7/2018	25	8
4	AAA	9/8/2018	23	9
5	BBB	3/8/2017	5	1
5	CCC	3/13/2017	80	1
5	AAA	7/2/2018	61	8
6	CCC	3/15/2017	58	0
7	AAA	3/26/2018	159	72
7	AAA	9/3/2018	3	78
8	BBB	6/5/2017	26	1
8	CCC	10/23/2017	101	1

Dataset: Want_1
ID	Class	ClassStartDate	DaysinClass	MonthsInCollege	Variable1	Variable2	Operations Needed (TotalDaysInClass)
1	AAA	8/13/2018	63	20			63
2	BBB	8/22/2018	40	1			40
3	CCC	10/2/2017	21	0			21
3	BBB	10/23/2017	199	0			37+91+71=199
4	BBB	7/1/2018	30	6			20+10=30
4	AAA	8/7/2018	48	8			25+23=48
5	BBB	3/8/2017	5	1			5
5	CCC	3/13/2017	80	1			80
5	AAA	7/2/2018	61	8			61
6	CCC	3/15/2017	58	0			58
7	AAA	3/26/2018	162	72			159+3=162
8	BBB	6/5/2017	26	1			26
8	CCC	10/23/2017	101	1			101

Dataset: Want_2
ID	Class	ClassStartDate	DaysinClass	MonthsInCollege	Variable1	Variable2	Operations Needed (TotalDaysInClass)
1	AAA	8/13/2018	63	20			63
2	BBB	8/22/2018	40	1			40
3	CCC	10/2/2017	220	0			21+37+91+71=220
4	BBB	7/1/2018	48	6			25+23+20+10=78
5	BBB	3/8/2017	146	1			5+80+61=146
6	CCC	3/15/2017	58	0			58
7	AAA	3/26/2018	162	72			159+3=162
8	BBB	6/5/2017	127	26			26+101=127

Thank you.

mkeintz · Posted 10-17-2018 10:43 AM

The proc summary approach offered by @ballardw not only does SUMs. It also does other stastistics, including MINs. You can ask it to provide minimal data values for "variables other than" (edited correction) the class variables of interest.

proc summary is a widely used and powerful sas procedure. It's worth earnest study.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 10-17-2018 10:47 AM

data have;
infile cards truncover;
input ID	Class $	ClassStartDate :mmddyy10.	DaysinClass	MonthsInCollege	Variable1	Variable2;
format ClassStartDate mmddyy10.;
cards;
1	AAA	8/13/2018	63	20	 	 
2	BBB	8/22/2018	40	1	 	 
3	CCC	10/2/2017	21	0	 	 
3	BBB	10/23/2017	37	0	 	 
3	BBB	11/30/2017	91	1	 	 
3	BBB	3/2/2018	71	4	 	 
4	BBB	7/1/2018	20	6	 	 
4	BBB	7/23/2018	10	7	 	 
4	AAA	8/7/2018	25	8	 	 
4	AAA	9/8/2018	23	9	 	 
5	BBB	3/8/2017	5	1	 	 
5	CCC	3/13/2017	80	1	 	 
5	AAA	7/2/2018	61	8	 	 
6	CCC	3/15/2017	58	0	 	 
7	AAA	3/26/2018	159	72	 	 
7	AAA	9/3/2018	3	78	 	 
8	BBB	6/5/2017	26	1	 	 
8	CCC	10/23/2017	101	1	 
; 


proc sql;
create table want1 as
select *,sum(DaysinClass) as operations_need
from have
group by id,Class
having DaysinClass=min(DaysinClass)
order by id,Class;
quit;

proc sql;
create table want2 as
select *,sum(DaysinClass) as operations_need
from have
group by id
having DaysinClass=min(DaysinClass)
order by id;
quit;

d0816 · Posted 10-17-2018 11:47 AM

I still got the same note:

The query requires remerging summary statistics back with the original data.

Dataset Want_2 should be smaller but I got same no. of observation in Want_1 and Want_2.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away