The sample dataset Have has multiple rows for an ID with the same or different session name.
Dataset: Have | ||
ID | SessionName | DaysInSession |
1 | AAA | 63 |
2 | BBB | 40 |
3 | CCC | 21 |
3 | BBB | 37 |
3 | BBB | 91 |
3 | BBB | 71 |
4 | AAA | 25 |
4 | AAA | 23 |
4 | BBB | 20 |
4 | BBB | 10 |
5 | BBB | 5 |
5 | CCC | 80 |
5 | AAA | 61 |
6 | CCC | 58 |
7 | AAA | 159 |
7 | AAA | 3 |
8 | BBB | 26 |
8 | CCC | 101 |
How to create two datasets where,
In dataset Want_1, I need three columns where operation I need to do is to add "DaysInSession" for the SAME "SessionName" for an ID. Operation Needed is shown in the 4th column.
In dataset Want_2, I need two columns where operation I need to do is add " "DaysInSession" for the ANY "SessionName" for an ID. Operation Needed is shown in the 3rd column.
Dataset: Want_1 | |||
ID | SessionName | DaysInSession | Operations Needed |
1 | AAA | 63 | 63 |
2 | BBB | 40 | 40 |
3 | CCC | 21 | 21 |
3 | BBB | 199 | 37+91+71=199 |
4 | AAA | 48 | 25+23=48 |
4 | BBB | 30 | 20+10=30 |
5 | BBB | 5 | 5 |
5 | CCC | 80 | 80 |
5 | AAA | 61 | 61 |
6 | CCC | 58 | 58 |
7 | AAA | 162 | 159+3=162 |
8 | BBB | 26 | 26 |
8 | CCC | 101 | 101 |
Dataset: Want_2 | ||
ID | DaysInSession | Operations Needed |
1 | 63 | 63 |
2 | 40 | 40 |
3 | 220 | 21+37+91+71=220 |
4 | 78 | 25+23+20+10=78 |
5 | 146 | 5+80+61=146 |
6 | 58 | 58 |
7 | 162 | 159+3=162 |
8 | 127 | 26+101=127 |
Thank you for the help.
May I ask why you want to do this? The required column is bound to be a character variable.
And a different approach from @novinosrin
data have; input ID $ SessionName $ DaysInSession ; datalines; 1 AAA 63 2 BBB 40 3 CCC 21 3 BBB 37 3 BBB 91 3 BBB 71 4 AAA 25 4 AAA 23 4 BBB 20 4 BBB 10 5 BBB 5 5 CCC 80 5 AAA 61 6 CCC 58 7 AAA 159 7 AAA 3 8 BBB 26 8 CCC 101 ; run; proc summary data=have; class id SessionName; var DaysInSession; output out=want sum =; run;
Take a close look at the want data set variable _type_ and the results of the sum. You can select combinations of the CLASS variables for the desired sum using the _type_ variable. If you really must then you can separate the data but I likely wouldn't but would use the _type_ variable as a BY for processing and in a WHERE to select which ones I wanted later.
data have;
input ID SessionName $ DaysInSession;
cards;
1 AAA 63
2 BBB 40
3 CCC 21
3 BBB 37
3 BBB 91
3 BBB 71
4 AAA 25
4 AAA 23
4 BBB 20
4 BBB 10
5 BBB 5
5 CCC 80
5 AAA 61
6 CCC 58
7 AAA 159
7 AAA 3
8 BBB 26
8 CCC 101
;
proc sql;
create table want1 as
select ID, SessionName,sum(DaysInSession) as DaysInSession,sum(DaysInSession) as operations_need
from have
group by id,SessionName
order by id,sessionname;
quit;
proc sql;
create table want2 as
select ID, sum(DaysInSession) as DaysInSession,sum(DaysInSession) as operations_need
from have
group by id;
quit;
Hi, I modified your code as following. Basically I selected all variables because I needed other variables in the original dataset which I did not show in the sample dataset.
proc sql;
create table want1 as
select *, sum(DaysInSession) as DaysInSession
from have
group by id,SessionName
order by id,sessionname;
quit;
proc sql;
create table want2 as
select *, sum(DaysInSession) as DaysInSession
from have
group by id;
quit;
As a result, I got the following sample dataset from the first set of code. How to subset this dataset to keep only the rows with the " Keep" for an ID? So keep all rows with different session for an ID and If an ID have multiple rows with same session name then, keep the row with the earliest session begin date for an ID.
For e.g for ID #3,5,7 having multiple rows with same session name, keep the row with the earliest begin date. For ID #6, with different session name, keep both the rows. Any suggestion?
Date | ID | Session | SessionBeginDate | TotalDaysinSession | Keep or Delete |
2018-08 | 1 | MMM | 2018-08-13 | 63 | Keep |
2018-08 | 2 | OOO | 2018-08-22 | 40 | Keep |
2017-11 | 3 | OOO | 2017-11-30 | 199 | Keep |
2018-03 | 3 | OOO | 2018-03-02 | 199 | |
2017-10 | 3 | OOO | 2017-10-23 | 199 | |
2018-03 | 4 | MMM | 2018-03-07 | 41 | Keep |
2018-08 | 5 | MMM | 2018-08-07 | 48 | Keep |
2018-09 | 5 | MMM | 2018-09-08 | 48 | |
2018-07 | 6 | MMM | 2018-07-02 | 61 | Keep |
2017-03 | 6 | OOO | 2017-03-08 | 5 | Keep |
2017-10 | 7 | OOO | 2017-10-16 | 194 | Keep |
2017-12 | 7 | OOO | 2017-12-04 | 194 | |
2017-11 | 8 | OOO | 2017-11-20 | 244 | Keep |
2017-05 | 8 | OOO | 2017-05-12 | 244 |
If you select all vars, proc sql would remerge and produce every record in the output dataset. So this will warrant change in logic/code. Can you post a representative sample of what you have and the required output plz
HI,
I have created another sample dataset have.
In dataset Want_1, I would like to calculate the "Total Days in Class" for each "Class" type for an ID, put that operation output in a row with the earliest "ClassBeginDate" for a "Class" type for an ID and keep those rows.
In dataset Want_2, I would like to calculate the "Total Days in Class" of all " Class" type for an ID, put that operation output in a row with the earliest "ClassBeginDate" for all "Class" type for an ID and keep that row for an ID.
Dataset: Have | ||||||
ID | Class | ClassStartDate | DaysinClass | MonthsInCollege | Variable1 | Variable2 |
1 | AAA | 8/13/2018 | 63 | 20 | ||
2 | BBB | 8/22/2018 | 40 | 1 | ||
3 | CCC | 10/2/2017 | 21 | 0 | ||
3 | BBB | 10/23/2017 | 37 | 0 | ||
3 | BBB | 11/30/2017 | 91 | 1 | ||
3 | BBB | 3/2/2018 | 71 | 4 | ||
4 | BBB | 7/1/2018 | 20 | 6 | ||
4 | BBB | 7/23/2018 | 10 | 7 | ||
4 | AAA | 8/7/2018 | 25 | 8 | ||
4 | AAA | 9/8/2018 | 23 | 9 | ||
5 | BBB | 3/8/2017 | 5 | 1 | ||
5 | CCC | 3/13/2017 | 80 | 1 | ||
5 | AAA | 7/2/2018 | 61 | 8 | ||
6 | CCC | 3/15/2017 | 58 | 0 | ||
7 | AAA | 3/26/2018 | 159 | 72 | ||
7 | AAA | 9/3/2018 | 3 | 78 | ||
8 | BBB | 6/5/2017 | 26 | 1 | ||
8 | CCC | 10/23/2017 | 101 | 1 |
Dataset: Want_1 | |||||||
ID | Class | ClassStartDate | DaysinClass | MonthsInCollege | Variable1 | Variable2 | Operations Needed (TotalDaysInClass) |
1 | AAA | 8/13/2018 | 63 | 20 | 63 | ||
2 | BBB | 8/22/2018 | 40 | 1 | 40 | ||
3 | CCC | 10/2/2017 | 21 | 0 | 21 | ||
3 | BBB | 10/23/2017 | 199 | 0 | 37+91+71=199 | ||
4 | BBB | 7/1/2018 | 30 | 6 | 20+10=30 | ||
4 | AAA | 8/7/2018 | 48 | 8 | 25+23=48 | ||
5 | BBB | 3/8/2017 | 5 | 1 | 5 | ||
5 | CCC | 3/13/2017 | 80 | 1 | 80 | ||
5 | AAA | 7/2/2018 | 61 | 8 | 61 | ||
6 | CCC | 3/15/2017 | 58 | 0 | 58 | ||
7 | AAA | 3/26/2018 | 162 | 72 | 159+3=162 | ||
8 | BBB | 6/5/2017 | 26 | 1 | 26 | ||
8 | CCC | 10/23/2017 | 101 | 1 | 101 |
Dataset: Want_2 | |||||||
ID | Class | ClassStartDate | DaysinClass | MonthsInCollege | Variable1 | Variable2 | Operations Needed (TotalDaysInClass) |
1 | AAA | 8/13/2018 | 63 | 20 | 63 | ||
2 | BBB | 8/22/2018 | 40 | 1 | 40 | ||
3 | CCC | 10/2/2017 | 220 | 0 | 21+37+91+71=220 | ||
4 | BBB | 7/1/2018 | 48 | 6 | 25+23+20+10=78 | ||
5 | BBB | 3/8/2017 | 146 | 1 | 5+80+61=146 | ||
6 | CCC | 3/15/2017 | 58 | 0 | 58 | ||
7 | AAA | 3/26/2018 | 162 | 72 | 159+3=162 | ||
8 | BBB | 6/5/2017 | 127 | 26 | 26+101=127 |
Thank you.
The proc summary approach offered by @ballardw not only does SUMs. It also does other stastistics, including MINs. You can ask it to provide minimal data values for "variables other than" (edited correction) the class variables of interest.
proc summary is a widely used and powerful sas procedure. It's worth earnest study.
data have;
infile cards truncover;
input ID Class $ ClassStartDate :mmddyy10. DaysinClass MonthsInCollege Variable1 Variable2;
format ClassStartDate mmddyy10.;
cards;
1 AAA 8/13/2018 63 20
2 BBB 8/22/2018 40 1
3 CCC 10/2/2017 21 0
3 BBB 10/23/2017 37 0
3 BBB 11/30/2017 91 1
3 BBB 3/2/2018 71 4
4 BBB 7/1/2018 20 6
4 BBB 7/23/2018 10 7
4 AAA 8/7/2018 25 8
4 AAA 9/8/2018 23 9
5 BBB 3/8/2017 5 1
5 CCC 3/13/2017 80 1
5 AAA 7/2/2018 61 8
6 CCC 3/15/2017 58 0
7 AAA 3/26/2018 159 72
7 AAA 9/3/2018 3 78
8 BBB 6/5/2017 26 1
8 CCC 10/23/2017 101 1
;
proc sql;
create table want1 as
select *,sum(DaysinClass) as operations_need
from have
group by id,Class
having DaysinClass=min(DaysinClass)
order by id,Class;
quit;
proc sql;
create table want2 as
select *,sum(DaysinClass) as operations_need
from have
group by id
having DaysinClass=min(DaysinClass)
order by id;
quit;
I still got the same note:
The query requires remerging summary statistics back with the original data.
Dataset Want_2 should be smaller but I got same no. of observation in Want_1 and Want_2.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.