Hello
I have the following table of two columns. The third column is a description of the term. The term code is defined in a way that I cannot wrap my head around it how to select the time of a period that subjects (IDs) were registered as clients. A Year is separated in three terms: Fall, Spring and Summer.
My objective was to select all the rows up to a two year period and then have two indicator columns to show that those subjects were registered for one year or for two years.
Have table:
ID | TERM_Code | Term_Desc | Comments |
1 | 201520 | Spring 2015 | |
1 | 201530 | Summer 2015 | |
1 | 201610 | Fall 2015 | |
1 | 201620 | Spring 2016 | |
1 | 201630 | Summer 2016 | |
1 | 201710 | Fall 2016 | |
1 | 201720 | Spring 2017 | More than two years |
1 | 201730 | Summer 2017 | More than two years |
1 | 201810 | Fall 2019 | |
2 | 201710 | Fall 2016 | |
2 | 201720 | Spring 2016 | *Skipped 201730, Summer 2017 |
2 | 201810 | Fall 2017 | |
2 | 201820 | Spring 2018 | |
2 | 201830 | Summer 2018 | |
3 | 201510 | Fall 2014 | |
3 | 201520 | Spring 2015 |
I am looking to end up in table like this:
ID | TERM_Code | Term_Desc | Yr1 | Yr2 |
1 | 201520 | Spring 2015 | Yes | Yes |
1 | 201530 | Summer 2015 | Yes | Yes |
1 | 201610 | Fall 2015 | Yes | Yes |
1 | 201620 | Spring 2016 | Yes | Yes |
1 | 201630 | Summer 2016 | Yes | Yes |
1 | 201710 | Fall 2016 | Yes | Yes |
2 | 201710 | Fall 2016 | Yes | No |
2 | 201720 | Spring 2016 | Yes | No |
2 | 201810 | Fall 2017 | Yes | No |
2 | 201820 | Spring 2018 | Yes | No |
2 | 201830 | Summer 2018 | Yes | No |
3 | 201510 | Fall 2014 | No | No |
3 | 201520 | Spring 2015 | No | No |
I am not sure how to maniputale "have" table to get to "want" tabel with this weird term for semesters
Might you have any ideas on how to go on this?
It is really uneasy. Hope the following code could get you right answer.
data have;
infile cards dlm=',';
input ID TERM_Code Term_Desc $;
year=int(TERM_Code/100);
cards;
1, 201520, Spring 2015
1, 201530, Summer 2015
1, 201610, Fall 2015
1, 201620, Spring 2016
1, 201630, Summer 2016
1, 201710, Fall 2016
1, 201720, Spring 2017
1, 201730, Summer 2017
1, 201810, Fall 2019
2, 201710, Fall 2016
2, 201720, Spring 2016
2, 201810, Fall 2017
2, 201820, Spring 2018
2, 201830, Summer 2018
3, 201510, Fall 2014
3, 201520, Spring 2015
;;;;
run;
proc summary data=have;
by id;
var year;
output out=year(drop=_:) min=min max=max;
run;
data key;
set year;
do year=min to max;
TERM_Code=100*year+10;output;
TERM_Code=100*year+20;output;
TERM_Code=100*year+30;output;
end;
drop min max year;
run;
data temp;
merge key have(drop=year);
by id TERM_Code;
run;
data temp1;
set temp;
by id;
if first.id then group=0;
if not missing(Term_Desc) then group+1;
else group=0;
run;
proc sql;
create table want as
select *,
case when(max(group)>2) then 'Yes' else 'No ' end as Yr1,
case when(max(group)>5) then 'Yes' else 'No ' end as Yr2
from temp1
group by id;
quit;
What happened to the 2017 records? Why did you drop some of the ID 1 observations? Also, what are you planning to do next. I cannot see how this format could be useful, in this case I could see a wide format being useful, but having a hard time seeing how this format could be useful.
@triunk wrote:
Hello
I have the following table of two columns. The third column is a description of the term. The term code is defined in a way that I cannot wrap my head around it how to select the time of a period that subjects (IDs) were registered as clients. A Year is separated in three terms: Fall, Spring and Summer.
My objective was to select all the rows up to a two year period and then have two indicator columns to show that those subjects were registered for one year or for two years.
Have table:
ID TERM_Code Term_Desc Comments 1 201520 Spring 2015 1 201530 Summer 2015 1 201610 Fall 2015 1 201620 Spring 2016 1 201630 Summer 2016 1 201710 Fall 2016 1 201720 Spring 2017 More than two years 1 201730 Summer 2017 More than two years 1 201810 Fall 2019 2 201710 Fall 2016 2 201720 Spring 2016 *Skipped 201730, Summer 2017 2 201810 Fall 2017 2 201820 Spring 2018 2 201830 Summer 2018 3 201510 Fall 2014 3 201520 Spring 2015
I am looking to end up in table like this:
ID TERM_Code Term_Desc Yr1 Yr2 1 201520 Spring 2015 Yes Yes 1 201530 Summer 2015 Yes Yes 1 201610 Fall 2015 Yes Yes 1 201620 Spring 2016 Yes Yes 1 201630 Summer 2016 Yes Yes 1 201710 Fall 2016 Yes Yes 2 201710 Fall 2016 Yes No 2 201720 Spring 2016 Yes No 2 201810 Fall 2017 Yes No 2 201820 Spring 2018 Yes No 2 201830 Summer 2018 Yes No 3 201510 Fall 2014 No No 3 201520 Spring 2015 No No
I am not sure how to maniputale "have" table to get to "want" tabel with this weird term for semesters
Might you have any ideas on how to go on this?
Hi Reeza
Thank you for your questions
I am trying to select subjects for a period of two years which is 6 terms
For instance for subject one, those terms are
ID | TERM_Code | Term_Desc | Consecutive Terms |
1 | 201520 | Spring 2015 | 1 |
1 | 201530 | Summer 2015 | 2 |
1 | 201610 | Fall 2015 | 3 |
1 | 201620 | Spring 2016 | 4 |
1 | 201630 | Summer 2016 | 5 |
1 | 201710 | Fall 2016 | 6 |
The other two twrms 201720, Spring 2017 and TErm 201730, Summer 2017 are not included because there are
not included in those criteria
The other two subjects ID=2 and ID=3 have not completed the cycle of their two years.
The updated HAVE table is here:
ID | TERM_Code | Term_Desc | ||||
1 | 201520 | Spring 2015 | ||||
1 | 201530 | Summer 2015 | ||||
1 | 201610 | Fall 2015 | ||||
1 | 201620 | Spring 2016 | ||||
1 | 201630 | Summer 2016 | ||||
1 | 201710 | Fall 2016 | ||||
1 | 201720 | Spring 2017 | ||||
1 | 201730 | Summer 2017 | ||||
1 | 201810 | Fall 2019 | ||||
2 | 201710 | Fall 2016 | ||||
2 | 201720 | Spring 2017 | *Skipped 201730, Summer 2017 | |||
2 | 201810 | Fall 2017 | ||||
2 | 201820 | Spring 2018 | ||||
2 | 201830 | Summer 2018 | ||||
3 | 201510 | Fall 2014 | ||||
3 | 201520 | Spring 2015 |
The WANT table is here
ID | TERM_Code | Term_Desc | Yr1 | Yr2 |
1 | 201520 | Spring 2015 | Yes | Yes |
1 | 201530 | Summer 2015 | Yes | Yes |
1 | 201610 | Fall 2015 | Yes | Yes |
1 | 201620 | Spring 2016 | Yes | Yes |
1 | 201630 | Summer 2016 | Yes | Yes |
1 | 201710 | Fall 2016 | Yes | Yes |
2 | 201710 | Fall 2016 | Yes | No |
2 | 201720 | Spring 2017 | Yes | No |
2 | 201810 | Fall 2017 | Yes | No |
2 | 201820 | Spring 2018 | Yes | No |
2 | 201830 | Summer 2018 | Yes | No |
3 | 201510 | Fall 2014 | No | No |
3 | 201520 | Spring 2015 | No | No |
Might this makes a little bit more sense?
I don't think this answers your question, but may help you get closer.
The counter tells you the consecutive terms. I assumed that I was correct in assigning 20 to spring, 30 to summer and 10 to fall.
data have;
infile cards dlm=',';
informat term_desc $20.;
input ID TERM_Code Term_Desc $;
cards;
1, 201520, Spring 2015
1, 201530, Summer 2015
1, 201610, Fall 2015
1, 201620, Spring 2016
1, 201630, Summer 2016
1, 201710, Fall 2016
1, 201720, Spring 2017
1, 201730, Summer 2017
1, 201810, Fall 2019
2, 201710, Fall 2016
2, 201720, Spring 2016
2, 201810, Fall 2017
2, 201820, Spring 2018
2, 201830, Summer 2018
3, 201510, Fall 2014
3, 201520, Spring 2015
;;;;
run;
proc format;
value semester
10 = 'Fall'
20 = 'Spring'
30 = 'Summer';
run;
data long;
set have;
Year = input(substrn(term_code, 1, 4), 8.);
Term = input(substrn(term_code, 5, 2), 8.);
*if term=10 then term=40;*recode to sort correctly;
run;
proc sort data=long;
by ID Year Term;
run;
data want;
set long;
by ID year term;
lag_term = lag(term);
if first.id then do;
counter=1;
lag_term = .;
end;
else counter+1;
*check previous terms for consecutive;
if (term=20 and lag_term ne 10) or
(term=30 and lag_term ne 20) or
(term=10 and lag_term ne 30)
then counter = 1;
run;
Yes that is a good starting point Reeza thank you!
My struggle was really the selction part not so much the transformation but this defintely goes to the right direction
I really appreciate your prompt reply on this you are A SUPER SAS person indeed
It is really uneasy. Hope the following code could get you right answer.
data have;
infile cards dlm=',';
input ID TERM_Code Term_Desc $;
year=int(TERM_Code/100);
cards;
1, 201520, Spring 2015
1, 201530, Summer 2015
1, 201610, Fall 2015
1, 201620, Spring 2016
1, 201630, Summer 2016
1, 201710, Fall 2016
1, 201720, Spring 2017
1, 201730, Summer 2017
1, 201810, Fall 2019
2, 201710, Fall 2016
2, 201720, Spring 2016
2, 201810, Fall 2017
2, 201820, Spring 2018
2, 201830, Summer 2018
3, 201510, Fall 2014
3, 201520, Spring 2015
;;;;
run;
proc summary data=have;
by id;
var year;
output out=year(drop=_:) min=min max=max;
run;
data key;
set year;
do year=min to max;
TERM_Code=100*year+10;output;
TERM_Code=100*year+20;output;
TERM_Code=100*year+30;output;
end;
drop min max year;
run;
data temp;
merge key have(drop=year);
by id TERM_Code;
run;
data temp1;
set temp;
by id;
if first.id then group=0;
if not missing(Term_Desc) then group+1;
else group=0;
run;
proc sql;
create table want as
select *,
case when(max(group)>2) then 'Yes' else 'No ' end as Yr1,
case when(max(group)>5) then 'Yes' else 'No ' end as Yr2
from temp1
group by id;
quit;
Wow indeed but this is brilliant!
Thank you
It is really uneasy. Hope the following code could get you right answer.
data have;
infile cards dlm=',';
input ID TERM_Code Term_Desc $;
year=int(TERM_Code/100);
cards;
1, 201520, Spring 2015
1, 201530, Summer 2015
1, 201610, Fall 2015
1, 201620, Spring 2016
1, 201630, Summer 2016
1, 201710, Fall 2016
1, 201720, Spring 2017
1, 201730, Summer 2017
1, 201810, Fall 2019
2, 201710, Fall 2016
2, 201720, Spring 2016
2, 201810, Fall 2017
2, 201820, Spring 2018
2, 201830, Summer 2018
3, 201510, Fall 2014
3, 201520, Spring 2015
;;;;
run;
proc summary data=have;
by id;
var year;
output out=year(drop=_:) min=min max=max;
run;
data key;
set year;
do year=min to max;
TERM_Code=100*year+10;output;
TERM_Code=100*year+20;output;
TERM_Code=100*year+30;output;
end;
drop min max year;
run;
data temp;
merge key have(drop=year);
by id TERM_Code;
run;
data temp1;
set temp;
by id;
if first.id then group=0;
if not missing(Term_Desc) then group+1;
else group=0;
run;
proc sql;
create table want as
select *,
case when(max(group)>2) then 'Yes' else 'No ' end as Yr1,
case when(max(group)>5) then 'Yes' else 'No ' end as Yr2
from temp1
group by id;
quit;
It is really uneasy. Hope the following code could get you right answer.
data have;
infile cards dlm=',';
input ID TERM_Code Term_Desc $;
year=int(TERM_Code/100);
cards;
1, 201520, Spring 2015
1, 201530, Summer 2015
1, 201610, Fall 2015
1, 201620, Spring 2016
1, 201630, Summer 2016
1, 201710, Fall 2016
1, 201720, Spring 2017
1, 201730, Summer 2017
1, 201810, Fall 2019
2, 201710, Fall 2016
2, 201720, Spring 2016
2, 201810, Fall 2017
2, 201820, Spring 2018
2, 201830, Summer 2018
3, 201510, Fall 2014
3, 201520, Spring 2015
;;;;
run;
proc summary data=have;
by id;
var year;
output out=year(drop=_:) min=min max=max;
run;
data key;
set year;
do year=min to max;
TERM_Code=100*year+10;output;
TERM_Code=100*year+20;output;
TERM_Code=100*year+30;output;
end;
drop min max year;
run;
data temp;
merge key have(drop=year);
by id TERM_Code;
run;
data temp1;
set temp;
by id;
if first.id then group=0;
if not missing(Term_Desc) then group+1;
else group=0;
run;
proc sql;
create table want as
select *,
case when(max(group)>2) then 'Yes' else 'No ' end as Yr1,
case when(max(group)>5) then 'Yes' else 'No ' end as Yr2
from temp1
group by id;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.