BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
triunk
Obsidian | Level 7

Hello 

 

I have the following table of two columns. The third column is a description of the term. The term code is defined in a way that I cannot wrap my head around it how to select the time of a period that subjects (IDs) were registered as clients. A Year is separated in three terms: Fall, Spring and Summer.

 

My objective was to select all the rows up to a two year period and then have two indicator columns to show that those subjects were registered for one year or for two years.

 

Have table:

 

ID TERM_Code Term_Desc  Comments
1 201520 Spring 2015  
1 201530 Summer 2015  
1 201610 Fall 2015  
1 201620 Spring 2016  
1 201630 Summer 2016  
1 201710 Fall 2016  
1 201720 Spring 2017 More than two years 
1 201730 Summer 2017 More than two years 
1 201810 Fall 2019  
2 201710 Fall 2016  
2 201720 Spring 2016 *Skipped 201730, Summer 2017
2 201810 Fall 2017  
2 201820 Spring 2018  
2 201830 Summer 2018  
3 201510 Fall 2014  
3 201520 Spring 2015  

 

I am looking to end up in table like this:

 

ID TERM_Code Term_Desc Yr1 Yr2
1 201520 Spring 2015 Yes Yes
1 201530 Summer 2015 Yes Yes
1 201610 Fall 2015 Yes Yes
1 201620 Spring 2016 Yes Yes
1 201630 Summer 2016 Yes Yes
1 201710 Fall 2016 Yes Yes
2 201710 Fall 2016 Yes No
2 201720 Spring 2016 Yes No
2 201810 Fall 2017 Yes No
2 201820 Spring 2018 Yes No
2 201830 Summer 2018 Yes No
3 201510 Fall 2014 No No
3 201520 Spring 2015 No No

 

I am not sure how to maniputale "have" table to get to "want" tabel with this weird term for semesters

 

Might you have any ideas on how to go on this?

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

It is really uneasy. Hope the following code could get you right answer.

 

data have;
infile cards dlm=',';
input ID  TERM_Code   Term_Desc  $;
year=int(TERM_Code/100);
cards;
1,   201520,  Spring 2015  
1,   201530,  Summer 2015  
1,   201610,  Fall 2015    
1,   201620,  Spring 2016  
1,   201630,  Summer 2016  
1,   201710,  Fall 2016    
1,   201720,  Spring 2017
1,   201730,  Summer 2017
1,   201810,  Fall 2019    
2,   201710,  Fall 2016    
2,   201720,  Spring 2016
2,   201810,  Fall 2017    
2,   201820,  Spring 2018  
2,   201830,  Summer 2018  
3,   201510,  Fall 2014    
3,   201520,  Spring 2015
;;;;
run;
proc summary data=have;
by id;
var year;
output out=year(drop=_:) min=min max=max;
run;
data key;
set year;
 do year=min to max;
    TERM_Code=100*year+10;output;
    TERM_Code=100*year+20;output;
    TERM_Code=100*year+30;output;
 end;
drop min max year;
run;
data temp;
 merge key have(drop=year);
 by id TERM_Code;
run;
data temp1;
 set temp;
 by id;
 if first.id then group=0;
 if not missing(Term_Desc) then group+1;
  else group=0;
run;
proc sql;
create table want as
 select *,
  case when(max(group)>2) then 'Yes' else 'No ' end as Yr1,
  case when(max(group)>5) then 'Yes' else 'No ' end as Yr2
  from temp1
   group by id;
quit;

View solution in original post

9 REPLIES 9
Reeza
Super User

What happened to the 2017 records? Why did you drop some of the ID 1 observations? Also, what are you planning to do next. I cannot see how this format could be useful, in this case I could see a wide format being useful, but having a hard time seeing how this format could be useful. 

 


@triunk wrote:

Hello 

 

I have the following table of two columns. The third column is a description of the term. The term code is defined in a way that I cannot wrap my head around it how to select the time of a period that subjects (IDs) were registered as clients. A Year is separated in three terms: Fall, Spring and Summer.

 

My objective was to select all the rows up to a two year period and then have two indicator columns to show that those subjects were registered for one year or for two years.

 

Have table:

 

ID TERM_Code Term_Desc  Comments
1 201520 Spring 2015  
1 201530 Summer 2015  
1 201610 Fall 2015  
1 201620 Spring 2016  
1 201630 Summer 2016  
1 201710 Fall 2016  
1 201720 Spring 2017 More than two years 
1 201730 Summer 2017 More than two years 
1 201810 Fall 2019  
2 201710 Fall 2016  
2 201720 Spring 2016 *Skipped 201730, Summer 2017
2 201810 Fall 2017  
2 201820 Spring 2018  
2 201830 Summer 2018  
3 201510 Fall 2014  
3 201520 Spring 2015  

 

I am looking to end up in table like this:

 

ID TERM_Code Term_Desc Yr1 Yr2
1 201520 Spring 2015 Yes Yes
1 201530 Summer 2015 Yes Yes
1 201610 Fall 2015 Yes Yes
1 201620 Spring 2016 Yes Yes
1 201630 Summer 2016 Yes Yes
1 201710 Fall 2016 Yes Yes
2 201710 Fall 2016 Yes No
2 201720 Spring 2016 Yes No
2 201810 Fall 2017 Yes No
2 201820 Spring 2018 Yes No
2 201830 Summer 2018 Yes No
3 201510 Fall 2014 No No
3 201520 Spring 2015 No No

 

I am not sure how to maniputale "have" table to get to "want" tabel with this weird term for semesters

 

Might you have any ideas on how to go on this?


 

triunk
Obsidian | Level 7

Hi Reeza

 

Thank you for your questions

 

I am trying to select subjects for a period of two years which is 6 terms

For instance for subject one, those terms are

 

ID TERM_Code Term_Desc Consecutive Terms
1 201520 Spring 2015 1
1 201530 Summer 2015 2
1 201610 Fall 2015 3
1 201620 Spring 2016 4
1 201630 Summer 2016 5
1 201710 Fall 2016 6

 

 

The other two twrms 201720, Spring 2017 and TErm 201730, Summer 2017 are not included because there are

not included in those criteria

 

The other two subjects ID=2 and ID=3 have not completed the cycle of their two years.

 

The updated HAVE table is here:

ID TERM_Code Term_Desc        
1 201520 Spring 2015        
1 201530 Summer 2015        
1 201610 Fall 2015        
1 201620 Spring 2016        
1 201630 Summer 2016        
1 201710 Fall 2016        
1 201720 Spring 2017        
1 201730 Summer 2017        
1 201810 Fall 2019        
2 201710 Fall 2016        
2 201720 Spring 2017 *Skipped 201730, Summer 2017
2 201810 Fall 2017        
2 201820 Spring 2018        
2 201830 Summer 2018        
3 201510 Fall 2014        
3 201520 Spring 2015        

 

The WANT table is here

ID TERM_Code Term_Desc Yr1 Yr2
1 201520 Spring 2015 Yes Yes
1 201530 Summer 2015 Yes Yes
1 201610 Fall 2015 Yes Yes
1 201620 Spring 2016 Yes Yes
1 201630 Summer 2016 Yes Yes
1 201710 Fall 2016 Yes Yes
2 201710 Fall 2016 Yes No
2 201720 Spring 2017 Yes No
2 201810 Fall 2017 Yes No
2 201820 Spring 2018 Yes No
2 201830 Summer 2018 Yes No
3 201510 Fall 2014 No No
3 201520 Spring 2015 No No

 

Might this makes a little bit more sense?

Reeza
Super User
Where did the consecutive part come into play? As long as it's how many terms consecutively? ID2 only has 5 which is not two full years either, it should 6 shouldn't it?
Reeza
Super User

I don't think this answers your question, but may help you get closer. 

The counter tells you the consecutive terms. I assumed that I was correct in assigning 20 to spring, 30 to summer and 10 to fall. 

 

data have;
infile cards dlm=',';
informat term_desc $20.;
input ID  TERM_Code   Term_Desc  $;
cards;
1,   201520,  Spring 2015  
1,   201530,  Summer 2015  
1,   201610,  Fall 2015    
1,   201620,  Spring 2016  
1,   201630,  Summer 2016  
1,   201710,  Fall 2016    
1,   201720,  Spring 2017
1,   201730,  Summer 2017
1,   201810,  Fall 2019    
2,   201710,  Fall 2016    
2,   201720,  Spring 2016
2,   201810,  Fall 2017    
2,   201820,  Spring 2018  
2,   201830,  Summer 2018  
3,   201510,  Fall 2014    
3,   201520,  Spring 2015
;;;;
run;

proc format;
value semester 
10 = 'Fall'
20 = 'Spring'
30 = 'Summer';
run;


data long;
set have;
Year = input(substrn(term_code, 1, 4), 8.);
Term = input(substrn(term_code, 5, 2), 8.);

*if term=10 then term=40;*recode to sort correctly;
run;

proc sort data=long;
by ID Year Term;
run;

data want;
set long;
by ID year term;


lag_term = lag(term);

if first.id then do;
    counter=1;
    lag_term = .;
end;
else counter+1;

*check previous terms for consecutive;
if (term=20 and lag_term ne 10) or 
     (term=30 and lag_term ne 20) or
     (term=10 and lag_term ne 30) 
then counter = 1;


run;
triunk
Obsidian | Level 7

Yes that is a good starting point Reeza thank you!

 

My struggle was really the selction part not so much the transformation but this defintely goes to the right direction

 

I really appreciate your prompt reply on this you are A SUPER SAS person indeed

Ksharp
Super User

It is really uneasy. Hope the following code could get you right answer.

 

data have;
infile cards dlm=',';
input ID  TERM_Code   Term_Desc  $;
year=int(TERM_Code/100);
cards;
1,   201520,  Spring 2015  
1,   201530,  Summer 2015  
1,   201610,  Fall 2015    
1,   201620,  Spring 2016  
1,   201630,  Summer 2016  
1,   201710,  Fall 2016    
1,   201720,  Spring 2017
1,   201730,  Summer 2017
1,   201810,  Fall 2019    
2,   201710,  Fall 2016    
2,   201720,  Spring 2016
2,   201810,  Fall 2017    
2,   201820,  Spring 2018  
2,   201830,  Summer 2018  
3,   201510,  Fall 2014    
3,   201520,  Spring 2015
;;;;
run;
proc summary data=have;
by id;
var year;
output out=year(drop=_:) min=min max=max;
run;
data key;
set year;
 do year=min to max;
    TERM_Code=100*year+10;output;
    TERM_Code=100*year+20;output;
    TERM_Code=100*year+30;output;
 end;
drop min max year;
run;
data temp;
 merge key have(drop=year);
 by id TERM_Code;
run;
data temp1;
 set temp;
 by id;
 if first.id then group=0;
 if not missing(Term_Desc) then group+1;
  else group=0;
run;
proc sql;
create table want as
 select *,
  case when(max(group)>2) then 'Yes' else 'No ' end as Yr1,
  case when(max(group)>5) then 'Yes' else 'No ' end as Yr2
  from temp1
   group by id;
quit;
triunk
Obsidian | Level 7

Wow indeed but this is brilliant! 

 

Thank you

Ksharp
Super User

It is really uneasy. Hope the following code could get you right answer.

 

data have;
infile cards dlm=',';
input ID  TERM_Code   Term_Desc  $;
year=int(TERM_Code/100);
cards;
1,   201520,  Spring 2015  
1,   201530,  Summer 2015  
1,   201610,  Fall 2015    
1,   201620,  Spring 2016  
1,   201630,  Summer 2016  
1,   201710,  Fall 2016    
1,   201720,  Spring 2017
1,   201730,  Summer 2017
1,   201810,  Fall 2019    
2,   201710,  Fall 2016    
2,   201720,  Spring 2016
2,   201810,  Fall 2017    
2,   201820,  Spring 2018  
2,   201830,  Summer 2018  
3,   201510,  Fall 2014    
3,   201520,  Spring 2015
;;;;
run;
proc summary data=have;
by id;
var year;
output out=year(drop=_:) min=min max=max;
run;
data key;
set year;
 do year=min to max;
    TERM_Code=100*year+10;output;
    TERM_Code=100*year+20;output;
    TERM_Code=100*year+30;output;
 end;
drop min max year;
run;
data temp;
 merge key have(drop=year);
 by id TERM_Code;
run;
data temp1;
 set temp;
 by id;
 if first.id then group=0;
 if not missing(Term_Desc) then group+1;
  else group=0;
run;
proc sql;
create table want as
 select *,
  case when(max(group)>2) then 'Yes' else 'No ' end as Yr1,
  case when(max(group)>5) then 'Yes' else 'No ' end as Yr2
  from temp1
   group by id;
quit;
Ksharp
Super User

It is really uneasy. Hope the following code could get you right answer.

 

data have;
infile cards dlm=',';
input ID  TERM_Code   Term_Desc  $;
year=int(TERM_Code/100);
cards;
1,   201520,  Spring 2015  
1,   201530,  Summer 2015  
1,   201610,  Fall 2015    
1,   201620,  Spring 2016  
1,   201630,  Summer 2016  
1,   201710,  Fall 2016    
1,   201720,  Spring 2017
1,   201730,  Summer 2017
1,   201810,  Fall 2019    
2,   201710,  Fall 2016    
2,   201720,  Spring 2016
2,   201810,  Fall 2017    
2,   201820,  Spring 2018  
2,   201830,  Summer 2018  
3,   201510,  Fall 2014    
3,   201520,  Spring 2015
;;;;
run;
proc summary data=have;
by id;
var year;
output out=year(drop=_:) min=min max=max;
run;
data key;
set year;
 do year=min to max;
    TERM_Code=100*year+10;output;
    TERM_Code=100*year+20;output;
    TERM_Code=100*year+30;output;
 end;
drop min max year;
run;
data temp;
 merge key have(drop=year);
 by id TERM_Code;
run;
data temp1;
 set temp;
 by id;
 if first.id then group=0;
 if not missing(Term_Desc) then group+1;
  else group=0;
run;
proc sql;
create table want as
 select *,
  case when(max(group)>2) then 'Yes' else 'No ' end as Yr1,
  case when(max(group)>5) then 'Yes' else 'No ' end as Yr2
  from temp1
   group by id;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 973 views
  • 3 likes
  • 3 in conversation