BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
dakshu92
Calcite | Level 5

Hi all, 

Have

id         Date                            Subject 1      Subject 2                  

1          July 10, 2020               Math               Science

1          July 10, 2020               .                       Science

1          July 10, 2020               .                       Science

2          Feb 10, 2020               History             Physics

2          Feb 10, 2020               Biology             Physics

3          Jun 24, 2020                .                       Math

3          Jun 24, 2020                .                       Math

4          May 7, 2020                History            Math              

4          May 7, 2020                English             Math

4          May 7, 2020                 .                        Math

5          Feb 2, 2020                 .                       French

5          Feb 2, 2020                 .                       French

 

Want

id         Date                            Subject                        

1          July 10, 2020               Math              

2          Feb 10, 2020               History                        

2          Feb 10, 2020               Biology            

3          Jun 24, 2020                Math  

4          May 7, 2020                History                                   

4          May 7, 2020                English            

5          Feb 2, 2020                 French

 

So, basically, I want to replace subject 1 with subject 2 only when there is no subject mentioned for each student id.

 

Any help is appreciated.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input (id         Date                            Subject1      Subject2 ) (& $40.);
cards; 
1          July 10, 2020               Math               Science
1          July 10, 2020               .                       Science
1          July 10, 2020               .                       Science
2          Feb 10, 2020               History             Physics
2          Feb 10, 2020               Biology             Physics
3          Jun 24, 2020                .                       Math
3          Jun 24, 2020                .                       Math
4          May 7, 2020                History            Math              
4          May 7, 2020                English             Math
4          May 7, 2020                 .                        Math
5          Feb 2, 2020                 .                       French
5          Feb 2, 2020                 .                       French
;

proc sql;
create table want as
select id,date,Subject1 as Subject
 from have
  where Subject1 is not missing
union
select id,date,max(Subject2)
 from have
  group by id,date
   having n(Subject1)=0;
quit;

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

Please explain why ID 1 has 3 rows in the input data set and one row in the output data set. Please explain why ID 4 has 3 rows in the input data set and 2 rows in the output data set. Please explain why ID 3 has 2 rows in the input data set and 1 row in the output data set.

--
Paige Miller
dakshu92
Calcite | Level 5

For the final data that I want each unique subject 1 needs to be there in the dataset. Since  ID1 has Math and 2 missing values in subject 1 the final data will have only math (unique values from subject 1).

For ID3 since subject 1 has 2 unique values I want to keep both of them. 

And only if there is no information on subject 1 I want to replace it with subject 2. 

Thank you!

ballardw
Super User

Does the DATE value actually have any impact in this?

mkeintz
PROC Star

I understand your objective as this:

  1.  If an ID has at least one non-blank subject1 then output only those records, with the resulting SUBJECT=SUBJECT1.
  2. But if an ID has no non-blank subject1's, then output only one record, with SUBJECT=SUBJECT2

It appears that whenever condition 2 is the case, that subject2 is a constant for that ID.

 

No guarantees below, in the absence of sample data presented in the form of a working data step:

 

data want (drop=_:);
  set have;
  by id;

  if first.id then call missing(_subject1_count);
  _subject1_count+ (subject1^=' ');

  if subject1^=' ' or (last.id=1 and _subject1_count=0);
  subject=coalescec(subject1,subject2);
run;

 

Edited note:  And it also appears that each ID has only a single value for DATE, which makes condition2 easier to honor.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
input (id         Date                            Subject1      Subject2 ) (& $40.);
cards; 
1          July 10, 2020               Math               Science
1          July 10, 2020               .                       Science
1          July 10, 2020               .                       Science
2          Feb 10, 2020               History             Physics
2          Feb 10, 2020               Biology             Physics
3          Jun 24, 2020                .                       Math
3          Jun 24, 2020                .                       Math
4          May 7, 2020                History            Math              
4          May 7, 2020                English             Math
4          May 7, 2020                 .                        Math
5          Feb 2, 2020                 .                       French
5          Feb 2, 2020                 .                       French
;

proc sql;
create table want as
select id,date,Subject1 as Subject
 from have
  where Subject1 is not missing
union
select id,date,max(Subject2)
 from have
  group by id,date
   having n(Subject1)=0;
quit;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1514 views
  • 0 likes
  • 5 in conversation