BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
dakshu92
Calcite | Level 5

Hi all, 

Have

id         Date                            Subject 1      Subject 2                  

1          July 10, 2020               Math               Science

1          July 10, 2020               .                       Science

1          July 10, 2020               .                       Science

2          Feb 10, 2020               History             Physics

2          Feb 10, 2020               Biology             Physics

3          Jun 24, 2020                .                       Math

3          Jun 24, 2020                .                       Math

4          May 7, 2020                History            Math              

4          May 7, 2020                English             Math

4          May 7, 2020                 .                        Math

5          Feb 2, 2020                 .                       French

5          Feb 2, 2020                 .                       French

 

Want

id         Date                            Subject                        

1          July 10, 2020               Math              

2          Feb 10, 2020               History                        

2          Feb 10, 2020               Biology            

3          Jun 24, 2020                Math  

4          May 7, 2020                History                                   

4          May 7, 2020                English            

5          Feb 2, 2020                 French

 

So, basically, I want to replace subject 1 with subject 2 only when there is no subject mentioned for each student id.

 

Any help is appreciated.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input (id         Date                            Subject1      Subject2 ) (& $40.);
cards; 
1          July 10, 2020               Math               Science
1          July 10, 2020               .                       Science
1          July 10, 2020               .                       Science
2          Feb 10, 2020               History             Physics
2          Feb 10, 2020               Biology             Physics
3          Jun 24, 2020                .                       Math
3          Jun 24, 2020                .                       Math
4          May 7, 2020                History            Math              
4          May 7, 2020                English             Math
4          May 7, 2020                 .                        Math
5          Feb 2, 2020                 .                       French
5          Feb 2, 2020                 .                       French
;

proc sql;
create table want as
select id,date,Subject1 as Subject
 from have
  where Subject1 is not missing
union
select id,date,max(Subject2)
 from have
  group by id,date
   having n(Subject1)=0;
quit;

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

Please explain why ID 1 has 3 rows in the input data set and one row in the output data set. Please explain why ID 4 has 3 rows in the input data set and 2 rows in the output data set. Please explain why ID 3 has 2 rows in the input data set and 1 row in the output data set.

--
Paige Miller
dakshu92
Calcite | Level 5

For the final data that I want each unique subject 1 needs to be there in the dataset. Since  ID1 has Math and 2 missing values in subject 1 the final data will have only math (unique values from subject 1).

For ID3 since subject 1 has 2 unique values I want to keep both of them. 

And only if there is no information on subject 1 I want to replace it with subject 2. 

Thank you!

ballardw
Super User

Does the DATE value actually have any impact in this?

mkeintz
PROC Star

I understand your objective as this:

  1.  If an ID has at least one non-blank subject1 then output only those records, with the resulting SUBJECT=SUBJECT1.
  2. But if an ID has no non-blank subject1's, then output only one record, with SUBJECT=SUBJECT2

It appears that whenever condition 2 is the case, that subject2 is a constant for that ID.

 

No guarantees below, in the absence of sample data presented in the form of a working data step:

 

data want (drop=_:);
  set have;
  by id;

  if first.id then call missing(_subject1_count);
  _subject1_count+ (subject1^=' ');

  if subject1^=' ' or (last.id=1 and _subject1_count=0);
  subject=coalescec(subject1,subject2);
run;

 

Edited note:  And it also appears that each ID has only a single value for DATE, which makes condition2 easier to honor.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
input (id         Date                            Subject1      Subject2 ) (& $40.);
cards; 
1          July 10, 2020               Math               Science
1          July 10, 2020               .                       Science
1          July 10, 2020               .                       Science
2          Feb 10, 2020               History             Physics
2          Feb 10, 2020               Biology             Physics
3          Jun 24, 2020                .                       Math
3          Jun 24, 2020                .                       Math
4          May 7, 2020                History            Math              
4          May 7, 2020                English             Math
4          May 7, 2020                 .                        Math
5          Feb 2, 2020                 .                       French
5          Feb 2, 2020                 .                       French
;

proc sql;
create table want as
select id,date,Subject1 as Subject
 from have
  where Subject1 is not missing
union
select id,date,max(Subject2)
 from have
  group by id,date
   having n(Subject1)=0;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 503 views
  • 0 likes
  • 5 in conversation