Solved: Select unique observations with missing value and replace it with a di...

dakshu92 · Posted 08-12-2023 03:42 PM

Hi all,

Have

id Date Subject 1 Subject 2

1 July 10, 2020 Math Science

1 July 10, 2020 . Science

2 Feb 10, 2020 History Physics

2 Feb 10, 2020 Biology Physics

3 Jun 24, 2020 . Math

4 May 7, 2020 History Math

4 May 7, 2020 English Math

4 May 7, 2020 . Math

5 Feb 2, 2020 . French

Want

id Date Subject

1 July 10, 2020 Math

2 Feb 10, 2020 History

2 Feb 10, 2020 Biology

3 Jun 24, 2020 Math

4 May 7, 2020 History

4 May 7, 2020 English

5 Feb 2, 2020 French

So, basically, I want to replace subject 1 with subject 2 only when there is no subject mentioned for each student id.

Any help is appreciated.

Ksharp · Posted 08-13-2023 05:32 AM

data have;
input (id         Date                            Subject1      Subject2 ) (& $40.);
cards; 
1          July 10, 2020               Math               Science
1          July 10, 2020               .                       Science
1          July 10, 2020               .                       Science
2          Feb 10, 2020               History             Physics
2          Feb 10, 2020               Biology             Physics
3          Jun 24, 2020                .                       Math
3          Jun 24, 2020                .                       Math
4          May 7, 2020                History            Math              
4          May 7, 2020                English             Math
4          May 7, 2020                 .                        Math
5          Feb 2, 2020                 .                       French
5          Feb 2, 2020                 .                       French
;

proc sql;
create table want as
select id,date,Subject1 as Subject
 from have
  where Subject1 is not missing
union
select id,date,max(Subject2)
 from have
  group by id,date
   having n(Subject1)=0;
quit;

View solution in original post

PaigeMiller · Posted 08-12-2023 05:10 PM

Please explain why ID 1 has 3 rows in the input data set and one row in the output data set. Please explain why ID 4 has 3 rows in the input data set and 2 rows in the output data set. Please explain why ID 3 has 2 rows in the input data set and 1 row in the output data set.

--
Paige Miller

dakshu92 · Posted 08-12-2023 05:23 PM

For the final data that I want each unique subject 1 needs to be there in the dataset. Since ID1 has Math and 2 missing values in subject 1 the final data will have only math (unique values from subject 1).

For ID3 since subject 1 has 2 unique values I want to keep both of them.

And only if there is no information on subject 1 I want to replace it with subject 2.

Thank you!

ballardw · Posted 08-12-2023 08:19 PM

Does the DATE value actually have any impact in this?

mkeintz · Posted 08-12-2023 08:33 PM

I understand your objective as this:

If an ID has at least one non-blank subject1 then output only those records, with the resulting SUBJECT=SUBJECT1.
But if an ID has no non-blank subject1's, then output only one record, with SUBJECT=SUBJECT2

It appears that whenever condition 2 is the case, that subject2 is a constant for that ID.

No guarantees below, in the absence of sample data presented in the form of a working data step:

data want (drop=_:);
  set have;
  by id;

  if first.id then call missing(_subject1_count);
  _subject1_count+ (subject1^=' ');

  if subject1^=' ' or (last.id=1 and _subject1_count=0);
  subject=coalescec(subject1,subject2);
run;

Edited note: And it also appears that each ID has only a single value for DATE, which makes condition2 easier to honor.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 08-13-2023 05:32 AM

data have;
input (id         Date                            Subject1      Subject2 ) (& $40.);
cards; 
1          July 10, 2020               Math               Science
1          July 10, 2020               .                       Science
1          July 10, 2020               .                       Science
2          Feb 10, 2020               History             Physics
2          Feb 10, 2020               Biology             Physics
3          Jun 24, 2020                .                       Math
3          Jun 24, 2020                .                       Math
4          May 7, 2020                History            Math              
4          May 7, 2020                English             Math
4          May 7, 2020                 .                        Math
5          Feb 2, 2020                 .                       French
5          Feb 2, 2020                 .                       French
;

proc sql;
create table want as
select id,date,Subject1 as Subject
 from have
  where Subject1 is not missing
union
select id,date,max(Subject2)
 from have
  group by id,date
   having n(Subject1)=0;
quit;

Select unique observations with missing value and replace it with a different value

Re: Select unique observations with missing value and replace it with a different value

Re: Select unique observations with missing value and replace it with a different value

Re: Select unique observations with missing value and replace it with a different value

Re: Select unique observations with missing value and replace it with a different value

Re: Select unique observations with missing value and replace it with a different value

Re: Select unique observations with missing value and replace it with a different value

Classroom Training Available!