SAS Programming

emaneman · Posted 11-28-2021 05:19 PM

Astounding · Posted 11-28-2021 09:14 PM

Here's an easy way to handle the situation:

data want;
   update have (obs=0) have;
   by id;
run;

View solution in original post

emaneman · Posted 11-28-2021 05:27 PM

Dear community,
I am having an issue with a dataset in which I have duplicate observations, which are, however, not identical. The identifier shows that it is the same observations, but on some columns/variables one line has data and on some other columns/variables it is another line that has the data. Some duplicate observations have all the values, and the duplicate has none.
In the datasets below the top part shows the initial file and the bottom part shows the result that i want to obtain.
As you can see, observations with ID A and D are duplicate in the initial file, and are "merged" in the outcome file.
Any suggestion on how to do that in SAS?
Thanks in advance.
Eman
ID var1 var2 var3 var4 var5 var6
A 1 1 1
A 2 2 2
B 1 3 2 2 3 3
C 1 2 3 3 3 3
D 2 2 2
D 2 2 2 3 3 3

ID var1 var2 var3 var4 var5 var6
A 1 1 1 2 2 2
B 1 3 2 2 3 3
C 1 2 3 3 3 3
D 2 2 2 3 3 3

emaneman · Posted 11-28-2021 05:40 PM

I apologize. When pasting the data the format was lost.

See attached excel file for the "have" and "wanted" files.

PaigeMiller · Posted 11-28-2021 06:31 PM

Many of us (including me) will not download Microsoft office files as they can be a security threat. The preferred way to make your data available is as SAS data step code (instructions).

--
Paige Miller

emaneman · Posted 11-28-2021 07:03 PM

absolutely right. The easiest is to put a picture, here it is.

Screenshot 2021-11-29 at 01.03.06.png

PaigeMiller · Posted 11-28-2021 07:09 PM

I'm not going to try to program from data in a screen capture. Please provide the data as a SAS data step, which you can create yourself, or you can use the instructions I gave.

--
Paige Miller

emaneman · Posted 11-28-2021 07:22 PM

Got it. I include SAS syntax

data have;
   infile datalines delimiter='	'; 

input ID$ v1-v6;
datalines;
A	1	1	1	.	.	.
A	.	.	.	2	2	2
B	1	3	2	2	3	3
C	1	2	3	3	3	3
D	2	2	2	.	.	.
D	2	2	2	3	3	3
;

proc print;


data want;
   infile datalines delimiter='	'; 

input ID$ v1-v6;
datalines;
A	1	1	1	2	2	2
B	1	3	2	2	3	3
C	1	2	3	3	3	3
D	2	2	2	3	3	3
;

proc print;

Sajid01 · Posted 11-28-2021 07:28 PM

Please see the following previous SAS post
https://communities.sas.com/t5/SAS-Programming/Fill-missing-values-with-the-previous-values/td-p/326...

Astounding · Posted 11-28-2021 09:14 PM

Here's an easy way to handle the situation:

data want;
   update have (obs=0) have;
   by id;
run;

emaneman · Posted 11-30-2021 05:53 AM

Thank you, that was simple and works perfectly.

There is a further complication, which I will probably handle by hand, but which I outline below in case any of you has some idea.

For certain duplicate observations, one or more columns both have values in it, but they are not the same values.

I include a SAS syntax with the same have dataset than before, with one more variable, V7, which for the duplicate obs A has the same value (x) in both lines while for duplicate obs D it has different values (x, y)on the two lines.

SAS seems to choose one of the values, probably the value of the observation that comes first, I would guess, and of course it has no way of choosing which one is correct. What would be useful would be to get an output in which all the duplicate obs (that is, obs that have the same ID value) that have at least one column with different values, flagged.

Eman

data have;
   infile datalines delimiter='	'; 

input ID$ v1-v7;
datalines;
A	1	1	1	.	.	.	x
A	.	.	.	2	2	2	x
B	1	3	2	2	3	3	y
C	1	2	3	3	3	3	x
D	2	2	2	.	.	.	x
D	2	2	2	3	3	3	y
;

eman

Astounding · Posted 11-30-2021 12:21 PM

Just for the record, with multiple observations, UPDATE takes the last non-missing value for each variable.

To get the sort of "different nonmissing" cases that you describe, I would switch to PROC SUMMARY. It can calculate the minimum value and the maximum value for each variable, and you can program with that to find cases where minimum is not equal to maximum. It won't give you the specific observations, but it will identify both the IDs and the variables. This would be a good starting point for what you are asking, because of compications when there are multiple observations (not just two) for the same ID.

This would be a good starting point:

proc summary data=have;
   by id;
   var var1-var5;
   output out=minimums (drop=_type_ _freq_) min=;
   output out=maximums (drop=_type_ _freq_) max=;
run;

data want;
   set minimums maximums;
   by id;
   array vars {5} var1-var5;
   do _n_= 1 to 5;
      change = dif(var{_n_});
      if change ne 0 then do;
         varname = vname{var{_n_});
         output;
      end;
   end;
   keep id varname change;
run;

It's untested code, but looks about right. See how it fits when you want.

emaneman · Posted 12-08-2021 06:14 PM

thank you, I will check try this out.

SAS Programming

Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Re: Dataset with double observations

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...