Here's an easy way to handle the situation:
data want;
update have (obs=0) have;
by id;
run;
I apologize. When pasting the data the format was lost.
See attached excel file for the "have" and "wanted" files.
Many of us (including me) will not download Microsoft office files as they can be a security threat. The preferred way to make your data available is as SAS data step code (instructions).
absolutely right. The easiest is to put a picture, here it is.
I'm not going to try to program from data in a screen capture. Please provide the data as a SAS data step, which you can create yourself, or you can use the instructions I gave.
Got it. I include SAS syntax
data have;
infile datalines delimiter=' ';
input ID$ v1-v6;
datalines;
A 1 1 1 . . .
A . . . 2 2 2
B 1 3 2 2 3 3
C 1 2 3 3 3 3
D 2 2 2 . . .
D 2 2 2 3 3 3
;
proc print;
data want;
infile datalines delimiter=' ';
input ID$ v1-v6;
datalines;
A 1 1 1 2 2 2
B 1 3 2 2 3 3
C 1 2 3 3 3 3
D 2 2 2 3 3 3
;
proc print;
Please see the following previous SAS post
https://communities.sas.com/t5/SAS-Programming/Fill-missing-values-with-the-previous-values/td-p/326...
Here's an easy way to handle the situation:
data want;
update have (obs=0) have;
by id;
run;
Thank you, that was simple and works perfectly.
There is a further complication, which I will probably handle by hand, but which I outline below in case any of you has some idea.
For certain duplicate observations, one or more columns both have values in it, but they are not the same values.
I include a SAS syntax with the same have dataset than before, with one more variable, V7, which for the duplicate obs A has the same value (x) in both lines while for duplicate obs D it has different values (x, y)on the two lines.
SAS seems to choose one of the values, probably the value of the observation that comes first, I would guess, and of course it has no way of choosing which one is correct. What would be useful would be to get an output in which all the duplicate obs (that is, obs that have the same ID value) that have at least one column with different values, flagged.
Eman
data have;
infile datalines delimiter=' ';
input ID$ v1-v7;
datalines;
A 1 1 1 . . . x
A . . . 2 2 2 x
B 1 3 2 2 3 3 y
C 1 2 3 3 3 3 x
D 2 2 2 . . . x
D 2 2 2 3 3 3 y
;
eman
Just for the record, with multiple observations, UPDATE takes the last non-missing value for each variable.
To get the sort of "different nonmissing" cases that you describe, I would switch to PROC SUMMARY. It can calculate the minimum value and the maximum value for each variable, and you can program with that to find cases where minimum is not equal to maximum. It won't give you the specific observations, but it will identify both the IDs and the variables. This would be a good starting point for what you are asking, because of compications when there are multiple observations (not just two) for the same ID.
This would be a good starting point:
proc summary data=have;
by id;
var var1-var5;
output out=minimums (drop=_type_ _freq_) min=;
output out=maximums (drop=_type_ _freq_) max=;
run;
data want;
set minimums maximums;
by id;
array vars {5} var1-var5;
do _n_= 1 to 5;
change = dif(var{_n_});
if change ne 0 then do;
varname = vname{var{_n_});
output;
end;
end;
keep id varname change;
run;
It's untested code, but looks about right. See how it fits when you want.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.