I am trying to correct the master dataset using the Correct dataset. The correct dataset was produced after the double-entry. The second column is the Correct dataset indicates the name of the variable that needs to be corrected. The problem about the code below is the missing. Some of the entries that need to be corrected is the missing. If missing is the correct entry then the master dataset should be updated as such. However, when I transpose the Correct dataset, additional empty cells are created, which SAS can't distinguish the different between the correct missing vs missing entries resulted from the transpose. I need a code that only updates the variables listed in the Correct dataset. Thank you in advance.
data master;
input ID $ var1 $ var2 $;
datalines;
123 Yes Red
124 No Blue
125 Yes Yellow;
data correct ;
input ID $ variable $ correct $;
datalines;
123 var1 No
123 var2 Orange
124 var2 Red
125 var1 99;
proc sort data=Correct;by ID;run;
proc transpose data=correct out=cwide (drop=_name_);
by ID;
var correct;
id variable;
run;
proc contents data =cwide;run ;
Data Final;
update master cwide;
by ID;
run;
proc print data=final; title "Corrected";run;
Did you execute the SAS code piece and analyze/compare the outcome? The resulting WORK.FINAL dataset is correct, with the replaced column/variable values from the WORK.CORRECT file as would be expected. With an UPDATE request, SAS does not manipulate any columns for which there is a blank / missing value identified in the UPDATE statement (second argument / file).
Scott Barry
SBBWorks, Inc.
Right. This is a "correct missing vs missing entries" question. The var1 for 125 should be empty in the Final dataset but it is not because after transposing Correct dataset, SAS creates the missing field for Var1 for 124 and Var2 for 125. When I update the Master with the transposed dataset, SAS can't differentiate the 'correct' missing (=Var1 for 125) and other missing entries created by SAS (Var1 for 124 and Var2 for 125).
The Final dataset should show that Var1 for 125 is empty. I hope this clear up some confusion. Sorry about that.
I'd add a step to identify missing. Not sure that I would have come up with the same solution from the start but after looking at the transpose that makes sense to keep. Although I enjoyed the smiley faces that didn't really answer my previous question. Below I am loading the file as CSV, so the blank comes through and is clearly missing due to the comma.
This will work, not sure if it suits your purposes.
data master;
infile datalines dsd;
input ID $ var1 $ var2 $;
datalines;
123,Yes,Red
124,No,Blue
125,Yes,Yellow
;
data correct;
infile datalines dsd;
input ID $ variable $ correct $;
datalines;
123,var1,No
123,var2,Orange
124,var2,Red
125,var1,
;
data correct_II;
set correct;
if missing(correct) then correct = 'missing';
run;
proc sort data=Correct;by ID;run;
proc transpose data=correct_II out=cwide (drop=_name_);
by ID;
var correct;
id variable;
run;
data final;
update master cwide;
by id;
if var1 = 'missing' then call missing(var1);
if var2 = 'missing' then call missing(var2);
run;
What do you mean by "correct missing vs missing entries" ?
Do you mean if any one of var1 and var2 in CORRECT is missing , the updated should be stopped ?
data master;
input ID $ var1 $ var2 $;
datalines;
123 Yes Red
124 No Blue
125 Yes Yellow
;
data correct ;
input ID $ variable $ correct $;
datalines;
123 var1 No
123 var2 Orange
124 var2 Red
125 var1 99
;
proc sort data=Correct;by ID;run;
proc transpose data=correct out=cwide (drop=_name_);
by ID;
var correct;
id variable;
run;
Data Final;
merge master cwide(rename=(var1-var2=_var1-_var2));
by ID;
if cmiss(of _var1-_var2)=0 then do;var1=_var1;var2=_var2;end;
drop _:;
run;
Xia Keshan
"correct missing vs missing entries". Sorry for the confusion.
If something comes through in the correct dataset as missing how is it shown, How do you know something should be 'correctly' missing?
Right. When you transpose the Correct dataset it will get:
**I modified the original Correct dataset -- For 125, the var1 should be empty.
The problem is that when I transpose the Correct dataset, transposed dataset will automatically create the missing field in var1 for 124 and var 2 for 125 (marked with :smileycry:) -- var1 (marked with :smileygrin: ) should be empty for 125 as it is "correctly' missing. However when I update using this transposed Correct dataset, SAS won't update the Var1 to missing as SAS can't tell the difference between :smileygrin: and :smileycry:.
ID Var1 Var2
123 No Orange
124 :smileycry: Red
125 :smileygrin: :smileycry:
I hope this helps to clear up things.
*I modified the original Correct dataset -- For 125, the var1 should be empty.
The problem is that when I transpose the Correct dataset, transposed dataset will automatically create the missing field in var1 for 124 and var 2 for 125 (marked with :smileycry:) -- var1 (marked with :smileygrin: ) should be empty for 125. However I am only interested to update ones from the Correct dataset --meaning the Final dataset should show that var1 for 125 is empty.
ID Var1 Var2
123 No Orange
124 :smileycry: Red
125 :smileygrin: :smileycry:
data master;
input ID $ var1 $ var2 $;
datalines;
123 Yes Red
124 No Blue
125 Yes Yellow;
data correct ;
input ID $ variable $ correct $;
datalines;
123 var1 No
123 var2 Orange
124 var2 Red
125 var1 . ; ** This should be empty. (Character variable -- I didn't know what to put other than .)
proc sort data=Correct;by ID;run;
proc transpose data=correct out=cwide (drop=_name_);
by ID;
var correct;
id variable;
run;
proc contents data =cwide;run ;
Data Final;
update master cwide;
by ID;
run;
Clearly UPDATE processing doesn't work with such desired behavior, so you will want to consider assigning some "placeholder" unique-string data-value in WORK.CORRECT that would represent "empty" (which would not appear in your data normally), then in the final DATA step to create WORK.FINAL, adjust the data-value(s) back to blank/missing after the BY statement.
Scott Barry
SBBWorks, Inc.
If you want your transactions to be able to convert the value to missing then you need to set the value to ._ in the transaction dataset.
data master;
input ID $ var1 $ var2 $;
datalines;
123 Yes Red
124 No Blue
125 Yes Yellow
;;;;
data trans ;
infile cards truncover ;
if 0 then set master;
input id= var1= var2= ;
datalines;
id=123 var1= No
id=123 var2= Orange
id=124 var2= Red
id=125 var1= _
;;;;
data Final;
update master trans;
by ID;
run;
proc print;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
