I have a dataset where it appears some ht and wt values were in put in the wrong column (e.g., ht in wt and vice versa).
My gorilla code (see: use brute force) below worked, but I'm wondering if I could do this more succinctly, i.e., swap ht and wt values for certain observations meeting conditions specified by my IF statements.
I also recognize I could combine the two datasteps below into one, and subsequently did, but wanted to be able to check my code was doing what I wanted in between. This is the same reason I didn't drop my htnew2/wtnew2 variables.
data htwt;
set wtnew;
htnew2=0; /*added because otherwise this variable was uninitialized*/
wtnew2=0; /*added because otherwise this variable was uninitialized*/
IF 0<htnew<150 and wtnew>110 then htnew2=wtnew;
IF 0<htnew<150 and wtnew>110 then wtnew2=htnew;
run;
/*I then checked these variables with proc freq; also ran proc print of those observations*/
data htwtcorrect;
set htwt;
IF wtnew2>0 and htnew2>0 then htnew=htnew2;
IF wtnew2>0 and htnew2>0 then wtnew=wtnew2;
run;
If you have variables that identify the observations that need correction it is relatively easy. The complexity may come from how many variables are involved and how many observations. If you have a single variable such as an ID variable then a data step with an IN condition:
data example; set have; if idvariable in (123 456 2222) then do; <code to execute> end; run;
If the list of Id values is long, or if there are multiple variables needed then a helper data set and a merge.
Proc sort data=have; by <list of id variables>; run; Proc sort data=helper; by <list of id variables>; run; data want; merge have helper (in=inhelper) ; by <list of id variables>; if inhelper then do; <code to do> end; run;
The merge is going to want the two data sets to be sorted by the same values to match up observations. The IN=option creates a numeric 1/0, for true/false, variable that indicates when the current observation has a contribution from the helper data data set. So you can execute code conditionally. For this sort of problem the Helper data set should only have the variables needed to identify the problem observations.
And I do understand the problem. I have had the same thing happen with some child data. In my case it was all observations at a couple of schools so I could use the the first approach with an IN list for the couple of schools involved.
In general to swap two values you need a third variable to hold one of them.
_temp_=LEFT;
left=RIGHT;
right=_temp_;
drop _temp_;
What is your test that indicates the HT and WT should be swapped?
If you have variables that identify the observations that need correction it is relatively easy. The complexity may come from how many variables are involved and how many observations. If you have a single variable such as an ID variable then a data step with an IN condition:
data example; set have; if idvariable in (123 456 2222) then do; <code to execute> end; run;
If the list of Id values is long, or if there are multiple variables needed then a helper data set and a merge.
Proc sort data=have; by <list of id variables>; run; Proc sort data=helper; by <list of id variables>; run; data want; merge have helper (in=inhelper) ; by <list of id variables>; if inhelper then do; <code to do> end; run;
The merge is going to want the two data sets to be sorted by the same values to match up observations. The IN=option creates a numeric 1/0, for true/false, variable that indicates when the current observation has a contribution from the helper data data set. So you can execute code conditionally. For this sort of problem the Helper data set should only have the variables needed to identify the problem observations.
And I do understand the problem. I have had the same thing happen with some child data. In my case it was all observations at a couple of schools so I could use the the first approach with an IN list for the couple of schools involved.
This is helpful, thank you!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.