Solved: Best way to swap certain values of variables that were input in wrong ...

sasgorilla · Posted 10-22-2024 11:07 AM

I have a dataset where it appears some ht and wt values were in put in the wrong column (e.g., ht in wt and vice versa).

My gorilla code (see: use brute force) below worked, but I'm wondering if I could do this more succinctly, i.e., swap ht and wt values for certain observations meeting conditions specified by my IF statements.

I also recognize I could combine the two datasteps below into one, and subsequently did, but wanted to be able to check my code was doing what I wanted in between. This is the same reason I didn't drop my htnew2/wtnew2 variables.

data htwt;
	set wtnew;
	htnew2=0; /*added because otherwise this variable was uninitialized*/
	wtnew2=0; /*added because otherwise this variable was uninitialized*/
	IF 0<htnew<150 and wtnew>110 then htnew2=wtnew; 
	IF 0<htnew<150 and wtnew>110 then wtnew2=htnew;
	run;

/*I then checked these variables with proc freq; also ran proc print of those observations*/

data htwtcorrect;
	set htwt;
	IF wtnew2>0 and htnew2>0 then htnew=htnew2;
	IF wtnew2>0 and htnew2>0 then wtnew=wtnew2;
run;

ballardw · Posted 10-22-2024 11:45 AM

If you have variables that identify the observations that need correction it is relatively easy. The complexity may come from how many variables are involved and how many observations. If you have a single variable such as an ID variable then a data step with an IN condition:

data example;
   set have;
   if idvariable in (123   456   2222) then do;
  <code to execute>
  end;
run;

If the list of Id values is long, or if there are multiple variables needed then a helper data set and a merge.

Proc sort data=have;
  by <list of id variables>;
run;

Proc sort data=helper;
  by <list of id variables>;
run;


 data want;
     merge
         have 
        helper (in=inhelper)
   ;
  by <list of id variables>;
  if inhelper then do;
  <code to do>
  end;
run;

The merge is going to want the two data sets to be sorted by the same values to match up observations. The IN=option creates a numeric 1/0, for true/false, variable that indicates when the current observation has a contribution from the helper data data set. So you can execute code conditionally. For this sort of problem the Helper data set should only have the variables needed to identify the problem observations.

And I do understand the problem. I have had the same thing happen with some child data. In my case it was all observations at a couple of schools so I could use the the first approach with an IN list for the couple of schools involved.

View solution in original post

Tom · Posted 10-22-2024 11:29 AM

In general to swap two values you need a third variable to hold one of them.

_temp_=LEFT;
left=RIGHT;
right=_temp_;
drop _temp_;

What is your test that indicates the HT and WT should be swapped?

sasgorilla · Posted 10-22-2024 11:37 AM

I believe that is essentially what I did with my htnew2 and wtnew2 variables. Thank you for confirming.

As for my test, I first looked at the distribution of each via proc univariate with robustscale and plot options. Then, I looked at specific observations based on the outliers.

That was where I noticed that observations with nonsensical height and weight were within expected distribution if the variables were reversed, leading me to surmise these were input in reverse.

ballardw · Posted 10-22-2024 11:45 AM

If you have variables that identify the observations that need correction it is relatively easy. The complexity may come from how many variables are involved and how many observations. If you have a single variable such as an ID variable then a data step with an IN condition:

data example;
   set have;
   if idvariable in (123   456   2222) then do;
  <code to execute>
  end;
run;

If the list of Id values is long, or if there are multiple variables needed then a helper data set and a merge.

Proc sort data=have;
  by <list of id variables>;
run;

Proc sort data=helper;
  by <list of id variables>;
run;


 data want;
     merge
         have 
        helper (in=inhelper)
   ;
  by <list of id variables>;
  if inhelper then do;
  <code to do>
  end;
run;

The merge is going to want the two data sets to be sorted by the same values to match up observations. The IN=option creates a numeric 1/0, for true/false, variable that indicates when the current observation has a contribution from the helper data data set. So you can execute code conditionally. For this sort of problem the Helper data set should only have the variables needed to identify the problem observations.

And I do understand the problem. I have had the same thing happen with some child data. In my case it was all observations at a couple of schools so I could use the the first approach with an IN list for the couple of schools involved.

sasgorilla · Posted 10-22-2024 11:48 AM

This is helpful, thank you!

Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

Re: Best way to swap certain values of variables that were input in wrong columns

SAS Innovate 2025: Register Now