BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sasgorilla
Quartz | Level 8

I have a dataset where it appears some ht and wt values were in put in the wrong column (e.g., ht in wt and vice versa). 

 

My gorilla code (see: use brute force) below worked, but I'm wondering if I could do this more succinctly, i.e., swap ht and wt values for certain observations meeting conditions specified by my IF statements.  

I also recognize I could combine the two datasteps below into one, and subsequently did, but wanted to be able to check my code was doing what I wanted in between. This is the same reason I didn't drop my htnew2/wtnew2 variables. 

data htwt;
	set wtnew;
	htnew2=0; /*added because otherwise this variable was uninitialized*/
	wtnew2=0; /*added because otherwise this variable was uninitialized*/
	IF 0<htnew<150 and wtnew>110 then htnew2=wtnew; 
	IF 0<htnew<150 and wtnew>110 then wtnew2=htnew;
	run;

/*I then checked these variables with proc freq; also ran proc print of those observations*/
data htwtcorrect; set htwt; IF wtnew2>0 and htnew2>0 then htnew=htnew2; IF wtnew2>0 and htnew2>0 then wtnew=wtnew2; run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If you have variables that identify the observations that need correction it is relatively easy. The complexity may come from how many variables are involved and how many observations. If you have a single variable such as an ID variable then a data step with an IN condition:

 

data example;
   set have;
   if idvariable in (123   456   2222) then do;
  <code to execute>
  end;
run;

If the list of Id values is long, or if there are multiple variables needed then a helper data set and a merge.

Proc sort data=have;
  by <list of id variables>;
run;

Proc sort data=helper;
  by <list of id variables>;
run;


 data want;
     merge
         have 
        helper (in=inhelper)
   ;
  by <list of id variables>;
  if inhelper then do;
  <code to do>
  end;
run;

The merge is going to want the two data sets to be sorted by the same values to match up observations. The IN=option creates a numeric 1/0, for true/false, variable that indicates when the current observation has a contribution from the helper data data set. So you can execute code conditionally. For this sort of problem the Helper data set should only have the variables needed to identify the problem observations.

 

And I do understand the problem. I have had the same thing happen with some child data. In my case it was all observations at a couple of schools so I could use the the first approach with an IN list for the couple of schools involved.

View solution in original post

4 REPLIES 4
Tom
Super User Tom
Super User

In general to swap two values you need a third variable to hold one of them.

_temp_=LEFT;
left=RIGHT;
right=_temp_;
drop _temp_;

What is your test that indicates the HT and WT should be swapped?

sasgorilla
Quartz | Level 8
I believe that is essentially what I did with my htnew2 and wtnew2 variables. Thank you for confirming.

As for my test, I first looked at the distribution of each via proc univariate with robustscale and plot options. Then, I looked at specific observations based on the outliers.

That was where I noticed that observations with nonsensical height and weight were within expected distribution if the variables were reversed, leading me to surmise these were input in reverse.
ballardw
Super User

If you have variables that identify the observations that need correction it is relatively easy. The complexity may come from how many variables are involved and how many observations. If you have a single variable such as an ID variable then a data step with an IN condition:

 

data example;
   set have;
   if idvariable in (123   456   2222) then do;
  <code to execute>
  end;
run;

If the list of Id values is long, or if there are multiple variables needed then a helper data set and a merge.

Proc sort data=have;
  by <list of id variables>;
run;

Proc sort data=helper;
  by <list of id variables>;
run;


 data want;
     merge
         have 
        helper (in=inhelper)
   ;
  by <list of id variables>;
  if inhelper then do;
  <code to do>
  end;
run;

The merge is going to want the two data sets to be sorted by the same values to match up observations. The IN=option creates a numeric 1/0, for true/false, variable that indicates when the current observation has a contribution from the helper data data set. So you can execute code conditionally. For this sort of problem the Helper data set should only have the variables needed to identify the problem observations.

 

And I do understand the problem. I have had the same thing happen with some child data. In my case it was all observations at a couple of schools so I could use the the first approach with an IN list for the couple of schools involved.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 581 views
  • 2 likes
  • 3 in conversation