Solved: Re: if/do SAS Practice Exam

JC411911 · Posted 05-24-2021 09:24 PM

I am practicing for the base certification and having trouble understanding the ask in Step 3. The answer to this has a if group='A' then kilograms=79 else kilograms=89; I don't understand where or why these values are used. Any help would be appreciated.

The Question Asked:

Step 1:

create a temporary data set, cleandata36.
In this data set, convert all group values to upper case.
Then keep only observations with group equal to 'A' or 'B'.

Step 2:

Determine the MEDIAN value for the Kilograms variable for each group (A,B) in the cleandata36 data set. Round MEDIAN to the nearest whole number.

Step 3:

create results.output36 from cleandata36
Ensure that all values for variable Kilograms are between 40 and 200, inclusively.
If the value is missing or out of range, replace the value with the MEDIAN Kilograms value for the respective group (A,B) calculated in step 2.

Step 1: 
data work.cleandata36;
  set cert.input36;
  group=upcase(group);
  if upcase(group) in ('A','B');
run;

Step 2:
proc means data=work.cleandata36 median;
class group;
var kilograms;
run;

Step 3:
data results.output36;
  set cleandata36;
  if Kilograms < 40 OR Kilograms > 200 then do;
  if group='A' then kilograms=79;
  else kilograms=89; 
end;
run;

jimbarbour · Posted 05-25-2021 01:31 AM

Elaborating on my earlier answer, here's a program that does I believe everything that they're asking for. It uses the calculated means in the final step where values out of range are replaced with the median values. My data has a median of 79 for Group A and 89 for Group B.

As a suggestion: You might try running my program and playing with it a bit in order to understand it better.

See also the comments in the code, such as they are.

Jim

LIBNAME	Cert	'X:\sastemp';
LIBNAME	Results	'X:\sastemp';

**	Step 0	- Load data	**;
data Cert.INPUT36;
	LENGTH	Group	$1;
	
  infile	datalines4	dsd	dlm='09'x;
  input		Group	$
  			Kilograms	
			;
  			
DATALINES4;
A	24
A	54
A	60
A	44
A	84
A	42
A	52
A	72
A	74
A	36
A	38
a	104
a	190
a	180
a	120
a	100
a	128
a	130
a	132
a	134
b	24
b	44
b	60
b	54
b	94
b	52
b	62
b	92
b	84
b	86
b	88
b	84
b	170
b	180
b	120
b	110
b	138
b	140
b	142
b	144
b	52
b	62
b	92
b	90
c	999
d	888
e	777
f	666
;;;;
run;

**	Step 1	- Clean the data	**;
data work.cleandata36;
  set cert.input36;
  group=upcase(group);
  if upcase(group) in ('A','B');
run;

**	Step 2	- Calculate the median save it in a data set named Summary	**;
*ODS TRACE	ON;
ODS	OUTPUT	Summary=Summary;
proc means data=work.cleandata36 median;
	class group;
	var kilograms;
run;
*ODS TRACE	OFF;

**	Step 3	- Populate macro variables containing the calculated median for each group	**;
DATA	_NULL_;
	set		WORK.Summary;
	CALL	SYMPUTX(CATS('Median_', Group), ROUND(Kilograms_Median), 'G');
RUN;

**	Step 4	- Substitute the median values when the actual value is out of range	**;
data results.output36;
	set cleandata36;
	if Kilograms < 40 OR Kilograms > 200 then 
		do;
			if group='A' then 
				kilograms=&Median_A;
			else 
				kilograms=&Median_B; 
		end;
run;

View solution in original post

jimbarbour · Posted 05-25-2021 12:39 AM

I don't have access to the certification data (at least not that I know of). What is the median value for Group A? What is the median value for Group B? I assume that those values are 79 and 89, respectively. If one sets a missing value to the median value, then one does not shift a curve one way or another, at least I think that's the idea behind this question. Using the median value is an attempt to avoid skewing the data.

Jim

jimbarbour · Posted 05-25-2021 01:31 AM

Elaborating on my earlier answer, here's a program that does I believe everything that they're asking for. It uses the calculated means in the final step where values out of range are replaced with the median values. My data has a median of 79 for Group A and 89 for Group B.

As a suggestion: You might try running my program and playing with it a bit in order to understand it better.

See also the comments in the code, such as they are.

Jim

LIBNAME	Cert	'X:\sastemp';
LIBNAME	Results	'X:\sastemp';

**	Step 0	- Load data	**;
data Cert.INPUT36;
	LENGTH	Group	$1;
	
  infile	datalines4	dsd	dlm='09'x;
  input		Group	$
  			Kilograms	
			;
  			
DATALINES4;
A	24
A	54
A	60
A	44
A	84
A	42
A	52
A	72
A	74
A	36
A	38
a	104
a	190
a	180
a	120
a	100
a	128
a	130
a	132
a	134
b	24
b	44
b	60
b	54
b	94
b	52
b	62
b	92
b	84
b	86
b	88
b	84
b	170
b	180
b	120
b	110
b	138
b	140
b	142
b	144
b	52
b	62
b	92
b	90
c	999
d	888
e	777
f	666
;;;;
run;

**	Step 1	- Clean the data	**;
data work.cleandata36;
  set cert.input36;
  group=upcase(group);
  if upcase(group) in ('A','B');
run;

**	Step 2	- Calculate the median save it in a data set named Summary	**;
*ODS TRACE	ON;
ODS	OUTPUT	Summary=Summary;
proc means data=work.cleandata36 median;
	class group;
	var kilograms;
run;
*ODS TRACE	OFF;

**	Step 3	- Populate macro variables containing the calculated median for each group	**;
DATA	_NULL_;
	set		WORK.Summary;
	CALL	SYMPUTX(CATS('Median_', Group), ROUND(Kilograms_Median), 'G');
RUN;

**	Step 4	- Substitute the median values when the actual value is out of range	**;
data results.output36;
	set cleandata36;
	if Kilograms < 40 OR Kilograms > 200 then 
		do;
			if group='A' then 
				kilograms=&Median_A;
			else 
				kilograms=&Median_B; 
		end;
run;

JC411911 · Posted 05-25-2021 03:39 PM

Thank you very much for your help this definitely helps me understand it better!

Tom · Posted 05-25-2021 12:43 AM

I have no idea where 79 or 89 came from, but I assume they are the expected answer to step2.

Step 3 talks about replacing the value when the current value is either missing or not "valid".

The wording of the last two parts of step 3 is confusing. The middle line make it look like you were requested to eliminate the observations with such invalid values. If you did that then there would be any need to replace any values. So in that case you should first writes some words describing how you have translated their poorly worded question into something that makes sense and then provide the answer to that.

The test : (Kilograms < 40 OR Kilograms > 200 ) will includes values that are too large and values that are too small. Since SAS treats missing values as smaller than any actual value it will also include the missing values.

So you get to the nested IF/THEN/ELSE statements only when you want to replace the value of KILOGRAMS with the group means. That nested IF/THEN/ELSE group is only testing for GROUP='A' but because in STEP 1 you eliminated any values of GROUP that are not A or B then the ELSE must be those where GROUP='B'.

JC411911 · Posted 05-25-2021 03:40 PM

Yes the wording definitely threw me for a loop. After looking further you are correct the 79 and 89 are the median but it didnt click very easily at first when I read what was being asked. Thank you!

if/do SAS Practice Exam

Re: if/do SAS Practice Exam

Re: if/do SAS Practice Exam

Re: if/do SAS Practice Exam

Re: if/do SAS Practice Exam

Re: if/do SAS Practice Exam

Re: if/do SAS Practice Exam

SAS Innovate 2025: Call for Content