BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JC411911
Obsidian | Level 7

I am practicing for the base certification and having trouble understanding the ask in Step 3. The answer to this has a if group='A' then kilograms=79 else kilograms=89; I don't understand where or why these values are used. Any help would be appreciated.

 

The Question Asked:

    • Step 1:
      • create a temporary data set, cleandata36.
      • In this data set, convert all group values to upper case.
      • Then keep only observations with group equal to 'A' or 'B'.
    • Step 2:
      • Determine the MEDIAN value for the Kilograms variable for each group (A,B) in the cleandata36 data set. Round MEDIAN to the nearest whole number.
    • Step 3:
      • create results.output36 from cleandata36
      • Ensure that all values for variable Kilograms are between 40 and 200, inclusively.
      • If the value is missing or out of range, replace the value with the MEDIAN Kilograms value for the respective group (A,B) calculated in step 2.

 

Step 1: 
data work.cleandata36; set cert.input36; group=upcase(group); if upcase(group) in ('A','B'); run; Step 2:
proc means data=work.cleandata36 median; class group; var kilograms; run; Step 3:
data results.output36; set cleandata36; if Kilograms < 40 OR Kilograms > 200 then do; if group='A' then kilograms=79; else kilograms=89; end; run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

Elaborating on my earlier answer, here's a program that does I believe everything that they're asking for.  It uses the calculated means in the final step where values out of range are replaced with the median values.  My data has a median of 79 for Group A and 89 for Group B.

 

As a suggestion:  You might try running my program and playing with it a bit in order to understand it better.

 

See also the comments in the code, such as they are.

 

Jim

 

LIBNAME	Cert	'X:\sastemp';
LIBNAME	Results	'X:\sastemp';

**	Step 0	- Load data	**;
data Cert.INPUT36;
	LENGTH	Group	$1;
	
  infile	datalines4	dsd	dlm='09'x;
  input		Group	$
  			Kilograms	
			;
  			
DATALINES4;
A	24
A	54
A	60
A	44
A	84
A	42
A	52
A	72
A	74
A	36
A	38
a	104
a	190
a	180
a	120
a	100
a	128
a	130
a	132
a	134
b	24
b	44
b	60
b	54
b	94
b	52
b	62
b	92
b	84
b	86
b	88
b	84
b	170
b	180
b	120
b	110
b	138
b	140
b	142
b	144
b	52
b	62
b	92
b	90
c	999
d	888
e	777
f	666
;;;;
run;

**	Step 1	- Clean the data	**;
data work.cleandata36;
  set cert.input36;
  group=upcase(group);
  if upcase(group) in ('A','B');
run;

**	Step 2	- Calculate the median save it in a data set named Summary	**;
*ODS TRACE	ON;
ODS	OUTPUT	Summary=Summary;
proc means data=work.cleandata36 median;
	class group;
	var kilograms;
run;
*ODS TRACE	OFF;

**	Step 3	- Populate macro variables containing the calculated median for each group	**;
DATA	_NULL_;
	set		WORK.Summary;
	CALL	SYMPUTX(CATS('Median_', Group), ROUND(Kilograms_Median), 'G');
RUN;

**	Step 4	- Substitute the median values when the actual value is out of range	**;
data results.output36;
	set cleandata36;
	if Kilograms < 40 OR Kilograms > 200 then 
		do;
			if group='A' then 
				kilograms=&Median_A;
			else 
				kilograms=&Median_B; 
		end;
run;

View solution in original post

5 REPLIES 5
jimbarbour
Meteorite | Level 14

I don't have access to the certification data (at least not that I know of).  What is the median value for Group A?  What is the median value for Group B?  I assume that those values are 79 and 89, respectively.  If one sets a missing value to the median value, then one does not shift a curve one way or another, at least I think that's the idea behind this question.  Using the median value is an attempt to avoid skewing the data.

 

Jim

jimbarbour
Meteorite | Level 14

Elaborating on my earlier answer, here's a program that does I believe everything that they're asking for.  It uses the calculated means in the final step where values out of range are replaced with the median values.  My data has a median of 79 for Group A and 89 for Group B.

 

As a suggestion:  You might try running my program and playing with it a bit in order to understand it better.

 

See also the comments in the code, such as they are.

 

Jim

 

LIBNAME	Cert	'X:\sastemp';
LIBNAME	Results	'X:\sastemp';

**	Step 0	- Load data	**;
data Cert.INPUT36;
	LENGTH	Group	$1;
	
  infile	datalines4	dsd	dlm='09'x;
  input		Group	$
  			Kilograms	
			;
  			
DATALINES4;
A	24
A	54
A	60
A	44
A	84
A	42
A	52
A	72
A	74
A	36
A	38
a	104
a	190
a	180
a	120
a	100
a	128
a	130
a	132
a	134
b	24
b	44
b	60
b	54
b	94
b	52
b	62
b	92
b	84
b	86
b	88
b	84
b	170
b	180
b	120
b	110
b	138
b	140
b	142
b	144
b	52
b	62
b	92
b	90
c	999
d	888
e	777
f	666
;;;;
run;

**	Step 1	- Clean the data	**;
data work.cleandata36;
  set cert.input36;
  group=upcase(group);
  if upcase(group) in ('A','B');
run;

**	Step 2	- Calculate the median save it in a data set named Summary	**;
*ODS TRACE	ON;
ODS	OUTPUT	Summary=Summary;
proc means data=work.cleandata36 median;
	class group;
	var kilograms;
run;
*ODS TRACE	OFF;

**	Step 3	- Populate macro variables containing the calculated median for each group	**;
DATA	_NULL_;
	set		WORK.Summary;
	CALL	SYMPUTX(CATS('Median_', Group), ROUND(Kilograms_Median), 'G');
RUN;

**	Step 4	- Substitute the median values when the actual value is out of range	**;
data results.output36;
	set cleandata36;
	if Kilograms < 40 OR Kilograms > 200 then 
		do;
			if group='A' then 
				kilograms=&Median_A;
			else 
				kilograms=&Median_B; 
		end;
run;
JC411911
Obsidian | Level 7

Thank you very much for your help this definitely helps me understand it better!

Tom
Super User Tom
Super User

I have no idea where 79 or 89 came from, but I assume they are the expected answer to step2.

 

Step 3 talks about replacing the value when the current value is either missing or not "valid".

The wording of the last two parts of step 3 is confusing.  The middle line make it look like you were requested to eliminate the observations with such invalid values.  If you did that then there would be any need to replace any values.  So in that case you should first writes some words describing how you have translated their poorly worded question into something that makes sense and then provide the answer to that.  

 

The test : (Kilograms < 40 OR Kilograms > 200 ) will includes values that are too large and values that are too small.  Since SAS treats missing values as smaller than any actual value it will also include the missing values.

So you get to the nested IF/THEN/ELSE statements only when you want to replace the value of KILOGRAMS with the group means.  That nested IF/THEN/ELSE group is only testing for GROUP='A' but because in STEP 1 you eliminated any values of GROUP that are not A or B then the ELSE must be those where GROUP='B'.

 

JC411911
Obsidian | Level 7

Yes the wording definitely threw me for a loop. After looking further you are correct the 79 and 89 are the median but it didnt click very easily at first when I read what was being asked. Thank you!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 2200 views
  • 10 likes
  • 3 in conversation