BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MegB
Calcite | Level 5

I have a data file containing educational test results for students in year levels 3, 5, 7, and 9. Each year level also has a value for "assessed year", based on the calendar year in which the student sat the test (values for calendar year vary by id). However I have many missing values for "assessed year". For each id, each value of assessed year should be 2 years apart. This is what the data should look like:

 

IDYearLevelAssessedYear
132007
152009
172011
192013
232009
252011
272013
292015
332011
352013
372015
392017
432003
452005
472007
492009
532003
552005
572007
592009

 

But this is what I have:

IDYearLevelAssessedYear
13.
152009
17.
192013
232009
252011
272013
29.
332011
35.
37.
39.
43.
452005
472007
492009
53.
55.
57.
592009

 

I want to fill in the missing cells by calculating +/- 2 years from the non-missing values of assessed year.

 

Any suggestions will be greatly appreciated, this has been keeping me occupied for some time now!

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Use the fact that AssessedYear - YearLevel is a constant for a given ID :

 

data have;
input ID  YearLevel   AssessedYear;
datalines;
1   3   .
1   5   2009
1   7   .
1   9   2013
2   3   2009
2   5   2011
2   7   2013
2   9   .
3   3   2011
3   5   .
3   7   .
3   9   .
4   3   .
4   5   2005
4   7   2007
4   9   2009
5   3   .
5   5   .
5   7   .
5   9   2009
;

data want;
do until(last.ID); 
    set have; by ID;
    if not missing(AssessedYear) then offset = AssessedYear - YearLevel;
    end;
do until(last.ID); 
    set have; by ID;
    AssessedYear = coalesce(AssessedYear, sum(YearLevel, offSet));
    output;
    end;
drop offset;
run;
PG

View solution in original post

3 REPLIES 3
PGStats
Opal | Level 21

Use the fact that AssessedYear - YearLevel is a constant for a given ID :

 

data have;
input ID  YearLevel   AssessedYear;
datalines;
1   3   .
1   5   2009
1   7   .
1   9   2013
2   3   2009
2   5   2011
2   7   2013
2   9   .
3   3   2011
3   5   .
3   7   .
3   9   .
4   3   .
4   5   2005
4   7   2007
4   9   2009
5   3   .
5   5   .
5   7   .
5   9   2009
;

data want;
do until(last.ID); 
    set have; by ID;
    if not missing(AssessedYear) then offset = AssessedYear - YearLevel;
    end;
do until(last.ID); 
    set have; by ID;
    AssessedYear = coalesce(AssessedYear, sum(YearLevel, offSet));
    output;
    end;
drop offset;
run;
PG
MegB
Calcite | Level 5

Thank you!!! 

 

That works perfectly and has saved me more hours of hair pulling. Amazing 🙂

jimbarbour
Meteorite | Level 14

I guess I should have refreshed my screen before answering; @PGStats has already solved this.  (nice solution)

 

Since I've already coded it and since it can be helpful to see different ways of doing things, here's another solution:

DATA	WORK.Assessments
		WORK.Populated_Years	(RENAME=(Year_Level=Populated_Level Assessed_Year=Populated_Year))
		;

	INFILE	DATALINES4
		DSD	DLM		=	'09'X
		FIRSTOBS	=	2
		MISSOVER;

	INPUT	ID				$
			Year_Level		$
			Assessed_Year	:	4.
			;

	OUTPUT	WORK.Assessments;

	IF	NOT	MISSING(Assessed_Year)	THEN
		OUTPUT	WORK.Populated_Years;
	ELSE
		DELETE;

DATALINES4;
ID	Year Level	Assessed_Year
1	3	.
1	5	2009
1	7	.
1	9	2013
2	3	2009
2	5	2011
2	7	2013
2	9	.
3	3	2011
3	5	.
3	7	.
3	9	.
4	3	.
4	5	2005
4	7	2007
4	9	2009
5	3	.
5	5	.
5	7	.
5	9	2009
;;;;
RUN;

DATA	Fully_Populated_Assessments;
	DROP	_:	Populated_:;

	IF	0											THEN
		DO;
			SET	WORK.Populated_Years;
		END;

	IF	_N_							=	1			THEN
		DO;
			DECLARE	HASH	Hsh_Pop_Years				(DATASET: 'WORK.Populated_Years');
							Hsh_Pop_Years.DefineKey 	('ID');
							Hsh_Pop_Years.DefineData	('Populated_Level', 'Populated_Year');
							Hsh_Pop_Years.DefineDone	();
		END;

	SET	WORK.ASSESSMENTS;

	IF	MISSING(Assessed_Year)						THEN
		DO;
			_RC						=	Hsh_Pop_Years.FIND();
			IF	_RC					=	0			THEN
				DO;
					Assessed_Year	=	Populated_Year	+	(Year_Level		-	Populated_Level);
				END;	
		END;	
RUN;

Jim

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1141 views
  • 4 likes
  • 3 in conversation