BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MegB
Calcite | Level 5

I have a data file containing educational test results for students in year levels 3, 5, 7, and 9. Each year level also has a value for "assessed year", based on the calendar year in which the student sat the test (values for calendar year vary by id). However I have many missing values for "assessed year". For each id, each value of assessed year should be 2 years apart. This is what the data should look like:

 

IDYearLevelAssessedYear
132007
152009
172011
192013
232009
252011
272013
292015
332011
352013
372015
392017
432003
452005
472007
492009
532003
552005
572007
592009

 

But this is what I have:

IDYearLevelAssessedYear
13.
152009
17.
192013
232009
252011
272013
29.
332011
35.
37.
39.
43.
452005
472007
492009
53.
55.
57.
592009

 

I want to fill in the missing cells by calculating +/- 2 years from the non-missing values of assessed year.

 

Any suggestions will be greatly appreciated, this has been keeping me occupied for some time now!

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Use the fact that AssessedYear - YearLevel is a constant for a given ID :

 

data have;
input ID  YearLevel   AssessedYear;
datalines;
1   3   .
1   5   2009
1   7   .
1   9   2013
2   3   2009
2   5   2011
2   7   2013
2   9   .
3   3   2011
3   5   .
3   7   .
3   9   .
4   3   .
4   5   2005
4   7   2007
4   9   2009
5   3   .
5   5   .
5   7   .
5   9   2009
;

data want;
do until(last.ID); 
    set have; by ID;
    if not missing(AssessedYear) then offset = AssessedYear - YearLevel;
    end;
do until(last.ID); 
    set have; by ID;
    AssessedYear = coalesce(AssessedYear, sum(YearLevel, offSet));
    output;
    end;
drop offset;
run;
PG

View solution in original post

3 REPLIES 3
PGStats
Opal | Level 21

Use the fact that AssessedYear - YearLevel is a constant for a given ID :

 

data have;
input ID  YearLevel   AssessedYear;
datalines;
1   3   .
1   5   2009
1   7   .
1   9   2013
2   3   2009
2   5   2011
2   7   2013
2   9   .
3   3   2011
3   5   .
3   7   .
3   9   .
4   3   .
4   5   2005
4   7   2007
4   9   2009
5   3   .
5   5   .
5   7   .
5   9   2009
;

data want;
do until(last.ID); 
    set have; by ID;
    if not missing(AssessedYear) then offset = AssessedYear - YearLevel;
    end;
do until(last.ID); 
    set have; by ID;
    AssessedYear = coalesce(AssessedYear, sum(YearLevel, offSet));
    output;
    end;
drop offset;
run;
PG
MegB
Calcite | Level 5

Thank you!!! 

 

That works perfectly and has saved me more hours of hair pulling. Amazing 🙂

jimbarbour
Meteorite | Level 14

I guess I should have refreshed my screen before answering; @PGStats has already solved this.  (nice solution)

 

Since I've already coded it and since it can be helpful to see different ways of doing things, here's another solution:

DATA	WORK.Assessments
		WORK.Populated_Years	(RENAME=(Year_Level=Populated_Level Assessed_Year=Populated_Year))
		;

	INFILE	DATALINES4
		DSD	DLM		=	'09'X
		FIRSTOBS	=	2
		MISSOVER;

	INPUT	ID				$
			Year_Level		$
			Assessed_Year	:	4.
			;

	OUTPUT	WORK.Assessments;

	IF	NOT	MISSING(Assessed_Year)	THEN
		OUTPUT	WORK.Populated_Years;
	ELSE
		DELETE;

DATALINES4;
ID	Year Level	Assessed_Year
1	3	.
1	5	2009
1	7	.
1	9	2013
2	3	2009
2	5	2011
2	7	2013
2	9	.
3	3	2011
3	5	.
3	7	.
3	9	.
4	3	.
4	5	2005
4	7	2007
4	9	2009
5	3	.
5	5	.
5	7	.
5	9	2009
;;;;
RUN;

DATA	Fully_Populated_Assessments;
	DROP	_:	Populated_:;

	IF	0											THEN
		DO;
			SET	WORK.Populated_Years;
		END;

	IF	_N_							=	1			THEN
		DO;
			DECLARE	HASH	Hsh_Pop_Years				(DATASET: 'WORK.Populated_Years');
							Hsh_Pop_Years.DefineKey 	('ID');
							Hsh_Pop_Years.DefineData	('Populated_Level', 'Populated_Year');
							Hsh_Pop_Years.DefineDone	();
		END;

	SET	WORK.ASSESSMENTS;

	IF	MISSING(Assessed_Year)						THEN
		DO;
			_RC						=	Hsh_Pop_Years.FIND();
			IF	_RC					=	0			THEN
				DO;
					Assessed_Year	=	Populated_Year	+	(Year_Level		-	Populated_Level);
				END;	
		END;	
RUN;

Jim

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 851 views
  • 4 likes
  • 3 in conversation