BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jcis7
Pyrite | Level 9

G'day.

Appreciate any help you can give.

 

I've a file named fileA that has a unique identifying variable facility_code, no duplicates and it has variableX that I need to populate in fileB for certain years.

 

I've a second file named fileB (combined years of data) with the  facility_code listed once for each year so since the file which has 7 years worth of data, if the facility reported information each year, it will  be listed 7 times.  It may not have reported each of the years, only some.

fileB is missing data for variable X in years 2020-21 through 2023-24.

 

I have a variable in fileA named variableX I want to populate in fileB for only 4 of the seven school years (2020-21, 2021-22, 2022-23, 2023-24).

 

How do i update fileB, specifically variable X only for the four school years with the information  in FileA matched on facility_code for variable X only for specific school years? 

I only know how to update the file if it is a one to one relationship in each file, for example if the facilty_code is listed only once in fileZ, code below, I know how to update it with the information in fileA (2nd SAS code window below).

 

thanks!

DATA fileA;
 INPUT facility_code $ facility_type$ physical_zip$ variableX schoolyr$;
CARDS;
1234567 public 14524 Yes 2019-20
1234568 public 23561 No 2019-20
1234569 public 84972 No 2019-20
0234567 public 94801 Yes 2019-20
0234568 public 89024 No 2019-20 
0234569 public 28201 No 2019-20

;
RUN; 


DATA fileB;
 INPUT facility_code $ status$ facility_type$ physical_zip$ variableX schoolyr$;
CARDS;
1234567 active public 14524 Yes 2017-18
1234567 closed public 14524 Yes 2018-19
1234567 active public 14524 No  2017-18
1234567 closed public 14524 . 2021-22
1234568 active public 23561 Yes 2017-18
1234568 active public 23561 Yes 2018-19
1234568 active public 23561 Yes 2019-20
1234568 active public 23561 . 2020-21
1234568 active public 23561 . 2021-22
1234568 active public 23561 . 2022-23
1234568 active public 23561 . 2023-24
1234568 active public 23561 . 2018-19
1234569 closed public 84972 . 2020-21
1234569 closed public 84972 . 2021-22
1234569 closed public 84972 . 2022-23
1234569 closed public 84972 . 2023-24
0234567 active public 94801 . 2021-22
0234568 active public 89024 . 2022-23
0234569 active public 28201 . 2023-24

;
RUN; 



DATA want;
 INPUT facility_code $ status$ facility_type$ physical_zip$ variableX $ schoolyr$;
CARDS;
1234567 active public 14524 Yes 2017-18
1234567 closed public 14524 Yes 2018-19
1234567 active public 14524 No 2017-18
1234567 closed public 14524 Yes 2021-22
1234568 active public 23561 Yes 2017-18
1234568 active public 23561 Yes 2018-19
1234568 active public 23561 No 2019-20
1234568 active public 23561 No 2020-21
1234568 active public 23561 No 2021-22
1234568 active public 23561 No 2022-23
1234568 active public 23561 No 2023-24
1234568 active public 23561 No 2018-19
1234569 closed public 84972 No 2020-21
1234569 closed public 84972 No 2021-22
1234569 closed public 84972 No 2022-23
1234569 closed public 84972 No 2023-24
0234567 active public 94801 Yes 2021-22
0234568 active public 89024 No 2022-23
0234569 active public 28201 No 2023-24

;
RUN; 

 

data transaction;
  set fileA (keep=facility_code variableX);
run;

*--sort transaction;
proc sort data=transaction;
  by facility_code;
run;

*--sort list to be updated;
proc sort data=fileZ;
  by facility_code;
run;

*-update;
data updatedZ;
   update fileZ transaction updatemode=missingcheck;   
   by facility_code;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@jcis7 wrote:

Thanks for pointing out I forgot to make variableX character.

I need to update variableX only for years 2020-21 through 2023-24 so wondering how I do that.  Thanks.


Having sneakily included an IF statement we just add to the conditions:

data want;
   merge fileb 
         filea (keep= facility_code  variablex rename=(variablex=testx) )
   ;
   by facility_code ;
   if missing(variablex) and schoolyr in('2020-21' '2021-22' '2022-23' '2023-24') then variablex=testx;
   drop testx;
run;

Suggestion: You may want to save yourself some headaches by using a numeric year value and use a custom format that will display the 2020-21 values as needed. Long lists of school years such as in the IN comparison above could be reduced to schoolyr in (2020:2023) or similar range of integer values.

 

Here is an example of creating a data set that will format school years starting in 1950 through 2150 in the 1950-51 appearance.

data syrcntlin;
   fmtname='SchoolYr';
   type='N';
   do start=1950 to 2150;
      label=catx('-',put(start,f4.),put(mod(start+1,100),z2.));
      output;
   end;
run;

proc format library=work cntlin=syrcntlin;
run;

Just add: Format <year variable name> schoolyr. ; to any place you want the year to appear with that 2020-21 layout.

I've worked with school year data for nearly 20 years and this approach has saved a lot of headaches in that time.

 

 

View solution in original post

3 REPLIES 3
ballardw
Super User

Your first data step throws invalid data for every observation for variableX as you have not told SAS to read it as character.

Your second data step reads variablex as numeric.

You cannot update a numeric variable with a character value of "Yes" or "No".

So what are we missing? Is Variablex supposed to be numeric or character?

 

The UPDATE data step statement requires the target set to have only one value.

You are close. MERGE will handle this if only one set has duplicate by variables:

data want;
   merge fileb 
         filea (keep= facility_code  variablex rename=(variablex=testx) )
   ;
   by facility_code ;
   if missing(variablex) then variablex=testx;
   drop testx;
run;

You didn't mention what the behavior for the variables facility_type physical_zip or school_yr should be from FileA so I dropped them. The code i have is set up to handle a FileB where some of the values of VariableX are not missing. Because the MERGE be default would replace all of them I renamed the variable in data set options to something else and then only use the renamed value to replace missing variablex when encountered.

 

Note that the MISSING function will work regardless of variable type so no need to modify after you decide whether VariableX is character or numeric.

 

jcis7
Pyrite | Level 9

Thanks for pointing out I forgot to make variableX character.

I need to update variableX only for years 2020-21 through 2023-24 so wondering how I do that.  Thanks.

ballardw
Super User

@jcis7 wrote:

Thanks for pointing out I forgot to make variableX character.

I need to update variableX only for years 2020-21 through 2023-24 so wondering how I do that.  Thanks.


Having sneakily included an IF statement we just add to the conditions:

data want;
   merge fileb 
         filea (keep= facility_code  variablex rename=(variablex=testx) )
   ;
   by facility_code ;
   if missing(variablex) and schoolyr in('2020-21' '2021-22' '2022-23' '2023-24') then variablex=testx;
   drop testx;
run;

Suggestion: You may want to save yourself some headaches by using a numeric year value and use a custom format that will display the 2020-21 values as needed. Long lists of school years such as in the IN comparison above could be reduced to schoolyr in (2020:2023) or similar range of integer values.

 

Here is an example of creating a data set that will format school years starting in 1950 through 2150 in the 1950-51 appearance.

data syrcntlin;
   fmtname='SchoolYr';
   type='N';
   do start=1950 to 2150;
      label=catx('-',put(start,f4.),put(mod(start+1,100),z2.));
      output;
   end;
run;

proc format library=work cntlin=syrcntlin;
run;

Just add: Format <year variable name> schoolyr. ; to any place you want the year to appear with that 2020-21 layout.

I've worked with school year data for nearly 20 years and this approach has saved a lot of headaches in that time.

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 350 views
  • 1 like
  • 2 in conversation