BookmarkSubscribeRSS Feed
jcis7
Pyrite | Level 9

G'day.

Appreciate any help you can give.

 

I've a file named fileA that has a unique identifying variable facility_code, no duplicates and it has variableX that I need to populate in fileB for certain years.

 

I've a second file named fileB (combined years of data) with the  facility_code listed once for each year so since the file which has 7 years worth of data, if the facility reported information each year, it will  be listed 7 times.  It may not have reported each of the years, only some.

fileB is missing data for variable X in years 2020-21 through 2023-24.

 

I have a variable in fileA named variableX I want to populate in fileB for only 4 of the seven school years (2020-21, 2021-22, 2022-23, 2023-24).

 

How do i update fileB, specifically variable X only for the four school years with the information  in FileA matched on facility_code for variable X only for specific school years? 

I only know how to update the file if it is a one to one relationship in each file, for example if the facilty_code is listed only once in fileZ, code below, I know how to update it with the information in fileA (2nd SAS code window below).

 

thanks!

DATA fileA;
 INPUT facility_code $ facility_type$ physical_zip$ variableX schoolyr$;
CARDS;
1234567 public 14524 Yes 2019-20
1234568 public 23561 No 2019-20
1234569 public 84972 No 2019-20
0234567 public 94801 Yes 2019-20
0234568 public 89024 No 2019-20 
0234569 public 28201 No 2019-20

;
RUN; 


DATA fileB;
 INPUT facility_code $ status$ facility_type$ physical_zip$ variableX schoolyr$;
CARDS;
1234567 active public 14524 Yes 2017-18
1234567 closed public 14524 Yes 2018-19
1234567 active public 14524 No  2017-18
1234567 closed public 14524 . 2021-22
1234568 active public 23561 Yes 2017-18
1234568 active public 23561 Yes 2018-19
1234568 active public 23561 Yes 2019-20
1234568 active public 23561 . 2020-21
1234568 active public 23561 . 2021-22
1234568 active public 23561 . 2022-23
1234568 active public 23561 . 2023-24
1234568 active public 23561 . 2018-19
1234569 closed public 84972 . 2020-21
1234569 closed public 84972 . 2021-22
1234569 closed public 84972 . 2022-23
1234569 closed public 84972 . 2023-24
0234567 active public 94801 . 2021-22
0234568 active public 89024 . 2022-23
0234569 active public 28201 . 2023-24

;
RUN; 



DATA want;
 INPUT facility_code $ status$ facility_type$ physical_zip$ variableX $ schoolyr$;
CARDS;
1234567 active public 14524 Yes 2017-18
1234567 closed public 14524 Yes 2018-19
1234567 active public 14524 No 2017-18
1234567 closed public 14524 Yes 2021-22
1234568 active public 23561 Yes 2017-18
1234568 active public 23561 Yes 2018-19
1234568 active public 23561 No 2019-20
1234568 active public 23561 No 2020-21
1234568 active public 23561 No 2021-22
1234568 active public 23561 No 2022-23
1234568 active public 23561 No 2023-24
1234568 active public 23561 No 2018-19
1234569 closed public 84972 No 2020-21
1234569 closed public 84972 No 2021-22
1234569 closed public 84972 No 2022-23
1234569 closed public 84972 No 2023-24
0234567 active public 94801 Yes 2021-22
0234568 active public 89024 No 2022-23
0234569 active public 28201 No 2023-24

;
RUN; 

 

data transaction;
  set fileA (keep=facility_code variableX);
run;

*--sort transaction;
proc sort data=transaction;
  by facility_code;
run;

*--sort list to be updated;
proc sort data=fileZ;
  by facility_code;
run;

*-update;
data updatedZ;
   update fileZ transaction updatemode=missingcheck;   
   by facility_code;
run;

 

2 REPLIES 2
ballardw
Super User

Your first data step throws invalid data for every observation for variableX as you have not told SAS to read it as character.

Your second data step reads variablex as numeric.

You cannot update a numeric variable with a character value of "Yes" or "No".

So what are we missing? Is Variablex supposed to be numeric or character?

 

The UPDATE data step statement requires the target set to have only one value.

You are close. MERGE will handle this if only one set has duplicate by variables:

data want;
   merge fileb 
         filea (keep= facility_code  variablex rename=(variablex=testx) )
   ;
   by facility_code ;
   if missing(variablex) then variablex=testx;
   drop testx;
run;

You didn't mention what the behavior for the variables facility_type physical_zip or school_yr should be from FileA so I dropped them. The code i have is set up to handle a FileB where some of the values of VariableX are not missing. Because the MERGE be default would replace all of them I renamed the variable in data set options to something else and then only use the renamed value to replace missing variablex when encountered.

 

Note that the MISSING function will work regardless of variable type so no need to modify after you decide whether VariableX is character or numeric.

 

jcis7
Pyrite | Level 9

Thanks for pointing out I forgot to make variableX character.

I need to update variableX only for years 2020-21 through 2023-24 so wondering how I do that.  Thanks.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 91 views
  • 0 likes
  • 2 in conversation