Solved: Re: Convert sub total / summary rows to a variable

Cruise · Posted 05-02-2020 02:17 PM

Hi Folks:

Could you please help me convert summary rows under municipality column (Current structure) to a variable shown as 'district' (Desired output) in the image below? I need district names as 'ID1NAME' variable and city names as 'IDNAME' variable for the ensuing data linkages.

data have;
input municipality : $9. count pop rate;
cards;
district1 1000 10000 10
city1 300 2500 12
city2 200 6000 3.3
city3 500 1500 33.3
district2 100 500 20
city4 50 200 25
city5 50 300 16.7
;

ed_sas_member · Posted 05-02-2020 02:39 PM

Hi @Cruise

You can do this, using the retain statement:

data want;
	length ID1NAME $ 20;
	set have;
	retain ID1NAME;
	if find(municipality,"district")>0 then do;
		ID1NAME=municipality;
		delete;
	end;
run;

View solution in original post

ed_sas_member · Posted 05-02-2020 02:39 PM

Hi @Cruise

You can do this, using the retain statement:

data want;
	length ID1NAME $ 20;
	set have;
	retain ID1NAME;
	if find(municipality,"district")>0 then do;
		ID1NAME=municipality;
		delete;
	end;
run;

Cruise · Posted 05-02-2020 03:22 PM

Thank you @ed_sas_member . Sorry for the ambiguity. The values that variable municipality takes don't share prefix: district but different district names. There are 9 districts that I could specify for the 'find' function. But some cities takes same names as their districts. Is there any solution to the update problem?

data have;
input municipality : $9. count pop rate;
cards;
Daejeon 1000 10000 10
A_city 300 2500 12
Daejeon 200 6000 3.3
B_city 500 1500 33.3
Busan 100 500 20
C_city 50 200 25
Busan 50 300 16.7
;

Kurt_Bremser · Posted 05-02-2020 05:02 PM

Given your structure, there is nothing to discern a district observation from a city observation as soon as they have the same name.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Cruise · Posted 05-02-2020 05:42 PM

True, however, District gets the sum value of cities combined, or mean rates of cities combined. Can this logic be used here?

Kurt_Bremser · Posted 05-04-2020 10:01 AM

If the line containing the district will always be the first for such a group, you can do this, using 2 lookup hash tables:

data have;
input municipality : $9. count pop rate;
cards;
Daejeon 1000 10000 10
A_city 300 2500 12
Daejeon 200 6000 3.3
B_city 500 1500 33.3
Busan 100 500 20
C_city 50 200 25
Busan 50 300 16.7
;

data districts;
input municipality :$9.;
datalines;
Daejeon
Busan
;

data want;
set have;
if _n_ = 1
then do;
  declare hash di (dataset:"districts");
  di.definekey("municipality");
  di.definedone();
  declare hash df ();
  df.definekey("municipality");
  df.definedone();
end;
retain district;
if di.check() = 0 and df.check() ne 0
then do;
  district = municipality;
  df.add();
  delete;
end;
run;

di holds the reference which names can appear as districts, and df holds the reference if a certain name was already used for the district.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX