BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi Folks: 

 

Could you please help me convert summary rows under municipality column (Current structure) to a variable shown as 'district' (Desired output) in the image below? I need district names as 'ID1NAME' variable and city names as 'IDNAME' variable for the ensuing data linkages.    

 

table wanted.png

data have;
input municipality : $9. count pop rate;
cards;
district1 1000 10000 10
city1 300 2500 12
city2 200 6000 3.3
city3 500 1500 33.3
district2 100 500 20
city4 50 200 25
city5 50 300 16.7
;
1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Cruise 

You can do this, using the retain statement:

data want;
	length ID1NAME $ 20;
	set have;
	retain ID1NAME;
	if find(municipality,"district")>0 then do;
		ID1NAME=municipality;
		delete;
	end;
run;

Capture d’écran 2020-05-02 à 20.39.05.png

View solution in original post

5 REPLIES 5
ed_sas_member
Meteorite | Level 14

Hi @Cruise 

You can do this, using the retain statement:

data want;
	length ID1NAME $ 20;
	set have;
	retain ID1NAME;
	if find(municipality,"district")>0 then do;
		ID1NAME=municipality;
		delete;
	end;
run;

Capture d’écran 2020-05-02 à 20.39.05.png

Cruise
Ammonite | Level 13

Thank you @ed_sas_member . Sorry for the ambiguity. The values that variable municipality takes don't share prefix: district but different district names. There are 9 districts that I could specify for the 'find' function. But some cities takes same names as their districts. Is there any solution to the update problem? 

data have;
input municipality : $9. count pop rate;
cards;
Daejeon 1000 10000 10
A_city 300 2500 12
Daejeon 200 6000 3.3
B_city 500 1500 33.3
Busan 100 500 20
C_city 50 200 25
Busan 50 300 16.7
;

 

Cruise
Ammonite | Level 13
True, however, District gets the sum value of cities combined, or mean rates of cities combined. Can this logic be used here?
Kurt_Bremser
Super User

If the line containing the district will always be the first for such a group, you can do this, using 2 lookup hash tables:

data have;
input municipality : $9. count pop rate;
cards;
Daejeon 1000 10000 10
A_city 300 2500 12
Daejeon 200 6000 3.3
B_city 500 1500 33.3
Busan 100 500 20
C_city 50 200 25
Busan 50 300 16.7
;

data districts;
input municipality :$9.;
datalines;
Daejeon
Busan
;

data want;
set have;
if _n_ = 1
then do;
  declare hash di (dataset:"districts");
  di.definekey("municipality");
  di.definedone();
  declare hash df ();
  df.definekey("municipality");
  df.definedone();
end;
retain district;
if di.check() = 0 and df.check() ne 0
then do;
  district = municipality;
  df.add();
  delete;
end;
run;

di holds the reference which names can appear as districts, and df holds the reference if a certain name was already used for the district.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1715 views
  • 2 likes
  • 3 in conversation