I need to find the most frequent languages and number of people speaking those languages in each Zip Code. Have over 2000 Zips. Here is dummy data to illustrate my needs.
Needing to go from Table A:
Language | 90001 | 90002 | 90003 |
English | 35010.55 | 362 | 1201.05 |
French | 1250.65 | 7805.23 | . |
Spanish | 2951 | . | 4835.702 |
Chinese | 130.25 | 520103.07 | . |
Japanese | 762.79 | 6945.33 | 1234.693 |
Russian | 501.86 | 7485.45 | 9210.252 |
To Table B
ZIP | Language_name_1 | Language_N_1 | Language_name_2 | Language_N_2 | Language_name_3 | Language_N_3 | Language_name_4 | Language_N_4 | Language_name_5 | Language_N_5 | Language_name_6 | Language_N_6 |
90001 | English | 35010.55 | Spanish | 2951 | French | 1250.65 | Japanese | 762.79 | Russian | 501.86 | Chinese | 130.25 |
90002 | Chinese | 520103.1 | French | 7805.23 | Russian | 7485.45 | Japanese | 6945.33 | English | 362 | Spanish | . |
90003 | Russian | 9210.252 | Spanish | 4835.702 | Japanese | 1234.693 | English | 1201.05 | Chinese | . | French | . |
The following code proposes a different layout for the result. Assuming you want the result in an Excel I'd consider such a layout as easier to work with because you can easily filter by rank to determine which language within a postcode is spoken the most, 2nd and so on frequently.
/* create a source .csv file in SAS WORK */
filename src_csv "%sysfunc(pathname(work))/source_data.csv";
data _null_;
file src_csv;
infile datalines truncover;
input;
put _infile_;
datalines;
Language,90001,90002,90003
English,35010.55,362,1201.05
French,1250.65,7805.23,.
Spanish,2951,.,4835.702
Chinese,130.25,520103.07,.
Japanese,762.79,6945.33,1234.693
Russian,501.86,7485.45,9210.252
;
run;
/* import .csv into a SAS table */
%let sv_validvarname=%sysfunc(getoption(validvarname,keyword));
options validvarname=any;
proc import
datafile="%sysfunc(pathname(work))/source_data.csv"
dbms=csv
out=work.have
replace;
run;
options &sv_validvarname;
/* transpose to long data structure */
proc transpose data=work.have out=work.have_long(rename=(_name_=zip_code col1=n_people)) ;
by Language notsorted;
run;
proc sort data=work.have_long out=work.have_long_sorted;
by zip_code n_people Language;
run;
proc rank data=work.have_long_sorted out=work.want_ranked ties=dense descending;
by zip_code;
ranks rank;
var n_people;
run;
proc sort data=work.want_ranked;
by zip_code rank;
run;
proc print data=work.want_ranked;
run;
It would also not be hard to exclude the rows with missings or alternatively set the values to zero and/or ensure that for each zip-code all languages get listed.
The following code proposes a different layout for the result. Assuming you want the result in an Excel I'd consider such a layout as easier to work with because you can easily filter by rank to determine which language within a postcode is spoken the most, 2nd and so on frequently.
/* create a source .csv file in SAS WORK */
filename src_csv "%sysfunc(pathname(work))/source_data.csv";
data _null_;
file src_csv;
infile datalines truncover;
input;
put _infile_;
datalines;
Language,90001,90002,90003
English,35010.55,362,1201.05
French,1250.65,7805.23,.
Spanish,2951,.,4835.702
Chinese,130.25,520103.07,.
Japanese,762.79,6945.33,1234.693
Russian,501.86,7485.45,9210.252
;
run;
/* import .csv into a SAS table */
%let sv_validvarname=%sysfunc(getoption(validvarname,keyword));
options validvarname=any;
proc import
datafile="%sysfunc(pathname(work))/source_data.csv"
dbms=csv
out=work.have
replace;
run;
options &sv_validvarname;
/* transpose to long data structure */
proc transpose data=work.have out=work.have_long(rename=(_name_=zip_code col1=n_people)) ;
by Language notsorted;
run;
proc sort data=work.have_long out=work.have_long_sorted;
by zip_code n_people Language;
run;
proc rank data=work.have_long_sorted out=work.want_ranked ties=dense descending;
by zip_code;
ranks rank;
var n_people;
run;
proc sort data=work.want_ranked;
by zip_code rank;
run;
proc print data=work.want_ranked;
run;
It would also not be hard to exclude the rows with missings or alternatively set the values to zero and/or ensure that for each zip-code all languages get listed.
This gets me where I need to be. Thank you!
Using Patrick 's dataset .
/* create a source .csv file in SAS WORK */
filename src_csv "%sysfunc(pathname(work))/source_data.csv";
data _null_;
file src_csv;
infile datalines truncover;
input;
put _infile_;
datalines;
Language,90001,90002,90003
English,35010.55,362,1201.05
French,1250.65,7805.23,.
Spanish,2951,.,4835.702
Chinese,130.25,520103.07,.
Japanese,762.79,6945.33,1234.693
Russian,501.86,7485.45,9210.252
;
run;
/* import .csv into a SAS table */
options validvarname=any;
proc import
datafile="%sysfunc(pathname(work))/source_data.csv"
dbms=csv
out=work.have
replace;
run;
/* transpose to long data structure */
proc transpose data=work.have out=work.have_long(rename=(_name_=zip_code col1=n_people)) ;
by Language notsorted;
run;
proc sort data=work.have_long out=work.have_long_sorted;
by zip_code descending n_people ;
run;
data x;
set have_long_sorted;
by zip_code;
if first.zip_code then n=0;
n+1;
run;
proc sql noprint;
select distinct catt('x(where=(n=',n,') rename=(language=language_name_',n,' n_people=language_n_',n,'))')
into :merge separated by ' '
from x;
quit;
data want;
if 0 then set x(keep=zip_code);
merge &merge;
by zip_code;
drop n;
run;
Calling all data scientists and open-source enthusiasts! Want to solve real problems that impact your company or the world? Register to hack by August 31st!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.