11-21-2016 10:27 PM
My following PROC SQL sort the common variable I joined the table with. Is there a code for lookup table where the common variable (Country_Code in this case) does not get sorted?
My expected output:
Emp_Name Country Code
DATA Lookup_CountryCode; INPUT Country_Name $ Country_Code Region $; DATALINES; Japan 107 Asia USA 109 America France 103 Europe Canada 111 America India 105 Asia Nepal 115 Asia Italy 102 Europe ; RUN; DATA Employee; INPUT Emp_Name $ Country_Code; DATALINES; David 107 Joy 103 Murad 105 Paul 102 Vick 109 Maya 103 ; RUN; PROC SQL; CREATE TABLE want AS SELECT a.Emp_Name, b.Country_Name FROM Employee a LEFT JOIN Lookup_CountryCode b on a.Country_Code=b.Country_Code; QUIT;
11-21-2016 10:36 PM
You can't control row order with SQL unless you use an order by clause.
You must use a data step, or add a sorting variable.
11-21-2016 10:48 PM
I tried this and it worked:
PROC SQL; CREATE TABLE want AS SELECT a.Emp_Name, ( select Country_Name from Lookup_CountryCode where Country_Code = a.Country_Code ) as country_name FROM Employee as a; select * from want; QUIT;
11-21-2016 11:20 PM
11-21-2016 11:51 PM
Yeah. I can imagine (but not test) that multithreading would disturb output order. Anyway, output order is never guaranteed unless there is an ORDER BY clause.
11-26-2016 11:20 PM
Multithreading is when SAS splits a task among many processors that run in parallel. I don't know exactly why your code reordered the data table and mine didn't. To my knowledge, this isn't documented anywhere and might change in a future SAS version. I agree 100% with @LinusH's comment below.
11-21-2016 10:51 PM
You could also create formats and then apply them to your dataset. You can apply your formats and not modify your dataset unless you need both, code and name.
11-22-2016 12:46 AM
By a data step it can be achieved.
Hold the Country_Name in a temporary array using Country_Code as its Index. Then read a record by record from Lookup_Country data set, load the array. Then lookup the array using Country_Code from Employee. If Country_Code is found in Lookup_CountryCode data set then the coreesponding cell will have Country_Name, otherwise a missing value. Thus the order of rows of Employee is undisturbed.
data need; array k[102:115] $8 _temporary_; length Emp_Name $8; do until(eof); set Lookup_CountryCode(drop = Region) end = eof; k[Country_Code] = Country_Name; end; do until(last); set Employee end = last; if not missing (k[Country_Code]) then do; Country_Name = k[Country_Code]; output; end; end; drop Country_Code; run;
11-22-2016 03:29 AM
This is how you do it with a format:
data cntlin; set lookup_countrycode; drop region; fmtname = 'country_lookup'; type = 'N'; rename country_code=start country_name=label ; run; proc format library=work cntlin=cntlin; run; data want; set employee; country_name = put(country_code,country_lookup.); run;
11-22-2016 05:32 AM
Original sort order is not a sustainable concept when transforming data. if your original sort order is important, you need a columns/variable that gives you the appropriate sort order (which you could use to sort the result after the processing, like in an ORDER BY clause in SQL).