I am working on the Tidy Tuesday data for this week about seafood.
Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.
* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";
proc import out=production datafile=test1234 dbms=csv replace;
guessingrows = max;
getnames=no;
run;
* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
delete from production
where VAR1 = "Entity";
quit;
* Rename columns;
data production;
set production (rename=(Var1=Entity
Var2=Code
Var3=Year
Var4=Pelagic
Var5=Crustaceans
Var6=Cephalopods
Var7=Demersal
Var8=Freshwater
Var9=Molluscs
Var10=Other_Marine));
run;
* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
delete from production
where Code = "";
quit;
* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
set production;
where Entity ^= 'World' AND Year = '2013';
run;
* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production out=long_production;
by Entity-Year;
var Crustaceans-Other_Marine;
run;
* ERROR: CRUSTACEANS does not have a numeric suffix.;
proc transpose data=production out=long_production;
by Year;
var Crustaceans-Other_Marine;
run;
* ERROR: CRUSTACEANS does not have a numeric suffix.;
proc transpose data=production out=long_production;
by Code;
var Crustaceans-Other_Marine;
run;
Example of what I am trying to do, but this is from R.
There's one rule to follow to ensure these issues don't happen, in any language. Make sure your data is read in correctly.
Spending a few minutes ensuring your types and names are correct at the beginning saves you a lot of headaches in the long run,.
This can be simplified further - by naming the variables in the import directly instead of in a second step and including some of the filters in that step.
You could also likely read it directly into the last format from the CSV/TEXT file but I'll leave that as an exercise for you.
Here's my version of your code, some modifications:
* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";
data WORK.PRODUCTION ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile TEST1234 delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat VAR1 $57. ;
informat VAR2 $8. ;
informat VAR3 $4. ;
informat VAR4-VAR10 8. ;
format VAR1 $57. ;
format VAR2 $8. ;
format VAR3 $4. ;
format VAR4-VAR10 8. ;
input VAR1-VAR10;
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
* Rename columns;
data production2;
set production (rename=(Var1=Entity
Var2=Code
Var3=Year
Var4=Pelagic
Var5=Crustaceans
Var6=Cephalopods
Var7=Demersal
Var8=Freshwater
Var9=Molluscs
Var10=Other_Marine));
run;
*clean up and filter;
proc sql;
create table production3 as
select *
from production2
where ENTITY not in ('Entity', 'World') and not missing(Code)
having year=max(year);
quit;
* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production3 out=long_production (rename = (_name_ = Seafood col1=Production));
by Entity Year;
var Crustaceans--Other_Marine;
run;
@Indescribled wrote:
I am working on the Tidy Tuesday data for this week about seafood.
Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.
* Get data 1; filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv"; proc import out=production datafile=test1234 dbms=csv replace; guessingrows = max; getnames=no; run; * Delete row 1 because getnames=no put the column names in the first row; proc sql; delete from production where VAR1 = "Entity"; quit; * Rename columns; data production; set production (rename=(Var1=Entity Var2=Code Var3=Year Var4=Pelagic Var5=Crustaceans Var6=Cephalopods Var7=Demersal Var8=Freshwater Var9=Molluscs Var10=Other_Marine)); run; * Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where; proc sql; delete from production where Code = ""; quit; * Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year); data production; set production; where Entity ^= 'World' AND Year = '2013'; run;
* ISSUES START BELOW HERE; * Now the data is ready to pivot. Seems like proc transpose does this; proc transpose data=production out=long_production; by Entity-Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Code; var Crustaceans-Other_Marine; run;Example of what I am trying to do, but this is from R.
There's one rule to follow to ensure these issues don't happen, in any language. Make sure your data is read in correctly.
Spending a few minutes ensuring your types and names are correct at the beginning saves you a lot of headaches in the long run,.
This can be simplified further - by naming the variables in the import directly instead of in a second step and including some of the filters in that step.
You could also likely read it directly into the last format from the CSV/TEXT file but I'll leave that as an exercise for you.
Here's my version of your code, some modifications:
* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";
data WORK.PRODUCTION ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile TEST1234 delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat VAR1 $57. ;
informat VAR2 $8. ;
informat VAR3 $4. ;
informat VAR4-VAR10 8. ;
format VAR1 $57. ;
format VAR2 $8. ;
format VAR3 $4. ;
format VAR4-VAR10 8. ;
input VAR1-VAR10;
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
* Rename columns;
data production2;
set production (rename=(Var1=Entity
Var2=Code
Var3=Year
Var4=Pelagic
Var5=Crustaceans
Var6=Cephalopods
Var7=Demersal
Var8=Freshwater
Var9=Molluscs
Var10=Other_Marine));
run;
*clean up and filter;
proc sql;
create table production3 as
select *
from production2
where ENTITY not in ('Entity', 'World') and not missing(Code)
having year=max(year);
quit;
* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production3 out=long_production (rename = (_name_ = Seafood col1=Production));
by Entity Year;
var Crustaceans--Other_Marine;
run;
@Indescribled wrote:
I am working on the Tidy Tuesday data for this week about seafood.
Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.
* Get data 1; filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv"; proc import out=production datafile=test1234 dbms=csv replace; guessingrows = max; getnames=no; run; * Delete row 1 because getnames=no put the column names in the first row; proc sql; delete from production where VAR1 = "Entity"; quit; * Rename columns; data production; set production (rename=(Var1=Entity Var2=Code Var3=Year Var4=Pelagic Var5=Crustaceans Var6=Cephalopods Var7=Demersal Var8=Freshwater Var9=Molluscs Var10=Other_Marine)); run; * Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where; proc sql; delete from production where Code = ""; quit; * Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year); data production; set production; where Entity ^= 'World' AND Year = '2013'; run;
* ISSUES START BELOW HERE; * Now the data is ready to pivot. Seems like proc transpose does this; proc transpose data=production out=long_production; by Entity-Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Code; var Crustaceans-Other_Marine; run;Example of what I am trying to do, but this is from R.
Am I the only one curious about why random numerals in the R output have underlined (partial) values?
Don't use PROC IMPORT to read a TEXT file, especially if you already know what is in the file.
I wouldn't worry about eliminating entities or years (SAS is not R, you don't have to put all your data into memory).
data wide ;
infile test1234 dsd truncover firstobs=2 ;
input entity :$40. code :$8. year
Pelagic Crustaceans Cephalopods Demersal Freshwater Molluscs Other_Marine
;
run;
proc transpose data=wide name=seafood out=want(rename=(col1=production));
by entity code year;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.