BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Indescribled
Obsidian | Level 7

I am working on the Tidy Tuesday data for this week about seafood. 

Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

proc import out=production datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
	getnames=no;
run;

* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
	delete from production
	where VAR1 = "Entity";
quit;

* Rename columns;
data production;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
	delete from production
	where Code = "";
quit;

* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
	set production;
	where Entity ^= 'World' AND Year = '2013';
run;


* ISSUES START BELOW HERE; * Now the data is ready to pivot. Seems like proc transpose does this; proc transpose data=production out=long_production; by Entity-Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Code; var Crustaceans-Other_Marine; run;

Example of what I am trying to do, but this is from R. 

Indescribled_0-1634234796391.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

There's one rule to follow to ensure these issues don't happen, in any language. Make sure your data is read in correctly.

Spending a few minutes ensuring your types and names are correct at the beginning saves you a lot of headaches in the long run,. 

 

This can be simplified further - by naming the variables in the import directly instead of in a second step and including some of the filters in that step. 

You could also likely read it directly into the last format from the CSV/TEXT file but I'll leave that as an exercise for you. 

 

Here's my version of your code, some modifications:

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

 data WORK.PRODUCTION    ;
             %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
              infile TEST1234 delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
                 informat VAR1 $57. ;
                 informat VAR2 $8. ;
                 informat VAR3 $4. ;
                 informat VAR4-VAR10 8. ;
 
                 format VAR1 $57. ;
                 format VAR2 $8. ;
                 format VAR3 $4. ;
                 format VAR4-VAR10 8. ;

             input VAR1-VAR10;
                        
             ;
             if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
             run;
 
 

* Rename columns;
data production2;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

*clean up and filter;
proc sql;
create table production3 as
select *
from production2 
where ENTITY not in ('Entity', 'World') and not missing(Code) 
having year=max(year);
quit;




* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production3 out=long_production (rename = (_name_ = Seafood col1=Production));
	by Entity Year;
	var Crustaceans--Other_Marine;
run;

@Indescribled wrote:

I am working on the Tidy Tuesday data for this week about seafood. 

Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

proc import out=production datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
	getnames=no;
run;

* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
	delete from production
	where VAR1 = "Entity";
quit;

* Rename columns;
data production;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
	delete from production
	where Code = "";
quit;

* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
	set production;
	where Entity ^= 'World' AND Year = '2013';
run;


* ISSUES START BELOW HERE; * Now the data is ready to pivot. Seems like proc transpose does this; proc transpose data=production out=long_production; by Entity-Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Code; var Crustaceans-Other_Marine; run;

Example of what I am trying to do, but this is from R. 

Indescribled_0-1634234796391.png

 


 

View solution in original post

5 REPLIES 5
Reeza
Super User

There's one rule to follow to ensure these issues don't happen, in any language. Make sure your data is read in correctly.

Spending a few minutes ensuring your types and names are correct at the beginning saves you a lot of headaches in the long run,. 

 

This can be simplified further - by naming the variables in the import directly instead of in a second step and including some of the filters in that step. 

You could also likely read it directly into the last format from the CSV/TEXT file but I'll leave that as an exercise for you. 

 

Here's my version of your code, some modifications:

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

 data WORK.PRODUCTION    ;
             %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
              infile TEST1234 delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
                 informat VAR1 $57. ;
                 informat VAR2 $8. ;
                 informat VAR3 $4. ;
                 informat VAR4-VAR10 8. ;
 
                 format VAR1 $57. ;
                 format VAR2 $8. ;
                 format VAR3 $4. ;
                 format VAR4-VAR10 8. ;

             input VAR1-VAR10;
                        
             ;
             if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
             run;
 
 

* Rename columns;
data production2;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

*clean up and filter;
proc sql;
create table production3 as
select *
from production2 
where ENTITY not in ('Entity', 'World') and not missing(Code) 
having year=max(year);
quit;




* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production3 out=long_production (rename = (_name_ = Seafood col1=Production));
	by Entity Year;
	var Crustaceans--Other_Marine;
run;

@Indescribled wrote:

I am working on the Tidy Tuesday data for this week about seafood. 

Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

proc import out=production datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
	getnames=no;
run;

* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
	delete from production
	where VAR1 = "Entity";
quit;

* Rename columns;
data production;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
	delete from production
	where Code = "";
quit;

* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
	set production;
	where Entity ^= 'World' AND Year = '2013';
run;


* ISSUES START BELOW HERE; * Now the data is ready to pivot. Seems like proc transpose does this; proc transpose data=production out=long_production; by Entity-Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Code; var Crustaceans-Other_Marine; run;

Example of what I am trying to do, but this is from R. 

Indescribled_0-1634234796391.png

 


 

Reeza
Super User
One big issue - you use a single hyphen when you need double hyphen.

Crustaceans-Other_Marine would reference Crustaceans1-???A single hyphen relies on the index. If you're relying on positions of the columns then you need to use a double hyphen instead: Crustaceans -- Other_Marine;
ballardw
Super User

Am I the only one curious about why random numerals in the R output have underlined (partial) values?

Indescribled
Obsidian | Level 7
That is a default feature of the Tidyverse R package. The underlining is to help identify big numbers I think, it ends where a comma would.
Tom
Super User Tom
Super User

Don't use PROC IMPORT to read a TEXT file, especially if you already know what is in the file.

I wouldn't worry about eliminating entities or years (SAS is not R, you don't have to put all your data into memory).

data wide ;
  infile test1234 dsd truncover firstobs=2 ;
  input entity :$40. code :$8. year
        Pelagic Crustaceans Cephalopods Demersal Freshwater Molluscs Other_Marine
  ;
run;
proc transpose data=wide name=seafood out=want(rename=(col1=production));
  by entity code year;
run;

 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1840 views
  • 5 likes
  • 4 in conversation