BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Indescribled
Obsidian | Level 7

I am working on the Tidy Tuesday data for this week about seafood. 

Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

proc import out=production datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
	getnames=no;
run;

* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
	delete from production
	where VAR1 = "Entity";
quit;

* Rename columns;
data production;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
	delete from production
	where Code = "";
quit;

* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
	set production;
	where Entity ^= 'World' AND Year = '2013';
run;


* ISSUES START BELOW HERE; * Now the data is ready to pivot. Seems like proc transpose does this; proc transpose data=production out=long_production; by Entity-Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Code; var Crustaceans-Other_Marine; run;

Example of what I am trying to do, but this is from R. 

Indescribled_0-1634234796391.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

There's one rule to follow to ensure these issues don't happen, in any language. Make sure your data is read in correctly.

Spending a few minutes ensuring your types and names are correct at the beginning saves you a lot of headaches in the long run,. 

 

This can be simplified further - by naming the variables in the import directly instead of in a second step and including some of the filters in that step. 

You could also likely read it directly into the last format from the CSV/TEXT file but I'll leave that as an exercise for you. 

 

Here's my version of your code, some modifications:

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

 data WORK.PRODUCTION    ;
             %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
              infile TEST1234 delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
                 informat VAR1 $57. ;
                 informat VAR2 $8. ;
                 informat VAR3 $4. ;
                 informat VAR4-VAR10 8. ;
 
                 format VAR1 $57. ;
                 format VAR2 $8. ;
                 format VAR3 $4. ;
                 format VAR4-VAR10 8. ;

             input VAR1-VAR10;
                        
             ;
             if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
             run;
 
 

* Rename columns;
data production2;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

*clean up and filter;
proc sql;
create table production3 as
select *
from production2 
where ENTITY not in ('Entity', 'World') and not missing(Code) 
having year=max(year);
quit;




* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production3 out=long_production (rename = (_name_ = Seafood col1=Production));
	by Entity Year;
	var Crustaceans--Other_Marine;
run;

@Indescribled wrote:

I am working on the Tidy Tuesday data for this week about seafood. 

Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

proc import out=production datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
	getnames=no;
run;

* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
	delete from production
	where VAR1 = "Entity";
quit;

* Rename columns;
data production;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
	delete from production
	where Code = "";
quit;

* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
	set production;
	where Entity ^= 'World' AND Year = '2013';
run;


* ISSUES START BELOW HERE; * Now the data is ready to pivot. Seems like proc transpose does this; proc transpose data=production out=long_production; by Entity-Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Code; var Crustaceans-Other_Marine; run;

Example of what I am trying to do, but this is from R. 

Indescribled_0-1634234796391.png

 


 

View solution in original post

5 REPLIES 5
Reeza
Super User

There's one rule to follow to ensure these issues don't happen, in any language. Make sure your data is read in correctly.

Spending a few minutes ensuring your types and names are correct at the beginning saves you a lot of headaches in the long run,. 

 

This can be simplified further - by naming the variables in the import directly instead of in a second step and including some of the filters in that step. 

You could also likely read it directly into the last format from the CSV/TEXT file but I'll leave that as an exercise for you. 

 

Here's my version of your code, some modifications:

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

 data WORK.PRODUCTION    ;
             %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
              infile TEST1234 delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
                 informat VAR1 $57. ;
                 informat VAR2 $8. ;
                 informat VAR3 $4. ;
                 informat VAR4-VAR10 8. ;
 
                 format VAR1 $57. ;
                 format VAR2 $8. ;
                 format VAR3 $4. ;
                 format VAR4-VAR10 8. ;

             input VAR1-VAR10;
                        
             ;
             if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
             run;
 
 

* Rename columns;
data production2;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

*clean up and filter;
proc sql;
create table production3 as
select *
from production2 
where ENTITY not in ('Entity', 'World') and not missing(Code) 
having year=max(year);
quit;




* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production3 out=long_production (rename = (_name_ = Seafood col1=Production));
	by Entity Year;
	var Crustaceans--Other_Marine;
run;

@Indescribled wrote:

I am working on the Tidy Tuesday data for this week about seafood. 

Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.

 

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

proc import out=production datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
	getnames=no;
run;

* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
	delete from production
	where VAR1 = "Entity";
quit;

* Rename columns;
data production;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
	delete from production
	where Code = "";
quit;

* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
	set production;
	where Entity ^= 'World' AND Year = '2013';
run;


* ISSUES START BELOW HERE; * Now the data is ready to pivot. Seems like proc transpose does this; proc transpose data=production out=long_production; by Entity-Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Year; var Crustaceans-Other_Marine; run; * ERROR: CRUSTACEANS does not have a numeric suffix.; proc transpose data=production out=long_production; by Code; var Crustaceans-Other_Marine; run;

Example of what I am trying to do, but this is from R. 

Indescribled_0-1634234796391.png

 


 

Reeza
Super User
One big issue - you use a single hyphen when you need double hyphen.

Crustaceans-Other_Marine would reference Crustaceans1-???A single hyphen relies on the index. If you're relying on positions of the columns then you need to use a double hyphen instead: Crustaceans -- Other_Marine;
ballardw
Super User

Am I the only one curious about why random numerals in the R output have underlined (partial) values?

Indescribled
Obsidian | Level 7
That is a default feature of the Tidyverse R package. The underlining is to help identify big numbers I think, it ends where a comma would.
Tom
Super User Tom
Super User

Don't use PROC IMPORT to read a TEXT file, especially if you already know what is in the file.

I wouldn't worry about eliminating entities or years (SAS is not R, you don't have to put all your data into memory).

data wide ;
  infile test1234 dsd truncover firstobs=2 ;
  input entity :$40. code :$8. year
        Pelagic Crustaceans Cephalopods Demersal Freshwater Molluscs Other_Marine
  ;
run;
proc transpose data=wide name=seafood out=want(rename=(col1=production));
  by entity code year;
run;

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1911 views
  • 5 likes
  • 4 in conversation