Solved: Trouble with proc transpose / emulate pivot_longer from R

Indescribled · Posted 10-14-2021 02:10 PM

I am working on the Tidy Tuesday data for this week about seafood.

Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

proc import out=production datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
	getnames=no;
run;

* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
	delete from production
	where VAR1 = "Entity";
quit;

* Rename columns;
data production;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
	delete from production
	where Code = "";
quit;

* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
	set production;
	where Entity ^= 'World' AND Year = '2013';
run;



* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production out=long_production;
	by Entity-Year;
	var Crustaceans-Other_Marine;
run;

* ERROR: CRUSTACEANS does not have a numeric suffix.;
proc transpose data=production out=long_production;
	by Year;
	var Crustaceans-Other_Marine;
run;

* ERROR: CRUSTACEANS does not have a numeric suffix.;
proc transpose data=production out=long_production;
	by Code;
	var Crustaceans-Other_Marine;
run;

Example of what I am trying to do, but this is from R.

Reeza · Posted 10-14-2021 02:55 PM

There's one rule to follow to ensure these issues don't happen, in any language. Make sure your data is read in correctly.

Spending a few minutes ensuring your types and names are correct at the beginning saves you a lot of headaches in the long run,.

This can be simplified further - by naming the variables in the import directly instead of in a second step and including some of the filters in that step.

You could also likely read it directly into the last format from the CSV/TEXT file but I'll leave that as an exercise for you.

Here's my version of your code, some modifications:

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

 data WORK.PRODUCTION    ;
             %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
              infile TEST1234 delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
                 informat VAR1 $57. ;
                 informat VAR2 $8. ;
                 informat VAR3 $4. ;
                 informat VAR4-VAR10 8. ;
 
                 format VAR1 $57. ;
                 format VAR2 $8. ;
                 format VAR3 $4. ;
                 format VAR4-VAR10 8. ;

             input VAR1-VAR10;
                        
             ;
             if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
             run;
 
 

* Rename columns;
data production2;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

*clean up and filter;
proc sql;
create table production3 as
select *
from production2 
where ENTITY not in ('Entity', 'World') and not missing(Code) 
having year=max(year);
quit;




* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production3 out=long_production (rename = (_name_ = Seafood col1=Production));
	by Entity Year;
	var Crustaceans--Other_Marine;
run;

@Indescribled wrote:

I am working on the Tidy Tuesday data for this week about seafood.

Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

proc import out=production datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
	getnames=no;
run;

* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
	delete from production
	where VAR1 = "Entity";
quit;

* Rename columns;
data production;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
	delete from production
	where Code = "";
quit;

* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
	set production;
	where Entity ^= 'World' AND Year = '2013';
run;



* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production out=long_production;
	by Entity-Year;
	var Crustaceans-Other_Marine;
run;

* ERROR: CRUSTACEANS does not have a numeric suffix.;
proc transpose data=production out=long_production;
	by Year;
	var Crustaceans-Other_Marine;
run;

* ERROR: CRUSTACEANS does not have a numeric suffix.;
proc transpose data=production out=long_production;
	by Code;
	var Crustaceans-Other_Marine;
run;

Example of what I am trying to do, but this is from R.

View solution in original post

Reeza · Posted 10-14-2021 02:55 PM

There's one rule to follow to ensure these issues don't happen, in any language. Make sure your data is read in correctly.

Spending a few minutes ensuring your types and names are correct at the beginning saves you a lot of headaches in the long run,.

This can be simplified further - by naming the variables in the import directly instead of in a second step and including some of the filters in that step.

You could also likely read it directly into the last format from the CSV/TEXT file but I'll leave that as an exercise for you.

Here's my version of your code, some modifications:

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

 data WORK.PRODUCTION    ;
             %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
              infile TEST1234 delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
                 informat VAR1 $57. ;
                 informat VAR2 $8. ;
                 informat VAR3 $4. ;
                 informat VAR4-VAR10 8. ;
 
                 format VAR1 $57. ;
                 format VAR2 $8. ;
                 format VAR3 $4. ;
                 format VAR4-VAR10 8. ;

             input VAR1-VAR10;
                        
             ;
             if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
             run;
 
 

* Rename columns;
data production2;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

*clean up and filter;
proc sql;
create table production3 as
select *
from production2 
where ENTITY not in ('Entity', 'World') and not missing(Code) 
having year=max(year);
quit;




* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production3 out=long_production (rename = (_name_ = Seafood col1=Production));
	by Entity Year;
	var Crustaceans--Other_Marine;
run;

@Indescribled wrote:

I am working on the Tidy Tuesday data for this week about seafood.

Below is reproducible code. It works fine up until PROC TRANSPOSE, I am having trouble getting the syntax correct. Could anyone help me figure out the correct way to do this? I included a picture example from R at the end for what I am trying to do.

* Get data 1;
filename test1234 url "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/seafood-and-fish-production-thousand-tonnes.csv";

proc import out=production datafile=test1234 dbms=csv replace; 
	guessingrows = max; 
	getnames=no;
run;

* Delete row 1 because getnames=no put the column names in the first row;
proc sql;
	delete from production
	where VAR1 = "Entity";
quit;

* Rename columns;
data production;
   set production (rename=(Var1=Entity
   						   Var2=Code
   						   Var3=Year
   						   Var4=Pelagic
   						   Var5=Crustaceans
   						   Var6=Cephalopods
   						   Var7=Demersal
   						   Var8=Freshwater
   						   Var9=Molluscs
   						   Var10=Other_Marine));
run;

* Delete all rows that have blank Code. Could not figure out how to filter it in the data step with where;
proc sql;
	delete from production
	where Code = "";
quit;

* Filter Entity to remove World and set year to max. Would prefer to do something like Year = max(Year);
data production;
	set production;
	where Entity ^= 'World' AND Year = '2013';
run;



* ISSUES START BELOW HERE;
* Now the data is ready to pivot. Seems like proc transpose does this;
proc transpose data=production out=long_production;
	by Entity-Year;
	var Crustaceans-Other_Marine;
run;

* ERROR: CRUSTACEANS does not have a numeric suffix.;
proc transpose data=production out=long_production;
	by Year;
	var Crustaceans-Other_Marine;
run;

* ERROR: CRUSTACEANS does not have a numeric suffix.;
proc transpose data=production out=long_production;
	by Code;
	var Crustaceans-Other_Marine;
run;

Example of what I am trying to do, but this is from R.

Reeza · Posted 10-14-2021 02:58 PM

One big issue - you use a single hyphen when you need double hyphen.

Crustaceans-Other_Marine would reference Crustaceans1-???A single hyphen relies on the index. If you're relying on positions of the columns then you need to use a double hyphen instead: Crustaceans -- Other_Marine;

ballardw · Posted 10-14-2021 04:07 PM

Am I the only one curious about why random numerals in the R output have underlined (partial) values?

Indescribled · Posted 10-14-2021 06:05 PM

That is a default feature of the Tidyverse R package. The underlining is to help identify big numbers I think, it ends where a comma would.

Tom · Posted 10-14-2021 06:13 PM

Don't use PROC IMPORT to read a TEXT file, especially if you already know what is in the file.

I wouldn't worry about eliminating entities or years (SAS is not R, you don't have to put all your data into memory).

data wide ;
  infile test1234 dsd truncover firstobs=2 ;
  input entity :$40. code :$8. year
        Pelagic Crustaceans Cephalopods Demersal Freshwater Molluscs Other_Marine
  ;
run;
proc transpose data=wide name=seafood out=want(rename=(col1=production));
  by entity code year;
run;

Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Registration is open

Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Re: Trouble with proc transpose / emulate pivot_longer from R

Registration is open

SAS Training: Just a Click Away