Solved: Re: Moving rows to columns

kevinmc87 · Posted 07-27-2016 02:34 PM

STATE YEAR RACE F_0 -- F_2000
CA    2010   W
CA    2010   B
CA    2010   H

Hi,

I'm trying to tranpose this data where I essentially move these 2nd and 3rd rows as columns, where I have attached B and H as a part of the variable names from F_0 -- F_2000. I have tried messing around with PROC TRANSPOSE, but this doesn't seem to work for me. I'm not sure if it does, since I am not actually truly transposing the data, as I want to keep F_0--F_2000 as columns. I want something that looks like this:

STATE YEAR RACE F_0 -- F_2000 B_F_0-- B_F_2000 H_F_0--H_F_2000
CA 2010 -

I'm able to move my data around using this code below, but it requires me having all of my variables coded as either character or numeric, and I'm wondering if there is a simpler, more elegant solution.

DATA WANT;
	SET HAVE NOBS=NOBS;
	ARRAY ALL [3,2003] $ A1-A6009;
	ARRAY VARS [*] $ STATE--F_2000;
	RETAIN A1-A6009;
	I+1;
	DO J=1 TO 2003;
	ALL(I,J)=VARS(J);
	END;
	IF _N_=NOBS THEN OUTPUT;
RUN;

PGStats · Posted 07-27-2016 11:37 PM

Nice idea @Ksharp. But why not read dataset have directly? That way the order of race does not matter

data work.have;
	length STATE $ 2 YEAR 8 RACE $ 1 F_0-F_10 8;	
	input State Year Race F_0-F_10;
	datalines;
CA 2010 W 10 11 12 13 14 15 16 17 18 19 20
CA 2010 B 30 31 32 33 34 35 36 37 38 39 40
CA 2010 H 50 51 52 53 54 55 56 57 58 59 60
CA 2011 W 44 44 44 44 44 44 44 44 44 44 44
CA 2011 B 22 22 22 22 22 22 22 22 22 22 22
CA 2011 H 88 88 88 88 88 88 88 88 88 88 88
;

proc sql;
select distinct catt('have(where=(race="',race,'")
 rename=(F_0-F_10=',race,'_F_0-',race,'_F_10))') 
  into : list separated by ' '
   from have;
quit;

data want;
 merge &list;
 by State Year;
 drop race;
run;

PG

View solution in original post

Reeza · Posted 07-27-2016 03:56 PM

I think your code is about as efficient as you're going to get.

But, are you actually trying to create a dataset that's a single line?

error_prone · Posted 07-27-2016 03:59 PM

Why do you need all those columns in one row?

Some parts of the data-step could be generated by reading the required information from HAVE and sashelp.vcolumns, but the code will most likely harder to read.

data work.have;
	length STATE $ 2 YEAR 8 RACE $ 1 F_0-F_10 8;
	
	input State Year Race F_0-F_10;
	datalines;
CA 2010 W 10 11 12 13 14 15 16 17 18 19 20
CA 2010 B 30 31 32 33 34 35 36 37 38 39 40
CA 2010 H 50 51 52 53 54 55 56 57 58 59 60
CA 2011 W 44 44 44 44 44 44 44 44 44 44 44
CA 2011 B 22 22 22 22 22 22 22 22 22 22 22
CA 2011 H 88 88 88 88 88 88 88 88 88 88 88
;
run;

proc print;run;

%macro transposer;
	proc sql noprint;
		select distinct Race 
			into :RaceList separated by " "
			from work.have;
			
		select distinct cats(Race, "_:")
			into :RaceRetain separated by " "
			from work.have
		;
			
		select cats("#_", Name)
			into :Values separated by " "
			from sashelp.vcolumn
				where libname = "WORK" and memname = "HAVE" 
					and Name like "F^_%" escape "^"
		;
	quit;

	data work.want;
		set work.have;
		by State Year;
		
		/* Caution: the sysfunc-line must not end with ; */
		array target
		%do i = 1 %to %sysfunc(countw(&RaceList));
			%sysfunc(translate(&Values, %scan(&RaceList, &i), %str(#)))
		%end;
		; /* for array target */
		
		array ori F_:;
		
		retain raceCount &raceRetain;
		
		/* Reseting retained variables */
		if first.Year then do;
			do i = 1 to dim(target);
				target[i] = .;
			end;
			raceCount = 0;
		end;
		
		raceCount = raceCount + 1;
		
		/* copy the values */
		do i = 1 to dim(ori);
			target[i+((raceCount-1)*dim(ori))] = ori[i];
		end;
	
		/* output after last Race of State/Year has been processed. */
		if last.Year then output;
		
		drop Race F_: i raceCount;
	run;
	

%mend;

%transposer;

proc print;run;

kevinmc87 · Posted 07-27-2016 04:05 PM

I'm moving everything to one row because of dataset I am working with that has the data constructed in a similar fashion, and I need to merge the datasets. I guess I change the data structure from the other dataset to match this one, and that would probably make things easier.

Reeza · Posted 07-27-2016 04:17 PM

I would consider what I need to do after and decide what structure will best facilitate that analysis and then move towards that. If you post some of the questions you're trying to answer people can make suggestions. And if it turns out a different method works, the beauty of code is you can change it fairly easily 🙂

Ksharp · Posted 07-27-2016 10:35 PM

Check the merge skill proposed by Me,Matt, Arthur.T :

http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf




data work.have;
	length STATE $ 2 YEAR 8 RACE $ 1 F_0-F_10 8;	
	input State Year Race F_0-F_10;
	datalines;
CA 2010 W 10 11 12 13 14 15 16 17 18 19 20
CA 2010 B 30 31 32 33 34 35 36 37 38 39 40
CA 2010 H 50 51 52 53 54 55 56 57 58 59 60
CA 2011 W 44 44 44 44 44 44 44 44 44 44 44
CA 2011 B 22 22 22 22 22 22 22 22 22 22 22
CA 2011 H 88 88 88 88 88 88 88 88 88 88 88
;
run;

data temp(index=(x=(State Year)));
 set have;
 by State Year;
 if first.year then n=0;
 n+1;
run;
proc sql;
select distinct catt('temp(where=(n=',n,')
 rename=(F_0-F_10=',race,'_F_0-',race,'_F_10))') 
  into : list separated by ' '
   from temp;
quit;
data want;
 merge &list;
 by State Year;
 drop n race;
run;

PGStats · Posted 07-27-2016 11:37 PM

Nice idea @Ksharp. But why not read dataset have directly? That way the order of race does not matter

data work.have;
	length STATE $ 2 YEAR 8 RACE $ 1 F_0-F_10 8;	
	input State Year Race F_0-F_10;
	datalines;
CA 2010 W 10 11 12 13 14 15 16 17 18 19 20
CA 2010 B 30 31 32 33 34 35 36 37 38 39 40
CA 2010 H 50 51 52 53 54 55 56 57 58 59 60
CA 2011 W 44 44 44 44 44 44 44 44 44 44 44
CA 2011 B 22 22 22 22 22 22 22 22 22 22 22
CA 2011 H 88 88 88 88 88 88 88 88 88 88 88
;

proc sql;
select distinct catt('have(where=(race="',race,'")
 rename=(F_0-F_10=',race,'_F_0-',race,'_F_10))') 
  into : list separated by ' '
   from have;
quit;

data want;
 merge &list;
 by State Year;
 drop race;
run;

PG

Ksharp · Posted 07-27-2016 11:50 PM

@PG,
yeah, But if there are many group variables , that would make that macro variable very very longer and over the limit of 
the max length of macro variable. and create many sub-tables that I would like to avoid to .

Ksharp · Posted 07-28-2016 03:33 AM

@PG,
You are right. My code wouldn't work if there are some missing levels in a group.
Your code is right . 
Thanks.

kevinmc87 · Posted 07-28-2016 05:17 PM

Thanks everyone!

rogerjdeangelis · Posted 08-08-2016 07:20 PM

I poster a WPS/SAS/R solution at

https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;5cb9dbe4.1608b

PGStats · Posted 08-08-2016 10:42 PM

@rogerjdeangelis The listserv requires a login. Please copy the SAS solution here.

PG

Registration is open

SAS Training: Just a Click Away