Help using Base SAS procedures

Help request for "complex" transposition

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Help request for "complex" transposition

Hi. I'm fairly new to SAS, so any help or sample programs would be most appreciated. My data currently is organized as follows:

http://dl.dropbox.com/u/1097617/SAS%20Help/Capture.PNGcontinues.....

Every two columns goes together. The first column is the date (duh). The first ob of the second column is the identifier. The remaining obs under the indentifier are the variables specific to that date and that identifier. I need the data to look like this:

http://dl.dropbox.com/u/1097617/SAS%20Help/Capture2.PNG

Note each ID, does NOT have the same number of date/var combinations. I know how to do this in Excel, but even with a macro it would take me entirely too long. I'm sure a short SAS program could knock this out. Again, any help is most appreciated.


Accepted Solutions
Solution
‎08-01-2011 11:10 PM
PROC Star
Posts: 7,492

Re: Help request for "complex" transposition

Posted in reply to DocStocks

No guarantee, but I think that the following does what you want.  I added an extra column, namely recnum, so that you can distinguish where each record ends and the next one starts:

filename in 'c:\art\cvx_1.csv';                                                                                                  

data want;

  array names (5000) $20 _temporary_ ;

  length garbage name $20 var1 var2 $20 ;

  format recnum best12.;

  format id $20.;

  format date mmddyy10.;

  format var best12.;

  infile in dsd lrecl=60000 truncover ;

  if _n_=1 then do ;

    do ncols=1 by 1 until (name='');

      input garbage name @ ;

      if name ne '' then names(ncols)=name;

    end;

    input;

    ncols=ncols-1;

  end;

  else do;

    recnum+1;

    do i=1 to ncols;

      input var1 var2 @ ;

      date=input(var1, ?? mmddyy10.);

      var=input(var2, ?? best12.);

      id=names(i);

      output ;

    end;

  end;

  input;

  keep recnum id date var;

  retain ncols names;

run;

View solution in original post


All Replies
SAS Employee
Posts: 17

Help request for "complex" transposition

Posted in reply to DocStocks

Your data may be organized like that but that looks like a spreadsheet and not a SAS data set.  In order to transpose in SAS, you'll need your data in a SAS data set.  If you've already got it within a SAS data set, can you post the output of PROC CONTENTS showing the variable names, types, and lengths?

Occasional Contributor
Posts: 16

Re: Help request for "complex" transposition

Will do. Installing SAS now on my new PC.

PROC Star
Posts: 7,492

Re: Help request for "complex" transposition

Posted in reply to DocStocks

If the file is indeed an xl file, then you might be able to use something like the following.  Basically, the code is merely importing the excel file, converting it to a comma delimited file, inputting the csv file to get the ids into macro variables, then using the macro variables to set the ids in another datastep that only reads the data.  The proc sort, at the end, is only to put the data in a reasonable to view order:

PROC IMPORT OUT= WORK.xlin

            DATAFILE= "C:\myxlfile.xls"

            DBMS=EXCEL REPLACE;

     GETNAMES=NO;

     MIXED=YES;

     SCANTEXT=YES;

     USEDATE=YES;

     SCANTIME=YES;

RUN;

 

PROC EXPORT DATA= WORK.xlin

            OUTFILE= "C:\xltemp.csv"

            DBMS=csv REPLACE;

RUN;

 

data _null_;

  infile "c:\xltemp.csv" dsd delimiter="," firstobs=2;

  input;

  if _n_ eq 1 then do;

    call symput('ID1',scan(_infile_,1,','));

    call symput('ID2',scan(_infile_,2,','));

    call symput('ID3',scan(_infile_,3,','));

  end;

  stop;

run;

data want (keep=id date var);

  infile "c:\xltemp.csv" dsd delimiter="," firstobs=3 truncover;

  length id $8;

  format date date9.;

  format var best12.;

  informat date1 date9.;

  informat var1 best12.;

  informat date2 date9.;

  informat var2 best12.;

  informat date3 date9.;

  informat var3 best12.;

  format var3 best12.;

  input date1 var1 date2 var2 date3 var3;

  if not missing(date1) then do;

    id="&id1.";

    date=date1;

    var=var1;

    output;

  end;

  if not missing(date2) then do;

    id="&id2.";

    date=date2;

    var=var2;

    output;

  end;

  if not missing(date3) then do;

    id="&id3.";

    date=date3;

    var=var3;

    output;

  end;

run;

 

proc sort data=want;

  by id date;

run;

Occasional Contributor
Posts: 16

Re: Help request for "complex" transposition

Thanks alot. I'll let you know if it works.

Occasional Contributor
Posts: 16

Re: Help request for "complex" transposition

The code works beautifully. Thank you very much!

Occasional Contributor
Posts: 16

Re: Help request for "complex" transposition

Posted in reply to DocStocks

Unfortunately, my exuberance at the code working was premature. I went through today to make sure all the data had been reorganized and realized that the code above only transposed three observations with their associated dates and values. I will add that the source data is an xlsx not an xls file.

Troubleshooting steps so far:

1. Ran just the first two blocks of code and found that the .csv it creates only has 255 columnes, while the source data has well over 4000.

Below is a link to an exceprt of my data as a .csv and a zipped .csv.  Would have linked the SAS data but it is 140mb. Any further help would be appreciated. I will continue to troubleshoot code provided and if I find a solution I will update thread.

Data.csv (21 mb)              http://dl.dropbox.com/u/1097617/Data.csv

Zipped Data.csv (2 mb)     http://dl.dropbox.com/u/1097617/Data.zip

PROC Star
Posts: 7,492

Re: Help request for "complex" transposition

Posted in reply to DocStocks

You obviously have an xlsx file, thus your proc import has to use the correct engine for reading such a file.  The correct engine will depend upon whether you are on a 32 or 64 bit system.  Of course, if the attached csv file is what your real data look like, you could just import it as a csv file.

Originally, you indicated that you only had 3 variables, thus the code was only written for three variables.

Given that there are approximately 4,000 variables, I would definitely suggest a different approach for creating the macro variable(s).

Thus, before anyone spends time trying to help, some answers from you would help:

1. are you on a 32 or 64 bit system?

2. is the raw data an xlsx file or a csv file

3. does it follow the same structure as the current file (i.e., where the variable names are on every other column of the second row)?

Super User
Super User
Posts: 7,076

Re: Help request for "complex" transposition

Posted in reply to DocStocks

CSV files are a little easier to deal with as you can just use normal input functions.

This program will just read your two variables as character strings (VAR1 and VAR2), but you can just change the input statement to read them as dates or numbers and give them more meaningful names.

The program counts the number of columns by reading the first line until it hits an empty string.

Then in reads the next row into an array so that the names are available when reading the data.

It does not output the empty cells. You might want to also not output the cells where the first variable is '#NA #NA' and the second is empty.

You might want to add a PROC SORT to get the data back into a more reasonable order.

filename in 'c:\downloads\Data.csv';                                                                                                   

data out ;                                                                                                                             

  array names (5000) $20 _temporary_ ;                                                                                                 

  length name $20 var1 var2 $20 ;                                                                                                      

  infile in dsd lrecl=300000 truncover ;                                                                                               

  if _n_=1 then do ;                                                                                                                   

     do ncols=1 by 1 until(name=' ');                                                                                                  

       input name @ ;                                                                                                                  

     end;                                                                                                                              

     input ;                                                                                                                           

     do i=1 to ncols/2 ;                                                                                                               

       input name name @ ;                                                                                                             

       names(i)=name;                                                                                                                  

     end;                                                                                                                              

     input ;                                                                                                                           

  end;                                                                                                                                 

  do i=1 to ncols/2 ;                                                                                                                  

    input var1 var2 @ ;                                                                                                                

    if not ( missing(var1) and missing(var2) ) then do ;                                                                               

       name = names(i);                                                                                                                

       output ;                                                                                                                        

    end;                                                                                                                               

  end;                                                                                                                                 

  keep name var1 var2 ;                                                                                                                

  retain ncols;                                                                                                                        

run;                 

PROC Star
Posts: 7,492

Re: Help request for "complex" transposition

Tom,

I like your coding style and it looks familiar, thus I have to ask: does your last name happen to start with "Ab"?

Art

Occasional Contributor
Posts: 16

Help request for "complex" transposition

Posted in reply to DocStocks

art297

My apologies for my original post lacking all the needed info. The "continues..." after the first image is supposed to indicate the colums continue. The data in the linked CSV is structured exactly as in the XLSX. I'm fine using the CSV but just fyi my system is 64 bit Windows 7, running SAS 9.2 32 bit.

Tom

Thanks a million for the code! I'll let you know if it works after I test it.

Occasional Contributor
Posts: 16

Re: Help request for "complex" transposition

Posted in reply to DocStocks

Okay, as I said, the code works. But I have to do some post transpose formatting to get the date variable to act like a date. Trying to modify code so it will do this in one step, but I have to revert to importing date as character value. Here is modified code. Also for some reason, this code does not like using a CSV file created within Excel. This is reason for first two blocks of code.

PROC IMPORT OUT= WORK.TEMP

            DATAFILE= "C:\Users\Matt\Desktop\Research Projects\Bond Yiel

ds, Governance, Default Risk\Excel Data\CVX_1.csv"

            DBMS=CSV REPLACE;

     GETNAMES=NO;

     DATAROW=1;

RUN;

PROC EXPORT DATA= WORK.TEMP

            OUTFILE= "C:\Users\Matt\Desktop\Research Projects\Bond Yield

s, Governance, Default Risk\Excel Data\CVX_1a.csv"

            DBMS=CSV LABEL REPLACE;

     PUTNAMES=YES;

RUN;

filename in 'C:\Users\Matt\Desktop\Research Projects\Bond Yields, Governance, Default Risk\excel data\cvx_1a.csv';                                                                                                  

data out ;                                                                                                                            

  array names (5000) $20 _temporary_ ;                                                                                                

  length name $20 Date $20 Cvx 3.2;                                                                                                     

  infile in dsd lrecl=300000 truncover ;                                                                                              

  if _n_=1 then do ;                                                                                                                  

     do ncols=1 by 1 until(name=' ');                                                                                                 

       input name @ ;                                                                                                                 

     end;                                                                                                                             

     input ;                                                                                                                          

     do i=1 to ncols/2 ;                                                                                                              

       input name name @ ;                                                                                                            

       names(i)=name;                                                                                                                 

     end;                                                                                                                             

     input ;                                                                                                                          

  end;                                                                                                                                

  do i=1 to ncols/2 ;                                                                                                                 

    input Date Cvx @ ;                                                                                                               

    if not ( missing(Date) and missing(Cvx) ) then do ;                                                                              

       name = names(i);                                                                                                               

       output ;                                                                                                                       

    end;                                                                                                                              

  end;                                                                                                                                

  keep name Date cvx ;                                                                                                               

  retain ncols;                                                                                                                       

run;    

proc sort data=out out=Import;

          by name date;

run;

PROC Star
Posts: 7,492

Re: Help request for "complex" transposition

Posted in reply to DocStocks

It would help if you pasted a copy of the csv file and how you want it to appear.  If the date is really importing and exporting as a character variable, creating a new numeric variable with a date format is usually a pretty trivial matter.

Occasional Contributor
Posts: 16

Re: Help request for "complex" transposition

I agree it should be trivial, but my SASfu is weak at this point.

Here is the CSV.

First col should be name and formatted as character. Second col should be Date, and formatted as mm/dd/yyyy. Third column is CVX and should be numeric.

Solution
‎08-01-2011 11:10 PM
PROC Star
Posts: 7,492

Re: Help request for "complex" transposition

Posted in reply to DocStocks

No guarantee, but I think that the following does what you want.  I added an extra column, namely recnum, so that you can distinguish where each record ends and the next one starts:

filename in 'c:\art\cvx_1.csv';                                                                                                  

data want;

  array names (5000) $20 _temporary_ ;

  length garbage name $20 var1 var2 $20 ;

  format recnum best12.;

  format id $20.;

  format date mmddyy10.;

  format var best12.;

  infile in dsd lrecl=60000 truncover ;

  if _n_=1 then do ;

    do ncols=1 by 1 until (name='');

      input garbage name @ ;

      if name ne '' then names(ncols)=name;

    end;

    input;

    ncols=ncols-1;

  end;

  else do;

    recnum+1;

    do i=1 to ncols;

      input var1 var2 @ ;

      date=input(var1, ?? mmddyy10.);

      var=input(var2, ?? best12.);

      id=names(i);

      output ;

    end;

  end;

  input;

  keep recnum id date var;

  retain ncols names;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 373 views
  • 1 like
  • 5 in conversation