BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DocStocks
Calcite | Level 5

Hi. I'm fairly new to SAS, so any help or sample programs would be most appreciated. My data currently is organized as follows:

http://dl.dropbox.com/u/1097617/SAS%20Help/Capture.PNGcontinues.....

Every two columns goes together. The first column is the date (duh). The first ob of the second column is the identifier. The remaining obs under the indentifier are the variables specific to that date and that identifier. I need the data to look like this:

http://dl.dropbox.com/u/1097617/SAS%20Help/Capture2.PNG

Note each ID, does NOT have the same number of date/var combinations. I know how to do this in Excel, but even with a macro it would take me entirely too long. I'm sure a short SAS program could knock this out. Again, any help is most appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

No guarantee, but I think that the following does what you want.  I added an extra column, namely recnum, so that you can distinguish where each record ends and the next one starts:

filename in 'c:\art\cvx_1.csv';                                                                                                  

data want;

  array names (5000) $20 _temporary_ ;

  length garbage name $20 var1 var2 $20 ;

  format recnum best12.;

  format id $20.;

  format date mmddyy10.;

  format var best12.;

  infile in dsd lrecl=60000 truncover ;

  if _n_=1 then do ;

    do ncols=1 by 1 until (name='');

      input garbage name @ ;

      if name ne '' then names(ncols)=name;

    end;

    input;

    ncols=ncols-1;

  end;

  else do;

    recnum+1;

    do i=1 to ncols;

      input var1 var2 @ ;

      date=input(var1, ?? mmddyy10.);

      var=input(var2, ?? best12.);

      id=names(i);

      output ;

    end;

  end;

  input;

  keep recnum id date var;

  retain ncols names;

run;

View solution in original post

16 REPLIES 16
scmebu
SAS Employee

Your data may be organized like that but that looks like a spreadsheet and not a SAS data set.  In order to transpose in SAS, you'll need your data in a SAS data set.  If you've already got it within a SAS data set, can you post the output of PROC CONTENTS showing the variable names, types, and lengths?

DocStocks
Calcite | Level 5

Will do. Installing SAS now on my new PC.

art297
Opal | Level 21

If the file is indeed an xl file, then you might be able to use something like the following.  Basically, the code is merely importing the excel file, converting it to a comma delimited file, inputting the csv file to get the ids into macro variables, then using the macro variables to set the ids in another datastep that only reads the data.  The proc sort, at the end, is only to put the data in a reasonable to view order:

PROC IMPORT OUT= WORK.xlin

            DATAFILE= "C:\myxlfile.xls"

            DBMS=EXCEL REPLACE;

     GETNAMES=NO;

     MIXED=YES;

     SCANTEXT=YES;

     USEDATE=YES;

     SCANTIME=YES;

RUN;

 

PROC EXPORT DATA= WORK.xlin

            OUTFILE= "C:\xltemp.csv"

            DBMS=csv REPLACE;

RUN;

 

data _null_;

  infile "c:\xltemp.csv" dsd delimiter="," firstobs=2;

  input;

  if _n_ eq 1 then do;

    call symput('ID1',scan(_infile_,1,','));

    call symput('ID2',scan(_infile_,2,','));

    call symput('ID3',scan(_infile_,3,','));

  end;

  stop;

run;

data want (keep=id date var);

  infile "c:\xltemp.csv" dsd delimiter="," firstobs=3 truncover;

  length id $8;

  format date date9.;

  format var best12.;

  informat date1 date9.;

  informat var1 best12.;

  informat date2 date9.;

  informat var2 best12.;

  informat date3 date9.;

  informat var3 best12.;

  format var3 best12.;

  input date1 var1 date2 var2 date3 var3;

  if not missing(date1) then do;

    id="&id1.";

    date=date1;

    var=var1;

    output;

  end;

  if not missing(date2) then do;

    id="&id2.";

    date=date2;

    var=var2;

    output;

  end;

  if not missing(date3) then do;

    id="&id3.";

    date=date3;

    var=var3;

    output;

  end;

run;

 

proc sort data=want;

  by id date;

run;

DocStocks
Calcite | Level 5

Thanks alot. I'll let you know if it works.

DocStocks
Calcite | Level 5

The code works beautifully. Thank you very much!

DocStocks
Calcite | Level 5

Unfortunately, my exuberance at the code working was premature. I went through today to make sure all the data had been reorganized and realized that the code above only transposed three observations with their associated dates and values. I will add that the source data is an xlsx not an xls file.

Troubleshooting steps so far:

1. Ran just the first two blocks of code and found that the .csv it creates only has 255 columnes, while the source data has well over 4000.

Below is a link to an exceprt of my data as a .csv and a zipped .csv.  Would have linked the SAS data but it is 140mb. Any further help would be appreciated. I will continue to troubleshoot code provided and if I find a solution I will update thread.

Data.csv (21 mb)              http://dl.dropbox.com/u/1097617/Data.csv

Zipped Data.csv (2 mb)     http://dl.dropbox.com/u/1097617/Data.zip

art297
Opal | Level 21

You obviously have an xlsx file, thus your proc import has to use the correct engine for reading such a file.  The correct engine will depend upon whether you are on a 32 or 64 bit system.  Of course, if the attached csv file is what your real data look like, you could just import it as a csv file.

Originally, you indicated that you only had 3 variables, thus the code was only written for three variables.

Given that there are approximately 4,000 variables, I would definitely suggest a different approach for creating the macro variable(s).

Thus, before anyone spends time trying to help, some answers from you would help:

1. are you on a 32 or 64 bit system?

2. is the raw data an xlsx file or a csv file

3. does it follow the same structure as the current file (i.e., where the variable names are on every other column of the second row)?

Tom
Super User Tom
Super User

CSV files are a little easier to deal with as you can just use normal input functions.

This program will just read your two variables as character strings (VAR1 and VAR2), but you can just change the input statement to read them as dates or numbers and give them more meaningful names.

The program counts the number of columns by reading the first line until it hits an empty string.

Then in reads the next row into an array so that the names are available when reading the data.

It does not output the empty cells. You might want to also not output the cells where the first variable is '#NA #NA' and the second is empty.

You might want to add a PROC SORT to get the data back into a more reasonable order.

filename in 'c:\downloads\Data.csv';                                                                                                   

data out ;                                                                                                                             

  array names (5000) $20 _temporary_ ;                                                                                                 

  length name $20 var1 var2 $20 ;                                                                                                      

  infile in dsd lrecl=300000 truncover ;                                                                                               

  if _n_=1 then do ;                                                                                                                   

     do ncols=1 by 1 until(name=' ');                                                                                                  

       input name @ ;                                                                                                                  

     end;                                                                                                                              

     input ;                                                                                                                           

     do i=1 to ncols/2 ;                                                                                                               

       input name name @ ;                                                                                                             

       names(i)=name;                                                                                                                  

     end;                                                                                                                              

     input ;                                                                                                                           

  end;                                                                                                                                 

  do i=1 to ncols/2 ;                                                                                                                  

    input var1 var2 @ ;                                                                                                                

    if not ( missing(var1) and missing(var2) ) then do ;                                                                               

       name = names(i);                                                                                                                

       output ;                                                                                                                        

    end;                                                                                                                               

  end;                                                                                                                                 

  keep name var1 var2 ;                                                                                                                

  retain ncols;                                                                                                                        

run;                 

art297
Opal | Level 21

Tom,

I like your coding style and it looks familiar, thus I have to ask: does your last name happen to start with "Ab"?

Art

DocStocks
Calcite | Level 5

art297

My apologies for my original post lacking all the needed info. The "continues..." after the first image is supposed to indicate the colums continue. The data in the linked CSV is structured exactly as in the XLSX. I'm fine using the CSV but just fyi my system is 64 bit Windows 7, running SAS 9.2 32 bit.

Tom

Thanks a million for the code! I'll let you know if it works after I test it.

DocStocks
Calcite | Level 5

Okay, as I said, the code works. But I have to do some post transpose formatting to get the date variable to act like a date. Trying to modify code so it will do this in one step, but I have to revert to importing date as character value. Here is modified code. Also for some reason, this code does not like using a CSV file created within Excel. This is reason for first two blocks of code.

PROC IMPORT OUT= WORK.TEMP

            DATAFILE= "C:\Users\Matt\Desktop\Research Projects\Bond Yiel

ds, Governance, Default Risk\Excel Data\CVX_1.csv"

            DBMS=CSV REPLACE;

     GETNAMES=NO;

     DATAROW=1;

RUN;

PROC EXPORT DATA= WORK.TEMP

            OUTFILE= "C:\Users\Matt\Desktop\Research Projects\Bond Yield

s, Governance, Default Risk\Excel Data\CVX_1a.csv"

            DBMS=CSV LABEL REPLACE;

     PUTNAMES=YES;

RUN;

filename in 'C:\Users\Matt\Desktop\Research Projects\Bond Yields, Governance, Default Risk\excel data\cvx_1a.csv';                                                                                                  

data out ;                                                                                                                            

  array names (5000) $20 _temporary_ ;                                                                                                

  length name $20 Date $20 Cvx 3.2;                                                                                                     

  infile in dsd lrecl=300000 truncover ;                                                                                              

  if _n_=1 then do ;                                                                                                                  

     do ncols=1 by 1 until(name=' ');                                                                                                 

       input name @ ;                                                                                                                 

     end;                                                                                                                             

     input ;                                                                                                                          

     do i=1 to ncols/2 ;                                                                                                              

       input name name @ ;                                                                                                            

       names(i)=name;                                                                                                                 

     end;                                                                                                                             

     input ;                                                                                                                          

  end;                                                                                                                                

  do i=1 to ncols/2 ;                                                                                                                 

    input Date Cvx @ ;                                                                                                               

    if not ( missing(Date) and missing(Cvx) ) then do ;                                                                              

       name = names(i);                                                                                                               

       output ;                                                                                                                       

    end;                                                                                                                              

  end;                                                                                                                                

  keep name Date cvx ;                                                                                                               

  retain ncols;                                                                                                                       

run;    

proc sort data=out out=Import;

          by name date;

run;

art297
Opal | Level 21

It would help if you pasted a copy of the csv file and how you want it to appear.  If the date is really importing and exporting as a character variable, creating a new numeric variable with a date format is usually a pretty trivial matter.

DocStocks
Calcite | Level 5

I agree it should be trivial, but my SASfu is weak at this point.

Here is the CSV.

First col should be name and formatted as character. Second col should be Date, and formatted as mm/dd/yyyy. Third column is CVX and should be numeric.

art297
Opal | Level 21

No guarantee, but I think that the following does what you want.  I added an extra column, namely recnum, so that you can distinguish where each record ends and the next one starts:

filename in 'c:\art\cvx_1.csv';                                                                                                  

data want;

  array names (5000) $20 _temporary_ ;

  length garbage name $20 var1 var2 $20 ;

  format recnum best12.;

  format id $20.;

  format date mmddyy10.;

  format var best12.;

  infile in dsd lrecl=60000 truncover ;

  if _n_=1 then do ;

    do ncols=1 by 1 until (name='');

      input garbage name @ ;

      if name ne '' then names(ncols)=name;

    end;

    input;

    ncols=ncols-1;

  end;

  else do;

    recnum+1;

    do i=1 to ncols;

      input var1 var2 @ ;

      date=input(var1, ?? mmddyy10.);

      var=input(var2, ?? best12.);

      id=names(i);

      output ;

    end;

  end;

  input;

  keep recnum id date var;

  retain ncols names;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1831 views
  • 1 like
  • 5 in conversation