BookmarkSubscribeRSS Feed
AbuChowdhury
Fluorite | Level 6

I need to copy one column from one dataset to another dataset. Moreover, I need to do the same task many times (say I need to copy column1 from datase_1 to dataset_a, column1 from dataset_2 to dataset_b, and so on. How can I do this task? Should I use macro such as %macro, %copy etc?

Further, I need to subtract that copied column from every variable in the dataset. Say, dataset_a has variables: date, Firm1, Firm2,  … Firm100. Now column1 from dataset_1 will be copied to dataset_a. So dataset_a will look like: Date, Firm1, Firm2, … Firm100, column1. Now column1 will have to be subtracted from Firm1, Firm2 , … Firm100. In this way, column1 from dataset_2 will have to copied to dataset_b and that copied variable will have to be subtracted from other variables in dataset_b.

28 REPLIES 28
Amir
PROC Star

Hi,

Does the column you want to copy always have the same number of observations as the data set you want to copy to. If not then you'll need to specify what should happen when there is a difference in observation count.

A simple example of a macro where the observations are the same would be:

data class(rename=(age=years));

  set sashelp.class;

run;

%macro addcol(ds1,col,ds2,newds);

  data &newds;

    set &ds2;

    set &ds1(keep=&col);

  run;

%mend addcol;

%addcol(class,years,sashelp.class,class2);

Regards,

Amir.

Tom
Super User Tom
Super User

"Copy a column" is not a normal data management activity.  From your later description is looks like you want to merge on a new variable based on values of your key variable DATE.

data want ;

   merge old new (keep=date diff) ;

   by date;

   array firms firm1-firm100 ;

   do over firms; firms=firms - diff; end;

run;

AbuChowdhury
Fluorite | Level 6

It's not working. Let me explain again. From the previous code (you gave me) I select the variables and then performed the equations. The last equation is ret(i) = ……… which generates returns for each variable such as ret1 ret2 etc. Now I want to subtract each of these returns from average_ret. This average_ret is in another dataset. How can I do that?


Further, when I subtract ret1 - average_ret, ret2 - average_ret etc. then I cannot identify the main variables. I also need to identify the return difference for each main variable. How can I do that?


Again, this variable list is for one dataset. I want to do this task for multiple datasets. So do I need to run code for each dataset separately?




Below is your previous code that you gave me:


%let varlist = CA45245E1097 ... US98974P1003 ; /***I wrote all the variable names***/

%let nvar = %sysfunc(countw(&varlist,%str( )));

data P20040by01;

  set Benchmar.P20040by0 (keep = date &varlist);

*set Benchmar.P20040by0;

  array var (&nvar) &varlist;

  array rvar (&nvar) &varlist;

  array gvar (&nvar) &varlist;

  array lgvar (&nvar) &varlist;

  array ret (&nvar) &varlist;

  do i = 1 to &nvar;

    rvar(i) = (var(i+1)-lag(var(i+1)))/lag(var(i+1));

    gvar(i) = rvar(i)+1;

    lgvar(i) = lag(gvar(i));

  end;

  rgla = mean( of lgvar(*) );

  do i = 1 to &nvar;

    ret(i) = (rvar(i)*lgvar(i))/rgla;

  end;

run;

Tom
Super User Tom
Super User

You need to generate and post some example input data and the desired output data.  For example here is data for 2 dates and 3 firms.

data one ;

   informat date date9.;

   format date date9.;

   input date firm1-firm3 ;

cards;

01JAN2013 100 200 300

02JAN2013 101 210 330

run;

data two;

data one ;

   informat date date9.;

   format date date9.;

   input date diff ;

cards;

01JAN2013 10

02JAN2013 20

run;

data want ;

   informat date date9.;

   format date date9.;

   input date diff firm1-firm3 ;

cards;

01JAN2013 10 90 190 290

02JAN2013 20 81 190 310

run;

AbuChowdhury
Fluorite | Level 6

I am sorry I could not understand this. Is this for my previous question?

Tom
Super User Tom
Super User

Just post some examples so we can understand more clearly what you are talking about.

AbuChowdhury
Fluorite | Level 6

I am attaching two files: Benchmark and usa_firms. I need to subtract the last column (average_ret) of Benchmark table from all return columns (ret1, ret2, .... , ret61) of usa_firms.

By the by, can I rename those variables (ret1, ret2, etc) by firm identity (such as US1234456, there are 61 firm identities in usa_firms table. I got these ret1, ret2 …. variable names when I calculated the returns of those firms).

Tom
Super User Tom
Super User

So did you try the program I posted?

I notice that you do not seem to have any variable to uniquely identify the rows in the datasets.  This should be corrected as you will have a hard time making sure that you are matching the rows without that.

So I took just the first 6 rows and two firm values so I can show you how this can be done. I just took the first two columns, I am not sure if that is what you are calling RET or not, but you should be able to adjust to use the variable names of your columns.

So here are two simple data steps to setup the sample datasets. BENCHMARK will have the AVERAGE_RET variable that you want to merge onto and then subtract. and USA_FIRMS will have the values that you will be subtracting from.  I had the data steps define a ROWID variable so that I can make sure to match the rows properly.  Also I renamed the DATE variables so I can compare and make sure that the dates from the two different files have not gotten out of line.  Also I set the format on the date variables to yymmdd because it leads to less confusion about what is the month and what is the day, also it will sort properly if it is treated as text.

data benchmark;

  informat date1 yymmdd10. ;

  format date1 yymmdd10.;

  rowid+1;

  input date1 average_ret ;

cards4;

. .

. .

2004/01/01 0

2004/02/01 0.0336863483

2004/05/01 0.0416289109

2004/06/01 -0.001851738

;;;;

data usa_firms;

  informat date2 yymmdd10. ;

  format date2 yymmdd10.;

  rowid+1;

  input date2 firm1 firm2;

cards4;

. 15.4467 7.82

. 15.06 7.76

2004/01/01 15.06 7.76

2004/02/01 14.9867 7.91

2004/05/01 14.6667 8.05

2004/06/01 13.2527 8.09

;;;;

Now you just need to merge the two datasets so that the variables you want to work with are in the same data vector.

I had it create new variables to old the subtracted values. But you could store the values back into the original variables.

Also I used the DO OVER construct for working with the arrrays.  It makes typing the names easier to not have to add the indexes. 

data want ;

  merge benchmark usa_firms ;

  by rowid;

  if date1 ne date2 then put 'ERROR: DATE MISMATCH. ' date1= date2=;

  else do;

    array firms firm1-firm2 ;

    array new   new1-new2 ;

    do over firms;

      new=firms-average_ret;

    end;

  end;

run;

So lets look at the results and see if it is what you expected.

data _null_;

  set want ;

  put rowid date1 new1 new2 ;

run;


1 . . .

2 . . .

3 2004-01-01 15.06 7.76

4 2004-02-01 14.953013652 7.8763136517

5 2004-05-01 14.625071089 8.0083710891

6 2004-06-01 13.254551738 8.091851738


AbuChowdhury
Fluorite | Level 6

How did you write the followings in the datastep?

. .

. .

2004/01/01 0

2004/02/01 0.0336863483

2004/05/01 0.0416289109

2004/06/01 -0.001851738

Tom
Super User Tom
Super User

Copy and pasted it from the dataset you posted. I opened the data set with SAS System Viewer, highlighted the first 6 values of the variable and hit control-C for copy, went to the program editor and hit control-V for paste.  I then typed in the dates by hand.

But you don't need to type the data as you already have it in a dataset.

AbuChowdhury
Fluorite | Level 6

Do I need to install the SAS System Viewer?

You copied only first 6 values, but If I do this for the whole dataset then there will be 264 values. Isn't it too much?

Tom
Super User Tom
Super User

I am not being very clear. The ONLY purpose of the simple data steps I posted was have some common example data that everyone reading the thread could see and play with.  You already have data in a datesets, you do NOT need to copy and paste it!!!

For you to generate the same output I generated starting with the two datasets you posted you could run this program.  The dataset options OBS=6 will limit it just using the first 6 rows so that it will look just like what I posted.  Of course in realitiy you wouldn't use that option since you want to operate on all of the rows in your datasets.

data both ;

  merge usa_firms(obs=6 keep=date US4583341098 US09061G1013)

        benchmark(obs=6 keep=average_ret)

  ;

  * NO BY STATEMENT since you do not have any id variable to use to merge ;

  array firms US4583341098 US09061G1013 ;

  array new new1 - new2 ;

  do over firms;

     new = firms - average_ret ;

  end;

run;

AbuChowdhury
Fluorite | Level 6

It works now. But I had to change  the date format:

data benchmark;

  informat date1 yymmdd10. ;

  format date1 yymmdd10.;


Instead of the above code, I wrote:

data benchmark;

  informat date1 ddmmyy10. ;

  format date1 ddmmyy10.;


since the date in my dataset was like dd/mm/2004. Am I right?


Your latest codes also work fine.


Thank you very much.

Tom
Super User Tom
Super User

Yes. The INFORMAT is what tells SAS how to convert text values.  MMDDYY will read strings in month day year order .

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 28 replies
  • 11626 views
  • 0 likes
  • 3 in conversation