Solved: Re: IML CREATE overwrites variables

rfrancis · Posted 05-07-2020 11:15 AM

Hello, I am using IML (inside of a macro) which begins by reading the contents of a work file containing five variables into a matrix (XY). I then create new variables, which I eventually wish to merge (append) as columns (vectors) to the original work file. With the CREATE statement, I am able to write the new variables to a file. However, this overwrites the contents of the original four variables (i.e., now have missing values).. Omitting the new variables from the CREATE statement allows the original variables to retain the proper values. Other posts in this forum suggest merging the original dataset with a dataset containing the new variables. The problem here is creating a BY variable for the new dataset. Guess the question how do I create a BY variable that maintains its integrity? Code is below, with a brief amount of sample data. Thank you for any suggestions. Rick

data untrimmed;
	input group lag2cvrank lagcfo_ts cfo_ts lag2cfo_ts;
	cards;
	1991 0 155 175 165
	1991 0 200 225 250
	1991 0 75 125 135
	1991 0 350 375 400
	;
	
%MACRO combination;  
	proc iml;
		use work.untrimmed where (group = &I & lag2cvrank = &J);
		read all var {lag2cfo_ts lagcfo_ts group lag2cvrank} into XY;
		close work.untrimmed;
		c = allcomb(nrow(XY), 2);       /* all "N choose 2" combinations of pairs */
		group = XY[1,3];        /* extract group from XY, to be used as a BY variable later */
		lag2cvrank = XY[1,4];   /* extract lag2cvrank from XY, to be used as a BY variable later */
		Pt1 = XY[c[,1],];       /* extract first point of line segments */
		Pt2 = XY[c[,2],];       /* extract second point of line segments */
		slope = (Pt1[,2] - Pt2[,2]) / (Pt1[,1] - Pt2[,1]); /* (y1 - y2)/(x1 - x2) */
		m = median(slope);      /* scalar */
		b = median( XY[,2] - m*XY[,1] );     /* median(y-mx) */ /* Scalar */
		mxy = m*XY[,2];         /* N x 1 Vector */
		t = nrow(XY);           /* number of rows in XY */
		int = J(t, 1, b);       /* nrow x 1 column vector */
		p = int + mxy;          /* nrow x 1 column vector */
		print (b||m)[c={'Intercept' 'Slope'} L="Method=Theil Combs=All"];   
		
		create ts_1_&I&J  var {int mxy p group lag2cvrank}; /** create data set **/
			append;       /** write new variables in vectors for each combo of group and lag2cvrank **/
		close ts_1_&I&J;     /** close the data set **/
	quit;
/*
	proc sql;
	create table perm1.ts_1_*I&J as
	select * from 
	untrimmed as a inner join ts_1&I&J as b on
*/
	%MEND combination;

%MACRO LOOP; 
	%DO I=1991 %TO 1991; 
		%DO J=0 %TO 0;
			%combination; 
		%END; 
	%END;
%MEND LOOP; 

%LOOP;

run;

proc print data=ts_1_19910;run;

rfrancis · Posted 05-08-2020 09:57 AM

Thank you. I did not know the IML forum existed. My bad. Thank you again. Rick

View solution in original post

Ksharp · Posted 05-08-2020 08:22 AM

Why not post it at IML forum, and calling @Rick_SAS

rfrancis · Posted 05-08-2020 09:57 AM

Thank you. I did not know the IML forum existed. My bad. Thank you again. Rick

Rick_SAS · Posted 05-08-2020 03:35 PM

I'm confused. I was going to move this thread to the SAS/IML Community, but you have marked it as Solved. Please let me know if you want the question moved and if you still need an answer (if so, mark as unsolved).

rfrancis · Posted 05-09-2020 09:37 AM

I apologize, just trying to stay out of the way. I will re-post in the IML forum. Thank you for you offer. And I apologize for the inconvenience.

Rick_SAS · Posted 05-09-2020 10:33 AM

It's no problem. I moved the thread to the IML forum.

ardehg · Posted 05-09-2020 10:53 AM

I would recommend BY-group processing by using the WHERE clause on the READ statement, as shown in this blog post:

In addition, you can use the "SETIN" and "SETOUT" statements to have two data sets open simultaneously and read and write to them in a loop. Refer to this blog post for a review of the available options:

Using the above techniques, you can remove the macro and the macro loop and instead use a pure IML loop over the number of BY-groups, as determined by PROC FREQ.

I have rewritten your program and modified your dataset to cover a more general case:

data untrimmed;
     input group lag2cvrank lagcfo_ts cfo_ts lag2cfo_ts;
     cards;
     1991 0 155 175 165
     1991 0 200 225 250
     1991 0 75 125 135
     1991 0 350 375 400
     1991 1 155 175 165
     1991 1 200 225 250
     1991 1 75 125 135
     1992 0 155 175 165
     1992 0 200 225 250
     1992 0 75 125 135
     1992 0 350 375 400
     1992 2 155 175 165
     1992 2 200 225 250
     1992 2 75 125 135
     ;
     
/* find unique BY-group combinations */
proc freq data=untrimmed;
tables group*lag2cvrank / out=FreqOut;
run;

proc iml;
/* Suggest using the Theil-Sen function 
https://blogs.sas.com/content/iml/2019/05/28/theil-sen-robust-regression.html
because it handles the case of infinite slope.*/

  start compute_slopes(XY);
        c = allcomb(nrow(XY), 2);       /* all "N choose 2" combinations of pairs */
        group = XY[1,3];        /* extract group from XY, to be used as a BY variable later */
        lag2cvrank = XY[1,4];   /* extract lag2cvrank from XY, to be used as a BY variable later */
        Pt1 = XY[c[,1],];       /* extract first point of line segments */
        Pt2 = XY[c[,2],];       /* extract second point of line segments */
        slope = (Pt1[,2] - Pt2[,2]) / (Pt1[,1] - Pt2[,1]); /* (y1 - y2)/(x1 - x2) */
        m = median(slope);      /* scalar */
        b = median( XY[,2] - m*XY[,1] );     /* median(y-mx) */ /* Scalar */
        mxy = m*XY[,2];         /* N x 1 Vector */
        t = nrow(XY);           /* number of rows in XY */
        int = J(t, 1, b);       /* nrow x 1 column vector */
        p = int + mxy;          /* nrow x 1 column vector */
        print (b||m)[c={'Intercept' 'Slope'} L="Method=Theil Combs=All"];   

        group_col      = J(t, 1, group);
        lag2cvrank_col = J(t, 1, lag2cvrank);
                  
        return (int || mxy || p || group_col || lag2cvrank_col);
  finish;

  /* read the BY groups */
  use FreqOut nobs NumGroups;
  read all var {group lag2cvrank};
  close FreqOut;

  print NumGroups, group lag2cvrank;

  use work.untrimmed;      
  create ts_1 var {int mxy p group_col lag2cvrank_col};
  setin work.untrimmed;    
  setout ts_1;

  inVarNames = {"lag2cfo_ts" "lagcfo_ts" "group" "lag2cvrank"};
  do i = 1 to NumGroups;                    /* for each BY group */
     read all var inVarNames into XY 
         where(group=(group[i]) & lag2cvrank=(lag2cvrank[i]));
     /* X contains data for i_th group; analyze it */
     G = compute_slopes(XY);
     /* extract the columns of the matrix */
     int=G[,1]; mxy=G[,2]; p=G[,3]; group_col=G[,4]; lag2cvrank_col=G[,5];
     append;
  end;

  close work.untrimmed;
  close ts_1;

rfrancis · Posted 05-09-2020 10:56 AM

Thank you so much for your effort! I really appreciate it! Trying to pick up a basic knowledge of IML, so this really helpful!

ardehg · Posted 05-09-2020 11:00 AM

Best of luck to you.

rfrancis · Posted 05-12-2020 10:22 AM

Hi ardegh, two quick questions (BTW, your IML code works very well!) ... 1) the DO LOOP refers to NumGroups, which first appears in the USE statement; however, I'm not seeing where NumGroups obtains the values; logically PROC FREQ is the source, but I'm not seeing any assignment between PROC FREQ (or the related output dataset and NumGroups; what am I missing? and 2) the goal is to create a sas dataset with the results, which appear in the dataset ts_1; however, this dataset contains 16 obs but should only be 14 (from the original input data); the difference is two missing obs, which occur at the 8th and 16th observations; something about the loop is looking for data where no data exists maybe; so what is the source of the missing obs? Thank you for any ideas or suggestions! Rick

ardehg · Posted 05-12-2020 11:46 AM

Hi @rfrancis, the following are answers to your questions:

1- This line use FreqOut nobs NumGroups; means:

open the FreqOut dataset and store its number of observations in an IML variable named NumGroups. So NumGroups is assigned on this line.

2- Please change the following line in your program from:

create ts_1 var {int mxy p group lag2cvrank};

to:

create ts_1 var {int mxy p group_col lag2cvrank_col};

This change should eliminate those missing values. I have corrected my answer earlier in this thread to reflect this change.

rfrancis · Posted 05-12-2020 12:11 PM

Wow! So easy ... thank you for what you do on this site! I am grateful! Rick

ardehg · Posted 05-12-2020 01:59 PM

You are very welcome! Thanks for the follow-up questions.

Register Today!