BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Dingdang
Fluorite | Level 6

croHello,

I have the following table

NameFirstnameAdress1Hobby1Tel1Mobil1... ...SameAsPrior2Adress2Hobby2Tel2Mobil2... ...SameAsPrior3Adress3Hobby3Tel3Mobil3

...

...

Lee

Mike

B23

reading012

1234

1....1....
Mueller

Anna

B34biking98714782B56cinema74578562....
VittelBenjaminB45hiking45614562B24biking79845613B98theater456742
HungWuB11theater47856891....2....

so i have 3 groups of variables, where each group has the same variables and there are many of them (say 200).

the variable SameAsPrior2 has 2 values, 1 means the values of the variables in group 2 are identical to those in group 1; 2 means the values of the variables in group 2 are not identical to those in group 1.

Analogous: the variable SameAsPrior3 has 3 values: 1 means the values of group 3 are identical to those of group 1; 2 means the values of group 3 are identical to those of group2; 3 means the values of group 3 should be different to those in group 1 and 2.

I want to do the following:

if SameAsPrior2=1, then copy the values of group 1 to group 2, no matter what stays currently in group 2. if SameAsPrior2=2, then leave the values of group 2 untouched.

if SameAsPrior3=1, then copy the values of group 1 to group 3. if SameAsPrior3=2, then copy the values of group 2 to group 3. If SameAsPrior3=3, then leave the values of group 3 untouched.

In this example the table will then look like:

NameFirstnameAdress1Hobby1Tel1

Mobil1

...

...

Adress2Hobby2Tel2Mobil2

...

...

Adress3Hobby3Tel3Mobil3
LeeMikeB23reading0121234B23reading0121234B23reading0121234
MuellerAnnaB34biking9871478B56cinema7457856B56cinema7457856

can someone give me an example code for this problem? would be really thankful for that!!

BR  Dingdang

1 ACCEPTED SOLUTION

Accepted Solutions
Vince28_Statcan
Quartz | Level 8

Based on this update, here's how I would go about reading the appropriate variables in macros to help declare 6 arrays for processing.

Assume you have loaded your excel sheet in dataset HAVE in WORK library

proc sql;

     select name

     into :g1n separated by " "

     where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_1"

;

     select name

     into :g1c separated by " "

     where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_1"

;

     select name

     into :g2n separated by " "

     where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_2"

;

     select name

     into :g2c separated by " "

  where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_2"

;

     select name

     into :g3n separated by " "

     where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_3"

;

     select name

     into :g3c separated by " "

  where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_3"

;

quit;

/* NOTE: if sameasprior2 and sameasprior3 really do not have the _ mentionned for other variables, then it won't be part of the array list as desired for the code below. Otherwise, you may need to play a bit around either not to load the said variable in the above proc sql or by creating a sameasprior1 variable with a length statement and adding it to the array g1n definition so that all array dimensions are lined up. You would still need to further adjust loop on num arrays to begin @ 2*/

data step1;

     set have;

     array g1c {*} &g1c;

     array g1n {*} &g1n;

     array g2c {*} &g2c;

     array g2n {*} &g2n;

     array g3c {*} &g3c;

     array g3n {*} &g3n;

     if SameAsPrior2=1 then do;

          do i =1 to dim(g1c);

               g2c{i}=g1c{i};

          end;

          do i=1 to dim(g1n);

               g2n{i}=g1n{i};

          end;

          if SameAsPrior3=1 then do;

               do i =1 to dim(g1c);

                    g3c{i}=g1c{i};

               end;

               do i=1 to dim(g1n);

                    g3n{i}=g1n{i};

               end;

          end;

          else if SameAsPrior3=2 then do;

               do i =1 to dim(g1c);

                    g3c{i}=g2c{i};

               end;

               do i=1 to dim(g1n);

                    g3n{i}=g2n{i};

               end;

          end;

     end;  

run;

Some slight efficiency gain could be done by creating and retaining a numeric variable for dim(g1c) and dim(g1n). Code is obviously untested, I don't have nearly enough imagination to create an example dataset this wide but if you post log we can help debug

Vince

View solution in original post

6 REPLIES 6
Astounding
PROC Star

Arrays?  Yes

Call execute?  No

Macros?  No, unless your list of variables is incredibly long

Assuming that you know the basics of SAS so that you know how to write a DATA step that brings in the data, here is the sort of code that  you could add to that DATA step:

array addr {200} adress1-adress200;

array hobbys {200} hobby1-hobby200;

array tels {200} tel1-tel200;

array mobils {200} mobil1-mobile200;

array sameas {2:200} SameasPrior2-SameasPrior200;

do _i_=2 to 200;

    if . < sameas{_i_} < _i_ then do;

       addr{_i_} = addr{sameas{_i_}};

       hobbys{_i_} = hobbys{sameas{_i_}};

       ...

       mobils{_i_} = mobils{sameas{_i_}};

    end;

end;

Good luck.

Vince28_Statcan
Quartz | Level 8

*edit* the editor crashed or something and I overwrote my previous post. I simply meant to restate that I believe OP had 200 named variables over 3 groups and not 200 groups of the 5 examplified named variables so in order to construct the arrays, you need either some awkward use of _CHARACTER_ and _NUMERIC or possibly the use of dictionary table VCOLUMN and some minor macroing to define the arrays.

If your table is built how I understood it and assuming that SameAsPrior2 and SameAsPrior3 are both numeric variables, I believe the following would work:

data step1;

     length dummy 8.; /*dummy var to ease index calculation*/

     retain dummy; /* to save computing time */

     set have(drop=name firstname);

     array ch {*} _CHARACTER_;

     array nu {*} _NUMERIC_;

     if _N_=1 then do;

     dummy=dim(nu)/3; /* figure out how many variables are numeric */

     end;

     if SameAsPrior2=1 then do;

          do i =1 to dummy;

               nu(i+dummy)=nu(i);

          end;

          do i=1 to dim(ch)/3;

               ch(i+dim(ch)/3)=ch(i);

          end;

     end;  

     if SameAsPrior3=1 then do;

          do i=1 to dummy;

               nu(i+2*dummy)=nu(i);

          end;

          do i=1 to dim(ch)/3;

               ch(i+2*dim(ch)/3)=ch(i);

          end;

     end;

     if SameAsPrior3=2 then do;

          do i=1 to dummy;

               nu(i+2*dummy) = nu(i+dummy);

          end;

          do i=1 to dim(ch)/3;

               nu(i+2*dim(ch)/3)=nu(i+dim(ch)/3);

          end;

     end;

run;

data want;

  merge have(keep=name firstname) step1(drop=dummy); /*1 to 1 remerge because I didn't want ot have to mess with -2/+2 index calculation for CH array */

run;

It would gain in efficiency to keep the full set and tweak relative referencing with some + and - but it's much smoother without a detailed visual mapping of the array to just make sure you have 3* however many variables you want.

If you named dummy SameAsPrior1 instead and that your entire table is character instead of mixed with numeric, you could've used :1, :2, :3 notation instead and improved array arithmetic referencing

If when SameAsPrior2=1 then SameAsPrior3=1, you could simplify into only 2 if conditions.

Vince

Astounding
PROC Star

Vince,

You're probably right that the number of variables is large and the number of groups is small.  Looking at the final table structure (and the boxes moving off to the right), I'm not sure that the number of groups would be limited to 3.  At any rate, here are just some of the issues I would consider before proposing a final solution.  (Please note that none of this is intended as a direct comment on the code that you posted.)

Is this a one-time application, or a many-times application?  (Sample impact:  can we hard-code the number of groups or should that be calculated by the program?)

Is the structure of the data known and reliable?  For example, are we sure that Name and Firstname are the only two variables that should be omitted from the arrays?  Are we sure of the internal order of the remaining variables?

Should the user be able to understand how the program works?  As one example of how to simplify, it might be appropriate to have the user specify two macro variables:

%let num_prefix_list = Mobil ...;

%let char_prefix_list = Adress Hobby Tel ...;

From my experience, there are many solutions posted on this board that are 100% valid, in a technical sense.  But the original poster can't understand them.  There are plenty of cases where I would rather adjust my answer to be bulkier to code, and less flexible, but easier to understand.

Dingdang
Fluorite | Level 6

Hi Vince and Astounding,

thank you both very much for your answers. Sorry I didn’t come back to you earlier. Was a bit stressed from other tasks. Here I try to formulate my problem again. There are indeed a large number of variables and 3 groups.

I am doing a kind of survey. People get a questionary in Excel with a few worksheets and there are variables, which are independent of the groups, which means there are more variables than just name and firstname that should be omitted from the arrays. they are i.e. on the first sheet. and then there come three seperate sheets to the groups, containing the same kind of variables, only belonging to different groups. The variables in the first group are named like “xx_1”, “yy_1” etc. the second group “xx_2” “yy_2” and so on.

the first of these three sheets includes the first group, and everyone has to fill in this sheet. In order to reduce the workload for the persons queried, I am putting a variable SameAsPrior2 on the top of the second sheet. if they choose 1, which means the values of the second group are identical to the first group, then they can leave this sheet and go to the third one. If they choose 2, then they have to fill in it. and analogous to the third sheet, they can choose if the values should be identical to one of the first sheets or they have to fill in per hand again. I am not allowed to use macro in Excel, because of security reasons. and because of the large num-ber of variables i am having, i dont want to use any “when, then” function in excel, to avoid any speed problem. At the end of the questionary I have a separate sheet to aggregate all the information filled. For the sheets to the groups I am leaving enough space for the group 2 and group 3. But in some cases they are of course empty.

I then collect the workingtables and read the aggregated sheets into SAS. For my analyse I don’t want certain variables to be empty, i.e. the variables with names starting with “xx”. That’s why I want the information correctly copied from other groups, which were filled.  the internal order of the variables doesnt really matter.

I hope this time I made my problem clear. And thanks again for your help.

BR  Dingdang

Vince28_Statcan
Quartz | Level 8

Based on this update, here's how I would go about reading the appropriate variables in macros to help declare 6 arrays for processing.

Assume you have loaded your excel sheet in dataset HAVE in WORK library

proc sql;

     select name

     into :g1n separated by " "

     where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_1"

;

     select name

     into :g1c separated by " "

     where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_1"

;

     select name

     into :g2n separated by " "

     where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_2"

;

     select name

     into :g2c separated by " "

  where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_2"

;

     select name

     into :g3n separated by " "

     where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_3"

;

     select name

     into :g3c separated by " "

  where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_3"

;

quit;

/* NOTE: if sameasprior2 and sameasprior3 really do not have the _ mentionned for other variables, then it won't be part of the array list as desired for the code below. Otherwise, you may need to play a bit around either not to load the said variable in the above proc sql or by creating a sameasprior1 variable with a length statement and adding it to the array g1n definition so that all array dimensions are lined up. You would still need to further adjust loop on num arrays to begin @ 2*/

data step1;

     set have;

     array g1c {*} &g1c;

     array g1n {*} &g1n;

     array g2c {*} &g2c;

     array g2n {*} &g2n;

     array g3c {*} &g3c;

     array g3n {*} &g3n;

     if SameAsPrior2=1 then do;

          do i =1 to dim(g1c);

               g2c{i}=g1c{i};

          end;

          do i=1 to dim(g1n);

               g2n{i}=g1n{i};

          end;

          if SameAsPrior3=1 then do;

               do i =1 to dim(g1c);

                    g3c{i}=g1c{i};

               end;

               do i=1 to dim(g1n);

                    g3n{i}=g1n{i};

               end;

          end;

          else if SameAsPrior3=2 then do;

               do i =1 to dim(g1c);

                    g3c{i}=g2c{i};

               end;

               do i=1 to dim(g1n);

                    g3n{i}=g2n{i};

               end;

          end;

     end;  

run;

Some slight efficiency gain could be done by creating and retaining a numeric variable for dim(g1c) and dim(g1n). Code is obviously untested, I don't have nearly enough imagination to create an example dataset this wide but if you post log we can help debug

Vince

Dingdang
Fluorite | Level 6

hi Vince,

thanks very much for your help Smiley Happy  your code is understandable for a SAS beginner like me. I am going to try it out tomorrow when i get my SAS programm again ( it was gone after the company pc update Smiley Sad). i probably need a few adjustments like you mentioned. cannot wait to find out if it works Smiley Happy

BR  Dingdang

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1228 views
  • 1 like
  • 3 in conversation