croHello,
I have the following table
Name | Firstname | Adress1 | Hobby1 | Tel1 | Mobil1 | ... ... | SameAsPrior2 | Adress2 | Hobby2 | Tel2 | Mobil2 | ... ... | SameAsPrior3 | Adress3 | Hobby3 | Tel3 | Mobil3 | ... ... | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Lee | Mike | B23 | reading | 012 | 1234 | 1 | . | . | . | . | 1 | . | . | . | . | |||||||||||||||||||||
Mueller | Anna | B34 | biking | 987 | 1478 | 2 | B56 | cinema | 745 | 7856 | 2 | . | . | . | . | |||||||||||||||||||||
Vittel | Benjamin | B45 | hiking | 456 | 1456 | 2 | B24 | biking | 798 | 4561 | 3 | B98 | theater | 456 | 742 | |||||||||||||||||||||
Hung | Wu | B11 | theater | 478 | 5689 | 1 | . | . | . | . | 2 | . | . | . | . |
so i have 3 groups of variables, where each group has the same variables and there are many of them (say 200).
the variable SameAsPrior2 has 2 values, 1 means the values of the variables in group 2 are identical to those in group 1; 2 means the values of the variables in group 2 are not identical to those in group 1.
Analogous: the variable SameAsPrior3 has 3 values: 1 means the values of group 3 are identical to those of group 1; 2 means the values of group 3 are identical to those of group2; 3 means the values of group 3 should be different to those in group 1 and 2.
I want to do the following:
if SameAsPrior2=1, then copy the values of group 1 to group 2, no matter what stays currently in group 2. if SameAsPrior2=2, then leave the values of group 2 untouched.
if SameAsPrior3=1, then copy the values of group 1 to group 3. if SameAsPrior3=2, then copy the values of group 2 to group 3. If SameAsPrior3=3, then leave the values of group 3 untouched.
In this example the table will then look like:
Name | Firstname | Adress1 | Hobby1 | Tel1 | Mobil1 | ... ... | Adress2 | Hobby2 | Tel2 | Mobil2 | ... ... | Adress3 | Hobby3 | Tel3 | Mobil3 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Lee | Mike | B23 | reading | 012 | 1234 | B23 | reading | 012 | 1234 | B23 | reading | 012 | 1234 | ||||||||||||
Mueller | Anna | B34 | biking | 987 | 1478 | B56 | cinema | 745 | 7856 | B56 | cinema | 745 | 7856 |
can someone give me an example code for this problem? would be really thankful for that!!
BR Dingdang
Based on this update, here's how I would go about reading the appropriate variables in macros to help declare 6 arrays for processing.
Assume you have loaded your excel sheet in dataset HAVE in WORK library
proc sql;
select name
into :g1n separated by " "
where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_1"
;
select name
into :g1c separated by " "
where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_1"
;
select name
into :g2n separated by " "
where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_2"
;
select name
into :g2c separated by " "
where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_2"
;
select name
into :g3n separated by " "
where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_3"
;
select name
into :g3c separated by " "
where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_3"
;
quit;
/* NOTE: if sameasprior2 and sameasprior3 really do not have the _ mentionned for other variables, then it won't be part of the array list as desired for the code below. Otherwise, you may need to play a bit around either not to load the said variable in the above proc sql or by creating a sameasprior1 variable with a length statement and adding it to the array g1n definition so that all array dimensions are lined up. You would still need to further adjust loop on num arrays to begin @ 2*/
data step1;
set have;
array g1c {*} &g1c;
array g1n {*} &g1n;
array g2c {*} &g2c;
array g2n {*} &g2n;
array g3c {*} &g3c;
array g3n {*} &g3n;
if SameAsPrior2=1 then do;
do i =1 to dim(g1c);
g2c{i}=g1c{i};
end;
do i=1 to dim(g1n);
g2n{i}=g1n{i};
end;
if SameAsPrior3=1 then do;
do i =1 to dim(g1c);
g3c{i}=g1c{i};
end;
do i=1 to dim(g1n);
g3n{i}=g1n{i};
end;
end;
else if SameAsPrior3=2 then do;
do i =1 to dim(g1c);
g3c{i}=g2c{i};
end;
do i=1 to dim(g1n);
g3n{i}=g2n{i};
end;
end;
end;
run;
Some slight efficiency gain could be done by creating and retaining a numeric variable for dim(g1c) and dim(g1n). Code is obviously untested, I don't have nearly enough imagination to create an example dataset this wide but if you post log we can help debug
Vince
Arrays? Yes
Call execute? No
Macros? No, unless your list of variables is incredibly long
Assuming that you know the basics of SAS so that you know how to write a DATA step that brings in the data, here is the sort of code that you could add to that DATA step:
array addr {200} adress1-adress200;
array hobbys {200} hobby1-hobby200;
array tels {200} tel1-tel200;
array mobils {200} mobil1-mobile200;
array sameas {2:200} SameasPrior2-SameasPrior200;
do _i_=2 to 200;
if . < sameas{_i_} < _i_ then do;
addr{_i_} = addr{sameas{_i_}};
hobbys{_i_} = hobbys{sameas{_i_}};
...
mobils{_i_} = mobils{sameas{_i_}};
end;
end;
Good luck.
*edit* the editor crashed or something and I overwrote my previous post. I simply meant to restate that I believe OP had 200 named variables over 3 groups and not 200 groups of the 5 examplified named variables so in order to construct the arrays, you need either some awkward use of _CHARACTER_ and _NUMERIC or possibly the use of dictionary table VCOLUMN and some minor macroing to define the arrays.
If your table is built how I understood it and assuming that SameAsPrior2 and SameAsPrior3 are both numeric variables, I believe the following would work:
data step1;
length dummy 8.; /*dummy var to ease index calculation*/
retain dummy; /* to save computing time */
set have(drop=name firstname);
array ch {*} _CHARACTER_;
array nu {*} _NUMERIC_;
if _N_=1 then do;
dummy=dim(nu)/3; /* figure out how many variables are numeric */
end;
if SameAsPrior2=1 then do;
do i =1 to dummy;
nu(i+dummy)=nu(i);
end;
do i=1 to dim(ch)/3;
ch(i+dim(ch)/3)=ch(i);
end;
end;
if SameAsPrior3=1 then do;
do i=1 to dummy;
nu(i+2*dummy)=nu(i);
end;
do i=1 to dim(ch)/3;
ch(i+2*dim(ch)/3)=ch(i);
end;
end;
if SameAsPrior3=2 then do;
do i=1 to dummy;
nu(i+2*dummy) = nu(i+dummy);
end;
do i=1 to dim(ch)/3;
nu(i+2*dim(ch)/3)=nu(i+dim(ch)/3);
end;
end;
run;
data want;
merge have(keep=name firstname) step1(drop=dummy); /*1 to 1 remerge because I didn't want ot have to mess with -2/+2 index calculation for CH array */
run;
It would gain in efficiency to keep the full set and tweak relative referencing with some + and - but it's much smoother without a detailed visual mapping of the array to just make sure you have 3* however many variables you want.
If you named dummy SameAsPrior1 instead and that your entire table is character instead of mixed with numeric, you could've used :1, :2, :3 notation instead and improved array arithmetic referencing
If when SameAsPrior2=1 then SameAsPrior3=1, you could simplify into only 2 if conditions.
Vince
Vince,
You're probably right that the number of variables is large and the number of groups is small. Looking at the final table structure (and the boxes moving off to the right), I'm not sure that the number of groups would be limited to 3. At any rate, here are just some of the issues I would consider before proposing a final solution. (Please note that none of this is intended as a direct comment on the code that you posted.)
Is this a one-time application, or a many-times application? (Sample impact: can we hard-code the number of groups or should that be calculated by the program?)
Is the structure of the data known and reliable? For example, are we sure that Name and Firstname are the only two variables that should be omitted from the arrays? Are we sure of the internal order of the remaining variables?
Should the user be able to understand how the program works? As one example of how to simplify, it might be appropriate to have the user specify two macro variables:
%let num_prefix_list = Mobil ...;
%let char_prefix_list = Adress Hobby Tel ...;
From my experience, there are many solutions posted on this board that are 100% valid, in a technical sense. But the original poster can't understand them. There are plenty of cases where I would rather adjust my answer to be bulkier to code, and less flexible, but easier to understand.
Hi Vince and Astounding,
thank you both very much for your answers. Sorry I didn’t come back to you earlier. Was a bit stressed from other tasks. Here I try to formulate my problem again. There are indeed a large number of variables and 3 groups.
I am doing a kind of survey. People get a questionary in Excel with a few worksheets and there are variables, which are independent of the groups, which means there are more variables than just name and firstname that should be omitted from the arrays. they are i.e. on the first sheet. and then there come three seperate sheets to the groups, containing the same kind of variables, only belonging to different groups. The variables in the first group are named like “xx_1”, “yy_1” etc. the second group “xx_2” “yy_2” and so on.
the first of these three sheets includes the first group, and everyone has to fill in this sheet. In order to reduce the workload for the persons queried, I am putting a variable SameAsPrior2 on the top of the second sheet. if they choose 1, which means the values of the second group are identical to the first group, then they can leave this sheet and go to the third one. If they choose 2, then they have to fill in it. and analogous to the third sheet, they can choose if the values should be identical to one of the first sheets or they have to fill in per hand again. I am not allowed to use macro in Excel, because of security reasons. and because of the large num-ber of variables i am having, i dont want to use any “when, then” function in excel, to avoid any speed problem. At the end of the questionary I have a separate sheet to aggregate all the information filled. For the sheets to the groups I am leaving enough space for the group 2 and group 3. But in some cases they are of course empty.
I then collect the workingtables and read the aggregated sheets into SAS. For my analyse I don’t want certain variables to be empty, i.e. the variables with names starting with “xx”. That’s why I want the information correctly copied from other groups, which were filled. the internal order of the variables doesnt really matter.
I hope this time I made my problem clear. And thanks again for your help.
BR Dingdang
Based on this update, here's how I would go about reading the appropriate variables in macros to help declare 6 arrays for processing.
Assume you have loaded your excel sheet in dataset HAVE in WORK library
proc sql;
select name
into :g1n separated by " "
where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_1"
;
select name
into :g1c separated by " "
where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_1"
;
select name
into :g2n separated by " "
where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_2"
;
select name
into :g2c separated by " "
where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_2"
;
select name
into :g3n separated by " "
where libname="WORK" and memname="HAVE" and type="num" and substr(trim(name), -1, 2)="_3"
;
select name
into :g3c separated by " "
where libname="WORK" and memname="HAVE" and type="char" and substr(trim(name), -1, 2)="_3"
;
quit;
/* NOTE: if sameasprior2 and sameasprior3 really do not have the _ mentionned for other variables, then it won't be part of the array list as desired for the code below. Otherwise, you may need to play a bit around either not to load the said variable in the above proc sql or by creating a sameasprior1 variable with a length statement and adding it to the array g1n definition so that all array dimensions are lined up. You would still need to further adjust loop on num arrays to begin @ 2*/
data step1;
set have;
array g1c {*} &g1c;
array g1n {*} &g1n;
array g2c {*} &g2c;
array g2n {*} &g2n;
array g3c {*} &g3c;
array g3n {*} &g3n;
if SameAsPrior2=1 then do;
do i =1 to dim(g1c);
g2c{i}=g1c{i};
end;
do i=1 to dim(g1n);
g2n{i}=g1n{i};
end;
if SameAsPrior3=1 then do;
do i =1 to dim(g1c);
g3c{i}=g1c{i};
end;
do i=1 to dim(g1n);
g3n{i}=g1n{i};
end;
end;
else if SameAsPrior3=2 then do;
do i =1 to dim(g1c);
g3c{i}=g2c{i};
end;
do i=1 to dim(g1n);
g3n{i}=g2n{i};
end;
end;
end;
run;
Some slight efficiency gain could be done by creating and retaining a numeric variable for dim(g1c) and dim(g1n). Code is obviously untested, I don't have nearly enough imagination to create an example dataset this wide but if you post log we can help debug
Vince
hi Vince,
thanks very much for your help your code is understandable for a SAS beginner like me. I am going to try it out tomorrow when i get my SAS programm again ( it was gone after the company pc update ). i probably need a few adjustments like you mentioned. cannot wait to find out if it works
BR Dingdang
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.