Hi siva,
Based on your requirement i wrote the following code; I've assumed you've one (group, newstore) as key , another (group, modelid). Is there any chance that you'll have same model id in different groups? For the current scenario i kept the array limit as 3. but you can change that based on your need.
data g_s_m;
input Group Newstore Modelid;
datalines;
10549 6453 5582
10549 6453 5217
10549 6453 2871
10549 2001 5582
10549 2001 5217
10549 2001 2871
10548 9648 6453
10548 9648 3251
10548 9648 2205
10548 9453 6453
10548 9453 3251
10548 9453 2205
;
/* source 1 */
Proc sql;
create table temp as
select
g_s.group, g_s.newstore,g_m.modelid
from
(Select distinct
group
, newstore
from g_s_m
) g_s,
(select distinct
group
, modelid
from g_s_m) as g_m
where g_s.group=g_m.group
order by group, newstore, modelid;
quit;
proc print;run;
/*I realised, For the subset of data provided creation of source1 produced exactly the same result as g_s_m */ . So instead of that directly we can use the g_s_m table */
/* using source1 to create source2 with unique modelid at group level */
proc sql;
create table group_modelid as
select distinct group, modelid from
temp
order by group, modelid desc
;
quit;
/* transpose modelid at group */
proc transpose data=group_modelid out =grp_col_mod prefix=model_;
by group;
var modelid;
run;
proc sql;
create table input1
as
select distinct group, newstore
from temp;
quit;
data final;
merge input1 (in=a) grp_col_mod (in=b);
by group;
if a and b;
run;
data final1(keep=group newstore final_mod);
set final;
by group;
array model_sel model_1-model_3;
if first.group then index=1;
else index+1;
final_mod=model_sel{index};
run;
proc print;
run;
Message was edited by: Jeyvinth Manoj Rayan
Hi,
i ran the sample code which you gave it was as i expected ..
Could you please explain the below sample code
data final1(keep=group newstore final_mod);
set final;
by group;
array model_sel model_1-model_3;
if first.group then index=1;
else index+1;
final_mod=model_sel{index};//What this line will do
run;
proc print;
run;
Sivanandam,
I didn't write the code, but can definitely explain it.
The file, final, apparently has your values in multiple records in the order that you want them, in variables model_1 to model_3, with the desired value for the first record (in each group) in model_1, the second record in Model_2, etc.
The code keeps track of which record it is looking at by using
if first.group then index=1;
else index+1;
Then, it selects final_mod by picking the first, second or third value, respectively:
final_mod=model_sel{index}; /*<-What this line will do*/
Thanks for jumping in to explain Art. appreciated!. i was away for sometime.
Basically what i'm doing here is
1. My input1 is ordered by what you want at group, newstore.
2. I transposed the modelid by group level to have unique model id.
3. so once i merge in the above step (Input1, grp_col_mod). I'm actually appending the unique/distict model id at group level with
my "input1".
if first.group then index=1;
else index+1;
final_mod=model_sel{index};//What this line will do
4. Now my first two columns are ready,(i mean group, newstore). So every group. i'm assiging the modelid staring from 1 to 3(in this case).
if a group has multiple records i'm incrementing my index to fetch index+1 record.
Hope this clarifes. welcome your queries if you still not clear about it. Let us know about the other scenarios if it's not covered.
Hi,
Not sure but will this work.
data a;
infile datalines delimiter='|';
input a : $10. b : $10. c : $10.;
datalines;
10549|6453|5582
10549|6453|5217
10549|6453|2871
10549|2001|5582
10549|2001|5217
10549|2001|2871
10548|9648|6453
10548|9648|3251
10548|9648|2205
10548|9453|6453
10548|9453|3251
10548|9453|2205
;
run;
proc sort data=a;
by descending a descending b descending c;
run;
data b;
set a;
by descending a descending b descending c;
retain _b _c;
format _b _c $100.;
if first.a then do;
_b=b;
_c=c;
flag=1;
end;
else do;
if findw(_b,b,',','T')=0 then do;
if findw(_c,c,',','T') = 0 then do;
flag=1;
_c=cats(_c,',', c);
_b=cats(_b,',', b);
end;
end;
end;
if flag=1 then output;
drop _b _c;
run;
Your original data seems to have all new stores matched to all model ids.
If this is true then you can use a data step many-to-many merge on GROUP to assign stores to models.
data have ;
input Group Newstore Modelid ;
cards;
10549 6453 5582
10549 6453 5217
10549 6453 2871
10549 2001 5582
10549 2001 5217
10549 2001 2871
10548 9648 6453
10548 9648 3251
10548 9648 2205
10548 9453 6453
10548 9453 3251
10548 9453 2205
run;
proc sort data=have(keep=group modelid) out=modelids nodupkey;
by group modelid;
run;
proc sort data=have(keep=group newstore) out=stores nodupkey;
by group newstore;
run;
* Need to reset the IN flag to keep it from matching the last store to all of the remaining models ;
data want ;
in1=0;
merge stores(in=in1) modelids;
by group ;
if in1;
put group newstore modelid ;
run;
If you want to introduce some randomness into the issue then sort the stores and models randomly within the groups before the merge.
Nice!. reseting the in flag is a new learning for me
Hi Tom,
Could you please epalin about IN flag in detail so that we can learn about that.
Thanks,
Siva
Thanks Art.
The flags created by the IN= dataset option will be boolean flags indicating whether the dataset contributed to the current observation.
For 1-to-1 and 1-to-N matches the meaning is clear.
But when you have N-to-M matches the logic is a little more complex. SAS will basically match the records 1 to 1 until it runs out of records for that BY group from one of the sources. In that case the result is that SAS will carry forward the variables for the last observation from that dataset for the rest of the BY group. This includes the IN= flag. Resetting the flag (either at the end of the step or before the MERGE statement at the beginning of the step) will cause it to not be set until SAS reads another observation from that dataset. You might want to also add a CALL MISSING statement to force the variables to missing as they will also retain their values. You do not need that in this program for the variables coming from the STORES dataset, but you might need it for the variables coming from the MODELIDS dataset if it is possbile to have more newstores than modelids for a group.
I noticed that the code you said worked didn't actually provide what you originally said was the desired result, thus I presume that it doesn't make any difference which modelid is selected second, as long as the newstore and modelid aren't the same as the first one.
If that is correct, couldn't you just use some simple code like:
data g_s_m;
input Group Newstore Modelid;
datalines;
10549 6453 5582
10549 6453 5217
10549 6453 2871
10549 2001 5582
10549 2001 5217
10549 2001 2871
10548 9648 6453
10548 9648 3251
10548 9648 2205
10548 9453 6453
10548 9453 3251
10548 9453 2205
;
data want (keep=group newstore modelid);
set g_s_m;
by group newstore notsorted;
retain first_newstore first_modelid;
if first.group and first.newstore then do;
n=1;
first_newstore=newstore;
first_modelid=modelid;
output;
end;
else if newstore ne first_newstore and n eq 1 then do;
if modelid ne first_modelid then do;
n+1;
output;
end;
end;
run;
Of course, if you data aren't already grouped by group and, within each group grouped by newstore, you would first have to sort the data. E.g., using something like:
proc sort data=g_s_m;
by group newstore;
run;
HI All,
Thanks for all your response.It's very usefull.
am currently working with various input scenerios.Once it complete let you know.
Thanks,
Siva.
Sorry ,so late to reponse you.
Tom has already gave you an answer, My code is another way if you would like.
"it varies based up on group number."
So I assume that modelid is different from each group.
data have ; input Group Newstore Modelid ; cards; 10549 6453 5582 10549 6453 5217 10549 6453 2871 10549 2001 5582 10549 2001 5217 10549 2001 2871 10548 9648 6453 10548 9648 3251 10548 9648 2205 10548 9453 6453 10548 9453 3251 10548 9453 2205 run; proc sort data=have(keep=group modelid) out=modelids nodupkey; by group modelid; run; proc sort data=have(keep=group newstore) out=newstore nodupkey; by group newstore; run; data result(where=(newstore is not missing)); merge newstore modelids; by group; output; call missing(newstore); run;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.