Fetching records with unique value

jonam · Posted 09-10-2011 04:13 AM

Hi siva,

Based on your requirement i wrote the following code; I've assumed you've one (group, newstore) as key , another (group, modelid). Is there any chance that you'll have same model id in different groups? For the current scenario i kept the array limit as 3. but you can change that based on your need.

data g_s_m;

input Group Newstore Modelid;

datalines;

10549 6453 5582

10549 6453 5217

10549 6453 2871

10549 2001 5582

10549 2001 5217

10549 2001 2871

10548 9648 6453

10548 9648 3251

10548 9648 2205

10548 9453 6453

10548 9453 3251

10548 9453 2205

;

/* source 1 */

Proc sql;

create table temp as

select

g_s.group, g_s.newstore,g_m.modelid

from

(Select distinct

group

, newstore

from g_s_m

) g_s,

(select distinct

group

, modelid

from g_s_m) as g_m

where g_s.group=g_m.group

order by group, newstore, modelid;

quit;

proc print;run;

/*I realised, For the subset of data provided creation of source1 produced exactly the same result as g_s_m */ . So instead of that directly we can use the g_s_m table */

/* using source1 to create source2 with unique modelid at group level */

proc sql;

create table group_modelid as

select distinct group, modelid from

temp

order by group, modelid desc

;

quit;

/* transpose modelid at group */

proc transpose data=group_modelid out =grp_col_mod prefix=model_;

by group;

var modelid;

run;

proc sql;

create table input1

as

select distinct group, newstore

from temp;

quit;

data final;

merge input1 (in=a) grp_col_mod (in=b);

by group;

if a and b;

run;

data final1(keep=group newstore final_mod);

set final;

by group;

array model_sel model_1-model_3;

if first.group then index=1;

else index+1;

final_mod=model_sel{index};

run;

proc print;

run;

Message was edited by: Jeyvinth Manoj Rayan

Sivanandam · Posted 09-10-2011 11:00 AM

Hi,

i ran the sample code which you gave it was as i expected ..

Could you please explain the below sample code

data final1(keep=group newstore final_mod);

set final;

by group;

array model_sel model_1-model_3;

if first.group then index=1;

else index+1;

final_mod=model_sel{index};//What this line will do

run;

proc print;

run;

art297 · Posted 09-10-2011 11:46 AM

Sivanandam,

I didn't write the code, but can definitely explain it.

The file, final, apparently has your values in multiple records in the order that you want them, in variables model_1 to model_3, with the desired value for the first record (in each group) in model_1, the second record in Model_2, etc.

The code keeps track of which record it is looking at by using

if first.group then index=1;

else index+1;

Then, it selects final_mod by picking the first, second or third value, respectively:

final_mod=model_sel{index}; /*<-What this line will do*/

jonam · Posted 09-11-2011 03:35 AM

Thanks for jumping in to explain Art. appreciated!. i was away for sometime.

Basically what i'm doing here is

1. My input1 is ordered by what you want at group, newstore.

2. I transposed the modelid by group level to have unique model id.

3. so once i merge in the above step (Input1, grp_col_mod). I'm actually appending the unique/distict model id at group level with

my "input1".

if first.group then index=1;

else index+1;

final_mod=model_sel{index};//What this line will do

4. Now my first two columns are ready,(i mean group, newstore). So every group. i'm assiging the modelid staring from 1 to 3(in this case).

if a group has multiple records i'm incrementing my index to fetch index+1 record.

Hope this clarifes. welcome your queries if you still not clear about it. Let us know about the other scenarios if it's not covered.

NN · Posted 09-10-2011 04:49 AM

Hi,

Not sure but will this work.

data a;

infile datalines delimiter='|';

input a : $10. b : $10. c : $10.;

datalines;

10549|6453|5582

10549|6453|5217

10549|6453|2871

10549|2001|5582

10549|2001|5217

10549|2001|2871

10548|9648|6453

10548|9648|3251

10548|9648|2205

10548|9453|6453

10548|9453|3251

10548|9453|2205

;

run;

proc sort data=a;

by descending a descending b descending c;

run;

data b;

set a;

by descending a descending b descending c;

retain _b _c;

format _b _c $100.;

if first.a then do;

_b=b;

_c=c;

flag=1;

end;

else do;

if findw(_b,b,',','T')=0 then do;

if findw(_c,c,',','T') = 0 then do;

flag=1;

_c=cats(_c,',', c);

_b=cats(_b,',', b);

end;

if flag=1 then output;

drop _b _c;

run;

Tom · Posted 09-10-2011 02:19 PM

Your original data seems to have all new stores matched to all model ids.

If this is true then you can use a data step many-to-many merge on GROUP to assign stores to models.

data have ;

input Group Newstore Modelid ;

cards;

10549 6453 5582

10549 6453 5217

10549 6453 2871

10549 2001 5582

10549 2001 5217

10549 2001 2871

10548 9648 6453

10548 9648 3251

10548 9648 2205

10548 9453 6453

10548 9453 3251

10548 9453 2205

run;

proc sort data=have(keep=group modelid) out=modelids nodupkey;

by group modelid;

run;

proc sort data=have(keep=group newstore) out=stores nodupkey;

by group newstore;

run;

* Need to reset the IN flag to keep it from matching the last store to all of the remaining models ;

data want ;

in1=0;

merge stores(in=in1) modelids;

by group ;

if in1;

put group newstore modelid ;

run;

If you want to introduce some randomness into the issue then sort the stores and models randomly within the groups before the merge.

jonam · Posted 09-11-2011 03:45 AM

Nice!. reseting the in flag is a new learning for me

Sivanandam · Posted 09-11-2011 02:07 PM

Hi Tom,

Could you please epalin about IN flag in detail so that we can learn about that.

Thanks,

Siva

art297 · Posted 09-11-2011 02:18 PM

the in= set option is described at:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000131134.htm

Sivanandam · Posted 09-12-2011 12:48 AM

Thanks Art.

Tom · Posted 09-11-2011 02:22 PM

The flags created by the IN= dataset option will be boolean flags indicating whether the dataset contributed to the current observation.

For 1-to-1 and 1-to-N matches the meaning is clear.

But when you have N-to-M matches the logic is a little more complex. SAS will basically match the records 1 to 1 until it runs out of records for that BY group from one of the sources. In that case the result is that SAS will carry forward the variables for the last observation from that dataset for the rest of the BY group. This includes the IN= flag. Resetting the flag (either at the end of the step or before the MERGE statement at the beginning of the step) will cause it to not be set until SAS reads another observation from that dataset. You might want to also add a CALL MISSING statement to force the variables to missing as they will also retain their values. You do not need that in this program for the variables coming from the STORES dataset, but you might need it for the variables coming from the MODELIDS dataset if it is possbile to have more newstores than modelids for a group.

art297 · Posted 09-10-2011 05:17 PM

I noticed that the code you said worked didn't actually provide what you originally said was the desired result, thus I presume that it doesn't make any difference which modelid is selected second, as long as the newstore and modelid aren't the same as the first one.

If that is correct, couldn't you just use some simple code like:

data g_s_m;

input Group Newstore Modelid;

datalines;

10549 6453 5582

10549 6453 5217

10549 6453 2871

10549 2001 5582

10549 2001 5217

10549 2001 2871

10548 9648 6453

10548 9648 3251

10548 9648 2205

10548 9453 6453

10548 9453 3251

10548 9453 2205

;

data want (keep=group newstore modelid);

set g_s_m;

by group newstore notsorted;

retain first_newstore first_modelid;

if first.group and first.newstore then do;

n=1;

first_newstore=newstore;

first_modelid=modelid;

output;

end;

else if newstore ne first_newstore and n eq 1 then do;

if modelid ne first_modelid then do;

n+1;

output;

end;

run;

Of course, if you data aren't already grouped by group and, within each group grouped by newstore, you would first have to sort the data. E.g., using something like:

proc sort data=g_s_m;

by group newstore;

run;

Sivanandam · Posted 09-11-2011 01:15 AM

HI All,

Thanks for all your response.It's very usefull.

am currently working with various input scenerios.Once it complete let you know.

Thanks,

Siva.

Ksharp · Posted 09-13-2011 10:43 PM

Sorry ,so late to reponse you.

Tom has already gave you an answer, My code is another way if you would like.

"it varies based up on group number."

So I assume that modelid is different from each group.

data have ;
  input Group Newstore Modelid ;
cards;
10549 6453 5582 
10549 6453 5217 
10549 6453 2871 
10549 2001 5582 
10549 2001 5217 
10549 2001 2871
10548 9648 6453 
10548 9648 3251 
10548 9648 2205 
10548 9453 6453 
10548 9453 3251 
10548 9453 2205 
run;
proc sort data=have(keep=group modelid) out=modelids nodupkey;
  by group modelid;
run;
proc sort data=have(keep=group newstore) out=newstore nodupkey;
 by group newstore;
run;
data result(where=(newstore is not missing));
 merge newstore modelids;
 by group; output;
 call missing(newstore);
run;

Ksharp

Re: Fetching records with unique value