Hello,
Please help.
I have this dataset:
data have;
input ID DCDate Direction $ ICode1 $ ICode2 $ ;
cards;
1 2 South Q22.x Q22.y
1 2 South Q22.x .
1 2 South Q22.y .
1 2 South Q23.x .
2 2 South Q22.x Q22.y
2 2 South Q22.x .
2 2 South Q22.y .
2 2 South Q23.x .
;
I want this output:
ID | Date | ICode1 | Icode2 |
1 | 2 | Q22.x | Q22.y |
1 | 2 | Q23.x | |
2 | 2 | Q22.x | Q22.y |
2 | 2 | Q23.x |
Transpose and sort to eliminate dups
data tall;
set have;
array icode[10] $8 ;
do i=1 to dim(icode);
if not missing(icode[i]) then do;
code=icode[i];
output;
end;
end;
keep id dcdate code ;
run;
proc sort data=tall nodupkey;
by id dcdate code;
run;
then transpose back if you are worried you will have more observations for ID than the number of ICODE variables you want to create (perhaps you want to make a dataset just for printing) then here is a method you can use to roll the tall database back into wide, but with limit on how wide.
data want ;
do i=1 to 2 until(last.dcdate);
set tall;
by id dcdate ;
array icode[2] $8 ;
icode[i] = code;
end;
drop i code;
run;
Transpose to long, then sort:
data long (rename=(dcdate=date));
set have;
array ic {*} icode:;
do i = 1 to dim(ic);
if not missing (ic{i}
then do;
icode = ic{i};
output;
enđ;
end;
keep id dcdate icode;
run;
proc sort data=long nodupkey;
by id date icode;
run;
Sorry,
I would like the following output
ID | DCDate | Direction | ICode1 | Icode2 |
1 | 2 | South | Q22.x | Q22.y |
1 | 2 | South | Q23.x | |
2 | 2 | South | Q22.x | Q22.y |
2 | 2 | South | Q23.x |
Keeping only the unique ICODEs for each ID group.
Why do you want only two icode columns?
ICode1 and ICode2, represent ICode to the nth in the original dataset.
I could drop the new ICODE in the new dataset I guess.
In reality, I have up to ICode10 with not all of them being filled.
I really just want for the group of IDs to capture only the unique ICode values.
Transpose and sort to eliminate dups
data tall;
set have;
array icode[10] $8 ;
do i=1 to dim(icode);
if not missing(icode[i]) then do;
code=icode[i];
output;
end;
end;
keep id dcdate code ;
run;
proc sort data=tall nodupkey;
by id dcdate code;
run;
then transpose back if you are worried you will have more observations for ID than the number of ICODE variables you want to create (perhaps you want to make a dataset just for printing) then here is a method you can use to roll the tall database back into wide, but with limit on how wide.
data want ;
do i=1 to 2 until(last.dcdate);
set tall;
by id dcdate ;
array icode[2] $8 ;
icode[i] = code;
end;
drop i code;
run;
You could also do it with hash objects in a single data step (assuming data already sorted by ID).
data want;
set have;
by id dcdate;
array icode[2] $8 ;
if _n_=1 then do;
declare hash h();
h.definekey('code');
h.definedata('code');
h.definedone();
end;
do i=1 to dim(icode);
code = icode[i];
if not missing(code) then rc=h.add();
end;
if last.id then do;
call missing(of code icode[*]);
declare hiter iter('h');
rc = iter.first();
do until (rc);
do i=1 to dim(icode) until (rc);
icode[i]=code;
rc=iter.next();
end;
output;
call missing(of icode[*]);
end;
rc = iter.delete();
rc = h.clear();
end;
drop rc i code;
run;
You need to explain more about what you want. Especially since the examples are so simple.
What is the role of ID and DCDATE? Do you want unique codes per ID? Or per ID and DCDATE combination? In your example there is no difference since DCDATE is a constant.
Why was DIRECTION included in the example data? It does not appear to be used at all?
What is the meaning of ICODE1 and ICODE2? What do you want to do if the same code appears in both variables for the same group? Which column should keep the value? Why are you keeping Q22.y in ICODE2 and not in ICODE1? What is the rule for deciding where it belongs?
Sorry,
ID is a unique Identifier.
DC date is the same because all codes happened on the same date
Think of Direction as a person's last name so it would be the same but unique to that person, as ID is unique to that person. What I really want to make sure of is that even if someone has the same last name they won't have the same ID since ID is unique and not names.
If in the group of the person's ID's there is a unique ICode then I want to capture it. Otherwise I don't wan it.
data want;
set have;
by ID DCDate Direction ICode1 descending icode2;
retain icode;
if first.id then icode=icode2;
if not first.icode1 or icode1=icode then delete;
drop icode;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.