- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Please help.
I have this dataset:
data have;
input ID DCDate Direction $ ICode1 $ ICode2 $ ;
cards;
1 2 South Q22.x Q22.y
1 2 South Q22.x .
1 2 South Q22.y .
1 2 South Q23.x .
2 2 South Q22.x Q22.y
2 2 South Q22.x .
2 2 South Q22.y .
2 2 South Q23.x .
;
I want this output:
ID | Date | ICode1 | Icode2 |
1 | 2 | Q22.x | Q22.y |
1 | 2 | Q23.x | |
2 | 2 | Q22.x | Q22.y |
2 | 2 | Q23.x |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Transpose and sort to eliminate dups
data tall;
set have;
array icode[10] $8 ;
do i=1 to dim(icode);
if not missing(icode[i]) then do;
code=icode[i];
output;
end;
end;
keep id dcdate code ;
run;
proc sort data=tall nodupkey;
by id dcdate code;
run;
then transpose back if you are worried you will have more observations for ID than the number of ICODE variables you want to create (perhaps you want to make a dataset just for printing) then here is a method you can use to roll the tall database back into wide, but with limit on how wide.
data want ;
do i=1 to 2 until(last.dcdate);
set tall;
by id dcdate ;
array icode[2] $8 ;
icode[i] = code;
end;
drop i code;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Transpose to long, then sort:
data long (rename=(dcdate=date));
set have;
array ic {*} icode:;
do i = 1 to dim(ic);
if not missing (ic{i}
then do;
icode = ic{i};
output;
enđ;
end;
keep id dcdate icode;
run;
proc sort data=long nodupkey;
by id date icode;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry,
I would like the following output
ID | DCDate | Direction | ICode1 | Icode2 |
1 | 2 | South | Q22.x | Q22.y |
1 | 2 | South | Q23.x | |
2 | 2 | South | Q22.x | Q22.y |
2 | 2 | South | Q23.x |
Keeping only the unique ICODEs for each ID group.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why do you want only two icode columns?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ICode1 and ICode2, represent ICode to the nth in the original dataset.
I could drop the new ICODE in the new dataset I guess.
In reality, I have up to ICode10 with not all of them being filled.
I really just want for the group of IDs to capture only the unique ICode values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Transpose and sort to eliminate dups
data tall;
set have;
array icode[10] $8 ;
do i=1 to dim(icode);
if not missing(icode[i]) then do;
code=icode[i];
output;
end;
end;
keep id dcdate code ;
run;
proc sort data=tall nodupkey;
by id dcdate code;
run;
then transpose back if you are worried you will have more observations for ID than the number of ICODE variables you want to create (perhaps you want to make a dataset just for printing) then here is a method you can use to roll the tall database back into wide, but with limit on how wide.
data want ;
do i=1 to 2 until(last.dcdate);
set tall;
by id dcdate ;
array icode[2] $8 ;
icode[i] = code;
end;
drop i code;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You could also do it with hash objects in a single data step (assuming data already sorted by ID).
data want;
set have;
by id dcdate;
array icode[2] $8 ;
if _n_=1 then do;
declare hash h();
h.definekey('code');
h.definedata('code');
h.definedone();
end;
do i=1 to dim(icode);
code = icode[i];
if not missing(code) then rc=h.add();
end;
if last.id then do;
call missing(of code icode[*]);
declare hiter iter('h');
rc = iter.first();
do until (rc);
do i=1 to dim(icode) until (rc);
icode[i]=code;
rc=iter.next();
end;
output;
call missing(of icode[*]);
end;
rc = iter.delete();
rc = h.clear();
end;
drop rc i code;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You need to explain more about what you want. Especially since the examples are so simple.
What is the role of ID and DCDATE? Do you want unique codes per ID? Or per ID and DCDATE combination? In your example there is no difference since DCDATE is a constant.
Why was DIRECTION included in the example data? It does not appear to be used at all?
What is the meaning of ICODE1 and ICODE2? What do you want to do if the same code appears in both variables for the same group? Which column should keep the value? Why are you keeping Q22.y in ICODE2 and not in ICODE1? What is the rule for deciding where it belongs?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry,
ID is a unique Identifier.
DC date is the same because all codes happened on the same date
Think of Direction as a person's last name so it would be the same but unique to that person, as ID is unique to that person. What I really want to make sure of is that even if someone has the same last name they won't have the same ID since ID is unique and not names.
If in the group of the person's ID's there is a unique ICode then I want to capture it. Otherwise I don't wan it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
set have;
by ID DCDate Direction ICode1 descending icode2;
retain icode;
if first.id then icode=icode2;
if not first.icode1 or icode1=icode then delete;
drop icode;
run;