Solved: Re: Labor Day Weekend Puzzle

art297 · Posted 08-29-2014 07:05 PM

suggested the answer to this puzzle earlier today on SAS-L. Since I never thought of doing what Matt suggested in the more than 40 years I have been using SAS, I thought it might be useful to see other's responses.

You have the following dataset:

data have;
  input a b c $ Zip d e f;
  cards; 
1 2 x 12345 6 7 8
2 3 y 54321 7 8 9
;

You want a dataset called 'want' that only contains a b c d e and zip AND in that specific order.

What would be the easiest (i.e., least code and quickest running) way of accomplishing the task.

P.S. If you've Matt or have already seen Matt's SAS-L post, don't respond!

PGStats · Posted 08-29-2014 10:14 PM

All right then, how about:

data want;

set have(keep=a--c d--e);

set have(keep=zip);

run;

Not quite as efficient, but more generalizable.

PG

View solution in original post

PGStats · Posted 08-29-2014 08:23 PM

data want(drop=zip rename=f=zip);

set have;

f=zip;

run;

PG

art297 · Posted 08-29-2014 08:47 PM

: Impressive! But FriedEgg's solution was more generalizable and didn't require having a variable available with the same case.

I won't post his solution just yet.

PGStats · Posted 08-29-2014 10:14 PM

All right then, how about:

data want;

set have(keep=a--c d--e);

set have(keep=zip);

run;

Not quite as efficient, but more generalizable.

PG

art297 · Posted 08-29-2014 10:44 PM

: Close enough! FriedEgg's solution was an efficient way of doing the same thing:

data want;
  if 0 then set have (drop=zip keep=a--e);
  set have (keep=a--e);
run;

PGStats · Posted 08-29-2014 10:54 PM

Thanks Art! I agree, FriedEgg's solution is more efficient. - PG

PG

stat_sas · Posted 08-29-2014 09:05 PM

My try

data want;

retain a b c d e zip;

set have(drop=f);

run;

art297 · Posted 08-29-2014 09:27 PM

stat@sas: Yes, that would definitely work, but imagine if a thru e actually represented about 500 variables. However, good suggestion!

Ksharp · Posted 08-30-2014 08:38 AM

Arthur.T ,

I think the fastest way is to create a view for that code ,and not to copy real table again.

Xia Keshan

art297 · Posted 08-30-2014 09:05 AM

: Possibly. Compare the two on small, medium and large files and let us know.

ScottBass · Posted 08-31-2014 09:52 PM

Here's my/yet another approach:

* problem as stated ;

data have;

input a b c $ Zip d e f;

cards;

1 2 x 12345 6 7 8

2 3 y 54321 7 8 9

;

run;

%let vars=a b c d e zip;

data want1;

format &vars;

set have (keep=&vars);

run;

* more realistic example ;

data have;

input b c $ f Zip d a e;

cards;

2 x 8 12345 6 1 7

3 y 9 54321 7 2 8

;

run;

%let vars=a b c d e zip;

data want2;

format &vars;

set have (keep=&vars);

run;

proc compare base=want1 compare=want2;

run;

* last approach, use a "skeleton" dataset as metadata ;

data skel;

length a b 8 c $8 d e Zip 8; * these must be in the desired order for the target dataset ;

call missing(of _all_); * could also add formats, labels, etc. ;

stop;

run;

proc contents data=skel out=columns (keep=name varnum) noprint;

run;

proc sql noprint;

select name into :vars separated by " " from columns order by varnum;

quit;

data want3;

* if any of your metadata columns are derived columns (i.e. not in your source dataset) ;

* remember the implied retain on dataset variables, or this can come back to bite you! ;

* IOW if skel contains a variable "foo" in order to set its PDV order, ;

* and "foo" is calculated in the data step, remember that it is retained in the data step. ;

if 0 then set skel;

set have (keep=&vars);

run;

proc compare base=want1 compare=want3;

run;

For the problem as stated, where your source data has the PDV in almost the exact order you want in the target dataset, approaches such as variable lists may use less code. However, they are prone to undesired results if the source data structure changes.

In a perhaps more realistic example where the source data has its PDV in an order unrelated to the desired target dataset, you're gonna have to list the variables out. If I had 500 variables, I'd proc print the data from proc contents output, reorder the variables in an editor, and create my program.

The last example is yet another approach. Not the least amount of code, but an approach I've sometimes used in the past.

HTH,

Scott

Added comments about implied retain of dataset variables

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

SAS Innovate 2025: Call for Content

Classroom Training Available!