DATA Step, Macro, Functions and more

Labor Day Weekend Puzzle

Accepted Solution Solved
Reply
PROC Star
Posts: 7,363
Accepted Solution

Labor Day Weekend Puzzle

suggested the answer to this puzzle earlier today on SAS-L. Since I never thought of doing what Matt suggested in the more than 40 years I have been using SAS, I thought it might be useful to see other's responses.

You have the following dataset:

data have;

  input a b c $ Zip d e f;

  cards;

1 2 x 12345 6 7 8

2 3 y 54321 7 8 9

;

You want a dataset called 'want' that only contains a b c d e and zip AND in that specific order.

What would be the easiest (i.e., least code and quickest running) way of accomplishing the task.

P.S. If you've Matt or have already seen Matt's SAS-L post, don't respond!


Accepted Solutions
Solution
‎08-29-2014 10:14 PM
Respected Advisor
Posts: 4,651

Re: Labor Day Weekend Puzzle

All right then, how about:

data want;

set have(keep=a--c d--e);

set have(keep=zip);

run;

Not quite as efficient, but more generalizable.

PG

PG

View solution in original post


All Replies
Respected Advisor
Posts: 4,651

Re: Labor Day Weekend Puzzle

data want(drop=zip rename=f=zip);

set have;

f=zip;

run;

PG
PROC Star
Posts: 7,363

Re: Labor Day Weekend Puzzle

: Impressive! But FriedEgg's solution was more generalizable and didn't require having a variable available with the same case.

I won't post his solution just yet.

Solution
‎08-29-2014 10:14 PM
Respected Advisor
Posts: 4,651

Re: Labor Day Weekend Puzzle

All right then, how about:

data want;

set have(keep=a--c d--e);

set have(keep=zip);

run;

Not quite as efficient, but more generalizable.

PG

PG
PROC Star
Posts: 7,363

Re: Labor Day Weekend Puzzle

: Close enough! FriedEgg's solution was an efficient way of doing the same thing:

data want;

  if 0 then set have (drop=zip keep=a--e);

  set have (keep=a--e);

run;

Respected Advisor
Posts: 4,651

Re: Labor Day Weekend Puzzle

Thanks Art! I agree, FriedEgg's solution is more efficient. - PG

PG
Trusted Advisor
Posts: 1,204

Re: Labor Day Weekend Puzzle

My try

data want;

retain a b c d e zip;

set have(drop=f);

run;

PROC Star
Posts: 7,363

Re: Labor Day Weekend Puzzle

stat@sas: Yes, that would definitely work, but imagine if a thru e actually represented about 500 variables. However, good suggestion!

Super User
Posts: 9,682

Re: Labor Day Weekend Puzzle

Arthur.T ,

I think the fastest way is to create a view for that code ,and not to copy real table again. Smiley Happy

Xia Keshan

PROC Star
Posts: 7,363

Re: Labor Day Weekend Puzzle

: Possibly. Compare the two on small, medium and large files and let us know.

Super Contributor
Posts: 376

Re: Labor Day Weekend Puzzle

Here's my/yet another approach:

* problem as stated ;

data have;

  input a b c $ Zip d e f;

  cards;

1 2 x 12345 6 7 8

2 3 y 54321 7 8 9

;

run;

%let vars=a b c d e zip;

data want1;

   format &vars;

   set have (keep=&vars);

run;

* more realistic example ;

data have;

  input b c $ f Zip d a e;

  cards;

2 x 8 12345 6 1 7

3 y 9 54321 7 2 8

;

run;

%let vars=a b c d e zip;

data want2;

   format &vars;

   set have (keep=&vars);

run;

proc compare base=want1 compare=want2;

run;

* last approach, use a "skeleton" dataset as metadata ;

data skel;

  length a b 8 c $8 d e Zip 8* these must be in the desired order for the target dataset ;

  call missing(of _all_);       * could also add formats, labels, etc. ;       

  stop;

run;

proc contents data=skel out=columns (keep=name varnum) noprint;

run;

proc sql noprint;

  select name into :vars separated by " " from columns order by varnum;

quit;

data want3;

* if any of your metadata columns are derived columns (i.e. not in your source dataset) ;

* remember the implied retain on dataset variables, or this can come back to bite you! ;

* IOW if skel contains a variable "foo" in order to set its PDV order, ;

* and "foo" is calculated in the data step, remember that it is retained in the data step. ;

  if 0 then set skel;

  set have (keep=&vars);

run;

proc compare base=want1 compare=want3;

run;

For the problem as stated, where your source data has the PDV in almost the exact order you want in the target dataset, approaches such as variable lists may use less code.  However, they are prone to undesired results if the source data structure changes.

In a perhaps more realistic example where the source data has its PDV in an order unrelated to the desired target dataset, you're gonna have to list the variables out.  If I had 500 variables, I'd proc print the data from proc contents output, reorder the variables in an editor, and create my program.

The last example is yet another approach.  Not the least amount of code, but an approach I've sometimes used in the past.

HTH,

Scott

Added comments about implied retain of dataset variables

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 347 views
  • 9 likes
  • 5 in conversation