BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ayin
Quartz | Level 8

Have:

dataset named 'colList':

 

ColNeed
Col_1
Col_2
Col_3

dataset named 'sample':

Drop 1 Col_2 Col_1 Nope_3
a      2     1      b

Want:

Col_1 Col_2 Col_3
1     2     .

Bascially, in 'sample', drop columns that are not listed in 'colList', add additional columns that listed in 'colList' but not already in 'sample'. Finally, maintain the same order as in 'colList' (from top to bottom).

 

Question:

Currently struggled to find the optimal codes, though it seems there might be multiple approaches.

 

Note:

To generate the data needed, please use:

data colList;
	input ColNeed $;
cards;
Col_1
Col_2
Col_3
;
run;

data sample;
	input 'Drop 1'n $ Col_2 Col_1 Nope_3 $;
cards;
a 2 1 b
;
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

First, thank you for asking what appeared to be a simple problem, but required quite a bit of thought to come up with a working solution.

 

As mentioned in my previous post I had thought that you might be confronting a discrepancy between how SAS and WPS function, which there is (though minor in this case), but neither will output variables unless they are explicitly initialized. Interestingly, though, both will accept missing values as an acceptable initialization value.

 

As such, try the following code. It appears to work correctly on both SAS and WPS and, more importantly, solves your current problem:

 

options validvarname=any;

data colList;
	input ColNeed $;
cards;
Col_1
Col_2
Col_3
;
run;

data sample;
	input 'Drop 1'n $ Col_2 Col_1 $ Nope_3 $;
cards;
a 2 1 b
;
run;

proc sql noprint;
  select ColNeed,
    case type
      when 'char' then catx(' ','retain',ColNeed,"' ';")
      else catx(' ','retain',ColNeed,".;")
    end
      into :keeps separated by ' ',
           :cols separated by ' '
        from colList a
          full join (select name,type from dictionary.columns
            where libname='WORK' and
                  memname='SAMPLE') b
              on a.ColNeed=b.name
                where ColNeed is not NULL
  ;
quit;

data want (keep=&keeps.);
  &cols.
  set sample;
run;

Art, CEO, AnalystFinder.com

 

 

View solution in original post

14 REPLIES 14
Reeza
Super User

This is a variation of your earlier problem. The solution is essentially the same. 

 

You can either use a PROC SQL to order vars or look at using a Length or retain in a datastep to maintain order. 

art297
Opal | Level 21

I would do it as follows:

 

proc sql noprint;
  select ColNeed into: cols separated by ' '
    from colList
  ;
quit;

data want (keep=&cols.);
  retain &cols.;
  set sample;
run;

Art, CEO, AnalystFinder.com

 

ayin
Quartz | Level 8

Hi Art,
Used exactly the same codes, but seems it's not picking up Col_3.

Codes:

data colList;
	input ColNeed $;
cards;
Col_1
Col_2
Col_3
;
run;

data sample;
	input 'Drop 1'n $ Col_2 Col_1 Nope_3 $;
cards;
a 2 1 b
;
run;

proc sql noprint;
  select ColNeed into: cols separated by ' '
    from colList
  ;
quit;

data want (keep=&cols.); /* this part has some problems */
  retain &cols.;
  set sample;
run;

Logs:

45         data want (keep=&cols.);
46           retain &cols.;
47           set sample;
48         run;

NOTE: Variable Col_3 is uninitialized.
WARNING: The variable Col_3 in the DROP, KEEP, or RENAME list has never been referenced.
NOTE: There were 1 observations read from the data set WORK.SAMPLE.
NOTE: The data set WORK.WANT has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Thanks!

 

Reeza
Super User

Post the code and log please. 

ayin
Quartz | Level 8
Please note Col_1 and Col_2 are not necessarily all numeric.
Reeza
Super User

@ayin wrote:
Please note Col_1 and Col_2 are not necessarily all numeric.

I don't believe that has any bearing on the solutions. Post your code and log that you say does not work. 

ayin
Quartz | Level 8
Please see posts above.

Array, Length can't be used here because Col_1, Col_2, Col_3 are not necessarily the same type (numeric/characters)
Reeza
Super User

@ayin if you edit an earlier post as a response that doesn't mean we'll automatically notice this. It also makes it hard to make sense of the thread if things change.

 

You're correct that RETAIN won't work. Do you have any other way to indicate the variable type or a value that it can be initialized to if the variable is missing from a dataset.

 

The other standard way to accomplish this is to have a dummy dataset that has all the attributes and then you append the empty table with your 'have' dataset and it will have all the desired traits. But you require more information than you currently have. 

 

 

Ksharp
Super User



options validvarname=any;

data colList;
	input ColNeed $;
cards;
Col_1
Col_2
Col_3
;
run;

data sample;
	input 'Drop 1'n $ Col_2 Col_1 Nope_3 $;
cards;
a 2 1 b
;
run;

%macro vname;
proc sql noprint;
  select ColNeed into: cols separated by ' '
    from colList;
    
  select count(*) into : n from  colList
   where upcase(ColNeed) not in (
    select upcase(name) as name from dictionary.columns
     where libname='WORK' and memname='SAMPLE');
     
%if &n ne 0 %then %do;
  select ColNeed into : name separated by ' ' from  colList
   where upcase(ColNeed) not in (
    select upcase(name) as name from dictionary.columns
     where libname='WORK' and memname='SAMPLE');
%end;
quit;
data want ;
  set sample;
%if &n ne 0 %then %do;
  retain &name. .;
%end;
  keep &cols.;
run;
%mend;

%vname


art297
Opal | Level 21

NOTE: The following post isn't correct, but I'm leaving it here to show the train of thought in trying to solve the OP's problem.

 

I don't know if this is or isn't the problem you're confronting, but a similar issue was just discussed on a different forum yesterday.

 

The behavior your getting is a current difference between the behaviors of SAS and WPS. Regardless of whether you're using SAS or WPS try the following:

 

proc sql noprint;
  select ColNeed into: cols separated by ' '
    from colList
  ;
quit;

data want (keep=&cols.);
  retain &cols. .;
  set sample;
run;

While SAS doesn't require it, WPS doesn't add a variable to its PDV unless it has been initialized or set to some value. Interestingly, missing is one of those values.

 

Art, CEO, AnalystFinder.com

 

Reeza
Super User

@art297 I had the same behaviour using SAS 9.4 M3

art297
Opal | Level 21

First, thank you for asking what appeared to be a simple problem, but required quite a bit of thought to come up with a working solution.

 

As mentioned in my previous post I had thought that you might be confronting a discrepancy between how SAS and WPS function, which there is (though minor in this case), but neither will output variables unless they are explicitly initialized. Interestingly, though, both will accept missing values as an acceptable initialization value.

 

As such, try the following code. It appears to work correctly on both SAS and WPS and, more importantly, solves your current problem:

 

options validvarname=any;

data colList;
	input ColNeed $;
cards;
Col_1
Col_2
Col_3
;
run;

data sample;
	input 'Drop 1'n $ Col_2 Col_1 $ Nope_3 $;
cards;
a 2 1 b
;
run;

proc sql noprint;
  select ColNeed,
    case type
      when 'char' then catx(' ','retain',ColNeed,"' ';")
      else catx(' ','retain',ColNeed,".;")
    end
      into :keeps separated by ' ',
           :cols separated by ' '
        from colList a
          full join (select name,type from dictionary.columns
            where libname='WORK' and
                  memname='SAMPLE') b
              on a.ColNeed=b.name
                where ColNeed is not NULL
  ;
quit;

data want (keep=&keeps.);
  &cols.
  set sample;
run;

Art, CEO, AnalystFinder.com

 

 

ayin
Quartz | Level 8

Thank you @art297!

It worked for the sample data listed above. When I tried to implement it on real data, however, all character values are reduced to only have length of 1.

 

For example, if you test it on

options validvarname=any;

data colList;
	input ColNeed $;
cards;
Col_1
Col_2
Col_3
;
run;

data sample;
	infile datalines dlm=",";
	input 'Drop 1'n $ Col_2 Col_1 $ Nope_3 $;
datalines;
GOOD,3,UNIVERSITY,cc
;
run;

The final output is 

Col_1 Col_2 Col_3
U     3     .

All numeric/date values are dealt perfectly.

 

Please let me know how we can tweak your codes a bit and keep character values to its orginal length?

art297
Opal | Level 21

Just move the retain macro one after the set statement:

 

data want (keep=&keeps.);
  set sample;
  &cols.
run;

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2137 views
  • 5 likes
  • 4 in conversation