BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rush_milo
Obsidian | Level 7

Dear SAS communities,

 

Below is the 'Have' table and I would like to dedup it into the 'Want' table as below. The end goal is to have a smaller 'Have' table since the current one has multiple observations with different month/date. 

 

Have Remark
/sas/table_hist_m01.sas7bdat Keep, but rename _m01 to mXX
/sas/table_hist_m02.sas7bdat Remove
/sas/table_hist_m03.sas7bdat Remove
/sas/table_hist_m04.sas7bdat Remove
/sas/ww0.sas7bdat Keep, no change due to single table
/sas/table_new_20210301.sas7bdat Keep, but rename _20210301 to _YYYYMMDD
/sas/table_new_20210302.sas7bdat Remove
/sas/table_new_20210303.sas7bdat Remove
/sas/table_new_20210304.sas7bdat Remove
   
Want  
/sas/table_hist_mXX.sas7bdat  
/sas/ww0.sas7bdat  
/sas/table_new_YYYYMMDD.sas7bdat  


May I ask if there's an efficient way of doing this in SAS? Appreciate your kind advice. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Another way

data WANT;
  set HAVE(keep=PATH);
  PATH = prxchange('s/_m\d\d(\.sas7bdat)/_mXX\1/'     , 1, PATH);
  PATH = prxchange('s/_\d{8}(\.sas7bdat)/_YYYYMMDD\1/', 1, PATH);
  if PATH ne lag(PATH); 
  ID+1;
run;
PATH ID
/sas/table_hist_mXX.sas7bdat 1
/sas/ww0.sas7bdat 2
/sas/table_new_YYYYMMDD.sas7bdat 3

 

View solution in original post

12 REPLIES 12
tarheel13
Rhodochrosite | Level 12

Can you please post your data as datalines? It looks like prxmatch might work. 

rush_milo
Obsidian | Level 7

Hi @tarheel13 

 

Sorry here's the datalines:

 

 

DATA HAVE;
	  INFILE DATALINES DELIMITER=',' DSD;
	  INPUT id PATH & $50. REMARK & $50. ;
	  DATALINES;
  1, "/sas/table_hist_m01.sas7bdat", "Keep but rename _m01 to mXX"
  2, "/sas/table_hist_m02.sas7bdat", "Remove"
  3, "/sas/table_hist_m03.sas7bdat", "Remove"
  4, "/sas/table_hist_m04.sas7bdat", "Remove"
  5, "/sas/ww0.sas7bdat", "Keep, no change due to single table"
  6, "/sas/table_new_20210301.sas7bdat", "Keep, but rename _20210301 to _YYYYMMDD"
  7, "/sas/table_new_20210302.sas7bdat", "Remove"
  8, "/sas/table_new_20210303.sas7bdat", "Remove"
  9, "/sas/table_new_20210304.sas7bdat", "Remove"
RUN;
DATA WANT;
	  INFILE DATALINES DELIMITER=',' DSD;
	  INPUT id PATH & $50.;
	  DATALINES;
  1, "/sas/table_hist_m01.sas7bdat"
  2, "/sas/ww0.sas7bdat"
  3, "/sas/table_new_YYYYMMDD.sas7bdat"
RUN;

 

 

 

rush_milo
Obsidian | Level 7

Sorry @tarheel13 the 'Want' datalines should be as below. Corrected m01 to mXX.

DATA WANT;
	  INFILE DATALINES DELIMITER=',' DSD;
	  INPUT id PATH & $50.;
	  DATALINES;
  1, "/sas/table_hist_mXX.sas7bdat"
  2, "/sas/ww0.sas7bdat"
  3, "/sas/table_new_YYYYMMDD.sas7bdat"
RUN;
SASKiwi
PROC Star

So where does the title come into what you want? Do you want to concatenate the rows in m01 - m04 into dataset mxx for example?

rush_milo
Obsidian | Level 7

Hi @SASKiwi,

Sorry for not including the datalines. Basically I want the 'Path' column to be deduped into a single line only. That is from:

  • m01, m02, m03  --> To: mXX
  • /sas/table_new_20210301.sas7bdat, /sas/table_new_20210302.sas7bdat --> /sas/table_new_YYYYMMDD.sas7bdat

Below are the datalines for reference:

 

DATA HAVE;
	  INFILE DATALINES DELIMITER=',' DSD;
	  INPUT id PATH & $50. REMARK & $50. ;
	  DATALINES;
  1, "/sas/table_hist_m01.sas7bdat", "Keep but rename _m01 to mXX"
  2, "/sas/table_hist_m02.sas7bdat", "Remove"
  3, "/sas/table_hist_m03.sas7bdat", "Remove"
  4, "/sas/table_hist_m04.sas7bdat", "Remove"
  5, "/sas/ww0.sas7bdat", "Keep, no change due to single table"
  6, "/sas/table_new_20210301.sas7bdat", "Keep, but rename _20210301 to _YYYYMMDD"
  7, "/sas/table_new_20210302.sas7bdat", "Remove"
  8, "/sas/table_new_20210303.sas7bdat", "Remove"
  9, "/sas/table_new_20210304.sas7bdat", "Remove"
RUN;
DATA WANT;
	  INFILE DATALINES DELIMITER=',' DSD;
	  INPUT id PATH & $50.;
	  DATALINES;
  1, "/sas/table_hist_mXX.sas7bdat"
  2, "/sas/ww0.sas7bdat"
  3, "/sas/table_new_YYYYMMDD.sas7bdat"
RUN;

 

 

andreas_lds
Jade | Level 19

The following code could have/cause some issues, if the regular expression match wrong parts of Path.

data work.replaced;
   set have(drop= Remark);
   
   Path = prxchange('s/_m\d\d/_mXX/', 1, Path);
   Path = prxchange('s/_\d{8}/_YYYYMMDD/', 1, Path);   
run;


proc sort data=work.replaced out=work.nodups nodupkey;
   by Path;
run;
ChrisNZ
Tourmaline | Level 20

Another way

data WANT;
  set HAVE(keep=PATH);
  PATH = prxchange('s/_m\d\d(\.sas7bdat)/_mXX\1/'     , 1, PATH);
  PATH = prxchange('s/_\d{8}(\.sas7bdat)/_YYYYMMDD\1/', 1, PATH);
  if PATH ne lag(PATH); 
  ID+1;
run;
PATH ID
/sas/table_hist_mXX.sas7bdat 1
/sas/ww0.sas7bdat 2
/sas/table_new_YYYYMMDD.sas7bdat 3

 

rush_milo
Obsidian | Level 7

Hi @ChrisNZ,

Your code works great! Thank you so much.

 

However, I noticed that it doesn't seem to work for observations that are not in sequence. Kindly refer to my 'Have' datalines below for id 3 - 8. The sequence is sas7bdat, followed by sas7bvew. I would like to keep one each for both and remove the rest as per my 'Want'. 

 

 

DATA HAVE;
	  INFILE DATALINES DELIMITER=',' DSD;
	  INPUT id PATH & $50. REMARK & $50. ;
	  DATALINES;
  1, "/sas/table_new_dummy.sas7bdat", "Dummy"
  2, "/sas/table_old_dummy.sas7bdat", "Dummy"
  3, "/sas/table_new_20210301.sas7bdat", "Keep"
  4, "/sas/table_new_20210301.sas7bvew", "Keep"
  5, "/sas/table_new_20210302.sas7bdat", "Remove"
  6, "/sas/table_new_20210302.sas7bvew", "Remove"
  7, "/sas/table_new_20210303.sas7bdat", "Remove"
  8, "/sas/table_new_20210303.sas7bvew", "Remove"
RUN;

DATA WANT;
	  INFILE DATALINES DELIMITER=',' DSD;
	  INPUT id PATH & $50.;
	  DATALINES;
  1, "/sas/table_new_dummy.sas7bdat"
  2, "/sas/table_old_dummy.sas7bdat"
  3, "/sas/table_new_YYYYMMDD.sas7bdat"
  4, "/sas/table_new_YYYYMMDD.sas7bvew"
RUN;

 Really appreciate your advice. Thanks!

 

rush_milo
Obsidian | Level 7

I believe I can solve this by using nodups as per the code sample by @andreas_lds right? Just wondering if there's a more sophisticated way..

ChrisNZ
Tourmaline | Level 20
If your data is not in sequence, you need to use proc sort.
rush_milo
Obsidian | Level 7

Not really. It could be at the front as well. Either at the very front or at the end as per my datalines. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 744 views
  • 3 likes
  • 6 in conversation