BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jimksas
Calcite | Level 5

hello all,

i have one big (4,550,750 obs) dataset (test) with 12 variables. I want every variable with one unique variable (test_var), what is the best way to do it?

I am using nodupkey options in proc sort, is it best way to do is? OR it's PROC SQL? how can we get all variables from SAS datasets with one unique variable using PROC SQL?

proc sort data=work.test nodupkey;

by test_var;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20

Do you have duplicate records as well?

If not, does it matter which record you chose when you want a unique record?

If your data is pre-sorted - use the technique suggested by https://communities.sas.com/people/SASKiwi. If not PROC SORT NODUPKEY is you options. There is no (good) equivalent in SQL - both syntax and performance-wise.

Data never sleeps

View solution in original post

7 REPLIES 7
kiranv_
Rhodochrosite | Level 12

hi

You can unique variable by doing a union with same table. I am not sure which is efficient. But can be done as follows

data abc;

input a $ b $;

datalines;

vijay ajay

vijay ajay

sam hello

;

run;

proc sql;

select * from abc

union

select * from abc

;

run;

kiranv_
Rhodochrosite | Level 12

proc sql;

select distinct * from table;

quit;

SASKiwi
PROC Star

If your data is already sorted by test_var then you don't need to re-sort it and that will save you a lot of processing time:

data test_nodup;

  set work.test;

  by test_var;

  if first.test_var;

run;

LinusH
Tourmaline | Level 20

Do you have duplicate records as well?

If not, does it matter which record you chose when you want a unique record?

If your data is pre-sorted - use the technique suggested by https://communities.sas.com/people/SASKiwi. If not PROC SORT NODUPKEY is you options. There is no (good) equivalent in SQL - both syntax and performance-wise.

Data never sleeps
jimksas
Calcite | Level 5

Yes LinusH, i have duplicate records for "test_var" variable and i don't want duplicate records...I want unique records from "test_var" variable and all/every possible value for all other variables...

Thanks!

Reeza
Super User

I think you need to post sample data for this, preferably have and want data sets in a data step.

nrose
Quartz | Level 8

If, for each group of 'test_var' you have a unique value (say 'test_var_id'), you can use this to remove duplicates using the having clause;

select * from table group by test_var having max(test_var_id) = test_var_id;

The group by clause groups by the 'test_var' variable, and the having clause picks only one record from this group. Again, this requires that there is some unique variable contained within 'test_var'.

The only reason I would try use this is if you wanted in database processing, but not all databases support the having clause in this way (e.g. SQL server). Otherwise, proc sort is your answer.

Nick

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 16473 views
  • 0 likes
  • 6 in conversation