BookmarkSubscribeRSS Feed
kmin87
Fluorite | Level 6

So for example,

there is one data base in SAS library that someone created

exampleA.incorrect

and in this data I have rows of data with various columns. Let's say one column A for this entry has a value of "AT" in character. I find out that this is incorrect value. This data was inputted in 2017 and I have another Column titled "AsofDate" with the numeric value of Mar2017. So That's the month/year when it was inputted and it is incorrect. It has another column with the person's social security number

 

I have another data base in SAS library that has a monthly history of the same data above (But this is the most correct one) and the same columns as above

exampleA.correctJan2017 (that same row above but the Column A has a correct value of "AB")

exampleA.correctFeb2017

exampleA.correctMar2017

exampleA.correctApr2017

exampleA.correctMay2017

...it goes all the way to 2021

 

This library database is the snapshot of the data at the last day of that month. This historical databases are the most correct data.

how do i create a new data set in SAS to find the correct value in the historical databases in the library and make a NEW data set with the correct value of "AB"

 

I think I have to loop through all this historical data bases to find that individual ssn and if there is an mismatch of Column A from exampleA.incorrect and Column a in exampleA.correctJan2017  then making sure column A has the most correct value. Would you assist? I don't know where to begin.

 

 

4 REPLIES 4
ballardw
Super User

If the "correct" data sets are of the same structure you may, I say may, be able to combine them and then use that to update the main data set.

But I suggest providing some example data. There are a number of ways to update data sets but the content of both the base date and the updating data sets is important. Such as is the base data have only one record per combination of identification variables.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the <> icon or attached as text to show exactly what you have and that we can test code against.

kmin87
Fluorite | Level 6

i used the attached excels as an example in my library. But basically I have the incorrect_master file with incorrect entries in column A, as highlighted in A. I need to loop through a hisotrical of databases named lib_yearmonth to find the right entry value. If there is a blank in the lib_202110 file, I need to go to the previous month to find it until I get the correct entry, I find the previous month with that correct entry. Once I find the correct entries, I want to create a new data set from the incorrect_masterfile with the correct entries associated with the column A.

I did this so far.Assume that these excel files has already been imported or in the library.

I need to make sure the entry date in the incorrect_master file associates with correct yearmonth of the historical data files or libraries too. For example if I input the entry in november 2021, I want to make sure  I loop through the historical files in 2021 until I find the correct value. IF they values are correct then I dont need to update or replace my incorrect_master column a entries.

 

 

Tom
Super User Tom
Super User

Why go to the trouble of creating and attaching a file to post a handful of numbers?  You can just post/type them into your question.  Use one of the Insert Code buttons to make a box that will not re-flow the text.

Name EmployeeID Color Column Asofdate
Mike, Sith 1234 Brown A 20210801
Am,Tem1 235 White B 20210801
Tammy,Yea 1236 yello V 20210801

So please show an example of the data that needs to be updated.  What is the KEY to this dataset?  That is one or more variables that uniquely identify the observations?

 

Include an example observation where there is a missing value that needs to be looked for in the historical data.

If you want find the first non-missing value of a variable from multiple datasets then a simple data step with a SET and BY should do the job.  Just set the data in reverse time order and take the first non-missing value. 

data old_color;
  set old_2021_09-old_2021_01 old_2020_12-old_2020_01 ;
  by EmployeeId ;
  where not missing(color);
  if first.EmployeeId;
  keep EmployeeId color;
run;
kmin87
Fluorite | Level 6

There are numerous data set. It has a month and a year from 2010JAN through 2021SEP. I have to go through a loop to each data bases to find the right value per my original data set to make sure I have the correct value in my original data set. But I need to make the criteria of the data in the original date to correspond to closest data bases above. For example in my original data set, there is an entry where in Dec2017 this person had a value of a "A" for a particular column. But I need to double check if that A is correct and go through the historical databases 2010JAN through 21SEP to see if its truly a but really 2017JAN to 2017DEC since that data was inputted in Dec 2017. If the value is A, then that is correct. However, if if in the 2017MAR, I find out it's different I need to replace the A with the correct value found in 2017MAR. Some of the databases might have a blank. For example 2017MAR may have a blank but in 2017FEB databases there is the correct value or that "A"

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 888 views
  • 0 likes
  • 3 in conversation