Mastering PROC SQL: Essential Statements for Data Management


1.Joining Tables

Example datasets:

data sales; input id product $ amount; datalines; 1 A 100 2 B 150 3 C 200 ; run; data customer; input id name $; datalines; 1 John 2 Mary 4 Alex ; run;

INNER JOIN (only matches)

proc sql; select a.id, a.product, a.amount, b.name from sales as a inner join customer as b on a.id = b.id; quit;

LEFT JOIN (all sales, add customer if match)

proc sql; select a.id, a.product, a.amount, b.name from sales as a left join customer as b on a.id = b.id; quit;

FULL JOIN (everything)

proc sql; select a.id, a.product, a.amount, b.name from sales as a full join customer as b on a.id = b.id; quit;



2. Subqueries

proc sql; select * from example where value = (select max(value) from example); quit;

🔹 Finds rows matching a nested query (here: the max value).



3. ON (JOIN condition)

General Syntax

proc sql; select a.col1, b.col2 from tableA as a join tableB as b on a.key = b.key; quit;
  • tableA and tableB are datasets.

  • a.key = b.key is the join condition.

  • Without an ON clause, SAS would try to join all rows with all rows (Cartesian product), which is usually not what you want.


Example: Match on id

data sales; input id product $ amount; datalines; 1 A 100 2 B 150 3 C 200 ; run; data customer; input id name $; datalines; 1 John 2 Mary 4 Alex ; run; proc sql; select a.id, a.product, a.amount, b.name from sales as a left join customer as b on a.id = b.id; quit;

👉 ON a.id = b.id ensures that rows are joined where id matches between datasets.


You can use more complex ON conditions

  1. Multiple keys

on a.id = b.id and a.date = b.date
  1. Inequality joins

on a.amount > b.threshold
  1. Not equal join

on a.category ^= b.category



Example 1: 

 proc sql;

create table va4str as

select distinct usubjid, adt, ady, visit, visitnum, avisit, avisitn, aval as va4str label='xxx'

from adam.adeff

where valid='xxx' and paramcd='xxx'

order by usubjid, avisitn, avisit, adt, ady;

run;




Example 2:

proc sql;

create table xxx as 

select distinct studyid, usubjid

from eff_v4sc2

where vsn>0;

run;


proc sql;

create table eff_v4sc4 as

select * from eff_v4sc2

where usubjid in (select distinct usubjid from xxx);                                                                                                     

run;




Example 3:

proc sql;

create table ec_eff2 as

select a.*, b.dy as indexady, b.visit as b_v, b.visitnum as b_vn, b.visage as agevis

from ec_eff as a right join indexvs2 as b

on a.studyid=b.studyid and a.usubjid=b.usubjid and a.ady>=b.dy

order by studyid, usubjid, ady;

quit;

Comments

Popular posts from this blog

Analysis of Repeated Measures Data using SAS (1)

Understanding Binding vs. Non-Binding Futility Analysis in Clinical Trials

Medical information for Melanoma, Merkel cell carcinoma and tumor mutation burden