Mastering PROC SQL: Essential Statements for Data Management


1.Joining Tables

Example datasets:

data sales; input id product $ amount; datalines; 1 A 100 2 B 150 3 C 200 ; run; data customer; input id name $; datalines; 1 John 2 Mary 4 Alex ; run;

INNER JOIN (only matches)

proc sql; select a.id, a.product, a.amount, b.name from sales as a inner join customer as b on a.id = b.id; quit;

LEFT JOIN (all sales, add customer if match)

proc sql; select a.id, a.product, a.amount, b.name from sales as a left join customer as b on a.id = b.id; quit;

FULL JOIN (everything)

proc sql; select a.id, a.product, a.amount, b.name from sales as a full join customer as b on a.id = b.id; quit;



2. Subqueries

proc sql; select * from example where value = (select max(value) from example); quit;

🔹 Finds rows matching a nested query (here: the max value).



3. ON (JOIN condition)

General Syntax

proc sql; select a.col1, b.col2 from tableA as a join tableB as b on a.key = b.key; quit;
  • tableA and tableB are datasets.

  • a.key = b.key is the join condition.

  • Without an ON clause, SAS would try to join all rows with all rows (Cartesian product), which is usually not what you want.


Example: Match on id

data sales; input id product $ amount; datalines; 1 A 100 2 B 150 3 C 200 ; run; data customer; input id name $; datalines; 1 John 2 Mary 4 Alex ; run; proc sql; select a.id, a.product, a.amount, b.name from sales as a left join customer as b on a.id = b.id; quit;

👉 ON a.id = b.id ensures that rows are joined where id matches between datasets.


You can use more complex ON conditions

  1. Multiple keys

on a.id = b.id and a.date = b.date
  1. Inequality joins

on a.amount > b.threshold
  1. Not equal join

on a.category ^= b.category



Example 1: 

 proc sql;

create table va4str as

select distinct usubjid, adt, ady, visit, visitnum, avisit, avisitn, aval as va4str label='xxx'

from adam.adeff

where valid='xxx' and paramcd='xxx'

order by usubjid, avisitn, avisit, adt, ady;

run;




Example 2:

proc sql;

create table xxx as 

select distinct studyid, usubjid

from eff_v4sc2

where vsn>0;

run;


proc sql;

create table eff_v4sc4 as

select * from eff_v4sc2

where usubjid in (select distinct usubjid from xxx);                                                                                                     

run;




Example 3:

proc sql;

create table ec_eff2 as

select a.*, b.dy as indexady, b.visit as b_v, b.visitnum as b_vn, b.visage as agevis

from ec_eff as a right join indexvs2 as b

on a.studyid=b.studyid and a.usubjid=b.usubjid and a.ady>=b.dy

order by studyid, usubjid, ady;

quit;

Comments

Popular posts from this blog

Analysis of Repeated Measures Data using SAS

Medical information for Melanoma, Merkel cell carcinoma and tumor mutation burden

Four essential statistical functions for simulation in SAS