Thursday, March 22, 2012

Arbitrary assignment in join

Hi folks - I'm hoping that this is an issue that has a reasonably easy answe
r :-)
I am migrating data that has some grouping, but I there is not a unque join
to get a 1-1 match for
my data. What I need to to simply pick any *one* row and put it in any *one
* row in my destination
table.
Here is the issue in SQL:
create table T1
(
grp varchar(5), -- some grouping
k1 int -- some unique key in T1
)
create table T2
(
grp varchar(5), -- some grouping
k2 int -- some unique key in T2
)
insert T1 (grp, k1) values ('A', 1)
insert T1 (grp, k1) values ('A', 2)
insert T1 (grp, k1) values ('B', 3)
insert T1 (grp, k1) values ('B', 4)
insert T2 (grp, k2) values ('A', 100)
insert T2 (grp, k2) values ('A', 101)
insert T2 (grp, k2) values ('B', 102)
select T1.grp, T1.k1, T2.k2
from T1
inner join T2 on T2.grp = T1.grp
order by T1.grp, T1.k1, T2.k2
The last select gives me this result:
A 1 100
A 1 101
A 2 100
A 2 101
B 3 102
B 4 102
In the above, I have duplicates (A-1 matches with both 100 and 101, so does
A-2). I don't care
which way I get them, I just need to match them up and get each record in T2
once and only once. A
good result (but not the only) would be:
A 1 100
A 2 101
B 4 102
In this example, I only have one row in T2 for group B, so I can assign it t
o B-3 or B-4 in T1, it
doesn't matter to me.
Is there a clever way to join this to get the desired result? I should be a
ble to do it with a
cursor, but it is pretty far down my list of alternatives :-)
Thanks!Here's a deterministic solution which matches rows based on key order within
the group:
select t1.grp, k1, k2
from t1 join t2
on t1.grp = t2.grp
and (select count(*) from t1 as t1b
where t1b.grp = t1.grp and t1b.k1 <= t1.k1) =
(select count(*) from t2 as t2b
where t2b.grp = t2.grp and t2b.k2 <= t2.k2)
It's pretty fast if the group size is fairly small (only several rows in a
group), and you have an index on (grp, key).
If the group size is large (dozens and up), it will be even slower than a
cursor-based solution.
In such a case, a fast solution would be to populate temporary tables with
identity values like so:
create table #T1
(
rn int not null identity,
grp varchar(5),
k1 int
)
create table #T2
(
rn int not null identity,
grp varchar(5),
k2 int
)
insert into #t1(grp, k1)
select * from t1
order by grp, rand(checksum(newid()))
insert into #t2(grp, k2)
select * from t2
order by grp, rand(checksum(newid()))
select t1.grp, k1, k2
from (select rn - mnrn + 1 as rn, t.grp, t.k1
from #t1 as t
join (select grp, min(rn) as mnrn from #t1 group by grp) as g
on t.grp = g.grp) as t1
join
(select rn - mnrn + 1 as rn, t.grp, t.k2
from #t2 as t
join (select grp, min(rn) as mnrn from #t2 group by grp) as g
on t.grp = g.grp) as t2
on t1.grp = t2.grp
and t1.rn = t2.rn
grp k1 k2
-- -- --
A 1 101
A 2 100
B 4 102
I used random sorting within the group to generate row numbers, but if you
want a deterministic result, simply sort by grp, key.
Cheers,
--
BG, SQL Server MVP
www.SolidQualityLearning.com
"Jami Bradley" <jbradley@.isa-og.com> wrote in message
news:cm87f1tvutjsom61a8151uq74mmfi6f6f1@.
4ax.com...
> Hi folks - I'm hoping that this is an issue that has a reasonably easy
> answer :-)
> I am migrating data that has some grouping, but I there is not a unque
> join to get a 1-1 match for
> my data. What I need to to simply pick any *one* row and put it in any
> *one* row in my destination
> table.
> Here is the issue in SQL:
> create table T1
> (
> grp varchar(5), -- some grouping
> k1 int -- some unique key in T1
> )
>
> create table T2
> (
> grp varchar(5), -- some grouping
> k2 int -- some unique key in T2
> )
> insert T1 (grp, k1) values ('A', 1)
> insert T1 (grp, k1) values ('A', 2)
> insert T1 (grp, k1) values ('B', 3)
> insert T1 (grp, k1) values ('B', 4)
> insert T2 (grp, k2) values ('A', 100)
> insert T2 (grp, k2) values ('A', 101)
> insert T2 (grp, k2) values ('B', 102)
>
> select T1.grp, T1.k1, T2.k2
> from T1
> inner join T2 on T2.grp = T1.grp
> order by T1.grp, T1.k1, T2.k2
>
> The last select gives me this result:
> A 1 100
> A 1 101
> A 2 100
> A 2 101
> B 3 102
> B 4 102
> In the above, I have duplicates (A-1 matches with both 100 and 101, so
> does A-2). I don't care
> which way I get them, I just need to match them up and get each record in
> T2 once and only once. A
> good result (but not the only) would be:
> A 1 100
> A 2 101
> B 4 102
> In this example, I only have one row in T2 for group B, so I can assign it
> to B-3 or B-4 in T1, it
> doesn't matter to me.
>
> Is there a clever way to join this to get the desired result? I should be
> able to do it with a
> cursor, but it is pretty far down my list of alternatives :-)
>
> Thanks!
>|||Is this what you want?
select T2.grp, T2.k2, min(T1.k1) as k1
from T1 join T2 on (T1.grp = T2.grp)
group by T2.grp, T2.k2
"Jami Bradley" <jbradley@.isa-og.com> wrote in message
news:cm87f1tvutjsom61a8151uq74mmfi6f6f1@.
4ax.com...
> Hi folks - I'm hoping that this is an issue that has a reasonably easy
answer :-)
> I am migrating data that has some grouping, but I there is not a unque
join to get a 1-1 match for
> my data. What I need to to simply pick any *one* row and put it in any
*one* row in my destination
> table.
> Here is the issue in SQL:
> create table T1
> (
> grp varchar(5), -- some grouping
> k1 int -- some unique key in T1
> )
>
> create table T2
> (
> grp varchar(5), -- some grouping
> k2 int -- some unique key in T2
> )
> insert T1 (grp, k1) values ('A', 1)
> insert T1 (grp, k1) values ('A', 2)
> insert T1 (grp, k1) values ('B', 3)
> insert T1 (grp, k1) values ('B', 4)
> insert T2 (grp, k2) values ('A', 100)
> insert T2 (grp, k2) values ('A', 101)
> insert T2 (grp, k2) values ('B', 102)
>
> select T1.grp, T1.k1, T2.k2
> from T1
> inner join T2 on T2.grp = T1.grp
> order by T1.grp, T1.k1, T2.k2
>
> The last select gives me this result:
> A 1 100
> A 1 101
> A 2 100
> A 2 101
> B 3 102
> B 4 102
> In the above, I have duplicates (A-1 matches with both 100 and 101, so
does A-2). I don't care
> which way I get them, I just need to match them up and get each record in
T2 once and only once. A
> good result (but not the only) would be:
> A 1 100
> A 2 101
> B 4 102
> In this example, I only have one row in T2 for group B, so I can assign it
to B-3 or B-4 in T1, it
> doesn't matter to me.
>
> Is there a clever way to join this to get the desired result? I should be
able to do it with a
> cursor, but it is pretty far down my list of alternatives :-)
>
> Thanks!
>|||Beautiful - works great! I still keep forgetting those tricks with count(*)
to match up the items.
I was starting down the path of the second choice with the temp tables and i
dentity. I think the
largest group size I will be seeing is 8-10, so I think the first query will
work well.
Thanks for the help!
Jami
On Fri, 5 Aug 2005 21:04:16 +0300, "Itzik Ben-Gan" <itzik@.REMOVETHIS.SolidQu
alityLearning.com>
wrote:

>Here's a deterministic solution which matches rows based on key order withi
n
>the group:
>select t1.grp, k1, k2
>from t1 join t2
> on t1.grp = t2.grp
> and (select count(*) from t1 as t1b
> where t1b.grp = t1.grp and t1b.k1 <= t1.k1) =
> (select count(*) from t2 as t2b
> where t2b.grp = t2.grp and t2b.k2 <= t2.k2)
>It's pretty fast if the group size is fairly small (only several rows in a
>group), and you have an index on (grp, key).
>If the group size is large (dozens and up), it will be even slower than a
>cursor-based solution.
>In such a case, a fast solution would be to populate temporary tables with
>identity values like so:
>create table #T1
>(
> rn int not null identity,
> grp varchar(5),
> k1 int
> )
>create table #T2
>(
> rn int not null identity,
> grp varchar(5),
> k2 int
> )
>insert into #t1(grp, k1)
> select * from t1
> order by grp, rand(checksum(newid()))
>insert into #t2(grp, k2)
> select * from t2
> order by grp, rand(checksum(newid()))
>select t1.grp, k1, k2
>from (select rn - mnrn + 1 as rn, t.grp, t.k1
> from #t1 as t
> join (select grp, min(rn) as mnrn from #t1 group by grp) as g
> on t.grp = g.grp) as t1
> join
> (select rn - mnrn + 1 as rn, t.grp, t.k2
> from #t2 as t
> join (select grp, min(rn) as mnrn from #t2 group by grp) as g
> on t.grp = g.grp) as t2
> on t1.grp = t2.grp
> and t1.rn = t2.rn
>grp k1 k2
>-- -- --
>A 1 101
>A 2 100
>B 4 102
>I used random sorting within the group to generate row numbers, but if you
>want a deterministic result, simply sort by grp, key.
>Cheers,|||Not quite. That gives me a duplicate copy of k1 = 1, so I will lose the row
where k1 = 2.
Thanks!
Jami
On Fri, 5 Aug 2005 15:19:23 -0400, "Brian Selzer" <brian@.selzer-software.com
> wrote:

>Is this what you want?
>select T2.grp, T2.k2, min(T1.k1) as k1
> from T1 join T2 on (T1.grp = T2.grp)
> group by T2.grp, T2.k2
>"Jami Bradley" <jbradley@.isa-og.com> wrote in message
> news:cm87f1tvutjsom61a8151uq74mmfi6f6f1@.
4ax.com...
>answer :-)
>join to get a 1-1 match for
>*one* row in my destination
>does A-2). I don't care
>T2 once and only once. A
>to B-3 or B-4 in T1, it
>able to do it with a
>

No comments:

Post a Comment