Thursday, March 22, 2012

Architectural question

Hello all, I have a general architecture question.

We have a system that essentially does multiple processes/operations
against a single table.

The objective is to increase performance (i.e. amount of throughput)
and to reduce latency. I can sacrifice accuracy/consistency if
necessary. Batches of about 100-1000 records at a time hit the table
at a time, are read and sent to another app for processing, are
retrieved and the results are updated to the same table, which is then
read again by yet another process.

In essence, whenever work is done against the information in that
table, the working process will update a status column in the table.
Unfortunately, this leads to a lot of contention as multiple procs are
attempting to update data sets and stepping on each other. The system
generated a lot of deadlocks until various forced nolock, rowlock, and
deadlock hints were added to the code. Now, of course, I'm seeing a
lot of server resources being burned up by locking efforts, and
duration is increased due to blocking on occasion, not to mention
various headaches and timeouts due to using these locking hints.

To fix, this, I have split up the central table vertically and removed
the need for various processes to update the table, now the system does
more reads and more inserts, but less updates. In essence, it moves
the key from table to table, and whether or not it is in a table
determines its status (and thus what next needs to be done)

So, First question: I believe that inserts take less time than updates,
but I don't know by how much. Can anyone point me towards something
that talks to this? I know, for example, that locks become much less
of a liability, but I'm a little concerned that the writes will cost
more than I save from removing the locking.

Second, It became clear that in moving to this kind of architecture it
will be necessary to delete records from one of the tables in realtime.
I'm shooting for a capacity of 100 million records/day, and one of
the tables needs to be short when comparing to the other table to get
it's status. So, given that the table isn't ever going to be more
than 8 pages long, will it being continually inserted into and then
deleted from matter? Can I expect to see lock contention in this
scenario (i.e, multiple sprocs inserting into, with another sproc
reading from then deleting from the table)?Hi,
I am not sure if you vertically split the tables the original problem
will go away. I think when these types of issues arise you might
consider partitioning the database (which is horizontal). That might
increase performance etc and solve locking problems too.

Amaxen1@.gmail.com wrote:

Quote:

Originally Posted by

Hello all, I have a general architecture question.
>
We have a system that essentially does multiple processes/operations
against a single table.
>
The objective is to increase performance (i.e. amount of throughput)
and to reduce latency. I can sacrifice accuracy/consistency if
necessary. Batches of about 100-1000 records at a time hit the table
at a time, are read and sent to another app for processing, are
retrieved and the results are updated to the same table, which is then
read again by yet another process.
>
>
In essence, whenever work is done against the information in that
table, the working process will update a status column in the table.
Unfortunately, this leads to a lot of contention as multiple procs are
attempting to update data sets and stepping on each other. The system
generated a lot of deadlocks until various forced nolock, rowlock, and
deadlock hints were added to the code. Now, of course, I'm seeing a
lot of server resources being burned up by locking efforts, and
duration is increased due to blocking on occasion, not to mention
various headaches and timeouts due to using these locking hints.
>
>
To fix, this, I have split up the central table vertically and removed
the need for various processes to update the table, now the system does
more reads and more inserts, but less updates. In essence, it moves
the key from table to table, and whether or not it is in a table
determines its status (and thus what next needs to be done)
>
So, First question: I believe that inserts take less time than updates,
but I don't know by how much. Can anyone point me towards something
that talks to this? I know, for example, that locks become much less
of a liability, but I'm a little concerned that the writes will cost
more than I save from removing the locking.
>
Second, It became clear that in moving to this kind of architecture it
will be necessary to delete records from one of the tables in realtime.
I'm shooting for a capacity of 100 million records/day, and one of
the tables needs to be short when comparing to the other table to get
it's status. So, given that the table isn't ever going to be more
than 8 pages long, will it being continually inserted into and then
deleted from matter? Can I expect to see lock contention in this
scenario (i.e, multiple sprocs inserting into, with another sproc
reading from then deleting from the table)?

sql

No comments:

Post a Comment