CAFE

RDBMS

MySQL(MariaDB)의 Materialized Views

작성자운영자|작성시간16.03.15|조회수5,126 목록 댓글 0

원본: http://www.fromdual.com/mysql-materialized-views#what_is


MySQL의 Materialized Views

Materialized View란?

Materialized View(MV)는 쿼리 결과가 미리 저장된 뷰이다. Materialized View의 결과와 일반 VIEW의 결과는 다르다. Materialized View가 필요한 경우는 데이터 양이 많으나 즉각적으로 Query결과가 나와야 할 때 사용 되어 진다. Materialized View의 Data는 고정되어 있으므로 한번 씩 갱신되어야 할 필요 성이 있다. 이 주기는 Materialized view가 얼마나 자주 갱신 되어야 할지에 대해 필요에 따라 설정해야 한다. 기본적으로 Materialized view는 바로 또는 지연시켜서 갱신 시킬 수 있다. MySQL은 Materialized View를 공식적으로 지원하지 않는다. 다만 사용자가 직접 Materialized View를 생성 시킬 수 있다.

직접 Materialized View 구현해 보기

이 작업을 수행 수있는 방법에 대한 간단한 예는 다음 쿼리와 같다:

SELECT COUNT(*)
  FROM MyISAM_table;

MyISAM 엔진의 테이블은 카운터 테이블 헤더에 저장되기 때문에 즉각적인 결과를 반환한다. 다음 쿼리는 수초 또는 수분 정도 걸릴 수 있다 :

SELECT COUNT(*)
 FROM innodb_huge;

InnoDB 엔진을 사용한 테이블에서 이렇게 오래 걸리는 것을 해결하기 위해서는 InnoDB row에 카운트를 저장시키는 방법 일 것입니다

CREATE TABLE innodb_row_count (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
  , schema_name VARCHAR(64)  NOT NULL
  , table_name  VARCHAR(64)  NOT NULL
  , row_count   INT UNSIGNED NOT NULL
);

이 테이블의 정확성을 필요에 따라 테이블을 새로 갱신 시킬 수 있다. 하루에 한 번 (시스템의 자원을 가장 적게 사용하지만 결과가 하루에 한번밖에 갱신 안되는 가장 큰 단점이 존재), 또는 매 시간 마다 또는 데이터가 변경될때 마다(너무 느릴거다) 갱신 시킬 수 있다.
다른 방법은 인포메이션 스키마로부터 데이터를 얻는 건데, 이 정보는 최대 20 % 정도 잘못 된 정보를 얻을 수 있다.

SELECT table_schema, table_name, table_rows FROM information_schema.tables WHERE table_type = 'BASE TABLE';


Materialized view 갱신하기

Materialized View는 다양한 방법으로 갱신 시킬 수 있다. 그 방법은 다음과 같다:

  • never (시작시, 오직 한번만 갱신한다. 데이터가 완전히 고정되어 있는 경우에 적합하다)
  • on demand (예를 들어 하루 한번, 새벽에 갱신 시킬 때 적합하다.)
  • immediately (즉시적으로 갱신시킬 때 적합하나 매우 느리다.)

갱신은 다음 절차에 따라 완료 된다:

  • completely (느림, 처음부터 전체)
  • deferred (빠름, 로그 테이블에 의해)

변경 정보를 저장함으로써, 또한 일부 스냅샷 또는 지연된 상태가 처리될 수 있다:

  • refresh up to date
  • refresh full

실습

상세하게 이해하기 위해서는 아마도 가장 쉬운 예부터 접근하는 것이 좋을 것이다. 아래 판매 테이블 생성한다고 가정 하자:

CREATE TABLE sales (
    sales_id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
  , product_name   VARCHAR(128) NOT NULL
  , product_price  DECIMAL(8,2) NOT NULL
  , product_amount SMALLINT     NOT NULL
);

INSERT INTO sales VALUES
  (NULL, 'Apple', 1.25, 1), (NULL, 'Apple', 2.40, 2)
, (NULL, 'Apple', 4.05, 3), (NULL, 'Pear', 6.30, 2)
, (NULL, 'Pear', 12.20, 4), (NULL, 'Plum', 4.85, 3)
;

SELECT * FROM sales;

그리고 지금 우리 판매된 가격과 제품 별 번 돈 알고 싶다 :

EXPLAIN
SELECT product_name
     , SUM(product_price) AS price_sum, SUM(product_amount) AS amount_sum
     , AVG(product_price) AS price_avg, AVG(product_amount) amount_agg
     , COUNT(*)
  FROM sales
 GROUP BY product_name
 ORDER BY price_sum;

+-------------+-------+------+---------------+------+---------------------------------+
| select_type | table | type | possible_keys | rows | Extra                           |
+-------------+-------+------+---------------+------+---------------------------------+
| SIMPLE      | sales | ALL  | NULL          |    6 | Using temporary; Using filesort |
+-------------+-------+------+---------------+------+---------------------------------+

이러한 작은 테이블상에서는 이것이 꽤 빠르게 작동하지만 만약 수백 개의 제품이 있고 수백만의 판매 내역이 있다면 이것은 수분에서 수시간이 걸릴 것이다!

자신만의 Materialized View 생성하기

DROP TABLE sales_mv;
CREATE TABLE sales_mv (
    product_name VARCHAR(128)  NOT NULL
  , price_sum    DECIMAL(10,2) NOT NULL
  , amount_sum   INT           NOT NULL
  , price_avg    FLOAT         NOT NULL
  , amount_avg   FLOAT         NOT NULL
  , sales_cnt    INT           NOT NULL
  , UNIQUE INDEX product (product_name)
);

INSERT INTO sales_mv
SELECT product_name
    , SUM(product_price), SUM(product_amount)
    , AVG(product_price), AVG(product_amount)
    , COUNT(*)
  FROM sales
GROUP BY product_name;

(맨처음에 만들었던 테이블에 원하는 내용을 쿼리로 뽑아내서  그 결과를 이번에 만든 sales_mv에 insert 시킴)

이것이 지금까지 가장 쉬운 부분이었고, 결과는 예상대로 잘 나왔다: 나타났다:

mysql> SELECT * FROM sales_mv;
+--------------+-----------+------------+-----------+------------+-----------+
| product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt |
+--------------+-----------+------------+-----------+------------+-----------+
| Apple        |      7.70 |          6 |   2.56667 |          2 |         3 |
| Pear         |     18.50 |          6 |      9.25 |          3 |         2 |
| Plum         |      4.85 |          3 |      4.85 |          3 |         1 |
+--------------+-----------+------------+-----------+------------+-----------+
3 rows in set (0.00 sec)

이것이 한번 만들고나면 끝이고 갱신하려면 다시 만들어야하는 "NEVER" 갱신 모드이다. 하지만 일반적으로 우리가 이 것을 원했던 것은 아닐 것이다.

필요 시 마다 Materialized View 갱신하기

다음과 같이 필요 시 마다 Materialized View를 갱신시키는 것은 스토어 프로시져(Stroed Procedure)를 통해 구현할 수 있다:

DROP PROCEDURE refresh_mv_now;

DELIMITER $$

CREATE PROCEDURE refresh_mv_now (
    OUT rc INT
)
BEGIN

  TRUNCATE TABLE sales_mv;

  INSERT INTO sales_mv
  SELECT product_name
      , SUM(product_price), SUM(product_amount)
      , AVG(product_price), AVG(product_amount)
      , COUNT(*)
    FROM sales
  GROUP BY product_name;

  SET rc = 0;
END;
$$

DELIMITER ;

다음 문장을 사용해서 잘 동작되는지 확인해 보자:

CALL refresh_mv_now(@rc);

SELECT * FROM sales_mv;

+--------------+-----------+------------+-----------+------------+-----------+
| product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt |
+--------------+-----------+------------+-----------+------------+-----------+
| Apple        |      7.70 |          6 |   2.56667 |          2 |         3 |
| Pear         |     18.50 |          6 |      9.25 |          3 |         2 |
| Plum         |      4.85 |          3 |      4.85 |          3 |         1 |
+--------------+-----------+------------+-----------+------------+-----------+

INSERT INTO sales VALUES
  (NULL, 'Apple', 2.25, 3), (NULL, 'Plum', 3.35, 1)
, (NULL, 'Pear', 1.80, 2);

CALL refresh_mv_now(@rc);

SELECT * FROM sales_mv;

+--------------+-----------+------------+-----------+------------+-----------+
| product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt |
+--------------+-----------+------------+-----------+------------+-----------+
| Apple        |      9.95 |          9 |    2.4875 |       2.25 |         4 |
| Pear         |     20.30 |          8 |   6.76667 |    2.66667 |         3 |
| Plum         |      8.20 |          4 |       4.1 |          2 |         2 |
+--------------+-----------+------------+-----------+------------+-----------+

조금 더 멋진 출력결과를 만들기 위해 우리는 Materialized View를 참고하는  뷰를 다음과 같이 추가 할 수 있다:

CREATE VIEW sales_v AS
SELECT product_name, price_sum, amount_sum, price_avg, amount_avg
  FROM sales_mv;

Materialized View 즉시 갱신하기

각 구문 이후에 모든 부분을 갱신하는 것은 맞지 않다. 하지만 우리는 적당한 결과를 갖을 수는 있다. 이 것을 하기 위해서는 약간 복잡할 것이다.

Sales 테이블에 모든 INSERT에 대하여 우리의 Materialized View에도 업데이트 시켜줘야 한다. 우리는 INSERT/UPDATE/DELETE 트리거를 sales 테이블 상에 투명하게 구현할 수 있다:

필요한 트리거들을 생성해 보자:

DELIMITER $$

CREATE TRIGGER sales_ins
AFTER INSERT ON sales
FOR EACH ROW
BEGIN

  SET @old_price_sum = 0;
  SET @old_amount_sum = 0;
  SET @old_price_avg = 0;
  SET @old_amount_avg = 0;
  SET @old_sales_cnt = 0;

  SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0)
       , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0)
    FROM sales_mv
   WHERE product_name = NEW.product_name
    INTO @old_price_sum, @old_amount_sum, @old_price_avg
       , @old_amount_avg, @old_sales_cnt
  ;

  SET @new_price_sum = @old_price_sum + NEW.product_price;
  SET @new_amount_sum = @old_amount_sum + NEW.product_amount;
  SET @new_sales_cnt = @old_sales_cnt + 1;
  SET @new_price_avg = @new_price_sum / @new_sales_cnt;
  SET @new_amount_avg = @new_amount_sum / @new_sales_cnt;

  REPLACE INTO sales_mv
  VALUES(NEW.product_name, @new_price_sum, @new_amount_sum, @new_price_avg
       , @new_amount_avg, @new_sales_cnt)
  ;

END;
$$

DELIMITER ;

DELIMITER $$

CREATE TRIGGER sales_del
AFTER DELETE ON sales
FOR EACH ROW
BEGIN

  SET @old_price_sum = 0;
  SET @old_amount_sum = 0;
  SET @old_price_avg = 0;
  SET @old_amount_avg = 0;
  SET @old_sales_cnt = 0;

  SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0)
       , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0)
    FROM sales_mv
   WHERE product_name = OLD.product_name
    INTO @old_price_sum, @old_amount_sum, @old_price_avg
       , @old_amount_avg, @old_sales_cnt
  ;

  SET @new_price_sum = @old_price_sum - OLD.product_price;
  SET @new_amount_sum = @old_amount_sum - OLD.product_amount;
  SET @new_price_avg = @new_price_sum / @new_amount_sum;
  SET @new_sales_cnt = @old_sales_cnt - 1;
  SET @new_amount_avg = @new_amount_sum / @new_sales_cnt;

  REPLACE INTO sales_mv
  VALUES(OLD.product_name, @new_price_sum, @new_amount_sum
       , IFNULL(@new_price_avg, 0), IFNULL(@new_amount_avg, 0)
       , @new_sales_cnt)
  ;

END;
$$

DELIMITER ;

DELIMITER $$

CREATE TRIGGER sales_upd
AFTER UPDATE ON sales
FOR EACH ROW
BEGIN

  SET @old_price_sum = 0;
  SET @old_amount_sum = 0;
  SET @old_price_avg = 0;
  SET @old_amount_avg = 0;
  SET @old_sales_cnt = 0;

  SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0)
       , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0)
    FROM sales_mv
   WHERE product_name = OLD.product_name
    INTO @old_price_sum, @old_amount_sum, @old_price_avg
       , @old_amount_avg, @old_sales_cnt
  ;

  SET @new_price_sum = @old_price_sum + (NEW.product_price - OLD.product_price);
  SET @new_amount_sum = @old_amount_sum + (NEW.product_amount
                      - OLD.product_amount);
  SET @new_sales_cnt = @old_sales_cnt;
  SET @new_price_avg = @new_price_sum / @new_sales_count;
  SET @new_amount_avg = @new_amount_sum / @new_sales_cnt;

  REPLACE INTO sales_mv
  VALUES(OLD.product_name, @new_price_sum, @new_amount_sum
       , IFNULL(@new_price_avg, 0), IFNULL(@new_amount_avg, 0)
       , @new_sales_cnt)
  ;

END;
$$

DELIMITER ;

그리고 결과를 보자:

INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1);
INSERT INTO sales VALUES (NULL, 'Apple', 2.40, 2);
INSERT INTO sales VALUES (NULL, 'Apple', 4.05, 3);
INSERT INTO sales VALUES (NULL, 'Pear', 6.30, 2);
INSERT INTO sales VALUES (NULL, 'Pear', 12.20, 4);
INSERT INTO sales VALUES (NULL, 'Plum', 4.85, 3);

DELETE FROM sales WHERE sales_id = 5;
DELETE FROM sales WHERE sales_id = 4;

UPDATE sales SET product_amount = 3 where sales_id = 2;

SELECT * from sales_v;

Materialized Views with snapshotting functionality

The difference to the example above is, that the changes will not be applied immediately. The changes will be stored in a log table and the Materialized View is refreshed after a certain time period for a certain amount of time.

In addition to the example above we need an additional log table:

CREATE TABLE sales_mvl (
    product_name   VARCHAR(128) NOT NULL
  , product_price  DECIMAL(8,2) NOT NULL
  , product_amount SMALLINT     NOT NULL
  , sales_id       INT UNSIGNED NOT NULL
  , product_ts     TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP()
);

Instead of updating the mv the log is filled:

DROP TRIGGER sales_ins;

DELIMITER $$

CREATE TRIGGER sales_ins
AFTER INSERT ON sales
FOR EACH ROW
BEGIN

  INSERT INTO sales_mvl
  VALUES (NEW.product_name, NEW.product_price, NEW.product_amount
        , NEW.sales_id, NULL);

END;
$$

DELIMITER ;

DROP TRIGGER sales_del;

DELIMITER $$

CREATE TRIGGER sales_del
AFTER DELETE ON sales
FOR EACH ROW
BEGIN

  DELETE FROM sales_mvl
  WHERE sales_id = OLD.sales_id;

END;
$$

DELIMITER ;

DROP TRIGGER sales_upd;

DELIMITER $$

CREATE TRIGGER sales_upd
AFTER UPDATE ON sales
FOR EACH ROW
BEGIN

  UPDATE sales_mvl
     SET product_name = NEW.product_name
       , product_price = NEW.product_price
       , product_amount = NEW.product_amount
       , sales_id = NEW.sales_id
       , product_ts = CURRENT_TIMESTAMP()
  WHERE sales_id = OLD.sales_id;

END;
$$

DELIMITER ;

And a Stored Procedure for refreshing the Materialized View is built. Possible modes are:

  • REFRESH FULL (up to now)
  • REFRESH (up to a certain time stamp)
  • REBUILD (fully rebuild and clean MV log table)
DELIMITER $$

CREATE PROCEDURE refresh_mv (
    IN method VARCHAR(16)
  , IN ts TIMESTAMP
  , OUT rc INT
)
BEGIN

IF UPPER(method) = 'REBUILD' THEN

  TRUNCATE TABLE sales_mvl;
  TRUNCATE TABLE sales_mv;

  INSERT INTO sales_mv
  SELECT product_name
      , SUM(product_price), SUM(product_amount), AVG(product_price)
      , AVG(product_amount), COUNT(*)
    FROM sales
  GROUP BY product_name
  ;

ELSEIF UPPER(method) = 'REFRESH FULL' THEN

  REPLACE INTO sales_mv
  SELECT product_name, SUM(price_sum), SUM(amount_sum)
       , SUM(price_sum)/SUM(sales_cnt), SUM(amount_sum)/SUM(sales_cnt)
       , SUM(sales_cnt)
    FROM (
          SELECT product_name, price_sum,  amount_sum, sales_cnt
            FROM sales_mv
          UNION ALL
          SELECT product_name
              , SUM(product_price), SUM(product_amount), COUNT(*)
            FROM sales_mvl
          GROUP BY product_name
        ) x
  GROUP BY product_name
  ;

  TRUNCATE TABLE sales_mvl;

  SET rc = 0;
ELSEIF UPPER(method) = 'REFRESH' THEN

  REPLACE INTO sales_mv
  SELECT product_name, SUM(price_sum), SUM(amount_sum)
       , SUM(price_sum)/SUM(sales_cnt), SUM(amount_sum)/SUM(sales_cnt)
       , SUM(sales_cnt)
    FROM (
          SELECT product_name, price_sum,  amount_sum, sales_cnt
            FROM sales_mv
          UNION ALL
          SELECT product_name
              , SUM(product_price), SUM(product_amount), COUNT(*)
            FROM sales_mvl
          WHERE product_ts < ts
          GROUP BY product_name
        ) x
  GROUP BY product_name
  ;

  DELETE
    FROM sales_mvl
   WHERE product_ts < ts
  ;

  SET rc = 0;
ELSE
  SET rc = 1;
END IF;

END;
$$

DELIMITER ;

And now let us test if it works correctly...

INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1);
wait some time

INSERT INTO sales VALUES (NULL, 'Apple', 2.40, 2);
wait some time

INSERT INTO sales VALUES (NULL, 'Apple', 4.05, 3);
wait some time

INSERT INTO sales VALUES (NULL, 'Pear', 6.30, 2);
wait some time

INSERT INTO sales VALUES (NULL, 'Pear', 12.20, 4);
wait some time

INSERT INTO sales VALUES (NULL, 'Plum', 4.85, 3);

SELECT * from sales_mvl;
SELECT * from sales_v;

CALL refresh_mv('REFRESH', '2006-11-06 10:57:55', @rc);
SELECT * from sales_v;

CALL refresh_mv('REFRESH FULL', NULL, @rc);
SELECT * from sales_v;

CALL refresh_mv('REBUILD', NULL, @rc);
SELECT * from sales_v;

Some performance benchmarks for our Materialized Views:

For the performance benchmark a set of 100 k sales rows was created. This set of rows should simulate a continuous INSERT flow into our database. The whole test was done when all files and all database table were cache in memory. To avoid side effects during measurement no other activities should happen on this machine.

Load into table sales without any triggers as baseline:

Testtime [s]Ref.
LOAD DATA INFILE0.90[1]
Multi row INSERT2.85[2]
Single row INSERT13.2[3]
Single row INSERT without LOCK TABLE15.9[4]
FULL REFRESH of sales_mv0.64[5]
SELECT on MV :-)0.00[6]

Load into table sales with some Materialized View functionality:

Testtime [s]Ref.
LOAD DATA INFILE
with REFRESH IMMEDIATE
40.8[1]
Single row INSERT without LOCK TABLE
with REFRESH IMMEDIATE
109[4]
Single rows INSERT without LOCK TABLE
with REFRESH DEFERRED
22.8[4]
Refresh MV with about 40% of the data0.82
Refresh MV with next 40% of the data0.98
Refresh MV with last 20% of the data0.14

Outlook

  • If you have a continuous data flow and/or concurrent transactions it may be better to use InnoDB instead of MyISAM tables.
  • Locking the table in the Triggers/Stored Procedure may prevent wrong data this has to be tested.
  • Delayed INSERT may help to speed up the load process.
  • Eventually it makes sense to build/refresh the Materialized Vied in parallel?

Conclusion

  • Triggers in MySQL (5.0.27) are not terribly fast.
  • Materialized Views can help to speed up queries which rely heavily on some aggregated results.
  • If INSERT speed is not a matter this feature can help to decrease read load on the system.
  • It is a little tricky to implement.
  • MV come only in play if huge amount of data are used and database are not memory based anymore.

Literature

[ 1 ]
LOAD DATA
INFILE '/tmp/sales_outfile_100k.dmp'
INTO TABLE sales;

[ 2 ]
time mysql -u root test << EOF;
TRUNCATE TABLE sales;
SOURCE /tmp/sales_multirowinsert_100k.sql
EOF

[ 3 ]
time mysql -u root test << EOF;
TRUNCATE TABLE sales;
SOURCE tmp/sales_extended_100k.sql
EOF

[ 4 ]
time mysql -u root test << EOF;
TRUNCATE TABLE sales;
SOURCE /tmp/sales_extended_nolock_100k.sql
EOF

[ 5 ]
INSERT INTO sales_mv
SELECT product_name
    , SUM(product_price), SUM(product_amount)
    , AVG(product_price), AVG(product_amount)
    , COUNT(*)
  FROM sales
GROUP BY product_name;

[ 6 ]
SELECT *
  FROM sales_mv;

[ 7 ] Wikipedia: Materialized Views


다음검색
현재 게시글 추가 기능 열기

댓글

댓글 리스트
맨위로

카페 검색

카페 검색어 입력폼