원본: http://www.fromdual.com/mysql-materialized-views#what_is
MySQL의 Materialized Views
Materialized View란?
Materialized View(MV)는 쿼리 결과가 미리 저장된 뷰이다. Materialized View의 결과와 일반 VIEW의 결과는 다르다. Materialized View가 필요한 경우는 데이터 양이 많으나 즉각적으로 Query결과가 나와야 할 때 사용 되어 진다. Materialized View의 Data는 고정되어 있으므로 한번 씩 갱신되어야 할 필요 성이 있다. 이 주기는 Materialized view가 얼마나 자주 갱신 되어야 할지에 대해 필요에 따라 설정해야 한다. 기본적으로 Materialized view는 바로 또는 지연시켜서 갱신 시킬 수 있다. MySQL은 Materialized View를 공식적으로 지원하지 않는다. 다만 사용자가 직접 Materialized View를 생성 시킬 수 있다.
직접 Materialized View 구현해 보기
이 작업을 수행 할 수있는 방법에 대한 간단한 예는 다음 쿼리와 같다:
SELECT COUNT(*) FROM MyISAM_table;
MyISAM 엔진의 테이블은 카운터 테이블 헤더에 저장되기 때문에 즉각적인 결과를 반환한다. 다음 쿼리는 수초 또는 수분 정도 걸릴 수 있다 :
SELECT COUNT(*) FROM innodb_huge;
InnoDB 엔진을 사용한 테이블에서 이렇게 오래 걸리는 것을 해결하기 위해서는 InnoDB row에 카운트를 저장시키는 방법 일 것입니다
CREATE TABLE innodb_row_count ( id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , schema_name VARCHAR(64) NOT NULL , table_name VARCHAR(64) NOT NULL , row_count INT UNSIGNED NOT NULL );
이 테이블의 정확성을 필요에 따라 테이블을 새로 갱신 시킬 수 있다. 하루에 한 번 (시스템의 자원을 가장 적게 사용하지만 결과가 하루에 한번밖에 갱신 안되는 가장 큰 단점이 존재), 또는 매 시간 마다 또는 데이터가 변경될때 마다(너무 느릴거다) 갱신 시킬 수 있다.
다른 방법은 인포메이션 스키마로부터 데이터를 얻는 건데, 이 정보는 최대 20 % 정도 잘못 된 정보를 얻을 수 있다.
SELECT table_schema, table_name, table_rows FROM information_schema.tables WHERE table_type = 'BASE TABLE';
Materialized view 갱신하기
Materialized View는 다양한 방법으로 갱신 시킬 수 있다. 그 방법은 다음과 같다:
- never (시작시, 오직 한번만 갱신한다. 데이터가 완전히 고정되어 있는 경우에 적합하다)
- on demand (예를 들어 하루 한번, 새벽에 갱신 시킬 때 적합하다.)
- immediately (즉시적으로 갱신시킬 때 적합하나 매우 느리다.)
갱신은 다음 절차에 따라 완료 된다:
- completely (느림, 처음부터 전체)
- deferred (빠름, 로그 테이블에 의해)
변경 정보를 저장함으로써, 또한 일부 스냅샷 또는 지연된 상태가 처리될 수 있다:
- refresh up to date
- refresh full
실습
상세하게 이해하기 위해서는 아마도 가장 쉬운 예부터 접근하는 것이 좋을 것이다. 아래 판매 테이블 생성한다고 가정 하자:
CREATE TABLE sales ( sales_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , product_name VARCHAR(128) NOT NULL , product_price DECIMAL(8,2) NOT NULL , product_amount SMALLINT NOT NULL ); INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1), (NULL, 'Apple', 2.40, 2) , (NULL, 'Apple', 4.05, 3), (NULL, 'Pear', 6.30, 2) , (NULL, 'Pear', 12.20, 4), (NULL, 'Plum', 4.85, 3) ; SELECT * FROM sales;
그리고 지금 우리는 판매된 가격과 제품 별 번 돈을 알고 싶다 :
EXPLAIN SELECT product_name , SUM(product_price) AS price_sum, SUM(product_amount) AS amount_sum , AVG(product_price) AS price_avg, AVG(product_amount) amount_agg , COUNT(*) FROM sales GROUP BY product_name ORDER BY price_sum; +-------------+-------+------+---------------+------+---------------------------------+ | select_type | table | type | possible_keys | rows | Extra | +-------------+-------+------+---------------+------+---------------------------------+ | SIMPLE | sales | ALL | NULL | 6 | Using temporary; Using filesort | +-------------+-------+------+---------------+------+---------------------------------+
이러한 작은 테이블상에서는 이것이 꽤 빠르게 작동하지만 만약 수백 개의 제품이 있고 수백만의 판매 내역이 있다면 이것은 수분에서 수시간이 걸릴 것이다!
자신만의 Materialized View 생성하기
DROP TABLE sales_mv; CREATE TABLE sales_mv ( product_name VARCHAR(128) NOT NULL , price_sum DECIMAL(10,2) NOT NULL , amount_sum INT NOT NULL , price_avg FLOAT NOT NULL , amount_avg FLOAT NOT NULL , sales_cnt INT NOT NULL , UNIQUE INDEX product (product_name) ); INSERT INTO sales_mv SELECT product_name , SUM(product_price), SUM(product_amount) , AVG(product_price), AVG(product_amount) , COUNT(*) FROM sales GROUP BY product_name;
(맨처음에 만들었던 테이블에 원하는 내용을 쿼리로 뽑아내서 그 결과를 이번에 만든 sales_mv에 insert 시킴)
이것이 지금까지 가장 쉬운 부분이었고, 결과는 예상대로 잘 나왔다: 나타났다:
mysql> SELECT * FROM sales_mv; +--------------+-----------+------------+-----------+------------+-----------+ | product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt | +--------------+-----------+------------+-----------+------------+-----------+ | Apple | 7.70 | 6 | 2.56667 | 2 | 3 | | Pear | 18.50 | 6 | 9.25 | 3 | 2 | | Plum | 4.85 | 3 | 4.85 | 3 | 1 | +--------------+-----------+------------+-----------+------------+-----------+ 3 rows in set (0.00 sec)
이것이 한번 만들고나면 끝이고 갱신하려면 다시 만들어야하는 "NEVER" 갱신 모드이다. 하지만 일반적으로 우리가 이 것을 원했던 것은 아닐 것이다.
필요 시 마다 Materialized View 갱신하기
다음과 같이 필요 시 마다 Materialized View를 갱신시키는 것은 스토어 프로시져(Stroed Procedure)를 통해 구현할 수 있다:
DROP PROCEDURE refresh_mv_now; DELIMITER $$ CREATE PROCEDURE refresh_mv_now ( OUT rc INT ) BEGIN TRUNCATE TABLE sales_mv; INSERT INTO sales_mv SELECT product_name , SUM(product_price), SUM(product_amount) , AVG(product_price), AVG(product_amount) , COUNT(*) FROM sales GROUP BY product_name; SET rc = 0; END; $$ DELIMITER ;
다음 문장을 사용해서 잘 동작되는지 확인해 보자:
CALL refresh_mv_now(@rc); SELECT * FROM sales_mv; +--------------+-----------+------------+-----------+------------+-----------+ | product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt | +--------------+-----------+------------+-----------+------------+-----------+ | Apple | 7.70 | 6 | 2.56667 | 2 | 3 | | Pear | 18.50 | 6 | 9.25 | 3 | 2 | | Plum | 4.85 | 3 | 4.85 | 3 | 1 | +--------------+-----------+------------+-----------+------------+-----------+ INSERT INTO sales VALUES (NULL, 'Apple', 2.25, 3), (NULL, 'Plum', 3.35, 1) , (NULL, 'Pear', 1.80, 2); CALL refresh_mv_now(@rc); SELECT * FROM sales_mv; +--------------+-----------+------------+-----------+------------+-----------+ | product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt | +--------------+-----------+------------+-----------+------------+-----------+ | Apple | 9.95 | 9 | 2.4875 | 2.25 | 4 | | Pear | 20.30 | 8 | 6.76667 | 2.66667 | 3 | | Plum | 8.20 | 4 | 4.1 | 2 | 2 | +--------------+-----------+------------+-----------+------------+-----------+
조금 더 멋진 출력결과를 만들기 위해 우리는 Materialized View를 참고하는 뷰를 다음과 같이 추가 할 수 있다:
CREATE VIEW sales_v AS SELECT product_name, price_sum, amount_sum, price_avg, amount_avg FROM sales_mv;
Materialized View 즉시 갱신하기
각 구문 이후에 모든 부분을 갱신하는 것은 맞지 않다. 하지만 우리는 적당한 결과를 갖을 수는 있다. 이 것을 하기 위해서는 약간 복잡할 것이다.
Sales 테이블에 모든 INSERT에 대하여 우리의 Materialized View에도 업데이트 시켜줘야 한다. 우리는 INSERT/UPDATE/DELETE 트리거를 sales 테이블 상에 투명하게 구현할 수 있다:
필요한 트리거들을 생성해 보자:
DELIMITER $$ CREATE TRIGGER sales_ins AFTER INSERT ON sales FOR EACH ROW BEGIN SET @old_price_sum = 0; SET @old_amount_sum = 0; SET @old_price_avg = 0; SET @old_amount_avg = 0; SET @old_sales_cnt = 0; SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0) , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0) FROM sales_mv WHERE product_name = NEW.product_name INTO @old_price_sum, @old_amount_sum, @old_price_avg , @old_amount_avg, @old_sales_cnt ; SET @new_price_sum = @old_price_sum + NEW.product_price; SET @new_amount_sum = @old_amount_sum + NEW.product_amount; SET @new_sales_cnt = @old_sales_cnt + 1; SET @new_price_avg = @new_price_sum / @new_sales_cnt; SET @new_amount_avg = @new_amount_sum / @new_sales_cnt; REPLACE INTO sales_mv VALUES(NEW.product_name, @new_price_sum, @new_amount_sum, @new_price_avg , @new_amount_avg, @new_sales_cnt) ; END; $$ DELIMITER ;
DELIMITER $$ CREATE TRIGGER sales_del AFTER DELETE ON sales FOR EACH ROW BEGIN SET @old_price_sum = 0; SET @old_amount_sum = 0; SET @old_price_avg = 0; SET @old_amount_avg = 0; SET @old_sales_cnt = 0; SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0) , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0) FROM sales_mv WHERE product_name = OLD.product_name INTO @old_price_sum, @old_amount_sum, @old_price_avg , @old_amount_avg, @old_sales_cnt ; SET @new_price_sum = @old_price_sum - OLD.product_price; SET @new_amount_sum = @old_amount_sum - OLD.product_amount; SET @new_price_avg = @new_price_sum / @new_amount_sum; SET @new_sales_cnt = @old_sales_cnt - 1; SET @new_amount_avg = @new_amount_sum / @new_sales_cnt; REPLACE INTO sales_mv VALUES(OLD.product_name, @new_price_sum, @new_amount_sum , IFNULL(@new_price_avg, 0), IFNULL(@new_amount_avg, 0) , @new_sales_cnt) ; END; $$ DELIMITER ;
DELIMITER $$ CREATE TRIGGER sales_upd AFTER UPDATE ON sales FOR EACH ROW BEGIN SET @old_price_sum = 0; SET @old_amount_sum = 0; SET @old_price_avg = 0; SET @old_amount_avg = 0; SET @old_sales_cnt = 0; SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0) , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0) FROM sales_mv WHERE product_name = OLD.product_name INTO @old_price_sum, @old_amount_sum, @old_price_avg , @old_amount_avg, @old_sales_cnt ; SET @new_price_sum = @old_price_sum + (NEW.product_price - OLD.product_price); SET @new_amount_sum = @old_amount_sum + (NEW.product_amount - OLD.product_amount); SET @new_sales_cnt = @old_sales_cnt; SET @new_price_avg = @new_price_sum / @new_sales_count; SET @new_amount_avg = @new_amount_sum / @new_sales_cnt; REPLACE INTO sales_mv VALUES(OLD.product_name, @new_price_sum, @new_amount_sum , IFNULL(@new_price_avg, 0), IFNULL(@new_amount_avg, 0) , @new_sales_cnt) ; END; $$ DELIMITER ;
그리고 결과를 보자:
INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1); INSERT INTO sales VALUES (NULL, 'Apple', 2.40, 2); INSERT INTO sales VALUES (NULL, 'Apple', 4.05, 3); INSERT INTO sales VALUES (NULL, 'Pear', 6.30, 2); INSERT INTO sales VALUES (NULL, 'Pear', 12.20, 4); INSERT INTO sales VALUES (NULL, 'Plum', 4.85, 3); DELETE FROM sales WHERE sales_id = 5; DELETE FROM sales WHERE sales_id = 4; UPDATE sales SET product_amount = 3 where sales_id = 2; SELECT * from sales_v;
Materialized Views with snapshotting functionality
The difference to the example above is, that the changes will not be applied immediately. The changes will be stored in a log table and the Materialized View is refreshed after a certain time period for a certain amount of time.
In addition to the example above we need an additional log table:
CREATE TABLE sales_mvl ( product_name VARCHAR(128) NOT NULL , product_price DECIMAL(8,2) NOT NULL , product_amount SMALLINT NOT NULL , sales_id INT UNSIGNED NOT NULL , product_ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP() );
Instead of updating the mv the log is filled:
DROP TRIGGER sales_ins; DELIMITER $$ CREATE TRIGGER sales_ins AFTER INSERT ON sales FOR EACH ROW BEGIN INSERT INTO sales_mvl VALUES (NEW.product_name, NEW.product_price, NEW.product_amount , NEW.sales_id, NULL); END; $$ DELIMITER ;
DROP TRIGGER sales_del; DELIMITER $$ CREATE TRIGGER sales_del AFTER DELETE ON sales FOR EACH ROW BEGIN DELETE FROM sales_mvl WHERE sales_id = OLD.sales_id; END; $$ DELIMITER ;
DROP TRIGGER sales_upd; DELIMITER $$ CREATE TRIGGER sales_upd AFTER UPDATE ON sales FOR EACH ROW BEGIN UPDATE sales_mvl SET product_name = NEW.product_name , product_price = NEW.product_price , product_amount = NEW.product_amount , sales_id = NEW.sales_id , product_ts = CURRENT_TIMESTAMP() WHERE sales_id = OLD.sales_id; END; $$ DELIMITER ;
And a Stored Procedure for refreshing the Materialized View is built. Possible modes are:
- REFRESH FULL (up to now)
- REFRESH (up to a certain time stamp)
- REBUILD (fully rebuild and clean MV log table)
DELIMITER $$ CREATE PROCEDURE refresh_mv ( IN method VARCHAR(16) , IN ts TIMESTAMP , OUT rc INT ) BEGIN IF UPPER(method) = 'REBUILD' THEN TRUNCATE TABLE sales_mvl; TRUNCATE TABLE sales_mv; INSERT INTO sales_mv SELECT product_name , SUM(product_price), SUM(product_amount), AVG(product_price) , AVG(product_amount), COUNT(*) FROM sales GROUP BY product_name ; ELSEIF UPPER(method) = 'REFRESH FULL' THEN REPLACE INTO sales_mv SELECT product_name, SUM(price_sum), SUM(amount_sum) , SUM(price_sum)/SUM(sales_cnt), SUM(amount_sum)/SUM(sales_cnt) , SUM(sales_cnt) FROM ( SELECT product_name, price_sum, amount_sum, sales_cnt FROM sales_mv UNION ALL SELECT product_name , SUM(product_price), SUM(product_amount), COUNT(*) FROM sales_mvl GROUP BY product_name ) x GROUP BY product_name ; TRUNCATE TABLE sales_mvl; SET rc = 0; ELSEIF UPPER(method) = 'REFRESH' THEN REPLACE INTO sales_mv SELECT product_name, SUM(price_sum), SUM(amount_sum) , SUM(price_sum)/SUM(sales_cnt), SUM(amount_sum)/SUM(sales_cnt) , SUM(sales_cnt) FROM ( SELECT product_name, price_sum, amount_sum, sales_cnt FROM sales_mv UNION ALL SELECT product_name , SUM(product_price), SUM(product_amount), COUNT(*) FROM sales_mvl WHERE product_ts < ts GROUP BY product_name ) x GROUP BY product_name ; DELETE FROM sales_mvl WHERE product_ts < ts ; SET rc = 0; ELSE SET rc = 1; END IF; END; $$ DELIMITER ;
And now let us test if it works correctly...
INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1); wait some time INSERT INTO sales VALUES (NULL, 'Apple', 2.40, 2); wait some time INSERT INTO sales VALUES (NULL, 'Apple', 4.05, 3); wait some time INSERT INTO sales VALUES (NULL, 'Pear', 6.30, 2); wait some time INSERT INTO sales VALUES (NULL, 'Pear', 12.20, 4); wait some time INSERT INTO sales VALUES (NULL, 'Plum', 4.85, 3); SELECT * from sales_mvl; SELECT * from sales_v; CALL refresh_mv('REFRESH', '2006-11-06 10:57:55', @rc); SELECT * from sales_v; CALL refresh_mv('REFRESH FULL', NULL, @rc); SELECT * from sales_v; CALL refresh_mv('REBUILD', NULL, @rc); SELECT * from sales_v;
Some performance benchmarks for our Materialized Views:
For the performance benchmark a set of 100 k sales rows was created. This set of rows should simulate a continuous INSERT flow into our database. The whole test was done when all files and all database table were cache in memory. To avoid side effects during measurement no other activities should happen on this machine.
Load into table sales without any triggers as baseline:
Test | time [s] | Ref. |
---|---|---|
LOAD DATA INFILE | 0.90 | [1] |
Multi row INSERT | 2.85 | [2] |
Single row INSERT | 13.2 | [3] |
Single row INSERT without LOCK TABLE | 15.9 | [4] |
FULL REFRESH of sales_mv | 0.64 | [5] |
SELECT on MV :-) | 0.00 | [6] |
Load into table sales with some Materialized View functionality:
Test | time [s] | Ref. |
---|---|---|
LOAD DATA INFILE with REFRESH IMMEDIATE | 40.8 | [1] |
Single row INSERT without LOCK TABLE with REFRESH IMMEDIATE | 109 | [4] |
Single rows INSERT without LOCK TABLE with REFRESH DEFERRED | 22.8 | [4] |
Refresh MV with about 40% of the data | 0.82 | |
Refresh MV with next 40% of the data | 0.98 | |
Refresh MV with last 20% of the data | 0.14 |
Outlook
- If you have a continuous data flow and/or concurrent transactions it may be better to use InnoDB instead of MyISAM tables.
- Locking the table in the Triggers/Stored Procedure may prevent wrong data this has to be tested.
- Delayed INSERT may help to speed up the load process.
- Eventually it makes sense to build/refresh the Materialized Vied in parallel?
Conclusion
- Triggers in MySQL (5.0.27) are not terribly fast.
- Materialized Views can help to speed up queries which rely heavily on some aggregated results.
- If INSERT speed is not a matter this feature can help to decrease read load on the system.
- It is a little tricky to implement.
- MV come only in play if huge amount of data are used and database are not memory based anymore.
Literature
[ 1 ]LOAD DATA INFILE '/tmp/sales_outfile_100k.dmp' INTO TABLE sales;
[ 2 ]
time mysql -u root test << EOF; TRUNCATE TABLE sales; SOURCE /tmp/sales_multirowinsert_100k.sql EOF
[ 3 ]
time mysql -u root test << EOF; TRUNCATE TABLE sales; SOURCE tmp/sales_extended_100k.sql EOF
[ 4 ]
time mysql -u root test << EOF; TRUNCATE TABLE sales; SOURCE /tmp/sales_extended_nolock_100k.sql EOF
[ 5 ]
INSERT INTO sales_mv SELECT product_name , SUM(product_price), SUM(product_amount) , AVG(product_price), AVG(product_amount) , COUNT(*) FROM sales GROUP BY product_name;
[ 6 ]
SELECT * FROM sales_mv;
[ 7 ] Wikipedia: Materialized Views