INTERMEDIA TEXT - How to Search XML Documents

2003. 8. 28. 16:27

No. 17052

INTERMEDIA TEXT - How to Search XML Documents
=============================================

PURPOSE
-------
This note explains how to search XML documents with interMedia Text.

Explanation
-----------
interMedia Text can search XML documents through two section group types,
XML_SECTION_GROUP and AUTO_SECTION_GROUP. Each is described below.

1. XML_SECTION_GROUP

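Starting with Oracle 8.1.6, interMedia Text has stronger XML support and
XML_SECTION_GROUP gained the following capabilities.

(1) Tags from different DTDs can be told apart by qualifying the tag with
  its doctype, which serves as a namespace.
  => Queries can then target each distinguished tag separately.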

EX) ctx_ddl.add_field_section('my_section_group','section_name','(doctype)tag_name');

(2) XML attributes can be indexed, and documents can be searched by
  attribute value.

EX) ctx_ddl.add_attr_section('my_section_group','section_name','tag_name@attr_name');

(3) When the XML documents change, for example when a new tag is added or
  a new doctype is introduced, a new section can be added without
  rebuilding the index from scratch.

EX) alter index <index_name> rebuild parameters
       ('add (zone|field) section <section_name> tag <tag_name>');

(4) Korean text inside XML documents can be searched.

2. AUTO_SECTION_GROUP

Release 8.1.6 also introduces AUTO_SECTION_GROUP for XML searching.
AUTO_SECTION_GROUP resembles XML_SECTION_GROUP, except that sections do not
have to be defined in advance: every non-empty tag is automatically indexed
as a zone section whose name is the same as the tag.

For example, consider the following XML document.

<book>
<author>Neal Stephenson</author>
<title>The Diamond Age</title>
<description> Decades into our future...</description>
</book>

For this document the auto sectioner creates zone sections named book,
author, title, and description, which makes queries such as
"diamond within description" possible.
Note that section names are case-insensitive. XML tag names are
case-sensitive, so "diamond within description" and
"diamond within DESCRIPTION" ought to be different queries, yet the two are
treated as the same query.
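
The behaviour described above can be tried with a short script. This is only
a minimal sketch: the names my_auto_sg, books, and books_idx are hypothetical
and do not come from the original note, and the script assumes a schema that
is allowed to call ctx_ddl and create CONTEXT indexes.

```sql
-- Hypothetical names: my_auto_sg, books, books_idx (not from the original note)
begin
  -- No sections are declared; the auto sectioner derives them from the tags
  ctx_ddl.create_section_group('my_auto_sg', 'auto_section_group');
end;
/

create table books (
  id   number(5) primary key,
  text varchar2(400) );

insert into books values (1,
'<book><author>Neal Stephenson</author><title>The Diamond Age</title><description> Decades into our future...</description></book>');
commit;

create index books_idx on books ( text )
indextype is ctxsys.context
parameters ( 'section group my_auto_sg' );

-- Section names are case-insensitive, so both queries are treated alike
select id from books where contains (text, 'diamond within title') > 0;
select id from books where contains (text, 'diamond within TITLE') > 0;
```

The queries here use the title section because that is where the word
"Diamond" appears in the sample document.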

Because AUTO_SECTION_GROUP indexes every tag, tags that are never used in
queries are indexed needlessly. Such tags can be excluded from indexing
with add_stop_section:

ctx_ddl.add_stop_section('mysg','author');

or

alter index <index_name> rebuild
parameters ('add stop section author');

Example
-------
1. Test Script 1

rem -------------------------------------------------
rem Sections defined so that tag names do not collide
rem -------------------------------------------------

set echo on
drop table xmlsect;

create table xmlsect(
id number(5) primary key,
text varchar2(200) );

set define off

insert into xmlsect values (1, '<!DOCTYPE contact> <contact>
<address>506 Blue Pool Road</address> <email>dudeman@radical.com</email>
</contact>');

insert into xmlsect values (2, '<!DOCTYPE mail> <mail> <address>
dudeman@radical.com </address></mail>');

commit;
set echo on
drop index xmlsect_idx;

exec ctx_ddl.drop_section_group('my_xml_section_group');
begin
ctx_ddl.create_section_group ('my_xml_section_group','xml_section_group');
ctx_ddl.add_field_section ('my_xml_section_group','email','email');
ctx_ddl.add_field_section ('my_xml_section_group','address','address');
ctx_ddl.add_field_section ('my_xml_section_group','email','(mail)address');
end;
/

create index xmlsect_idx on xmlsect ( text )
indextype is ctxsys.context
parameters ( 'section group my_xml_section_group' );

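The script above shows how doctype-qualified tags distinguish tags that come
from different DTDs: the <address> tag of the mail doctype is mapped to the
email section. As a result, the following query finds 'radical' in the email
section of both documents.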

select * from xmlsect where contains (text, 'radical within email') > 0;
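
To see the effect of the doctype-qualified mapping, the query can be compared
with two variations. This is a hedged sketch against the table built by Test
Script 1. The comments describe what the Oracle documentation leads one to
expect (a doctype-limited tag takes precedence inside its doctype, and field
sections are invisible unless declared visible); they are assumptions, not
results verified for this note.

```sql
-- Assumption: <address> inside the mail doctype is indexed only as the
-- email section, so no document should match within the address section.
select id from xmlsect where contains (text, 'radical within address') > 0;

-- Assumption: field sections are invisible by default, so a query without
-- WITHIN is not expected to see text stored in the email field section.
select id from xmlsect where contains (text, 'radical') > 0;
```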

2. Test script 2:

rem ------------------------------------------------------------
rem Tests for attributes, including attributes containing Korean
rem ------------------------------------------------------------

set echo on
drop table xmlsect;

exec ctx_ddl.drop_preference('my_han_pref');

create table xmlsect(
id number(5) primary key,
text varchar2(200) );

set define off

rem (1) Plain attribute --> OK
rem insert into xmlsect values ( 1,
rem '<comment author="jeeves"> I really like InterMedia Text </comment>');
rem (2) Korean attribute name --> OK
rem insert into xmlsect values ( 2,
rem '<comment 작가="jeeves"> I really like InterMedia Text </comment>');
rem (3) Korean attribute name and Korean value --> OK

insert into xmlsect values ( 3,
'<comment 작가="제임스"> I really like InterMedia Text </comment>');
commit;

set echo on
drop index xmlsect_idx;

exec ctx_ddl.drop_section_group('my_xml_section_group');
begin
ctx_ddl.create_section_group ( 'my_xml_section_group', 'xml_section_group');
-- when the remark on (1) is removed
-- ctx_ddl.add_attr_section('my_xml_section_group', 'author', 'comment@author');
-- when the remark on (2) or (3) is removed
ctx_ddl.add_attr_section ( 'my_xml_section_group', '작가', 'comment@작가');
end;
/

rem With KOREAN_LEXER, tokens for similar forms of Korean words are also generated.
rem exec ctx_ddl.create_preference('my_han_pref','KOREAN_LEXER');

create index xmlsect_idx on xmlsect ( text )
indextype is ctxsys.context
parameters ( 'section group my_xml_section_group');

rem To use the Korean lexer, uncomment the ctx_ddl.create_preference('my_han...')
rem call above and use this parameters clause instead:
rem parameters ( 'section group my_xml_section_group lexer my_han_pref' );

This script shows that an index can be built on XML attributes and that
documents can be searched by attribute value. Korean attribute names and
values were tested as well. The following queries therefore work.

- Case (1) in Test script 2 above
rem select * from xmlsect where contains (text, 'jeeves within author') > 0;

- Case (2) in Test script 2 above
rem select * from xmlsect where contains (text, 'jeeves within 작가') > 0;

- Case (3) in Test script 2 above
rem select * from xmlsect where contains (text, '제임스 within 작가') > 0;
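
Attribute sections can be contrasted with ordinary text in a short sketch.
It assumes that the case (1) row and the 'author' attribute section in Test
script 2 have been uncommented. The comments reflect the Oracle documentation's
statement that attribute text is invisible outside its section; treat them as
expectations rather than verified output.

```sql
-- Assumes the case (1) row and the 'author' attribute section are enabled.

-- Attribute value searched inside its attribute section
select id from xmlsect where contains (text, 'jeeves within author') > 0;

-- Assumption: attribute text is invisible outside its section, so a plain
-- query on the attribute value is not expected to match.
select id from xmlsect where contains (text, 'jeeves') > 0;

-- Element content is ordinary text and can be searched without WITHIN
select id from xmlsect where contains (text, 'intermedia') > 0;
```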

3. Test script 3

rem --------------------------------------------------------
rem Adding a new section to an index that already exists
rem --------------------------------------------------------

set echo on
drop table xmlsect;

create table xmlsect(
id number(5) primary key,
text varchar2(200) );

set define off

insert into xmlsect values (
1, '<!DOCTYPE contact> <contact> <address>506 Blue Pool Road</address>
<email>dudeman@radical.com</email></contact>');
commit;

set echo on
drop index xmlsect_idx;

exec ctx_ddl.drop_section_group('my_xml_section_group');
begin
ctx_ddl.create_section_group ( 'my_xml_section_group', 'xml_section_group');
ctx_ddl.add_field_section ( 'my_xml_section_group', 'email', 'email');
ctx_ddl.add_field_section ( 'my_xml_section_group', 'address', 'address');
end;
/

create index xmlsect_idx on xmlsect ( text )
indextype is ctxsys.context
parameters ( 'section group my_xml_section_group' );

insert into xmlsect values (
2, '<!DOCTYPE contact> <contact><title>Hello World</title> <address>BSKO
</address><email>bsko@oracle.com</email></contact>');

alter index xmlsect_idx rebuild
parameters ('add field section mytest tag title');

select token_text from dr$xmlsect_idx$i;
rem The new section is named mytest (its tag is title), so query by section name
select * from xmlsect where contains (text,'Hello within mytest')>0;

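When the XML documents change, for example when a new tag is added or a new
doctype is introduced, a new section can be added without rebuilding the
index from scratch. The script above adds a section for the new tag 'title'
after the XMLSECT_IDX index has already been created. Note that the new
section does not apply to documents that were already indexed.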
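
Because the second row is inserted after the index was created, it has to be
synchronized into the index before any query can find it. The script above
does not show that step; the following is a minimal sketch of one way to do
it, assuming no ctxsrv server is processing pending DML in the background.
In later releases ctx_ddl.sync_index serves the same purpose.

```sql
-- Make the new row permanent, then index the pending DML.
commit;

alter index xmlsect_idx rebuild online
parameters ('sync');

-- The section added in the script is named mytest and is bound to <title>
select id from xmlsect where contains (text, 'Hello within mytest') > 0;
```

Running the sync after the 'add field section' statement means the new row
is indexed with the new section definition already in place.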

Reference Documents
-------------------
Oracle8i Intermedia Text Reference Manual




***** Post copied and moved to another category by 아름다운프로 (2003-12-18 16:49)