PACKAGE: SYS.UTL_URL

Source


PACKAGE utl_url AS

  /*********************************************************************

  A Uniform Resource Locator (URL) is a string that identifies a resource
  (such as a Web page or a picture) on the Web that can usually be accessed
  via the HyperText Transfer Protocol (HTTP).  For example, the URL of the
  front page of Oracle's Web site is "http://www.oracle.com/".

  Normally, a URL contains English letters, digits, and some punctuation
  characters.  They are called the unreserved characters.  Any other
  characters (including multi-byte characters) or binary octet codes in a
  URL must be escaped so that they can be safely handled by a
  Web browser or a Web server.  Some punctuation characters, such as
  "$", "?", ":", and "=", are reserved as delimiters in a URL.
  They are called the reserved characters.  If such characters are to be
  taken literally instead of being treated as delimiters, they must be
  escaped as well.

  The unreserved characters consist of:
    - "A" - "Z", "a" - "z",
    - "0" - "9",
    - "-", "_", ".", "!", "~", "*", "'", "(", and ")"

  The reserved characters consist of:
    - ";", "/", "?", ":", "@", "&", "=", "+", "$", ",", "[", "]"

  This package provides two functions that escape and unescape characters
  in a URL.  The escape function should be used to escape a URL before the URL
  is used to fetch a Web page via the UTL_HTTP package.  The unescape function
  should be used to unescape an escaped URL before information is extracted
  from the URL.

  For more information, refer to the Request For Comments (RFC) document
  RFC2396. Note that this URL escape and unescape mechanism is different from
  the x-www-form-urlencoded encoding mechanism described in the HTML
  specification:

    http://www.w3.org/TR/html

  You can implement the x-www-form-urlencoded encoding using the
  UTL_URL.ESCAPE function as follows:

    CREATE OR REPLACE FUNCTION form_url_encode(
      data    IN VARCHAR2,
      charset IN VARCHAR2) RETURN VARCHAR2 AS
    BEGIN
      RETURN utl_url.escape(data, TRUE, charset); -- note use of TRUE
    END;

  Notice that this form_url_encode function encodes space characters in "%HH"
  hex code format instead of "+" as stipulated by the form-URL-encode scheme.
  In most cases, however, this makes no noticeable difference to applications
  that depend on the form-URL-encode scheme to submit data to a Web server,
  because most Web servers are able to decode the submitted data
  correctly. If the user's Web server does not accept space characters encoded
  in "%HH" hex code format, the user will have to modify the form_url_encode
  function to selectively encode space characters as "+" and encode the
  remaining characters using the UTL_URL.ESCAPE function.
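
  As a minimal sketch of that modification (form_url_encode_plus is a
  hypothetical name, not part of this package), the result of
  UTL_URL.ESCAPE can be post-processed.  The sketch assumes an ASCII-based
  charset, in which an escaped space is "%20", and it assumes that a literal
  "%" in the data is escaped as "%25".  Because escape_reserved_chars is
  TRUE, a literal "+" is escaped as well, so every "%20" left in the result
  should stand for a space:

    CREATE OR REPLACE FUNCTION form_url_encode_plus(
      data    IN VARCHAR2,
      charset IN VARCHAR2) RETURN VARCHAR2 AS
    BEGIN
      -- escape everything first, then turn the escaped spaces into "+"
      RETURN replace(utl_url.escape(data, TRUE, charset), '%20', '+');
    END;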

  For decoding data encoded with the form-URL-encode scheme, the following
  function implements the decoding scheme:

    CREATE OR REPLACE FUNCTION form_url_decode(
      data    IN VARCHAR2,
      charset IN VARCHAR2) RETURN VARCHAR2 AS
    BEGIN
      RETURN utl_url.unescape(replace(data, '+', ' '), charset);
    END;
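
  A round trip through the two functions above can serve as a quick check.
  This is only a sketch: it assumes that both functions have been created in
  the current schema and that 'UTF-8' suits the target Web server.  The
  sample value is made up for illustration.

    DECLARE
      original VARCHAR2(100) := 'name=J. Smith & Co.';
      encoded  VARCHAR2(400);
    BEGIN
      encoded := form_url_encode(original, 'UTF-8');
      -- decoding the encoded value should give back the original text
      IF form_url_decode(encoded, 'UTF-8') = original THEN
        dbms_output.put_line('round trip ok: ' || encoded);
      END IF;
    END;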

  *********************************************************************/

  -- Exceptions
  bad_url                  EXCEPTION; -- URL contains badly formed escape code
  bad_fixed_width_charset  EXCEPTION; -- Fixed-width multibyte character set
                                      -- not allowed for a URL
  PRAGMA EXCEPTION_INIT(bad_url,                 -29262);
  PRAGMA EXCEPTION_INIT(bad_fixed_width_charset, -29274);

  /**
   * Returns the URL with illegal characters (and optionally reserved
   * characters) escaped using "%2-digit-hex-code" format.
   *
   * PARAMETERS
   *   url                    The URL to escape
   *   escape_reserved_chars  Whether to escape the reserved characters as well
   *   url_charset            The character set that the URL should be
   *                          converted to before the URL is escaped in
   *                          %hex-code format.
   *                          If url_charset is NULL, the database
   *                          charset is assumed and no character set
   *                          conversion will occur.  The default value is
   *                          the current default body character set of the
   *                          UTL_HTTP package, whose default value is
   *                          "ISO-8859-1".  The character set can be named
   *                          in Internet Assigned Numbers Authority (IANA) or
   *                          Oracle naming convention.
   * EXCEPTIONS
   *   bad_fixed_width_charset  when the url_charset is a fixed-width
   *                            multibyte character set that is not allowed as
   *                            an encoding of a URL because the character set
   *                            does not contain the "%" or other single-byte
   *                            characters.
   *   Miscellaneous runtime exceptions may also be raised.
   * NOTES
   *   Normally, a user will escape the whole URL, which contains the
   * reserved characters (delimiters) that should not be escaped.
   * For example,
   *
   *   utl_url.escape('http://www.acme.com/a url with space.html')
   *
   * will return
   *
   *   'http://www.acme.com/a%20url%20with%20space.html'
   *
   *   In other situations, a user may want to send a query string with a
   * value that contains reserved characters.  In that case, the user should
   * escape just the value fully (with escape_reserved_chars set to TRUE) and
   * then concatenate it with the rest of the URL.  For example,
   *
   *   url := 'http://www.acme.com/search?check=' ||
   *             utl_url.escape('Is the use of the "$" sign okay?', TRUE);
   *
   * That will escape the "?", "$", and space characters in
   * 'Is the use of the "$" sign okay?' but not the "?" after "search" in the
   * URL that denotes the use of a query string.
   *
   *   Note that the Web server that a user intends to fetch Web pages from
   * may use a character set that is different from that of the user's
   * database.  In that case, the user must specify the url_charset
   * as the Web server's character set so that the characters that need
   * to be escaped are escaped in the URL character set.  For example,
   * a user of an EBCDIC database who wants to access an ASCII Web server
   * should escape the URL using "US7ASCII" so that a space is escaped
   * as "%20" (hex code of a space in ASCII) instead of "%40" (hex code
   * of a space in EBCDIC).  When the url_charset is specified, the
   * escape function will convert the URL to the URL character set,
   * escape the URL, and convert the escaped URL from the URL character set
   * back to the database character set.
   *
   *   This function does not validate a URL for the proper URL format.
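   *
   *   As an illustrative sketch only (the parameter names and values are
   * made up for this example), a query string with several values can be
   * built by escaping each value fully and leaving the delimiters alone:
   *
   *   DECLARE
   *     url VARCHAR2(2000);
   *   BEGIN
   *     url := 'http://www.acme.com/search'
   *            || '?q='    || utl_url.escape('fish & chips', TRUE, 'US7ASCII')
   *            || '&lang=' || utl_url.escape('en/GB', TRUE, 'US7ASCII');
   *     -- the "?", "&" and "=" written literally above remain delimiters;
   *     -- only the characters inside each value are escaped
   *     dbms_output.put_line(url);
   *   END;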
   */
  FUNCTION escape(url                   IN VARCHAR2 CHARACTER SET ANY_CS,
                  escape_reserved_chars IN BOOLEAN  DEFAULT FALSE,
                  url_charset           IN VARCHAR2 DEFAULT
                                                    utl_http.get_body_charset)
                  RETURN VARCHAR2 CHARACTER SET url%CHARSET;

  /**
   * Unescapes the escape character sequences in a URL to their original
   * form, namely converts "%XX" escape character sequences to the original
   * characters.
   *
   * PARAMETERS
   *   url              The URL to unescape
   *   url_charset      The character set that the URL should be
   *                    converted to before the URL is unescaped from
   *                    %hex-code format.
   *                    If url_charset is NULL, the database
   *                    charset is assumed and no character set
   *                    conversion will occur.  The default value is
   *                    the current default body character set of the
   *                    UTL_HTTP package, whose default value is
   *                    "ISO-8859-1".  The character set can be named
   *                    in Internet Assigned Numbers Authority (IANA) or
   *                    Oracle naming convention.
   *
   * EXCEPTIONS
   *   bad_url                  when the URL contains badly-formed
   *                            escape codes.
   *   bad_fixed_width_charset  when the url_charset is a fixed-width
   *                            multibyte character set that is not allowed as
   *                            an encoding of a URL because the character set
   *                            does not contain the "%" or other single-byte
   *                            characters.
   *   Miscellaneous runtime exceptions may also be raised.
   * NOTES
   *   Note that the Web server that a user receives the URL from
   * may use a character set that is different from that of the user's
   * database.  In that case, the user must specify the url_charset
   * as the Web server's character set so that the characters that need
   * to be unescaped are unescaped in the URL character set.  For example,
   * a user of an EBCDIC database who receives a URL from an ASCII Web server
   * should unescape the URL using "US7ASCII" so that "%20" is unescaped as
   * a space (0x20 is the hex code of a space in ASCII) instead of a "?"
   * (because 0x20 is not a valid character in EBCDIC).  When the
   * url_charset is specified, the unescape function converts the URL to
   * the URL character set, unescapes the URL, and converts the unescaped URL
   * from the URL character set back to the database character set.
   *
   *   This function does not validate a URL for the proper URL format.
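   *
   *   A minimal usage sketch follows.  The URLs are illustrative, and the
   * sketch assumes that "%zz" counts as a badly-formed escape code:
   *
   *   DECLARE
   *     decoded VARCHAR2(2000);
   *   BEGIN
   *     decoded := utl_url.unescape(
   *                  'http://www.acme.com/a%20url%20with%20space.html',
   *                  'US7ASCII');
   *     dbms_output.put_line(decoded);
   *     -- "zz" is not a pair of hex digits, so bad_url is expected here
   *     decoded := utl_url.unescape('http://www.acme.com/bad%zz', 'US7ASCII');
   *   EXCEPTION
   *     WHEN utl_url.bad_url THEN
   *       dbms_output.put_line('URL contains a badly-formed escape code');
   *   END;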
   */
  FUNCTION unescape(url         IN VARCHAR2 CHARACTER SET ANY_CS,
                    url_charset IN VARCHAR2 DEFAULT
                                            utl_http.get_body_charset)
                    RETURN VARCHAR2 CHARACTER SET url%CHARSET;

END;