source: Dev/branches/rest-dojo-ui/elasticsearch/README.textile @ 408

Last change on this file since 408 was 365, checked in by jkraaijeveld, 13 years ago
h1. ElasticSearch

h2. A Distributed RESTful Search Engine

h3. "http://www.elasticsearch.org":http://www.elasticsearch.org

ElasticSearch is a distributed RESTful search engine built for the cloud. Features include:

* Distributed and Highly Available Search Engine.
** Each index is fully sharded with a configurable number of shards.
** Each shard can have one or more replicas.
** Read / Search operations are performed on any of the replica shards.
* Multi Tenant with Multi Types.
** Support for more than one index.
** Support for more than one type per index.
** Index level configuration (number of shards, index storage, ...).
* A varied set of APIs
** HTTP RESTful API
** Native Java API.
** All APIs perform automatic node operation rerouting.
* Document oriented
** No need for upfront schema definition.
** Schema can be defined per type for customization of the indexing process.
* Reliable, Asynchronous Write Behind for long term persistence.
* (Near) Real Time Search.
* Built on top of Lucene
** Each shard is a fully functional Lucene index
** All the power of Lucene easily exposed through simple configuration / plugins.
* Per operation consistency
** Single document level operations are atomic, consistent, isolated and durable.
* Open Source under Apache 2 License.

h2. Getting Started

First of all, DON'T PANIC. It will take 5 minutes to get the gist of what ElasticSearch is all about.

h3. Installation

* "Download":http://www.elasticsearch.org/download and unzip the ElasticSearch official distribution.
* Run @bin/elasticsearch -f@ on Unix, or @bin/elasticsearch.bat@ on Windows.
* Run @curl -X GET http://localhost:9200/@.
* Start more servers ...
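
A freshly started node may not answer HTTP requests immediately. The following is a minimal sketch, not part of the ElasticSearch distribution, of a shell helper that polls the root endpoint from the step above until it responds. The function name @wait_for_node@ and its default values are assumptions made for illustration.

```shell
#!/bin/sh
# Poll a node's root endpoint until it answers, or give up.
# Usage: wait_for_node [url] [max_attempts] [delay_seconds]
# (hypothetical helper; name and defaults are illustrative only)
wait_for_node() {
    url=${1:-http://localhost:9200/}
    attempts=${2:-30}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: quiet, -f: non-zero exit on HTTP errors, -o /dev/null: discard body
        if curl -s -f -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_node && echo "node is up"
```

The function returns 0 as soon as one request succeeds and 1 once all attempts are used up, so it can gate the indexing commands in a script.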

h3. Indexing

Let's try to index some Twitter-like information. First, let's create a Twitter user and add some tweets (the @twitter@ index will be created automatically):

<pre>
curl -XPUT 'http://localhost:9200/twitter/user/kimchy' -d '{ "name" : "Shay Banon" }'

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '
{
    "user": "kimchy",
    "postDate": "2009-11-15T13:12:00",
    "message": "Trying out Elastic Search, so far so good?"
}'

curl -XPUT 'http://localhost:9200/twitter/tweet/2' -d '
{
    "user": "kimchy",
    "postDate": "2009-11-15T14:12:12",
    "message": "Another tweet, will it be indexed?"
}'
</pre>

Now, let's see if the information was added by GETting it:

<pre>
curl -XGET 'http://localhost:9200/twitter/user/kimchy?pretty=true'
curl -XGET 'http://localhost:9200/twitter/tweet/1?pretty=true'
curl -XGET 'http://localhost:9200/twitter/tweet/2?pretty=true'
</pre>
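
Every document in the commands above is addressed by a path of the form @/index/type/id@. As a small sketch of that addressing scheme (the helper name @doc_url@ and the @ES_HOST@ variable are hypothetical, not part of ElasticSearch):

```shell
#!/bin/sh
# Base URL of the node. ES_HOST is a made-up variable for this example.
ES_HOST=${ES_HOST:-http://localhost:9200}

# Print the REST address of a document: <host>/<index>/<type>/<id>
# Usage: doc_url <index> <type> <id>
doc_url() {
    printf '%s/%s/%s/%s\n' "$ES_HOST" "$1" "$2" "$3"
}

# Example (requires a running node):
#   curl -XGET "$(doc_url twitter tweet 1)?pretty=true"
```

The same address serves both the @PUT@ that indexes a document and the @GET@ that retrieves it.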

h3. Searching

Mmm search..., shouldn't it be elastic?
Let's find all the tweets that @kimchy@ posted:

<pre>
curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy&pretty=true'
</pre>

We can also use the JSON query language ElasticSearch provides instead of a query string:

<pre>
curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true' -d '
{
    "query" : {
        "text" : { "user": "kimchy" }
    }
}'
</pre>

Just for kicks, let's get all the documents stored (we should see the user as well):

<pre>
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -d '
{
    "query" : {
        "matchAll" : {}
    }
}'
</pre>

We can also do a range search (the @postDate@ field was automatically identified as a date):

<pre>
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -d '
{
    "query" : {
        "range" : {
            "postDate" : { "from" : "2009-11-15T13:00:00", "to" : "2009-11-15T14:00:00" }
        }
    }
}'
</pre>
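
To see which of the two tweets indexed earlier falls inside this range, the comparison can be reproduced locally. This is an illustrative sketch only: it assumes both range bounds are inclusive, and the helper names @digits@ and @in_range@ are made up for this example. It compares timestamps of the form used above by stripping the separators and comparing the remaining digits as integers.

```shell
#!/bin/sh
# Strip '-', 'T' and ':' so 2009-11-15T13:12:00 becomes 20091115131200.
digits() {
    printf '%s' "$1" | tr -d 'T:-'
}

# Succeed if <timestamp> lies between <from> and <to>.
# Assumption: both bounds are treated as inclusive.
# Usage: in_range <timestamp> <from> <to>
in_range() {
    ts=$(digits "$1")
    from=$(digits "$2")
    to=$(digits "$3")
    [ "$ts" -ge "$from" ] && [ "$ts" -le "$to" ]
}

for t in 2009-11-15T13:12:00 2009-11-15T14:12:12; do
    if in_range "$t" 2009-11-15T13:00:00 2009-11-15T14:00:00; then
        echo "$t: in range"
    else
        echo "$t: out of range"
    fi
done
# prints:
#   2009-11-15T13:12:00: in range
#   2009-11-15T14:12:12: out of range
```

Under these assumptions, only the first tweet has a @postDate@ inside the queried hour.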

There are many more options for performing a search; after all, it's a search product, no? All the familiar Lucene queries are available through the JSON query language, or through the query parser.

h3. Multi Tenant - Indices and Types

Maan, that twitter index might get big (in this case, index size == valuation). Let's see if we can structure our Twitter system a bit differently in order to support such a large amount of data.

ElasticSearch supports multiple indices, as well as multiple types per index. In the previous example we used an index called @twitter@, with two types, @user@ and @tweet@.

Another way to define our simple Twitter system is to have a different index per user (though note that each index has an overhead). Here are the indexing curl commands in this case:

<pre>
curl -XPUT 'http://localhost:9200/kimchy/info/1' -d '{ "name" : "Shay Banon" }'

curl -XPUT 'http://localhost:9200/kimchy/tweet/1' -d '
{
    "user": "kimchy",
    "postDate": "2009-11-15T13:12:00",
    "message": "Trying out Elastic Search, so far so good?"
}'

curl -XPUT 'http://localhost:9200/kimchy/tweet/2' -d '
{
    "user": "kimchy",
    "postDate": "2009-11-15T14:12:12",
    "message": "Another tweet, will it be indexed?"
}'
</pre>

The above indexes information into the @kimchy@ index, with two types, @info@ and @tweet@. Each user gets their own special index.

Complete control at the index level is allowed. For example, in the case above we might want to change from the default of 5 shards with 1 replica per index to only 1 shard with 1 replica per index (== per Twitter user). Here is how this can be done (the configuration can be in YAML as well):

<pre>
curl -XPUT 'http://localhost:9200/another_user/' -d '
{
    "index" : {
        "numberOfShards" : 1,
        "numberOfReplicas" : 1
    }
}'
</pre>
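
When an index is created per user, the settings body stays the same apart from the two numbers. A minimal sketch of generating it follows; the function name @index_settings@ is hypothetical, while the field names are the ones used in the request above.

```shell
#!/bin/sh
# Print an index settings body for the given shard and replica counts.
# Usage: index_settings <shards> <replicas>
# (hypothetical helper; field names are copied from the README example)
index_settings() {
    printf '{ "index" : { "numberOfShards" : %d, "numberOfReplicas" : %d } }\n' "$1" "$2"
}

# Example (requires a running node):
#   curl -XPUT 'http://localhost:9200/another_user/' -d "$(index_settings 1 1)"
```

Using @%d@ in the format string means a non-numeric argument is rejected by @printf@ instead of ending up in the JSON body.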

Search (and similar operations) are multi-index aware. This means that we can easily search on more than one index (Twitter user), for example:

<pre>
curl -XGET 'http://localhost:9200/kimchy,another_user/_search?pretty=true' -d '
{
    "query" : {
        "matchAll" : {}
    }
}'
</pre>

Or on all the indices:

<pre>
curl -XGET 'http://localhost:9200/_search?pretty=true' -d '
{
    "query" : {
        "matchAll" : {}
    }
}'
</pre>
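
The search requests shown so far differ only in their path: an index plus a type, one or more comma-separated indices, or no index at all. As a sketch (@search_url@ is a hypothetical helper, not an ElasticSearch command), the path can be built like this:

```shell
#!/bin/sh
# Print a search URL for the local node.
# Usage: search_url [indices] [type]
#   indices: one index or a comma-separated list; empty means all indices
#   type:    optional type name within the given indices
search_url() {
    base=http://localhost:9200
    if [ -z "${1:-}" ]; then
        printf '%s/_search\n' "$base"
    elif [ -z "${2:-}" ]; then
        printf '%s/%s/_search\n' "$base" "$1"
    else
        printf '%s/%s/%s/_search\n' "$base" "$1" "$2"
    fi
}

search_url                        # http://localhost:9200/_search
search_url kimchy,another_user    # http://localhost:9200/kimchy,another_user/_search
search_url twitter tweet          # http://localhost:9200/twitter/tweet/_search
```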

{One liner teaser}: And the cool part about that? You can easily search on multiple Twitter users (indices), with different boost levels per user (index), making social search so much simpler (results from my friends rank higher than results from my friends' friends).

h3. Distributed, Highly Available

Let's face it, things will fail....

ElasticSearch is a highly available and distributed search engine. Each index is broken down into shards, and each shard can have one or more replicas. By default, an index is created with 5 shards and 1 replica per shard (5/1). There are many topologies that can be used, including 1/10 (improve search performance), or 20/1 (improve indexing performance, with search executed in a map reduce fashion across shards).

In order to play with ElasticSearch's distributed nature, simply bring more nodes up and shut some nodes down. The system will continue to serve requests (make sure you use the correct HTTP port) with the latest data indexed.
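
The shards/replicas notation translates directly into the number of shard copies the cluster has to host: each shard exists once itself, plus once for each of its replicas. A small arithmetic sketch (the function name @shard_copies@ is made up for this example):

```shell
#!/bin/sh
# Total shard copies for an index = shards * (1 + replicas per shard).
# Usage: shard_copies <shards> <replicas>
shard_copies() {
    echo $(( $1 * (1 + $2) ))
}

shard_copies 5 1     # default 5/1 topology  -> 10
shard_copies 1 10    # 1/10 topology         -> 11
shard_copies 20 1    # 20/1 topology         -> 40
```

This is only a count of copies; how they are spread across nodes is decided by the cluster.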

h3. Where to go from here?

We have just covered a very small portion of what ElasticSearch is all about. For more information, please refer to the "elasticsearch.org":http://www.elasticsearch.org website.

h3. Building from Source

ElasticSearch uses "Maven":http://maven.apache.org for its build system.

In order to create a distribution, simply run the @mvn package -DskipTests@ command in the cloned directory.

The distribution will be created under @target/releases@.

h1. License

<pre>
This software is licensed under the Apache 2 license, quoted below.

Copyright 2009-2011 Shay Banon and ElasticSearch <http://www.elasticsearch.org>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
</pre>