TOPIC: case insensetive search and index use
#1647
case insensetive search and index use 9 Months ago
Hi,

I am wonder if i need search a lot with case insensetive, do i need normal index(attribute) or index(lower(attribute)) like i do with oracle? So maybe i need 2 indexes for an attribute, 1 for normal case + 1 for all lower case? Thank you....
 
 
#1648
Re:case insensetive search and index use 9 Months ago
If you want to search both ways, you would need both indexes.

Warning: lower indexes are not well tested - we do not use them in our applications yet. Let me know if you run into any problems.

I do not think AccessControl Select will automatically use lower indexes so they will only be beneficial in your own queries.

This area needs more work, contributions are welcome.
 
 
andrew
 
#1656
Re:case insensetive search and index use 9 Months ago
Maybe the need of lower index depends on performance, looking fast with normal index. Is there test results of speed of database searching? I try making a load test but difficult to make a good one, also benchmark is good if want to convince people of using suneido database (not a famouse brand...). In case i search lower case in index with mixed case is then index ignore or still used? Thank you...
 
 
#1659
Re:case insensetive search and index use 8 Months, 4 Weeks ago
I do not have any good speed comparisons. The database is usually quite fast as long as you have enough memory relative to the size of the database.

If you have a normal index (not lower) with mixed upper and lower case, and the user wants to search ignoring case, then it is almost impossible to use the index.

Currently, lower indexes are not completely implemented. There is no easy way to specify "ignore case" in a query, so queries can not take advantage of lower indexes.

Currently, the best way to implement this (if you need it) is to make an extra field that contains the lower case version of another field. Then you can index this extra field. If you want to ignore case, search on this new field.
 
 
andrew
 
#1663
Re:case insensetive search and index use 8 Months, 3 Weeks ago
Postgresql answer for this question is to use function indexes, and the the query optimizer will use indexes as long as they use the same signature, example:

create table mylist(id, name, phone)
key(id)
index(lower(name))

query:
mylist where name = 'andersen'
// the above query will not use index(lower(name)) because the field name signature isn't the same as the available index "name <> lower(name)"

but query:
mylist where lower(name) = 'andersen'
// the above query will use the index index(lower(name)) because the field name signature is the same as the available index "lower(name) = lower(name)"

The PostgreSQL documentation have a better explanation of this.

The implementation of function indexes open a lot of possibilities, in my case with Spanish language we have accented chars that adds to the problem of searching, in postgresql I solved it with function indexes and in sqlite3 with collate functions.

Now mentioning sqlite I remember collate specifications could be another way of solving the problem.
 
 
Last Edit: 2009/11/28 20:39 By mingodad.
 
#1668
Re:case insensetive search and index use 8 Months, 3 Weeks ago
That seems like a good approach.

One question I have with function indexes is how to handle when the function is changed. You would need to recreate the index (assuming it stored the result of the function) but how does the database reliably know whether the function code has changed. Especially since the change might be in a function that is called by the index function.

For our applications, I think a separate search "index" might be best. In the help we use clucene, but that is quite "heavy". Perhaps we could implement a lightweight version of clucene within Suneido, using a database table to store the search index.
 
 
andrew
 
#1676
Re:case insensetive search and index use 8 Months, 3 Weeks ago
In my app i get a search like: document where name =~ "(?i)(?q)abc"
The ?i or ?q seam to mean some special option like ignore case, right? could this not be used to connect to an index?
Example: "index(lower(name)) as ?i" will mean that if option ?i is use then take this index as prefered index.
 
 
#1679
Re:case insensetive search and index use 8 Months, 3 Weeks ago
Yes, it could be smart enough to look inside regular expressions. For best efficiency it would need to be "(?i)(?q)^..." (prefix)

Currently regular expressions do not use indexes fully. If the field is indexed, it can use the index, but only to do a full linear scan of the index. In this case a lower index is not necessary because you can just do: where Lower(field) == "lower case value" The only difference is that Lower(field) will be done for each record, but that will be minor compared to disk access.

If you use equals/greater/less/in then it can just read a range of the index which is faster.
 
 
andrew
 
#1680
Re:case insensetive search and index use 8 Months, 3 Weeks ago
Use of equals, greater... depend on user choice. Anyway i am going to set up load test to do some test for index, combined index, etc. probably better to see stuff work in context of real app. Any tip for measure timing?
 
 
#1681
Re:case insensetive search and index use 8 Months, 3 Weeks ago
There is a Timer function you can use (see the help)

Or you can insert Print(Date()) into standard code like AccessControl
 
 
andrew
 
#1691
Re:case insensetive search and index use 8 Months, 3 Weeks ago
Today I start the load test. First result look good but get problem when inserting new test records. database freese and now can not start/restore db. also backup not starting. Got fatal error: cant create file mapping for database. Can i save my work? seem like db and backup are lost :-(
 
 
#1692
Re:case insensetive search and index use 8 Months, 3 Weeks ago
Have you tried restarting Windows?

Can't create file mapping usually means you've run out of virtual address space. Suneido needs 1gb of virtual address space to memory map the database file.

How big is the database? I think the current limit is 32gb.
 
 
andrew
 
#1698
Re:case insensetive search and index use 8 Months, 3 Weeks ago
Thanks for information. I already fix it, had old rescue backup from before :-D renamed file to suneido.db and it works.
Problem was: running from small memorystick, was full and vista give no warning just freezing, breaking of seem to corrupt the file somehow and restoring fails. Now just run from harddrive, 4 gb ram.
First load tests look good: 1 million records, search case insensitive, no indexes set: only average 4 seconds, with index: under 1 sec. But need to setup good test set first. Will keep you update...
 
 
#1702
Re:case insensetive search and index use 8 Months, 3 Weeks ago
I try do test with 1 million rows in table. Test on search 3 columns ignoring case and partial search has average 4 sec. searching. No index yet. Then i adds the indexes and search time go up, not down. Do I miss something after add index? Oracle need statistics update, also for suneido?
 
 
#1703
Re:case insensetive search and index use 8 Months, 2 Weeks ago
No, Suneido does not require updating statistics.

That is strange that the time went up. The indexes will increase the size of the database file, but that should not have much effect on speed unless it causes more swapping.

Maybe it is just because of the after-effects of creating the indexes. Did you test multiple times?

In the worst case, creating more indexes can hurt performance - if Suneido chooses the wrong index for the query. For example, before it might have been using a small primary key index. Now it might be using a larger description index which might actually be slower depending on what percentage of records your query is selecting.

Try timing a query that does not match any records.

Try searching with "equals" on an indexed field - that should be faster.

NOTE: If you are searching with "contains" then indexes may not help much, unless the data records are large.
 
 
andrew
 
#1704
Re:case insensetive search and index use 8 Months, 2 Weeks ago
table:
doc(nr,name,author,subject)
key(nr)
index(name)
index(author)
index(subject)
content: 1 million rows

tests like:
1. doc where name =~ '(?i)document_naam3000' and author =~ '(?i)auteur300' and subject =~ '(?i)subjecT0' (8 rows)
2. doc where name =~ 'document_naam3000' and author =~ 'auteur300' and subject =~ 'subject0' (2rows)
3. doc where name = 'document_naam3000' or (author = 'auteur300' and subject = 'subject0') (2 rows)
4. doc where name = 'document_naam3ooo' or (author = 'auteur3oo' and subject = 'subjecto') (0 rows)

tested 10 times in the row:
Timer(){
for (i = 0; i < 10; ++i)
{QueryApply("doc where name = 'document_naam3oo' or (author = 'auteur3oo' and subject = 'subjecto')")
{|x|}}
Date()}

all result around 40 secs per 10, number of index make no change. also complex index and no index is same result.

PS: i use suneido071021 on vista 4gb rams
 
 
#1705
Re:case insensetive search and index use 8 Months, 2 Weeks ago
None of those queries is "good" for indexing.

(1) and (2) use regular expressions (=~)

(3) and (4) use "or"

A "good" query for indexing would use equals and "and", for example:

doc where author = 'auteur3oo' and subject = 'subjecto'

It might help if you can post the query strategy. You can get this with QueryStrategy(query) or with the "?" button in QueryView.

You could try a more recent suneido.exe - there have been a few changes to the query optimization, although mostly for more complex queries. One change was to disable a certain kind of index use because the costs were higher than the benefits. This could be affecting some of your queries (look for "%" in the strategy).
 
 
andrew
 
#1708
Re:case insensetive search and index use 8 Months, 2 Weeks ago
Hi, thanks for inormation. Seem like it is system problem not suneido. I try old and recent versions and now both have good timings, only some differens in strategies. Now partly using of the regular expression is also fast. Not sure what was wrong...
 
 
#1715
Re:case insensetive search and index use 8 Months, 2 Weeks ago
Right, lowercase index works but no control which index used. Use second colun with lower case with index. Can i use index on the derive column? then can save memory...
 
 
#1721
Re:case insensetive search and index use 8 Months, 2 Weeks ago
Sorry, you can not index a derived rule column.
 
 
andrew
 
#1734
Re:case insensetive search and index use 8 Months, 2 Weeks ago
I set up test scenarios, but have issue with timing. I try first:
Timer()
{
for (i = 0; i < 10; ++i)
{
QueryApply(query)
{|x|}
}
}
This works. When try 10x is +/- 10x longer as 1 time.

now i try:
Timer()
{
for (i = 0; i < 10; ++i)
{
t = Transaction(read:)
t.Query(query)
t.Complete()
}
}
This is much faster, now 10x is almost same as 1x or 100x i need to do 1000000x to see difference. What can make diference? query cache? maybe version 1 do dummy action on result, version 2 only get query data? Not sure what to use for usefull test. Both?
 
 
#1737
Re:case insensetive search and index use 8 Months, 2 Weeks ago
The second example is not actually reading the data.

Try:

q = t.Query(query)
while false isnt x = q.Next()
{}
 
 
andrew