How do I apply URL normalization rules in PHP?

The PEAR Net_URL2 library looks like it’ll do at least part of what you want. It’ll remove dot segments, fix capitalization and get rid of the default port:

include("Net/URL2.php");
$url = new Net_URL2('HTTP://example.com:80/a/../b/c');
print $url->getNormalizedURL();

emits:

http://example.com/b/c

I doubt there’s a general-purpose mechanism for adding trailing slashes to directories because you need a way …
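For comparison, here is a rough Python sketch of the same three normalizations (lower-case scheme and host, drop the default port, remove dot segments) using only urllib.parse. It is not a port of Net_URL2. The DEFAULT_PORTS table and the helper names are assumptions for illustration. The sketch only handles absolute paths, drops any user:password part, and does not handle IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports to strip; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path):
    """Resolve "." and ".." in an absolute path (simplified RFC 3986, section 5.2.4)."""
    segments = path.split("/")
    output = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            # Never pop the leading "" that stands for the root "/".
            if len(output) > 1:
                output.pop()
            continue
        output.append(seg)
    # A trailing "." or ".." refers to a directory, so keep a trailing slash.
    if segments[-1] in (".", ".."):
        output.append("")
    return "/".join(output)


def normalize_url(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lower-cased
    port = parts.port
    # Keep the port only if it differs from the scheme's default.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = remove_dot_segments(parts.path) or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(normalize_url("HTTP://example.com:80/a/../b/c"))  # http://example.com/b/c
```

Like the PHP version, this does not try to guess whether a path names a directory, so no trailing slash is added.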

MongoDB normalization, foreign key and joining

MongoDB doesn’t support server-side foreign key relationships, and normalization is also discouraged. You should embed child objects within their parent objects where possible; this increases performance and makes foreign keys unnecessary. That said, embedding is not always possible, so there is a special construct called DBRef that lets you reference objects in a …
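To make the trade-off concrete, here is a plain-Python sketch of the two document shapes. Ordinary dicts stand in for MongoDB collections, so nothing here talks to a server. The collection name "addresses", the field names and the resolve helper are hypothetical. The reference uses the DBRef field layout ($ref for the collection, $id for the referenced _id), and following it is a second lookup done by the application, not by the database.

```python
# Embedded: the child (an address) lives inside the parent document,
# so a single read returns everything.
user_embedded = {
    "_id": 1,
    "name": "Alice",
    "address": {"city": "Berlin", "zip": "10115"},
}

# Referenced: the child is stored in its own "collection" and the parent
# holds a DBRef-shaped subdocument pointing at it.
db = {
    "addresses": {101: {"_id": 101, "city": "Berlin", "zip": "10115"}},
}
user_referenced = {
    "_id": 1,
    "name": "Alice",
    "address": {"$ref": "addresses", "$id": 101},
}


def resolve(ref, database):
    # Client-side "join": look up the referenced document by collection and _id.
    return database[ref["$ref"]][ref["$id"]]


print(user_embedded["address"]["city"])                 # Berlin (one read)
print(resolve(user_referenced["address"], db)["city"])  # Berlin (second lookup)
```

With a real driver, the resolve step corresponds to issuing another query against the referenced collection, which is the extra round trip that embedding avoids.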

How do I normalise a Solr/Lucene score?

To quote http://wiki.apache.org/lucene-java/ScoresAsPercentages: “People frequently want to compute a ‘Percentage’ from Lucene scores to determine what is a ‘100% perfect’ match vs a ‘50%’ match. This is also sometimes called a ‘normalized score’. Don’t do this. Seriously. Stop trying to think about your problem this way, it’s not going to end well.” That page does …
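A small illustration of the problem the quote warns about: if you “normalize” by dividing every score by the top score of the same query, the best hit always becomes 100%, no matter how weak the match actually was. The score lists below are made-up numbers, not real Lucene output, and percent_of_max is a hypothetical helper.

```python
def percent_of_max(scores):
    # Naive "normalization": express each score as a percentage of the best
    # score returned for the *same* query.
    top = max(scores)
    return [round(s / top * 100) for s in scores]


# Hypothetical raw scores for two unrelated queries.
strong_query = [8.2, 7.9, 1.1]
weak_query = [0.4, 0.1, 0.2]

print(percent_of_max(strong_query))  # top hit is 100
print(percent_of_max(weak_query))    # top hit is also 100
```

The “100%” in both lists says nothing about how good either top match really is.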

How does unicodedata.normalize(form, unistr) work?

I find the documentation pretty clear, but here are a few code examples:

from unicodedata import normalize
print(repr(normalize('NFD', '\u00C7')))  # decompose: convert Ç to "C + ̧"
print(repr(normalize('NFC', 'C\u0327')))  # compose: convert "C + ̧" to Ç

Both ‘D’ (decompose) forms convert a single combined character (like ä) into …
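A short Python 3 sketch of the composed/decomposed and compatibility forms, using only the standard unicodedata module. The sample characters (é and the “fi” ligature U+FB01) are chosen for illustration: the C and D forms switch between composed and decomposed sequences, while the K forms additionally replace compatibility characters.

```python
import unicodedata

precomposed = "\u00E9"  # é as one code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# Same visible text, different code point sequences.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True: composed
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True: decomposed

# NFC/NFD keep the ligature; NFKC/NFKD map it to plain "f" + "i".
ligature = "\uFB01"
print(unicodedata.normalize("NFC", ligature) == ligature)  # True
print(unicodedata.normalize("NFKC", ligature))             # fi
```

Normalizing both strings to the same form before comparing them is what makes visually identical text compare equal.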

Minimum number of tables that exist after decomposing relation R into 1NF?

If all the candidate keys of a relation contain multivalued attributes:
  Introduce a surrogate attribute for at least one multivalued attribute.
For each attribute you deem "composite" (having heterogeneous components, like a tuple):
  For each attribute component that can be missing:
    Add a relation with attributes of some multivalue-free candidate key and an attribute for …
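As a minimal illustration of only the simplest case (not the full procedure above), here is a Python sketch in which a relation with a single multivalued attribute is split into two 1NF relations: one holding the single-valued attributes and one pairing the key with each value. The student/phone data and the to_1nf helper are hypothetical.

```python
# Hypothetical unnormalized rows: "phones" is a multivalued attribute.
students = [
    {"id": 1, "name": "Ann", "phones": ["555-0100", "555-0101"]},
    {"id": 2, "name": "Bo", "phones": []},
]


def to_1nf(rows):
    # Relation 1: the single-valued attributes, keyed by id.
    student = [(r["id"], r["name"]) for r in rows]
    # Relation 2: one tuple per (key, value) pair of the multivalued attribute.
    student_phone = [(r["id"], p) for r in rows for p in r["phones"]]
    return student, student_phone


student, student_phone = to_1nf(students)
print(student)        # [(1, 'Ann'), (2, 'Bo')]
print(student_phone)  # [(1, '555-0100'), (1, '555-0101')]
```

Keeping the phones in their own relation means a student with no phone numbers still appears once, without needing an empty or null phone column.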

How can I normalize a URL in Python?

Have a look at the werkzeug.utils module (the function now lives in werkzeug.urls). The function you are looking for is called url_fix and works like this:

>>> from werkzeug.urls import url_fix
>>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')
'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'

It’s implemented in Werkzeug as follows:

import urllib
import urlparse
def url_fix(s, charset="utf-8"):
    """Sometimes you get an URL by a user that just …
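If you would rather not depend on Werkzeug, the same idea can be sketched with the Python 3 standard library: split the URL, percent-encode the path and query, and reassemble it. This is an approximation of the behaviour shown above, not Werkzeug’s actual code. The function name fix_url and the chosen safe characters are assumptions, and it does not IDNA-encode non-ASCII host names.

```python
from urllib.parse import quote, quote_plus, urlsplit, urlunsplit


def fix_url(s):
    scheme, netloc, path, query, fragment = urlsplit(s)
    # Percent-encode the path as UTF-8; keep "/" and existing "%" escapes.
    path = quote(path, safe="/%")
    # Encode the query string, keeping its key/value separators.
    query = quote_plus(query, safe=":&=")
    return urlunsplit((scheme, netloc, path, query, fragment))


print(fix_url("http://de.wikipedia.org/wiki/Elf (Begriffsklärung)"))
# http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29
```

Because "%" is treated as safe, an already-encoded URL passes through unchanged, but a stray literal "%" is also left alone.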

File.listFiles() mangles Unicode names with JDK 6 (Unicode Normalization issues)

In Unicode, there is more than one valid way to represent the same letter. The characters you’re using in your Tricky Name are “LATIN SMALL LETTER I WITH CIRCUMFLEX” and “LATIN SMALL LETTER A WITH RING ABOVE”. You say “Note the %CC versus %C3 character representations”, but looking closer, what you see are …
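The %CC versus %C3 difference is what you get when the same letters are percent-encoded in decomposed (NFD) versus composed (NFC) form. A Python sketch using the two letters named above shows the UTF-8 bytes directly; in Java the corresponding normalization API is java.text.Normalizer.

```python
import unicodedata
from urllib.parse import quote

# LATIN SMALL LETTER I WITH CIRCUMFLEX + LATIN SMALL LETTER A WITH RING ABOVE
composed = "\u00EE\u00E5"
decomposed = unicodedata.normalize("NFD", composed)  # base letters + combining marks

print([unicodedata.name(c) for c in decomposed])
print(quote(composed))    # %C3%AE%C3%A5   (precomposed letters start with C3)
print(quote(decomposed))  # i%CC%82a%CC%8A (combining marks start with CC)
```

Normalizing both names to one form (for example NFC) before comparing them makes the two spellings match.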