Thursday, January 10, 2013

How to implement really small and fast ORM with PHP (Part 7: IDE)

Fork me on GitHub

Queries are gaining more and more complexity, data is getting bigger and bigger. Most optimizations in database technology are done in the database server. This is an approach to optimize queries on the client side.

With this ORM, queries ...
  • don't select more data than needed
  • contain less joins when data is expected to be consistent
  • can be written manually in pure SQL
  • are not written in a new query language

Performance of integer casting

Lessons learned:
  • $var+0 is as fast as (int)$var
  • intval($var) is 2 times slower than (int)$var
  • is_numeric($var) is 2 times slower than (int)$var
  • is_numeric($var) is 7 percent faster than intval($var)
  • settype($var) is 3 times slower than (int)$var

Wednesday, January 2, 2013

Make PDFs searchable

I searched for a good way to make scanned documents searchable. Most newer scanning software already has some OCR built-in, but what about all the old documents? Using pdfsandwich and Tesseract, we recover the text from each page of a PDF and put it behind each page as an invisible layer. That way, we can search the PDF with a normal PDF reader or upload it to Google translate to get a translated version. To get a text-only version, pdftotext can be used.

Labels

performance (23) benchmark (6) MySQL (5) architecture (5) coding style (5) memory usage (5) HHVM (4) C++ (3) Java (3) Javascript (3) MVC (3) SQL (3) abstraction layer (3) framework (3) maintenance (3) Go (2) Golang (2) HTML5 (2) ORM (2) PDF (2) Slim (2) Symfony (2) Zend Framework (2) Zephir (2) firewall (2) log files (2) loops (2) quality (2) real-time (2) scrum (2) streaming (2) AOP (1) Apache (1) Arrays (1) C (1) DDoS (1) Deployment (1) DoS (1) Dropbox (1) HTML to PDF (1) HipHop (1) OCR (1) OOP (1) Objects (1) PDO (1) PHP extension (1) PhantomJS (1) SPL (1) SQLite (1) Server-Sent Events (1) Silex (1) Smarty (1) SplFixedArray (1) Unicode (1) V8 (1) analytics (1) annotations (1) apc (1) archiving (1) autoloading (1) awk (1) caching (1) code quality (1) column store (1) common mistakes (1) configuration (1) controller (1) decisions (1) design patterns (1) disk space (1) dynamic routing (1) file cache (1) garbage collector (1) good developer (1) html2pdf (1) internationalization (1) invoice (1) just-in-time compiler (1) kiss (1) knockd (1) legacy code (1) legacy systems (1) logtop (1) memcache (1) memcached (1) micro framework (1) ncat (1) node.js (1) openssh (1) pfff (1) php7 (1) phpng (1) procedure models (1) ramdisk (1) recursion (1) refactoring (1) references (1) regular expressions (1) search (1) security (1) sgrep (1) shm (1) sorting (1) spatch (1) ssh (1) strange behavior (1) swig (1) template engine (1) threads (1) translation (1) ubuntu (1) ufw (1) web server (1) whois (1)