[R-pkgs] package bit64 with new functionality

From: Jens Oehlschlägel <Jens.Oehlschlaegel_at_truecluster.com>
Date: Thu, 08 Nov 2012 21:03:20 +0100

Dear R community,

The new version of package 'bit64' - which extends R with fast 64-bit integers - now has fast (single-threaded) implementations of the most important univariate algorithmic operations (those based on hashing and sorting). Package 'bit64' now has methods for 'match', '%in%', 'duplicated', 'unique', 'table', 'sort', 'order', 'rank', 'quantile', 'median' and 'summary'. For data management it adds the novel generics 'unipos' (positions of the unique values), 'tiepos' (positions of ties) and 'keypos' (positions of values in a sorted unique table), as well as the derived methods 'as.factor' and 'as.ordered'.

This 64-bit functionality is carefully implemented to be no slower than the respective 32-bit operations in base R, and also to avoid the excessive execution times observed with 'order', 'rank' and 'table' (speedup factors of 20, 16 and 200, respectively). This increases the size of the datasets with which we can work truly interactively. The speed is achieved by simple heuristic optimizers: the high-level functions mentioned above choose the best of multiple low-level algorithms and can further take advantage of a novel optional caching method. In an example R session using a couple of these operations, the 64-bit integers performed 22x faster than base 32-bit integers; hash-caching improved this to 24x amortized, and sortorder-caching was most efficient at 38x (caching both hashing and sorting is not worthwhile: 32x at double the RAM consumption).
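
As a rough illustration of the functions named above, a short session might look like the following. This is only a sketch: it assumes the 'bit64' package is installed, the data values are made up, and the timings quoted above are not reproduced here.

```r
# Sketch only: requires the 'bit64' package; the values in x are invented.
library(bit64)

x <- as.integer64(c(30, 10, 30, 20, 10))

# Methods for the familiar generics dispatch on the integer64 class
unique(x)
duplicated(x)
match(as.integer64(20), x)
as.integer64(10) %in% x
order(x)
table(x)

# The position generics
unipos(x)   # positions of the unique values
tiepos(x)   # positions of ties
keypos(x)   # position of each value in the sorted unique table

# Optional caching: attach a sort/order cache to x so that
# later calls on x can reuse it instead of recomputing
sortordercache(x)
sort(x)
rank(x)
remcache(x) # remove the cache again to free RAM
```

Whether caching pays off depends on how often the same vector is reused; for a single call the cache costs more than it saves.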

Since the package covers the most important functions for (univariate) data exploration and data management, I think it is now appropriate to claim that R has sound 64-bit integer support, for example for working with keys or counts imported from large databases. For details concerning approach, implementation and roadmap please check the ANNOUNCEMENT-0.9-Details.txt file and the package help files.
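
To see why large database keys call for 64-bit integers rather than doubles, consider the following small sketch. It assumes 'bit64' is installed, and the key values are chosen purely for illustration (they sit just above 2^53, where doubles can no longer represent every integer).

```r
# Sketch only: requires the 'bit64' package; key values are illustrative.
library(bit64)

# 2^53 + 1 cannot be represented exactly as a double ...
big_dbl <- 9007199254740993
big_dbl == 9007199254740992   # TRUE: two distinct keys collide

# ... but parsing from character into integer64 keeps every digit
k1 <- as.integer64("9007199254740993")
k2 <- as.integer64("9007199254740992")
k1 == k2                      # FALSE: the keys stay distinct
k1 - k2
```

Reading such keys as character and converting with 'as.integer64' avoids the silent rounding that happens when they pass through a double.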

Kind regards

Jens Oehlschlägel
Munich, 8.11.2012

R-packages mailing list
https://stat.ethz.ch/mailman/listinfo/r-packages
Received on Sun 11 Nov 2012 - 00:01:32 EST
