NoSQL company Basho loses CEO and CTO

Originally posted on Gigaom:

Basho, a NoSQL startup whose Riak database competes against the likes of Cassandra in scale-out environments, has lost its CEO Greg Collins, CTO Justin Sheehy and Chief Architect Andy Gross. In an interview with the Register, Sheehy said the departures aren’t as bad as they look and that the company is in good hands. Perhaps, although whoever replaces Collins will be the company’s fourth CEO since it was founded in 2007, and neither of the company’s co-founders remains. Basho has raised more than $31 million in venture capital, with its last funding round of $11.1 million coming in July 2012.


Posted in database

Connecting to SQL Server with R using RJDBC

Download the Microsoft JDBC Driver for SQL Server from here

Save the files to a convenient location: I chose C:\jdbc\sqljdbc_4.0\

Many posts show the class name as "com.microsoft.jdbc.sqlserver.SQLServerDriver"; this is incorrect.

com.microsoft.jdbc.sqlserver.SQLServerDriver # incorrect class name
com.microsoft.sqlserver.jdbc.SQLServerDriver # correct class name

My Machine Setup:

  • Windows 7 Enterprise, 64-bit
  • RStudio version 0.97.551
  • R version 3.0.1 (2013-05-16), platform x86_64-w64-mingw32
  • Microsoft SQL Server 2008 and 2012 installed

If you use a tool like 7-Zip to explode the jar file, you will notice the class files are located at:

"C:\jdbc\sqljdbc_4.0\enu\sqljdbc4\com\microsoft\sqlserver\jdbc\SQLServerDriver.class"


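# Reference document on RJDBC:
# http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf
# install.packages("RJDBC", dep = TRUE)
library(RJDBC)
# Load the driver using the correct class name and the path to the jar file
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver",
            "C:/jdbc/sqljdbc_4.0/enu/sqljdbc4.jar",
            identifier.quote = "`")
# Replace SERVERNAME, the port and the credentials with your own
conn <- dbConnect(drv, "jdbc:sqlserver://SERVERNAME:55158;databaseName=master", "sa", "password")
# List the system databases as a quick connectivity test
d <- dbGetQuery(conn, "select * from sys.databases where database_id <= 4 ")
summary(d)
# Close the connection when finished
dbDisconnect(conn)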

You can download this script from here

I have tested connecting to SQL Server using R from Windows, Ubuntu and OS X; the Ubuntu test used the Microsoft driver against the same SQL Server. Below are links to the gists which contain the code.

Windows

Ubuntu

Mac OS X 

For the code used in that example look at my gist

Posted in data science, jdbc, R

High Quality Grepping – Highlighting matches in grep

Add the option --color to grep to make your matches stand out.

e.g.

cat customers.txt | grep -i -e "john" --color

Posted in database

Understanding I/O Performance for SQL Server on Amazon EC2 / AWS.

Having run SQL Server on EC2, I can say that it is a very stable platform; however, you have to pay for the performance you need. I have run several repeated tests, and my experience is that Amazon gives you exactly what you pay for.

If you are having a performance problem, you need to look at your entire infrastructure and ensure you have not over-provisioned one aspect of the system while under-provisioning another.

The AWS blog lists the dedicated network throughput for each instance size. I have chosen to expand on this to show you the best choices you have for EBS volume configuration for these instance types. If your only objective is maxing out I/O performance, the table below should be sufficient. If, however, you need large amounts of space, you may choose to increase your drive count and reduce your Provisioned IOPS per volume.

 

There are several key things I have learned and want to state for you:

1) Your instance size determines your guaranteed network throughput.

2) Everything that goes off your server traverses that single NIC (disk I/O, network I/O, everything).

3) 1000 IOPS is approximately 16 MB per second on AWS, because the block size is 16 KB.

4) If you are running SQL Server with terabytes of data, you most likely need an instance size that is rated as High for Network Performance, and potentially one of the newer instances which promise 10 Gigabit throughput.

5) Please do configure your server to be EBS optimized.

6) EBS volumes are currently limited to 1 TB in size, so to create larger disks use software RAID in your operating system. (P.S. Amazon will deliver the IOPS of all volumes in the RAID array; just remember you cannot exceed the dedicated throughput of your NIC.)

7) If your backups are running terribly long and you can’t seem to figure out why, you most likely have an I/O bottleneck related to your server configuration. Amazon is NOT the problem.
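
The arithmetic behind points 1 to 3 can be sketched in a few lines of Python. This is only an illustration, not an AWS formula: the function names are made up for this post, and it assumes decimal units (1 MB = 1000 KB) together with the 16 KB block size from point 3, which is the combination that reproduces the figures in the table.

```python
# Assumptions (not from AWS documentation): 16 KB per I/O operation, as in
# point 3, and decimal units (1 MB = 1000 KB).
BLOCK_SIZE_KB = 16


def mbps_to_mb_per_second(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def max_iops(mbps: float, block_size_kb: int = BLOCK_SIZE_KB) -> float:
    """Upper bound on IOPS that fits through a link of the given speed."""
    return mbps_to_mb_per_second(mbps) * 1000 / block_size_kb


if __name__ == "__main__":
    # 500 Mbps and 1000 Mbps are the two dedicated throughputs in the table.
    for link_mbps in (500, 1000):
        print(link_mbps, "Mbps:",
              mbps_to_mb_per_second(link_mbps), "MB/s,",
              max_iops(link_mbps), "IOPS at most")
```

In other words, the dedicated throughput of the instance puts a ceiling on the IOPS worth buying, no matter how many volumes sit behind it.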

Below is the table; I hope it works for you.

 

Any questions / comments that may assist in improving this post are appreciated.

 

 

Instance Type    | Dedicated Throughput | Dedicated Throughput (MB/second) | Max IOPS Throughput for EBS Purchasing | Most Optimized Purchase Size (IOPS) | Min Drive Size (GB) @ Max IOPS | Min Drive Count to Get Maximum IOPS
m1.large         | 500 Mbps             | 62.5                             | 3,906.25                               | 4,000                               | 400                            | 1
m1.xlarge        | 1000 Mbps            | 125                              | 7,812.50                               | 8,000                               | 800                            | 2
m2.2xlarge (new) | 500 Mbps             | 62.5                             | 3,906.25                               | 4,000                               | 400                            | 1
m2.4xlarge       | 1000 Mbps            | 125                              | 7,812.50                               | 8,000                               | 800                            | 2
m3.xlarge (new)  | 500 Mbps             | 62.5                             | 3,906.25                               | 4,000                               | 400                            | 1
m3.2xlarge (new) | 1000 Mbps            | 125                              | 7,812.50                               | 8,000                               | 800                            | 2
c1.xlarge (new)  | 1000 Mbps            | 125                              | 7,812.50                               | 8,000                               | 800                            | 2
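
The last three columns of the table can be reproduced with a short Python sketch. The three constants are assumptions inferred from the table's own numbers rather than quoted from AWS: IOPS bought in steps of 1,000, a ratio of 10 IOPS per GB, and at most 4,000 IOPS per volume. The function name is hypothetical.

```python
import math

# Assumptions inferred from the table, not taken from AWS documentation:
PURCHASE_STEP = 1000         # IOPS are rounded up to the next 1,000
IOPS_PER_GB = 10             # 4,000 IOPS pairs with 400 GB in the table
MAX_IOPS_PER_VOLUME = 4000   # 8,000 IOPS needs 2 drives in the table


def ebs_sizing(max_iops: float) -> tuple:
    """Return (purchase size in IOPS, min total size in GB, min drive count)."""
    purchase = math.ceil(max_iops / PURCHASE_STEP) * PURCHASE_STEP
    min_size_gb = purchase // IOPS_PER_GB
    drive_count = math.ceil(purchase / MAX_IOPS_PER_VOLUME)
    return purchase, min_size_gb, drive_count


if __name__ == "__main__":
    # The two "Max IOPS" values that appear in the table.
    for iops in (3906.25, 7812.5):
        print(iops, "->", ebs_sizing(iops))
```

If you need more space than the minimum, the same reasoning runs the other way: add drives and lower the Provisioned IOPS per volume, keeping the total at or below what the NIC can carry.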
Posted in database

Blackberry Got Something Right Years Ago

Having switched from blackberry to iPhone, the one thing I can say I miss dearly is the concept of being docked.

Blackberry clearly understood that when you were at your desk, you did not need to check your phone for emails, since you were at your computer (docked) and had probably already seen the message on your desktop.

Posted in database

SSRS Identifying Reports to Tune – Top 10 Offenders

To discover reports which are candidates for tuning, I use the following query to identify the reports which take the longest time to perform their data retrieval tasks.

The query I use is:

DECLARE @ReportPath VARCHAR(MAX)
SET @ReportPath = '%'   -- only used by the optional path filter below

SELECT TOP 10
     c.[Path]
    ,c.[Name]
    ,[ReportID]
    ,COUNT(1) AS UseCount
    -- total time per phase across all executions
    ,SUM([TimeDataRetrieval]) AS [TimeDataRetrieval]
    ,SUM([TimeProcessing]) AS [TimeProcessing]
    ,SUM([TimeRendering]) AS [TimeRendering]
    -- share of total time per phase, as a percentage (NULLIF avoids divide by 0)
    ,(SUM([TimeDataRetrieval])/(NULLIF(SUM([TimeDataRetrieval]+[TimeProcessing]+[TimeRendering]),0)*1.00)) *100 AS TimeDataRetrieval_TotalTime
    ,(SUM([TimeProcessing])   /(NULLIF(SUM([TimeDataRetrieval]+[TimeProcessing]+[TimeRendering]),0)*1.00)) *100 AS TimeProcessing_TotalTime
    ,(SUM([TimeRendering])    /(NULLIF(SUM([TimeDataRetrieval]+[TimeProcessing]+[TimeRendering]),0)*1.00)) *100 AS TimeRendering_TotalTime
    ,SUM([TimeDataRetrieval]+[TimeProcessing]+[TimeRendering]) AS TotalTime
    ,SUM([ByteCount]) AS [ByteCount]
    ,SUM([RowCount]) AS [RowCount]
    ,MIN([TimeStart]) AS FirstUsed
    ,MAX([TimeStart]) AS LastUsed
    -- averages per execution
    ,AVG([TimeDataRetrieval]*1.00) AS [Avg_TimeDataRetrieval]
    ,AVG([TimeProcessing]*1.00)    AS [Avg_TimeProcessing]
    ,AVG([TimeRendering]*1.00)     AS [Avg_TimeRendering]
    ,AVG([RowCount])               AS [Avg_RowCount]
    -- rudimentary hack to prevent divide by 0 and also get a number even when no data is returned
    ,AVG([TimeDataRetrieval]/(CASE WHEN [RowCount] = 0 THEN 0.01 ELSE [RowCount] END *1.00)) AS [Avg_TimeDataRetrieval_Per_Row]
FROM
    [dbo].[catalog] c WITH(NOLOCK)
    LEFT OUTER JOIN [dbo].[ExecutionLog] el WITH(NOLOCK) ON (c.ItemID = el.ReportID)
WHERE
    1=1
    -- AND c.[Path] LIKE @ReportPath
GROUP BY c.[Path], c.[Name], [ReportID]
HAVING COUNT(1) > 5   -- ignore reports that have only run a handful of times
ORDER BY
    SUM([TimeDataRetrieval]+[TimeProcessing]+[TimeRendering]) DESC, AVG([TimeDataRetrieval]) DESC

This gives me a list of the ten reports on the server with the highest total time (data retrieval, processing and rendering combined), with average data retrieval time as the tie-breaker. Once I have this list, I look at each report and see what I can do to make the retrieval of each data set faster and more efficient by reviewing execution plans and rewriting the queries or stored procedures.

Posted in Uncategorized

SSIS 2005 Description: Connect to SSIS Service on machine failed: Library not registered. Could not load package because of error 0xC00160AA.

After applying Service Pack 4 to SQL Server 2005, I was unable to connect to SSIS from Management Studio.
I was also unable to run packages from the command line using DTEXEC.

Upon further investigation it seems the SSIS libraries were no longer registered, as the "Library not registered" error suggests. The commands to re-register them are:

c:\windows\system32\regsvr32 "C:\Program Files\Microsoft SQL Server\90\DTS\Binn\DTS.dll"

c:\windows\system32\regsvr32 "C:\Program Files\Microsoft SQL Server\90\DTS\Binn\MsDtsSrvrUtil.dll"

The issue is described on the Microsoft site in the following KB, along with the scenarios in which it occurs: http://support.microsoft.com/kb/919224

  • The SQL Server 2005 Integration Services hotfix package is installed. However, the SQL Server 2005 Tools hotfix package is not installed.
  • You install the SQL Server 2005 Tools hotfix package before you install the SQL Server 2005 Integration Services hotfix package.
  • You are running two instances of SQL Server 2005 on the computer. Additionally, the versions of both instances of SQL Server are earlier versions than Microsoft SQL Server 2005 Service Pack 2 (SP2).
  • The computer is running an instance of SQL Server 2005 with SP2. Additionally, you install a post-SP2 hotfix on this instance. Then you install a second instance of SQL Server 2005. On the second instance, you install SQL Server 2005 SP2.

Posted in Uncategorized