Friday, August 16, 2013

Pandas: presence of a NaN/None in a DataFrame forces column to float

import pandas as pd
a = [[1,2],[3,4]]
df = pd.DataFrame(a)

df-> 
   0  1
0  1  2
1  3  4

df.values ->
array([[1, 2],
       [3, 4]])

df.loc[1].values ->
array([3, 4])

a = [[1,None],[3,4]]
df = pd.DataFrame(a)

df->
   0   1
0  1 NaN
1  3   4

df.values ->
array([[  1.,  nan],
       [  3.,   4.]])

df[0].values ->
array([1, 3])

df[1].values ->
array([ nan,   4.])

df.loc[1].values ->
array([ 3.,  4.])

df[0][1] -> 3
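df[1][1] -> 4.0

This threw me because my data structure is all ints, but a few Nones in one column meant that column suddenly came back as floats. The cause is that a NumPy integer array has no way to represent NaN, so pandas upcasts the column to float64.
As you can see, only the affected column is forced to float; column 0 stays int. Anything that spans both columns (df.values, or a single row) comes back as float too, because one NumPy array can only hold one dtype.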
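
For anyone hitting the same thing, here is a minimal sketch that checks the dtypes directly instead of eyeballing the printed arrays. The second half is not from the original experiment: it assumes a pandas version that has the nullable "Int64" extension dtype (0.24 or later, as far as I know), which keeps integers alongside missing values. The names `before` and `converted` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, None], [3, 4]])

# Per-column dtypes: only the column holding the None is upcast to float.
before = df.dtypes
print(before)

# Assumption: pandas >= 0.24, where the nullable "Int64" dtype exists.
# The float column can be converted because its non-missing values are whole numbers.
converted = df.copy()
converted[1] = converted[1].astype("Int64")
print(converted.dtypes)
print(converted)
```

With the nullable dtype the missing entry is kept as a missing-value marker rather than a float NaN, so the remaining values in that column stay integers.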
