What's new

Need Help With Python And Panda

baqai

SENIOR MEMBER
Joined
Sep 28, 2006
Messages
2,830
Reaction score
4
Country
Pakistan
Location
Pakistan
Hey guys and girls

anyone who knows Python? for an ongoing project i am forced to take a crash course in it, i have programming know how and used to be quiet good in c++ back in 2004, the task in hand is that i need to parse web data / excel data / csv data through python and do calculations on customer records based on conditional statements, so far i have been able to load excel and print few values using panda using YouTube tuotirals but i am not getting grasp especially with deadlines

what i want to do

read one row, check age of the person, depending on age define a new column age_points and give a value, the same goes for gender, education, work experience, locality etc

so each customer will have a grand_total based on all the above values and than condition statement will be applied to see those who are above 100 points will be saved in another sheet and emailed to CSR as urgent cases while the rest will discarded

am i making sense here?
 
.
IFF i understand correctly, you need to do calculations on single record and then save it to another CSV file. For that read a row in a while function and do calculation on fields on that particular record and then save the result in another csv file all in one while function.
its a simple task.
 
.
Hey guys and girls

anyone who knows Python? for an ongoing project i am forced to take a crash course in it, i have programming know how and used to be quiet good in c++ back in 2004, the task in hand is that i need to parse web data / excel data / csv data through python and do calculations on customer records based on conditional statements, so far i have been able to load excel and print few values using panda using YouTube tuotirals but i am not getting grasp especially with deadlines

what i want to do

read one row, check age of the person, depending on age define a new column age_points and give a value, the same goes for gender, education, work experience, locality etc

so each customer will have a grand_total based on all the above values and than condition statement will be applied to see those who are above 100 points will be saved in another sheet and emailed to CSR as urgent cases while the rest will discarded

am i making sense here?
You will read a row from an exiting file, right? If so, then simply Google file read write functions.
I don't have much experience in python, but after C++, you have python as a most supported language on internet.
@graphican
 
.
# Reading an excel file using Python
1. import xlrd
2. from xlwt import Workbook
# Give the location of the file
2. loc = ("path of file")

# To open Workbook
3. wb = xlrd.open_workbook(loc)
4. sheet = wb.sheet_by_index(0)


# add_sheet is used to create sheet.
5. sheet1 = wb.add_sheet('Sheet 1')

As you said, you have done till 3 and 4 . All you need to do is use while loop to loop through each sheet.row and then take each cell value u desire.

for example 1st column of each row is name of person and 2nd is age, then in each while loop code or its inner statement you need to do following:

Note: lets say its 2nd row for now in loop since 1st row is headings
-save each value like this:
  1. name = sheet.cell_value(RowIndex, ColIndex) e.g. sheet.cell_value(1, 0)
  2. age = sheet.cell_value(RowIndex, ColIndex) e.g. sheet.cell_value(1, 1)
So, in while loop each record is then saved like this
sheet1.write(1, 0, name )
sheet1.write(1, 1, age )
wb.save('xlwt example.xls')
 
.
# Reading an excel file using Python
1. import xlrd
2. from xlwt import Workbook
# Give the location of the file
2. loc = ("path of file")

# To open Workbook
3. wb = xlrd.open_workbook(loc)
4. sheet = wb.sheet_by_index(0)


# add_sheet is used to create sheet.
5. sheet1 = wb.add_sheet('Sheet 1')

As you said, you have done till 3 and 4 . All you need to do is use while loop to loop through each sheet.row and then take each cell value u desire.

for example 1st column of each row is name of person and 2nd is age, then in each while loop code or its inner statement you need to do following:

Note: lets say its 2nd row for now in loop since 1st row is headings
-save each value like this:
  1. name = sheet.cell_value(RowIndex, ColIndex) e.g. sheet.cell_value(1, 0)
  2. age = sheet.cell_value(RowIndex, ColIndex) e.g. sheet.cell_value(1, 1)
So, in while loop each record is then saved like this
sheet1.write(1, 0, name )
sheet1.write(1, 1, age )
wb.save('xlwt example.xls')

right now i have already come up with working logic system which takes input from command line and does the calculations now the next challenge was to have that data from a xlsx or csv file

i am using panda as i think (atleast found) it to be easier than xldr, here is my code to open and read files

Code:
import pandas as pd
df = pd.read_excel ('test.xlsx')
print (df)

as simple as that, now data extract for row 0 is as follows

Code:
0  Test 43 4 Years

now i want to put a check on the age which is column 2

Code:
if age >= 18 and age =< 35:
          age_points = 5
# Put another Data Point As Age Point In The Sheet With Value          
          df['Age Points'] = 5

if age > 35 and age =< 45:
          age_points = 10
# Put another Data Point As Age Point In The Sheet With Value          
          df['Age Points'] = 10

if  age >45:
           age_points = 0
# Put another Data Point As Age Point In The Sheet With Value          
          df['Age Points'] = 0

this is just one example of a check and data manipulation, i need to do such calculations on different columns and than create points based on the system

once i have done with this than i will look into putting more detailed data manipulation, thanks a lot for the help, you guys rock
 
.
You will read a row from an exiting file, right? If so, then simply Google file read write functions.
I don't have much experience in python, but after C++, you have python as a most supported language on internet.
@graphican

Hi @fitpOsitive, I am in the software industry but I am not a developer so cannot offer much help here.
 
.
For some reason , this is what i was made to believe this thread is about ..

1602530919187.png

and
1602530939465.png
 
. .
right now i have already come up with working logic system which takes input from command line and does the calculations now the next challenge was to have that data from a xlsx or csv file

i am using panda as i think (atleast found) it to be easier than xldr, here is my code to open and read files

Code:
import pandas as pd
df = pd.read_excel ('test.xlsx')
print (df)

as simple as that, now data extract for row 0 is as follows

Code:
0  Test 43 4 Years

now i want to put a check on the age which is column 2

Code:
if age >= 18 and age =< 35:
          age_points = 5
# Put another Data Point As Age Point In The Sheet With Value         
          df['Age Points'] = 5

if age > 35 and age =< 45:
          age_points = 10
# Put another Data Point As Age Point In The Sheet With Value         
          df['Age Points'] = 10

if  age >45:
           age_points = 0
# Put another Data Point As Age Point In The Sheet With Value         
          df['Age Points'] = 0

this is just one example of a check and data manipulation, i need to do such calculations on different columns and than create points based on the system

once i have done with this than i will look into putting more detailed data manipulation, thanks a lot for the help, you guys rock
Is the age data not in the same column ? I don't understand. You have finished your work . What's your problem ?
 
.
Code:
#sets dataframe of age from string to integer 
df['Age'] = df['Age'].astype(int)

makes a variable to get all the values from column 'Age' in it
given_age = df['Age']

if (given_age[0] < 40):
        age_diff = given_age[0] / 2 
        age_points = 7 + age_diff
        df['Age Points'] = age_points

i dont know why cant i do

Code:
if df['Age'] < 18:

anyways so far so good, i am able to get values and do comparisons, now i have to figure out

1. how to run this in a loop so that all the columns are processed, in c++ it would have been something like (for a=0;a<=columns;a++) i have to figure this out in python

2. df['Age_Points'] is not writing different age points based on calculations and giving only value for the first condition, need to figure that out

thanks a lot again for the input
 
.
The advantage of using pandas data frame is that you can leverage some of the database like query functions to select data based on conditions. Doing a row by row operation is not the right approach.

Try df.loc[df[‘age’] >= 18 and df[‘age’] <= 38, ‘age point’]] = 5

Ditto for other conditions.
 
Last edited:
.
Code:
#sets dataframe of age from string to integer
df['Age'] = df['Age'].astype(int)

makes a variable to get all the values from column 'Age' in it
given_age = df['Age']

if (given_age[0] < 40):
        age_diff = given_age[0] / 2
        age_points = 7 + age_diff
        df['Age Points'] = age_points

i dont know why cant i do

Code:
if df['Age'] < 18:

anyways so far so good, i am able to get values and do comparisons, now i have to figure out

1. how to run this in a loop so that all the columns are processed, in c++ it would have been something like (for a=0;a<=columns;a++) i have to figure this out in python

2. df['Age_Points'] is not writing different age points based on calculations and giving only value for the first condition, need to figure that out

thanks a lot again for the input

Cpp
(for a=0;a<=columns;a++)
{array[a]}

python
Python:
for elem in array:
    elem

Do you mean iterate over an array?
if array[][]

Python:
for line in array:
    for elem in line:
        line
if array[][2]
Python:
for line in array:
    line[2]
 
.
Managed to do it and MashaAllah it's running :) now next step for me would be to open excel files as per user input either via OS or online, baby steps i am taking :)

Code:
df['Age'] = df['Age'].astype(int)
#Since the 'Age' column was coming in as string it was not possible to do any compraison or mathematical calculations on it 

    given_age = df['Age']
#put value of 'Age' in new variable given_age, again this is mainly because I was not able to find a way to directly run any calculations directly on 'Age' 

    ncols = len(df)
#define variable ncols which gets the total amount of rows in the data field 
    count = 0
#define variable count as 0 as starting point
    while (count < ncols):  
                print (count)
#print the row number
                count = count + 1
#increment the count for the next loop
 
.
Hey guys and girls

anyone who knows Python? for an ongoing project i am forced to take a crash course in it, i have programming know how and used to be quiet good in c++ back in 2004, the task in hand is that i need to parse web data / excel data / csv data through python and do calculations on customer records based on conditional statements, so far i have been able to load excel and print few values using panda using YouTube tuotirals but i am not getting grasp especially with deadlines

what i want to do

read one row, check age of the person, depending on age define a new column age_points and give a value, the same goes for gender, education, work experience, locality etc

so each customer will have a grand_total based on all the above values and than condition statement will be applied to see those who are above 100 points will be saved in another sheet and emailed to CSR as urgent cases while the rest will discarded

am i making sense here?

Not necessarily complex to pick up from what I know (not expert but I've been dabbling i Python).

a) Checkout matplotpib.pyplot (data visualzation)
b) Pandas
c) numpy for array operations
 
.

Latest posts

Pakistan Defence Latest Posts

Back
Top Bottom